- And it’s not fun work but it is rewarding work because then you can easily plot using
ggplot2 or whatever you use!
- In fact it’s not fun but data cleaning is also an act of judgement and analysis because it’s almost always a judgment call as to what an ‘outlier’ is and what outliners to throw out!
- I’m far from the first to observe this of course :-)
ggplot2 is great fun because you can easily create many many types of graphs with that 1 cleaned dataset with minimal lines of code.
- An example was the previous post on the Operating System graph for Firefox Desktop questions: e.g. what is ‘linux’? should I track every distro? You can see my arbitrary-but-hopefully-mostly-correct choices in the ruby code to generate the dataset:
when /^Windows 7/i
os = "Windows 7"
when /^Windows 10/i
os = "Windows 10"
when /^Windows 8/i
os = "Windows 8"
when /^Windows XP/i
os = "Windows XP"
when /^Mac OS/i, /^macos/i
os = "Mac OS"
when /^Linux/i, /^ubuntu/i, /^centos/i, /^arch/i, /^lfs/i, /^fedora/i
os = "Linux"
logger.debug "SETTING os:" + os + " to other"
os = "Other"
Leave a comment on github