Data cleaning takes up a lot of time during a data science process; it’s not necessarily a bad thing and time spent on cleaning data is worthwhile in most cases; To that end, I was researching some framework that might help me make this process a little bit faster. As a part of my research, I found the Journal of statistical software paper written by Hadley Wickham which had a really good framework to “tidy” data — which is part of data cleaning process.
Author does a great job of defining tidy data:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
And then applying it to 5 examples:
1. Column headers are values, not variable names.
2. Multiple variables are stored in one column.
3. Variables are stored in both rows and columns.
4. Multiple types of observational units are stored in the same table.
5. A single observational unit is stored in multiple tables
Question (on Quora) Is the R data science course from datacamp worth the money?
It depends on your learning style.
If you like watching videos then coursera/udacity might be better.
If you like reading then a book/e-book might be better.
If you like hands-on then something like Data Camp is a great choice. I think they have monthly plans so it’s much cheaper to try them out. When I subscribed to it, it was like 30$/Month or so. I found it was worth it. Also, if you want to see if “hands-on” is how you learn best. Try this: swirl: Learn R, in R. — it’s free! Also, Data Camp has a free course on R too so you could try that as well.