Get $100 off any Springboard course

I have been mentoring students on the Springboard platform for close to a year, and I have found that it’s a great way for students to accelerate their learning through mentorship and structured course material. So if you are considering data analytics or data science courses, I recommend checking out Springboard as well!

Here’s the link to get a $100 discount code for any Springboard course:

$100 OFF DATA SCIENCE COURSES

Here are some benefits:

1. Learn online, with 1-on-1 mentorship from an industry expert. Get mentored by top industry experts from companies like Google, Facebook, and Airbnb.

2. Graduate with an online portfolio of projects that will help you land your dream job.

3. Join a community: network with peers, and get the support you need from student advisors.

4. Learn with the best. Springboard alumni work at Boeing, Amazon, and Pandora, and rate it 4.9 stars (of 5).

SPRINGBOARD $100 DISCOUNT CODE: SK1U8

Let me know if you have any questions in the comments section.

Journal of Statistical Software paper on tidying data:

Data cleaning takes up a lot of time in a data science project. That’s not necessarily a bad thing, and time spent cleaning data is worthwhile in most cases. Still, I was looking for a framework that might make this process a little faster. As part of my research, I found a Journal of Statistical Software paper by Hadley Wickham with a really good framework for “tidying” data, which is one part of the data cleaning process.

The author does a great job of defining tidy data:

1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.

He then applies it to five common problems with messy datasets:

1. Column headers are values, not variable names.
2. Multiple variables are stored in one column.
3. Variables are stored in both rows and columns.
4. Multiple types of observational units are stored in the same table.
5. A single observational unit is stored in multiple tables.

The paper also contains some sample R code. You can read it here: http://vita.had.co.nz/papers/tidy-data.pdf
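To make the first problem (column headers are values) concrete, here is a minimal sketch in plain Python rather than the R used in the paper. The table, its column names and its numbers are made up for illustration; they are not taken from the paper.

```python
# Hypothetical messy table: the years are values, but they appear as column headers.
messy = [
    {"region": "north", "2019": 10, "2020": 15},
    {"region": "south", "2019": 20, "2020": 25},
]

# Tidy form: one row per (region, year) observation, one column per variable.
tidy = [
    {"region": row["region"], "year": int(year), "sales": sales}
    for row in messy
    for year, sales in row.items()
    if year != "region"
]

for row in tidy:
    print(row)
```

Each row of the result is a single observation with the variables region, year and sales. If you work with pandas, `DataFrame.melt` performs the same kind of reshaping.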

Single variable linear regression: Calculating baseline prediction, SSE, SST, R2 & RMSE:

Introduction:

This post covers basic concepts in linear regression: I will show how to calculate the baseline prediction, SSE, SST, R2 and RMSE for a single-variable linear regression.

Dataset:

The following figure shows three data points and the best-fit regression line: y = 3x + 2. Working back from the calculations below, the three points are (0, 2), (1, 2) and (1, 8).

The x-coordinate, or “x”, is our independent variable and the y-coordinate, or “y”, is our dependent variable.

Baseline Prediction:

The baseline prediction is just the average of the observed values of the dependent variable. So in this case:

(2 + 2 + 8) / 3 = 4

The baseline doesn’t take the independent variable into account; it predicts the same outcome for every data point. We’ll see in a minute why the baseline prediction is important.
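Here is the same baseline calculation as a short Python sketch. The y values are the ones used in this post; the variable names are my own.

```python
# Observed values of the dependent variable, taken from the example in this post.
y = [2, 2, 8]

# The baseline model ignores x and always predicts the mean of y.
baseline = sum(y) / len(y)
baseline_predictions = [baseline] * len(y)

print(baseline)              # 4.0
print(baseline_predictions)  # [4.0, 4.0, 4.0]
```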

Here’s what the baseline model would look like:

[Figure: regression baseline model]

SSE:

SSE stands for Sum of Squared Errors.

An error is the difference between an actual value and the value predicted by the regression line. Here the line predicts 2, 5 and 5 for the three data points.

So SSE in this case:

= (2 – 2)^2 + (2 – 5)^2 + (8 – 5)^2

= 0 + 9 + 9

= 18
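The SSE calculation can be sketched in Python as follows. The x values are an assumption: the figure is not reproduced here, so I inferred them from the predictions 2, 5 and 5 and the line y = 3x + 2.

```python
# Actual y values from this post.
actual = [2, 2, 8]

# Assumed x values, inferred from the predictions (2, 5, 5) and y = 3x + 2.
x = [0, 1, 1]

# Predictions from the regression line y = 3x + 2.
predicted = [3 * xi + 2 for xi in x]

# Sum of squared differences between actual and predicted values.
sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))

print(predicted)  # [2, 5, 5]
print(sse)        # 18
```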

SST:

SST stands for Total Sum of Squares.

Step 1 is to take the difference between each actual value and the baseline value of the dependent variable.

Step 2 is to square each difference and add them up.

So in this case:

= (2 – 4)^2 + (2 – 4)^2 + (8 – 4)^2

= 4 + 4 + 16

= 24
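The two steps for SST translate directly into Python. This is a small sketch using the y values from this post; the variable names are my own.

```python
# Actual y values from this post.
actual = [2, 2, 8]

# Baseline prediction: the mean of the actual values.
baseline = sum(actual) / len(actual)

# Step 1: difference from the baseline. Step 2: square each one and add them up.
sst = sum((a - baseline) ** 2 for a in actual)

print(sst)  # 24.0
```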

R2:

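Now R2 is 1 – (SSE/SST). This is where the baseline matters: SST is the squared error of the baseline model, so R2 measures how much of that error the regression line removes.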

So in this case:

= 1 – (18/24)

= 0.25
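Putting SSE and SST together, R2 can be wrapped in a small function. This is a sketch with a hypothetical helper name, `r_squared`, not a function from any library.

```python
def r_squared(actual, predicted):
    """Return 1 - SSE/SST for paired lists of actual and predicted values."""
    baseline = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - baseline) ** 2 for a in actual)
    return 1 - sse / sst

# Actual values and regression-line predictions from this post.
print(r_squared([2, 2, 8], [2, 5, 5]))  # 0.25
```

An R2 of 0.25 means the regression line removes a quarter of the baseline model’s squared error.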

RMSE:

RMSE is Root mean squared error. It can be computed using:

Square root of (SSE/N), where N is the number of observations (data points).

So in this case, it’s:

SQRT(18/3) = SQRT(6) ≈ 2.45
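As a quick check, here is the RMSE calculation in Python, using the SSE of 18 computed above and the three data points in this example.

```python
import math

sse = 18  # sum of squared errors from the SSE section
n = 3     # number of observations (data points)

# Root mean squared error: square root of the average squared error.
rmse = math.sqrt(sse / n)

print(round(rmse, 2))  # 2.45
```

RMSE is in the same units as the dependent variable, so you can read it as the typical size of a prediction error.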

 

Is the R data science course from datacamp worth the money?

Question (on Quora): Is the R data science course from DataCamp worth the money?

Answer:

It depends on your learning style.

If you like watching videos, then Coursera or Udacity might be better.

If you like reading, then a book or e-book might be better.

If you like hands-on learning, then something like DataCamp is a great choice. I think they have monthly plans, so it’s much cheaper to try them out. When I subscribed, it was around $30/month, and I found it was worth it. Also, if you want to find out whether “hands-on” is how you learn best, try swirl (Learn R, in R). It’s free! DataCamp has a free course on R too, so you could try that as well.

Also, if you want free unlimited access for two days, try this link: https://www.datacamp.com/invite/G8yVkTrwR3Khn

VIEW QUESTION ON QUORA

How Marketable is R programming?

Someone asked this on Quora: How Marketable is R programming?

Answer:

Let’s step back!

Why do you want to learn R? OR why do people learn R?

To solve problems that R can address. Right?

What problems do you have? OR what problems does your COMPANY have? OR what PROBLEMS does the dream company that you want to join have?

<< LIST THEM DOWN HERE>>

Example:

  • I want to predict which customers are going to churn next quarter.
  • I want to identify the marketing channel that drove revenue growth last quarter.
  • etc.

What’s Next?

NOW, take all of these problems and find ways to solve them.

R may or may not help.

You could just do it in Excel. Then do that.

OR R helps you a little bit in the process but you need something else.

In some cases, R is a perfect solution! Like building a model to predict customer churn!

So, What?

You see, learning R is important, and you might get a job by showing that you have “R” chops, but that will not be enough for career growth. You should focus on learning to solve business problems using data. Use R sometimes. Use Excel sometimes. Use Python sometimes. Use SQL. Use Tableau. Use << INSERT A TOOL HERE>>. Learn them. Apply them. Figure out their strengths and weaknesses. BUT learn to use all of these technology platforms to solve problems! Solve problems that are thorny. Solve problems that move the business needle. Solve problems that get your boss’s boss promoted.

If you do that, marketing your skills won’t be a concern anymore.

It’s NOT easy. And it WILL take time.

TL;DR: Go for it! Learn R! But more importantly, learn to solve problems with data.

VIEW QUESTION ON QUORA

Book Review: R in a Nutshell

R is a popular tool among data scientists because it’s just like a Swiss Army knife (or maybe more!) for them!

Analogy credit: Tapping the Data Deluge with R by Jeffrey Breen

Some time back I worked on a research project that involved writing some R code. We were searching for ways to pull data from multiple social networks, perform text analysis and create effective data visualizations. R seemed like a great tool, so I looked for books and guides that would teach me the fundamentals I needed to get a few R-related things done. One of the books I used often during the project was “R in a Nutshell”. I didn’t read it cover to cover, but it was a great reference book for me. I would read online guides and other books, then combine them with information from this book to get things done. The section I liked the most was on data visualization, which included some great code snippets for creating effective visualizations with the ggplot2 library. I would take code snippets from the book and apply them to the datasets I had.

Fun stuff!

I also liked that the book has some end-to-end examples that cover the entire life cycle of a data or statistical analysis.

Summary:

I recommend this book as a “reference” for anyone who is starting to work with R.

Note:

I received a copy of this book as part of O’Reilly’s Blogger program. Thanks, O’Reilly! If you are a blogger, you should check out that program!

How to start Analyzing Twitter Data w/ R?

Over the past few weeks, I have posted notes about analyzing Twitter data w/ R. I’m listing them here:

1. Install R & RStudio

2. R code to download Twitter data

3. Perform Sentiment Analysis on Twitter Data (in R)