How can I start learning and exploring the field of Big Data Algorithms?


Someone asked this on Quora about how to learn & explore the field of Big Data Algorithms? Also, mentioned having some background in python already and wanted ideas to work on a good project so with that context, here is my reply:

There are two broad roles available in Data/Big-Data world:

  1. Engineering-oriented: Date engineers, Data Warehousing specialists, Big Data engineer, Business Intelligence engineer— all of these roles are focused on building that data pipeline using code/tools to get the data in some centralized location
  2. Business-oriented: Data Analyst, Data scientist — all of these roles involve using data (from those centralized sources) and helping business leaders make better decisions. *

*smaller companies (or startups) tend to have roles where small teams(or just one person) do it all so the distinction is not that apparent.

big data

Now given your background in python and programming, you might be a great fit for “Data engineer” roles and I would recommend learning about Apache spark (since you can use python code) and start building data pipelines. As you work with a little bit more than you can learn about how to build and deploy end-to-end machine learning projects with python & Apache spark. If you acquire these skills and keep learning — then I am sure you will end up with a good project.

Hope that helped and good luck!


How to add Sparkline data visualization to Google spread sheets?


I like using spark lines data viz when it makes sense! It’s a great way to visualize trends in the data without taking too much space. Now, I knew how to add sparklines in Excel but recently, I wanted to use that on Google sheet and I had to figure it out so here are my notes:

1. Google has an inbuilt function called “SPARKLINE” to do this.

2. Sample usage: =SPARKLINE(B2:G2) — by default you can put line chart in your cells.

3. Then there are other options including changing the chart type. You can find them documented here:

4. One of the best practices that I advocate when you spark-line to “compare” trends is to make sure that you have the consistent axis definition. So the sample usage for that could like this:


(if you want to do this for excel then here’s the post: )

After you’re done, here’s what a finished version could like on Google sheet:

Google Sheet Data visualization spark line

Here’s the working google sheet: