SPEED is one of the most important aspects of data analysis. Wouldn’t it be great if, when you query a data source, you got your answers as soon as possible? Yes? Right! Of course, it depends on factors like the size of the data you are trying to query, but wouldn’t it be great if it happened at the “SPEED OF THOUGHT”?
So Here’s the Problem:
Databases are mostly disk-based, so the bottleneck is the speed at which we can read data off the disks.
So what can you do?
Let’s put the data in RAM (memory), because accessing data in memory is much faster than reading it from disk.
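To make the idea concrete, here is a toy Python sketch (not how PowerPivot, HANA or any other product actually works): it reads a small CSV file from disk once, keeps all rows in RAM, and then answers a query without touching the disk again. The file, its `region`/`amount` columns and the function names are hypothetical; real in-memory engines use compressed, columnar structures rather than Python dictionaries.

```python
import csv
import os
import tempfile
from collections import defaultdict


def load_into_memory(path):
    """Read the whole CSV file once and keep every row in RAM."""
    with open(path, newline="") as f:
        return [
            {"region": row["region"], "amount": float(row["amount"])}
            for row in csv.DictReader(f)
        ]


def total_by_region(rows):
    """Answer a query entirely from the in-memory rows; no disk access."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Hypothetical sample data, written to a temporary file that stands in for the disk.
sample = "region,amount\nEast,100\nWest,250\nEast,50\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write(sample)
    path = tmp.name

rows = load_into_memory(path)  # one trip to disk
os.remove(path)                # the file is gone, yet queries still work from RAM
print(total_by_region(rows))   # {'East': 150.0, 'West': 250.0}
```

Every further query (by region, by amount range, and so on) reuses `rows` instead of rereading the file, which is the whole point of keeping the data in memory.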
If it sounds so easy, why didn’t people do it earlier? And why are we talking about “in-memory” NOW?
1) BIGGER data sets: today, with more data, it takes more time to query databases, so researchers have looked at other approaches. One of the effective approaches they found is in-memory.
(And I am not ignoring advances in database technologies like parallel databases. But for the purpose of understanding “why in-memory”, it’s important to realize that data sets keep growing and that in-memory is a viable way to tackle the problem. I am also not saying that it’s the ONLY way to go; I am just trying to understand the significance of in-memory technologies. We, as data professionals, have lots of choices! And only after evaluating project requirements can we talk about tools and techniques.)
2) PRICE of memory: RAM used to be much more expensive than it is today, so even though putting data in memory was a great idea, it was cost-prohibitive.
So let’s connect the dots: Data Analysis + In-Memory Technologies
What do Microsoft’s PowerPivot, SAP HANA, Tableau and QlikView have in common?
1) They are tools for data analysis/business intelligence.
2) Their back-end data architecture is “in-memory”.
So, since data analysis needs SPEED and in-memory technologies meet that need, data analysis and business intelligence tools adopted “in-memory” as their back-end data architecture. And next time you hear a vendor say “in-memory”, you don’t have to be confused about what they mean. They’re just saying: we’ve got you covered, because our in-memory technology lets you query your data at the “speed of thought” so that you can get back to your (data) analysis.
That’s about it for this post. Here’s a related post: What’s the benefit of columnar databases?
Your comments are very welcome!
Thoughts on “Data Analysis and In Memory Technologies, let’s connect the dots:”
Very good insights.
Don’t forget that in-memory databases have become a reality due to the availability of huge amounts of main memory and lightweight compression techniques. In-memory databases achieve their optimal performance by using cache-aware algorithms built on cost models for memory hierarchies.
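Dictionary encoding is one common lightweight compression technique in in-memory column stores; the comment doesn’t name a specific scheme, so treat this Python sketch as an illustrative assumption. Each distinct string is stored once, every row keeps only a small integer code, and a filter can run on the codes directly without decompressing the column. The `city` column and the helper names are hypothetical.

```python
def dictionary_encode(values):
    """Replace each string with a small integer code; return (dictionary, codes)."""
    dictionary = sorted(set(values))               # each distinct value stored once
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = [code_of[v] for v in values]           # one small int per row
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


def count_equal(dictionary, codes, value):
    """Filter on the compressed column: look up the code once, then compare ints."""
    try:
        target = dictionary.index(value)
    except ValueError:
        return 0                                   # value never appears in the column
    return sum(1 for c in codes if c == target)


city = ["Seattle", "Boston", "Seattle", "Austin", "Boston", "Seattle"]
dictionary, codes = dictionary_encode(city)
print(dictionary)                                  # ['Austin', 'Boston', 'Seattle']
print(codes)                                       # [2, 1, 2, 0, 1, 2]
print(count_equal(dictionary, codes, "Seattle"))   # 3
```

Storing small integers instead of repeated strings lets more of the data fit in RAM, and comparing integers is cheap, which is why this kind of compression pairs well with in-memory processing.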
Nice write up!
Thanks for pointing that out!
Your insights are well thought out. A question: how do “in-memory” technologies affect the desire for real-time data analysis? While analysis of large data sets will always have meaning (its importance cannot be overstated as more and more supersets of data become available, e.g. World Bank data sets), I find that the allure of real-time data analysis is getting all of the attention from both the media and venture capitalists.
Great question! You’re right; I’ve read about “in-memory” and “real-time” analysis in the media and on vendor sites. My thoughts on connecting the dots between “in-memory” and “real-time” are not as fully baked as they should be, but nonetheless, here’s my current thought process:
First, let’s define real-time analysis, because it may mean different things to different vendors, and I want to be sure we’re on the same page:
Let’s take the example of a Twitter stream:
If we want to come up with “Trending Topics” based on the last 5 minutes of the Twitter social stream, which requires analyzing tweets as they are posted, we do need a “real-time analysis” system.
BUT if we’re mining macro trends from a year’s worth of the Twitter social stream, then we do not need real-time analysis.
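A “Trending Topics” feature like the first case is typically built on a sliding-window count. The Python sketch below assumes tweets arrive as hypothetical `(timestamp_in_seconds, hashtag)` pairs in time order, and keeps counts for the last 5 minutes in memory, updating them as each tweet arrives instead of loading a batch into a warehouse first. It is an illustration, not Twitter’s actual algorithm.

```python
from collections import Counter, deque


class TrendingTopics:
    """Count hashtags seen in the last `window_seconds` seconds, updated per tweet.

    Assumes timestamps arrive in non-decreasing order; the window is measured
    back from the most recent tweet added.
    """

    def __init__(self, window_seconds=300):  # 300 s = the last 5 minutes
        self.window = window_seconds
        self.events = deque()   # (timestamp, hashtag) in arrival order
        self.counts = Counter()  # hashtag -> count inside the window

    def add(self, timestamp, hashtag):
        self.events.append((timestamp, hashtag))
        self.counts[hashtag] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window and decrement their counts.
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n=3):
        return self.counts.most_common(n)


# Hypothetical stream of (seconds, hashtag) pairs.
trends = TrendingTopics(window_seconds=300)
for ts, tag in [(0, "#oscars"), (10, "#nba"), (200, "#oscars"), (320, "#nba"), (330, "#nba")]:
    trends.add(ts, tag)
print(trends.top())  # [('#nba', 2), ('#oscars', 1)]
```

The year-long “macro trends” case, by contrast, could simply be a batch job over stored data, since nobody is waiting on the answer second by second.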
One more example:
A customer is in our retail store. If we want to recommend a product to them while they are in the store, based on what they’re searching for right NOW in the store plus data we can find about them from various sources (their past transactions with us, their Facebook likes, their Twitter stream, their Amazon wish list, etc.), then that’s real time!
BUT if we want to email them coupons based on the data we have about them, then that’s not real time.
In my mind, real-time analysis requires analyzing data as soon as it is generated/fetched, because we want actionable insights NOW. To that end, we can’t wait to LOAD data into a warehouse/database and then analyze it; we don’t have time to complete that lifecycle. In these cases, we need a system that lets us query the data as fast as possible. In-memory solutions provide that agility, and hence real-time analysis tools often (but not always) have “in-memory” as their back-end component.
And to come back to my post about “data analysis”: in-memory is helpful because in interactive/exploratory analysis you want to slice/dice data and combine data sources as fast as you can, and “in-memory” helps there.
At the end of the day, it’s about the business questions you’re trying to answer. Not every “data analysis” needs to be real-time, and not every “analysis” has to be interactive/exploratory.
I hope that helps!
I would like to suggest a link with your recent piece on visual analytics (http://parasdoshi.com/2013/05/06/how-conditionally-formatting-your-data-in-excel-can-help-you-save-time-in-answering-business-questions/), along these lines…
I suggest that Visual Analytics is dependent on 3 interdependent capabilities:
1) Data Visualisation
Your previous piece (URL provided above) highlighted the value of presenting numeric data in a visual form, and you described this as visual analytics… I would suggest that it’s actually data visualisation, and that it’s an important capability within visual analytics (it would be hard for any analytics approach to be visual *without* data visualisation!), but that two further components are essential…
… and this is where your in-memory comments come to the fore; in-memory approaches to data analysis provide sufficient speed that the analytical experience can become instantaneous (dependent on many factors, but practical even with large data sets on regular computers). Instant interactivity changes the dynamic of the visual analyst’s activity, because it enables and encourages exploration of information (built from large data sets); the instant feedback is essential to encourage exploration (Bret Victor has a series of excellent presentations on the need for instant feedback for creative technical tasks, and information analysis is just such a task).
3) Data Blending
Very few data sets (databases, business systems, spreadsheets etc.) contain all of the information required to make the best of an analysis – it’s highly likely that the visual analyst will need to blend data from two, three or more sources in order to identify patterns and present credible hypotheses for them. Great visual analytics draws on many sources of available data, and uses appropriate tools to combine, sort, merge, aggregate, profile, link and utilise these data to build rich information for exploration and communication to others.
In my experience, all three of these are *essential* for effective visual analytics.
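The data-blending step described above can be sketched in a few lines of Python, assuming two hypothetical sources: transactions exported from a business system and store attributes kept in a spreadsheet. The sketch aggregates sales per store, links each store to its attributes through a shared `store_id` key, and flags keys missing from one source (a simple profiling check). Tools such as PowerPivot or Tableau do this with their own engines; all names and values here are illustrative only.

```python
from collections import defaultdict

# Hypothetical source 1: transactions exported from a business system.
transactions = [
    {"store_id": 1, "amount": 120.0},
    {"store_id": 2, "amount": 80.0},
    {"store_id": 1, "amount": 30.0},
    {"store_id": 3, "amount": 55.0},
]

# Hypothetical source 2: store attributes kept in a spreadsheet.
stores = {
    1: {"region": "North", "target": 100.0},
    2: {"region": "South", "target": 120.0},
}


def blend(transactions, stores):
    """Aggregate sales per store, then link each store to its spreadsheet attributes."""
    sales = defaultdict(float)
    for t in transactions:
        sales[t["store_id"]] += t["amount"]

    blended = []
    for store_id, total in sorted(sales.items()):
        info = stores.get(store_id)
        if info is None:
            # Profiling: this key exists in the transactions but not in the spreadsheet.
            blended.append({"store_id": store_id, "sales": total,
                            "region": None, "vs_target": None})
            continue
        blended.append({"store_id": store_id, "sales": total,
                        "region": info["region"],
                        "vs_target": total - info["target"]})
    return blended


for row in blend(transactions, stores):
    print(row)
```

Keeping both sources in memory is what makes it practical to re-run this kind of join and aggregation repeatedly while exploring, rather than waiting on a round trip to each system.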
Great follow-up comment! Thanks for distinguishing visual analytics from data visualization. Appreciate it.