Neologism is the new challenge for IT professionals. Here’s why:

What is Neologism?

Neologism means the coining or use of new words, and I believe it’s one of the challenges faced by IT professionals. Nowadays, we put our time & energy into trying to get our heads around new terms, words and trends.

Let’s take an example:

Some time back, we had cloud computing. Nowadays, it’s Big Data. In my mind, Big Data has been coined to mean the following technologies/techniques under different contexts:

[Image: Big Data, Unstructured, External, Text, Public Data]

Note: The above image is just for illustration purposes. It does not comprehensively cover every technology that is now called “Big Data”. Feel free to point it out if you think I missed something important.

And neologism is a challenge because:

1) Generally, it’s a new trend and there is little to no consensus on what it “exactly” means.

2) It means different things in different contexts.

3) Every person can have their own “interpretation” and no one is wrong.

4) It’s a moving target. The definition used today will change in the future, so we always need a “working” definition for these terms.

Now, don’t get me wrong: it’s fun trying to figure out what it all means and trying to gauge whether it matters to me and my organization or not! What do you think? As a person in Information Technology, do you think that neologism is one of the challenges we face? Consider leaving a reply in the comment section!

Related Articles:

Want to learn about Big Data? Read O’Reilly’s book “Planning for Big Data”

Quote for Big-Data / Data-Science/ Data-Analysis enthusiasts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?

Three V’s of Big Data with Example:

In this blog post, we will look at the three V’s of Big Data with examples:

1. Volume:

Terabytes (TB), petabytes (PB) and zettabytes (ZB) of data that get created:

From the webinar “How to Walk The Path from BI to Data Science: An interview with Michael Driscoll, data scientist and CEO of Metamarkets” – A global surge in Data
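To get a feel for how far apart these units are, here is a minimal sketch that converts between them. It assumes the decimal (SI) definitions of the prefixes; the `convert` helper and the `UNITS` table are names made up for this illustration.

```python
# Decimal (SI) sizes in bytes; this choice of definition is an assumption.
UNITS = {"TB": 10**12, "PB": 10**15, "ZB": 10**21}


def convert(value, from_unit, to_unit):
    """Convert a quantity from one unit to another via bytes."""
    return value * UNITS[from_unit] / UNITS[to_unit]


tb_per_pb = convert(1, "PB", "TB")  # terabytes in one petabyte
tb_per_zb = convert(1, "ZB", "TB")  # terabytes in one zettabyte
print(tb_per_pb, tb_per_zb)
```

Under these definitions, one petabyte is a thousand terabytes and one zettabyte is a billion terabytes.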

2. Velocity:

The speed at which information flows.

Example: 50 Million tweets per day!

[Image: Twitter, 50 million tweets per day]

(This was back in Nov. 2010; the number must have increased since!)

UPDATE 23 Nov 2012: Wikipedia now says 340 million tweets per day!

[Image: Twitter in 2012, 340 million tweets per day]
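Velocity is easier to picture per second than per day. The sketch below simply divides the two daily figures quoted above by the number of seconds in a day; it assumes tweets are spread evenly over the day, which real traffic is not, and the function name is made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day):
    """Average rate per second, assuming an even spread over the day."""
    return per_day / SECONDS_PER_DAY


tweets_2010 = per_second(50_000_000)   # figure quoted for Nov. 2010
tweets_2012 = per_second(340_000_000)  # figure quoted for Nov. 2012
print(round(tweets_2010), round(tweets_2012))  # prints: 579 3935
```

So the quoted numbers work out to roughly 579 tweets per second in 2010 and roughly 3,935 per second in 2012, on average.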

3. Variety:

All types of data are now being captured, whether they are in a structured format or not.

Example: Text from PDFs, emails, social network updates, voice calls, web traffic logs, sensor data, click streams, etc.

[Image: data variety in Big Data]

Image courtesy

And this may be followed by other V’s like V for Value.

Conclusion:

In this blog post, we saw the three V’s of Big Data with examples.

Related Posts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?

Examples to help clarify what’s unstructured data and what’s structured?

I have been reading and researching about Big Data and Big Data on the cloud. One of the concepts that gets repeated is that “Big Data is about analyzing unstructured data…”, and in this blog post I just want to show a few examples that will help you differentiate between structured data & unstructured data.

Before we begin, here’s the definition of Unstructured data:

Unstructured Data (or unstructured information) refers to information that either does not have a pre-defined data model and/or does not fit well into relational tables – Wikipedia

Also, I just wanted to point out that data is not unstructured merely because you cannot fit it into a schema/model; it is unstructured because even after you fit it into the model, the model does not help. Consider an email body as an example of unstructured data. You can create a column “EMAIL BODY”. Now think of the questions that are likely to be asked. Do they get answered? If not, then fitting it into a model and calling it structured does not make sense, does it? With that, here are the examples:

1. Word docs, PDFs & text files

Unstructured data

Examples: Books, Articles

2. Audio files

Unstructured data

Example: Call center conversations.

3. Email body

Unstructured data

Example: you don’t need an example here!

4. Videos

Unstructured data

Example: Video footage of criminal interrogation

5. A Data Mart / Data Warehouse

Structured Data

6. XML

Semi Structured Data
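To make the three categories above concrete, here is a rough sketch that pulls the same fact (a city) out of a structured row, a semi-structured XML record and an unstructured email body. All the sample data, field names and the regular expression are hypothetical, made up for this illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, so a field is addressed by its name.
table = io.StringIO("customer,city,amount\nAsha,Pune,120\n")
row = next(csv.DictReader(table))
structured_city = row["city"]

# Semi-structured: tags describe the data, but elements can vary per record.
doc = ET.fromstring("<order><customer>Asha</customer><city>Pune</city></order>")
semi_structured_city = doc.findtext("city")

# Unstructured: no fields at all; the value has to be dug out of the text.
email_body = "Hi team, I moved to Pune last month and my order has not arrived."
match = re.search(r"moved to (\w+)", email_body)
unstructured_city = match.group(1) if match else None

print(structured_city, semi_structured_city, unstructured_city)
```

The first two lookups work for any record with that shape. The last one only works because the pattern happens to match this particular sentence, which is the point: with unstructured data, the structure has to be discovered rather than read off a schema.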

A couple of applications for your brain cells:

1. Map disease patterns by analyzing medical records (Text)

2. Tuning customer support by analyzing calls (Audio)

A few quotes about unstructured data that I liked:

80 percent of business-relevant information originates in unstructured form –  Justin Langseth. URL (Wikipedia Article says that even Merrill Lynch cited this)

BUT someone else had a nice perspective about this 80%:

but managing it (this 80%) really isn’t a significant problem……………the innovation isn’t in structuring text, it’s in applying models to discover and exploit their inherent structure. Source

My Experience with Unstructured Data (in context of BigData) and Cloud:

I have been playing with MapReduce on Windows Azure (Project Daytona), Elastic MapReduce (Amazon Web Services) and Google’s BigQuery platform. To give you one example, I’ll use Microsoft’s Project Daytona. I uploaded data in unstructured form (plain text), and the goal was to run a “Word Count”. It helps you answer questions like: which word has the highest frequency? Which is the least popular word? You could also tweak the algorithm to consider only words with length greater than four (among other constraints). Here is what happens when you run the algorithm: the amazing MapReduce framework (an app deployed on Windows Azure in this case) does some analysis on the unstructured data (text in this case) and helps you answer the question you were asking. I hope that gives you an idea of how it works.
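The word count described above can be sketched in plain Python as a map step followed by a reduce step. This is not Project Daytona’s API or any real MapReduce framework; it runs in a single process, and the function names and sample text are made up for this illustration.

```python
import re
from collections import defaultdict


def map_words(line, min_length=1):
    """Map step: emit a (word, 1) pair for each word of at least min_length letters."""
    for word in re.findall(r"[a-z']+", line.lower()):
        if len(word) >= min_length:
            yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the 1s emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines, min_length=1):
    """Run the map step over every line, then reduce all emitted pairs."""
    pairs = (pair for line in lines for pair in map_words(line, min_length))
    return reduce_counts(pairs)


# Hypothetical sample text standing in for the uploaded file.
sample = [
    "big data is big",
    "data about data is metadata",
]

counts = word_count(sample)
most_frequent = max(counts, key=counts.get)    # word with the highest frequency
long_words = word_count(sample, min_length=5)  # only words longer than four letters
print(most_frequent, long_words)
```

In a real framework the map calls run on many machines in parallel and the pairs are grouped by word before being reduced; here everything happens in one loop, which is enough to show the idea.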

That’s about it for this post. Do you have an example or application of unstructured data? Please do post it in the comments!