Data Mining: Classification VS Clustering (cluster analysis)


For someone who is new to data mining, classification and clustering can seem similar, because both kinds of algorithm essentially “divide” a dataset into sub-datasets. But there is a difference between them, and in this blog post we’ll see exactly what it is:

CLASSIFICATION
  • We have a training set containing data that have been previously categorized
  • Based on this training set, the algorithm finds the category that new data points belong to
  • Since a training set exists, we describe this technique as supervised learning
  • Example: We use a training dataset in which customers have been categorized by whether they churned. Based on this training set, we can classify whether a new customer will churn or not.

CLUSTERING
  • We do not know the characteristics of similarity of the data in advance
  • Using statistical concepts, we split the dataset into sub-datasets such that each sub-dataset contains “similar” data
  • Since a training set is not used, we describe this technique as unsupervised learning
  • Example: We use a dataset of customers and split it into sub-datasets of customers with “similar” characteristics. This information can then be used to market a product to a specific customer segment identified by the clustering algorithm.
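
To make the contrast concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the two customer features (monthly usage hours, support tickets), the made-up data, and the choice of a nearest-centroid classifier and a naive k-means as stand-ins for "a classification algorithm" and "a clustering algorithm". The point is only that the classifier is given labels, while the clustering routine is not.

```python
def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

# --- Classification (supervised): the training set carries labels ---
def fit_classifier(training_set):
    """training_set: list of (features, label). Returns {label: centroid}."""
    by_label = {}
    for features, label in training_set:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}

def classify(model, features):
    """Return the label whose centroid is closest to the new data point."""
    labels = list(model)
    return labels[nearest(features, [model[l] for l in labels])]

# --- Clustering (unsupervised): no labels, only the data points ---
def kmeans(points, k, iterations=10):
    """Naive k-means; returns a cluster index for every point."""
    centers = list(points[:k])  # simplistic start: first k points (assumption)
    assignment = []
    for _ in range(iterations):
        assignment = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = centroid(members)
    return assignment

# Hypothetical customers: (monthly usage hours, support tickets)
training_set = [
    ((2, 8), "churned"), ((3, 9), "churned"), ((1, 7), "churned"),
    ((20, 1), "stayed"), ((25, 0), "stayed"), ((22, 2), "stayed"),
]
model = fit_classifier(training_set)
print(classify(model, (4, 6)))   # category for a new, unseen customer

# Same kind of data, but with the labels thrown away
customers = [(2, 8), (20, 1), (3, 9), (25, 0), (1, 7), (22, 2)]
segments = kmeans(customers, k=2)
print(segments)                  # segment index per customer, no names attached
```

Note that the classifier answers with one of the categories it was taught ("churned" or "stayed"), whereas the clustering output is just a segment number per customer; deciding what each segment means is left to the analyst.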

If you want to learn more about data mining, check out the free book in PDF format, “Mining of Massive Datasets”.

12 thoughts on “Data Mining: Classification VS Clustering (cluster analysis)”

  1. Ann

    So the biggest difference between the two is that classification is predetermined while clustering isn’t? And the sub-datasets produced by clustering, are they similar to each other or similar within themselves? Thanks!

    • Right, segments are pre-determined for classification.

      If you do a hands-on session on clustering then that might help you with your second question – Let me know if you have any questions.

      • Md. Shahidul Islam

        Dear , I want to analyze semi-supervised data streams, and I want to do this using a windowing approach. Can you please help me? How can I do this? Please suggest some tips.

        Thanks.
        Md. Shahidul Islam
