Data Mining: Classification VS Clustering (cluster analysis)


For someone who is new to data mining, classification and clustering can seem similar because both kinds of algorithm essentially “divide” a dataset into sub-datasets. But there is a difference between them, and in this blog post we’ll see exactly what it is:

Classification:

  • We have a training set containing data that has already been categorized.
  • Based on this training set, the algorithm finds the category that each new data point belongs to.
  • Since a training set exists, we describe this technique as supervised learning.
  • Example: We use a training dataset in which customers are categorized by whether they have churned. Based on this training set, we can classify whether a customer will churn or not.

Clustering:

  • We do not know in advance which characteristics make data points similar.
  • Using statistical concepts, we split the dataset into sub-datasets such that each sub-dataset contains “similar” data.
  • Since no training set is used, we describe this technique as unsupervised learning.
  • Example: We use a dataset of customers and split it into sub-datasets of customers with “similar” characteristics. This information can then be used to market a product to a specific customer segment identified by the clustering algorithm.
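
To make the contrast concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the customer features (monthly usage hours, support tickets), the toy values, and the choice of a nearest-centroid classifier and a naive k-means are all hypothetical and not taken from any particular tool. The classifier is given labels through a training set, while the clustering function only ever sees unlabeled points.

```python
from statistics import mean

# Hypothetical toy data: (monthly_usage_hours, support_tickets) per customer.
# The labels exist only in the training set used for classification.
labeled = [
    ((2.0, 5.0), "churn"),
    ((3.0, 4.0), "churn"),
    ((20.0, 1.0), "stay"),
    ((22.0, 0.0), "stay"),
]


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_nearest_centroid(training_set):
    """Classification (supervised): learn one centroid per known label."""
    by_label = {}
    for features, label in training_set:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}


def classify(model, point):
    """Assign a new point to the label whose centroid is closest."""
    return min(model, key=lambda label: sq_dist(model[label], point))


def kmeans(points, k, iterations=10):
    """Clustering (unsupervised): group points without using any labels."""
    centers = list(points[:k])  # naive initialisation: the first k points
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(centers[i], p))
            groups[nearest].append(p)
        # Recompute each center; keep the old one if a group ended up empty.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups


# Classification: the categories ("churn", "stay") are known in advance.
model = fit_nearest_centroid(labeled)
prediction = classify(model, (4.0, 3.0))

# Clustering: drop the labels and let the algorithm find the segments.
unlabeled = [features for features, _ in labeled]
clusters = kmeans(unlabeled, k=2)
```

Note that `classify` returns a named category, because the categories came from the training set, whereas `kmeans` returns anonymous groups that still have to be interpreted (for example, as marketing segments) by whoever reads the result.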

12 thoughts on “Data Mining: Classification VS Clustering (cluster analysis)”

  1. Ann

    So the biggest difference between the two is that classification is predetermined while clustering isn’t? And the sub-datasets with clustering, are they similar with each other or within each other? Thanks!

    • Right, segments are pre-determined for classification.

      If you do a hands-on session on clustering then that might help you with your second question – Let me know if you have any questions.

      • Md. Shahidul Islam

        Dear , I want to analyze semi-supervised data streams using a window-based approach. Can you please help me? How can I do this? Please suggest some tips.

        Md. Shahidul Islam

