For someone who is new to data mining, classification and clustering can seem similar because both kinds of algorithm essentially “divide” a dataset into sub-datasets. But there is a difference between them, and in this blog post we’ll see exactly what it is:
|Classification|Clustering|
|---|---|
|A training set exists, so this technique is described as supervised learning.|No training set is used, so this technique is described as unsupervised learning.|
|Example: We use a training dataset in which customers are already labeled as churned or not. Based on this training set, we can classify whether a new customer will churn or not.|Example: We use a dataset of customers and split it into sub-datasets of customers with “similar” characteristics. This information can then be used to market a product to a specific customer segment identified by the clustering algorithm.|
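
To make the contrast in the table concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy customer features (monthly spend, support calls), the labels, and the choice of a nearest-centroid classifier and a basic k-means loop as stand-ins for "a classification algorithm" and "a clustering algorithm". The point is only that the classifier needs labeled training rows, while the clustering function receives no labels at all.

```python
from math import dist

# Hypothetical toy data: (monthly_spend, support_calls) for each customer.
labeled_customers = [
    ((20.0, 5.0), "churned"),
    ((25.0, 6.0), "churned"),
    ((80.0, 1.0), "stayed"),
    ((90.0, 0.0), "stayed"),
]
# The same customers with the labels stripped off, for clustering.
unlabeled_customers = [features for features, _ in labeled_customers]


def mean_point(points):
    """Return the coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train_classifier(labeled_rows):
    """Supervised: learn one centroid per known label from the training set."""
    by_label = {}
    for features, label in labeled_rows:
        by_label.setdefault(label, []).append(features)
    return {label: mean_point(points) for label, points in by_label.items()}


def classify(model, features):
    """Assign the label whose centroid is closest to the new customer."""
    return min(model, key=lambda label: dist(model[label], features))


def kmeans(points, k, iterations=10):
    """Unsupervised: group points into k clusters without any labels."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centroids[i], p))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ended up empty.
        centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return clusters


# Classification: the segments ("churned" / "stayed") are known in advance.
model = train_classifier(labeled_customers)
predicted = classify(model, (22.0, 4.0))
print("Predicted label for new customer:", predicted)

# Clustering: the segments are discovered from the data and have no names.
clusters = kmeans(unlabeled_customers, k=2)
print("Discovered segments:", clusters)
```

Note that the classifier returns one of the predetermined labels, whereas the clustering result is just a set of unnamed groups; deciding what each group means (for example, which segment to market to) is left to the analyst.
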
If you want to learn about data mining, check out the free book in PDF format, “Mining of Massive Datasets”.
12 thoughts on “Data Mining: Classification VS Clustering (cluster analysis)”
Where does it say that clustering does not use training sets?
So the biggest difference between the two is that classification is predetermined while clustering isn’t? And the sub-datasets from clustering, are they similar to each other or within each other? Thanks!
Right, segments are pre-determined for classification.
A hands-on session on clustering might help you with your second question. Let me know if you have any questions.
Dear , I want to do analysis on semi-supervised data streams using a windowing approach. Can you please help me? How can I do this? Please suggest some tips.
Md. Shahidul Islam
Very nicely put, great work…