Quick Note about SAS’s acronym SEMMA:


SEMMA is an acronym introduced by SAS which stands for:

Sample, Explore, Modify, Model and Assess.

I recently posted about the Data Mining & Knowledge Discovery Process, which has the following sequential steps:

Raw Data => Cleaning => Sampling => Modeling => Testing

SEMMA follows sequential steps similar to those we saw in the data mining process. So while the Data Mining process applies to any data mining tool out there, SEMMA helps when you use SAS Enterprise Miner. In fact, it has helped me quickly find the data mining functions available in the SAS tool:

[Image: SAS Enterprise Miner, Sample / Explore / Modify / Model / Assess]
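To make the five SEMMA phases concrete outside of SAS, here is a minimal Python sketch under stated assumptions: the toy churn dataset, the function names (`sample`, `explore`, `modify`, `model`, `assess`) and the one-threshold "model" are hypothetical stand-ins chosen for illustration. They are not SAS Enterprise Miner tools or its API.

```python
import random
import statistics

# Hypothetical toy dataset: hours of product use and whether the customer churned.
# The labelling rule (churn when hours < 3) is an assumption made for illustration.
def make_data(n=1000, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hours = rng.uniform(0, 10)
        rows.append({"hours": hours, "churned": int(hours < 3)})
    return rows

def sample(rows, k, seed=1):
    """Sample: draw a random subset small enough to work with quickly."""
    return random.Random(seed).sample(rows, k)

def explore(rows):
    """Explore: summary statistics to get a feel for the data."""
    return {
        "mean_hours": statistics.mean(r["hours"] for r in rows),
        "churn_rate": statistics.mean(r["churned"] for r in rows),
    }

def modify(rows):
    """Modify: derive a new feature (hours truncated to whole units)."""
    return [{**r, "hours_bin": int(r["hours"])} for r in rows]

def accuracy(threshold, rows):
    """Share of rows where 'hours_bin < threshold' matches the churn label."""
    return sum((r["hours_bin"] < threshold) == bool(r["churned"]) for r in rows) / len(rows)

def model(train):
    """Model: pick the whole-hour threshold that best separates churners."""
    return max(range(11), key=lambda t: accuracy(t, train))

def assess(threshold, test):
    """Assess: accuracy on rows the model has not seen."""
    return accuracy(threshold, test)

rows = modify(sample(make_data(), 500))
print(explore(rows))
train, test = rows[:400], rows[400:]
threshold = model(train)
print(threshold, assess(threshold, test))
```

Each phase consumes the previous phase's output, mirroring the order SEMMA prescribes; in Enterprise Miner the same flow would be assembled from the tool's own components rather than written by hand.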

Back to basics: Data Mining and Knowledge Discovery Process


Once in a while I go back to basics to revisit some of the fundamental technology concepts that I’ve learned over the past few years. Today, I want to revisit the Data Mining and Knowledge Discovery Process:

Here are the steps:

1) Raw Data

2) Data Preprocessing (cleaning, sampling, transformation, integration, etc.)

3) Modeling (Building a Data Mining Model)

4) Testing the Model (a.k.a. assessing the Model)

5) Knowledge Discovery
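The five steps above can be sketched end to end in a few lines of Python. This is an illustrative sketch only: the messy "ad_spend,sales" records are made up, and ordinary least squares stands in for "a data mining model" because it is short and easy to check; the post itself does not prescribe any particular model or metric.

```python
import random

# Step 1, raw data: hypothetical "ad_spend,sales" records, deliberately messy
# (missing and malformed values). Not real data.
RAW = ["10,52", "20,101", ",40", "30,149", "abc,12", "40,205",
       "50,248", "60,", "70,351", "80,398", "90,452", "100,499"]

def preprocess(raw, seed=0):
    """Step 2, pre-processing: clean, transform (str -> float) and shuffle."""
    rows = []
    for line in raw:
        parts = line.split(",")
        try:
            x, y = float(parts[0]), float(parts[1])
        except (ValueError, IndexError):
            continue  # cleaning: drop rows with missing or malformed values
        rows.append((x, y))
    random.Random(seed).shuffle(rows)  # so the train/test split is random
    return rows

def fit_line(rows):
    """Step 3, modeling: ordinary least squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    return slope, mean_y - slope * mean_x

def evaluate(slope, intercept, rows):
    """Step 4, testing: mean absolute error on held-out rows."""
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)

rows = preprocess(RAW)
train, test = rows[:6], rows[6:]
slope, intercept = fit_line(train)
mae = evaluate(slope, intercept, test)

# Step 5, knowledge discovery: state the learned pattern in plain terms.
print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f} (test MAE {mae:.2f})")
```

Three of the twelve raw records are dropped during cleaning, which is why the pre-processing step matters before any model sees the data.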

Here is the visualization:

[Image: knowledge discovery process in data mining]

Additional Note:

In Data Mining and Knowledge Discovery, we’re looking for a specific type of intelligence in the data: patterns. Patterns matter because they tend to repeat, so if we find patterns in our data, we can predict or forecast that similar things will happen in the future.
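One common way to look for repeating patterns is to count which items occur together. The sketch below is a hypothetical example, not taken from the post: it counts co-occurring item pairs in made-up shopping baskets (the counting step behind association-rule mining) and then uses the pairs that repeat to guess what a new basket might contain. The data, the `min_support` cutoff and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; purely illustrative, not real data.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "butter", "jam"},
    {"beer", "chips", "salsa"},
    {"milk", "cereal"},
]

def frequent_pairs(baskets, min_support=2):
    """Count item pairs that co-occur; keep those seen at least min_support times."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

def suggest(basket, patterns):
    """Use the repeating pairs to guess items a new basket is likely to contain."""
    guesses = set()
    for a, b in patterns:
        if a in basket and b not in basket:
            guesses.add(b)
        if b in basket and a not in basket:
            guesses.add(a)
    return guesses

patterns = frequent_pairs(BASKETS)
print(patterns)                       # {('bread', 'butter'): 3, ('beer', 'chips'): 2}
print(suggest({"bread"}, patterns))   # {'butter'}
```

The forecast is only as good as the assumption that past co-occurrence repeats, which is exactly the premise stated above.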


In this blog post, we revisited the Data Mining and Knowledge Discovery process.