Data mining is like the novel Moby Dick. Everybody has heard of it, but few have read it.

Vendors like to toss about the term “data mining” to get our attention. However, when it really comes down to it, there are very few practitioners using data mining techniques to derive value. Some are just writing queries and sorting on the data to “visually” mine the results.

Data Mining versus Data Sorting. What is the difference?

At the most basic level, both mining and sorting are a search for valuable information (the hidden gold): a quest to extract useful information and patterns from data. Data sorting is the process of querying data and sorting for common characteristics – but that is as far as it goes. Through visualization techniques, the reader of the data is left to draw their own conclusions.
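To make the distinction concrete, here is a minimal sketch of what data sorting amounts to in practice. The customer records, field names, and the `query_and_sort` helper are all hypothetical, invented only for illustration.

```python
from operator import itemgetter

# Hypothetical customer records, invented for illustration only.
customers = [
    {"name": "Avery", "region": "West", "annual_spend": 1200},
    {"name": "Blake", "region": "East", "annual_spend": 450},
    {"name": "Casey", "region": "West", "annual_spend": 3100},
    {"name": "Devon", "region": "East", "annual_spend": 2750},
]


def query_and_sort(records, region):
    """Keep the records for one region, then sort by spend, highest first."""
    matches = [r for r in records if r["region"] == region]
    return sorted(matches, key=itemgetter("annual_spend"), reverse=True)


west = query_and_sort(customers, "West")
for row in west:
    print(row["name"], row["annual_spend"])
```

The code only filters and orders the rows. Any conclusion about what the ordering means is still left to the person reading the output.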

Data mining uses the computational power of systems to run pattern-discovering algorithms with minimal intervention from the user. The most important feature that separates data mining from data sorting is the ability to “predict” future behaviors using patterns found in the data.

Data mining techniques can be divided into two main categories: discovery techniques and predictive techniques. Discovery techniques are used to find patterns that preexist in the data, but with **no prior knowledge** that the patterns exist. One can think of these patterns as serendipitously discovered. The most popular discovery techniques are: (1) clustering, (2) association and (3) sequential pattern analysis.
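As a rough illustration of the association technique, the sketch below counts which pairs of items appear together in the same transaction and keeps the pairs that occur often enough. The baskets, the `frequent_pairs` helper, and the `min_count` threshold are assumptions made for this example. It is a simple counting approach, not a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def frequent_pairs(transactions, min_count):
    """Count co-occurring item pairs and keep those seen at least min_count times."""
    pair_counts = Counter()
    for basket in transactions:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


pairs = frequent_pairs(baskets, min_count=3)
for pair, n in sorted(pairs.items()):
    print(pair, n)
```

Nobody told the code which combinations to look for. The pairs that survive the threshold emerge from the data itself, which is the sense in which discovery works with no prior knowledge of the pattern.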

Predictive mining is the process of using the patterns found during discovery to predict a value: classification predicts a categorical value, while regression analysis predicts a numerical one.
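For the numerical case, a minimal sketch of prediction is a least-squares line fitted to past observations and extended one step forward. The monthly purchase figures and the `fit_line` helper are hypothetical. Real data would be noisier, and a real project would more likely rely on a statistics library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: purchases per month for one customer segment.
months = [1, 2, 3, 4, 5]
purchases = [12, 14, 16, 18, 20]

slope, intercept = fit_line(months, purchases)
predicted = slope * 6 + intercept  # estimate for month 6, which has not happened yet
print(round(predicted, 2))
```

Unlike the sorting example, the output here is a statement about a month that is not in the data, which is the feature that separates mining from sorting.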