Data mining is the discovery of patterns, relationships and insights in large data sets. These findings help enterprises measure and manage where they are today and predict where they will be in the future.
The data can come from various sources and may be stored in different data warehouses. Data mining can involve techniques such as machine learning, artificial intelligence (AI) and predictive modeling.
The data mining process requires commitment. But experts agree that the process is the same across all industries and should follow a prescribed path.
Here are the 6 essential steps of the data mining process.
1. Business understanding
In the business understanding phase:
- First, understand the business objectives clearly and find out what the business needs.
- Next, assess the current situation by identifying the resources, assumptions, constraints and other important factors that should be considered.
- Then, from the business objectives and the current situation, create data mining goals that achieve the business objectives within that situation.
- Finally, establish a good data mining plan to achieve both the business goals and the data mining goals. The plan should be as detailed as possible.
2. Data understanding
- The data understanding phase starts with initial data collection from the available data sources, which helps you get familiar with the data. Activities such as data loading and data integration must be performed for the collection to succeed.
- Next, the “gross” or “surface” properties of the acquired data need to be examined carefully and reported.
- Then, the data needs to be explored by tackling the data mining questions, which can be addressed using querying, reporting, and visualization.
- Finally, the data quality must be examined by answering questions such as “Is the acquired data complete?” and “Are there any missing values in the acquired data?”
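As an illustration of the surface-property and data quality checks described above, here is a minimal Python sketch. The sample records, the column names and the helper functions `describe` and `missing_counts` are all hypothetical, and `None` is assumed to mark a missing value; a real project would run similar checks against its own sources.

```python
# Hypothetical sample records; None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "region": "north", "spend": 120.0},
    {"customer_id": 2, "age": None, "region": "south", "spend": 80.5},
    {"customer_id": 3, "age": 45, "region": None, "spend": None},
    {"customer_id": 4, "age": 29, "region": "north", "spend": 60.0},
]


def describe(rows):
    """Report surface properties: row count, column names, value types."""
    columns = sorted({key for row in rows for key in row})
    types = {
        col: sorted(
            {type(row[col]).__name__ for row in rows if row.get(col) is not None}
        )
        for col in columns
    }
    return {"rows": len(rows), "columns": columns, "types": types}


def missing_counts(rows):
    """Count missing (None or absent) values for every column."""
    columns = sorted({key for row in rows for key in row})
    return {
        col: sum(1 for row in rows if row.get(col) is None) for col in columns
    }


summary = describe(records)
missing = missing_counts(records)
print(summary)
print(missing)
```

A column with a non-zero count in `missing` answers the "Is the acquired data complete?" question with a "no" and flags work for the data preparation phase.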
3. Data preparation
Data preparation typically consumes about 90% of the project's time. The outcome of this phase is the final data set. Once the available data sources are identified, the data needs to be selected, cleaned, constructed and formatted into the desired form. Deeper data exploration may also be carried out during this phase to notice patterns based on the business understanding.
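The select, clean, construct and format tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw records, the column names, the `prepare` function and the choice to fill missing ages with the median are all hypothetical, not a prescribed method.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw = [
    {"customer_id": 1, "age": 34, "region": "North ", "spend": 120.0, "visits": 4},
    {"customer_id": 2, "age": None, "region": "south", "spend": 80.0, "visits": 2},
    {"customer_id": 3, "age": 45, "region": None, "spend": None, "visits": 0},
    {"customer_id": 4, "age": 29, "region": "NORTH", "spend": 60.0, "visits": 3},
]


def prepare(rows):
    # Select: keep only the rows that have a value for "spend".
    selected = [r for r in rows if r["spend"] is not None]

    # Clean: fill missing ages with the median of the known ages (an assumption).
    known_ages = [r["age"] for r in selected if r["age"] is not None]
    fill_age = median(known_ages)

    prepared = []
    for r in selected:
        age = r["age"] if r["age"] is not None else fill_age
        # Construct: derive a new attribute from existing ones.
        spend_per_visit = r["spend"] / r["visits"] if r["visits"] else 0.0
        # Format: normalize the text and fix the set and order of columns.
        prepared.append({
            "customer_id": r["customer_id"],
            "age": age,
            "region": (r["region"] or "unknown").strip().lower(),
            "spend_per_visit": round(spend_per_visit, 2),
        })
    return prepared


final_data_set = prepare(raw)
for row in final_data_set:
    print(row)
```

Each choice here (dropping rows without `spend`, imputing with the median) is a business decision as much as a technical one, which is part of why this phase takes so long in practice.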
4. Modeling
- First, modeling techniques have to be selected for the prepared data set.
- Next, a test scenario must be generated to validate the quality and validity of the model.
- Then, one or more models are created on the prepared data set.
- Finally, the models need to be assessed carefully, involving stakeholders, to make sure that they meet the business initiatives.
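The four modeling steps above can be sketched as follows. This is a toy example, not a recommended technique: the data is synthetic and noise-free, the selected technique is a simple least-squares line, the test scenario is a 75/25 holdout split, and the names `fit_line` and `mean_absolute_error` are hypothetical helpers written for this illustration.

```python
import random
from statistics import mean

# Hypothetical prepared data: (visits, spend) pairs with spend = 10 * visits + 5.
data = [(x, 10.0 * x + 5.0) for x in range(1, 21)]

# Test scenario: hold out 25% of the rows, using a fixed seed for repeatability.
rng = random.Random(42)
shuffled = data[:]
rng.shuffle(shuffled)
split = int(len(shuffled) * 0.75)
train, test = shuffled[:split], shuffled[split:]


def fit_line(points):
    """Build the model: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def mean_absolute_error(model, points):
    """Assess a model: average absolute prediction error on the given rows."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in points)


model = fit_line(train)
# Baseline for comparison: always predict the mean spend seen in training.
baseline = (0.0, mean(y for _, y in train))

print("model MAE:", mean_absolute_error(model, test))
print("baseline MAE:", mean_absolute_error(baseline, test))
```

Comparing the model against a trivial baseline on held-out rows gives stakeholders a concrete number to judge, rather than a claim that the model "works".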
5. Evaluation
In the evaluation phase, the model results must be evaluated in the context of the business objectives set in the first phase. New business requirements may be raised in this phase because of new patterns discovered in the model results or because of other factors. Gaining business understanding is an iterative process in data mining. The go or no-go decision on moving to the deployment phase must be made in this step.
6. Deployment
The knowledge or information gained through the data mining process needs to be presented in such a way that stakeholders can use it when they want it. Depending on the business requirements, the deployment phase could be as simple as creating a report or as complex as implementing a repeatable data mining process across the organization. In the deployment phase, plans for deployment, maintenance, and monitoring have to be created for implementation and future support. From the project point of view, the final report needs to summarize the project experiences and review the project to see what needs to be improved, capturing the lessons learned.
These 6 steps describe the Cross-Industry Standard Process for Data Mining, known as CRISP-DM. It is an open standard process model that describes common approaches used by data mining experts. It is the most widely used analytics model.
Do these 6 steps help you understand the data mining process? What is your organization’s readiness for data mining?