Data Mining Techniques
Data mining is not a new technology. Its roots have been traced to the 1930s, but the term itself came into use in the 1990s, as businesses grappled with growing volumes of data and looked for ways to extract value from it. Modern computers and data mining techniques let businesses analyze very large amounts of data and extract non-intuitive, valuable insights: forecasting likely business outcomes, mitigating risks, and taking advantage of recognized opportunities. The main data mining techniques are:
- Clustering- This is a technique that groups similar data points together, for example to reveal buying trends or sales demographics for a particular product.
What is clustering in Data mining?
Clustering is the process of grouping data points based on their properties, so that points in the same group are more similar to each other than to points in other groups. Data miners can then divide the data into subsets that support more informed decisions about broad demographics and their behaviors.
The main clustering methods are:
- Partitioning method- This involves dividing a data set into a specified number of clusters, with each data point assigned to a cluster according to that cluster's criteria.
- Hierarchical method- In the hierarchical method, each data point starts as a single cluster, and clusters are then merged step by step based on their similarities. The resulting clusters can be analyzed separately from each other.
- Density-based method- A machine learning method that groups data points lying close together in dense regions for further analysis; the data points themselves are not labeled.
- Grid-based method- This involves dividing the data into cells on a table or grid; clustering is then performed on the individual cells rather than on the entire database.
- Model-based method- A model is assumed for each data cluster, and the data points that best fit that model are assigned to it.
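As an illustration of the partitioning method, here is a minimal k-means sketch in plain Python. K-means is one common partitioning algorithm, chosen here only as an example; the customer points, the function name `kmeans`, and its parameters are assumptions made for this sketch and do not come from a specific tool.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Partition 2-D points into k clusters with a basic k-means loop."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen as initial centroids.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: (visits per month, average basket value) for six customers.
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(customers, k=2)
for cluster in clusters:
    print(sorted(cluster))
```

With data this well separated, the low-activity and high-activity customers end up in separate clusters, which can then be analyzed independently.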
- Association- Association uses a set of rules to find correlations or associations between variables in databases. Companies use it to guide marketing research and strategy.
Methods for data mining association:
- Single-dimensional association- This involves looking for repeated occurrences of a single data point or attribute, such as the items a customer purchased.
- Multi-dimensional association- This involves finding associations across several attributes in a data set. For example, a retailer might want to know more than what a customer purchased, such as their age and method of purchase.
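The two kinds of association can be sketched with support and confidence, two standard measures for association rules. The baskets, the customer records, and the helper names `support` and `confidence` below are hypothetical and serve only as an illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the share that also contain consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Single-dimensional: only one attribute (items purchased) is considered.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))

# Multi-dimensional: several attributes, encoded here as "attribute=value" items.
records = [
    {"age=18-25", "pay=card", "buys=coffee"},
    {"age=18-25", "pay=card", "buys=coffee"},
    {"age=40-60", "pay=cash", "buys=tea"},
    {"age=18-25", "pay=cash", "buys=tea"},
]
print(confidence(records, {"age=18-25", "pay=card"}, {"buys=coffee"}))
```

Encoding each attribute as an "attribute=value" item is one simple way to reuse the same functions for both cases; other encodings are possible.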
- Data cleaning- This is the process of preparing the data that is to be mined. It involves organizing data, eliminating duplicate or corrupted data, and filling in any null values.
Methods that are used in data cleaning:
- Verifying the data- This involves verifying the format of every data point in the data set.
- Converting data type- This makes sure data is uniform across the data set. For instance, numeric variables only contain numbers, while string variables can contain letters, numbers, and characters.
- Removing irrelevant data- This clears unused or redundant data so that attention can focus on the necessary data points.
- Eliminating duplicate data points- This helps to speed up the mining process by boosting efficiency and reducing errors.
- Removing errors- This eliminates typos, spelling errors, and input errors that could adversely affect analysis outcomes.
- Completing missing values- This supplies an estimated value for each missing entry, since missing values can lead to skewed or incorrect results.
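Several of the cleaning steps above can be combined in a few lines of plain Python. This is a minimal sketch under stated assumptions: the rows, the field names (`id`, `city`, `amount`), and the choice to fill gaps with the mean are all hypothetical, and unparseable amounts are treated as missing values.

```python
from statistics import mean

# Hypothetical raw rows, as they might arrive from a CSV export (all strings).
raw = [
    {"id": "1", "city": "paris ", "amount": "10.5"},
    {"id": "2", "city": "Paris", "amount": ""},      # missing value
    {"id": "2", "city": "Paris", "amount": ""},      # exact duplicate
    {"id": "3", "city": "LYON", "amount": "abc"},    # input error
    {"id": "4", "city": "Lyon", "amount": "20"},
]

def clean(rows):
    seen, cleaned = set(), []
    for row in rows:
        # Eliminate duplicate data points.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Convert data types; values that cannot be parsed count as missing.
        try:
            amount = float(row["amount"])
        except ValueError:
            amount = None
        cleaned.append({
            "id": int(row["id"]),
            "city": row["city"].strip().title(),  # normalize spacing and case
            "amount": amount,
        })
    # Complete missing values with the mean of the known amounts (an assumption).
    fill = mean(r["amount"] for r in cleaned if r["amount"] is not None)
    for r in cleaned:
        if r["amount"] is None:
            r["amount"] = fill
    return cleaned

cleaned_rows = clean(raw)
for r in cleaned_rows:
    print(r)
```

Filling with the mean is only one option; depending on the data, the median, a default value, or dropping the row may be more appropriate.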
- Data visualization- It is the translation of data into graphic form to illustrate its meaning to business stakeholders.
Methods of data visualization are:
- Comparison charts- Charts and tables express relationships in data, such as monthly product sales over one year.
- Maps- Data maps are used to visualize data for specific geographic locations. Map data can show population density and change by comparing the populations of neighboring states, counties, and countries. It shows how populations are spread over a geographic region and makes it possible to compare characteristics of one region with those of other regions.
- What is Clustering?
- Explain the use of the Association technique.