Page 109 - Code & Click - 8
Step 1: Define the Problem
Before any data is touched, extracted, cleaned, or analysed, it is important to understand the
underlying entity and the project at hand. What are the goals the company is trying to achieve by
mining data? What is their current business situation?
Step 2: Identify Required Data
Once the business problem has been clearly defined, it’s time to start thinking about data. This
includes what sources are available, how the data will be secured and stored, how information will be
gathered, and what the final outcome or analysis may look like. This step also involves thinking
critically about the limits on data, storage, security, and collection, and assessing how these
constraints will affect the data mining process.
Step 3: Prepare and Pre-process
During this stage, data is gathered, uploaded, extracted, or calculated. It is then cleaned, standardised,
scrubbed for outliers, assessed for mistakes, and checked for reasonableness.
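The cleaning steps described above can be sketched in a few lines of Python. This is only an
illustration, not a complete pipeline: the function name preprocess, the sample readings, the
median-based outlier rule (three median absolute deviations), and the min-max scaling are all
assumptions made for this example.

```python
from statistics import median

def preprocess(raw_values, max_deviations=3):
    """Clean raw readings, set outliers aside, and scale the rest to 0..1.

    Hypothetical helper for illustration. The cut-off of 3 median absolute
    deviations is an assumed rule of thumb, not a fixed standard.
    """
    # Clean: drop blanks and entries that cannot be read as numbers.
    cleaned = []
    for value in raw_values:
        try:
            cleaned.append(float(str(value).strip()))
        except ValueError:
            continue
    if not cleaned:
        return [], [], []

    # Scrub for outliers: measure how far each value is from the median.
    centre = median(cleaned)
    mad = median([abs(x - centre) for x in cleaned])
    limit = max_deviations * mad

    def is_outlier(x):
        return mad > 0 and abs(x - centre) > limit

    kept = [x for x in cleaned if not is_outlier(x)]
    outliers = [x for x in cleaned if is_outlier(x)]

    # Standardise: rescale the kept values so they run from 0 to 1.
    lowest, highest = min(kept), max(kept)
    span = highest - lowest
    scaled = [(x - lowest) / span if span else 0.0 for x in kept]
    return kept, outliers, scaled

# Made-up sample data: one blank, one non-numeric entry, one suspicious value.
readings = ["12", " 15 ", "", "n/a", "14", "13", "400"]
kept, outliers, scaled = preprocess(readings)
print(kept)      # [12.0, 15.0, 14.0, 13.0]
print(outliers)  # [400.0]
```

In practice, an outlier that is set aside should still be checked for reasonableness by a person,
because an unusual value is not always a mistake.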
Step 4: Model the Data
Data scientists use several types of data mining techniques to search for relationships, trends,
associations, or sequential patterns.
Step 5: Train and Test
The data-centred aspect of data mining concludes by assessing the findings of the data model(s).
The outcomes from the analysis may be aggregated, interpreted, and presented to decision-makers.
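One common way to assess a model, sketched here under stated assumptions, is to hold back part of the
data for testing and measure how often the model is right on it. The function names, the 25% test
share, the made-up records, and the very simple "always predict the most common label" model are all
hypothetical stand-ins chosen for illustration.

```python
import random
from collections import Counter

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def majority_label(train):
    """'Train' a baseline model: remember the most common label."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def accuracy(test, predicted_label):
    """Share of test records whose label matches the prediction."""
    hits = sum(1 for _, label in test if label == predicted_label)
    return hits / len(test)

# Made-up records: (customer id, did the customer buy again?)
records = [(i, "yes" if i % 4 else "no") for i in range(20)]

train, test = train_test_split(records)
model = majority_label(train)
score = accuracy(test, model)
print(len(train), len(test))  # 15 5
```

The score on the held-back test records is the figure that would be interpreted and presented to
decision-makers, since it shows how the model behaves on data it has not seen.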
Step 6: Verify and Deploy
The data mining process concludes with management taking steps in response to the findings of the
analysis. The company may decide that the information was not strong enough, or the findings not
relevant enough, to justify a change of course. Alternatively, the company may make strategic
decisions based on the findings.
Data Warehousing
Warehousing is an important aspect of data mining. Warehousing means that a company centralises its
data in one database or program. With a data warehouse, an organisation may spin off segments
of the data for specific users to analyse and use.
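As a small-scale sketch of this idea, the example below uses Python's built-in sqlite3 module to hold
all sales in one central table and then spins off a view containing only one region's rows. The table
name sales, the view name sales_north, and the sample rows are hypothetical. A real data warehouse
would use a much larger database system.

```python
import sqlite3

# An in-memory database stands in for the central warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Pens", 40.0),
        ("South", "Pens", 55.0),
        ("North", "Paper", 120.0),
        ("East", "Ink", 75.0),
    ],
)

# Spin off a segment: a view that shows only the North region's data.
conn.execute(
    "CREATE VIEW sales_north AS "
    "SELECT product, amount FROM sales WHERE region = 'North'"
)

north_rows = conn.execute(
    "SELECT product, amount FROM sales_north ORDER BY product"
).fetchall()
print(north_rows)  # [('Paper', 120.0), ('Pens', 40.0)]
conn.close()
```

Users who analyse the North region can query the view without seeing the other regions' rows.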
Cloud data warehouse solutions use the space and power of a cloud provider to store data from
multiple data sources. This allows smaller companies to leverage digital solutions for storage,
security, and analytics.
Data Mining Techniques
Data mining uses algorithms and various techniques to convert large collections of data into useful
output. The most popular types of data mining techniques include:
• Association Rules: Also referred to as market basket analysis, this technique searches for
relationships between variables. Each relationship found adds value to the data set because it
links pieces of data together.
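To make the idea of association rules concrete, the sketch below measures one rule, "customers who
buy bread also buy butter", over a handful of made-up shopping baskets. Support, confidence, and lift
are the usual measures for such rules. The function names and the basket data are assumptions made
for this example.

```python
def count_containing(baskets, items):
    """Count the baskets that contain every item in `items` (a set)."""
    items = set(items)
    return sum(1 for basket in baskets if items <= set(basket))

def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Both sides of the rule must be given as sets of item names.
    """
    total = len(baskets)
    both = count_containing(baskets, set(antecedent) | set(consequent))
    # Support: share of all baskets that contain both sides of the rule.
    support = both / total
    # Confidence: of the baskets with the antecedent, the share that also
    # contain the consequent.
    confidence = both / count_containing(baskets, antecedent)
    # Lift: confidence compared with how common the consequent is overall.
    lift = confidence / (count_containing(baskets, consequent) / total)
    return support, confidence, lift

# Made-up shopping baskets for illustration.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"bread", "butter", "jam"},
]

support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(support)     # 0.6
print(confidence)  # 0.75
```

A lift above 1 (here roughly 1.25) suggests that bread and butter appear together more often than
would be expected if the two purchases were unrelated.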