Article · October 2022 · doi: 10.17303/jcssd
1. Data Selection: selecting proper data and relevant variables on
which discovery is to be performed.
2. Data Preprocessing: cleaning the data by replacing missing values
and removing noise and outliers.
3. Data Transformation: reducing and projecting the data into a form
to which data mining algorithms can be applied.
4. Data Mining: choosing a proper data mining method (classification,
clustering, or regression) and a suitable algorithm to perform the
task, then extracting the patterns.
5. Evaluation and Interpretation: this is the last step. The patterns
have been extracted, and the user now interprets them and extracts
knowledge from them. This step includes visualization of the extracted
patterns and models, or visualization of the data using the extracted
models [19,20].
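The first four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any particular tool: the records, the function names (`select`, `preprocess`, `transform`, `mine`), the `max_income` outlier cutoff, and the choice of a correlation coefficient as the extracted "pattern" are all hypothetical.

```python
from statistics import mean, median

# Hypothetical raw records: (age, income, city). None marks a missing value.
raw = [
    (25, 30_000, "A"),
    (32, None, "B"),
    (47, 52_000, "A"),
    (51, 1_000_000, "C"),  # income far outside the usual range
    (38, 48_000, "B"),
]

def select(records):
    """Step 1: keep only the variables relevant to the task (age, income)."""
    return [(age, income) for age, income, _city in records]

def preprocess(records, max_income=200_000):
    """Step 2: replace missing incomes with the median, then drop outliers.

    The cutoff max_income is an assumed, hand-picked threshold.
    """
    known = [inc for _, inc in records if inc is not None]
    fill = median(known)
    filled = [(age, fill if inc is None else inc) for age, inc in records]
    return [(age, inc) for age, inc in filled if inc <= max_income]

def transform(records):
    """Step 3: min-max scale every column to the range [0, 1]."""
    scaled_columns = []
    for column in zip(*records):
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1  # avoid division by zero for constant columns
        scaled_columns.append([(v - lo) / span for v in column])
    return list(zip(*scaled_columns))

def mine(records):
    """Step 4: extract a simple pattern, here the Pearson correlation
    between the two columns (a stand-in for a real mining algorithm)."""
    xs, ys = zip(*records)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

cleaned = preprocess(select(raw))
prepared = transform(cleaned)
pattern = mine(prepared)
# Step 5 is left to the user: interpret the sign and size of the correlation.
print(f"age/income correlation: {pattern:.2f}")
```

Each function maps to one step, so a step can be swapped out (for example, a different outlier rule) without touching the others.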
Data Mining Algorithms
In today's world of big data, large databases are becoming a
necessity. Imagine a database holding many terabytes: Facebook alone
handles 600 terabytes of new data every single day. The primary
challenge of big data is how to make sense of it, and sheer volume is
not the only problem. Big data also tends to be diverse, unstructured,
and fast-changing. Consider audio and video data, social media posts,
3D data, or geospatial data; this kind of data is not easily
categorized or organized. To meet this challenge, many algorithms have
been developed for extracting information, that is, for data mining.
In this section, we discuss a variety of learning algorithms,
including k-means, decision trees, classification algorithms, neural
networks, Naive Bayes, the K Nearest Neighbors algorithm, association,
regression, and the ID3 algorithm. Here, we describe the most commonly
used algorithms in detail:
Classification
Classification is a more complex data mining technique in which you
group various attributes together into discernible categories, which
you can then use to draw further conclusions or to serve some
function. For example, if you are evaluating data on individual
customers' financial backgrounds and purchase histories, you might be
able to classify them as low, medium, or high credit risks. You could
then use these classifications to learn even more about those
customers.
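As a rough illustration of the credit-risk example, the sketch below assigns a customer to a risk class by majority vote among the most similar known customers. The text does not prescribe a method; k-nearest neighbors is used here only because it is short. The two features (debt-to-income ratio and number of late payments), the training records, and the function name `classify_knn` are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical labeled customers:
# (debt-to-income ratio, late payments in the past year) -> risk class
training = [
    ((0.10, 0), "low"),
    ((0.15, 1), "low"),
    ((0.20, 0), "low"),
    ((0.35, 2), "medium"),
    ((0.40, 3), "medium"),
    ((0.45, 2), "medium"),
    ((0.70, 6), "high"),
    ((0.80, 7), "high"),
    ((0.90, 5), "high"),
]

def classify_knn(customer, examples=training, k=3):
    """Return the most common label among the k nearest training examples.

    Distances are plain Euclidean and the features are not rescaled, so the
    late-payment count dominates; real data would usually be normalized first.
    """
    nearest = sorted(examples, key=lambda ex: dist(ex[0], customer))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify_knn((0.12, 0)))
print(classify_knn((0.40, 2)))
print(classify_knn((0.85, 6)))
```

Once every customer carries a label, the labels themselves become an attribute for further analysis, which is the "learn even more" step the text mentions.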
Decision Trees
A decision tree is a graphical representation of a collection of
classification rules. Given a data record, the tree directs the record
from the root to a leaf. Each internal node denotes a test on an
attribute, each branch denotes an outcome of that test, and each leaf
node holds a class label. The topmost node in the tree is the root
node.
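The root-to-leaf traversal described above can be sketched as a small data structure. This is a hand-built tree for illustration only; it is not learned from data, and the class names (`Node`, `Leaf`), the attributes (`income`, `late_payments`), the thresholds, and the risk labels are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

@dataclass
class Leaf:
    label: str  # class label held by a leaf node

@dataclass
class Node:
    attribute: str                    # attribute tested at this internal node
    test: Callable[[object], str]     # maps the attribute value to an outcome
    branches: Dict[str, Union["Node", Leaf]]  # one subtree per outcome

def classify_with_tree(record, node):
    """Direct a record from the root down to a leaf and return its label."""
    while isinstance(node, Node):
        outcome = node.test(record[node.attribute])
        node = node.branches[outcome]
    return node.label

# Hypothetical two-level tree; the root node tests income.
root = Node(
    attribute="income",
    test=lambda v: "high" if v >= 50_000 else "low",
    branches={
        "high": Leaf("low risk"),
        "low": Node(
            attribute="late_payments",
            test=lambda v: "none" if v == 0 else "some",
            branches={
                "none": Leaf("medium risk"),
                "some": Leaf("high risk"),
            },
        ),
    },
)

print(classify_with_tree({"income": 60_000, "late_payments": 3}, root))
print(classify_with_tree({"income": 20_000, "late_payments": 0}, root))
print(classify_with_tree({"income": 20_000, "late_payments": 2}, root))
```

Each path from the root to a leaf corresponds to one classification rule, for example "income is low and there are some late payments, so the class is high risk".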