Ranking the top data mining algorithms is not easy, because each algorithm has a clear purpose and excels at solving certain problems.
Moreover, in many cases several algorithms are combined to arrive at the correct answer to a specific problem.
Factors that determine the best data mining algorithm include popularity, usefulness, and research merit. With that in mind, let's look at the most widely used data mining algorithms.
Algorithms
C4.5
One of the most influential data mining algorithms is C4.5. It builds a classifier in the form of a decision tree. To do this, C4.5 requires an initial set of items that are already classified, which serves as its training data.
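To make the tree-building idea concrete, here is a minimal sketch of how a single split can be scored. C4.5 is commonly described as ranking candidate attributes by gain ratio (information gain divided by split information); the toy items, the attribute names and the function names below are assumptions made for illustration, not part of the original algorithm description.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of values."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain of splitting on `attr`, divided by the split information."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical, already-classified training items.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

# The attribute with the highest gain ratio would become the root of the tree.
best_attr = max(["outlook", "windy"], key=lambda a: gain_ratio(rows, labels, a))
```

In this toy data the outlook attribute separates the two classes perfectly, so it is chosen over windy. A full implementation would repeat the same scoring on each resulting subset and then prune the tree.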
k-means
The k-means algorithm follows closely. It partitions a set of objects into k groups so that similar items end up in the same group. It is frequently used in cluster analysis to explore the structure of a data set.
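The grouping step can be sketched in a few lines. This is a minimal version of the usual assign-then-update loop (often called Lloyd's algorithm), assuming 2-D points and, purely to keep the example deterministic, using the first k points as the starting centroids. The sample points are made up.

```python
import math

def kmeans(points, k, iterations=10):
    """Cluster 2-D points into k groups; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters

# Hypothetical data with two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, clusters = kmeans(points, k=2)
```

Real implementations usually pick the starting centroids at random (or with a smarter seeding scheme) and stop once the assignments no longer change.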
Support Vector Machine
The support vector machine (SVM) algorithm uses a hyperplane to separate data into two classes. Like C4.5, it is a classifier trained on items that are already classified, but instead of building a decision tree it looks for the hyperplane that separates the two classes with the widest margin.
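As a rough illustration of what "separating with a hyperplane" means at prediction time, the sketch below classifies points by the side of the hyperplane they fall on. The weights w and offset b are hypothetical values standing in for a trained model; the training step that finds the widest-margin hyperplane is deliberately left out.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 for one class and -1 for the other."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 5 = 0 (not learned from data here).
w, b = (1.0, 1.0), -5.0
predictions = [classify(w, b, x) for x in [(1.0, 1.0), (4.0, 4.0), (2.0, 2.5)]]
```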
Apriori algorithm
The Apriori algorithm is also popular in the field. It learns association rules from a database containing a large number of transactions, for example that customers who buy one item often buy another.
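The sketch below shows the core of the approach under simple assumptions: transactions are small Python sets, support is an absolute count, and the shopping baskets are invented. It finds frequent itemsets level by level and then derives the confidence of one rule from them.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        joined = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        ]
    return frequent

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent = apriori(transactions, min_support=3)

# Confidence of the rule "bread -> milk": support(bread, milk) / support(bread).
confidence = frequent[frozenset({"bread", "milk"})] / frequent[frozenset({"bread"})]
```

With a minimum support of 3, only one pair survives (bread and milk), and the rule "bread implies milk" holds in three of the four baskets that contain bread.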
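Expectation-maximization
The expectation-maximization (EM) algorithm is commonly employed as a clustering algorithm for knowledge discovery. It alternates between estimating how likely each item is to belong to each cluster and updating the cluster parameters to fit those estimates.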
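A small sketch can show the two alternating steps. It assumes the simplest useful case: one-dimensional data modelled as a mixture of two Gaussians, with the smallest and largest values as starting means. The data values and the function names are illustrative assumptions.

```python
import math

def gaussian(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, iterations=20):
    """Fit a two-component 1-D Gaussian mixture with expectation-maximization."""
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian(x, means[j], variances[j]) for j in range(2)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate the parameters from the weighted points.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,  # floor keeps a component from collapsing onto one point
            )
            weights[j] = nj / len(data)
    return means, variances, weights

# Hypothetical measurements drawn around two different centres.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
means, variances, weights = em_two_gaussians(data)
```

Unlike k-means, each point here belongs to both clusters with some probability, which is what makes EM a "soft" clustering method.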
PageRank
PageRank is a link analysis algorithm designed to determine the relative importance of an object within a network of linked objects.
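The idea can be sketched with a short power iteration. The four-page link graph is invented, and the damping factor of 0.85 is the commonly quoted default rather than something stated above; treat both as assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank evenly over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: every listed target is also a key of the dict.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
most_important = max(rank, key=rank.get)
```

Page C receives links from three other pages and ends up with the highest score, while D, which nothing links to, keeps only the small baseline share.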
AdaBoost
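AdaBoost is another popular data mining algorithm that builds a classifier. A classifier takes data that is already classified and attempts to predict which class a new data element belongs to. AdaBoost builds its classifier by combining many weak learners into a single stronger one.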
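Here is a minimal sketch of that boosting loop, assuming one-dimensional inputs, labels of +1 and -1, and decision stumps (single-threshold rules) as the weak learners. The data set is a made-up example that no single stump can classify correctly.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: `polarity` if x is below the threshold, else the opposite."""
    return polarity if x < threshold else -polarity

def adaboost(xs, ys, rounds):
    """Train `rounds` weighted stumps; returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n
    thresholds = [x + 0.5 for x in sorted(set(xs))]
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error under the current weights.
        best = None
        for threshold in thresholds:
            for polarity in (1, -1):
                error = sum(
                    w for x, y, w in zip(xs, ys, weights)
                    if stump_predict(x, threshold, polarity) != y
                )
                if best is None or error < best[0]:
                    best = (error, threshold, polarity)
        error, threshold, polarity = best
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified items, lower the rest, then normalize.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical labeled data: the classes alternate in blocks along the x axis.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

Each individual stump misclassifies at least three of the ten items, yet the weighted vote of three stumps reproduces every training label.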
CART
CART stands for classification and regression trees. It is a decision tree learning technique that outputs either classification or regression trees. Like C4.5 it can be used as a classifier, but it can also predict numeric values.
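For classification, CART is usually described as choosing binary splits that minimize Gini impurity. The sketch below scores every candidate threshold on a single numeric feature under that assumption; the ages and labels are invented, and a full tree would apply the same search recursively to each side of the split.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best split 'value <= threshold'."""
    best = None
    ordered = sorted(set(values))
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical training data: customer age and whether they bought a product.
ages = [22, 25, 30, 35, 40, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)
```

For a regression tree the same search would typically minimize the variance of the target values in each group instead of the Gini impurity.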
PCA
Principal component analysis (PCA) finds the directions along which a data set varies the most, which makes it useful for reducing the number of variables while keeping most of the information. It is a valuable tool for exploratory data analysis.
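As a minimal sketch, the code below finds the first principal component of 2-D data by building the covariance matrix and applying power iteration. Restricting the example to two dimensions and using power iteration (rather than a full eigendecomposition) are simplifications chosen for brevity; the sample points are made up.

```python
import math

def first_principal_component(points, iterations=100):
    """Direction of greatest variance in 2-D data, found by power iteration."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    vx, vy = 1.0, 0.0
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
    return vx, vy

# Hypothetical points scattered close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
pc = first_principal_component(points)
```

Because the points lie close to the diagonal, the resulting unit vector has nearly equal x and y parts. Projecting the data onto it would reduce two variables to one with little loss of information.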
Collaborative filtering
Collaborative filtering, used for building recommendation systems, is essentially a similarity matching problem. It is useful in marketing because it can provide item-based recommendations: an item is suggested because users have rated it similarly to items a customer already liked.
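The similarity matching can be sketched with cosine similarity between item rating columns. The rating matrix, the item names and the choice of cosine similarity are assumptions made for illustration; treating an unrated item as a 0 is a simplification that real systems usually refine.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical ratings: rows are users, columns are items; 0 means "not rated".
items = ["book", "film", "game"]
ratings = [
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
    [0, 1, 4],
]

def item_vector(index):
    """All users' ratings for one item."""
    return [row[index] for row in ratings]

def most_similar(item):
    """Return the other item whose rating column is closest by cosine similarity."""
    target = item_vector(items.index(item))
    others = [name for name in items if name != item]
    return max(others, key=lambda name: cosine(target, item_vector(items.index(name))))

recommended = most_similar("book")
```

Users who rated the book highly also rated the film highly, so the film is the item suggested to someone who liked the book.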
Bootstrap Aggregating
Bootstrap aggregating (bagging) is a machine learning technique that draws random samples from a data set with replacement, computes a statistic or trains a model on each sample, and combines the results to produce the output.
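The sampling and combining steps can be sketched as follows. This example aggregates a simple statistic (the mean) rather than a trained model, and the data values, the number of samples and the fixed seed are all assumptions chosen to keep the example small and repeatable.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

def bagged_estimate(data, statistic, n_samples=200, seed=0):
    """Average a statistic over many bootstrap samples of the data."""
    rng = random.Random(seed)
    estimates = [statistic(bootstrap_sample(data, rng)) for _ in range(n_samples)]
    return sum(estimates) / len(estimates), estimates

def mean(values):
    return sum(values) / len(values)

# Hypothetical measurements.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0]
estimate, estimates = bagged_estimate(data, mean)
```

When bagging is used with predictive models, the statistic is replaced by a model trained on each sample, and the individual predictions are combined by averaging (for regression) or voting (for classification).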
In short, many data mining algorithms are available, and their importance and usage depend mostly on the result you are trying to achieve.