Random Forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time, creating a forest of those trees.

Random Forest is ensemble learning because it combines the same algorithm multiple times (or several different algorithms) to form a more powerful prediction model.

Decision Trees, on the other hand, are a predictive model that goes from observations about an item to conclusions about the item’s target value.

Random Decision Forests correct for the Decision Trees' habit of overfitting to their training set.


How do Random Forests work?

The general method for Random Decision Forests was proposed by Ho in 1995, and it is a supervised machine learning technique.

How does it work?

  1. Import a library or libraries to work with.
  2. Select random samples from the dataset.
  3. Build a decision tree for each sample and get a prediction from each tree.
  4. Perform a vote over the predicted results.
  5. Select the prediction result with the most votes as the final prediction.

In code:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Create a toy classification dataset
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# Train a forest of 100 shallow trees (max_depth=2)
clf = RandomForestClassifier(n_estimators=100, max_depth=2,
                             random_state=0)
clf.fit(X, y)

# Predict the class of a new, unseen sample
print(clf.predict([[0, 0, 0, 0]]))
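
To make steps 2 to 5 concrete, here is a minimal sketch that builds the ensemble by hand: it draws bootstrap samples, fits one DecisionTreeClassifier per sample, and takes a majority vote. The helper simple_forest_predict and the variable n_trees are invented for this illustration; they are not part of scikit-learn.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

rng = np.random.default_rng(0)
n_trees = 100
trees = []

# Steps 2 and 3: fit one decision tree per bootstrap sample
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
    # max_features="sqrt" adds the per-split feature sampling a real forest uses
    tree = DecisionTreeClassifier(max_depth=2, max_features="sqrt")
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Steps 4 and 5: collect every tree's prediction and take a majority vote
def simple_forest_predict(X_new):
    votes = np.stack([tree.predict(X_new) for tree in trees])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

print(simple_forest_predict(X[:5]))  # majority-vote predictions for five rows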

Advantages of Random Forest Algorithm

  • Because a Random Forest combines the predictions of many Decision Trees, it is a highly accurate and robust method.
  • Random Forests are far less prone to overfitting than single trees, because they average the predictions of all their trees.
  • It can also handle missing values, for example by using median values to replace missing continuous variables and computing a proximity-weighted average for other missing values.
  • Since each tree is trained on a different random subset of the data, the ensemble is not biased towards any single part of the training set.
  • Even if new data is introduced into the dataset, the overall Random Forest is not affected much, since the new data may impact only some of the trees.
  • It works well when you have both categorical and numerical features (see the pipeline sketch after this list).
  • It works well when data has missing values or has not been scaled, so feature scaling is usually unnecessary.
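
To illustrate the mixed-feature point above, one common scikit-learn pattern is to one-hot encode the categorical columns and let the numerical columns pass through unchanged before fitting the forest. The tiny dataset and its column names (color, size) are invented for this sketch.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# A made-up dataset with one categorical and one numerical feature
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size": [1.0, 2.5, 0.8, 3.1, 2.2, 2.9],
})
y = [0, 1, 0, 1, 1, 1]

# One-hot encode the categorical column; numerical columns pass through
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(), ["color"])],
    remainder="passthrough",
)

model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
model.fit(df, y)
print(model.predict(df.head(2)))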

Disadvantages of Random Forest Algorithm

The Random Forest Algorithm has more advantages than disadvantages, but two drawbacks stand out:

  • The biggest disadvantage is complexity: building many decision trees requires far more computational resources than a single model.
  • Training all the Decision Trees that make up a Random Forest takes much more time than training many other algorithms.

Random Forest vs Decision Trees

Random Forests are a set of multiple decision trees, and they prevent the overfitting that single Decision Trees usually suffer from.

Decision trees are computationally faster than Random Forests.
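
One quick, unscientific way to see this difference is to time fitting a single tree against a 100-tree forest on the same data; the exact numbers will depend on your machine.

import time
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=10000, n_features=20, random_state=0)

start = time.perf_counter()
DecisionTreeClassifier(random_state=0).fit(X, y)
print(f"single tree:     {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0).fit(X, y)
print(f"100-tree forest: {time.perf_counter() - start:.3f}s")

Setting n_jobs=-1 trains the trees on all CPU cores in parallel, which softens, but does not remove, the extra cost.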

A Random Forest can be difficult to interpret compared to a single Decision Tree. The Random Forest Algorithm relies on the combined power of its individual Decision Trees.
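
A partial remedy for the interpretability gap: scikit-learn forests expose feature_importances_, an impurity-based score per input feature, which gives a coarse view of what the forest relies on. Reusing the clf fitted in the code example above:

# Impurity-based importance of each input feature; the scores sum to 1.0
for i, importance in enumerate(clf.feature_importances_):
    print(f"feature {i}: {importance:.3f}")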

Depending on the task, different programming languages and libraries can be used for the Random Forest Algorithm.
