Decision trees in Python with Scikit-Learn

A decision tree is one of many machine learning algorithms, and it also works as a decision tool. It resembles the tree data structure from computer science, drawn with the root at the top and the leaves below.

In this article we’ll implement a decision tree using the machine learning module scikit-learn. It’s one of many machine learning modules; TensorFlow is another popular one.

Classifiers

Imagine writing a program that has to predict whether a picture shows a man or a woman. You would have to write tons of programming rules. If I then gave you a different set of images to tell apart, you’d have to write new rules all over again. Machine learning is a better way to solve these problems.

Instead of programmatically defining each rule, we use an algorithm that creates the rules for us. This type of algorithm is called a classifier. It takes data as input and returns a label as output.

As a practical example, given an image of a person, the classifier would predict whether it shows a woman or a man.

Working with a classifier takes these steps:

collect data
train classifier
make predictions

We train the classifier by giving the algorithm data and labels. This type of machine learning is called supervised learning.

In this example we’ll use simple arrays as data. In practice you’d often want to have large datasets to make good predictions.
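
To give a feel for the same three steps on a somewhat larger dataset, here is a minimal sketch. It assumes the Iris dataset that ships with scikit-learn (not used elsewhere in this article) and holds back part of the data to check the predictions; the split size and random_state values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Collect data: 150 flower measurements with a species label for each
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the rows so the classifier is scored on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train classifier
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Make predictions and measure the fraction that are correct
accuracy = clf.score(X_test, y_test)
print(accuracy)
```

Scoring on rows the classifier has never seen gives a more honest picture than scoring on the training data itself.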

Decision Tree Visual Example

At every node of the tree we can turn left or right. Which branch we walk depends on the input numbers. At the end of each branch is an outcome. Once the classifier has been trained on data, we can use it to make predictions.

[Figure: a graphical example of a decision tree]

Based on numeric input, a computer can decide the output. If the input were [False, True], this tree would predict ‘You’.
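
Walking a tree can be sketched in a few lines of plain Python. The tree below is hand-written rather than trained, and its layout is an assumption: only the outcome ‘You’ for the input [False, True] comes from the text above, while the other labels and the order of the questions are hypothetical.

```python
# A hand-written tree: each inner node looks at one input position,
# each leaf (a plain string) holds an outcome. The labels other than
# 'You' and the order of the questions are hypothetical.
TREE = {
    "index": 0,                      # look at values[0] first
    "if_true": "Friend",             # leaf (hypothetical label)
    "if_false": {
        "index": 1,                  # then look at values[1]
        "if_true": "You",            # leaf
        "if_false": "Stranger",      # leaf (hypothetical label)
    },
}


def walk(node, values):
    """Follow branches until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        branch = "if_true" if values[node["index"]] else "if_false"
        node = node[branch]
    return node


print(walk(TREE, [False, True]))  # You
```

A trained classifier does the same kind of walk, except that the questions at each node are learned from the data instead of written by hand.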

Install sklearn

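If you have not installed scikit-learn (the package that is imported as sklearn), install it with

pip install scikit-learn

scikit-learn depends on SciPy. pip normally installs it automatically, but you can also install it explicitly:

pip install scipy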
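
A quick way to check that the installation worked is to import both packages and print their versions. This is only a sanity check; the version numbers you see depend on your environment.

```python
import scipy
import sklearn

# Both imports succeed only if the packages are installed
print("scikit-learn", sklearn.__version__)
print("scipy", scipy.__version__)
```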

Decision tree

We import tree from sklearn and create the model:

from sklearn import tree
clf = tree.DecisionTreeClassifier()

Then we create the training data for the classifier / decision tree:

# [height, hair-length, voice-pitch]
X = [[180, 15, 0],
     [167, 42, 1],
     [136, 35, 1],
     [174, 15, 0],
     [141, 28, 1]]

Y = ['man', 'woman', 'woman', 'man', 'woman']

Putting it all together:

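from sklearn import tree

# Create the model
clf = tree.DecisionTreeClassifier()

# Training data, one row per person: [height, hair-length, voice-pitch]
X = [[180, 15, 0],
     [167, 42, 1],
     [136, 35, 1],
     [174, 15, 0],
     [141, 28, 1]]

# Labels, one per row of X
Y = ['man', 'woman', 'woman', 'man', 'woman']

# Train the classifier on the data and labels
clf = clf.fit(X, Y)

# Predict the label for a new, unseen person
prediction = clf.predict([[133, 37, 1]])
print(prediction)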
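
To see which rules the classifier actually learned, the fitted tree can be printed as text. The sketch below repeats the training data so that it runs on its own; the names in feature_names are simply readable labels taken from the comment in the code above, not something scikit-learn requires.

```python
from sklearn import tree

# Same training data as above: [height, hair-length, voice-pitch]
X = [[180, 15, 0],
     [167, 42, 1],
     [136, 35, 1],
     [174, 15, 0],
     [141, 28, 1]]
Y = ['man', 'woman', 'woman', 'man', 'woman']

clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

# Print the learned rules as indented text
feature_names = ['height', 'hair-length', 'voice-pitch']
report = tree.export_text(clf, feature_names=feature_names)
print(report)

# Size of the learned tree: depth and number of leaves
print(clf.get_depth(), clf.get_n_leaves())
```

Each of the three features separates this tiny dataset on its own, so the tree needs only one question, and the feature it picks can differ between runs unless random_state is set on the classifier.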