tag: decision tree | Python Tutorial


Decision tree

A decision tree is one of many machine learning algorithms, and also a general decision tool. It resembles the tree data structure from computer science, drawn with the root at the top and the leaves at the bottom.

In this article we’ll implement a decision tree using the machine learning module scikit-learn. It’s one of many machine learning modules; TensorFlow is another popular one.


Classifiers

Imagine writing a program that has to predict whether a picture shows a man or a woman. You would have to write tons of programming rules. If I gave you another set of images, you’d have to create new rules all over again. Machine learning is a better way to solve these problems.

Instead of programmatically defining each rule, we use an algorithm that creates the rules for us. This type of algorithm is called a classifier. It takes data as input and returns a label as output.

A practical example: given an image of a person, the classifier predicts whether it shows a woman or a man.

Working with a classifier takes these steps:

  • collect data
  • train classifier
  • make predictions

Supervised learning

We train the classifier by giving the algorithm data and labels. This type of machine learning is called supervised learning.

In this example we’ll use simple arrays as data. In practice you’d often want to have large datasets to make good predictions.
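As a rough illustration of the collect, train, predict steps on a somewhat larger dataset, the sketch below uses the iris dataset that ships with scikit-learn. That dataset is not part of this article's example, and the 25% hold-out split and the fixed random_state are arbitrary choices made here. It assumes scikit-learn is already installed.

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# 1. Collect data: 150 flower measurements with their species labels
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0
)

# 2. Train the classifier on the training part only
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# 3. Make predictions on samples the classifier has not seen
predictions = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
print(predictions[:5])
print("accuracy on held-out data:", accuracy)
```

Holding back part of the data is one common way to check whether the learned rules also work on new inputs.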

Decision Tree Visual Example

At every node of the tree we turn left or right, walking the branches based on the input numbers. At the end of the branches are the outcomes. Once the classifier has been trained on the data, we can use it to make predictions.

A graphical example of a decision tree:

[Image: decision tree]
Based on numeric input, a computer can decide the output. If the input is [False, True], it predicts ‘You’.
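To make the left/right walk concrete, here is a minimal sketch of a hand-built tree in plain Python. It does not reproduce the tree in the figure: the features, thresholds and labels below are made up for illustration, and a trained classifier would learn them from data instead.

```python
# A tiny hand-built tree (hypothetical thresholds, not learned from data).
# Inner nodes are dicts, leaves are plain label strings.
toy_tree = {
    "feature": 1,          # index 1 = hair length
    "threshold": 20,
    "left": "man",         # taken when sample[1] <= 20
    "right": {
        "feature": 0,      # index 0 = height
        "threshold": 175,
        "left": "woman",   # taken when sample[0] <= 175
        "right": "man",
    },
}

def walk(node, sample):
    """Follow the branches from the root until a leaf (a label) is reached."""
    while isinstance(node, dict):
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node

print(walk(toy_tree, [180, 15]))  # hair length 15 <= 20, so: man
print(walk(toy_tree, [160, 40]))  # hair 40 > 20, height 160 <= 175, so: woman
```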


Install sklearn

If you have not installed scikit-learn (the module is imported as sklearn), install it with

pip install scikit-learn

scipy is installed along with it as a dependency; to install it separately, run

pip install scipy
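A quick way to confirm that the installation worked is to import both modules and print their versions. This is only a sanity check; the version numbers you see depend on your environment.

```python
import sklearn
import scipy

# If either import fails, the package is not installed for this interpreter
print("scikit-learn", sklearn.__version__)
print("scipy", scipy.__version__)
```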

Decision tree

We import tree from sklearn and create the model:

from sklearn import tree
clf = tree.DecisionTreeClassifier()

Then we create the training data for the classifier / decision tree:

# [height, hair-length, voice-pitch]
X = [[180, 15, 0],
     [167, 42, 1],
     [136, 35, 1],
     [174, 15, 0],
     [141, 28, 1]]

Y = ['man', 'woman', 'woman', 'man', 'woman']

Putting it all together:

from sklearn import tree

clf = tree.DecisionTreeClassifier()

# Training data: [height, hair-length, voice-pitch]
X = [[180, 15, 0],
     [167, 42, 1],
     [136, 35, 1],
     [174, 15, 0],
     [141, 28, 1]]

# One label per row of X
Y = ['man', 'woman', 'woman', 'man', 'woman']

# Train the classifier, then predict the label of a new sample
clf = clf.fit(X, Y)
prediction = clf.predict([[133, 37, 1]])
print(prediction)
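To see which rules the algorithm created, one option is export_text from sklearn.tree (available in scikit-learn 0.21 and later), which prints the learned tree as indented text. The sketch below reuses the training data from this section. Several features separate these five samples equally well, so the feature shown at the root may differ between runs.

```python
from sklearn import tree

# Same training data as above: [height, hair-length, voice-pitch]
X = [[180, 15, 0],
     [167, 42, 1],
     [136, 35, 1],
     [174, 15, 0],
     [141, 28, 1]]
Y = ['man', 'woman', 'woman', 'man', 'woman']

clf = tree.DecisionTreeClassifier().fit(X, Y)

# export_text returns the learned rules as an indented string
rules = tree.export_text(
    clf, feature_names=['height', 'hair-length', 'voice-pitch']
)
print(rules)
```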

Decision tree visual example

A decision tree is one of many machine learning algorithms, and it can be visualized.
It’s used as a classifier: given input data, is it class A or class B? In this lecture we will visualize a decision tree using the Python module pydotplus and the Graphviz toolkit.


Whether you want to do decision tree analysis, understand the decision tree algorithm or model, or just need a decision tree maker, you’ll need to visualize the decision tree.

Decision Tree

Install
You need to install pydotplus and Graphviz. These can be installed with pip and your package manager, respectively.
Graphviz is a tool for drawing graphs described in dot files. Pydotplus is a Python interface to Graphviz’s Dot language.
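Because the visualization needs both a Python package and a system tool, it can help to check for each before going further. The sketch below uses only the standard library and assumes the Graphviz command line program is named dot, which is its usual name.

```python
import importlib.util
import shutil

# pydotplus is a Python package, so check that it can be imported
has_pydotplus = importlib.util.find_spec("pydotplus") is not None

# Graphviz provides the "dot" program that renders dot files to images
dot_path = shutil.which("dot")

print("pydotplus installed:", has_pydotplus)
print("Graphviz dot found at:", dot_path)
```

If dot_path is None, Graphviz is either not installed or not on your PATH.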

Data Collection
We start by collecting the data. Let’s build a decision tree that predicts man or woman: given the input features “height, hair length and voice pitch”, it will predict whether it’s a man or a woman.

We start with the training data:
[Image: training data]

In code that looks like:

import collections

import pydotplus
from sklearn import tree

# Data Collection: [height, hair length, voice pitch]
X = [[180, 15, 0],
     [177, 42, 0],
     [136, 35, 1],
     [174, 65, 0],
     [141, 28, 1]]

Y = ['man', 'woman', 'woman', 'man', 'woman']

data_feature_names = ['height', 'hair length', 'voice pitch']

Train Classifier
The next step is to train the classifier (decision tree) with the training data.
Training is always necessary for supervised learning algorithms.

# Training
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X,Y)
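After training, a simple check is to ask how well the tree reproduces its own training labels and how large it has grown. The sketch below repeats the data from this section so it runs on its own. It uses score, get_depth and get_n_leaves from scikit-learn; the last two need version 0.21 or later.

```python
from sklearn import tree

# Training data from this section: [height, hair length, voice pitch]
X = [[180, 15, 0], [177, 42, 0], [136, 35, 1], [174, 65, 0], [141, 28, 1]]
Y = ['man', 'woman', 'woman', 'man', 'woman']

clf = tree.DecisionTreeClassifier().fit(X, Y)

# Fraction of training samples the tree labels correctly
training_accuracy = clf.score(X, Y)
print("training accuracy:", training_accuracy)
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
```

A high score on the training data only shows that the tree memorized these five samples; it says little about new inputs.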

Decision Tree Visualization
We then visualize the tree using this code, which continues from the snippets above:

# Visualize data: export the trained tree in Graphviz's dot format
dot_data = tree.export_graphviz(clf,
                                feature_names=data_feature_names,
                                out_file=None,
                                filled=True,
                                rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)

colors = ('turquoise', 'orange')
edges = collections.defaultdict(list)

# Group the child nodes by their parent node
for edge in graph.get_edge_list():
    edges[edge.get_source()].append(int(edge.get_destination()))

# Color the two children of every node: first turquoise, second orange
for edge in edges:
    edges[edge].sort()
    for i in range(2):
        dest = graph.get_node(str(edges[edge][i]))[0]
        dest.set_fillcolor(colors[i])

graph.write_png('tree.png')

This will save the visualization to the image tree.png, which looks like this:

[Image: decision tree machine learning]
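If pydotplus is not available, a possible alternative is to save the dot source that export_graphviz returns and render it with the Graphviz command line tool. The sketch below repeats the data and training from this section so it runs on its own; the file name tree.dot is an arbitrary choice.

```python
from pathlib import Path

from sklearn import tree

# Training data from this section: [height, hair length, voice pitch]
X = [[180, 15, 0], [177, 42, 0], [136, 35, 1], [174, 65, 0], [141, 28, 1]]
Y = ['man', 'woman', 'woman', 'man', 'woman']
clf = tree.DecisionTreeClassifier().fit(X, Y)

# With out_file=None, export_graphviz returns the dot source as a string
dot_data = tree.export_graphviz(clf,
                                feature_names=['height', 'hair length', 'voice pitch'],
                                out_file=None,
                                filled=True,
                                rounded=True)

# Save the dot source so it can be rendered outside Python
Path("tree.dot").write_text(dot_data)
print(dot_data[:200])
```

The saved file can then be rendered from a shell with: dot -Tpng tree.dot -o tree.png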

If you want to make predictions, check out the decision tree article.

