Machine Learning

What is Machine Learning?
The word ‘Machine’ in Machine Learning means computer, as you would expect. So how does a machine learn?

Given data, we can do all kinds of magic with statistics, and so can computer algorithms.

These algorithms can solve problems including prediction, classification and clustering. A machine learning algorithm will learn from new data.

Types of learning

There are two main types of learning: supervised learning and unsupervised learning. Say what?

Let’s suppose we have consumer data. I tell the computer: these customers have a high income, those customers have a median income. This is the training phase.
Then we can ask the computer:

You: Does this customer have a high or median income?
Computer: Based on the training data, I predict a high income.
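
The exchange above can be sketched in a few lines of plain Python. This is only an illustration: the feature (monthly spending), the numbers and the nearest-mean rule are assumptions made up for this sketch, not a specific library algorithm.

```python
from statistics import mean

# Hypothetical training data: (monthly spending, income class).
training = [
    (4200, "high"), (3900, "high"), (4500, "high"),
    (1800, "median"), (2100, "median"), (1600, "median"),
]

def train(samples):
    """Training phase: compute the mean spending for each label."""
    by_label = {}
    for spending, label in samples:
        by_label.setdefault(label, []).append(spending)
    return {label: mean(values) for label, values in by_label.items()}

def predict(model, spending):
    """Prediction phase: return the label whose mean is closest."""
    return min(model, key=lambda label: abs(model[label] - spending))

model = train(training)
print(predict(model, 4000))  # high
print(predict(model, 1900))  # median
```

The labels given during training are what make this supervised: the computer only repeats a pattern it was shown.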

With unsupervised learning algorithms, you don’t tell the computer the answers in advance. You give the data to the computer and expect it to find the structure itself. Surprisingly, these work rather well.

Decision tree visual example

A decision tree is one of the many Machine Learning algorithms, and it can be visualized.
It’s used as a classifier: given input data, is it class A or class B? In this lecture we will visualize a decision tree using the Python module pydotplus and the module graphviz.

Related course: Data Science and Machine Learning with Python – Hands On!

Whether you want to do decision tree analysis, understand the decision tree algorithm or model, or just need a decision tree maker, you’ll need to visualize the decision tree.

Decision Tree

Install
You need to install pydotplus and graphviz. These can be installed with your package manager and pip.
Graphviz is a tool for drawing graphs described in dot files. Pydotplus is a Python interface to Graphviz’s Dot language.
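
To make the Dot language a little more concrete, here is a small sketch that builds a Dot description as plain text. The helper tree_to_dot and the node labels are hypothetical and only for illustration; in the rest of this article the Dot text is generated by scikit-learn instead.

```python
def tree_to_dot(nodes, edges):
    """Build a Dot description of a directed graph as a string."""
    lines = ["digraph Tree {"]
    for node_id, label in nodes.items():
        lines.append(f'    {node_id} [label="{label}"];')
    for source, destination in edges:
        lines.append(f"    {source} -> {destination};")
    lines.append("}")
    return "\n".join(lines)

# Hypothetical tree: one decision node with two leaves.
nodes = {0: "voice pitch <= 0.5", 1: "man", 2: "woman"}
edges = [(0, 1), (0, 2)]

dot_source = tree_to_dot(nodes, edges)
print(dot_source)
```

Text like this is what Graphviz turns into a picture.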

Data Collection
We start with the data collection. Let’s build a decision tree that classifies a person as man or woman. Given the input features height, hair length and voice pitch, it will predict whether it’s a man or a woman.

We start with the training data:
[Image: training data]

In code that looks like:

import collections

import pydotplus
from sklearn import tree

# Data Collection: each row is [height, hair length, voice pitch]
X = [[180, 15, 0],
     [177, 42, 0],
     [136, 35, 1],
     [174, 65, 0],
     [141, 28, 1]]

# One label per row of X
Y = ['man', 'woman', 'woman', 'man', 'woman']

data_feature_names = ['height', 'hair length', 'voice pitch']

Train Classifier
The next step is to train the classifier (decision tree) with the training data.
Training is always necessary for supervised learning algorithms.

# Training
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X,Y)
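
To give a feel for what happens during training, here is a minimal sketch of how a decision tree might pick its first split. It assumes the Gini impurity criterion (the default for scikit-learn's DecisionTreeClassifier) and tries thresholds halfway between neighbouring feature values. The helper names gini and best_split are made up for this illustration, and the real scikit-learn implementation is more elaborate.

```python
from collections import Counter

X = [[180, 15, 0], [177, 42, 0], [136, 35, 1], [174, 65, 0], [141, 28, 1]]
Y = ['man', 'woman', 'woman', 'man', 'woman']
feature_names = ['height', 'hair length', 'voice pitch']

def gini(labels):
    """Gini impurity: 0.0 means every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, Y):
    """Return (weighted impurity, feature index, threshold) of the best split."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds lie halfway between neighbouring values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(X, Y) if row[f] <= threshold]
            right = [y for row, y in zip(X, Y) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(Y)
            # Strict comparison: on a tie the first split found is kept.
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

score, feature, threshold = best_split(X, Y)
print("impurity before split:", gini(Y))
print("best split:", feature_names[feature], "<=", threshold, "impurity:", score)
```

On this tiny dataset a split on height and a split on voice pitch are equally good, so which one ends up at the root of a trained tree may vary.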

Decision Tree Visualization
We then visualize the tree with the code below, which continues from the previous snippets:

# Visualize data: export the trained tree to Dot format
dot_data = tree.export_graphviz(clf,
                                feature_names=data_feature_names,
                                out_file=None,
                                filled=True,
                                rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)

colors = ('turquoise', 'orange')
edges = collections.defaultdict(list)

# collect the child nodes of every parent node
for edge in graph.get_edge_list():
    edges[edge.get_source()].append(int(edge.get_destination()))

# color the two children of each parent node
for edge in edges:
    edges[edge].sort()
    for i in range(2):
        dest = graph.get_node(str(edges[edge][i]))[0]
        dest.set_fillcolor(colors[i])

graph.write_png('tree.png')

This will save the visualization to the image tree.png, which looks like this:

[Image: decision tree machine learning]

If you want to make predictions, check out the decision tree article.
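
As a quick sketch of what a prediction looks like, the classifier's predict method takes a list of samples with the same three features as the training data. The new sample below is hypothetical, and random_state is set only to make repeated runs behave the same.

```python
from sklearn import tree

# Same training data as above: [height, hair length, voice pitch]
X = [[180, 15, 0], [177, 42, 0], [136, 35, 1], [174, 65, 0], [141, 28, 1]]
Y = ['man', 'woman', 'woman', 'man', 'woman']

clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(X, Y)

# Hypothetical new person: height 133, hair length 37, voice pitch 1
new_sample = [[133, 37, 1]]
print(clf.predict(new_sample))

# Sanity check: predictions for the training rows themselves
print(clf.predict(X))
```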

kmeans clustering algorithm

Do you have observed data?

You can cluster it automatically with the kmeans algorithm.

In the kmeans algorithm, k is the number of clusters.

Clustering is an unsupervised machine learning task: the groups are found automatically, without labels.

kmeans data

We always start with data. This is our observed data, simply a list of values.
We plot all of the observed data in a scatter plot.

# clustering dataset
import numpy as np
import matplotlib.pyplot as plt

x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8])
x2 = np.array([5, 4, 6, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3])

# scatter plot of the observed data
plt.scatter(x1, x2)
plt.xlim([0, 10])
plt.ylim([0, 10])
plt.title('Dataset')
plt.show()

Result:
[Image: kmeans dataset]

kmeans clustering example

We will cluster the observations automatically.

K can be determined using the elbow method, but in this example we’ll set K ourselves.

Note: K is always a positive integer. We cannot have -1 clusters (k).

The goal of the k-means clustering algorithm is to partition the observations into k clusters.

Each observation belongs to the cluster with the nearest mean.
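
The "nearest mean" idea can be sketched without any library. The version below is a simplified illustration, not scikit-learn's implementation: it repeats two steps (assign each point to the nearest centroid, then move each centroid to the mean of its points) until the assignments stop changing. The starting centroids are picked by hand, which is an assumption of this sketch.

```python
import math

x1 = [3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8]
x2 = [5, 4, 6, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3]
points = list(zip(x1, x2))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, centroids, max_iter=100):
    """Simplified k-means; returns (labels, centroids)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so we are done
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                # move the centroid to the mean of its members
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hand-picked starting centroids (an assumption of this sketch), K = 3
start = [points[1], points[6], points[11]]
labels, centroids = kmeans(points, start)
print(labels)
print(centroids)
```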

# clustering dataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8])
x2 = np.array([5, 4, 6, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3])

# plot the raw observations
plt.scatter(x1, x2)
plt.xlim([0, 10])
plt.ylim([0, 10])
plt.title('Dataset')
plt.show()

# combine x1 and x2 into one array with a row per observation
X = np.column_stack((x1, x2))
colors = ['b', 'g', 'r']
markers = ['o', 'v', 's']

# KMeans algorithm (n_init is set explicitly because its default differs between versions)
K = 3
kmeans_model = KMeans(n_clusters=K, n_init=10).fit(X)

# plot each observation in the color and marker of its cluster
for i, l in enumerate(kmeans_model.labels_):
    plt.plot(x1[i], x2[i], color=colors[l], marker=markers[l], ls='None')
plt.xlim([0, 10])
plt.ylim([0, 10])

plt.show()

Result:
[Image: kmeans clustering algorithm]

As the result above shows, kmeans has clustered the observations automatically.
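
K was set by hand in this example. As a rough sketch of the elbow method mentioned earlier, you can fit kmeans for several values of K and compare the inertia_ attribute (the sum of squared distances of the observations to their closest cluster center). The range of K values tried here is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8])
x2 = np.array([5, 4, 6, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3])
X = np.column_stack((x1, x2))

# Fit one model per candidate K and record its inertia
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 2))
```

The "elbow" is the value of K after which the inertia stops dropping sharply; adding more clusters beyond that point gains little.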

Speech Recognition

Speech recognition is the process of converting spoken words to text. Python supports many speech recognition engines and APIs, including Google Speech Engine, Google Cloud Speech API, Microsoft Bing Voice Recognition and IBM Speech to Text.

In this tutorial we will use Google Speech Recognition Engine with Python.

Installation

The “SpeechRecognition” library helps here. Ideally, install it inside an environment managed with pyenv, pipenv or virtualenv. You can also install it system-wide:

pip install SpeechRecognition

The SpeechRecognition module depends on pyaudio for microphone input. You can install it from your package manager.
On Manjaro Linux the packages are called “python-pyaudio” and “python2-pyaudio”; they may have another name on your system.

Speech Recognition demo
You can test the speech recognition module with the command:

python -m speech_recognition

The results are shown in the terminal.

Speech Recognition with Google
The example below uses Google Speech Recognition engine, which I’ve tested for the English language.

For testing purposes, it uses the default API key.
To use another API key, use:

r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")

Copy the code below and save the file as speechtest.py.
Run it with Python 3.

#!/usr/bin/env python3

import speech_recognition as sr

# get audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Speak:")
    audio = r.listen(source)

# send the audio to Google Speech Recognition and print the transcript
try:
    print("You said " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print("Could not request results; {0}".format(e))
