Machine Learning

What is Machine Learning?
The word ‘Machine’ in Machine Learning means computer, as you would expect. So how does a machine learn?

Given data, we can do all kinds of magic with statistics, and so can computer algorithms.

These algorithms can solve problems such as prediction, classification and clustering. A machine learning algorithm learns from new data.


Types of learning

There are two types of learning: supervised learning and unsupervised learning. Say what?

Supervised learning

Let's suppose we have consumer data. I tell the computer: these customers have a high income, those customers have a median income. That is the training phase.
Then we can ask this computer:

 
You: Does this customer have a high or median income?
Computer: Based on the training data, I predict a high income.

Python code

So does training data have to be large and complex? No, this also works for small data sets.

Take for instance, this set:

x = [[2, 0], [1, 1], [2, 3]]
y = [0, 0, 1]

So what does this mean:

  • Look at y, there are two possible outputs. Either it’s class 0 or class 1.
  • Then x are the measurements.

The training data (x, y) is then fed to the algorithm, with the fit() method:

your_amazing_algorithm.fit(x, y)

Then, given new measurements, the trained classifier (here named clf) can predict the output (class 0 or class 1):

print(clf.predict([[2, 0]]))
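A minimal, runnable sketch of this whole flow, assuming scikit-learn's KNeighborsClassifier as the algorithm (any classifier with fit() and predict() would work the same way):

```python
# Minimal supervised learning sketch; KNeighborsClassifier is an
# arbitrary choice, any scikit-learn classifier works the same way.
from sklearn.neighbors import KNeighborsClassifier

x = [[2, 0], [1, 1], [2, 3]]  # measurements
y = [0, 0, 1]                 # class labels

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(x, y)                 # training phase
print(clf.predict([[2, 0]]))  # -> [0], the nearest training point is [2, 0] itself
```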

Unsupervised learning

With unsupervised learning algorithms, you provide no labels at all. You give the data to the computer and expect answers. Surprisingly, these algorithms work rather well.

Suppose you have data points X, where each value of X is a two-dimensional point, and you want to make predictions.

Load an algorithm (KMeans from sklearn.cluster) and fit it to the data:

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=2, random_state=0).fit(X)

Predict:

kmeans.predict([[12, 3]])

Yes, it can be that easy.
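A complete version of the snippet above; the data set X is made up for illustration, with three points near x=1 and three near x=10:

```python
# Complete KMeans example; the data set X is invented for illustration.
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0],
     [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
print(kmeans.labels_)             # cluster assignment for each training point
print(kmeans.predict([[12, 3]]))  # the new point lands in the x=10 cluster
```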


Machine Learning Classifier

Machine Learning Classifiers can be used to predict. Given example data (measurements), the algorithm can predict the class the data belongs to.

Start with training data. Training data is fed to the classification algorithm. After training the classification algorithm (the fitting function), you can make predictions.


Machine Learning Classification

In the example below we predict whether a person is male or female, given vector data.

We start with training data. In this example we have a set of vectors (height, weight, shoe size) and the class this vector belongs to:

# [height, weight, shoe size]
X = [[190,70,44],[166,65,45],[190,90,47],[175,64,39],[171,75,40],[177,80,42],[160,60,38],[144,54,37]]
Y = ['male','male','male','male','female','female','female','female']

Define a vector for your prediction in the same format (height, weight, size). If you want, you can also get this from console input:

P = [[190,80,46]]

Then we fit the training data and predict in this style (Classifier is a placeholder for any scikit-learn classifier):

c = Classifier()
c = c.fit(X, Y)
print("\nPrediction : " + str(c.predict(P)))


That gives us this code:

from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

# [height, weight, shoe size]
X = [[190,70,44],[166,65,45],[190,90,47],[175,64,39],[171,75,40],[177,80,42],[160,60,38],[144,54,37]]
Y = ['male','male','male','male','female','female','female','female']

# Predict for this vector [height, weight, shoe size]
P = [[190,80,46]]

# Decision Tree Classifier
clf = DecisionTreeClassifier()
clf = clf.fit(X, Y)
print("\n1) Using Decision Tree Prediction is " + str(clf.predict(P)))

# K Neighbors Classifier
knn = KNeighborsClassifier()
knn.fit(X, Y)
print("2) Using K Neighbors Classifier Prediction is " + str(knn.predict(P)))

# MLP Classifier
mlpc = MLPClassifier()
mlpc.fit(X, Y)
print("3) Using MLP Classifier Prediction is " + str(mlpc.predict(P)))

# Random Forest Classifier
rfor = RandomForestClassifier()
rfor.fit(X, Y)
print("4) Using Random Forest Classifier Prediction is " + str(rfor.predict(P)) + "\n")


bag of words

If we want to use text in Machine Learning algorithms, we'll have to convert it to a numerical representation. It should be no surprise that computers are very good at handling numbers.

We convert text to a numerical representation called a feature vector. A feature vector can be as simple as a list of numbers.

The bag-of-words model is one of the feature extraction algorithms for text.


Feature extraction from text

The bag of words model ignores grammar and order of words.
We start with two documents (the corpus):

‘All my cats in a row’,
‘When my cat sits down, she looks like a Furby toy!’,

A vocabulary is then created based on the two strings above:

{'all': 0, 'cat': 1, 'cats': 2, 'down': 3, 'furby': 4, 'in': 5, 'like': 6, 'looks': 7, 'my': 8, 'row': 9, 'she': 10, 'sits': 11, 'toy': 12, 'when': 13}

The list contains 14 unique words: the vocabulary. That’s why every document is represented by a feature vector of 14 elements. The number of elements is called the dimension.
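Such a vocabulary can be sketched in plain Python; the tokenizer below (two-letter-minimum words, lowercased) mimics CountVectorizer's default behavior, which is why single-letter words like 'a' are missing from the vocabulary above:

```python
# Build a bag-of-words vocabulary by hand; the regex mimics
# CountVectorizer's default token pattern (words of 2+ characters).
import re

docs = ['All my cats in a row',
        'When my cat sits down, she looks like a Furby toy!']

words = sorted({w for d in docs for w in re.findall(r'\b\w\w+\b', d.lower())})
vocabulary = {word: index for index, word in enumerate(words)}
print(len(vocabulary))  # -> 14
print(vocabulary)       # {'all': 0, 'cat': 1, ..., 'when': 13}
```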

Then we can express the texts as numeric vectors:

[[1 0 1 0 0 1 0 0 1 1 0 0 0 0]
[0 1 0 1 1 0 1 1 1 0 1 1 1 1]]

Let's take a closer look:

'All my cats in a row' = [1 0 1 0 0 1 0 0 1 1 0 0 0 0]

If we follow the order of the vocabulary, writing a 1 for each word that occurs in the document and a 0 for each word that doesn't, we get a vector: the bag of words representation.

Bag of words code

We’ll define a collection of strings called a corpus. Then we’ll use the CountVectorizer to create vectors from the corpus.

# Feature extraction from text
# Method: bag of words
# https://pythonprogramminglanguage.com

from sklearn.feature_extraction.text import CountVectorizer

corpus = [
'All my cats in a row',
'When my cat sits down, she looks like a Furby toy!',
'The cat from outer space',
'Sunshine loves to sit like this for some reason.'
]

vectorizer = CountVectorizer()
print( vectorizer.fit_transform(corpus).todense() )
print( vectorizer.vocabulary_ )



bag of words euclidean distance

If we represent text documents as feature vectors using the bag of words method, we can calculate the Euclidean distance between them.

Vectors always have a distance between them; consider the vectors (2, 2) and (4, 2). We can use the Euclidean distance to calculate the distance between any two vectors.
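For those two vectors the distance works out by hand: sqrt((4-2)² + (2-2)²) = 2. A quick plain-Python check:

```python
# Euclidean distance between two points: the square root of the
# sum of squared coordinate differences.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean((2, 2), (4, 2)))  # -> 2.0
```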


Text similarity

Because we represent each text as a vector, the Euclidean distance between two vectors tells us how similar the corresponding text documents are.

We start with the corpus, then calculate the feature vectors from the corpus, and finally calculate the Euclidean distance. In this example we compare everything to the first document.

# Feature extraction from text
# Method: bag of words
# https://pythonprogramminglanguage.com

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances

corpus = [
'All my cats in a row',
'When my cat sits down, she looks like a Furby toy!',
'The cat from outer space',
'Sunshine loves to sit like this for some reason.'
]

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(corpus).todense()
print( vectorizer.vocabulary_ )

for f in features:
    print( euclidean_distances(features[0], f) )


Decision tree

A decision tree is one of the many machine learning algorithms. It is a decision support tool that uses a tree-like model (root at the top, leaves downwards).

In this article we'll implement a decision tree using the Machine Learning module scikit-learn. It's one of many machine learning modules; TensorFlow is another popular one.


Classifiers

Imagine writing a program that has to predict if a picture contains a male or female. You would have to write tons of programming rules. If I gave you another group of images, you'd have to create new programming rules all over again. Machine Learning is a better way to solve these problems.

Instead of programmatically defining each rule, we use an algorithm that creates the rules for us. This type of algorithm is named a classifier. It takes data as input and returns a label as output.

A practical example of this would be, given an image of a person, the classifier would predict if it’s female or male.

The classifier has these steps:

  • collect data
  • train the classifier
  • make predictions

We train the classifier by giving the algorithm data and labels. This type of machine learning is called supervised learning.

In this example we’ll use simple arrays as data. In practice you’d often want to have large datasets to make good predictions.

Decision Tree Visual Example

At every node of the tree, we can turn left or right. Based on numbers we walk the branches. At the end of the branches are outcomes. Once the classifier is trained on this data, we can use it to make predictions.

A graphical example of a decision tree:

(figure: decision tree diagram)
Based on numeric input, a computer can decide the output. If its input were [False, True], it would predict 'You'.


Install sklearn

If you have not installed scikit-learn, install it with:

pip install scikit-learn

also install scipy:

pip install scipy

Decision tree

We import tree from sklearn and create the model:

from sklearn import tree
clf = tree.DecisionTreeClassifier()

Then we create the training data for the classifier / decision tree:

# [height, hair-length, voice-pitch]
X = [[180, 15, 0],
     [167, 42, 1],
     [136, 35, 1],
     [174, 15, 0],
     [141, 28, 1]]

Y = ['man', 'woman', 'woman', 'man', 'woman']

Putting it all together:

from sklearn import tree

clf = tree.DecisionTreeClassifier()

# [height, hair-length, voice-pitch]
X = [[180, 15, 0],
     [167, 42, 1],
     [136, 35, 1],
     [174, 15, 0],
     [141, 28, 1]]

Y = ['man', 'woman', 'woman', 'man', 'woman']

clf = clf.fit(X, Y)
prediction = clf.predict([[133, 37, 1]])
print(prediction)

