kmeans clustering algorithm

Do you have observed data?

You can cluster it automatically with the kmeans algorithm.

In the kmeans algorithm, k is the number of clusters.

Clustering is an _unsupervised machine learning task_: the algorithm groups the data automatically, without needing labeled examples.


kmeans data

We always start with data. Here the observed data is a small set of two-dimensional points, stored as a list of x values (x1) and a list of y values (x2).
We plot all of the observed data in a scatter plot.

# clustering dataset
import numpy as np
import matplotlib.pyplot as plt

# x and y coordinates of the 17 observations
x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8])
x2 = np.array([5, 4, 6, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3])

# scatter plot of the raw observations
plt.xlim([0, 10])
plt.ylim([0, 10])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()

Result:
[Figure: kmeans dataset]

kmeans clustering example

We will cluster the observations automatically.

K can be determined using the elbow method, but in this example we’ll set K ourselves.
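As a rough illustration of the elbow method mentioned above, the sketch below fits KMeans for several values of K on the same dataset and prints the inertia (the within-cluster sum of squared distances) for each. The range of K values, `n_init=10` and `random_state=0` are assumptions made here for a reproducible run; they are not part of the original example.

```python
import numpy as np
from sklearn.cluster import KMeans

x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8])
x2 = np.array([5, 4, 6, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3])
X = np.column_stack((x1, x2))

# candidate values of K to try (an assumption for this sketch)
k_values = list(range(1, 9))
inertias = []
for k in k_values:
    # n_init and random_state are fixed here only to make the run repeatable
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the sum of squared distances of points to their nearest center
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 2))
```

Plotting these inertia values against K usually shows a bend (the "elbow") where adding more clusters stops reducing the inertia by much; that K is a reasonable candidate.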

Note: K must be a positive integer. A zero or negative number of clusters, such as K = -1, is meaningless.

The goal of the k-means clustering algorithm is to partition the observations into k clusters.

Each observation belongs to the cluster with the nearest mean.
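The "nearest mean" rule can be sketched in plain NumPy as two alternating steps: assign each point to its closest centroid, then move each centroid to the mean of its points. This is a minimal illustration, not how scikit-learn implements KMeans; the helper names and the hand-picked starting centroids are hypothetical.

```python
import numpy as np

x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8])
x2 = np.array([5, 4, 6, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3])
X = np.column_stack((x1, x2)).astype(float)

def assign_to_nearest(X, centroids):
    # distances[i, j] = Euclidean distance from point i to centroid j
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def update_centroids(X, labels, k):
    # each centroid becomes the mean of the points assigned to it
    # (this sketch assumes no cluster ends up empty)
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

# hypothetical starting centroids: three observations picked by hand
centroids = X[[4, 7, 12]].copy()

for _ in range(10):
    labels = assign_to_nearest(X, centroids)
    new_centroids = update_centroids(X, labels, 3)
    if np.allclose(new_centroids, centroids):
        break  # assignments no longer change the means
    centroids = new_centroids

print(labels)
print(centroids)
```

Library implementations such as scikit-learn's KMeans add smarter initialization and several restarts, so in practice the library call used in the example below is the better choice.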

# clustering dataset
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8])
x2 = np.array([5, 4, 6, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3])

# plot the raw observations
plt.xlim([0, 10])
plt.ylim([0, 10])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()

# stack the coordinates into an array of shape (n_samples, 2)
X = np.column_stack((x1, x2))
colors = ['b', 'g', 'r']
markers = ['o', 'v', 's']

# KMeans algorithm
K = 3
kmeans_model = KMeans(n_clusters=K).fit(X)

# plot each observation with the color and marker of its cluster
for i, l in enumerate(kmeans_model.labels_):
    plt.plot(x1[i], x2[i], color=colors[l], marker=markers[l], ls='None')
plt.xlim([0, 10])
plt.ylim([0, 10])
plt.show()

Result:
[Figure: kmeans clustering algorithm]

As the result shows, kmeans has grouped the observations into three clusters automatically.
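Once the model is fitted, it can also be inspected and reused. The sketch below prints the cluster means via `cluster_centers_` and assigns a new observation to a cluster with `predict`. The new point and the `n_init` and `random_state` values are assumptions added for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8])
x2 = np.array([5, 4, 6, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3])
X = np.column_stack((x1, x2))

# n_init and random_state are fixed here only to make the run repeatable
kmeans_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# one row per cluster: the mean of the observations assigned to it
print(kmeans_model.cluster_centers_)

# hypothetical new observation, not part of the original dataset
new_point = np.array([[2, 5]])
# predict returns the index of the cluster whose mean is nearest
print(kmeans_model.predict(new_point))
```

The cluster indices themselves are arbitrary and may differ between runs or library versions; only the grouping is meaningful.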
