The k-nearest neighbors or simply KNN algorithm represents an easy-to-use supervised machine learning tool that can aid you in solving both classification and regression problems.

So, what does actually mean a KNN algorithm? Well, a supervised machine learning algorithm represents a tool that depends on labelled input data in order to learn a function that leads to an accurate output when offered new data.

Related course: Complete Machine Learning Course with Python


Classification problems have a certain value as its output. As an example, he likes cheese on his sandwich vs. he does not like cheese on his sandwich are discrete values. Consider the following:

Age Likes cheese on sandwich
32 1
12 0
27 1
68 1
76 1
20 0

So, with this example in mind, imagine it as a simple example of what classification data looks like.

Thus, we have a predictor, the age, and a label. With this data, we can establish whether or not someone likes cheese based on their age.


In other words, we make use of the predictor in order to predict a label.

On the other hand, a regression problem refers to a real number as its output. Take into account this example:

Height (Inches) Weight (Pounds)
65.78 112.99
69.80 141.49
71.52 136.49

The data you will need for a regression analysis will be similar to the one in the example above. Hence, a regression analysis will make use of an independent variable and a dependent variable. For our example, the independent variable is height, while the dependent variable is weight.

Supervised machine learning tries to acknowledge a function that will permit us to make predictions with the aid of new unlabelled data, while unsupervised learning tries to gain basic understanding of the structure of the data in order to provide more insight on the mentioned data.

K-Nearest Neighbors

The basic principle on which the KNN algorithm functions is the fact that it presumes similar things exist in close proximity to each other.

The idea of similarity is also referred to as distance or proximity, can be establish by making use of basic mathematics in order to calculate distance between points. But, keep in mind that there are a variety of ways in which distance can be established, while the most popular is the Euclidean distance.

How does the KNN Algorithm function?

  • The data is loaded
  • Start K to your preferred number of neighbors
  • For every example in your data:
    • Establish the distance between the query example and the available example from the data
    • Provide the distance and the index of example to a collection
    • Sort the data in ascending order according to the distances
    • Select the primary K entry from the arranged collection
    • If regression occurs, return the mean of the K labels
    • If classification, return the mode of K labels

One major disadvantage of this algorithm is the fact that if the volume of data increases, the algorithm becomes significantly slower, which makes it not suitable for situations were predictions must be done at a higher rate. However, KNN is useful in solving problems that have solutions that rely on discovering similar objects.

Download examples