bag of words Euclidean distance
If we represent text documents as feature vectors using the bag of words method, we can calculate the Euclidean distance between them.
Any two vectors have a distance between them. Consider the vectors (2,2) and (4,2). The Euclidean distance, the length of the straight line connecting them, lets us calculate that distance automatically; for these two vectors it is 2.
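As a quick check of the two example vectors, here is a minimal sketch that computes the Euclidean distance by hand using only the Python standard library. The variable names `a`, `b`, and `distance` are chosen just for this illustration.

```python
import math

# The two example vectors from the text.
a = (2, 2)
b = (4, 2)

# Euclidean distance: square root of the sum of squared coordinate differences.
distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(distance)  # 2.0
```

The vectors differ only in the first coordinate (4 - 2 = 2), so the distance is simply 2.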
Because we represent the text documents as vectors, the distance between two vectors tells us how similar the documents are: the smaller the distance, the more similar the documents.
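To make this concrete, the following is a rough sketch in plain Python, not the approach used in the rest of this article. The three toy documents and the helper names `to_vector` and `euclidean` are made up for illustration, and splitting on whitespace is a simplifying assumption about tokenization.

```python
import math
from collections import Counter

# Hypothetical toy documents, chosen only for illustration.
docs = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock prices fell sharply today",
]

# Vocabulary: every distinct word across all documents, in a fixed order.
vocabulary = sorted({word for doc in docs for word in doc.split()})

def to_vector(doc):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

def euclidean(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

vectors = [to_vector(doc) for doc in docs]

# Distance from the first document to each of the other two.
d_similar = euclidean(vectors[0], vectors[1])
d_different = euclidean(vectors[0], vectors[2])
print(d_similar, d_different)
```

The first two documents differ in a single word, so their distance is smaller than the distance between the first document and the third, which shares no words with it.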
We start with the corpus, then calculate the feature vectors from the corpus, and finally calculate the Euclidean distance between them. In this example we compare every document to the first document.
# Feature extraction from text