Speech recognition is the process of converting spoken words to text. Python can work with many speech recognition engines and APIs, including Google Speech Recognition, Google Cloud Speech API, Microsoft Bing Voice Recognition and IBM Speech to Text.

In this tutorial we will use the Google Speech Recognition engine with Python.

The library we will use is called “SpeechRecognition”. Install it in a virtual environment (for example with virtualenv or pipenv, optionally on top of a pyenv-managed Python), or system-wide:

pip install SpeechRecognition

To record from a microphone, SpeechRecognition needs PyAudio. You can install it with `pip install PyAudio` or from your package manager.
On Manjaro Linux the packages are called “python-pyaudio” and “python2-pyaudio”; they may have a different name on your system.
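Before running the microphone examples, it can help to confirm that both modules are importable. The sketch below uses only the standard library (`importlib.util.find_spec`); the helper name `has_module` is hypothetical. Note that the import names differ from the pip package names: SpeechRecognition is imported as `speech_recognition`, PyAudio as `pyaudio`.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Import names, not pip package names: SpeechRecognition -> speech_recognition
    for module in ("speech_recognition", "pyaudio"):
        print(f"{module}: {'found' if has_module(module) else 'missing'}")
```

If `pyaudio` is reported as missing, the file-free examples will still import, but `sr.Microphone()` will fail.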

Speech Recognition demo
You can test the speech recognition module with the command:

python -m speech_recognition

The demo listens to your microphone and prints what it recognized in the terminal.

Speech Recognition with Google
The example below uses the Google Speech Recognition engine, which I’ve tested with English.

For testing purposes, it uses the default API key that ships with the library.
To use your own API key, pass it as the `key` argument:

`r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
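Rather than hard-coding the key in the call above, you might read it from an environment variable. The sketch below is one way to do that, under assumptions: the helper `google_options` and the variable name `GOOGLE_SPEECH_RECOGNITION_API_KEY` are hypothetical conventions, not something SpeechRecognition reads by itself. It relies on `recognize_google` accepting `key=None` (use the default key) and a `language` tag such as `"en-US"`.

```python
import os


def google_options(language="en-US", env_var="GOOGLE_SPEECH_RECOGNITION_API_KEY"):
    """Build keyword arguments for Recognizer.recognize_google().

    The key comes from an environment variable (a convention assumed here).
    If the variable is unset, key is None and the library uses its default key.
    """
    return {"key": os.environ.get(env_var), "language": language}
```

Usage, with `r` and `audio` as in the example below: `r.recognize_google(audio, **google_options("en-GB"))`.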

Copy the code below and save the file as speechtest.py.
Run it with Python 3.

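```python
#!/usr/bin/env python3

import speech_recognition as sr

# get audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# send the audio to Google Speech Recognition and print the text
try:
    print("You said " + r.recognize_google(audio))
except sr.UnknownValueError:
    # the engine could not make sense of the audio
    print("Could not understand audio")
except sr.RequestError as e:
    # network problem or the API refused the request
    print("Could not request results; {0}".format(e))
```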
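The example records from the default input device (`sr.Microphone()`). If that is the wrong device, you can list the devices PyAudio reports and pick one by index. This is a sketch: `list_microphones` is a hypothetical helper, while `sr.Microphone.list_microphone_names()` and the `device_index` parameter are part of SpeechRecognition. The helper returns an empty list when the library or PyAudio is missing (SpeechRecognition raises `AttributeError` when it cannot import PyAudio).

```python
def list_microphones():
    """Return (device_index, name) pairs for the audio devices PyAudio reports.

    Returns an empty list if SpeechRecognition or PyAudio is unavailable.
    """
    try:
        import speech_recognition as sr
        names = sr.Microphone.list_microphone_names()
    except (ImportError, AttributeError, OSError):
        return []
    # a name's position in the list is its device index
    return list(enumerate(names))


if __name__ == "__main__":
    for index, name in list_microphones():
        print(index, name)
```

Once you have found the right index, open that device with `sr.Microphone(device_index=index)` instead of `sr.Microphone()`.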
