Easy Speech-to-Text with Python

Speech is the most common means of communication, and most of the world's population relies on it to communicate with one another. A speech recognition system translates spoken language into text. There are many real-life examples of such systems. Apple's Siri, for instance, recognizes speech and transcribes it into text.

How does speech recognition work?

Speech Recognition process

Hidden Markov models (HMMs) and deep neural network models are used to convert audio into text. A detailed account of the process is beyond the scope of this blog. Here, I demonstrate how to convert speech to text using Python. This can be done with the help of the “SpeechRecognition” library and the “PyAudio” library.

The SpeechRecognition library supports several speech recognition APIs. In this blog I use the Google speech recognition API, which converts speech into text. For more details, please check the library's documentation.


Python Libraries

Convert an audio file into text

Steps:

  1. Import the speech recognition library.
  2. Initialize the recognizer class in order to recognize the speech. We are using Google speech recognition.
  3. Audio file formats supported by the library: WAV, AIFF, AIFF-C, FLAC. I used a WAV file in this example.
  4. I used an audio clip from the movie ‘Taken’, which says “I don’t know who you are. I don’t know what you want. If you’re looking for ransom, I can tell you I don’t have money.”
  5. By default, the Google recognizer reads English. It supports different languages; for more details, please check the documentation.

Code
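A minimal sketch of the steps above, assuming the SpeechRecognition package is installed (for example with `pip install SpeechRecognition`). The function name `transcribe_file` and the file name `taken_clip.wav` are placeholders chosen for illustration; `Recognizer`, `AudioFile`, `record` and `recognize_google` are the library calls.

```python
def transcribe_file(path, language="en-US"):
    """Transcribe a whole audio file with the Google speech recognizer."""
    # Step 1: import the library. It is imported inside the function so
    # that merely defining this helper does not require the package.
    import speech_recognition as sr

    # Step 2: create the recognizer object.
    recognizer = sr.Recognizer()

    # Step 3: open the audio file (WAV, AIFF, AIFF-C or FLAC) and read it fully.
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)

    # Send the audio to the Google recognizer; English is the default language.
    return recognizer.recognize_google(audio, language=language)
```

Calling `print(transcribe_file("taken_clip.wav"))` should print the recognized text, provided the file exists and the machine is online.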

Output


How about converting audio in a different language?

For example, if we want to read a French-language audio file, we need to add the language option in recognize_google(). The rest of the code remains the same. Please refer to the documentation for more.
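As a rough sketch, the French case could look like the following. The function name `transcribe_french` is hypothetical, and “fr-FR” is assumed here as the language tag for French; the only library-specific change is the `language` argument passed to `recognize_google`.

```python
def transcribe_french(path):
    """Transcribe a French-language audio file with the Google recognizer."""
    # Imported inside the function so defining it does not need the package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file

    # The language option tells the recognizer to expect French, not English.
    return recognizer.recognize_google(audio, language="fr-FR")
```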

Output


Microphone speech into text

Steps:

  1. We need to install the PyAudio library, which is used to receive audio input and play audio output through the microphone and speaker. Basically, it helps to capture our voice through the microphone.
  2. Instead of an audio file source, we have to use the Microphone class. The remaining steps are the same.

Code
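A minimal sketch of the microphone version, assuming both SpeechRecognition and PyAudio are installed. The function name `transcribe_microphone` is a placeholder, and the `adjust_for_ambient_noise` call is an optional extra that the steps above do not mention; it is included only because it often helps in noisy rooms.

```python
def transcribe_microphone(language="en-US"):
    """Record one phrase from the default microphone and transcribe it."""
    # Imported inside the function; sr.Microphone also needs PyAudio.
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    # Use the microphone as the audio source instead of a file.
    with sr.Microphone() as source:
        # Optional: sample background noise for one second to calibrate.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("Say something...")
        # Listen until the speaker pauses.
        audio = recognizer.listen(source)

    # The recognition step is the same as for an audio file.
    return recognizer.recognize_google(audio, language=language)
```

Passing a different `language` value to the function changes the language the recognizer expects.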

I just said “How are you?”

Output


How about talking in a different language?

Again, we need to add the required language option in recognize_google(). I am speaking in Tamil, an Indian language, and adding “ta-IN” as the language option.

I just said “how are you” in Tamil and it prints the text in Tamil accurately.

Output


Note:

The Google speech recognition API is an easy way to convert speech into text, but it requires an internet connection to operate.
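Because the recognizer depends on a network call, it can be worth catching the failure cases. The sketch below is one possible way to do that, not part of the original examples: `recognize_or_none` is a hypothetical helper name, while `UnknownValueError` and `RequestError` are the exceptions the SpeechRecognition library raises when speech is unintelligible or the service cannot be reached.

```python
def recognize_or_none(recognizer, audio, language="en-US"):
    """Return the transcript, or None if recognition fails."""
    # Imported inside the function so defining it does not need the package.
    import speech_recognition as sr

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service answered but could not make out any words.
        print("Could not understand the audio.")
        return None
    except sr.RequestError as error:
        # The service could not be reached, e.g. no internet connection.
        print(f"Could not reach the recognition service: {error}")
        return None
```

Here `recognizer` is an `sr.Recognizer()` instance and `audio` is whatever `record()` or `listen()` returned.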

In this blog, we have seen how to convert speech into text using the Google speech recognition API. This would be very helpful for NLP projects, especially those handling audio transcript data. If you have anything to add, please feel free to leave a comment!

Thanks for reading. Keep learning and stay tuned for more!