Software Architect / Microsoft MVP (AI) and Technical Author

AI, Azure, Chatbots, Cognitive Services, Speech

Using Azure Text to Speech

Chatbots let you perform tasks such as interacting with business processes, accessing your data, or searching for information. Most of the time this is done with you sitting at the keyboard. With newer voice technologies and SDKs, it’s becoming easier to augment your chatbot’s existing capabilities with speech services. Adding speech support to your chatbot opens up new opportunities and use cases.

I’ve been working with speech services for a few weeks now, and in this blog post we look at Azure Speech services. Specifically, we’ll look at the Text to Speech (TTS) offering.

Standard, Neural and Custom Voices

You have three options when selecting the voice that the machine speaks with.

  • Standard
  • Neural
  • Custom

Standard voices are synthetic voices. At the time of writing, there is support for 75+ of these in over 45 languages. If you want something that sounds more natural than the standard voices, you also have the option of using a Neural voice.

Neural voices are powered by neural networks and are nearly indistinguishable from human recordings. You can use neural voices to make interactions with chatbots more natural and engaging. One example might be reading an audio book. As with any AI of this kind, you must consider responsible and ethical use.

Finally, Custom voice lets you supply your own audio recordings, which are used to train a speech model with machine learning.
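To make the voice choice concrete, here is a minimal sketch of how a voice can be requested by name using SSML, the markup the Speech service accepts for synthesis requests. The helper name `build_ssml` is my own, and the default voice `en-US-JennyNeural` is only an example, so check the service's voice list for the names available to you.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Wrap plain text in a minimal SSML document that requests one voice.

    build_ssml is a hypothetical helper for illustration, not part of any SDK.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects characters such as & and < inside the spoken text
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
```

With the Python Speech SDK, a string like this can be passed to the synthesizer's `speak_ssml_async` method. Swapping the voice name is all it takes to move between the voices you have access to.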

Audio Output

When synthesizing text to speech there are a few different destinations for the audio. You can synthesize the speech to a file, the speaker, or an in-memory stream.

File output can be useful if you need to create audio files for replaying later. Speaker output gives you instant playback, whereas the in-memory stream can be useful if you need to pass the raw audio bytes to another system.
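The three output options can be sketched with the Python Speech SDK (`azure-cognitiveservices-speech`). This is a rough sketch under a few assumptions of mine: the function name `synthesize`, the default file name `speech.wav`, and the `SPEECH_KEY` and `SPEECH_REGION` environment variables are all illustrative choices, not something the service requires.

```python
import os

VALID_OUTPUTS = ("file", "speaker", "memory")


def synthesize(text, output="memory", filename="speech.wav"):
    """Synthesize text and send the audio to a file, the speaker, or memory.

    Returns result.audio_data, the audio bytes reported by the SDK.
    """
    if output not in VALID_OUTPUTS:
        raise ValueError(f"output must be one of {VALID_OUTPUTS}, got {output!r}")

    # Imported here so the argument check above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    # Assumption: credentials are supplied through these environment variables.
    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"],
        region=os.environ["SPEECH_REGION"],
    )

    if output == "file":
        audio_config = speechsdk.audio.AudioOutputConfig(filename=filename)
    elif output == "speaker":
        audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    else:
        # No audio config: the audio is only returned in the result object.
        audio_config = None

    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )
    result = synthesizer.speak_text_async(text).get()

    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"synthesis did not complete: {result.reason}")
    return result.audio_data
```

For example, `synthesize("Hello from the bot", output="file")` would write the audio to `speech.wav`, while the default `output="memory"` hands back the bytes so they can be forwarded to another system.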

Demo

I’ve created a short demo on YouTube that shows Text to Speech in action. In the demo you’ll see me specify the gender and accent.

I then supply some text and the machine speaks using the selected gender and accent. You can view the demo here.
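One way a gender and accent selection could map onto a voice is a simple lookup from the pair to a voice name. This is only a guess at the idea, not the code behind the demo: the `VOICES` table and `pick_voice` helper are hypothetical, and the table lists a small illustrative subset of the neural voices the service offers.

```python
# Illustrative subset only; the service exposes a much larger catalogue.
VOICES = {
    ("female", "en-US"): "en-US-JennyNeural",
    ("male", "en-US"): "en-US-GuyNeural",
    ("female", "en-GB"): "en-GB-SoniaNeural",
    ("male", "en-GB"): "en-GB-RyanNeural",
}


def pick_voice(gender, locale):
    """Return the voice name configured for a gender and accent (locale)."""
    key = (gender.lower(), locale)
    if key not in VOICES:
        raise KeyError(f"no voice configured for {gender!r} / {locale!r}")
    return VOICES[key]
```

The returned name can then be assigned to the SDK's `speech_config.speech_synthesis_voice_name` property before the text is spoken.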

Summary

In this post we’ve introduced Azure Text to Speech. We’ve seen the voice options that are available when generating synthetic speech, and we’ve touched on the output options for the speech that is generated.

