Software Architect / Microsoft MVP (AI) and Pluralsight Author

Azure Communication Services, Bot Framework, Chatbots, Cognitive Services, Machine Learning

What caught my eye at Microsoft Ignite? – 3 Letters. ACS.

What Are Azure Communication Services (ACS)?

ACS lets you add telephony features to your applications. This can include voice, chat, SMS and more.

Why? – My Current Pain Point: Chatbots and Telephony

I do a lot of chatbot development using the Microsoft Bot Framework. Adding speech capabilities to chatbots isn’t always straightforward.

Provisioning the services themselves in Azure is straightforward – but the integration of each component within an end-to-end chatbot solution involves stitching together various services and writing a lot of code to orchestrate a smooth conversational experience.

A typical chatbot solution might involve the following:

  • Azure Speech to Text
  • Azure Text to Speech
  • Azure Bot Service
  • Bot Framework SDK
  • Bot Framework Composer
  • Natural Language Understanding (LUIS)

 

You might also perform real-time sentiment analysis of the human’s voice or tonality and adjust your bot responses in real-time – again adding more complexity.

Surfacing a chatbot over a telephony channel involves further integration and presents its own set of challenges, for example integration with telephony providers such as Twilio. The following shows a typical flow and the main components:

If you’re comfortable developing bots, the column with the Azure components is straightforward enough. It’s the integration of the telephony aspects that is a real pain.

I know because I’ve been working through that for the last few weeks. Webhooks, parsing audio, transcribing raw socket data, and rehydrating audio bytes are just some of the things you need to do.
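To give a flavour of the "rehydrating audio bytes" step, here is a minimal sketch. It assumes the telephony provider pushes JSON messages over a WebSocket, where each audio message has an "event" of "media" and a base64 string under "media.payload". That shape is an assumption for illustration (Twilio Media Streams uses a similar layout), and ExtractAudioBytes is a hypothetical helper name, not part of any SDK.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: pulls the base64 audio payload out of one JSON
// WebSocket message and turns it back into raw bytes.
// Returns an empty array for anything that is not a well-formed media message.
byte[] ExtractAudioBytes(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    JsonElement root = doc.RootElement;

    // Assumed message shape: { "event": "media", "media": { "payload": "<base64>" } }
    if (!root.TryGetProperty("event", out JsonElement evt) ||
        evt.ValueKind != JsonValueKind.String ||
        evt.GetString() != "media")
    {
        return Array.Empty<byte>();
    }

    if (!root.TryGetProperty("media", out JsonElement media) ||
        media.ValueKind != JsonValueKind.Object ||
        !media.TryGetProperty("payload", out JsonElement payload) ||
        payload.ValueKind != JsonValueKind.String)
    {
        return Array.Empty<byte>();
    }

    var base64 = payload.GetString();
    return Convert.FromBase64String(base64 ?? string.Empty);
}

// Usage with a hand-made message; "AQID" is base64 for the bytes 1, 2, 3.
string sample = "{\"event\":\"media\",\"media\":{\"payload\":\"AQID\"}}";
byte[] audio = ExtractAudioBytes(sample);
Console.WriteLine(audio.Length); // prints 3
```

In a real integration this sits inside a WebSocket receive loop, and the bytes still need buffering and possibly format conversion before Speech to Text can use them.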

Enter ACS

The diagram above is massively simplified. A lot takes place in that middle column, and this is where ACS comes into play. You provision the ACS service, purchase a phone number, and ACS places that phone number in front of your chatbot. To do this, a new Telephony channel is available in the Azure Bot Service Channels screen:

You add the Telephony channel to your bot, set the relevant ACS keys, and your chatbot is now able to support telephony. You still need to modify your bot and dialogs to generate SSML in each of the activities.

For example, here we can see a simple method that does this:

protected override async Task OnMembersAddedAsync(
    IList<ChannelAccount> membersAdded,
    ITurnContext<IConversationUpdateActivity> turnContext,
    CancellationToken cancellationToken)
{
  var welcomeText = "Hello and welcome!";

  foreach (var member in membersAdded)
  {
    // Greet everyone who joins the conversation except the bot itself.
    if (member.Id != turnContext.Activity.Recipient.Id)
    {
      // The second argument to MessageFactory.Text is the SSML to be spoken.
      await turnContext.SendActivityAsync(
        MessageFactory.Text(
          welcomeText,
          SimpleConvertToSSML(
            welcomeText,
            "Microsoft Server Speech Text to Speech Voice",
            "en-US")),
        cancellationToken);
    }
  }
}

// Wraps plain text in a minimal SSML document for the given voice and locale.
// Note: the W3C SSML namespace uses http, not https.
private string SimpleConvertToSSML(string text, string voiceId, string locale)
{
  return $"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='{locale}'><voice name='{voiceId}'>{text}</voice></speak>";
}
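One thing to watch with string-built SSML is that any "&" or "<" in the bot's response produces invalid XML. A slightly more defensive variant, sketched below, builds the same markup with LINQ to XML so the text and attribute values are escaped automatically. BuildSsml is a hypothetical name for illustration, and the sketch uses the W3C SSML namespace http://www.w3.org/2001/10/synthesis.

```csharp
using System;
using System.Xml.Linq;

// Sketch: build the SSML with LINQ to XML instead of string interpolation,
// so special characters in the text are escaped for us.
string BuildSsml(string text, string voiceId, string locale)
{
    XNamespace ssml = "http://www.w3.org/2001/10/synthesis";

    var speak = new XElement(ssml + "speak",
        new XAttribute("version", "1.0"),
        new XAttribute(XNamespace.Xml + "lang", locale),
        new XElement(ssml + "voice",
            new XAttribute("name", voiceId),
            text));

    // DisableFormatting keeps the markup on one line with no added whitespace.
    return speak.ToString(SaveOptions.DisableFormatting);
}

// Usage: the ampersand and angle bracket come out as &amp; and &lt;.
Console.WriteLine(BuildSsml(
    "Fish & chips < 5 dollars",
    "Microsoft Server Speech Text to Speech Voice",
    "en-US"));
```

The result can be passed as the second argument to MessageFactory.Text in the same way as the string-based version.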

Note: At the time of writing only US regions are supported and you can find more code examples here.

What Does All This Mean?

I’ve gone through a bit of pain to place a phone number in front of a chatbot, mainly because many services must be integrated. Some telephony providers send audio in a different format than Speech to Text expects, so additional configuration is needed.
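As an example of that format mismatch, telephony audio often arrives as 8-bit G.711 mu-law, while a speech recognizer is typically fed 16-bit linear PCM. The sketch below shows the standard mu-law decode. It assumes the provider sends mu-law (check the provider's docs), the function names are hypothetical, and it does not cover resampling.

```csharp
using System;

// Decodes one G.711 mu-law byte into a 16-bit linear PCM sample.
short MuLawToPcm16(byte muLaw)
{
    int u = ~muLaw & 0xFF;              // mu-law bytes are stored bit-inverted
    int sign = u & 0x80;                // top bit: sign
    int exponent = (u >> 4) & 0x07;     // next three bits: segment
    int mantissa = u & 0x0F;            // low four bits: step within segment
    int magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (short)(sign != 0 ? -magnitude : magnitude);
}

// Converts a whole mu-law buffer into little-endian 16-bit PCM bytes.
byte[] MuLawBufferToPcm16(byte[] muLaw)
{
    var pcm = new byte[muLaw.Length * 2];
    for (int i = 0; i < muLaw.Length; i++)
    {
        short sample = MuLawToPcm16(muLaw[i]);
        pcm[2 * i] = (byte)(sample & 0xFF);            // low byte first
        pcm[2 * i + 1] = (byte)((sample >> 8) & 0xFF); // then high byte
    }
    return pcm;
}

// Usage: 0xFF is mu-law silence and decodes to a zero sample.
Console.WriteLine(MuLawToPcm16(0xFF)); // prints 0
```

This is exactly the kind of low-level plumbing that the telephony channel is meant to take off your hands.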

The recent introduction of the Bot Framework telephony channel + ACS means that I can remove the manual and low-level integration code I’ve had to build.

This is code that’s responsible for:

  • setting up and maintaining web sockets
  • processing audio IO
  • converting audio to byte arrays
  • maintaining a phone connection

 

Integrating ACS with the Bot Framework means that you can add telephony capabilities to your chatbot with a programming model you’re already familiar with. It also helps keep everything under one roof. This diagram shows how this can all hang together:

In the above you can see the telephony and webchat channels are exposed.  Other main components are:

  • Speech to Text (STT) is used to understand the human’s voice from the phone call
  • Dialogs used by the chatbot are built either code first (C#) or with Composer (Adaptive)
  • Text to Speech takes the bot’s response and sends it back over the phone to the human

At the time of writing ACS and Bot Framework integration is in preview mode and only has support for US numbers.

Summary

In this blog post we’ve introduced ACS and how it lets you place a phone number in front of your chatbot. We’ve also seen how this can hang together with other products and services such as Bot Framework Composer and the Azure Cognitive Services Speech to Text and Text to Speech services.

Further Resources

You can find out more information and additional code examples here:

https://news.microsoft.com/ignite-march-2021-book-of-news/

A full code example is here.
