Software Architect / Microsoft MVP (AI) and Technical Author

AI, ASP.NET Core, Audio Notes, Azure AI Services, C#, Speech, Text Analytics

Audio Notes: Creating an Interface to Record Content

This is part 6 of a “build in public” mini-series that shows how I’ve created a new SaaS MVP called Audio Notes.

This is an experiment that uses artificial intelligence, speech to text, and text analytics to automatically summarise audio and create concise notes from your recordings.

You can find more information about the project in these earlier blog posts.

  1. Introduction
  2. Using Azure AI Speech to Perform Continual Speech to Text
  3. Transcription Using Azure AI Language to Perform Document Summarization
  4. Blending Azure AI Speech and Azure Language to Create a Micro SaaS
  5. Creating an Interface to Browse Content

 

In this blog post (Part 6), a new UI is created that lets you record a new audio note.

Read on to see how all this hangs together.

~

The Original Screen

The original screen was basic. You can see it here:

The server-side functionality worked though, letting you start and stop speech transcription, and summarise the transcript.

It looks horrible.

New ‘Record a Note’ Screen

The look and feel has been modified and you can now save a summarised audio note:

You can read some of the earlier blog posts to learn how recognition is started, stopped, and how transcribed text is summarised.

Under the hood, Save Note simply calls a controller action which relays the request to a service class, AudioNoteService:

[HttpPost]
public async Task<ActionResult> SaveNote(SpeechNoteViewModel model)
{
    AudioNoteService audioNoteService = new AudioNoteService(_dbContext);
    System.Security.Claims.ClaimsPrincipal currentUser = this.User;

    var id = _userManager.GetUserId(currentUser);

    //convert SpeechNoteViewModel to AudioNoteModel
    var audioNote = new AudioNoteModel
    {
        AspNetUser_id = id,
        NoteUid = model.Id,
        Title = model.Title,
        RecognisedText = model.RecognizedText,
        SummarisedText = model.SummarizedText,
        DateCreated = DateTime.Now
    };

    await audioNoteService.SaveAudioNoteAsync(audioNote);

    return RedirectToAction("YourNotes");
}

~

Audio Note Service

I hardly wrote any of the code for this class. After typing a few characters, GitHub Copilot completed the rest for me.

A good time saver.

The data is then saved to the database like you’d expect.
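For completeness, here is a minimal sketch of what a service class like this might look like. It assumes an EF Core DbContext with an AudioNotes DbSet; the context and property names are my assumptions based on the controller code above, not the project's actual code:

```csharp
// Hypothetical sketch of AudioNoteService; the DbContext and DbSet
// names are assumptions inferred from the controller code above.
public class AudioNoteService
{
    private readonly ApplicationDbContext _dbContext;

    public AudioNoteService(ApplicationDbContext dbContext)
    {
        _dbContext = dbContext;
    }

    // Persists the audio note using Entity Framework Core.
    public async Task SaveAudioNoteAsync(AudioNoteModel audioNote)
    {
        _dbContext.AudioNotes.Add(audioNote);
        await _dbContext.SaveChangesAsync();
    }
}
```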

~

Securing the Azure AI Speech API Key

I don’t want to expose the Azure Speech services API key in the client-side HTML or JavaScript.

Fortunately, you can make a call to the following endpoint in Azure:

https://{speechRegion}.api.cognitive.microsoft.com/sts/v1.0/issueToken

 

Sending a POST request to this endpoint returns a short-lived access token that the client-side JavaScript can use when performing real-time speech-to-text transcription.

You can see this in action here:

public async Task<IActionResult> GenerateToken()
{
    var settings = _configuration.GetSection("ConfigSettings").Get<ConfigSettings>();

    var speechKey = settings.SpeechApiKey;
    var speechRegion = settings.SpeechRegion;

    try
    {
        Response.Headers.Add("Access-Control-Allow-Origin", Request.Host.Value);

        using (var httpClient = new HttpClient())
        {
            var url = $"https://{speechRegion}.api.cognitive.microsoft.com/sts/v1.0/issueToken";

            httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", speechKey);

            // Send POST request with empty JSON body
            var response = await httpClient.PostAsync(url, new StringContent("{}", System.Text.Encoding.UTF8, "application/json"));

            var result = await response.Content.ReadAsStringAsync();
            return Content(result);
        }
    }
    catch (Exception ex)
    {
        return BadRequest("An error occurred: " + ex.Message);
    }
}

 

JavaScript runs periodically to invoke GenerateToken and create a token:

<script>
     var authorizationEndpoint = "/SpeechToText/GenerateToken";
     var authorizationToken;

     // Returns a promise so that callers can await the refreshed token.
     function RequestAuthorizationToken() {
         return new Promise(function (resolve, reject) {
             if (!authorizationEndpoint) {
                 reject(new Error("No authorization endpoint is configured."));
                 return;
             }

             var a = new XMLHttpRequest();
             a.open("GET", authorizationEndpoint);
             a.onload = function () {
                 authorizationToken = this.responseText;
                 resolve(authorizationToken);
             };
             a.onerror = reject;
             a.send();
         });
     }
</script>

<script>
     (async () => {
         const renewToken = async () => {
             await RequestAuthorizationToken();
             if (!!reco) {
                 reco.authorizationToken = authorizationToken;
             }
             setTimeout(renewToken, 540000); // refresh token every 9 minutes.
         };
         await renewToken();
     })();
 </script>
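The renewal pattern above can be factored into a small standalone helper. This is an illustrative sketch rather than the project's actual code: the function name and parameters are my own, and the fetcher and interval are injected so the timing logic can be exercised in isolation. The nine-minute interval leaves a safety margin, since Azure Speech access tokens expire after roughly ten minutes:

```javascript
// Illustrative sketch: a token renewal loop with injectable dependencies.
// fetchToken: async function returning a fresh token string.
// applyToken: callback receiving each new token (e.g. sets reco.authorizationToken).
// intervalMs: renewal period, e.g. 540000 (9 minutes) in the code above.
function startTokenRenewal(fetchToken, applyToken, intervalMs) {
    let timer = null;
    let stopped = false;

    async function renew() {
        const token = await fetchToken();
        if (!stopped) {
            applyToken(token);
            timer = setTimeout(renew, intervalMs);
        }
    }

    renew();

    // Returns a function that cancels any further renewals.
    return function stop() {
        stopped = true;
        clearTimeout(timer);
    };
}
```

Returning a stop function makes it easy to cancel renewal when the user navigates away from the recording screen.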

 

The token is then bound to the speech configuration:

<script>
    function getSpeechConfig(sdkConfigType) {
        if (!authorizationToken) {
            alert("Unable to connect to AI cloud.");
            return undefined;
        }

        var speechConfig = sdkConfigType.fromAuthorizationToken(authorizationToken, 'uksouth');

        speechConfig.speechRecognitionLanguage = 'en-US';

        return speechConfig;
    }
</script>

 

Note: this MVP will only support speech-to-text transcription for English. If it takes off, support for multiple languages can be easily added.

~

Next Steps

At this point, everything is in place to log in and log out, with SendGrid integrated for membership and email capabilities.

You can also view all existing notes, record a new note, and edit an existing audio note.

Next steps will involve securing API keys, publishing to Azure App Service, and migrating the database to Azure SQL.

Stay tuned.

~

Further Reading

Previous blogs for this series can be found here:

  1. Introduction
  2. Using Azure AI Speech to Perform Continual Speech to Text
  3. Transcription Using Azure AI Language to Perform Document Summarization
  4. Blending Azure AI Speech and Azure Language to Create a Micro SaaS
  5. Creating an Interface to Browse Content

~
