Software Architect / Microsoft MVP (AI) and Technical Author

AI, Azure, Azure AI Services, C#, Document Summarisation, Speech, Speech to Text, Text Analytics

How I Used AI To Consume a 2 Hour Lex Fridman Podcast in 10 Minutes

If you are reading this blog, you probably listen to one or more podcasts.

Some of my personal favourites include Seth Godin’s Akimbo, Lex Fridman’s, Mike Tyson’s, and Andrew Huberman’s.

Most of these podcasts span multiple hours, and despite wanting to listen to or watch them, I just don’t have that amount of free time.

~

The Idea

This got me thinking.  I could create a tool that uses AI speech-to-text to transcribe the audio.  I could then use AI to perform document summarisation on the transcription.

 

This would let me skim-read the summarised points to get a feel for the main themes and the outcome of the discussion in the podcast.

 

I could use the summarisations to help me decide if the podcast was worth listening to.

~

CLI Tool

In this blog post, you will see how I built a CLI tool using a combination of Azure AI services to save me hours each week.

 

Specifically, you will learn about:

  • Creating a CLI tool that accepts a podcast URL
  • Batch transcribing an entire podcast using Azure AI Speech Services speech to text
  • Generating a JSON file with the transcribed audio content
  • Generating summarised points from the transcript using Azure AI Language Services

 

 

The CLI tool was used to process a 2-hour Lex Fridman and Sam Altman podcast in 20 minutes.

I read the summary in 10 minutes, saving almost 2 hours.

~

Overview of The Process

The high-level end-to-end process consists of the following steps, with a short sketch of how they fit together after the list:

  1. Create the batch transcription request.
  2. Monitor the status of the batch transcription.
  3. When complete, fetch the URL of the transcript file.
  4. Download the transcript.
  5. Pass the content for summarisation.
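
To make the flow concrete, here is a minimal sketch of how the five steps might be chained together.  CheckTranscriptionJobStatus and DownloadTranscriptionJobResults are covered later in this post; CreateBatchTranscriptionAsync, GetTranscriptFileUrlAsync and SummariseTranscriptAsync are hypothetical placeholders for the remaining steps, and everything is treated as a static member of the same class for brevity.

// A hypothetical end-to-end flow tying the five steps together.
// Helper names other than CheckTranscriptionJobStatus and DownloadTranscriptionJobResults
// are placeholders, not the exact methods used by the tool.
static async Task RunPodcastSummariserAsync(string podcastUrl, string subscriptionKey, string serviceRegion)
{
    // 1. Create the batch transcription request and capture the operation location
    string operationLocation = await CreateBatchTranscriptionAsync(podcastUrl, subscriptionKey, serviceRegion);

    // 2. Monitor the batch transcription until it completes
    await CheckTranscriptionJobStatus(operationLocation, subscriptionKey);

    // 3. When complete, fetch the URL of the transcript file
    string transcriptUrl = await GetTranscriptFileUrlAsync(operationLocation, subscriptionKey);

    // 4. Download the transcript to the local file system ("transcript.json" is illustrative)
    await DownloadTranscriptionJobResults(transcriptUrl, "transcript.json");

    // 5. Pass the transcript content for summarisation
    string summary = await SummariseTranscriptAsync(File.ReadAllText("transcript.json"));
    Console.WriteLine(summary);
}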

~

Creating the Batch Transcription Request

The first step involves creating the batch transcription request.  You can see this in the code below.

 

In this code you can see the audio file URL is supplied.

 

You can also see the request payload being created.  The timeToLive property instructs Azure AI Speech Services to persist the batch transcription in Microsoft Azure for 12 hours.

 

A default container is used to persist the transcription file.  You can specify your own if you prefer, but this involves additional configuration.

 

When a batch transcription request is created, the location of the operation is returned.  You can use this to perform periodic checks on the batch transcription operation.

static async Task Main(string[] args)
 {
     string subscriptionKey = "";
     string serviceRegion = ""; 

     // Construct the batch transcription request URL
     string requestUrl = $"https://{serviceRegion}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions";


     // Audio files URLs to transcribe
     var audioFileUrls = new List<string>
     {
         "https://content.blubrry.com/takeituneasy/lex_ai_sam_altman_2.mp3"
     };

     // Construct the batch transcription request JSON payload
     var requestPayload = new
     {
         contentUrls = audioFileUrls,
         locale = "en-US",
         displayName = "Lex Friedman and Sam Altman",
         timeToLive = "PT12H",
         properties = new
         {
             wordLevelTimestampsEnabled = true,

         }
     };

     // Convert the payload to JSON string
     string requestBody = Newtonsoft.Json.JsonConvert.SerializeObject(requestPayload);

     // Prepare the HTTP request
     using var httpClient = new HttpClient();
     using var request = new HttpRequestMessage(HttpMethod.Post, requestUrl);

     // Set the subscription key header (the Content-Type header is set by StringContent below)
     request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

     // Set request content
     request.Content = new StringContent(requestBody, Encoding.UTF8, "application/json");

     // Send the request
     var response = await httpClient.SendAsync(request);

     // Check if the request was successful
     if (response.IsSuccessStatusCode)
     {
         Console.WriteLine("Batch transcription request sent successfully.");

         // Get the URL for the batch transcription operation from the Location header
         string operationLocation = response.Headers.GetValues("Location").FirstOrDefault();

         // Check the status of the batch transcription operation
         await CheckTranscriptionJobStatus(operationLocation, subscriptionKey);
     }
     else
     {
         // Request failed
         string errorMessage = await response.Content.ReadAsStringAsync();
         Console.WriteLine($"Failed to send batch transcription request: {errorMessage}");
     }
 }

~

Running the Batch Transcription

Sending the request returns JSON in the following format.  The JSON contains the location of the process metadata, along with the status of the batch transcription process.  You can see this in the console application:
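
If you would rather read those values from the response body than from the Location header, a minimal sketch might look like this.  It continues from the response object in the Main method above and assumes the creation response includes self and status properties.

// Sketch: read the operation URL and current status from the creation response body.
// Assumes the returned JSON exposes "self" and "status" nodes.
string responseBody = await response.Content.ReadAsStringAsync();
var transcription = Newtonsoft.Json.Linq.JObject.Parse(responseBody);

string selfUrl = transcription["self"]?.ToString();   // same URL as the Location header
string status = transcription["status"]?.ToString();  // e.g. "NotStarted" or "Running"

Console.WriteLine($"Transcription created at {selfUrl}, current status: {status}");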

~

Monitoring the Batch Transcription

A method is implemented to periodically check the transcription job status for a particular batch transcription ID.  The ID is known as the operation location.

You can see this method here:

private static async Task CheckTranscriptionJobStatus(string operationLocation, string subscriptionKey)
{
    // Prepare the HTTP client with authentication
    using var httpClient = new HttpClient();
    httpClient.BaseAddress = new Uri(operationLocation);
    httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

    // Loop until the operation is completed
    while (true)
    {
        // Send a GET request to the operation location URL
        var response = await httpClient.GetAsync(operationLocation);

        // Check if the request was successful
        if (response.IsSuccessStatusCode)
        {
            // Get the operation status
            var statusResponse = await response.Content.ReadAsStringAsync();
            Console.WriteLine($"Batch transcription operation status: {statusResponse}");

            // Parse the status field rather than string-matching the raw JSON
            string status = Newtonsoft.Json.Linq.JObject.Parse(statusResponse)["status"]?.ToString();

            // Stop polling once the operation has finished (successfully or not)
            if (status == "Succeeded" || status == "Failed")
            {
                break;
            }
        }
        else
        {
            // Request failed
            string errorMessage = await response.Content.ReadAsStringAsync();
            Console.WriteLine($"Failed to retrieve batch transcription status: {errorMessage}");
        }

        // Wait for a while before checking again (e.g., every 10 seconds)
        await Task.Delay(TimeSpan.FromSeconds(10));
    }

    Console.WriteLine("Batch transcription operation completed.");
}

 

Inspecting this method in the Visual Studio debugger shows that the status of this batch transcription is still Running:

~

Fetching the Batch Transcription Results and Report

After the batch transcription operation has completed successfully, the results and an accompanying report are made available.

The location of each of these is available in the following JSON nodes:
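
As a sketch of how those nodes could be read in code: in v3.1 the completed operation exposes a files endpoint (the operation location with /files appended), and each entry carries a kind of Transcription or TranscriptionReport plus a contentUrl under links.  The method below is an illustration built on those assumptions rather than the exact code from the tool.

// Sketch: list the result files for a completed transcription and pull out
// the content URLs. Assumes the v3.1 files endpoint shape
// (values[].kind and values[].links.contentUrl).
private static async Task<(string transcriptUrl, string reportUrl)> GetResultFileUrlsAsync(string operationLocation, string subscriptionKey)
{
    using var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

    // The files endpoint sits underneath the transcription's operation location
    var response = await httpClient.GetAsync($"{operationLocation}/files");
    response.EnsureSuccessStatusCode();

    var files = Newtonsoft.Json.Linq.JObject.Parse(await response.Content.ReadAsStringAsync());

    string transcriptUrl = null, reportUrl = null;
    foreach (var file in files["values"])
    {
        string kind = file["kind"]?.ToString();
        string contentUrl = file["links"]?["contentUrl"]?.ToString();

        if (kind == "Transcription") transcriptUrl = contentUrl;
        else if (kind == "TranscriptionReport") reportUrl = contentUrl;
    }

    return (transcriptUrl, reportUrl);
}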

 

The following method is used to download these files and write them to the file system:

public async Task DownloadTranscriptionJobResults(string fileUrl, string filePath)
{
    // Prepare the HTTP client
    using var httpClient = new HttpClient();

    // Send a GET request to the file URL
    var response = await httpClient.GetAsync(fileUrl);

    // Check if the request was successful
    if (response.IsSuccessStatusCode)
    {
        // Read the response content as a stream
        using var stream = await response.Content.ReadAsStreamAsync();

        // Create a FileStream to write the downloaded file
        using var fileStream = File.Create(filePath);

        // Copy the response stream to the FileStream
        await stream.CopyToAsync(fileStream);
    }
    else
    {
        // Request failed
        string errorMessage = await response.Content.ReadAsStringAsync();
        Console.WriteLine($"Failed to download file: {errorMessage}");
    }
}
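
Putting those two pieces together, usage could look something like this, with file names chosen purely for illustration and GetResultFileUrlsAsync being the hypothetical helper sketched above:

// Illustrative usage: resolve the content URLs, then download both files locally
var (transcriptUrl, reportUrl) = await GetResultFileUrlsAsync(operationLocation, subscriptionKey);

await DownloadTranscriptionJobResults(transcriptUrl, "transcript.json");
await DownloadTranscriptionJobResults(reportUrl, "transcription_report.json");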

Results File

You can see some of the contents you can expect to find in the results file in the screenshots below:
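
Once the results file is on disk, the readable text needs to be pulled out of it before it can be summarised.  The batch transcription output includes a combinedRecognizedPhrases array whose display field holds the formatted transcript; treating that as an assumption to verify against your own results file, a minimal extraction sketch might look like this:

// Sketch: read the downloaded results file and concatenate the display text,
// ready to pass to Azure AI Language for summarisation.
// Assumes the results JSON contains combinedRecognizedPhrases[].display.
private static string ExtractTranscriptText(string resultsFilePath)
{
    var results = Newtonsoft.Json.Linq.JObject.Parse(File.ReadAllText(resultsFilePath));

    var phrases = results["combinedRecognizedPhrases"]?
        .Select(p => p["display"]?.ToString())
        .Where(text => !string.IsNullOrWhiteSpace(text))
        ?? Enumerable.Empty<string>();

    return string.Join(Environment.NewLine, phrases);
}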