In Part 3 of this mini-series, we create the Azure Function “TextAnalyticsFunction”.
This function is responsible for fetching tweet data from the raw-tweets container and using Azure Cognitive Services to surface insights in each tweet.
After each tweet has been processed, the function creates a new insights file.
The insights file is stored in the processed-tweets container, and the original raw tweet file is then archived in the archived-tweets container.
Prerequisites
To implement this, you will need:
- To have implemented Part 2 of this mini-series
- Visual Studio 2022
- Twitter API Developer Account
- Azure Subscription
Read Part 2 of this mini-series if you need more information about these.
Process Overview
The sequence diagram below shows how this Azure Function (TextAnalyticsFunction) operates within the overall pipeline:
A timer trigger (implemented in Part 2) sends a request to the Twitter Recent Search API endpoint via the Social Opinion API/SDK. A list of strongly typed tweet objects is returned; this data is serialized to JSON and pushed into the blob storage container “raw-tweets”, where the TextAnalyticsFunction picks it up for processing.
TextAnalytics Function Logic
The logic for the Azure Function is relatively simple.
- Fetch the raw tweet JSON document from the raw-tweets container
- Run text analytics on the tweet text via the TextAnalyticsManager class
- Generate insights JSON
- Upload the insights JSON file to the processed-tweets container
- Copy the raw tweet JSON file to the archived-tweets container
- Delete the original raw tweet JSON file from raw-tweets
You also need your Azure Cognitive Services for Language API secret/key.
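Rather than hard-coding the key, it is good practice to read it from application settings. A minimal sketch is below; the setting names are assumptions, and locally these values would live in local.settings.json, while in Azure they go in the Function App’s application settings:

// Hypothetical setting names for the Cognitive Services key and endpoint
private static string cognitiveServicesKey =
    Environment.GetEnvironmentVariable("CognitiveServicesKey");
private static string cognitiveServicesEndpoint =
    Environment.GetEnvironmentVariable("CognitiveServicesEndpoint");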
TextAnalytics Function Code
To implement this Azure Function, references to the following NuGet packages are needed:
- Azure.Storage.Blobs
- Microsoft.NET.Sdk.Functions
- SocialOpinionAPI
- Microsoft.Azure.CognitiveServices.Language.TextAnalytics
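If you prefer the dotnet CLI over the Visual Studio package manager UI, the references can be added like this:

dotnet add package Azure.Storage.Blobs
dotnet add package Microsoft.NET.Sdk.Functions
dotnet add package SocialOpinionAPI
dotnet add package Microsoft.Azure.CognitiveServices.Language.TextAnalytics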
You can see the bulk of the Text Analytics Function code below.
First, the blob storage container names and the storage connection string are defined as private static fields at class scope in the Azure Function:
// Name of the container where the tweets are stored
private static string sourceContainerName = "raw-tweets";

// Name of the container where the processed tweets are stored (have text analytics applied)
private static string targetContainerName = "processed-tweets";

// Name of the container where the archived tweets are stored (historical processed data)
private static string archivedContainerName = "archived-tweets";

private static string containerConnectionString = "CONN-STRING";
Below is the core method, which is invoked every 30 minutes by a timer trigger:
private static void ProcessTweets()
{
    BlobServiceClient serviceClient = new BlobServiceClient(containerConnectionString);
    BlobContainerClient sourceContainerClient = serviceClient.GetBlobContainerClient(sourceContainerName);
    BlobContainerClient targetContainerClient = serviceClient.GetBlobContainerClient(targetContainerName);
    BlobContainerClient archivedContainerClient = serviceClient.GetBlobContainerClient(archivedContainerName);

    Console.WriteLine("Fetching blobs...");

    foreach (BlobItem blobItem in sourceContainerClient.GetBlobs())
    {
        BlobClient sourceBlobClient = sourceContainerClient.GetBlobClient(blobItem.Name);

        Console.WriteLine("Performing text analytics on the tweet text");

        // read the text in the source blob and perform text analytics on it
        var response = sourceBlobClient.Download();
        using (var streamReader = new StreamReader(response.Value.Content))
        {
            string json = streamReader.ReadToEnd();

            // deserialize to a strongly typed object to help us grab the text property of the tweet
            SocialOpinionAPI.Models.RecentSearch.Datum datum =
                JsonConvert.DeserializeObject<SocialOpinionAPI.Models.RecentSearch.Datum>(json);

            // run the Text Analytics API on datum.text
            string analyticsJSON = PerformTextAnalytics(datum.text);

            // upload the insights JSON to the processed-tweets container
            var analyticsResponse = targetContainerClient.UploadBlob(
                blobItem.Name, new MemoryStream(Encoding.UTF8.GetBytes(analyticsJSON)));

            if (analyticsResponse.GetRawResponse().Status == 201)
            {
                // upload the original tweet to the archived-tweets container for historical
                // purposes, traceability, and the option to extract more data later
                var archivedResponse = archivedContainerClient.UploadBlob(
                    blobItem.Name, new MemoryStream(Encoding.UTF8.GetBytes(json)));

                if (archivedResponse.GetRawResponse().Status == 201)
                {
                    // delete the original so the same tweet is not processed more than once
                    sourceBlobClient.Delete();
                }
            }
        }
    }
}
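The function entry point itself is not shown above. A minimal sketch of a timer-triggered entry point that runs every 30 minutes and calls ProcessTweets might look like the following; the function name, class name and CRON expression are assumptions:

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class TextAnalyticsFunction
{
    // "0 */30 * * * *" fires at the top of every 30th minute (assumed schedule);
    // ProcessTweets is assumed to live in this same class
    [FunctionName("TextAnalyticsFunction")]
    public static void Run([TimerTrigger("0 */30 * * * *")] TimerInfo timer, ILogger log)
    {
        log.LogInformation($"TextAnalyticsFunction executed at: {DateTime.Now}");
        ProcessTweets();
    }
}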
ProcessTweets calls a helper method, PerformTextAnalytics:
private static string PerformTextAnalytics(string tweetText)
{
    TextAnalyticsManager analyticsManager = new TextAnalyticsManager();
    var insights = analyticsManager.GetInsights("1", tweetText);
    return JsonConvert.SerializeObject(insights);
}
Under the hood, PerformTextAnalytics invokes Cognitive Services for Language to run text analytics over the text of each tweet.
A custom object, TextAnalyticsInsight, represents the main insights we are interested in:
public class TextAnalyticsInsight
{
    public List<string> KeyPhrases { get; set; }
    public double SentimentScore { get; set; }
    public List<TextAnalyticsFunction.Entity.CognitiveServices.EntityRecord> EntityRecords { get; set; }

    public TextAnalyticsInsight()
    {
        this.EntityRecords = new List<TextAnalyticsFunction.Entity.CognitiveServices.EntityRecord>();
        this.KeyPhrases = new List<string>();
    }
}
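For reference, serializing a populated TextAnalyticsInsight with Json.NET’s default settings produces a file shaped like this (the values shown are purely illustrative):

{
  "KeyPhrases": ["Azure Functions", "text analytics"],
  "SentimentScore": 0.87,
  "EntityRecords": [
    {
      "Name": "Azure",
      "Type": "Product",
      "SubType": null,
      "WikipediaId": "Microsoft Azure",
      "WikipediaLanguage": "en",
      "WikipediaUrl": "https://en.wikipedia.org/wiki/Microsoft_Azure"
    }
  ]
}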
And here we have the main method, GetInsights, that hydrates the TextAnalyticsInsight object:
public TextAnalyticsInsight GetInsights(string documentid, string text)
{
    if (!string.IsNullOrEmpty(text))
    {
        // create the TextAnalyticsInsight object
        TextAnalyticsInsight textAnalyticsInsight = new TextAnalyticsInsight();

        EntitiesResult entitiesResult = this.ProcessEntities(documentid, text);
        KeyPhraseResult keyPhraseResult = this.ProcessKeyPhrases(documentid, text);
        SentimentResult sentimentResult = this.ProcessSentiment(documentid, text);

        foreach (EntityRecord record in entitiesResult.Entities)
        {
            // convert the Azure object to our own object
            textAnalyticsInsight.EntityRecords.Add(new TextAnalyticsFunction.Entity.CognitiveServices.EntityRecord
            {
                Name = record.Name,
                SubType = record.SubType,
                Type = record.Type,
                WikipediaId = record.WikipediaId,
                WikipediaLanguage = record.WikipediaLanguage,
                WikipediaUrl = record.WikipediaUrl
            });
        }

        foreach (string keyPhrase in keyPhraseResult.KeyPhrases)
        {
            textAnalyticsInsight.KeyPhrases.Add(keyPhrase);
        }

        textAnalyticsInsight.SentimentScore = sentimentResult.Score.Value;

        // return our custom model TextAnalyticsInsight, which contains
        // all of the text analytics insights for the parameter text
        return textAnalyticsInsight;
    }

    return new TextAnalyticsInsight();
}
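The ProcessEntities, ProcessKeyPhrases and ProcessSentiment helpers are not shown in the post. Below is a minimal sketch of what they might look like, assuming v4 of the Microsoft.Azure.CognitiveServices.Language.TextAnalytics SDK and its single-document convenience methods; the placeholder key, endpoint URL and credentials class (adapted from the Microsoft SDK quickstart) are assumptions:

using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.CognitiveServices.Language.TextAnalytics;
using Microsoft.Azure.CognitiveServices.Language.TextAnalytics.Models;
using Microsoft.Rest;

// From the Microsoft SDK quickstart: attaches the subscription key to each request
class ApiKeyServiceClientCredentials : ServiceClientCredentials
{
    private readonly string apiKey;
    public ApiKeyServiceClientCredentials(string apiKey) => this.apiKey = apiKey;

    public override Task ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        request.Headers.Add("Ocp-Apim-Subscription-Key", this.apiKey);
        return base.ProcessHttpRequestAsync(request, cancellationToken);
    }
}

public partial class TextAnalyticsManager
{
    private readonly TextAnalyticsClient client = new TextAnalyticsClient(
        new ApiKeyServiceClientCredentials("YOUR-KEY")) // assumed key placeholder
    {
        Endpoint = "https://YOUR-RESOURCE.cognitiveservices.azure.com" // assumed endpoint
    };

    private EntitiesResult ProcessEntities(string documentId, string text)
    {
        return client.Entities(text);
    }

    private KeyPhraseResult ProcessKeyPhrases(string documentId, string text)
    {
        return client.KeyPhrases(text);
    }

    private SentimentResult ProcessSentiment(string documentId, string text)
    {
        // the score ranges from 0 (negative) to 1 (positive)
        return client.Sentiment(text);
    }
}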
When insights are generated, they are returned and stored in a Blob Storage Container.
Azure – Blob Storage Containers
Two new blob storage containers are created in the storage account tweetdata to support the TextAnalytics function.
- archived-tweets
- processed-tweets
The archived-tweets container holds raw tweet files that have already been processed by the Text Analytics function.
The processed-tweets container holds the newly generated JSON files with AI and text analytics insights produced by Azure Cognitive Services for Language.
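If you prefer scripting over the portal, the two containers can be created with the Azure CLI; this assumes the tweetdata storage account from Part 2 and that you are already logged in:

az storage container create --name processed-tweets --account-name tweetdata
az storage container create --name archived-tweets --account-name tweetdata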
Testing
The Text Analytics Function can be tested in Visual Studio. We run the function locally and wait for the timer trigger to be invoked.
When the Azure Function timer trigger kicks into action, the console output shows that data is being generated and text analytics is being applied to the text of each tweet.
We can then examine the data in the containers in the Azure portal.
You can further check the contents by opening a JSON file to examine the insights that have been generated.
The Azure Function TextAnalyticsFunction is behaving as expected.
Summary
In Part 3 of this mini-series, you have seen how to implement an Azure Function that integrates with Azure Cognitive Services for Language.
You learned how this can be used to surface insights in Twitter data. The data was then serialized and moved into Azure Blob Storage.
In Part 4 of this mini-series, we’ll explore some ideas on how these insights can be visualized.