Software Architect / Microsoft MVP (AI) and Technical Author

Analytics and Big Data, Architecture, C#, Chatbots, Cognitive Services, Facebook API, Instagram Graph API, Machine Learning, MVC, Sentiment Analysis, Social Media, Twitter

Instagram Graph API – Part 4: Using Azure Cognitive Services to surface insights in Instagram Data

In part 3 of this miniseries, we saw how you can create a C# API that connects to Instagram and extracts image-related insights by using the Insights API.

We looked at how to:

  • create an extendable architecture
  • make requests to the Insights API
  • deserialize the Instagram Graph API response to custom objects (DTOs)
  • convert the DTOs to custom Entities

We also discussed some of the benefits of the approach and were able to successfully test and extract values such as:

  • the number of likes an image received
  • the number of impressions an image received
  • the number of comments an image received
  • the image caption and comments

To recap, we’re covering the following topics in this 4-part mini-series.

If you haven’t followed the first 3 articles, it’s best to have a quick read to get up to speed, as what follows may not make complete sense otherwise!


In this installment of the miniseries, we’ll build on the API we’ve been creating and integrate Azure Cognitive Services APIs.  This will help us surface additional insights in the data we’ve been extracting.

Specifically, we’ll see how to use:

  • Text Analytics to surface Entities, Keyphrases, and Sentiment being expressed in the comments associated with each image
  • Computer Vision to detect the existence of objects in an image. We’ll see how Computer Vision can auto-generate human-readable descriptions and tags for each image

All of this will be integrated with our existing API (class library). By the end of this installment in the series, you’ll have a complete class library that:

  • can make requests to the Instagram Graph API
  • will shield you from low-level work that’s often needed when consuming 3rd party APIs
  • can be easily referenced in a software project of your choice
  • uses AI to surface text and image related insights
  • can be easily extended to include other integrations such as Facebook or Twitter
  • is easily tested using test automation tools

Before we get into the low-level detail of it, we should introduce Azure Cognitive Services.

Introducing Azure Cognitive Services

Cognitive Services are a collection of APIs, hosted in Azure, that make it easy for developers to fuse AI capabilities into existing software applications.  They offer several integration options, can be used at scale, and give you access to world-class AI.

The following categories are available:

  • Decision
  • Language
  • Speech
  • Vision
  • Web Search

Each category contains collections of APIs that make it easy for developers to consume rich AI capabilities and can accelerate your development time.  No Ph.D. required!

For example, in 2013 I spent a lot of time creating a custom API that could perform sentiment analysis and apply text analytics to social media data.  I ended up creating a classifier based on Bayes’ Theorem to perform the sentiment analysis piece and implemented a POS (part of speech) Tagger to identify specific types of keywords.  Doing this involved several tasks such as:

  • Obtaining training data
  • Cleansing the training data
  • Building a data model
  • Coding the API
  • Testing the results

There were a few iterations of this, and it took quite a bit of development.  In the years that followed and as Cognitive Services started to grow, I swapped out my custom solution for one of the Cognitive Services called Text Analytics API.


Which Cognitive Services will we use?

We’ll be using APIs that belong to the Language and Vision categories:

To consume each API, we’ll use the dedicated SDKs that are available for free.  Each SDK gives us a nice object model for our class library to work with.  The SDKs will also save development time as we won’t need to create additional objects (DTOs) to represent the JSON being returned from each API.

Text Analytics API

This API contains a collection of features that let you detect sentiment, key phrases, named entities and language from your text.

For example, in the screenshot below, you can see the results of processing the text “I had a wonderful trip to Seattle last week and even visited the Space Needle 2 times!” :

The Text Analytics API has been able to identify important information, the underlying emotion (positive) and the existence of named/linked entities.

Being able to surface insights like this can give your application richer reporting functionality and help you identify important/linked topics.  Alternatively, you can also use the API to identify potentially contentious issues in corporate email or website feedback.

How will we use Text Analytics?

We’ll use this API to process the text found in the caption and comments associated with an Instagram image.  Specifically, we’ll use the API to:

  • detect the underlying emotion being expressed (sentiment analysis)
  • surface keywords and phrases
  • identify the existence of any entities

This will give us an additional level of insight that can be valuable from a reporting perspective and paves the way for other possibilities.

For example, by tapping into the sentiment being expressed for a given keyword or phrase, a brand advocate could be identified.
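To make that idea concrete, here’s a minimal sketch. Everything in it (the tuple shape, the thresholds, the names) is hypothetical and not part of the series’ code; it just shows how per-keyword sentiment scores could be rolled up to flag an advocate:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: (author, key phrase, sentiment score) triples collected
// from the Text Analytics results for each comment.
var mentions = new List<(string Author, string Phrase, double Score)>
{
    ("alice", "sneakers", 0.95),
    ("alice", "sneakers", 0.90),
    ("bob",   "sneakers", 0.20),
};

// Flag authors who mention a phrase at least minMentions times with a high
// average sentiment score - potential brand advocates.
List<string> FindAdvocates(string phrase, double threshold, int minMentions) =>
    mentions
        .Where(m => m.Phrase == phrase)
        .GroupBy(m => m.Author)
        .Where(g => g.Count() >= minMentions && g.Average(m => m.Score) >= threshold)
        .Select(g => g.Key)
        .ToList();

Console.WriteLine(string.Join(",", FindAdvocates("sneakers", 0.8, 2))); // alice
```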

Computer Vision API

It can often be difficult to get a machine to describe what’s included in image-related content; this is where the Computer Vision API comes into play.

With the Computer Vision API, you supply an image from the file system or via a URL, and the API then extracts rich information, including the objects that exist, a human-readable description of the image, the existence of human faces and more.

To build this type of functionality in the past, you might have had to supplement your image with an accompanying metadata file that described the image, then parsed that data as you processed your image.  Alternatively, you might have had to create your own image classifier (no easy feat).

The Computer Vision API saves you from having to do all of this as it comes pre-trained with many domains out of the box.  All you need to do is provision the API and send the image!

How will we use Computer Vision?

We’ll use this API to help us identify the following information in Instagram images:

  • The existence of brands in the image
  • Human-readable description of the image
  • Objects present in the image
  • Human-readable tags that best describe the image

By using the API, we’ll fuse vision capabilities into our project, which will help us easily process and label Instagram image data.

How will we consume Text Analytics and Computer Vision?

As mentioned earlier, we’ll use the dedicated SDKs to consume each API.  We’ll integrate these SDKs with custom classes that encapsulate the functionality we need.

These classes will be called:

  • TextAnalyticsManager
  • ComputerVisionManager

Under the hood, each of these classes will make calls to the respective APIs via the dedicated SDKs.  Using the SDKs will save us development and testing time as we won’t need to create low-level REST requests or parse the JSON responses.
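To illustrate what the SDKs save us from, here’s roughly what one of those low-level REST requests looks like when built by hand for the Computer Vision analyze operation. The endpoint, key and API version below are placeholders/assumptions; the request is only constructed here, never sent:

```csharp
using System;
using System.Net.Http;
using System.Text;

// Placeholder values - substitute your own Cognitive Services resource.
var endpoint = "https://YOUR-RESOURCE.cognitiveservices.azure.com";
var imageUrl = "https://example.com/photo.jpg";

// Build (but don't send) the raw analyze request the SDK wraps for us.
var request = new HttpRequestMessage(
    HttpMethod.Post,
    endpoint + "/vision/v2.0/analyze?visualFeatures=Description,Tags,Objects,Brands");
request.Headers.Add("Ocp-Apim-Subscription-Key", "YOUR-KEY");
request.Content = new StringContent(
    "{\"url\":\"" + imageUrl + "\"}", Encoding.UTF8, "application/json");

Console.WriteLine(request.RequestUri);
```

On top of building the request, you’d still have to parse the JSON response yourself; the SDK’s object model removes both chores.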

We’ll then use object models provided by each SDK and convert these to custom objects that represent the unique values that we’re interested in.

Extending the existing Project Structure

We’ve introduced Azure Cognitive Services, the APIs we’re going to use and how we’ll use them.  We now need to update the existing project structure to handle this.  To recap, our existing project structure looks like this:

You can check out Part 3 again if you need a recap on the initial architecture and the responsibility of each project and its classes.

Adding Text Analytics and Computer Vision SDKs

First, we need to add support for the Text Analytics and Computer Vision APIs to our Insta.Graph.API project.  This can be done by adding NuGet references for each API.

  • Text Analytics API SDK can be found here.
  • Computer Vision API SDK can be found here.

After you’ve added these, the Dependencies node in Visual Studio should look like this:

With the references added, we can now start to extend our project to support the functionality that Text Analytics and Computer Vision provide. This means updating our existing DTOs, Logic, and Entities. Let’s look at these now.

DTOs, Logic, and Entities

We covered the rationale of encapsulation, decoupling our objects and separating the responsibility of objects in Part 3 of the miniseries. We’ll do the same again here.

Data Transfer Objects (DTOs)

In Part 3 we had to create custom objects to represent the messages we got back from the Instagram Graph API.

This time, however, we’re using the dedicated Cognitive Services SDKs to interact with the Text Analytics and Computer Vision APIs.  These SDKs ship with their own object models, which means we don’t need to create our own custom objects!


We need 2 new classes in our project to support interaction with the Text Analytics and Computer Vision APIs. We also need to extend the existing InstagramManager class.  Let’s look at these in more detail now.


TextAnalyticsManager

This class encapsulates all of the methods needed to interact and exchange data with the Text Analytics API, including:

  • Authentication with Azure
  • Processing Key Phrases
  • Processing Sentiment
  • Processing Entities
  • Converting and consolidating key phrase, sentiment and entity information from the SDK objects to a custom object called TextAnalyticsInsight

Data from this class is used by InstagramManager.


ComputerVisionManager

This class encapsulates all of the methods needed to interact and exchange data with the Computer Vision API, including:

  • Authentication with Azure
  • Surfacing insights in images
  • Converting and consolidating image insights from the SDK objects to a custom object called ComputerVisionInsight

Data from this class is used by InstagramManager.


InstagramManager

This class is extended to call the respective functionality within the TextAnalyticsManager and ComputerVisionManager classes. Two new methods are added:

  1. GetTextAnalyticsInsight
  2. GetComputerVisionInsightAsync

Each method returns an entity that contains all text and image insights that we’re interested in.


Entities

We’re using two main entities to represent the information returned by the Text Analytics and Computer Vision APIs:

  • TextAnalyticsInsight
  • ComputerVisionInsight

There is also a third entity, EntityRecord, which is used by TextAnalyticsInsight to model additional insights.

You might be wondering why we don’t just use the objects supplied by each SDK as the entities for our project (and you could do that). Our entities, however, live in a separate project called Insta.Graph.Entity.

This is a deliberate design decision.

Doing this means we can share the output .dll for this project with other applications and systems with little to no dependencies. It also keeps the code clean and easy to maintain.


Here you can see the entire code for the TextAnalyticsInsight object:

public class TextAnalyticsInsight
{
    public List<string> KeyPhrases { get; set; }
    public double SentimentScore { get; set; }
    public List<EntityRecord> EntityRecords { get; set; }
}

It’s straightforward enough, with a collection of strings to represent key phrases, a number to hold the sentiment score and a collection of EntityRecords to store information about any identified Entities.


In terms of the class EntityRecord, here is the source code for this object:

public class EntityRecord
{
    public string Name { get; set; }
    public string WikipediaLanguage { get; set; }
    public string WikipediaId { get; set; }
    public string WikipediaUrl { get; set; }
    public string Type { get; set; }
    public string SubType { get; set; }
}


Here we can see the code that represents a computer vision insight.  A class Item is used to represent each data item for tags, brands or detected objects in an image.

Each instance of the Item class has a corresponding confidence score indicating the accuracy of the identified tag, brand or object.

public class ComputerVisionInsight
{
    public string ImageDescription { get; set; }
    public List<Item> Tags { get; set; }
    public List<Item> Brands { get; set; }
    public List<Item> DetectedObjects { get; set; }

    public ComputerVisionInsight()
    {
        Tags = new List<Item>();
        Brands = new List<Item>();
        DetectedObjects = new List<Item>();
    }
}

public class Item
{
    public string Name { get; set; }
    public double Confidence { get; set; }
}
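As a quick usage sketch (the sample values are made up), the Confidence property makes it trivial to pick the most likely tag with LINQ; tuples stand in for the Item class here just to keep the snippet self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up tag data in the same Name/Confidence shape as the Item class.
var tags = new List<(string Name, double Confidence)>
{
    ("text", 0.99),
    ("book", 0.72),
    ("poster", 0.65),
};

// Pick the tag the API is most confident about.
var bestTag = tags.OrderByDescending(t => t.Confidence).First();
Console.WriteLine(bestTag.Name); // text
```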

Updated Project Structure

Taking stock of everything so far, we have:

  1. Added NuGet references for the Cognitive Services SDKs.
  2. Created classes to manage the interaction with the Text Analytics and Computer Vision APIs (TextAnalyticsManager and ComputerVisionManager).
  3. Created new entities to model the text and vision insights we’re interested in.

This hasn’t affected too much in terms of the folder layout though. Our revised project now looks like this:

We’re now at the point where everything is almost in place. Now is a good time to take a closer look at the changes we need to make in our project’s main API logic.

Bringing it all together

The main changes we need to make are to the InstagramManager class.  We need to integrate the functionality from the TextAnalyticsManager and ComputerVisionManager classes as these deliver the AI insights that we’re interested in.


To recap, we’re looking to extract the following text analytics and computer vision insights:

  • underlying emotion being expressed (sentiment analysis) in the image caption and comments
  • keywords and phrases in the image caption and comments
  • identify the existence of entities in the image caption and comments
  • existence of brands in an image
  • human-readable description of an image
  • objects present in an image
  • human-readable tags that best describe an image

Process and Workflow

To extract these insights, we follow a standard process which consists of the following steps:

  1. Invoke the Instagram Manager’s DoMediaSearch method
  2. Return a list of Media Entities (the images) and comments
  3. For each entity in the list of Media Entities:
    1. Get Text Analytics insights
    2. Get Computer Vision insights
    3. Add the insights to the Media entity being processed
  4. Process the next media item
  5. Return a list of hydrated Media Entities with all text and vision insights

The following UML Sequence diagram shows the flow of information and object activations when the InstagramManager Search and the AI classes are called.

What follows are the key methods that form the updated functionality in the Instagram Manager class.


DoMediaSearch

I noticed a bug in this method in that duplicate objects were being added.  This has now been fixed; the method now looks like the following:

private DTO.InstagramResult DoMediaSearch()
{
    // get the list of media items
    // parse out the response and the fields we want
    // convert to DTOs and return
    string mediaFields = "media%7Bmedia_url%2Cmedia_type%2Ccomments_count%2Clike_count%2Ctimestamp%2Cpermalink%2Ccaption%7D";
    string mediaSearchUrl = this.baseUrl + mediaFields + "&access_token=" + _token;

    //invoke the request
    string jsonResult = this.Get(mediaSearchUrl);

    // convert to json annotated object
    InstagramResult instagramResult =
        JsonConvert.DeserializeObject<InstagramResult>(jsonResult);

    if (instagramResult != null && != null)
    {
        return instagramResult;
    }

    return null;
}
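A side note on the mediaFields value above: it’s hard to read because it’s pre-encoded. It can be produced from the readable Graph API field list with Uri.EscapeDataString, which escapes the braces (%7B/%7D) and commas (%2C) for you:

```csharp
using System;

// The readable Graph API field list...
var fieldList = "media{media_url,media_type,comments_count,like_count,timestamp,permalink,caption}";

// ...percent-encoded for use in the query string.
var mediaFields = Uri.EscapeDataString(fieldList);
Console.WriteLine(mediaFields);
// media%7Bmedia_url%2Cmedia_type%2Ccomments_count%2Clike_count%2Ctimestamp%2Cpermalink%2Ccaption%7D
```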


GetMediaAsync

This method does most of the heavy lifting in terms of hydrating our custom Media Entities with the values we’re interested in.  It is also responsible for hydrating the text analytics and computer vision insights for all images, image captions, and related comments.

You’ll notice the suffix Async has been added to this method and that it now returns a Task containing a List of Entity.Media objects.  We’ve had to do this as our Computer Vision call only supports an async method.  Here you can see the entire source code for this method:

public async Task<List<Entity.Media>> GetMediaAsync()
{
    // invoke the private method - DoMediaSearch()
    InstagramResult instagramResults = this.DoMediaSearch();
    List<Entity.Media> mediaModels = new List<Entity.Media>();

    //map from the JSON/DTO returned by DoMediaSearch() to the Domain Entities
    foreach (MediaData mediaData in
    {
        Entity.Media media = new Entity.Media
        {
            id =,
            like_count = mediaData.like_count,
            caption = mediaData.caption,
            comments_count = mediaData.comments_count,
            impression_count = GetMediaImpressionValue(GetMediaImpressionsInsight(mediaData)),
            media_url = mediaData.media_url,
            permalink = mediaData.permalink,
            timestamp = mediaData.timestamp,
            DateCreated = mediaData.DateCreated
        };

        // run text analytics over the caption field
        media.CaptionInsights = GetTextAnalyticsInsight(, media.caption);

        // get comments and associated AI insights
        media.Comments = GetMediaCommentsEntities(mediaData);
        foreach (Comment comment in media.Comments)
        {
            // run text analytics over text comments
            comment.TextAnalyticsInsight = GetTextAnalyticsInsight(, comment.text);
        }

        // get image insights
        media.VisionInsights = await GetComputerVisionInsightAsync(media.media_url);

        // finally, add the fully hydrated object to our list
        mediaModels.Add(media);
    }

    return mediaModels;
}

From the above code, you’ll see that we loop through each image, get the associated impressions and call our respective helper methods, which in turn fetch the text analytics and computer vision insights.  A strongly typed list of Media Entities (mediaModels) is then returned to the caller.


GetTextAnalyticsInsight

This method makes a call to the TextAnalyticsManager class.  It returns all associated text analytics insights for the image being processed:

public TextAnalyticsInsight GetTextAnalyticsInsight(string documentid, string text)
{
    TextAnalyticsManager textAnalytics = new TextAnalyticsManager();

    return textAnalytics.GetInsights(documentid, text);
}


GetComputerVisionInsightAsync

This method makes a call to the ComputerVisionManager class.  It returns all associated image analytics for the image being processed:

public async Task<ComputerVisionInsight> GetComputerVisionInsightAsync(string imageUrl)
{
    ComputerVisionManager visionManager = new ComputerVisionManager();

    return await visionManager.GetImageInsightsAsync(imageUrl);
}

Invoking our updated API

With everything in place, we can now invoke our API. I’m using the following images from my Instagram test account:

Examining the debugger session

To test our API, I’m using a .NET Core Web Application which invokes the public method GetMediaAsync that belongs to the InstagramManager class.  The data returned from our API is then converted to a list of View Models (instaMediaVMs).  This list is then bound to the UI.

private async Task<List<MediaViewModel>> GetMediaViewModelsAsync()
{
    InstagramManager instagramManager =
        new InstagramManager("YOUR INSTAGRAM KEY");

    List<Entity.Media> instaMedia = await instagramManager.GetMediaAsync();
    List<MediaViewModel> instaMediaVMs = new List<MediaViewModel>();

    foreach (Entity.Media m in instaMedia)
    {
        MediaViewModel mvm = new MediaViewModel
        {
            id =,
            media_url = m.media_url,
            like_count = m.like_count,
            impression_count = m.impression_count,
            comments_count = m.comments_count,
            permalink = m.permalink,
            DateCreated = m.DateCreated
        };

        foreach (Entity.Comment c in m.Comments)
        {
            mvm.Comments.Add(new CommentViewModel { id =, text = c.text });
        }

        instaMediaVMs.Add(mvm);
    }

    return instaMediaVMs;
}

When we step into the method GetMediaAsync, we can see that 3 Media Data DTOs are returned from the call to DoMediaSearch:

As we loop through each Media Data DTO, we create a Media Entity – hydrating each property as we go.

These properties contain values such as – the image caption, number of likes, comments, and impressions.   It’s in this method that we also surface the Text Analytics and Computer Vision insights.

For example, if we take a closer look at the DTO at position 1 in the returned collection:

We can see this Media Data DTO contains information related to the following image:

(Note: This is a real book!  16 MVPs from around the world and I wrote a book about Microsoft AI. It’s filled with real-world examples to get you started quickly with AI.  I have chapters that show you how to connect to the Twitter API and use AI to surface additional insights in Twitter data.  You can get it in eBook or paperback format.)

After this Media Data DTO has been processed, we have a fully hydrated Media Data Entity.  You can see this here in the debugger:

We then add the Media Data Entity to a list (mediaModels).  This loop keeps running until all Media Data DTOs are processed.

Text Analytics Insights for the MVP Book

Working with our MVP Book Media Entity, we can look at the insights that have been surfaced by the Text Analytics API.


Taking a closer look at the CaptionInsights property, we can see that:

  • 13 Entities have been identified
  • 13 Key Phrases have been identified
  • The sentiment is relatively neutral (0.5)


Entities

Expanding the EntityRecord at index 2, we can see that “Microsoft” has been identified, along with its Type (Organization) and a Wikipedia URL.

Tip: This URL could be used to make an HTTP request; the page could be retrieved, and the underlying HTML could also be processed by the Text Analytics API to surface even more insights about Microsoft.

Key Phrases

Expanding the Key Phrases, we can see phrases like Microsoft AI book, practical guide, chatbots and Amazon are all captured:

Having information in this format makes it easy to report over and query.


Looking at the associated comment for the MVP Book image, we can see there is one comment that says “test comment: this is an awesome book”:

Two key phrases have been identified: “test comment” and “awesome book”.  Notice that the surrounding filler words aren’t included. These words don’t add much value to the overall sentence and can be considered “noise words” – so it’s great that they’ve been filtered out.

Notice the sentiment for this comment is 0.92 – the closer the score is to 1.0, the more positive the comment.  Perfect!
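If you want to report on scores like this, a tiny helper can bucket them into labels. The thresholds below are my own choice, not anything prescribed by the Text Analytics API:

```csharp
using System;

// Map a 0.0-1.0 sentiment score to a coarse label (thresholds are arbitrary).
string SentimentLabel(double score) =>
    score >= 0.7 ? "positive" :
    score <= 0.3 ? "negative" :
    "neutral";

Console.WriteLine(SentimentLabel(0.92)); // positive
Console.WriteLine(SentimentLabel(0.50)); // neutral
```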

Computer Vision Insights for the MVP Book

Next, we can look at the computer vision insights that have been identified after the method GetComputerVisionInsightAsync has run.  Here we can see the Computer Vision API has identified:

  • 1 Object
  • 7 Tags

Detected Objects

If we look at the object, the Computer Vision API believes the main object in the image is a “Poster”:

This isn’t ideal but the API can be forgiven here as the image itself does look like a poster and not a book!


Tags

Taking a closer look at the Tags that have been identified for the MVP Book, we can see that at index 6 in the collection, the Computer Vision API has successfully identified the image as being a “book”:

A look at some of the other Tags for this image shows that:

  • text is present
  • we have a screenshot – this is very impressive as this image was a screenshot I uploaded!
  • there are design elements

You can see this here:

In the past, it’s been difficult for developers to surface this kind of information from visual content, but we’ve seen how the Computer Vision API makes this easier.

Being able to surface and extract insights like this from visual content provides you with additional information that can help you better understand the content you’re processing. You can use this to inform business decisions, optimize processes or build innovative solutions.

Use Cases and Further Ideas

The API has many uses, here are some ideas to get you going:

Additional Integrations – integrate APIs from other platforms like Facebook, Reddit or Twitter. More data = more insight.

Azure – house the API in an Azure WebJob or Function that runs periodically and ingests data for reporting purposes.

Chatbot – create a conversational agent that lets users ask questions around top-performing content, content type or even brand. Alternatively, let the user ask the bot “how is my marketing campaign for ‘#sneakers’ performing?”.  Use the text analytics insights to create a roll-up of the overall sentiment of your campaign, product, brand or service.

Dashboards – Surface data processed by the API in web applications or Power BI thereby making it easy for the business to consume.

Public Sector Datasets – blend datasets from public sector records with social media data. Tie these back to postcodes/zip codes, thereby providing further insights.


Summary and Closing Thoughts

This has been the 4th and final part of the “Tapping into the Instagram Graph API with C#” series.

I hope you’ve enjoyed it and got some value from it.

Throughout this series, we’ve been building a custom API that can connect to and extract data from the Instagram Graph API using C#.

In this final installment, we’ve seen how to augment our custom API with AI capabilities from Azure Cognitive Services.

We successfully ran our API and saw how the Text Analytics and Computer Vision APIs helped surface additional insights in Instagram data and considered some real-world use cases for our API.

All source code for this entire series can be found on my GitHub repo here.

Note: All parts of this 4-part series are now available in an easy-to-consume eBook! Find it here.


  • Do you need to integrate with the Instagram Graph API?
  • Do you need to extract specific data from Facebook, Instagram or Twitter?
  • Do you have any pain points with Facebook, Instagram or Twitter APIs?

Leave a comment down below or contact me via the social links on the blog!


Enjoy This Blog?

Get more in the eBook.

It contains an end-to-end solution and the entire source code.

Available on Gumroad here.



  1. Amazing series Jamie! Great use case for cognitive services… I can see the benefit of just plugging into Microsoft’s API vs trying to not only build this framework, but the additional ML models on top of it all!

  2. Comment by post author


    Hey Zach!

    I’m glad you enjoyed the series.

    Being able to plug in APIs like this saves you a load of work.

    Several years ago I built a custom text classification API that implemented Bayes’ Theorem and Part of Speech (POS) Tagging to perform sentiment analysis and text classification of certain words. It involved a load of work: building a data model, getting training data, cleansing it, then testing etc.

    With these APIs, you don’t need to do all that!

  3. Haha wow, I love this. PLEASE keep posting! Can’t wait to read your next blog!
