Software Architect / Microsoft MVP (AI) and Technical Author

AdTech, Analytics and Big Data, C#, Machine Learning, Prototyping, Social Media

Segmenting customers and performing market research using Microsoft Cognitive Services


In an earlier post, I detailed how LUIS could be leveraged to identify sales leads in Twitter data. It assigned each tweet a probability indicating how “hot” the sales lead was and saved this to the database.

This was achieved by creating 2 Intents (sales lead, no sales lead) and then training LUIS with specific Utterances that fell within those Intents.

As a proof of concept, it worked well and saved you time (less .NET code to write and test).

But what if you wanted to:

  • identify specific products, models or services?
  • handle synonyms in specific datasets?
  • group data based on attributes?

After experimenting some more with LUIS, I found that Entities (specifically Lists) can be used to achieve this. In this post, I cover how to:

  • use Entities to segment social data based on specific attributes
  • use synonyms to identify edge cases in social data

An example

Before diving into the detail, consider the following tweet where the person is talking about an A3.


Although the person never says so, another human can read this and infer that a car made by Audi is being discussed (the A3 Coupe).

It’s hard for the computer to infer subtle nuances like this without writing custom code, but by implementing Entities in LUIS you can train the computer to do just that!

Creating a new Entity

For LUIS to successfully identify that “A3” or “S3” are types of “car” manufactured by “Audi”, you need to create an Entity List.

So, within your LUIS Application, browse to Entities, and select Add Custom Entity then List as the Type.


With the Entity List “Cars” defined you can start to add the canonical form and the respective synonyms.

Input “Audi” and supply 3 models:

  • A1
  • A3
  • S3
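
To make the canonical form / synonym relationship concrete, here is a minimal Python sketch of a list-style lookup. It is not how LUIS is implemented internally; the `CARS` dictionary and the `find_list_entities` helper are hypothetical names used only for illustration, and matching is assumed to be case-insensitive on whole words.

```python
import re

# Hypothetical stand-in for the "Cars" Entity List:
# canonical form -> synonyms (the three models from this post).
CARS = {"Audi": ["A1", "A3", "S3"]}


def find_list_entities(text, entity_list, entity_type="Cars"):
    """Return one dict per synonym found in text (case-insensitive, whole words)."""
    # Invert the list so each synonym points back at its canonical form.
    lookup = {
        synonym.lower(): canonical
        for canonical, synonyms in entity_list.items()
        for synonym in synonyms
    }
    matches = []
    for token in re.findall(r"\w+", text):
        canonical = lookup.get(token.lower())
        if canonical is not None:
            matches.append({"entity": token, "type": entity_type, "value": canonical})
    return matches


print(find_list_entities("I might get a new a1", CARS))
```

The point of the inverted lookup is that every synonym resolves back to a single canonical form, which is what lets you group mentions of different models under one brand.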


With the Entity List and synonyms defined, the next thing to do is train your LUIS Application.  Click Train Application.


Next, browse to the Publish App screen and select Publish. The date will refresh to indicate your LUIS application has been published successfully.


So far, we have:

  • 2 Intents with many Utterances (Sales Lead / No Sales Lead)
  • 1 Entity List with associated synonyms (Car-Audi [A1, A3, S3])

Everything is now in place and we can test the application, so browse to Train and Test.

Next, select Enabled published model. This option must be selected before you can inspect the JSON response, which contains any Entities that LUIS has identified.


Finally, supply the text “I might get a new a1” and hit enter.


Your LUIS application is invoked and will have:

  • Identified the Intent is Sales Lead
  • Identified that A1 is an Entity (note the text A1 has been replaced by $cars)


But where is the entity data?

To get this, you need to click on Raw JSON View.  When you do this, you can see the following elements and associated values in the entities node:

  • Entity: A1
  • Type: Cars
  • Values: Audi
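
Once you have the raw JSON, pulling those three values out in code is straightforward. The sketch below uses a hand-written sample payload; its exact field names (`topScoringIntent`, `resolution`, `values`) are assumptions modelled on the elements listed above, so check them against the JSON your own application returns. `extract_entities` is a hypothetical helper.

```python
import json

# Hand-written sample, NOT captured from a live LUIS call.
# The field layout is an assumption; compare it with your own Raw JSON View.
SAMPLE_RESPONSE = """
{
  "query": "I might get a new a1",
  "topScoringIntent": {"intent": "Sales Lead"},
  "entities": [
    {"entity": "a1", "type": "Cars", "resolution": {"values": ["Audi"]}}
  ]
}
"""


def extract_entities(raw_json):
    """Return the top intent and (entity, type, canonical value) rows."""
    payload = json.loads(raw_json)
    rows = []
    for item in payload.get("entities", []):
        # A list entity may resolve to more than one canonical value.
        values = item.get("resolution", {}).get("values", [])
        for value in values:
            rows.append((item["entity"], item["type"], value))
    return payload.get("topScoringIntent", {}).get("intent"), rows


intent, rows = extract_entities(SAMPLE_RESPONSE)
print(intent, rows)
```

Each row is flat, so it can be written directly to a database table alongside the original tweet.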


Imagine you had captured this data as part of a social listening strategy.

You’d now know the person has expressed commercial intent and is talking about:

  • car(s)
  • brand(s)
  • model(s)

More importantly, you’d have taken raw social data and converted it into a structured format.

This is ideal for reporting and data mining. All of this has been inferred from the text “I might get a new a1”.

This is just the tip of the iceberg. There are other types of Entity that you can experiment with, all of which are described in the LUIS documentation.


On the American site, you can download a “Consumer Complaint Database”.

This contains data from the complaints received by the Consumer Financial Protection Bureau (CFPB) on financial products and services, including bank accounts, credit cards, credit reporting, debt collection, money transfers, mortgages, student loans, and other types of consumer credit.

The database contains over 100,000 anonymized complaints and is refreshed daily.

Complaint data includes:

  • the name of the provider
  • type of complaint
  • the customer complaint narrative (free form text)
  • date
  • zip code


By applying techniques such as the one we’ve just explored, you can segment complaints based on entities within the consumer complaint narrative.

This would let you drill down into further detail than the standard Product and Issue fields offer.

  • Maybe this could be used to improve customer service?
  • Or even identify competitors’ weak points?
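
As a rough sketch of that kind of segmentation, the Python below groups free-text narratives by the canonical value of any list entity they mention. The narratives and the `PRODUCTS` list are invented placeholders, not rows from the CFPB database, and `segment_by_entity` is a hypothetical helper that uses simple whole-word matching in place of a real LUIS call.

```python
import re
from collections import defaultdict

# Hypothetical entity list: canonical form -> synonyms.
PRODUCTS = {
    "credit card": ["credit card", "card"],
    "mortgage": ["mortgage", "home loan"],
}

# Invented narratives for illustration only.
NARRATIVES = [
    "My card was charged twice.",
    "The home loan paperwork was lost.",
    "I was charged a late fee on my mortgage.",
]


def segment_by_entity(narratives, entity_list):
    """Map each canonical value to the narratives that mention one of its synonyms."""
    segments = defaultdict(list)
    for narrative in narratives:
        for canonical, synonyms in entity_list.items():
            # Whole-word, case-insensitive match on any synonym.
            pattern = r"\b(?:" + "|".join(re.escape(s) for s in synonyms) + r")\b"
            if re.search(pattern, narrative, flags=re.IGNORECASE):
                segments[canonical].append(narrative)
    return dict(segments)


for canonical, matched in segment_by_entity(NARRATIVES, PRODUCTS).items():
    print(canonical, len(matched))
```

In a real pipeline you would swap the regular expression for the entities returned by your LUIS application and keep the grouping step as is.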

Some other ideas and applications:

  • Windows Service integration to facilitate automatic data-mining 24×7
  • Process alternative social media APIs such as Instagram and Facebook
  • Build an analytics dashboard of commonly discussed products, makes and models within a specific domain


In this post, we’ve detailed how you can use Microsoft Cognitive Services LUIS to gain further insights into social data and customer complaints.

Are you using Cognitive Services in any of your applications?

Is there a topic you’d like to see covered?



  1. rontomlin

    Well structured “how to” article.
    I have a couple of applications I think I’ll apply LUIS in order to automate and glean higher value

