In my last post, I introduced the Twitter Labs Filtered Stream API.
In that post, I explored the main features of the API, looked at the rich JSON output you get back and shared some ideas around use cases for the API.
Towards the end of that blog post, I mentioned an API I had been working on that connects to the Twitter Filtered Stream API.
This is called the Social Opinion API. It can extract, process and parse data from the Detailed JSON payload that is returned by the Twitter Filtered Stream API. It’s been implemented as a reusable .NET Core class library that can be used by multiple projects. I shared a cut of the data the Social Opinion API can surface with the Twitter DevRel Team along with some other mislabelled datasets.
In this blog post, I follow on from the last and share:
- Overview of the Social Opinion API and Filtered Stream API integration
- How the Social Opinion API connects to the Filtered Stream API
- Insight into some of the data Social Opinion API can surface from the Filtered Stream API
- Telemetry, Metrics, Resource Usage and Diagnostics details
- Snapshot of the data the Social Opinion API can extract and persist
First things first, an overview of the process and main components.
Overview
The Twitter Filtered Stream API needs at least one Rule to be configured before you can use it. You also need to create your own client / service / code to make the connection and start listening for data.
I thought about how this might work and quickly mocked this up using a mobile app called Lucidchart, which you can see here:
The above diagram gives you an overview of the main components, although there is more going on under the hood. Boxes in blue sit in the Twitter domain, whereas everything else sits on the Social Opinion API side.
Social Opinion API (Data Processing)
The Social Opinion API (Service) section is a class library. This can be housed in an Azure Web Job, Web API, Windows Service – anything really that can call .NET code. It will run 24×7, extracting, ingesting and transforming data as well as surfacing actionable insights.
Manual Input (Rules Administration)
The Manual Input section is a smaller collection of APIs that sit within the Social Opinion API which let you administer Filtered Stream Rules. You could house this functionality within a console application, Web API or even build a web front end to let you administer Rules.
(Sample Web UI for Rule Administration)
How does the Social Opinion API connect to the Twitter Filtered Stream API?
I’m a big fan of encapsulation and aim to create easy-to-use APIs. The Social Opinion API lets you connect to the Filtered Stream API in just a few lines of code.
In the code segment below, you can see the Filtered Stream API connection can be made with fewer than three lines of code:
class Program
{
    static void Main(string[] args)
    {
        SocialOpinionAPI.Core.Labs.FilteredStream.Logic.LabsFilteredStreamDataLogic logic = new LabsFilteredStreamDataLogic(true, true);
        logic.LabsFilteredStreamStartStream();
    }
}
Under the hood, it’s a different matter. There’s a lot to consider such as:
- Creating OAuth Headers using the Twitter App Secret Keys/Tokens
- Building the Http Requests
- Processing the JSON response from the Filtered Stream and mapping to the Social Opinion data model
- Data access/persistence
- Handling errors and adhering to the Twitter Terms of Service
I won’t go into all of this but calling this one method does all the above.
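To give a feel for one of the steps above (processing the response from the stream), here is a minimal sketch of reading a line-delimited stream. It is not the Social Opinion API code: the `StreamLineReader` class and `ReadPayloads` method are hypothetical names, a `StringReader` stands in for the HTTP response stream, and treating blank lines as keep-alive signals is an assumption about the stream format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the Social Opinion API.
public static class StreamLineReader
{
    // Yields each non-empty line from the reader.
    // Blank lines are assumed to be keep-alive signals and are skipped.
    public static IEnumerable<string> ReadPayloads(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                continue;
            }
            yield return line;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // A StringReader stands in for the real HTTP response stream.
        var fakeStream = new StringReader(
            "{\"data\":{\"id\":\"1\"}}\r\n\r\n{\"data\":{\"id\":\"2\"}}\r\n");

        foreach (string payload in StreamLineReader.ReadPayloads(fakeStream))
        {
            Console.WriteLine("Payload: " + payload);
        }
    }
}
```

Because the method takes a `TextReader`, the same loop could in principle be pointed at a `StreamReader` wrapped around a network stream.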
Under the Hood
Under the hood, I call a private method that loads in a series of configuration settings for the interface.
These settings let me set the number of “Tweets to fetch” or “Maximum Number of Errors” the interface should tolerate before telling the service to “back off” from the Filtered Stream API.
You’ll also see there is a setting that lets you enable/disable the service too:
if (!config.ServiceActive)
{
    _log.Info("Service not Active. Abandoning start");
}
else
{
    LabsFilteredStreamStartStream(config.AddressAndFilter,
                                  config.TweetsToFetch,
                                  config.MaxNetworkErrorRetryAttempts,
                                  config.MaxParseErrorTolerence);
}
From the code snippet above, you’ll see a further call is made to LabsFilteredStreamStartStream, this time with the configuration values passed in.
This is a private overload that does a lot of the heavy lifting in terms of parsing the JSON from the Filtered Stream API to custom C# objects (and a whole load of other things).
Live Debug Session
You can see a live debug session of a lower level Social Opinion API method call in action here:
From the screenshot above, you can see that the Hashtag (COVID2019) and Annotations (Person) have been identified in the Tweet being processed.
Events
The Social Opinion API also raises events as Twitter data is fetched and saved. This lets you handle responses from the Filtered Stream API whichever way you want. For example, here I am setting up a few event handlers and telling Social Opinion to start listening to the Filtered Stream API:
private void button17_Click(object sender, EventArgs e)
{
    SocialOpinionAPI.Core.Labs.FilteredStream.Logic.LabsFilteredStreamDataLogic logic = new LabsFilteredStreamDataLogic(true, true);
    logic.FilteredStreamDataReceivedEvent += Logic_FilteredStreamDataReceivedEvent;
    logic.FilteredStreamDataSavedEvent += Logic_FilteredStreamDataSavedEvent;
    logic.LabsFilteredStreamStartStream();
}
Here I am taking the data (the Filtered Stream API JSON payload mapped to a custom DTO) and outputting this to console:
private static void Logic_FilteredStreamDataReceivedEvent(object sender, EventArgs e)
{
    TweetReceivedEventArgs eventArgs = e as TweetReceivedEventArgs;
    FilteredStreamDataResponse dataResponse = eventArgs.filteredStreamDataResponse;
    Console.WriteLine("Data received from filtered stream: " + dataResponse.data.ToJson());
}
Finally, I am listening to the “data saved event” that my API raises:
private void Logic_FilteredStreamDataSavedEvent(object sender, EventArgs e)
{
    StreamDataSavedEventArgs streamDataSavedEventArgs = e as StreamDataSavedEventArgs;
    LabsFilteredStreamData data = streamDataSavedEventArgs.filteredStreamDataReceived;
    Console.WriteLine("Data saved with StreamDataUid:" + data.StreamDataUid + ". Content:" + data.ToJson());
}
When this happens, I know that Social Opinion has pushed the Filtered Stream data into the database. As you can see from the above code, the Social Opinion API shields you from the low-level calls you would otherwise need to make to consume the Filtered Stream API, and it can be used with just a few lines of code!
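If you haven’t worked with custom events in .NET before, the pattern behind handlers like these can be sketched as follows. This is a generic illustration rather than the Social Opinion API internals: `StreamPublisher`, `PayloadReceivedEventArgs` and `Publish` are hypothetical names invented for the example.

```csharp
using System;

// Hypothetical event args carrying the raw payload text.
public class PayloadReceivedEventArgs : EventArgs
{
    public PayloadReceivedEventArgs(string payload)
    {
        Payload = payload;
    }

    public string Payload { get; }
}

// Hypothetical publisher: raises an event for each payload it is handed.
public class StreamPublisher
{
    public event EventHandler<PayloadReceivedEventArgs> PayloadReceived;

    public void Publish(string payload)
    {
        // The null-conditional call does nothing when nobody has subscribed.
        PayloadReceived?.Invoke(this, new PayloadReceivedEventArgs(payload));
    }
}

public static class Program
{
    public static void Main()
    {
        var publisher = new StreamPublisher();

        // The client decides what to do with each payload.
        publisher.PayloadReceived += (sender, e) =>
            Console.WriteLine("Received: " + e.Payload);

        publisher.Publish("{\"data\":{\"id\":\"1\"}}");
    }
}
```

The benefit of this design is that the library only announces that something happened, while each client application chooses whether to log, display or forward the data.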
Let’s look at a subset of the data the Social Opinion API will return when it processes real-time data from the Filtered Stream API.
Filtered Stream Data
This is the root level object for a processed Tweet. It contains many fields which can be valuable for reporting, analytics and social listening purposes:
public class LabsFilteredStreamData
{
    public LabsFilteredStreamData() { }

    public string source { get; set; }
    public string lang { get; set; }
    public bool possibly_sensitive { get; set; }
    public int quote_count { get; set; }
    public int like_count { get; set; }
    public int reply_count { get; set; }
    public int retweet_count { get; set; }
    public Stats stats { get; set; }
    public Models.Entities Entities { get; set; }
    public List<FilteredStreamReferencedTweet> referenced_tweets { get; set; }
    public string in_reply_to_user_id { get; set; }
    public string author_id { get; set; }
    public string text { get; set; }
    public DateTime created_at { get; set; }
    public string id { get; set; }
    public int RequestId { get; set; }
    public long StreamDataUid { get; set; }
    public List<FilteredStreamContextAnnotation> context_annotations { get; set; }
    public string format { get; set; }
}
Annotation
This is one of my favourite objects that I’ve added to the Social Opinion API. Arriving at this kind of insight in the past would have meant implementing my own custom Part of Speech (POS) Tagger or Named Entity Recognition (NER) API. Now I have a lot of this out of the box.
public class Annotation
{
    public int start { get; set; }
    public int end { get; set; }
    public double probability { get; set; }
    public string type { get; set; }
    public string normalized_text { get; set; }
}
Let’s look at how this is used. Consider the following Tweet:
“UK could face ‘indefinite lockdown’ until vaccine is found. They know where they can go on this idea. #coronavirusuk #COVID2019 https://t.co/HkQF1BOQ6z”
The Social Opinion API integration with the Twitter Filtered Stream API identifies UK as an Annotation of type Place, with a high confidence score.
The API will also identify Products, Organisations, and People. With the information in this format, you can start to build some innovative solutions.
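As a rough sketch of how a client might use these fields, the snippet below pulls the annotated text out of a Tweet and keeps only the confident matches. `AnnotationHelper` and `Extract` are hypothetical names, the annotation values in `Main` are illustrative rather than real API output, and the code assumes `end` is an inclusive character index into the Tweet text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Annotation
{
    public int start { get; set; }
    public int end { get; set; }
    public double probability { get; set; }
    public string type { get; set; }
    public string normalized_text { get; set; }
}

// Hypothetical helper, not part of the Social Opinion API.
public static class AnnotationHelper
{
    // Returns (type, text) pairs for annotations at or above minProbability.
    // Assumption: 'end' is an inclusive index into the Tweet text.
    // Annotations whose indices fall outside the text are ignored.
    public static List<KeyValuePair<string, string>> Extract(
        string text, IEnumerable<Annotation> annotations, double minProbability)
    {
        return annotations
            .Where(a => a.probability >= minProbability)
            .Where(a => a.start >= 0 && a.end >= a.start && a.end < text.Length)
            .Select(a => new KeyValuePair<string, string>(
                a.type, text.Substring(a.start, a.end - a.start + 1)))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        string tweet = "UK could face ‘indefinite lockdown’ until vaccine is found.";

        // Illustrative values only, not taken from a real payload.
        var annotations = new List<Annotation>
        {
            new Annotation { start = 0, end = 1, probability = 0.98, type = "Place" }
        };

        foreach (var match in AnnotationHelper.Extract(tweet, annotations, 0.5))
        {
            Console.WriteLine(match.Key + ": " + match.Value);
        }
    }
}
```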
Domain
Here you can see some of the Domains that are being collected:
These are just a few examples of the data that’s being captured by the Social Opinion API.
TweetReceivedEventArgs and StreamDataSavedEventArgs
Finally, the objects we’ve just looked at (and others) are passed back in custom Event Args. These contain data for your client application to handle. You can see an example of some of this data here:
public class Data
{
    public string id { get; set; }
    public DateTime created_at { get; set; }
    public string text { get; set; }
    public string author_id { get; set; }
    public string in_reply_to_user_id { get; set; }
    public List<ReferencedTweet> referenced_tweets { get; set; }
    public Entities entities { get; set; }
    public Stats stats { get; set; }
    public bool possibly_sensitive { get; set; }
    public string lang { get; set; }
    public string source { get; set; }
    public List<ContextAnnotation> context_annotations { get; set; }
    public string format { get; set; }
}

public class MatchingRule
{
    public string id { get; set; }
    public string tag { get; set; }
}

public class FilteredStreamDataResponse
{
    public Data data { get; set; }
    public List<MatchingRule> matching_rules { get; set; }
}
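To show how JSON could be mapped onto classes shaped like these, here is a minimal sketch using `System.Text.Json`. The choice of serializer is my assumption for the example and not necessarily what the Social Opinion API uses. The classes are trimmed to a few fields, the `PayloadParser` name is hypothetical and the sample JSON is hand-written rather than a real Filtered Stream payload.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Trimmed-down versions of the classes above, for illustration only.
public class Data
{
    public string id { get; set; }
    public string text { get; set; }
    public string lang { get; set; }
}

public class MatchingRule
{
    public string id { get; set; }
    public string tag { get; set; }
}

public class FilteredStreamDataResponse
{
    public Data data { get; set; }
    public List<MatchingRule> matching_rules { get; set; }
}

// Hypothetical helper, not part of the Social Opinion API.
public static class PayloadParser
{
    // Property names match the JSON keys exactly, so no naming policy is needed.
    public static FilteredStreamDataResponse Parse(string json)
    {
        return JsonSerializer.Deserialize<FilteredStreamDataResponse>(json);
    }
}

public static class Program
{
    public static void Main()
    {
        // Hand-written sample, not a real Filtered Stream payload.
        string json =
            "{\"data\":{\"id\":\"1\",\"text\":\"Hello\",\"lang\":\"en\"}," +
            "\"matching_rules\":[{\"id\":\"10\",\"tag\":\"demo\"}]}";

        FilteredStreamDataResponse response = PayloadParser.Parse(json);
        Console.WriteLine(response.data.text + " matched rule " + response.matching_rules[0].tag);
    }
}
```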
Storing Filtered Stream API data
The JSON payload from the Filtered Stream API contains a lot of data points, so a number of tables are needed to support it.
Here you can see a subsection of the Azure database that I tell the Social Opinion API to push processed data into:
The actual number of tables required to support the Detailed JSON payload goes into the double-digits. I’ve mapped them all! Those weekends were busy!
Housing the Social Opinion API
The Social Opinion API is housed in Azure running 24×7 and is processing data as I write this blog. The API has a feature that lets you switch it on or off and detailed error logging to help diagnose errors. I may create a web UI that lets me search for errors.
For the time being, I run SQL queries directly against the standard system logging table that’s used by the Social Opinion API.
Fault Tolerance and Twitter ToS
In addition to parsing and processing the Detailed JSON payload into custom C# POCOs, the Social Opinion API features built-in configuration and error handling.
It also lets you configure how many Tweets to fetch and when to exit once a given number of errors has occurred.
This also makes sure any calls made to the Filtered Stream API adhere to the Twitter Terms of Service.
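The idea of a Tweet limit combined with an error tolerance can be sketched like this. It is a simplified illustration under my own assumptions, not the actual Social Opinion API logic: `TolerantProcessor`, `ProcessingResult` and the parameter names are hypothetical, and a real service would also need to wait before reconnecting.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type for the sketch.
public class ProcessingResult
{
    public int Processed { get; set; }
    public int Errors { get; set; }
    public bool StoppedOnErrors { get; set; }
}

// Hypothetical helper, not part of the Social Opinion API.
public static class TolerantProcessor
{
    // Handles payloads until tweetsToFetch succeed or maxErrors failures occur.
    public static ProcessingResult Run(
        IEnumerable<string> payloads, Action<string> handle, int tweetsToFetch, int maxErrors)
    {
        var result = new ProcessingResult();
        if (tweetsToFetch <= 0)
        {
            return result;
        }

        foreach (string payload in payloads)
        {
            try
            {
                handle(payload);
                result.Processed++;
            }
            catch (Exception)
            {
                result.Errors++;
                if (result.Errors >= maxErrors)
                {
                    // Too many failures: stop reading rather than keep hitting the stream.
                    result.StoppedOnErrors = true;
                    break;
                }
            }

            // Check the limit here so no extra item is pulled from the stream.
            if (result.Processed >= tweetsToFetch)
            {
                break;
            }
        }

        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var payloads = new[] { "ok", "bad", "ok", "bad", "ok" };

        ProcessingResult result = TolerantProcessor.Run(
            payloads,
            p => { if (p == "bad") throw new FormatException("Could not parse payload"); },
            tweetsToFetch: 10,
            maxErrors: 2);

        Console.WriteLine("Processed: " + result.Processed +
                          ", Errors: " + result.Errors +
                          ", Stopped on errors: " + result.StoppedOnErrors);
    }
}
```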
Telemetry, Metrics, Resources and Diagnostics
During testing, I tracked the following key data points when 1 Rule was set up and approximately 300k Tweets were processed.
- 1,500 Tweets processed every 5 minutes
- 2.8% error rate for every 1,000 Tweets processed
- 3GB database storage usage (Azure SQL)
- Maximum of 60% database resource usage at any given time
- 421,609 Context Annotations identified
- 160,931 Entity Annotations identified
- 44 Domains identified
- 1,727 Entities identified
With these sorts of metrics, the Social Opinion API will quickly hit the limit of 500,000 Tweets per month. In a recent video call with the Twitter Product Team, one of the things I politely asked for was more data. 😉
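To put a number on "quickly", here is a small back-of-the-envelope calculation using the figures above. It assumes the rate of 1,500 Tweets every 5 minutes is sustained, which may not hold in practice, and `QuotaEstimator` is a hypothetical name. At that rate, the 500,000 limit would be reached in roughly 28 hours.

```csharp
using System;

// Hypothetical helper for a rough estimate only.
public static class QuotaEstimator
{
    // Hours until monthlyLimit is reached, assuming a constant processing rate.
    public static double HoursUntilLimit(int tweetsPerWindow, int windowMinutes, int monthlyLimit)
    {
        double tweetsPerHour = tweetsPerWindow * (60.0 / windowMinutes);
        return monthlyLimit / tweetsPerHour;
    }
}

public static class Program
{
    public static void Main()
    {
        // 1,500 Tweets per 5 minutes against a 500,000 Tweet monthly limit.
        double hours = QuotaEstimator.HoursUntilLimit(1500, 5, 500000);
        Console.WriteLine("Hours until limit: " + hours.ToString("F1"));
    }
}
```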
Summary
This has been a follow-on from last week’s blog post and we’ve seen how you can connect to the Twitter Labs Filtered Stream API.
We’ve looked at an interface that lets you process data from the Filtered Stream API, the data you can expect to get back and some of the in-built features and metrics. I’m working on a set of dashboards and visualisations for this data.
Another idea I have is around placing a chatbot in front of the data and insights that are being surfaced by the Twitter Filtered Stream API and Social Opinion API.
- Got a question?
- Want to know more?
- Interested in this interface?
Drop me a message below or reach out on social.