Our Twitter Tracker tool is designed to illustrate the scale and variety of discussion about The Promise on Twitter.
It pulls in public tweets that use the hashtag #c4thepromise. These tweets are moderated to make sure that they follow Channel 4's Community Guidelines before they are published on the Twitter Tracker.
How the tweets are clustered
Every public tweet that mentions the programme hashtag #c4thepromise is captured and then broken down into its constituent parts (nouns, verbs and adjectives, as well as hashtags, @-replies, URLs, etc.) by a natural language processing algorithm.
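As a rough illustration of this step, the sketch below splits a tweet into URLs, hashtags, @-replies and plain words using regular expressions. It is not the tracker's actual parser: the patterns and the function name split_tweet are assumptions made for this example, and it does not label words as nouns, verbs or adjectives, which would need a part-of-speech tagger.

```python
import re

# Hypothetical patterns for illustration; the tracker's real parser is not published here.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def split_tweet(text):
    """Split a tweet into URLs, hashtags, @-replies and lower-cased words."""
    # Pull out each kind of token in turn, blanking it so later patterns skip it.
    urls = URL_RE.findall(text)
    rest = URL_RE.sub(" ", text)
    hashtags = HASHTAG_RE.findall(rest)
    rest = HASHTAG_RE.sub(" ", rest)
    mentions = MENTION_RE.findall(rest)
    rest = MENTION_RE.sub(" ", rest)
    words = [w.lower() for w in WORD_RE.findall(rest)]
    return {"urls": urls, "hashtags": hashtags, "mentions": mentions, "words": words}


parts = split_tweet("@c4 Loved #c4thepromise tonight http://example.com/x")
```

Here parts["words"] holds the ordinary words left once the Twitter-specific tokens have been set aside.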
From this, we form an adjacency list of common phrases, which records how frequently phrase combinations occur. From the adjacency list we build a distance matrix that represents how 'near' the phrases in a given combination are to each other. For example, phrases that commonly occur together will have a distance close to 0 (i.e. they are considered to be near each other).
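One way to picture this is the sketch below, which counts how often pairs of phrases appear in the same tweet and turns those counts into distances. The distance formula is an assumption chosen for the example (1 minus the share of tweets containing either phrase that contain both), and the function names are hypothetical; the tracker may well use a different measure.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(tweets):
    """Count, per phrase and per phrase pair, how many tweets contain them.

    tweets is a list of phrase lists, one list per tweet.
    """
    counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for phrases in tweets:
        unique = sorted(set(phrases))  # count each phrase once per tweet
        for phrase in unique:
            counts[phrase] += 1
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1  # keys are always in sorted order
    return counts, pair_counts


def distance_matrix(counts, pair_counts):
    """Assumed distance: 1 - (tweets with both) / (tweets with either)."""
    phrases = sorted(counts)
    n = len(phrases)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        a, b = phrases[i], phrases[j]
        both = pair_counts.get((a, b), 0)
        either = counts[a] + counts[b] - both
        dist[i][j] = dist[j][i] = 1.0 - both / either
    return phrases, dist


tweets = [["drama", "history"], ["drama", "history"], ["drama", "acting"]]
counts, pair_counts = build_adjacency(tweets)
phrases, dist = distance_matrix(counts, pair_counts)
```

With this measure, two phrases that always appear together have a distance of 0, and two phrases that never share a tweet have a distance of 1.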
Clusters in the conversation are generated from this distance matrix using the k-medoids algorithm. This means that phrases that are near, or similar to, each other are grouped into a common cluster. For each cluster, we find the most common theme, along with nearby associated themes.
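For readers curious about the clustering step, here is a minimal sketch of k-medoids working directly on a distance matrix. It uses a simple alternating scheme (assign each item to its nearest medoid, then pick the most central member of each cluster as the new medoid) with a naive starting choice; both are assumptions for this example, and the tracker's own implementation and choice of k are not described here.

```python
def assign(dist, medoids):
    """Group every item with its nearest medoid; a medoid stays in its own cluster."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
        clusters[nearest].append(i)
    return clusters


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a square distance matrix (list of lists)."""
    medoids = list(range(k))  # naive start: the first k items (an assumption)
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # The new medoid of a cluster is the member closest to all other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return assign(dist, medoids)


# Toy data: five items placed on a line, forming two obvious groups.
positions = [0, 1, 2, 10, 11]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
clusters = k_medoids(toy_dist, k=2)
groups = sorted(sorted(members) for members in clusters.values())
```

Each key of the returned dictionary is a medoid, which plays the role of the representative item for its cluster.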
The clusters of common themes give a sense of the variety of the online conversation around The Promise. The volume of tweets around each theme determines how close it is to the centre of the visualisation, with the most discussed theme being closest.
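The relationship between volume and position could be sketched as follows. The linear mapping and the function name theme_radii are assumptions for illustration only; the visualisation's actual layout rule is not specified beyond the most discussed theme sitting closest to the centre.

```python
def theme_radii(volumes, max_radius=1.0):
    """Map tweet volume per theme to a distance from the centre.

    Assumed rule: the busiest theme gets radius 0, and less discussed
    themes move outwards in proportion to how much smaller they are.
    """
    top = max(volumes.values())
    return {theme: max_radius * (1 - count / top) for theme, count in volumes.items()}


# Made-up volumes, purely to show the shape of the result.
radii = theme_radii({"drama": 40, "cast": 20, "history": 10})
```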