DMI Twitter Capturing and Analysis Toolset (DMI-TCAT)

Data selection

Select the dataset:

675.208.105 tweets archived so far (and counting)

Select parameters:

Query: (empty: containing any text*)
Exclude: (empty: exclude nothing*)
From user: (empty: from any user*)
Exclude user: (empty: exclude no users*)
User bio: (empty: anything in biography*)
User language: (empty: any language*)
Twitter client URL/descr: (empty: from any client*)
(Part of) URL: (empty: any or all URLs*)
(Part of) media URL: (empty: any or all media URLs*)
Startdate (UTC): (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS)
Enddate (UTC): (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS)
* You can also do AND or OR queries, although you cannot mix AND and OR in the same query.
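The AND/OR restriction can be checked before submitting a query. The following is a minimal sketch, assuming the operators are written as the upper-case words AND and OR; the helper names are hypothetical and this is not how TCAT itself parses queries.

```python
import re


def query_operators(query: str) -> set[str]:
    """Return the boolean operators (AND, OR) that occur in a query string."""
    # Assumption: operators are stand-alone upper-case words.
    return set(re.findall(r"\b(AND|OR)\b", query))


def is_valid_query(query: str) -> bool:
    """A query may use AND or OR, but not both in the same query."""
    return len(query_operators(query)) <= 1


if __name__ == "__main__":
    for q in ["euro2016 AND goal", "euro2016 OR uefa", "euro2016 AND goal OR uefa"]:
        print(q, "->", "ok" if is_valid_query(q) else "mixes AND and OR")
```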
Overview of your selection
Dataset:eurocopa (andresiniesta8, AntoGriezmann, ArdaTuran, BelRedDevils, Cristiano, David_Alaba, DFB_Team, England, equipedefrance, euro2016, gianluigibuffon, hazardeden10, MesutOzil1088, MilliTakimlar, oefb1904, SeFutbol, selecaoportugal, SFV_ASF, uefaeuro, Vivo_Azzurro gianluigibuffon SFV_ASF, Vivo_Azzurro, WayneRooney, XS_11official)
Search query:
Comments:
Exclude:
From user:
Exclude from user:
From twitter client:
(Part of) URL:
(Part of) media URL:
Startdate:2024-10-13
Enddate:2024-10-14
Number of tweets:0
Number of distinct users:0



Date and time are in UTC.
Graph resolution
Export selected data

All exports have the following filename convention: {dataset}-{startdate}-{enddate}-{query}-{exclude}-{from_user_name}-{exclude_from_user_name}-{from_user_lang}-{url_query}-{media_url_query}--{module_name}-{module_settings}-{dmi-tcat_version}.{filetype}
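To make the convention concrete, here is a rough sketch that assembles a filename from selection parameters. It assumes empty parameters become empty strings (leaving consecutive hyphens); how TCAT escapes or shortens values is not stated here, and the function name and example values are hypothetical.

```python
# Selection fields in the order given by the filename convention.
FIELDS = [
    "dataset", "startdate", "enddate", "query", "exclude",
    "from_user_name", "exclude_from_user_name", "from_user_lang",
    "url_query", "media_url_query",
]


def export_filename(params: dict[str, str], module_name: str,
                    module_settings: str, version: str, filetype: str) -> str:
    """Join the selection fields with '-', then append the module part after '--'."""
    # Assumption: a missing parameter is written as an empty string.
    selection = "-".join(params.get(field, "") for field in FIELDS)
    return f"{selection}--{module_name}-{module_settings}-{version}.{filetype}"


if __name__ == "__main__":
    params = {"dataset": "eurocopa", "startdate": "2024-10-13", "enddate": "2024-10-14"}
    # Module name, settings and version below are placeholders, not real TCAT values.
    print(export_filename(params, "exampleModule", "exampleSettings", "x.y", "csv"))
```

Because dates and empty fields also contain hyphens, a filename built this way cannot be split back into its fields reliably; treat it as a label rather than a parseable record.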

Output format for tables:

Tweet statistics and activity metrics

All statistics and activity metrics come as a .csv file which you can open in Excel or similar.
Here you can select how the statistics should be grouped:

Tweet stats

Contains the number of tweets, number of tweets with links, number of tweets with hashtags, number of tweets with mentions, number of retweets, and number of replies
Use: get a feel for the overall characteristics of your data set.

User stats (overall)

Contains the min, max, average, Q1, median, Q3, and trimmed mean for: number of tweets per user, URLs per user, number of followers, number of friends, and number of tweets and unique users per time interval
Use: get a better feel for the users in your data set.
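As an illustration of what these summary columns mean, the sketch below computes them for a list of per-user values. The quartile method ("inclusive") and the 10% trim on each side are assumptions made for this example; the page does not say which definitions TCAT uses.

```python
import statistics


def summarize(values, trim=0.1):
    """Return min, max, average, quartiles and a trimmed mean for a list of numbers."""
    data = sorted(values)
    # Quartiles: three cut points that split the data into four parts.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Trimmed mean: drop a fraction `trim` of the values at each end (assumed 10%).
    k = int(len(data) * trim)
    trimmed = data[k:len(data) - k]
    return {
        "min": data[0],
        "max": data[-1],
        "average": statistics.mean(data),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "trimmed_mean": statistics.mean(trimmed),
    }


if __name__ == "__main__":
    tweets_per_user = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    for name, value in summarize(tweets_per_user).items():
        print(f"{name}: {value}")
```

The trimmed mean is useful here because a few very active accounts can pull the plain average far above what is typical.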

User stats (individual)

Lists users and their number of tweets, number of followers, number of friends, how many times they are listed, their UTC time offset, whether the user has a verified account and how many times they appear in the data set.
Use: get a better feel for the users in your data set.

Hashtag frequency

Contains hashtag frequencies.
Use: find out which hashtags are most often associated with your subject.

Hashtag-user activity

Lists hashtags, the number of tweets with that hashtag, the number of distinct users tweeting with that hashtag, the number of distinct mentions tweeted together with the hashtag, and the total number of mentions tweeted together with the hashtag.
Use: explore user-hashtag activity.

Twitter client (source) frequency

Contains source frequencies.
Lists the frequency of tweet software sources per interval.

Twitter client (source) stats (overall)

Contains the min, max, average, Q1, median, Q3, and trimmed mean for: number of tweets per source, URLs per source
Use: get a better feel for the sources in your data set.

Twitter client (source) stats (individual)

Lists sources and their number of tweets, retweets, hashtags, URLs and mentions.
Use: get a better feel for the sources in your data set.

User visibility (mention frequency)

Lists usernames and the number of times they were mentioned by others.
Use: find out which users are "influentials".

User activity (tweet frequency)

Lists usernames and the number of tweets posted.
Use: find the most active tweeters, see if the dataset is dominated by certain twitterati.

User activity + visibility (tweet+mention frequency)

Lists usernames with both tweet and mention counts.
Use: see whether the users mentioned are also those who tweet a lot.

Identical tweet frequency

Contains tweets and the number of times they have been (re)tweeted identically.
Use: get a grasp of the most "popular" content.

Word frequency

Contains words and the number of times they have been used.
Use: get a grasp of the most used language.

Media frequency

Contains media URLs and the number of times they have been used.
Use: get a grasp of the most popular media.

Export table with potential gaps in your data

Exports a spreadsheet with all known data gaps in your current query, during which TCAT was not running or capturing data for this bin.
Use: gain insight into data that may be missing due to outages.

Tweet exports

All tweet exports produce a .csv or .tsv file which you can open in Excel or similar.
Here you can select additional columns for the tweet exports (more = slower):

Random set of tweets from selection

Contains 1000 randomly selected tweets and information about them (user, date created, ...).
Use: a random subset of tweets is a representative sample that can be manually classified and coded much more easily than the full set.
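If you start from a full export instead, a comparable sample can be drawn afterwards. This is a small sketch, assuming the export has already been read into a list of rows (for example with csv.DictReader); the function name and the "id" key in the demo are hypothetical.

```python
import random


def sample_tweets(rows, k=1000, seed=None):
    """Draw up to k rows uniformly at random, without replacement."""
    rng = random.Random(seed)  # pass a seed to make the sample reproducible
    return rng.sample(rows, min(k, len(rows)))


if __name__ == "__main__":
    rows = [{"id": i} for i in range(5000)]  # stand-in for exported tweets
    subset = sample_tweets(rows, k=1000, seed=42)
    print(len(subset), "tweets sampled")
```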

Export all tweets from selection

Contains all tweets and information about them (user, date created, ...).
Use: spend time with your data.

List each individual retweet

Lists all retweets (and all of each tweet's metadata, such as follower_count) chronologically.
Use: reconstruct retweet chains.
Warning: This script is slow. Small datasets only!

Only tweets with lat/lon

Contains only geo-located tweets.

Export tweet ids

Contains only the tweet ids from your selection.

Export hashtag table (tweet id, hashtag)

Contains tweet ids from your selection and hashtags.

Export mentions table (tweet id, user from id, user from name, user to id, user to name, mention, mention type)

Contains tweet ids from your selection, with mentions and the mention type.

Networks

All network exports come as .gexf or .gdf files which you can open in Gephi or similar.

Social graph by mentions

Produces a directed graph based on interactions between users. If a user mentions another one, a directed link is created. The more often a user mentions another, the stronger the link ("link weight"). The "count" value contains the number of tweets for each user in the specified period.
Use: analyze patterns in communication, find "hubs" and "communities", categorize user accounts.
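The construction described above can be sketched roughly as follows. This is an illustration only, assuming tweets are available as (from_user, text) pairs, that mentions are found with a simple @username pattern, and that each mentioned user counts once per tweet; TCAT's own extraction may differ.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")  # simplified pattern, an assumption


def mention_graph(tweets):
    """Return (edge weights, tweet counts) for an iterable of (from_user, text)."""
    edges = Counter()   # (source, target) -> link weight
    counts = Counter()  # user -> number of tweets ("count" value)
    for user, text in tweets:
        counts[user] += 1
        # Assumption: a user mentioned twice in one tweet adds weight 1, not 2.
        for target in set(MENTION.findall(text)):
            edges[(user, target)] += 1
    return edges, counts


if __name__ == "__main__":
    tweets = [
        ("alice", "@bob hi"),
        ("alice", "thanks @bob and @carol"),
        ("bob", "@alice ok"),
    ]
    edges, counts = mention_graph(tweets)
    for (source, target), weight in sorted(edges.items()):
        print(f"{source} -> {target} (weight {weight})")
```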

Social graph by in_reply_to_status_id

Produces a directed graph based on interactions between users. If a tweet was written in reply to another one, a directed link is created.
Use: analyze patterns in communication, find "hubs" and "communities", categorize user accounts.

Co-hashtag graph

Produces an undirected graph based on co-word analysis of hashtags. If two hashtags appear in the same tweet, they are linked. The more often they appear together, the stronger the link ("link weight").
Use: explore the relations between hashtags, find and analyze sub-issues, distinguish between different types of hashtags (event related, qualifiers, etc.).
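A co-hashtag edge list of this kind could be built along these lines. The sketch assumes hashtags are matched with a simple #word pattern and compared case-insensitively; both are assumptions for the example, not documented TCAT behaviour.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")  # simplified pattern, an assumption


def cohashtag_edges(texts):
    """Return undirected edge weights: (tag_a, tag_b) -> number of shared tweets."""
    weights = Counter()
    for text in texts:
        # Sort the distinct tags so each undirected pair has one canonical key.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for a, b in combinations(tags, 2):
            weights[(a, b)] += 1
    return weights


if __name__ == "__main__":
    texts = ["#Euro2016 #goal", "#goal #euro2016 #fra", "#fra only"]
    for (a, b), weight in sorted(cohashtag_edges(texts).items()):
        print(f"{a} -- {b} (weight {weight})")
```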

Bipartite hashtag-user graph

Produces a bipartite graph based on co-occurrence of hashtags and users. If a user wrote a tweet with a certain hashtag, there will be a link between that user and the hashtag. The more often they appear together, the stronger the link ("link weight").
Use: explore the relations between users and hashtags, find and analyze which users group around which topics.
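In a bipartite graph the two node types must stay distinct, even when a username and a hashtag share the same text. A minimal sketch, assuming (from_user, text) pairs and a simple #word pattern, tags each node with its type:

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")  # simplified pattern, an assumption


def hashtag_user_edges(tweets):
    """Return weights for links between ("user", name) and ("hashtag", tag) nodes."""
    weights = Counter()
    for user, text in tweets:
        # Each distinct hashtag in a tweet adds 1 to the user-hashtag link.
        for tag in {t.lower() for t in HASHTAG.findall(text)}:
            weights[(("user", user), ("hashtag", tag))] += 1
    return weights


if __name__ == "__main__":
    tweets = [("goal", "what a #goal"), ("goal", "#Goal again"), ("alice", "#euro2016")]
    for (user_node, tag_node), weight in sorted(hashtag_user_edges(tweets).items()):
        print(user_node, "--", tag_node, f"(weight {weight})")
```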

Bipartite hashtag-mention graph

Produces a bipartite graph based on co-occurrence of hashtags and @mentions. If an @mention co-occurs in a tweet with a certain hashtag, there will be a link between that @mention and the hashtag. The more often they appear together, the stronger the link ("link weight").
Use: explore the relational activity between mentioned users and hashtags, find and analyze which users are considered experts around which topics.

Bipartite hashtag-source graph

Produces a bipartite graph based on co-occurrence of hashtags and "sources" (the client a tweet was sent from is its source). If a hashtag is tweeted from a particular client, there will be a link between that client and the hashtag. The more often they appear together, the stronger the link ("link weight").
Use: explore the relations between clients and hashtags, find and analyze which clients are related to which topics.

Bipartite user-source graph

Produces a bipartite graph based on co-occurrence of users and "sources" (the client a tweet was sent from is its source). If a user tweets from a particular client, there will be a link between that client and the user. The more often they appear together, the stronger the link ("link weight").
Use: explore the relations between clients and users, find and analyze which users use which clients.

Experimental

Cascade

The cascade interface provides a ground level view of tweet activity by charting every single tweet in the current selection. User accounts are distributed vertically; tweets - shown as dots - are spread out horizontally over time. Lines indicate retweets.
Use: visually explore temporal structures and retweet patterns.
Warning: This view requires a large screen and is limited to (very) small data selections.

The Sankey Maker

Produces an alluvial diagram.
Use: plot the relation between various fields such as from_user_lang, hashtags or Twitter client.

Associational profile (hashtags)

Produces an associational profile as well as a time-encoded co-hashtag network.
Use: explore shifts in hashtag associations.

Modulation Sequencer (URL)

The tool allows one to qualitatively examine how a URL is shared on Twitter over time. See Moats and Borra (2018) for a full explanation.
Use: enter a (part of a) URL in the data selection field at the top and click 'update overview'. Then launch this tool.