Over recent weeks I have been working out ways to extract “top tweet” lists from huge extracts of Twitter data (NodeXL). A defining feature of Twitter is its immediacy. That presents challenges when attempting to summarise a large volume of activity – eg from an international conference (eg Quality2017 in London April 2017 – blog) or global public health campaign such as Immunization Week (read blog). Recent political events (eg science marches, UK’s June 2017 General Election) generated tens of thousands of tweets in hours or even minutes. I have attempted the step by step approach described below on a series of different extracts and it appears to work well.
NodeXL allows us to extract data on planned and unplanned social media activity, for a period of up to 9 days into the past. Sometimes a little bit of tweaking or patience is required to extract data.
NodeXL maps interactions around tweets, helping identify top tweeters, hashtags and quoted URLs. Sometimes, however, it is helpful to see the most dominant tweets; this identifies other content – eg popular infographics. Number of retweets is a measure of the popularity (or sometimes infamy) of a tweet, and I have used this measure to rank tweets (eg this for Antibiotic Awareness week, November 2016). For conferences I have then ranked the tweets chronologically (eg this for FISHIS16).
If you’re keen to try this out for yourself, here are a couple of large extracts to get your teeth into:
- An extract of last year’s American Public Health Association conference – see resulting Storify (top 179 tweets) here
- An extract of recent medical journal tweets (6 days to 30 June) – see resulting Storify (top 100 tweets) here
See my NodeXL extracts (and a few that mention ScotPublicHealth) here.
Have a go at following the step by step guide below, and let me know how you get on via Twitter.
You’ll need Excel and a Storify account (free). If you have an Apple or Linux machine and want to produce your own step by step guide then please let me know and I’ll include on this page, giving you credit.
Graham Mackenzie (@gmacscotland on Twitter)
Consultant in Public Health
30 June 2017
|Instructions||Image (click to view full size)|
|Step 1. Open up your chosen extract using NodeXL Graph Gallery website. Click here for NHSScotland17 extract for this worked example.|
|Step 2. Scroll down to link at bottom of page – “Download the Graph Data as a NodeXL Workbook” and download the related file|
|Step 3. Find the file (may be in your “downloads” folder) and move to a location of your choice. Open it up|
|Step 4. Select the “Edges” worksheet at the far left of the workbook|
|Step 5. Select the “Tweet” column (click on column “Q”)|
|Step 6. Copy and paste this column into a new workbook|
|Step 7. Click on “Tweet” at the top of this column and “select all” (hold down “control” key and press “A” key on a Windows PC). Insert a pivot table. This should open up in another worksheet|
|Step 8. In the pivot table fields box drop “Tweet” field into “Rows” and “Values”|
|Step 9. Select and copy all but the bottom row of the resulting pivot table (Select “Row labels” cell; hold down “control” key, press “A” key; release both keys; now hold down “shift key” and press “up” arrow; release; hold down “control” key again and press “C”)|
|Step 10. Paste the information alongside pivot table (click into a cell a couple of columns to right of pivot table, and paste the contents of the pivot table as “paste special”/ values). This gets rid of any formatting from the pivot table.|
|Step 11. Sort these columns by “Count of Tweet” in descending order (click on “Row Labels” in the columns you’ve just pasted; press control A; click “data” tab at top of screen and then “Sort” by “Count of Tweet”, largest to smallest)|
|Your data will look something like this|
|Step 12. The “Row Labels” column contains the texts of the different tweets. Some of these tweets will include mentions of other users (@username). Each mention of a user is recorded as a new relationship by NodeXL. This will be multiplied up with replies and retweets. Read more here. You need to take account of the number of “mentions” when you’re estimating number of retweets and can do this by copying the following formula into the adjacent column:
(you’ll probably have to type this out as browser does strange things to formatting of inverted commas; you may need to change the cell reference D4 depending on how you have set up your spreadsheet)
|Step 13. Label your new column “Mentions”. Label the next column “Count/mentions” and add the following formula in the cell below:
Again you’ll probably have to type this out rather than copying and pasting, and may need to change the cell references (E4 and F4) depending on where you’ve pasted your data in the sheet
|Step 14. Copy these two new cells down to the bottom of the sheet|
|Step 15. Sort the columns by “count/ mentions” column (in decreasing order).|
|Step 16. Sort the columns by “count/ mentions” column (in decreasing order).|
|Step 17. Remove tweets with “#DIV/0” in “count/mentions” column. This might seem counter-intuitive, but these tweets will already be included in the analysis as retweets. Select the rows with the #DIV/0 tweets, and delete. Cut and paste the remaining rows just below the title row|
|Step 18. Set up 3 new columns as shown, in preparation for recording URL, number of retweets (documenting when you recorded this information), and date and time of tweet (if you’re going to be ordering your tweets chronologically).|
|Step 19. Select a search term from the first tweet, avoiding URLs or hyphens at the start of the word (Twitter search uses hyphens as an instruction to exclude that term). You will need to identify which term work best for you using trial and error.|
|Step 20. Go to your browser. Enter the search term you’ve copied from Excel into the Twitter search box. Search through the different tweets if more than one appears and find the tweet that best matches sender and number of retweets. Once you’ve found the tweet click arrow in right hand side and click “Copy link to tweet”.|
|Step 21. Copy the link (control C).|
|Step 22. Paste the link into Excel (control V).|
|Step 23. Document the number of retweets identified from Twitter.|
|This step is only relevant if you’re going to order your tweets by time (eg for a conference).
Step 24. Go back to browser and open up the tweet to extract the date and time it was posted (highlight by clicking on date, Control C to copy). Don’t worry if this is in a different time zone – it’ll be the same for all the tweets.
|Step 25. Paste into Excel|
|Step 26. Repeat|
|Step 27. Order by number of retweets to identify the top tweets|
|Step 28. Order chronologically if you are summarising a conference|
|Your data should look something like this depending on the rules you have applied – eg the number of retweets you are going to include, the weighting that you have given to different days etc|
|Step 29. You’re now going to use this information to produce a Storify summary. If you don’t have a Storify account set one up|
|Step 30. Find the individual tweets by taking the URLs in your Excel list and searching for them in Storify (click the Twitter logo, select search and paste the first URL|
|Step 31. Storify will find the tweet|
|Step 32. Drag it across into the Storify summary – and repeat…
Complete the title and description once you’re ready