Introduction: Readers of this blog will know that I have been looking at extracting public health content from Twitter over the past year. In this analysis I bring together a series of NodeXL social network analyses, extracted over the past 2 days, to see what they can tell us about physical activity and health. I have termed this a “person, topic, organisation, network and campaign” approach (PTONC, perhaps pronounced “Pétanque” given the physical activity theme?).
This work was prompted by a request from Ann Gates of ExerciseWorks on Saturday 7 October 2017. ExerciseWorks is a prominent physical activity focused Twitter account based in the UK but with a global reach. Ann wanted to demonstrate her Twitter following to a physiotherapy audience and asked if I could produce a NodeXL map (figure 1). I thought it would be interesting from a CPD (continuing professional development) perspective to go beyond the interactions shown in figure 1 and examine the content of the most shared tweets, for ExerciseWorks and for other NodeXL searches.
Figure 1: @ExerciseWorks NodeXL map 27 Sep to 7 Oct 2017
Source: NodeXL graph gallery
Methods: NodeXL allows us to extract information by searching Twitter content for terms used in the body of a tweet: for example a phrase (eg “physical activity”), a hashtag (eg #physicalactivity) or a Twitter username (eg @ExerciseWorks). Hashtags, words and usernames can also be combined, and searches can be further refined by date (whole days) and geocode. NodeXL extracts information about the tweets that use these terms; for a username (eg @ExerciseWorks) this means both tweets by that account and tweets by others that mention it, ie tweets by or about that user. NodeXL captures the tweets (and retweets) meeting the search criteria during the period of the extract. This period is limited to about 9-10 days, but there is no such limit on the original tweets that were retweeted during it: for example, a tweet by or about @ExerciseWorks posted in 2012 would feature in a current NodeXL extract if anybody had found and retweeted it over the past few days.
For popular search terms these NodeXL extracts can contain many thousands of tweets by thousands of users, which would be overwhelming to analyse using a Twitter search alone. As described previously, there are straightforward ways to extract “top tweets” from these NodeXL extracts, so we can look at both interactions (using NodeXL reports) and content. I thought it would be useful to look beyond reach (both by Twitter username and geography) to explore the content, and offered to perform the relevant analysis for ExerciseWorks.
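To make the inclusion rule concrete, here is a minimal Python sketch. It is not how NodeXL works internally: the `Tweet` record, its fields and the `originals_in_extract` function are hypothetical, and the only rule assumed is the one described above, that rows are selected by the date of the tweet or retweet itself, so a retweet inside the window brings its original along however old that original is.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Tweet:
    """Hypothetical, simplified tweet record (not the NodeXL data model)."""
    tweet_id: str
    author: str
    posted: date
    retweet_of: Optional["Tweet"] = None  # the original, if this row is a retweet


def originals_in_extract(rows: Iterable[Tweet], start: date, end: date) -> List[Tweet]:
    """Return the distinct original tweets represented in a start..end extract.

    A row qualifies if it was posted inside the window. For a retweet, the
    original is kept whatever its own posting date.
    """
    found = {}
    for row in rows:
        if not (start <= row.posted <= end):
            continue  # outside the extract window
        original = row.retweet_of if row.retweet_of is not None else row
        found[original.tweet_id] = original
    return list(found.values())


# Illustrative rows only (not real data)
old = Tweet("t1", "ExerciseWorks", date(2012, 5, 1))
rt = Tweet("t2", "another_user", date(2017, 10, 5), retweet_of=old)
new = Tweet("t3", "ExerciseWorks", date(2017, 10, 2))
too_early = Tweet("t4", "ExerciseWorks", date(2017, 9, 1))

picked = originals_in_extract([old, rt, new, too_early], date(2017, 9, 27), date(2017, 10, 7))
print(sorted(t.tweet_id for t in picked))  # ['t1', 't3']
```

The 2012 tweet is picked up only because it was retweeted inside the window, while the tweet from 1 September is not.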
In order to put the ExerciseWorks tweets in context I have looked at three other NodeXL extracts:
- BJSM_BMJ (a high impact sports, exercise and physical activity academic journal with tweets by highly engaged editors)
- A general search for #PhysicalActivity OR (“physical activity” AND “health”) which I will refer to subsequently as the “#PhysicalActivity search”
- The recent European Week of Sport campaign (23-30 September; tweets using the #BeActive hashtag). I noted this as a popular hashtag in the #PhysicalActivity search and decided to take a closer look.
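The #PhysicalActivity search combines a hashtag with a Boolean phrase condition. The sketch below applies the logic of #PhysicalActivity OR (“physical activity” AND “health”) to tweet text. It assumes case-insensitive, whole-word matching; Twitter’s own search rules (tokenisation, punctuation handling) may differ, and `matches_pa_search` is a hypothetical helper, not part of NodeXL.

```python
import re

# Assumed matching rules: case-insensitive, whole words/hashtags only.
HASHTAG = re.compile(r"#physicalactivity\b", re.IGNORECASE)
PHRASE = re.compile(r"\bphysical activity\b", re.IGNORECASE)
HEALTH = re.compile(r"\bhealth\b", re.IGNORECASE)


def matches_pa_search(text: str) -> bool:
    """#PhysicalActivity OR ("physical activity" AND "health")."""
    if HASHTAG.search(text):
        return True
    return bool(PHRASE.search(text)) and bool(HEALTH.search(text))


examples = [
    "Get moving today! #PhysicalActivity",               # hashtag
    "Physical activity is good for your mental health",  # phrase AND word
    "Physical activity is fun",                           # phrase only
    "#PhysicalActivityWeek starts now",                   # a different hashtag
]
for text in examples:
    print(matches_pa_search(text), text)
```

Under these assumptions only the first two examples match; “healthy” would not count as “health”, because the sketch requires whole words.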
I was interested to see if these three searches identified different content to the ExerciseWorks search, or whether most of the top materials were captured by one or other of the searches.
I set simple rules to produce a manageable number of tweets, as described in the summaries. Using the PTONC categories: ExerciseWorks = person and network; BJSM = person and organisation/network; the #PhysicalActivity search term = topic; #BeActive = campaign.
Results: The “top tweet” summaries of the four analyses are linked in the bulleted list below. I knew in advance that there was likely to be some overlap between searches, with ExerciseWorks and BJSM mentioning and retweeting each other, and ExerciseWorks frequently using hashtags and terms covered by the #PhysicalActivity search. The #BeActive search largely featured European Week of Sport (EWOS) tweets, as the organisers intended, but there were also some more general tweets (it is a relatively non-specific hashtag) and some spam tweets. The non-EWOS tweets did not feature in the top 3 shown in table 1, and the spam posts were excluded from the analysis.
The following links go to the Storify summaries, which include links to the data, method and top tweet lists (they can be viewed without a Twitter account). Within each summary, the “top tweet” lists are split into tweets posted before, and tweets posted during, the dates listed below.
- @ExerciseWorks 27 Sep to 7 Oct
- @BJSM_BMJ 27 Sep to 7 Oct
- #PhysicalActivity 28 Sep to 8 Oct (see NodeXL report for full search term)
- #BeActive 29 Sep to 8 Oct (mainly European Week of Sport); the top 3 tweets in table 1 are from the “week” itself, which campaign materials give as 23-30 September.
Rather than produce a statistical analysis of tweet content, I have listed the top 3 tweets (by number of retweets) for each search over the periods studied (the dates given in the bulleted list above). These tweets are shown in table 1.
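The “top 3 by retweets” rule is simple enough to sketch in a few lines. This is a minimal illustration, not necessarily the method used for the Storify summaries: it assumes each NodeXL extract has been exported to rows holding a search label, a tweet identifier, the author and the original tweet’s retweet count (the `Row` type and `top_tweets` function are hypothetical), and it keeps one entry per tweet because the same original can appear once for every retweet.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Row(NamedTuple):
    """Hypothetical row from a NodeXL extract exported to a simple table."""
    search: str    # eg "EW", "BJSM", "PA", "BA"
    tweet_id: str
    author: str
    retweets: int  # retweets of the original tweet recorded in this row


def top_tweets(rows: Iterable[Row], n: int = 3) -> Dict[str, List[Row]]:
    """Return the n most retweeted distinct tweets for each search."""
    best: Dict[str, Dict[str, Row]] = defaultdict(dict)
    for row in rows:
        seen = best[row.search].get(row.tweet_id)
        # Keep one entry per tweet, with the highest count recorded for it
        if seen is None or row.retweets > seen.retweets:
            best[row.search][row.tweet_id] = row
    return {
        search: sorted(tweets.values(), key=lambda r: r.retweets, reverse=True)[:n]
        for search, tweets in best.items()
    }


# Illustrative rows only (not the figures from these extracts)
rows = [
    Row("EW", "a", "user1", 10),
    Row("EW", "b", "user2", 50),
    Row("EW", "a", "user1", 12),  # same tweet seen again with a later count
    Row("EW", "c", "user3", 5),
    Row("EW", "d", "user4", 30),
    Row("BA", "e", "user5", 7),
]
top = top_tweets(rows)
print([r.tweet_id for r in top["EW"]])  # ['b', 'd', 'a']
```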
Table 1: Top 3 tweets (by retweets) for each of the 4 NodeXL extracts on a physical activity theme.
Source: Data from NodeXL extracts, with screengrabs of individual tweets from Twitter.
* These are the same tweet; because it uses an animated image (GIF file), the images captured in the table differ.
The numbers of retweets recorded for each search’s 3 “top tweets” are shown in figure 2.
Figure 2: Number of retweets by search (EW = @ExerciseWorks; BJSM = @BJSM_BMJ; PA = #PhysicalActivity; BA = #BeActive)
Source: NodeXL extracts provided in each of the Storify summaries listed above.
Discussion: This rapid summary of recent physical activity themed tweets has demonstrated the distinctive character of tweets extracted by person/organisation/network (ExerciseWorks and BJSM_BMJ), topic (#PhysicalActivity) and campaign (#BeActive). While the top 3 tweets in table 1 give only a snapshot, they are a fair representation of the wider set of tweets identified by the NodeXL searches. The tweets identified in the ExerciseWorks and BJSM_BMJ searches show that these high profile Twitter accounts (with 50.7k and 43.8k followers respectively) are effective broadcasters, but are also part of a “network” in which users are likely to mention, reply to and retweet each other. Plenty of other tweeters are included in the ExerciseWorks and BJSM_BMJ “top tweet” lists (summarised on the Storify site). The full “top tweet” lists in the Storify summaries show that ExerciseWorks tweets have a general physical activity/advocacy theme. In contrast, BJSM_BMJ tweets for the period studied have a more specialist focus, including sports injury and rehabilitation, with particular emphasis on journal content (ie new research and news stories). These differences are captured effectively in the “top 3” lists in table 1.
Unsurprisingly, because they are hashtag/phrase based searches rather than searches focused on a single Twitter username, the #PhysicalActivity and #BeActive searches include a more diverse range of tweeters. The most retweeted tweets for these searches are dominated by Twitter users with large numbers of followers, including national and international sports and health organisations, journals and, for the #PhysicalActivity search, ExerciseWorks. The #PhysicalActivity search has the broadest scope of the four searches and includes a considerable number of health related tweets, as would be anticipated from the search term (#PhysicalActivity OR (“physical activity” AND “health”)). This search term identified tweets that were advocacy or evidence focused, and sometimes both. As this blog is based in Scotland, it is interesting to note that several of the top tweets for the #PhysicalActivity search come from Scotland, with high profile tweeters including @CyclingSurgeon and @DocAndrewMurray providing hugely popular and informative content.
I have not attempted a formal comparison of the tweets included in the different analyses, but figure 2 shows the huge reach of @BMJ_latest. It also shows that, for this comparison at least, the most popular tweets from @ExerciseWorks and @BJSM_BMJ are broadly similar in reach (plausible given their similar numbers of followers). The European campaign achieved a slightly lower profile, but had a range of high level and grassroots supporters from several European countries. The content of the #BeActive tweets was largely advocacy/awareness raising, but the campaign also provided an opportunity to share new findings.
It should be noted that the #BeActive analysis has potential limitations. This was a post hoc extract, and the start of the campaign (23 September) falls outside the 9-10 day window available to the NodeXL tool. Nonetheless, if a popular tweet from 23-30 September was retweeted during the period available (29 Sep to 8 Oct), it would have been picked up by the analysis, which is based on the total number of retweets up to the date of the extract rather than on retweets during the narrower period available. There seems to be a reasonable spread of tweets from the campaign period in the full #BeActive summary, with highly retweeted tweets included in the top 3 shown in table 1. The ability to capture tweets that attract interest beyond the period of a campaign is potentially useful in studying Twitter activity, particularly during large global campaigns, when the Twitter/NodeXL pairing may only provide extracts covering much shorter periods than the typical 9-10 days. It is possible that a few hours of tweets (sometimes all that is available) could be used to identify the largest tweets from preceding days. I plan to explore this for the forthcoming World Mental Health Day campaign (10 October 2017); NodeXL extracts of #MentalHealth tweets in the days leading up to this campaign have been limited to a few hours.
In tweets and previous blogs I have described the methods that I use to extract top content from NodeXL reports as “big data” approaches, partly because the term is useful shorthand when sharing and promoting content on social media. While the work described here does not fully meet a formal definition of big data, it shares several attributes. The approaches used here could be fully automated; could be produced in close to real time; already provide insights into individual and collective behaviour (the posts themselves, and users’ selection of popular content through retweets); provide insights into structured and unstructured data (through the search terms used and the “rules” around dates and number of retweets); and offer some ability to fill gaps in data through the capture of retweets in later extracts (eg see the #BeActive summary described above). There is also, no doubt, potential for machine learning to understand, predict and generate successful content based on information stored in NodeXL extracts, but that is a step too far for my liking.
Conclusion: This has been a useful exercise in understanding and describing the uses of Twitter and NodeXL in digital curation for a Public Health related topic. As with a literature review, it is important to consider a number of different search terms when looking at Twitter content. The outputs from each of the 4 searches here differ and provide potentially useful information, though of course popularity does not necessarily equate to accuracy. The method can be applied rapidly (no more than 30 minutes to produce a top tweet list once a NodeXL extract is obtained, and often much faster for smaller searches). As noted previously, it would be good to automate this top tweet method, but that is a discussion for another day.
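The idea of using a short extract to find the largest tweets from earlier days can be sketched as follows. The assumptions are hypothetical: each row in a short extract is a retweet carrying its original tweet’s identifier, posting date and cumulative retweet count at the time of the extract (the `Retweet` type and `largest_from_period` function are illustrative names, not NodeXL features). Ranking uses the cumulative count, mirroring the point above that the analysis counts total retweets rather than retweets within the extract window.

```python
from datetime import date
from typing import Dict, Iterable, List, NamedTuple


class Retweet(NamedTuple):
    """Hypothetical retweet row from a short (eg few-hour) extract."""
    original_id: str
    original_posted: date    # when the original tweet was posted
    original_total_rts: int  # cumulative retweets of the original at extract time


def largest_from_period(
    rows: Iterable[Retweet], start: date, end: date, n: int = 3
) -> List[Retweet]:
    """Rank originals posted between start and end by cumulative retweets."""
    best: Dict[str, Retweet] = {}
    for row in rows:
        if not (start <= row.original_posted <= end):
            continue  # original was not posted during the period of interest
        seen = best.get(row.original_id)
        # Keep the highest count recorded for each original
        if seen is None or row.original_total_rts > seen.original_total_rts:
            best[row.original_id] = row
    return sorted(best.values(), key=lambda r: r.original_total_rts, reverse=True)[:n]


# Illustrative rows only (not real data)
rows = [
    Retweet("a", date(2017, 9, 24), 120),
    Retweet("b", date(2017, 9, 20), 500),  # posted before the campaign week
    Retweet("a", date(2017, 9, 24), 125),
    Retweet("c", date(2017, 9, 29), 40),
]
top = largest_from_period(rows, date(2017, 9, 23), date(2017, 9, 30))
print([(r.original_id, r.original_total_rts) for r in top])  # [('a', 125), ('c', 40)]
```

Here tweet “b” is excluded despite its high count because it was posted before the period of interest, which is the filter that distinguishes this sketch from a plain top tweet list.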
Dr Graham Mackenzie, Consultant in Public Health, NHS Lothian, 8 October 2017