Peter Hafez - Chief Data Scientist - RavenPack | August 16, 2016
Here at RavenPack, we are currently developing our next-generation news analytics platform which includes many exciting new features and enhancements.
To showcase the power of our new platform, we wanted to look at a current high-profile economic and political event, and what better case than the US Presidential Election taking place on November 8th?
Before starting that planned blog series, we wanted to test its methodology, and the recent Brexit referendum provided the perfect context for the type of analysis we’ll be performing in the run-up to the US Presidential Election. In this post, we walk through our methodology and our analysis of key entities and media sentiment throughout the Brexit campaign, in an effort to understand how good our next-gen news analytics platform is at predicting the outcome of political events.
Our next-gen analytics platform allows us to track many new types of entities, such as people and products, and many additional places and organizations. The ability to track and analyze people in the media, in particular, makes this Brexit analysis very interesting. Using this newly available data, we were able to form our own ‘Remain’ and ‘Leave’ groups and analyze the evolution of news sentiment around these groups throughout the campaign. Our initial hypothesis was that news sentiment data for the ‘Remain’ and ‘Leave’ groups would be a strong indicator of poll results and, therefore, of the outcome of the UK referendum on leaving the European Union.
Since events like elections occur so rarely, it is difficult to build a statistically robust model to predict the outcome using news analytics data. That said, the data can certainly be a strong indicator in assessing the probability of any given outcome. As we analysed the media sentiment for the two groups throughout the referendum campaign, it became clear that, although Remain started with an advantage and built on it in the early stages, momentum swung decisively behind Leave in mid-May. Remain was never able to recover from this shift, and by June 23rd, when British voters went to the polls, media sentiment was still pointing in favor of Leave despite survey-based polls showing a slight advantage for Remain. Below, we’ll go into more detail on our methodology and then look at the data to see how the situation developed in the run-up to the referendum, in which the UK decided to leave the EU.
Before we can begin analyzing the media sentiment trends for the ‘Remain’ and ‘Leave’ camps in the run up to the referendum, we need to build our ‘Remain’ and ‘Leave’ groups of entities. We did this by looking at the entities in the system most commonly mentioned in stories where the term ‘Brexit’ appears in the headline. The methodology produced the following ‘Remain’ and ‘Leave’ groups.
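As a rough illustration of the selection step described above, the following Python sketch counts how often each entity is tagged in stories whose headline contains ‘Brexit’. The record layout (`headline`, `entities`), the function name and the sample stories are assumptions made for this example and do not reflect RavenPack’s actual data format. The post does not say how entities were then assigned to the ‘Remain’ or ‘Leave’ camp, so that step is left out.

```python
from collections import Counter

def top_brexit_entities(stories, term="brexit", top_n=10):
    """Rank entities by the number of stories that mention them
    and whose headline contains `term` (case-insensitive)."""
    counts = Counter()
    for story in stories:
        if term in story["headline"].lower():
            # Count each entity at most once per story.
            counts.update(set(story["entities"]))
    return counts.most_common(top_n)

# Hypothetical sample stories, for illustration only.
sample_stories = [
    {"headline": "Brexit vote looms", "entities": ["David Cameron", "Boris Johnson"]},
    {"headline": "Cameron warns on BREXIT risks", "entities": ["David Cameron"]},
    {"headline": "Markets rally", "entities": ["Bank of England"]},
    {"headline": "Brexit: Farage speaks",
     "entities": ["Nigel Farage", "David Cameron", "Boris Johnson"]},
]

ranking = top_brexit_entities(sample_stories, top_n=2)
```

The highest-ranked entities would then be candidates for the two watchlists.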
Tracking the news volume for each of the individual entities in our two groups also reveals that there are what we call Key Players in each group. In the graphic below, we can see how the volume of news related to each of the key players accelerated in the run-up to the referendum itself.
David Cameron received the largest share of media coverage in the run-up to the referendum. Boris Johnson and Nigel Farage also received substantial press coverage, while the other key players received considerably less.
Now that we are armed with our two ‘watchlists’ of entities, we can start to analyze whether the aggregate media sentiment might have helped us predict the result of the referendum.
Calculating the average sentiment of the two groups was relatively straightforward: for each entity, we take the mean event sentiment score per day across the stories in which the entity is relevant and ‘Brexit’ appears in the headline. For the polling information, we used the Financial Times’ Poll of Polls to gather polling data on both the Remain and Leave camps.
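One way to read the sentiment calculation is sketched below in Python: compute each entity’s mean score per day over Brexit-headline stories, then average those entity means within each group. This is only one interpretation of the description. The field names, the function name, the group membership and the scores are all assumptions made for illustration, not RavenPack’s actual schema or data.

```python
from collections import defaultdict
from statistics import mean

def daily_group_sentiment(events, groups, term="brexit"):
    """Return {group: {date: sentiment}}, where sentiment is the mean of
    the per-entity daily mean scores over stories with `term` in the headline."""
    per_entity = defaultdict(list)  # (date, entity) -> list of scores
    for event in events:
        if term in event["headline"].lower():
            per_entity[(event["date"], event["entity"])].append(event["sentiment"])

    per_group = {name: defaultdict(list) for name in groups}
    for (date, entity), scores in per_entity.items():
        for name, members in groups.items():
            if entity in members:
                per_group[name][date].append(mean(scores))

    return {name: {d: mean(v) for d, v in sorted(days.items())}
            for name, days in per_group.items()}

# Hypothetical watchlists and made-up scores, for illustration only.
groups = {"Remain": {"David Cameron"}, "Leave": {"Boris Johnson", "Nigel Farage"}}
events = [
    {"date": "2016-05-16", "entity": "Boris Johnson", "headline": "Johnson makes Brexit case", "sentiment": 0.5},
    {"date": "2016-05-16", "entity": "Boris Johnson", "headline": "Brexit bus row", "sentiment": -0.5},
    {"date": "2016-05-16", "entity": "Nigel Farage", "headline": "Farage on Brexit", "sentiment": 0.5},
    {"date": "2016-05-16", "entity": "David Cameron", "headline": "PM issues Brexit warning", "sentiment": 0.25},
    {"date": "2016-05-16", "entity": "David Cameron", "headline": "PM visits factory", "sentiment": -1.0},
]

sentiment = daily_group_sentiment(events, groups)
```

The last event is ignored because its headline does not contain ‘Brexit’.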
In the graphic below, we then chart the difference between Remain and Leave for each data set. The POLL_SPREAD_REMAIN_LEAVE series charts the difference between the Remain share of the vote and the Leave share. When this series is above zero, the advantage goes to Remain, while a value below zero means that the Leave camp had the advantage in the polls at the time.
Similarly for sentiment, we plotted the difference between the Remain and Leave camps’ average media sentiment. When SENTIMENT_SPREAD_REMAIN_LEAVE is above zero, it indicates that the media mood was in favour of Remain, while a reading below zero indicates a mood in favour of Leave. Both time series use a five-day rolling average to smooth out spikes in the data and make the trends more apparent.
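The spread and smoothing steps can be sketched in a few lines of Python. The post does not specify whether the five-day window is trailing or centred, or how missing days are handled. The sketch assumes a trailing window over the last five available observations, and the poll numbers are made up for illustration.

```python
def spread(remain, leave):
    """Remain minus Leave for every date present in both series."""
    return {d: remain[d] - leave[d] for d in sorted(remain) if d in leave}

def rolling_mean(series, window=5):
    """Trailing mean over the last `window` observations (assumed convention).
    Dates are ISO strings, so sorting them gives chronological order."""
    dates = sorted(series)
    smoothed = {}
    for i in range(window - 1, len(dates)):
        values = [series[d] for d in dates[i - window + 1 : i + 1]]
        smoothed[dates[i]] = sum(values) / window
    return smoothed

# Made-up poll shares (in percent), for illustration only.
remain_poll = {"2016-06-01": 46, "2016-06-02": 45, "2016-06-03": 44,
               "2016-06-04": 43, "2016-06-05": 42, "2016-06-06": 41}
leave_poll = {d: 44 for d in remain_poll}

poll_spread = spread(remain_poll, leave_poll)  # positive values favour Remain
poll_spread_smoothed = rolling_mean(poll_spread, window=5)
```

The same two functions apply unchanged to the daily group sentiment series to obtain a smoothed sentiment spread.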
Charting the poll spread and the sentiment spread over time reveals some interesting trends. First, the sentiment spread and the polling spread track each other closely right up until the middle of May, at which point they diverge. Until then, the Remain camp maintained a consistently strong advantage in both media sentiment and the polls.
In mid-May, however, there is a marked downward trend in the sentiment data that corresponds with warnings from both the Bank of England and the IMF regarding the substantial negative impact of a vote in favor of Brexit. Reinforcing the fall in sentiment for Remain, Michael Gove made his now infamous statement that “people in this country have had enough of experts”, which supported the continued swing towards Leave in media sentiment. This period also saw a continued decline in sentiment for David Cameron, in particular, which was mirrored in opinion polls measuring his approval ratings at the time. The decline in sentiment for David Cameron was the primary contributor to the fall for the Remain camp over this period.
The sentiment spread continued to strengthen in favour of the Leave camp throughout June as major UK newspapers such as The Sun and The Daily Telegraph came out in favor of Leave. The rise of the Leave campaign was only arrested by the tragic killing of MP Jo Cox in Birstall, near Leeds, on the 16th of June. After the ensuing campaign suspension ended, we see the trend in the sentiment spread reverse, but it never comes close to moving in favor of Remain and is still below zero (i.e. in favour of Leave) on the day of the referendum, the 23rd of June.
When it came down to it, the polls indicated a 2 percentage point advantage for Remain, while RavenPack’s next-gen event sentiment score, applied to both the Remain and Leave groups, was still pointing strongly in favor of Leave. With the benefit of hindsight, we can say that this was a strong indicator that the UK was, in fact, likely to vote to leave the EU.
Moving forward, we are starting to look at the upcoming US Presidential Election in November 2016. Following the methodology outlined above, we will research and build ‘watchlists’ of key entities related to both Donald Trump’s and Hillary Clinton’s campaigns. Getting the watchlists right is, arguably, the most important step in the analysis. Once we have those watchlists in place, we’ll be able to analyze the media sentiment surrounding the two campaigns. While there is no guarantee that news sentiment will accurately predict the outcomes of events like elections, we see it as a valuable additional information source to incorporate into our models.