May 14, 2014
As we expand RavenPack's coverage from regular news sources to social media, including Twitter, we face several complex challenges: the unique language used by microbloggers, the problem of source reputation, security issues and the abuse of accounts and, lastly, whether news actually breaks on Twitter rather than on news sites.
I discussed these challenges in a presentation to over 500 delegates at the Global Derivatives Quant Conference in Amsterdam last week. My talk, “A New Era: Big News Data Disrupting Financial Markets”, was well received.
The unique language of microbloggers can be partially handled by developing a custom lexicon or dictionary, for example one that recognises hashtags, cashtags, abbreviations and acronyms. But that doesn't solve everything: what if someone mentions Total, the French energy company, but doesn't use the cashtag $TOT? The system has to use context to determine that the word is a proper noun and not a regular noun, adjective or verb. That's quite difficult when the normal rules of sentence structure don't apply.
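To make the lexicon idea concrete, here is a minimal Python sketch; it is not RavenPack's actual method. The lexicon entries, the cue-word list and the three-token context window are hypothetical assumptions chosen purely for illustration. A real system would need a much larger curated lexicon and a proper entity-recognition model rather than a keyword heuristic.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon entries, for illustration only.
CASHTAGS = {"$TOT": "Total (French energy company)", "$AAPL": "Apple"}
ABBREVIATIONS = {"imo": "in my opinion", "ath": "all-time high"}
# Hypothetical cue words suggesting a bare "Total" refers to the company.
COMPANY_CUES = {"oil", "shares", "stock", "energy", "crude", "earnings", "ceo"}

# Cashtags ($ plus 1-5 letters), hashtags, or plain words.
TOKEN_RE = re.compile(r"\$[A-Za-z]{1,5}\b|#\w+|[A-Za-z']+")


@dataclass
class Tag:
    token: str
    kind: str
    value: str


def tag_tweet(text: str) -> list[Tag]:
    """Tag cashtags, hashtags, abbreviations and a context-dependent 'Total'."""
    tokens = TOKEN_RE.findall(text)
    tags = []
    for i, tok in enumerate(tokens):
        if tok.startswith("$"):
            key = tok.upper()
            if key in CASHTAGS:
                tags.append(Tag(tok, "cashtag", CASHTAGS[key]))
        elif tok.startswith("#"):
            tags.append(Tag(tok, "hashtag", tok[1:].lower()))
        elif tok.lower() in ABBREVIATIONS:
            tags.append(Tag(tok, "abbreviation", ABBREVIATIONS[tok.lower()]))
        elif tok == "Total":
            # Crude context check: look for a finance/energy cue word
            # within three tokens either side of the capitalised "Total".
            window = {t.lower() for t in tokens[max(0, i - 3): i + 4]}
            if window & COMPANY_CUES:
                tags.append(Tag(tok, "entity", CASHTAGS["$TOT"]))
    return tags


print([t.kind for t in tag_tweet("Total shares up on crude rally #oil $AAPL imo")])
# ['entity', 'hashtag', 'cashtag', 'abbreviation']
```

Note that "Total nonsense, what a day" produces no tags, because no cue word appears near "Total"; this is exactly the kind of contextual judgement that becomes fragile once tweets abandon normal sentence structure.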
Now, assuming we can identify the entity being referred to (company, commodity, currency, organisation or place) and the event type (corporate, geopolitical or economic), we have to be able to trust the source. A tweet from an average person on the street is obviously less important than one from a hedge fund manager or activist investor. So there has to be some measure of “clout”, and more specifically of financial clout. The infamous Apple tweet Carl Icahn sent after speaking to Tim Cook is a great example. From the chart below you can see that Icahn has massive financial clout, even though he has relatively few followers.
So the source management system has to consider the position of the person who owns the Twitter handle in addition to traditional measures of clout such as the number of followers and retweets. We haven't yet seen a good system that considers both, so I would still be hesitant to do event-based trading on individual tweets unless there is a very good mechanism for monitoring financial clout.
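As an illustration of how a source score might blend both kinds of clout, the Python sketch below combines a log-scaled follower and retweet score with a weight for the handle owner's position. The role categories, weights, normalisation ceilings and the 0.7 financial weighting are all hypothetical assumptions; they are not a system described in this post or used by RavenPack, and the example accounts are made up.

```python
import math
from dataclasses import dataclass

# Hypothetical weights for the position of the person behind a handle.
ROLE_WEIGHTS = {
    "activist_investor": 1.0,
    "hedge_fund_manager": 0.8,
    "company_executive": 0.8,
    "journalist": 0.4,
    "unknown": 0.05,
}


@dataclass
class Source:
    handle: str
    followers: int
    avg_retweets: float
    role: str = "unknown"


def social_clout(src: Source) -> float:
    """Traditional clout from followers and retweets, capped at 1.0.

    Log scaling stops a few huge accounts from dominating; dividing by 8 and 5
    assumes ~100M followers and ~100k retweets as rough ceilings (assumption).
    """
    f = math.log10(1 + src.followers) / 8
    r = math.log10(1 + src.avg_retweets) / 5
    return min(1.0, 0.5 * f + 0.5 * r)


def combined_clout(src: Source, financial_weight: float = 0.7) -> float:
    """Weighted blend of financial clout (position) and social clout."""
    financial = ROLE_WEIGHTS.get(src.role, ROLE_WEIGHTS["unknown"])
    return financial_weight * financial + (1 - financial_weight) * social_clout(src)


# Made-up accounts for illustration.
activist = Source("@activist_example", followers=100_000, avg_retweets=1_000,
                  role="activist_investor")
celebrity = Source("@celebrity_example", followers=10_000_000, avg_retweets=5_000)
print(combined_clout(activist) > combined_clout(celebrity))  # True
```

With these inputs the activist investor with 100,000 followers outranks an account of unknown standing with 10 million followers, mirroring the Icahn point: weighting position heavily lets a small but financially powerful account score above a popular one.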
The Icahn case study does raise another topic: the abuse of a Twitter account, or even of one's financial clout. It could be alleged that Icahn deliberately manipulated the market with that tweet. After all, it came out on August 13, 2013, just a day after a filing by Icahn Enterprises L.P. told us he'd be using Twitter to reveal “material information”. The tweet wasn't really news either: we were merely told Icahn “had a large position” and was bullish on Apple. He had not even bought more stock.
It’s a grey area whether this use of Twitter was manipulative, but there is a real danger of a financial market participant engaging in market abuse via Twitter, or of a Twitter account being compromised (there is a long history of such incidents). These problems are very difficult to solve using text analytics.
One possible answer to the abuse problem is to rely only on sources you know to be trustworthy, i.e. a named list. While this should solve the issue, it undermines the purpose of mining Twitter for market-moving news in the first place.
But how much unexpected news actually breaks on Twitter? Our research suggests less than you might think, a view backed up by research at the Universities of Edinburgh and Glasgow. A lot of the breaking news on Twitter comes from news publications trying to drive traffic to their own sites.
In summary, text analysis of social media is a very attractive opportunity, but one fraught with difficulty. For now, users should focus on the ‘wisdom of the crowd’ rather than the ‘wisdom of one’, and when tempted to rely on the latter, make sure the originator has plenty of financial clout.