We face several complex challenges: the unique language used by microbloggers; the problem of source reputation; the security and abuse of accounts; and, lastly, whether news actually breaks on Twitter rather than on news sites.
These challenges were among the topics I discussed in a presentation to over 500 delegates at the Global Derivatives Quant Conference in Amsterdam last week. My topic was “A New Era: Big News Data Disrupting Financial Markets” and it was well received.
The unique language used by microbloggers can be partially handled by developing a custom lexicon or dictionary, for example, one that recognises a list of hashtags, cashtags, abbreviations and acronyms. But that doesn’t solve everything - what if someone mentions Total, the French energy company, but doesn’t use $TOT? The solution has to recognise that this is a proper noun, and not a common noun, adjective or verb, by putting the word in context. That’s quite difficult when the normal rules of sentence structure don’t apply.
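To make the lexicon idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any production system: the lexicon entries, the context cue words and the function name tag_entities are all hypothetical, and the "context" check is a crude word co-occurrence test rather than real named-entity recognition.

```python
import re

# Hypothetical lexicon: explicit cashtags, hashtags and acronyms mapped to entities.
LEXICON = {
    "$TOT": "Total SA",
    "$AAPL": "Apple Inc",
    "#FED": "Federal Reserve",
    "ECB": "European Central Bank",
}

# Hypothetical ambiguous words: only treated as an entity when at least one
# context cue appears elsewhere in the same tweet.
AMBIGUOUS = {
    "total": ("Total SA", {"oil", "energy", "gas", "refinery", "shares", "ceo"}),
}

# A token is a word, optionally prefixed by '$' (cashtag) or '#' (hashtag).
TOKEN_RE = re.compile(r"[$#]?\w+")


def tag_entities(tweet):
    """Return the entities mentioned in a tweet, in order of first appearance."""
    tokens = TOKEN_RE.findall(tweet)
    lowered = {t.lower() for t in tokens}
    found = []
    for tok in tokens:
        # 1. Exact lexicon hit (case-insensitive): cashtag, hashtag or acronym.
        entity = LEXICON.get(tok.upper())
        # 2. Ambiguous plain word: accept only if a context cue co-occurs.
        if entity is None and tok.lower() in AMBIGUOUS:
            name, cues = AMBIGUOUS[tok.lower()]
            if lowered & cues:
                entity = name
        if entity and entity not in found:
            found.append(entity)
    return found


if __name__ == "__main__":
    print(tag_entities("Total shares jump after North Sea oil find"))
    print(tag_entities("total cost of the trip was $5"))
    print(tag_entities("Long $aapl into earnings"))
```

The first example tweet is tagged as Total SA because "shares" and "oil" appear alongside the word, while the second is left untagged. A cue list like this breaks down quickly on real tweets, which is exactly the difficulty described above.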
Now, assuming we can make out the entity (company, commodity, currency, organisation or place) that's being referred to, and the event type (corporate, geopolitical or economic), we have to be able to trust the source. A tweet coming from an average guy on the street is obviously less important than one from a hedge fund manager or activist investor. So there has to be some measure of “clout”, but more specifically of “financial clout”. The infamous Apple tweet from Carl Icahn after he had spoken to Tim Cook is a great example of financial clout. From the chart below you can see Icahn obviously has massive financial clout, even though he has relatively few followers.
So, the source management system has to consider the position of the person who owns the Twitter handle, in addition to traditional measures of clout like the number of followers and the number of retweets. We haven't seen a good system that considers both yet, so I would still be hesitant about doing event-based trading on individual tweets unless there’s a very good mechanism for monitoring financial clout.
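As a rough illustration of what combining the two kinds of measure might look like, the Python sketch below blends a role-based weight with log-scaled follower and retweet counts. Everything in it is an assumption made for the example: the role categories, the weights, the scaling caps, the 70/30 split and the sample accounts are hypothetical and uncalibrated.

```python
import math
from dataclasses import dataclass

# Hypothetical role weights (0 to 1); illustrative only, not calibrated.
ROLE_WEIGHT = {
    "activist_investor": 1.0,
    "hedge_fund_manager": 0.9,
    "journalist": 0.5,
    "unknown": 0.1,
}


@dataclass
class Source:
    handle: str
    role: str            # position of the account owner
    followers: int       # traditional clout measure
    avg_retweets: float  # traditional clout measure


def reach_score(source, max_followers=10_000_000, max_retweets=10_000):
    """Log-scale followers and retweets into a 0-1 'traditional clout' score."""
    f = math.log1p(source.followers) / math.log1p(max_followers)
    r = math.log1p(source.avg_retweets) / math.log1p(max_retweets)
    return min(1.0, 0.5 * f + 0.5 * r)


def financial_clout(source, role_share=0.7):
    """Blend the owner's position with reach; role_share is an assumed weighting."""
    role = ROLE_WEIGHT.get(source.role, ROLE_WEIGHT["unknown"])
    return role_share * role + (1 - role_share) * reach_score(source)


if __name__ == "__main__":
    # Made-up accounts for illustration.
    investor = Source("@example_investor", "activist_investor", 50_000, 200)
    celebrity = Source("@example_celebrity", "unknown", 5_000_000, 3_000)
    for s in (investor, celebrity):
        print(s.handle, round(financial_clout(s), 3))
```

With these assumed weights the investor account scores well above the celebrity account despite having a hundredth of the followers. The hard part in practice is not the arithmetic but maintaining a reliable mapping from handles to positions.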
The Icahn case study does raise another topic, and that’s the abuse of a Twitter account - or even of one's financial clout. It could be alleged that Icahn deliberately manipulated the market with that tweet. After all, the tweet came out on August 13 2013, just a day after Icahn Enterprises L.P. filed an 8-K form telling us he’d be using Twitter to reveal “material information”. The tweet wasn't really news either - we were merely told Icahn “had a large position” and was bullish on Apple. He had not even bought more stock.
It’s a grey area whether this use of Twitter was manipulative, but there is a real danger of a financial market participant engaging in market abuse via Twitter, or of a Twitter account being compromised (there’s a long history of such occurrences). These problems are very difficult to solve using text analytics.
One possible answer to the abuse problem is to rely only on sources you know to be trustworthy, i.e. a named list. While this should solve the issue, it undermines the purpose of mining Twitter for market-moving news in the first place.
But how much unexpected news actually breaks on Twitter? Our research suggests less than you might think - a view backed up by researchers at the Universities of Edinburgh and Glasgow. A lot of the breaking news on Twitter comes from news publications trying to drive traffic to their own sites.
In summary, text analysis of social media is a very attractive opportunity - but fraught with difficulty. For now, users should focus on the ‘wisdom of the crowd’ rather than the ‘wisdom of one’, and when tempted to do the latter, make sure the originator has plenty of financial clout.