June 27, 2017
At the recent “Battle of the Quants” event at the Union Club in New York, I was fortunate enough to give the keynote speech, “Exploiting Alternative Data in the Investment Process”. Given the feedback at the event, I wanted to share more broadly my views on what I believe is required to succeed in today’s world of quant investing.
As a company, RavenPack has been part of the quantitative investment community for almost 15 years and has been able to observe, first hand, how quant investing has risen in prominence over the years. According to the TABB Group, today quantitative hedge funds account for nearly 27% of all stock trading, which is more than any other investor type.
Combined with the explosive growth in the amount of digital data available and the massive influx of capital into quant funds, the alpha landscape has gone through major changes - something which is putting even more traditional quantitative investors under pressure. They need a new formula for success!
According to IDC, 90% of all digital data that exists today has been generated over the last two years, of which almost 80% comes as “hard to consume” unstructured content. This has created incredible opportunities for investors to identify new alpha sources that move beyond traditional fundamental and market data that have seen decreasing efficacy over recent years.
These new alternative data sources include anything from credit card transactions, satellite data, crowd-sourced data, location or foot traffic data to social media sentiment, etc.
In the early days, the most visionary investment firms were able to achieve an informational advantage in the marketplace by hiring dedicated teams of data hunters to scour the world for new and interesting datasets that no one else was using.
However, as the market continues to mature, with more and more sell-side research providing fairly comprehensive overviews of available alternative data sources, this is becoming less of a differentiator.
Recently, J.P. Morgan released a well-received tour de force, titled “Big Data and AI Strategies”, in which they put a host of alternative data providers, including RavenPack, under the microscope.
Today, the edge is no longer found in being the only one to have a particular dataset. Rather, it is all about efficient processing of what is already publicly available (or at least also available to your competitors). Thinking that a proprietary data advantage necessarily leads to a proprietary informational advantage is “old school thinking”, unless you’re Alphabet, Facebook, Amazon, Apple, or perhaps Microsoft.
Even though you may be able to achieve proprietary access to one particular dataset, there may be another 99 datasets out there that provide similar information. In the end, most alternative datasets are focused on providing a nowcast of fundamental data. For example, both credit card transactions and location/foot traffic data can be used to forecast company revenues.
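To make the nowcasting idea concrete, here is a minimal sketch, assuming a simple linear relationship between card-spend growth and reported revenue growth. All numbers and names below (`card_spend_growth`, `reported_revenue_growth`, `nowcast_revenue_growth`) are hypothetical and for illustration only; a real model would need far more history, proper validation, and adjustments for panel bias.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical history: year-over-year growth (%) for six past quarters.
# Card-spend growth is observed during the quarter; revenue growth is
# only reported weeks after the quarter ends.
card_spend_growth = [2.0, 3.5, 1.0, 4.0, 5.5, 3.0]
reported_revenue_growth = [1.5, 3.0, 1.0, 3.5, 5.0, 2.5]

intercept, slope = fit_line(card_spend_growth, reported_revenue_growth)


def nowcast_revenue_growth(current_card_spend_growth):
    """Estimate the not-yet-reported revenue growth from current card spend."""
    return intercept + slope * current_card_spend_growth


# Assumed card-spend growth observed so far in the current quarter.
estimate = nowcast_revenue_growth(4.5)
print(f"Nowcast of revenue growth: {estimate:.2f}%")
```

The same fit could be run on foot traffic data instead of card spend, which is the point made above: different datasets often end up estimating the same fundamental quantity.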
As already described, the big data and quant revolution has significantly impacted the alpha landscape, as seen in the figure below. Compared to the 1950s-70s, when the cross-section of stock returns could be explained by just a few factors that had slow signal decay, today there are hundreds, if not thousands, of potential data-driven alpha sources that mostly have shorter durations.
This is placing massive pressure on established firms, since they need to consume an ever increasing amount of data to achieve the necessary capacity to continue their growth, or even just to maintain their current level of AUM and performance.
Furthermore, since each individual alpha signal contains less marginal value, there is also an additional pressure on cost, i.e. investment firms need to be able to convert data into alpha signals at an ever cheaper rate to be able to capture the available alpha.
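One way to put rough numbers on this pressure is Grinold's “fundamental law of active management”, which approximates the information ratio as IR ≈ IC × √breadth, where IC is the correlation between forecasts and outcomes and breadth is the number of independent bets per year. The law is not part of the talk; it is used here only as a back-of-the-envelope illustration, and the IC values below are made up.

```python
import math


def information_ratio(ic, breadth):
    """Grinold's approximation: IR ~ IC * sqrt(breadth)."""
    return ic * math.sqrt(breadth)


def breadth_needed(target_ir, ic):
    """Independent bets per year needed to reach target_ir at a given IC."""
    return (target_ir / ic) ** 2


# Hypothetical signal strengths: as each signal gets weaker,
# the number of independent bets needed grows quadratically.
for ic in (0.10, 0.05, 0.02):
    bets = breadth_needed(1.0, ic)
    print(f"IC = {ic:.2f}: about {bets:,.0f} independent bets for IR = 1.0")
```

Under this approximation, halving the strength of each signal quadruples the breadth needed for the same information ratio. That is why firms have to process more data at a lower cost per signal.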
Successful investing is truly becoming a “numbers game”. At a high level, this means that we need an ever increasing amount of storage and computing power; and not to forget, data scientists. Unfortunately, we’re not yet at a stage where we can simply plug a bunch of data into an AI and expect that useful alpha signals will come out of it (and I doubt that we will get there anytime soon).
This introduces another challenge: how do investment firms ensure that they can recruit enough data scientists who can turn all their data into valuable alpha signals? Indeed, it isn’t just in finance that data scientists are in high demand. The “war for talent” is real.
It is no longer enough to only search for talent locally. Instead, you need to be able to dip into the global talent pool. To stay on top, several creative solutions have been seen in the market place.
For instance, WorldQuant has already taken the physical growth approach and established several global offices. Other investment firms, such as Two Sigma and Winton Capital, have run several competitions on Kaggle (a Google-owned community of more than 500,000 data scientists) to recruit talented individuals from other data-driven industries.
Firms such as Quantopian and Numerai have taken a different approach. Their entire business model is built around crowd-sourcing alpha signals and building a hedge fund on top of them, which results in very little fixed overhead.
Instead, they rely on talented data scientists using their platform, data, and backtesting engine which have all been made freely available. Even though this model seems attractive as it offers a cost efficient way of tapping into the global talent pool, it also suffers from multiple issues.
An obvious question to ask is whether we truly believe that freelance data scientists have any chance of competing with professional investors. For instance, Quantopian has identified only 50 individuals, out of a total user base of 130,000 data scientists (fewer than 0.04%), to whom it is comfortable providing a capital allocation. Of course, this number may increase over time. However, with such small numbers, it resembles the talent recruitment approach more than a “true” crowd-sourced hedge fund.
Another challenge that these platforms face is that it will be hard to convince institutional data vendors to expose their datasets at low cost to entire communities. Most often, data vendors require airtight contracts to protect not only their intellectual property but also their institutional price point. Allowing users only to consume data on the platform itself, with no download option, may be part of a solution.
However, there is still the issue of pricing. Numerai has tried to solve these issues by encrypting all of their content, placing their users completely in the dark about the data they work with. This turns the alpha construction process into a pure statistical inference exercise, where you, so to say, “let the data speak for itself”. A major drawback of such an approach is that you entirely remove the possibility of applying any sort of financial domain or data expertise - it’s all about the statistical modelling skills of the user.
In the long run, I’m curious to see whether the crowd-sourced hedge funds can keep their best talent, or whether there will be a brain drain, with the best data scientists leaving for more established firms like WorldQuant and Two Sigma, which already have significant capital available.
Currently, the best funded crowd-sourced hedge fund only has $250 million of committed capital, which is still a drop in the ocean in a trillion-dollar industry. It’s interesting to see that WorldQuant has developed its own crowd-sourced algo platform called WebSim. This should position them well should the crowd-sourced model “take off”.
Up until now, we haven’t given much thought to what is required in order to turn unstructured into structured content, something which is typically seen as an independent process to the actual alpha construction process. The obvious question is: “should you build or buy?”.
I’m not going to go too deep into this discussion, since I’m obviously biased. However, I’d like to highlight a few things to take into consideration before you go ahead developing your own natural language processing (NLP) capabilities. These considerations include:
We have already covered a lot of ground. However, there are still many questions that I have left unanswered, such as how to combine alpha signals into an overall strategy and how to handle risk management and trade execution. These all require analyses that go beyond the scope of this article, so they are best left for another time. Instead, let’s recap how I believe you can succeed as a quant investor using big data analytics:
If you want to learn more about what it takes to succeed with big data, either as a quantitative or discretionary investor, join us at the upcoming RavenPack Research Symposium taking place on September 19th at 10 on the Park (Time Warner Center) in New York.
The keynote will be given by Marko Kolanovic, J.P. Morgan’s Global Head of Quantitative & Derivatives Strategy, and Rajesh T. Krishnamachari, VP at J.P. Morgan, co-authors of the landmark “Big Data & AI Strategies” report published in May 2017.