How to Succeed in Quant Investing with Big Data Analytics

RavenPack has been part of the quantitative investment community for almost 15 years and has observed firsthand how quant investing has risen in prominence. According to the TABB Group, quantitative hedge funds today account for nearly 27% of all stock trading, which is more than any other investor type.

This rise, combined with the explosive growth in the amount of digital data available and the massive influx of capital into quant funds, has changed the alpha landscape considerably, which is putting even the more traditional quantitative investors under pressure. They need a new formula for success!

Data is growing at an incredible speed

According to IDC, 90% of all digital data that exists today has been generated over the last two years, of which almost 80% comes as “hard to consume” unstructured content. This has created incredible opportunities for investors to identify new alpha sources that move beyond traditional fundamental and market data, which have seen decreasing efficacy over recent years.

These new alternative data sources range from credit card transactions, satellite data, crowd-sourced data, and location or foot traffic data to social media sentiment.

In the early days, the most visionary investment firms were able to achieve an informational advantage in the marketplace by hiring dedicated teams of data hunters to scour the world for new and interesting datasets that no one else was using.

However, as the market continues to mature, with more and more sell-side research providing fairly comprehensive overviews of available alternative data sources, this is becoming less of a differentiator.

Recently, J.P. Morgan released a well-received tour de force, titled “Big Data and AI Strategies”, in which they put a host of alternative data providers, including RavenPack, under the microscope.

Will proprietary data access provide an edge?

Today, the edge is no longer found in being the only one to have a particular dataset; rather, it is all about efficient processing of what is already publicly available (or at least also available to your competitors). Thinking that a proprietary data advantage necessarily leads to a proprietary informational advantage is “old school thinking”, unless you’re Alphabet, Facebook, Amazon, Apple, and perhaps Microsoft.

Even though you may be able to achieve proprietary access to one particular dataset, there may be another 99 datasets out there that provide similar information. In the end, most alternative datasets are focused on providing a nowcast of fundamental data. For example, both credit card transactions and location/foot traffic data can be used to forecast company revenues.
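
To make the idea of a nowcast concrete, here is a minimal sketch in Python. It assumes the simplest possible setup: a straight-line relationship, fitted by ordinary least squares, between aggregated card spend and reported revenue. The function names and all the numbers are hypothetical and purely illustrative; a real nowcasting model would need far more care (panel coverage, seasonality, out-of-sample testing).

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def nowcast_revenue(intercept, slope, current_spend):
    """Estimate the not-yet-reported revenue from spend observed so far."""
    return intercept + slope * current_spend


# Made-up history: quarterly card spend in the panel vs. reported revenue.
card_spend = [10.0, 12.0, 11.0, 14.0, 15.0]
reported_revenue = [105.0, 125.0, 115.0, 145.0, 155.0]

intercept, slope = fit_line(card_spend, reported_revenue)

# Card spend for the current quarter is known before the company reports.
estimate = nowcast_revenue(intercept, slope, 16.0)
print(f"Nowcast revenue: {estimate:.1f}")
```

The same fit could just as well be run on foot traffic counts instead of card spend, which is exactly why exclusive access to one such dataset rarely amounts to an exclusive informational advantage.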

The alpha landscape has changed!

As already described, the big data and quant revolution has significantly impacted the alpha landscape, as seen in the figure below. Compared to the 1950s-70s, when the cross-section of stock returns could be explained by just a few factors that had slow signal decay, today there are hundreds, if not thousands, of potential data-driven alpha sources that mostly have shorter durations.

This is placing massive pressure on established firms, since they need to consume an ever-increasing amount of data to achieve the necessary capacity to continue their growth, or even just to maintain their current level of AUM and performance.

Furthermore, since each individual alpha signal contains less marginal value, there is also additional pressure on cost, i.e. investment firms need to be able to convert data into alpha signals at an ever cheaper rate to be able to capture the available alpha.

[Figure: the changing alpha landscape]

The war on talent continues...

Successful investing is truly becoming a “numbers game”. At a high level, this means that we need an ever-increasing amount of storage and computing power and, not to forget, data scientists. Unfortunately, we’re not yet at a stage where we can simply plug a bunch of data into an AI and expect that useful alpha signals will come out of it (and I doubt that we will get there anytime soon).

This introduces another challenge: how do investment firms ensure that they can recruit enough data scientists who can turn all their data into valuable alpha signals? Indeed, it isn’t just in finance that data scientists are in high demand. The “war on talent” is real.

It is no longer enough to only search for talent locally. Instead, you need to be able to dip into the global talent pool. To stay on top, firms have come up with several creative solutions.

For instance, WorldQuant has already taken the physical growth approach and established several global offices. Other investment firms, such as Two Sigma and Winton Capital, have run several competitions on Kaggle (a Google-owned community of more than 500,000 data scientists) to recruit talented individuals from other data-driven industries.

The crowd-sourced alpha revolution?

Firms such as Numerai, Quantiacs, and Quantopian have taken a different approach. Their entire business model is built around crowd-sourcing alpha signals and building a hedge fund on top of them, which results in very little fixed overhead.

Instead, they rely on talented data scientists using their platform, data, and backtesting engine, which have all been made freely available. Even though this model seems attractive, as it offers a cost-efficient way of tapping into the global talent pool, it also suffers from multiple issues.

An obvious question to ask is whether we truly believe that freelance data scientists have any chance of competing with professional investors. For instance, Quantopian has identified only 50 individuals, out of a total user base of 130,000 data scientists, to whom it is comfortable providing a capital allocation. Of course, this number may increase over time. However, with such small numbers, the model resembles the talent recruitment approach more than a “true” crowd-sourced hedge fund.

Another challenge that these platforms face is that it will be hard to convince institutional data vendors to expose their datasets at low cost to entire communities. Most often, data vendors require ironclad contracts to protect not only their intellectual property but also their institutional price point. Allowing users only to consume data on the platform itself, with no download option, may be part of a solution.

However, there is still the issue of pricing. Numerai has tried to solve these issues by encrypting all of their content, placing their users completely in the dark about the data they work with. This turns the alpha construction process into a pure statistical inference exercise, where you, so to say, “let the data speak for itself”. A major drawback of such an approach is that you entirely remove the possibility of applying any sort of financial domain or data expertise: it’s all about the statistical modelling skills of the user.

In the long run, I’m curious to see whether the crowd-sourced hedge funds can keep their best talent, or whether there will be a brain drain, with the best data scientists leaving for the more established firms like WorldQuant and Two Sigma, who already have significant capital available.

Currently, the best-funded crowd-sourced hedge fund has only $250 million of committed capital, which is still a drop in the ocean in a trillion-dollar industry. It’s interesting to see that WorldQuant has developed their own crowd-sourced algo platform called WebSim. This should position them well should the model “take off”.

Turning unstructured into structured: should you build or buy?

Up until now, we haven’t given much thought to what is required in order to turn unstructured into structured content, something which is typically seen as a process independent of the actual alpha construction process. The obvious question is: “should you build or buy?”

I’m not going to go too deep into this discussion, since I’m obviously biased. However, I’d like to highlight a few things to take into consideration before you go ahead developing your own natural language processing (NLP) capabilities. These considerations include:

IT infrastructure: This step requires a fairly big investment before you have a setup that can handle large datasets of raw unstructured content. Sure, the cloud has made things easier when it comes to scalable storage and computing power, but it still comes with a serious price tag.
Data cleaning (and service maintenance): Addressing data quality is non-trivial. You’d be surprised how many issues you run into: bad encoding, lack of paragraph breaks, spelling mistakes, bad timestamps, gaps in history, bad metadata, and so on. And whom do you call if the service suddenly stops working? Who will get out of bed in the middle of the night?
Maintenance of reference data: Another non-trivial task. In order to make any data relevant for trading, you need to be able to relate it to some tradable security. The best way of doing so is to link content to particular named entities in a point-in-time sensitive fashion (after all, we’re quants). Unfortunately, entities change over time: companies are acquired or go out of business, government organizations change, people die, etc. And you can’t just buy these types of entity databases “off-the-shelf”.
Building expertise in NLP: Lastly, building expertise in natural language processing takes time and is expensive. You need to hire the right people, and as with everything else, NLP is partly art, partly science. Even though word2vec is an impressive algorithm, it is not a surefire path to success in finance (I know, I have tried). Do you really want to fight another “talent war”?
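
To illustrate what “point-in-time sensitive” linking means in the reference data item above, here is a minimal Python sketch. Everything in it is hypothetical: the entity IDs, the tickers, the dates, and the record layout are invented for illustration and do not describe any vendor’s actual schema. The point is simply that a lookup must be answered as of the date of the content, not as of today.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TickerRecord:
    entity_id: str            # stable identifier for the company itself
    ticker: str               # tradable symbol during the validity window
    valid_from: date          # inclusive
    valid_to: Optional[date]  # exclusive; None means still valid today


# Hypothetical reference data: one renamed company, one acquired company.
RECORDS: List[TickerRecord] = [
    TickerRecord("ENT-001", "OLDCO", date(2005, 1, 1), date(2012, 6, 1)),
    TickerRecord("ENT-001", "NEWCO", date(2012, 6, 1), None),
    TickerRecord("ENT-002", "ACQD", date(2008, 3, 1), date(2015, 9, 30)),
]


def ticker_as_of(entity_id: str, as_of: date,
                 records: List[TickerRecord] = RECORDS) -> Optional[str]:
    """Return the ticker that was valid for the entity on the given date."""
    for rec in records:
        if rec.entity_id != entity_id:
            continue
        still_open = rec.valid_to is None or as_of < rec.valid_to
        if rec.valid_from <= as_of and still_open:
            return rec.ticker
    return None  # entity was not tradable on that date


# A 2010 news story about ENT-001 must map to the ticker used in 2010.
print(ticker_as_of("ENT-001", date(2010, 1, 1)))
# After the acquisition, ENT-002 no longer maps to any tradable security.
print(ticker_as_of("ENT-002", date(2016, 1, 1)))
```

Keeping the validity windows alongside a stable entity identifier is what avoids look-ahead bias in a backtest: a story from 2010 is linked to the security that existed in 2010, even if the company has since been renamed or acquired.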

What it takes to succeed!

We have already covered a lot of ground. However, there are still many questions that I have left unanswered, such as how to combine alpha signals into an overall strategy and how to handle risk management and trade execution. These all require analyses that go beyond the scope of this writing, so they are best left for another time. Instead, let’s recap how I believe you can succeed as a quant investor using big data analytics:

Consume as many datasets as possible. Obviously, only the predictive ones.
Hire RavenPack to handle all of your unstructured content.
Recruit a bunch of data scientists, but only the ones that can build an amazing AI system, where you can plug in any data and alpha comes out.
... Profit!

If you want to learn more about what it takes to succeed with big data either as a quantitative or discretionary investor, join us at the upcoming RavenPack Research Symposium taking place on September 19th at 10 on the Park (Time Warner Center) in New York.

The keynote will be given by Marko Kolanovic, J.P. Morgan’s Global Head of Quantitative & Derivatives Strategy, and Rajesh T. Krishnamachari, VP at J.P. Morgan, co-authors of their landmark “Big Data and AI Strategies” report published in May 2017.
