Equity Portfolios with Machine Learning & Alternative Data

Richard Bateson, Director, Bateson Asset Management | May 22, 2018

View an extract of this session, held at the London Big Data and Machine Learning Revolution event in April 2018. You can also access the full video and slides.

Combining alternative news and sentiment data with traditional signals can provide increased risk-adjusted returns in long/short equity portfolios. In this presentation we consider the application of Machine Learning techniques to capture these effects and explore non-linear approaches to alternative data.



The Machine Learning ‘Toolkit’

There are many machine learning techniques. For classification and regression, the diagram in the slides shows how different techniques categorise two different data sets.


Today we will discuss one of the simplest, k-NN (the k-Nearest Neighbours technique), although some of the others, like neural nets, can give amazing overfitting power. Python provides a great toolkit for most applications.

Comparing ML Techniques

  • The benchmark ML dataset is the MNIST handwritten digit dataset (60,000 labelled training samples, 28x28 pixels).
  • Many different ML techniques can tackle MNIST and achieve error rates below 1-2%.
  • In the financial markets the data is noisier and more limited.

Financial Markets - Only one history

  • If a model does not work in both Training and Test, the temptation is to change the model and try again = a big feedback problem in quant research.
  • The more quants you have optimising/data-mining, the worse the problem (in the absence of any new ideas).
  • Market data is also very ‘noisy’, nearly random, and signals are weak (close to 50/50).
  • Machine Learning should use “walk-forward” training and testing. Keep models ‘simple’.

Financial Markets: Regime Change

    • Fundamentals - QT, QE, subprime, dot-com, rates, Russia/Asia/Eurozone crises
    • Technicals - HFT (from 2019), computing power/$, money in quant funds
    • Also model degradation from competition (<50% of alpha once published)
    • Machine Learning is adaptive to regime change (slowly!)
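
The “walk-forward” training and test idea mentioned above can be sketched as follows. This is a minimal illustration, not the presenter's setup: the function name walk_forward_splits, the window sizes, and the choice of a rolling (rather than expanding) training window are all assumptions. A rolling window is one way a model can adapt, slowly, to regime change, because old data eventually drops out of training.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train, test) index ranges that roll forward through time.

    Hypothetical helper: each test block lies strictly after its training
    block, so the model is never evaluated on data it was fitted to.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide the window forward by one test block


splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print(list(train), "->", list(test))
```

Each model is fitted on the training window only and then judged on the block that follows it, so every out-of-sample result comes from data the model has not seen.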


    Nearest Neighbours Approach

    • k-NN or k-Nearest Neighbours approach is the simplest of all ML techniques
    • Non-parametric ‘lazy learning’ that uses a local approximation for the decision boundary
    • Training examples are vectors in a multi-dimensional feature space
    • The output is the weighted sum over the k nearest neighbours of the object being assigned in the feature space
    • Weights can be uniform (1/k) or include a function of the distance from the object
    • You can interpret the results (not a black box)
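
The weighted sum over the k nearest neighbours described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the function name knn_predict, the Euclidean distance, and the inverse-distance weighting are assumptions, and the toy data is made up.

```python
import math


def knn_predict(train_x, train_y, query, k=3, distance_weighted=False):
    """Predict a value for `query` as a weighted sum over its k nearest neighbours."""
    # Euclidean distance from the query to every training vector.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    neighbours = by_distance[:k]
    if distance_weighted:
        # Assumed weighting: inverse distance (small constant avoids 1/0).
        raw = [1.0 / (d + 1e-9) for d, _ in neighbours]
    else:
        # Uniform weighting: each neighbour gets 1/k after normalisation.
        raw = [1.0] * len(neighbours)
    total = sum(raw)
    return sum(w / total * y for w, (_, y) in zip(raw, neighbours))


# Toy data: three points near the origin and one far away.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0]

uniform = knn_predict(train_x, train_y, (0.1, 0.1), k=3)
weighted = knn_predict(train_x, train_y, (0.1, 0.1), k=3, distance_weighted=True)
```

Because the prediction is just an average of identifiable historical neighbours, the neighbours themselves can be inspected, which is what makes the result interpretable rather than a black box.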

    k-NN for Market Prediction

    • Use ‘Features’ as inputs rather than just raw data. Some ‘features’ are integrated variables, e.g. Momentum. Semi-supervised learning.
    • Output is the average historical risk-adjusted return of the k nearest neighbours.
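
As a rough sketch of the two points above, the example below builds momentum features from a price series and predicts with the average risk-adjusted return of the nearest historical neighbours. Everything specific here is an assumption for illustration: the lookbacks, the volatility window, the definition of risk-adjusted return (forward return divided by trailing volatility), the helper names, and the made-up prices.

```python
import math
import statistics


def momentum(prices, t, lookback):
    """Return over the past `lookback` periods: a feature derived from raw prices."""
    return prices[t] / prices[t - lookback] - 1.0


def risk_adjusted_forward_return(prices, t, horizon, vol_window):
    """Forward return over `horizon`, divided by trailing one-period volatility."""
    rets = [prices[i] / prices[i - 1] - 1.0
            for i in range(t - vol_window + 1, t + 1)]
    vol = statistics.stdev(rets)
    fwd = prices[t + horizon] / prices[t] - 1.0
    return fwd / vol if vol > 0 else 0.0


def knn_average(samples, query, k):
    """Average target of the k samples whose features are closest to `query`."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    return statistics.fmean(target for _, target in nearest)


# Made-up price history, purely for illustration.
prices = [100, 101, 103, 102, 105, 107, 106, 110, 111, 115]

samples = []
for t in range(4, len(prices) - 2):
    features = (momentum(prices, t, 2), momentum(prices, t, 4))
    target = risk_adjusted_forward_return(prices, t, horizon=2, vol_window=4)
    samples.append((features, target))

prediction = knn_average(samples, query=(0.02, 0.04), k=2)
```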


    Algorithmic vs ML/k-NN


    Alternative Data: Our Approach


    • News effects in large cap equities are short-lived and decay rapidly (<2 days). They are larger in mid/small cap. Sentiment Score effects are smaller in large cap.
    • The effect of negative news is bigger than that of positive news and can lead to bigger sell-offs (asymmetry).
    • Reaction to news is non-linear, and the effect of persistent good news has a saturation value. In a bull market most news is taken as positive!
    • Bigger effects for Abnormal News Volume (RavenPack ‘Buzz’). Higher news volume for high volume / ‘trendy’ / bullish stocks (eg. Apple, Tesla ...).
    • Surprise value is important. News can be already factored into current price (‘buy on rumor, sell on news’).

    k-NN Alternative Data Feature

    • Combine Sentiment Score and Abnormal News Volume into one Alternative Data Feature for k-NN, providing SR ~0.9
    • Abnormal News Volume approximates surprise news events and makes a bigger contribution than Sentiment Score
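
The presentation does not say how the two inputs are combined, so the sketch below is only one hypothetical possibility. It assumes abnormal news volume is measured as a z-score of today's news count against a trailing window, and that the sentiment score (assumed to lie between -1 and 1) is scaled up when volume is unusually high. The function names and the formula are illustrative assumptions, not RavenPack fields or the author's method.

```python
import statistics


def abnormal_news_volume(counts, window):
    """Z-score of the latest news count against the preceding `window` counts."""
    history = counts[-window - 1:-1]
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    return (counts[-1] - mean) / std if std > 0 else 0.0


def alt_data_feature(sentiment, abnormal_volume):
    """Hypothetical combination: sentiment amplified by unusually high volume.

    Below-average volume leaves the sentiment score unchanged.
    """
    return sentiment * (1.0 + max(0.0, abnormal_volume))


# Same sentiment on a quiet day and on a day with a burst of news (made-up counts).
quiet = alt_data_feature(0.5, abnormal_news_volume([10, 12, 9, 11, 10, 10], window=5))
busy = alt_data_feature(0.5, abnormal_news_volume([10, 12, 9, 11, 10, 30], window=5))
```

With this form, a burst of news dominates the feature, in the spirit of Abnormal News Volume contributing more than Sentiment Score, while the sign still comes from sentiment.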


    Non-Linear Alternative Data

    • News events can ‘interfere’ positively or negatively with existing Price based Features
    • k-NN automatically integrates these non-linear effects
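
To illustrate how k-NN can pick up this kind of interference without being told about it, the sketch below uses synthetic data in which the outcome depends on the product of a price-based feature and a news feature. The data-generating rule, the sample size, k, and the helper name knn_average are all assumptions chosen for the demonstration; nothing here is real market data.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic history: the outcome is the product of the two features plus noise,
# so negative news flips the sign of the price-based feature's effect.
history = []
for _ in range(400):
    price_feature = rng.uniform(-1.0, 1.0)
    news_feature = rng.uniform(-1.0, 1.0)
    outcome = price_feature * news_feature + rng.gauss(0.0, 0.1)
    history.append(((price_feature, news_feature), outcome))


def knn_average(samples, query, k=15):
    """Average outcome of the k samples closest to `query` in feature space."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    return statistics.fmean(y for _, y in nearest)


corners = [(0.7, 0.7), (0.7, -0.7), (-0.7, 0.7), (-0.7, -0.7)]
predictions = {corner: knn_average(history, corner) for corner in corners}
```

The predictions are positive where the two features agree in sign and negative where they disagree. A linear model of the form a*price + b*news + c cannot produce that pattern, since its predictions at the two agreeing corners always sum to the same value (2c) as its predictions at the two disagreeing corners.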


    k-NN Model Account Curve

  • L/S portfolio has a high long-term SR of ~1.9 after costs, with stable volatility and low equity correlation
  • Alternative data contributes an SR of around 0.3
  • 2017 was a tough year, with all equities very bullish and little discrimination between them, although our Alternative Data Feature performed well
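
For readers unfamiliar with the SR figures quoted above, and assuming SR denotes the Sharpe ratio, a minimal calculation looks like this. The sketch assumes daily returns already in excess of the risk-free rate and 252 trading days per year; it ignores trading costs, and the returns are made up.

```python
import math
import statistics


def annualised_sharpe(daily_returns, periods_per_year=252):
    """Mean return divided by its standard deviation, scaled to an annual figure."""
    mean = statistics.fmean(daily_returns)
    std = statistics.stdev(daily_returns)
    return mean / std * math.sqrt(periods_per_year)


# Made-up daily returns, purely for illustration.
daily_returns = [0.001, -0.002, 0.003, 0.0, 0.002, -0.001, 0.001]
sr = annualised_sharpe(daily_returns)
```

Doubling every return leaves the ratio unchanged, which is why SR is a convenient way to compare strategies run at different leverage.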

Conclusion

    • k-NN provides an easy non-linear technique for combining traditional and alternative equity portfolio signals.
    • Small performance increases due to alternative news data in large cap L/S equity portfolios.
    • Huge potential for more sophisticated data analysis:
    • Better article interpretation: Follow the history of each equity and better determine the ‘surprise’ value of a news article, since surprise has a higher impact.
    • Faster Trading: Large cap equity effects are short-lived. Analyse articles and trade intraday BUT with higher costs.
    • Better Features: Use Features that have a good concept or intuition behind them and have explanatory power. k-NN then provides a non-linear way of combining signals.


