Combining alternative news and sentiment data with traditional signals can provide increased risk-adjusted returns in long/short equity portfolios. In this presentation we consider the application of Machine Learning techniques to capture these effects and explore non-linear approaches to alternative data.
The Machine Learning ‘Toolkit’
There are many machine learning techniques. The diagram below shows how different classification and regression techniques categorise two different data sets.
Today we will discuss one of the simplest, k-NN (the k-Nearest Neighbours technique). Some of the others, like neural nets, can give amazing overfitting power. Python provides a great toolkit for most applications.
Comparing ML Techniques
- The benchmark ML dataset is the MNIST handwritten digit dataset (60,000 labelled training samples, 28x28 pixels).
- Many different ML techniques can tackle the MNIST dataset and achieve error rates of <1-2%.
- In the financial markets the data is noisier and more limited.
Financial Markets - Only one history
Financial Markets: Regime Change
- Fundamentals - QT, QE, subprime, .com, rates, Russia/Asia/Eurozone crisis
- Technicals - HFT (from 2019), computing power/$, money in quant funds
- Also model degradation from competition (<50% of alpha once published)
- Machine Learning is adaptive to regime change (slowly!)
Nearest Neighbours Approach
- k-NN or k-Nearest Neighbours approach is the simplest of all ML techniques
- Non-parametric ‘lazy learning’ that uses a local approximation for the decision boundary
- Training examples are vectors in a multidimensional feature space
- The output is the weighted sum over the k nearest neighbours of the object being assigned in the feature space
- Weights can be uniform (1/k) or include a function of the distance from the object
- You can interpret the results (not a black box)
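The bullets above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation used in the presentation: the function name `knn_predict`, the Euclidean distance metric, and the inverse-distance weighting are all assumptions chosen for simplicity.

```python
import math

def knn_predict(train_X, train_y, query, k=3, distance_weighted=False):
    """Predict a value for `query` as a weighted sum over its k nearest neighbours."""
    # Euclidean distance from the query to every training vector (assumed metric)
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    neighbours = sorted(dists, key=lambda pair: pair[0])[:k]
    if distance_weighted:
        # Inverse-distance weights; the small constant avoids division by zero
        weights = [1.0 / (d + 1e-9) for d, _ in neighbours]
    else:
        # Uniform weights of 1/k
        weights = [1.0 / k] * len(neighbours)
    total = sum(weights)
    # Normalise so the weights sum to one
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / total

# Toy data: four points in a 2-D feature space with known outputs
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0]
uniform_pred = knn_predict(train_X, train_y, (0.0, 0.0), k=3)
weighted_pred = knn_predict(train_X, train_y, (0.0, 0.0), k=3, distance_weighted=True)
print(uniform_pred, weighted_pred)
```

Because the prediction is just an average over identifiable historical examples, the neighbours themselves can be listed and inspected, which is what makes the result interpretable.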
k-NN for Market Prediction
- Use ‘Features’ as inputs rather than just raw data. Some ‘features’ are integrated variables, e.g. Momentum. Semi-supervised learning
- Output is the average historical risk-adjusted return of the k nearest neighbours
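As a rough sketch of this setup, the code below builds momentum features from a price series and forecasts the average forward risk-adjusted return of the k nearest historical examples. The lookbacks, the horizon, the definition of risk-adjusted return (forward return divided by the volatility of per-period returns) and all function names are assumptions for illustration; the presentation does not specify them.

```python
import math
import statistics

def momentum(prices, t, lookback):
    """Return over the `lookback` periods ending at index t."""
    return prices[t] / prices[t - lookback] - 1.0

def forward_risk_adjusted_return(prices, t, horizon):
    """Forward return over `horizon` periods divided by per-period volatility (assumed definition)."""
    rets = [prices[i + 1] / prices[i] - 1.0 for i in range(t, t + horizon)]
    vol = statistics.pstdev(rets)
    total = prices[t + horizon] / prices[t] - 1.0
    return total / vol if vol > 0 else 0.0

def build_examples(prices, lookbacks=(5, 20), horizon=5):
    """One (feature vector, target) pair per date with enough history and future."""
    X, y = [], []
    for t in range(max(lookbacks), len(prices) - horizon):
        X.append(tuple(momentum(prices, t, lb) for lb in lookbacks))
        y.append(forward_risk_adjusted_return(prices, t, horizon))
    return X, y

def knn_forecast(X, y, query, k=5):
    """Average target of the k historical examples closest to `query`."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    return sum(y[i] for i in nearest) / len(nearest)

# Synthetic price series, purely for demonstration
prices = [100 + i + 3 * math.sin(i) for i in range(80)]
X, y = build_examples(prices)
today = tuple(momentum(prices, len(prices) - 1, lb) for lb in (5, 20))
forecast = knn_forecast(X, y, today, k=5)
print(forecast)
```

In a real backtest the neighbours would need to be restricted to examples whose forward window had already closed at the forecast date, and features on different scales would normally be standardised before distances are computed.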
Algorithmic vs ML/k-NN
Alternative Data: Our Approach
- News effects in large cap equities are short-lived and decay rapidly (<2 days). Effects are larger in mid/small cap. Sentiment Score effects are smaller in large cap.
- The effect of negative news is bigger than that of positive news and can lead to bigger sell-offs (asymmetry).
- The reaction to news is non-linear, and the effect of persistent good news saturates. In a bull market, most news is taken as positive!
- Bigger effects for Abnormal News Volume (RavenPack ‘Buzz’). Higher news volume for high volume / ‘trendy’ / bullish stocks (eg. Apple, Tesla ...).
- Surprise value is important. News can be already factored into current price (‘buy on rumor, sell on news’).
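The asymmetry and saturation described above can be pictured with a toy response function. This is purely illustrative: the `tanh` shape, the parameter values and the name `news_response` are assumptions, not a model taken from the presentation or fitted to any data.

```python
import math

def news_response(sentiment, neg_scale=1.5, pos_scale=1.0, saturation=2.0):
    """Hypothetical price response to a (cumulative) sentiment score.

    tanh makes the response level off for persistent news, and the larger
    scale for negative scores makes bad news hit harder than good news.
    """
    scale = neg_scale if sentiment < 0 else pos_scale
    return scale * math.tanh(sentiment / saturation)

# Response to increasingly positive and negative sentiment
for s in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(s, round(news_response(s), 3))
```

A linear signal cannot represent either effect, which is part of the motivation for feeding such data into a non-linear technique.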
k-NN Alternative Data Feature
- Combine Sentiment Score and Abnormal News Volume into one Alternative Data Feature for k-NN providing SR -0.9
- Abnormal News Volume is an approximation of surprise news events and makes a bigger contribution than Sentiment Score
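One way such a combined feature might be built is sketched below. The z-score proxy for abnormal news volume is an assumption and is not the definition of RavenPack's ‘Buzz’; the combination rule in `alt_data_feature` is likewise hypothetical, since the presentation does not say how the two inputs are merged.

```python
import statistics

def abnormal_news_volume(counts, window=20):
    """Z-score of the latest daily article count against the trailing window (assumed proxy).

    Requires at least two observations so that the trailing history is non-empty.
    """
    history = counts[-window - 1:-1]
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    return (counts[-1] - mean) / sd if sd > 0 else 0.0

def alt_data_feature(sentiment, abnormal_volume):
    """Hypothetical combination: sentiment amplified when news volume is unusually high."""
    return sentiment * (1.0 + max(abnormal_volume, 0.0))

# Daily article counts for one stock, with a spike on the last day
counts = [10, 12, 8, 10, 12, 8, 10, 30]
buzz = abnormal_news_volume(counts, window=6)
feature = alt_data_feature(sentiment=-0.4, abnormal_volume=buzz)
print(buzz, feature)
```

The resulting number would then be added as one more coordinate of the k-NN feature vector alongside the price-based features.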
Non-Linear Alternative Data
- News events can ‘interfere’ positively or negatively with existing Price based Features
- k-NN automatically integrates these non-linear effects
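A small synthetic example shows what ‘interference’ means in practice. Here the target is the product of a price feature and a news feature, so each feature only matters in combination with the other. The data, names and grid are invented for illustration; the additive fit is included only as a contrast.

```python
import math
from itertools import product

levels = [-1.0, -0.5, 0.5, 1.0]
# Each example is (price_feature, news_feature); the synthetic target is their product
X = list(product(levels, levels))
y = [p * n for p, n in X]

def knn_mean(X, y, query, k=3):
    """Uniformly weighted k-NN prediction."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    return sum(y[i] for i in nearest) / k

def additive_fit(X, y):
    """Least-squares slopes for y ~ a*p + b*n.

    The per-feature formula is valid here only because the grid is centred
    and the two features are uncorrelated.
    """
    a = sum(p * t for (p, _), t in zip(X, y)) / sum(p * p for p, _ in X)
    b = sum(n * t for (_, n), t in zip(X, y)) / sum(n * n for _, n in X)
    return a, b

a, b = additive_fit(X, y)
query = (0.9, -0.9)  # strong price signal, strongly negative news
linear_pred = a * query[0] + b * query[1]
knn_pred = knn_mean(X, y, query)
print(linear_pred, knn_pred)
```

The additive model assigns zero weight to both features and predicts nothing, while k-NN recovers a clearly negative forecast from nearby examples without the interaction ever being specified.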
k-NN Model Account Curve
- k-NN provides an easy non-linear technique for combining traditional and alternative equity portfolio signals.
- Small performance increases due to alternative news data in large cap L/S equity portfolios.
- Huge potential for more sophisticated data analysis
- Better article interpretation: Follow the history of each equity to better determine the ‘surprise’ value of a news article, since surprise has a higher impact.
- Faster Trading: Large cap equity effects are short-lived. Analyse articles and trade intraday, BUT at higher costs.
- Better Features: Use Features that have a good concept or intuition behind them and have explanatory power. k-NN then provides a non-linear way of combining signals.