A Machine Learning-Based Trading Strategy Using Sentiment Analysis Data

Lucena Research | February 25, 2015

This study from Lucena Research shows how to use RavenPack Equity Indicators in conjunction with traditional factors to enhance portfolio returns.

Executive Summary:

In the study, Lucena Research uses RavenPack Equity Indicators in two strategies. First, it constructs a portfolio using the RP Indicators together with a 5-day momentum factor. Second, it combines the RP Indicators with other factors selected by Machine Learning in the Lucena QuantDesk® platform.

Lucena found that constructing portfolios using sentiment indicators jointly with traditional factors can result in significant outperformance versus the S&P 500 benchmark over the Jan 2005 to Nov 2014 backtesting period. In particular, with machine learning, Lucena found that P/E ratios and moving average crosses work well with the sentiment indicators, delivering:

  • an outperformance of 339% against the benchmark over the period
  • a Sharpe Ratio of 0.83 versus 0.46
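For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how a cumulative return and an annualized Sharpe Ratio are commonly computed from a series of weekly returns. The study does not state its exact conventions, so the weekly frequency, the zero risk-free rate, the function names, and the sample numbers are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def cumulative_return(period_returns):
    """Compound simple per-period returns into one total return."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualized_sharpe(period_returns, periods_per_year=52, risk_free_per_period=0.0):
    """Mean excess return divided by its sample standard deviation, annualized."""
    excess = [r - risk_free_per_period for r in period_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


# Invented weekly returns, for illustration only (not data from the study).
weekly = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02]
print(f"cumulative return: {cumulative_return(weekly):.4f}")
print(f"annualized Sharpe: {annualized_sharpe(weekly):.2f}")
```

Scaling by the square root of the number of periods per year is a common convention for annualizing; other conventions exist and would give different figures.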

Trading Strategy

Abstract

We show how sentiment information in combination with a Machine Learning technique can provide a successful stock trading strategy. Specifically, we:

  • Create a predictive Machine Learning-based model for company stock prices based on the recent sentiment data;
  • Use that model as an input to build portfolios that are rebalanced weekly;
  • Simulate the performance of those portfolios.

This research leverages a dataset of sentiment indicators developed by RavenPack International S.L. RavenPack recently investigated the utility of these sentiment indicators for predicting 5-day price reversion (see citation). We carry that work forward by creating a full predictive model that can be used to drive a trading strategy. Overall, our results indicate that the sentiment information has predictive value and is useful as part of a Machine Learning strategy that significantly outperforms the market from which the candidate equities are drawn.

Machine Learning-based Model Construction

Lucena Research has developed a suite of Machine Learning tools that facilitate the creation of statistical models for stock price prediction. Machine Learning is simply the use of historical cause/effect data to build a model to predict future cause/effect relationships. We typically refer to the factors or variables that may cause price changes as “observations” and the affected factors (such as price) as the “predictions.” Once the Machine Learning model is trained using example data, it can be consulted to make predictions based on the current values of factors.

Figure 1: Machine Learning Block Diagram

For convenience, we refer to the components of the observation as X, and the forecast or prediction as Y. There are a number of algorithms that can be used to build a Machine Learning-based model, such as KNN, SVM or decision trees. The specific method we use in this case is proprietary. In this study, the data elements used for training include the following, for each stock in the S&P 500.
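Because the method used in the study is proprietary, the sketch below uses KNN, one of the generic algorithms named above, purely to illustrate the X-to-Y mapping. It is not Lucena's model. The feature pair (a sentiment score and a 5-day momentum value), the function name `knn_predict`, and all numbers are assumptions made for illustration.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict Y for `query` as the mean Y of its k nearest training rows."""
    distances = sorted(
        (math.dist(row, query), y) for row, y in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical training rows: X = (sentiment score, 5-day momentum),
# Y = the following week's return. All numbers are invented.
train_x = [(0.8, 0.02), (0.6, 0.01), (-0.5, -0.03), (-0.7, -0.01), (0.1, 0.00)]
train_y = [0.012, 0.008, -0.010, -0.006, 0.001]

# Consult the trained data with the current factor values for one stock.
forecast = knn_predict(train_x, train_y, query=(0.7, 0.015), k=2)
print(f"forecast: {forecast:.4f}")
```

In practice the factors would usually be scaled to comparable ranges before computing distances, since an unscaled factor with a large range would otherwise dominate the neighbor search.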

Assessment Methodology: Backtests & Benchmark

Strategy: We implemented a trading strategy in simulation as follows: On the first trading day of each week, we compute a forecast for each member of the S&P 500. We rank the stocks from highest forecast to lowest, split them into deciles (groups of 50), and assess each decile separately by entering an equally-weighted long position in its 50 stocks. Positions are held for one week and then rebalanced.
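The weekly ranking step described above can be sketched as follows. This is a simplified illustration, not the simulation code used in the study: the ticker names, forecasts, and realized returns are invented, the universe is shrunk to 20 names so that each "decile" holds 2 stocks instead of 50, and the function names are hypothetical.

```python
def decile_groups(forecasts, n_groups=10):
    """Rank tickers from highest forecast to lowest and split into equal groups.

    Any tickers left over when the universe is not divisible by n_groups
    are dropped from the bottom of the ranking.
    """
    ranked = sorted(forecasts, key=forecasts.get, reverse=True)
    size = len(ranked) // n_groups
    return [ranked[i * size:(i + 1) * size] for i in range(n_groups)]


def equal_weight_return(tickers, realized_returns):
    """One-week return of an equally weighted long position in `tickers`."""
    return sum(realized_returns[t] for t in tickers) / len(tickers)


# Hypothetical universe of 20 tickers with invented forecasts and returns.
forecasts = {f"STK{i:02d}": 0.01 * (10 - i) for i in range(20)}
realized = {f"STK{i:02d}": 0.005 * (10 - i) for i in range(20)}

groups = decile_groups(forecasts)
weekly_returns = [equal_weight_return(g, realized) for g in groups]
print("top decile:", groups[0], "return:", weekly_returns[0])
print("bottom decile:", groups[-1], "return:", weekly_returns[-1])
```

Repeating this each week with fresh forecasts and compounding the weekly returns of a given decile yields that decile's cumulative return over the backtest.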

Benchmark: One purpose of this experiment is to determine the value sentiment indicators can provide to a strategy. Accordingly, we built a benchmark by following exactly the same procedure as described above for the experimental strategy, except with the sentiment indicators removed. The benchmark is therefore informed, essentially, only by momentum.
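One way to picture this benchmark is as a feature ablation: the same pipeline is run twice, once with all factors and once with the sentiment columns stripped out. The sketch below shows only the stripping step. The column names, the `sentiment_` prefix convention, and the function name `drop_sentiment` are assumptions for illustration and do not come from the study.

```python
def drop_sentiment(rows, sentiment_prefix="sentiment_"):
    """Return copies of the feature rows with sentiment columns removed."""
    return [
        {name: value for name, value in row.items()
         if not name.startswith(sentiment_prefix)}
        for row in rows
    ]


# Hypothetical feature rows; the column names are illustrative only.
experimental = [
    {"sentiment_5d": 0.4, "momentum_5d": 0.02},
    {"sentiment_5d": -0.2, "momentum_5d": -0.01},
]

# The benchmark sees the same rows with only the momentum factor left.
benchmark = drop_sentiment(experimental)
print(benchmark)
```

Keeping every other step identical means any difference in performance between the two runs can be attributed to the sentiment inputs.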

The study was conducted using data from January 2005 to November 2014. For these decile backtests, we do not model transaction costs. We report the performance as cumulative return of the top and bottom deciles in...
