Machine Learning & NLP for Long-Term Investors

Wolfe Research | February 04, 2019

In a recent report, Wolfe Research developed a long/short strategy from news and social media. Our Chief Data Scientist provides an overview of their model below.

Here are the highlights:

Wolfe Research combines machine learning algorithms with RavenPack’s Natural Language Processing (NLP) engine to create an orthogonal trading signal. This signal is particularly valuable for longer-term investing.

In summary, their model:

  • is particularly effective in predicting future stock returns in Asia ex Japan, Europe, US, and emerging EMEA, with a Sharpe Ratio up to 2.4
  • demonstrates decent predictive power in the US and developed markets ex US up to six months, with an IC of 1.6% and 2%, respectively
  • is relatively uncorrelated to traditional stock selection factors

The following figure highlights the consistent and impressive performance of the NLP Machine Learning model across the US market (Russell 3000) over the last 15 years.

NLP Machine Learning Model Cumulative Performance

Interesting quote from Wolfe Research

"Having previously explored Thomson Reuters News Analytics, Recorded Future, News Quantified and of course, our favorite, RavenPack data; news and web-based signals are not new to us. In this research, we continue our quest for alternative data by revisiting RavenPack Analytics – a leading data vendor on news sentiment and text analytics."

Comments from our Chief Data Scientist:

Peter Hafez, RavenPack

Great results for long-term investing

In a recent report, Wolfe Research revisited the RavenPack dataset, which is powered by RavenPack’s NLP engine, with the goal of developing orthogonal signals for long-term investing. Taking advantage of RavenPack’s event detection algorithm, they focused on less common, but critical news events such as legal and regulatory issues, labor disputes, shareholder disclosures, and executive appointments.

One of their key findings is that the market reaction to news and sentiment varies tremendously across different types of corporate events. For example, investors typically overreact to bad news related to layoffs and litigation, which leads to a price reversal after the news is released. On the other hand, the overwhelmingly “boring” news on share buybacks and dividends is often overlooked, which produces a momentum effect.

Methodology: using NLP and Machine Learning

Using a combination of LASSO and XGBoost machine learning techniques, they take advantage of these complex and nonlinear relationships among news and corporate events to create a series of event-based sentiment factors. In particular, they consider the following variables as part of their modelling efforts:

  • News Event. They rely on RavenPack’s news event classification algorithm, focusing on Groups (Level II) and Types (Level III).
  • News Sentiment. They use three of the eight pre-defined RavenPack sentiment measures – ESS, MCQ, and CSS – all computed over different rolling windows.
  • News Volume. They compute several news volume factors, i.e., the frequency of news stories, over various look-back windows. News volume factors are all adjusted for size.
  • Volatility Impact. RavenPack offers a pre-defined NIP factor, i.e., the predicted impact on stock volatility.
  • Market Behavioral Bias. They compute a series of event-day return (and abnormal trading volume) factors to measure how the market reacts to each event. Event-day returns (and abnormal trading volumes) are typically calculated on a five-day window, i.e., from two days before to two days after each event. In addition, since many news events are sporadic, arriving at low frequency, they further aggregate their signals over a few rolling windows (one-, three-, and six-month).
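
The event-window and rolling-aggregation steps in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, not Wolfe Research's implementation: the function names, the choice to compound daily returns, the use of a simple average as the aggregate, and the trading-day equivalents of one, three, and six months (21, 63, and 126 days) are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

# Assumed trading-day equivalents of the one-, three-, and six-month windows.
LOOKBACK_DAYS = {"1m": 21, "3m": 63, "6m": 126}


def event_window_return(daily_returns: Sequence[float],
                        event_idx: int,
                        half_width: int = 2) -> float:
    """Compounded return from `half_width` days before to `half_width` days
    after the event day (a five-day window when half_width is 2)."""
    start = event_idx - half_width
    end = event_idx + half_width
    if start < 0 or end >= len(daily_returns):
        raise ValueError("event window falls outside the return history")
    growth = 1.0
    for r in daily_returns[start:end + 1]:
        growth *= 1.0 + r
    return growth - 1.0


def rolling_mean_signal(event_signals: Sequence[Tuple[int, float]],
                        as_of_day: int,
                        lookback_days: int) -> Optional[float]:
    """Average of the event signals that became available in the window
    (as_of_day - lookback_days, as_of_day]. Returns None when no event
    falls in the window, which is common for sporadic event types."""
    window = [signal for day, signal in event_signals
              if as_of_day - lookback_days < day <= as_of_day]
    if not window:
        return None
    return sum(window) / len(window)


# Hypothetical usage: (available_day, signal) pairs for one stock.
events = [(10, 0.2), (40, -0.4), (100, 0.6)]
aggregated = {name: rolling_mean_signal(events, as_of_day=100, lookback_days=n)
              for name, n in LOOKBACK_DAYS.items()}
```

Because the window extends two days past the event, an event-window factor is only known two trading days after the event. In this sketch the `day` in each pair is therefore meant to be the day the signal became available, not the event day itself.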


Figure 21, from the report, shows the performance (as measured by rank IC) and coverage of the resulting event-based sentiment factor for each event type in the US. As can be observed, both coverage and performance vary greatly across event types. For example, layoffs and same-store sales have low coverage, but strong performance. On the other hand, insider trades and ownership have broader coverage, but poor cross-sectional predictive power. Earnings- and revenue-related events have both strong coverage and performance. The underrepresented, low-frequency events such as regulatory, legal, and labor issues, which RavenPack data provides, can be quite relevant to asset returns and are generally not well captured by traditional stock selection factors.
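
For readers less familiar with the metric, rank IC is conventionally the Spearman rank correlation between a factor's values across stocks and those stocks' subsequent returns. The sketch below shows that standard calculation using only the Python standard library. The function names are hypothetical, and the report's exact conventions (return horizon, universe, treatment of ties) are not stated here, so the choices in the code are assumptions.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(var_x * var_y)


def rank_ic(factor_values: Sequence[float],
            forward_returns: Sequence[float]) -> float:
    """Spearman rank correlation between a factor and subsequent returns,
    computed across stocks for a single date."""
    if len(factor_values) != len(forward_returns) or len(factor_values) < 2:
        raise ValueError("need two equal-length inputs with at least 2 stocks")
    return pearson(average_ranks(factor_values), average_ranks(forward_returns))
```

In practice this is computed once per rebalancing date and then averaged over time. The coverage in the figure is a separate measure of how many stocks have a factor value at all.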

Performance of Event-Based Sentiment Factors in the US

Using elastic net regression, they combine the event-based sentiment factors into a composite signal, which they refer to as the NICE model (News with Insightful Categorical Events). The model is shown to outperform conventional quantitative factors on a risk-adjusted basis, with long investment horizons and modest turnover (monthly autocorrelation of close to 70%, in line with price momentum factors). The NICE model provides good global coverage, covering 90% of the Russell 3000 universe in the US, 1,500 stocks in other developed markets, and over 1,500 firms in emerging markets.

The NICE model performance (based on a long/short quintile portfolio) has been consistent over the past decade with particularly strong performance in Asia ex Japan, Europe, US, and emerging EMEA, while it struggles in Canada and Japan. While the excess returns are high in China and ANZ, the NICE model volatilities are also large in these two regions. Although RavenPack data focuses exclusively on English language news, the NICE model has delivered exceptional performance in Asia ex Japan (AxJ), with a Sharpe ratio of more than 2.0x, while Europe and US have delivered Sharpe ratios of around 1.1x and 1.2x, respectively (see Figure 26).
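
As a rough illustration of how a long/short quintile portfolio and its Sharpe ratio are typically computed, the sketch below goes long the top fifth of stocks by score and short the bottom fifth, then annualizes the ratio of mean to volatility of the resulting return series. Equal weighting, monthly rebalancing, a zero risk-free rate, and the function names are assumptions for the example; the report's actual portfolio construction is not described here.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def long_short_quintile_return(scores: Sequence[float],
                               returns: Sequence[float]) -> float:
    """One-period return of an equal-weighted portfolio that is long the
    top quintile of stocks by score and short the bottom quintile."""
    n = len(scores)
    if n != len(returns) or n < 5:
        raise ValueError("need equal-length inputs with at least 5 stocks")
    k = n // 5  # number of stocks in each quintile
    order = sorted(range(n), key=lambda i: scores[i])
    bottom, top = order[:k], order[-k:]
    return mean(returns[i] for i in top) - mean(returns[i] for i in bottom)


def annualized_sharpe(period_returns: Sequence[float],
                      periods_per_year: int = 12) -> float:
    """Mean over sample standard deviation of the period returns, scaled by
    sqrt(periods_per_year). Assumes a zero risk-free rate, which is a common
    simplification for long/short spread returns."""
    if len(period_returns) < 2:
        raise ValueError("need at least two periods")
    vol = stdev(period_returns)
    if vol == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return mean(period_returns) / vol * sqrt(periods_per_year)
```

This also shows why high excess returns alone do not guarantee a high Sharpe ratio: a region with large spread returns but equally large volatility ends up with a modest ratio.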

NLP Machine Learning Model Global Performance

Unlike conventional news sentiment factors (which show little predictive power beyond a month), the NICE model demonstrates decent predictive power up to six months in the US and even longer outside of the US (see Figure 27), with only modest correlation with traditional factors. In fact, combining the NICE model with Wolfe’s benchmark multifactor model, they are able to improve annualized alpha by 1.5%.

NLP Machine Learning Model Signal Decay
