How to use Natural Language Processing in Quant Investing

RavenPack | July 11, 2017

In the face of an ever-increasing amount of financial news, investors need the right tools to cut through the noise to uncover the signal behind the latest move in the markets.

RavenPack comes to the aid of market professionals with its state-of-the-art Natural Language Processing (NLP) solution. RavenPack Analytics (RPA) categorizes news stories on the fly into one of nearly 6,900 categories across 56 broader groups, covering anything from earnings and analyst ratings to mergers and litigation issues.

Hot off the press, our newest research paper “A Multi-Topic Approach To Building Quant Models” shows how to account for asymmetry via group-specific novelty and event relevance filters to achieve higher returns – both risk-adjusted and in absolute terms. This is accomplished by utilizing the brand-new Event Relevance Score (REL) in combination with the revamped Event Similarity Days Score (SIM), produced using RavenPack’s proprietary NLP techniques. These metrics allow investors to identify the most novel and relevant events in specific groups for trading. The REL is a score on a scale of 0-100 based on where in the news story an event is detected, with a higher score indicating a more prominent position in the story, while the SIM designates the number of days since a similar event was last detected.

First, we build on knowledge acquired in a previous paper (“Introducing RavenPack Analytics for Equities”) to construct benchmarks based on highly novel and relevant events, i.e., those with REL and SIM of 90+. Table 1 shows the benchmark results (event volume-weighted across groups) across our two regions, the U.S. and Europe, and two market-capitalization segments, “Large/Mid-Cap” and “Small-Cap”.

[Table 1: Benchmark results by region and market capitalization]
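The benchmark rule described above can be sketched in a few lines of Python. This is only an illustration, not RavenPack's implementation: the `Event` record, its field names, and the sample data are hypothetical, and "90+" is assumed to mean "greater than or equal to 90" for both scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are assumptions, not the RPA schema."""
    entity: str    # company identifier
    group: str     # event group, e.g. "earnings"
    rel: int       # Event Relevance Score, 0-100
    sim_days: int  # Event Similarity Days: days since a similar event was last detected


def benchmark_filter(events, min_rel=90, min_sim_days=90):
    """Keep only events that are both highly relevant and highly novel."""
    return [e for e in events if e.rel >= min_rel and e.sim_days >= min_sim_days]


# Made-up sample data, for illustration only.
events = [
    Event("ACME", "earnings", rel=100, sim_days=120),       # relevant and novel
    Event("ACME", "equity-actions", rel=45, sim_days=200),  # novel, but buried in the story
    Event("GLOBEX", "analyst-ratings", rel=95, sim_days=3), # relevant, but not novel
]

selected = benchmark_filter(events)
print([f"{e.entity}:{e.group}" for e in selected])
```

Only the first sample event passes both thresholds; the other two fail on relevance and novelty respectively.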

RavenPack Analytics delivers solid risk-adjusted returns across the four universes, in particular for small-cap companies. Taking advantage of the vastly expanded taxonomy and premium news sources, such as Benzinga Pro and FactSet, RPA yields a four-fold increase in detections over its predecessor, RavenPack News Analytics (“RPNA”) 4.0. This results in higher returns, both on an absolute and risk-adjusted basis, larger portfolios, and better hit ratios.

Impressive as the results above are, they are based on complete symmetry across all event groups. Intuitively, this seems like an overly restrictive constraint. Some groups may perform better when less relevant events are included, because events in these groups struggle to carry a news story on their own. For example, consider the group Equity Actions. News about stock buybacks and reorganizations is typically buried deeper in a news story – in particular for smaller companies. This results in lower audience exposure, as most readers only read the first part of a story, but the news may still move the price of a stock.

In Figure 4 we divide news into high and low event relevance with a score of 90 (out of 100) being the delimiter. While high event relevance (i.e. 90+) yields higher Information Ratios for many groups, this is not true across the board. This observation supports the case for the inclusion of “less relevant” news in the trading signal construction.

[Figure 4: Information Ratios by event group for high (90+) and low event relevance]
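A rough sketch of this high/low comparison follows. The article does not spell out how the Information Ratio is computed, so the code assumes a common convention: the mean of a return series divided by its sample standard deviation, annualized with the square root of 252 trading days. The `(group, rel, return)` observation format and the sample numbers are hypothetical; a real study would first build daily portfolio returns for each bucket.

```python
import math
import statistics
from collections import defaultdict


def information_ratio(returns, periods_per_year=252):
    """Annualized mean/stdev of a return series (assumed definition)."""
    if len(returns) < 2:
        return float("nan")
    sd = statistics.stdev(returns)
    if sd == 0:
        return float("nan")
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def ratio_by_relevance(observations, cutoff=90):
    """Split each group's returns at the relevance cutoff and score both halves."""
    buckets = defaultdict(lambda: {"high": [], "low": []})
    for group, rel, ret in observations:
        buckets[group]["high" if rel >= cutoff else "low"].append(ret)
    return {
        group: {side: information_ratio(rets) for side, rets in sides.items()}
        for group, sides in buckets.items()
    }


# Made-up observations: (event group, REL score, return attributed to the event).
observations = [
    ("earnings", 95, 0.01),
    ("earnings", 92, 0.03),
    ("earnings", 40, -0.01),
    ("earnings", 10, 0.01),
]

result = ratio_by_relevance(observations)
print(result["earnings"])
```

In this toy sample the high-relevance bucket has a positive ratio and the low-relevance bucket nets out to zero; the article's point is that real data does not show that ordering for every group.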

We proceed to build new trading signals based on customized novelty and relevance filters at the group level. We allow the event relevance threshold to vary from 0 to 90 in steps of 10 and the novelty threshold to take the values 1, 7, 30, and 90 days (i.e. one day, one week, one month, and three months). The optimal results are presented in the table below.

[Table: Optimal results with group-level novelty and relevance filters]
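The threshold search described above amounts to a small grid: 10 relevance thresholds times 4 novelty thresholds gives 40 candidate filters per event group. The sketch below shows the shape of such a search under stated assumptions. The event dictionaries, the `best_filter_per_group` helper, and the use of the mean return as the objective are all hypothetical stand-ins; the paper ranks filters by risk-adjusted performance, and its exact procedure is not given here. Both thresholds are assumed to be inclusive lower bounds.

```python
import statistics
from itertools import product

REL_THRESHOLDS = range(0, 100, 10)  # 0, 10, ..., 90
NOVELTY_DAYS = (1, 7, 30, 90)       # one day, one week, one month, three months


def best_filter_per_group(events, score):
    """Return {group: (best_score, min_rel, min_sim_days)} over the 40-point grid.

    events: dicts with keys 'group', 'rel', 'sim_days', 'ret' (hypothetical schema).
    score:  callable mapping a list of returns to a number, higher is better.
    """
    best = {}
    for group in sorted({e["group"] for e in events}):
        group_events = [e for e in events if e["group"] == group]
        for min_rel, min_sim in product(REL_THRESHOLDS, NOVELTY_DAYS):
            rets = [
                e["ret"]
                for e in group_events
                if e["rel"] >= min_rel and e["sim_days"] >= min_sim
            ]
            if not rets:
                continue  # this filter leaves nothing to trade
            s = score(rets)
            # Strict '>' keeps the loosest filter when several filters tie.
            if group not in best or s > best[group][0]:
                best[group] = (s, min_rel, min_sim)
    return best


# Made-up events, for illustration only.
events = [
    {"group": "equity-actions", "rel": 40, "sim_days": 30, "ret": 0.02},
    {"group": "equity-actions", "rel": 95, "sim_days": 5, "ret": -0.01},
    {"group": "earnings", "rel": 95, "sim_days": 100, "ret": 0.03},
    {"group": "earnings", "rel": 20, "sim_days": 100, "ret": -0.02},
]

best = best_filter_per_group(events, score=statistics.mean)
for group, (s, min_rel, min_sim) in best.items():
    print(group, "REL >=", min_rel, "SIM >=", min_sim)
```

With this toy data the two groups settle on different thresholds, which is the kind of asymmetry the group-level filters are meant to capture.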

RPA delivers superior performance in all four universes – both compared to the benchmark results and compared to optimized RPNA 4.0 results. The average improvement in Information Ratio across the four universes is 0.62 compared to RPNA 4.0 and is based on a 75% uptick in the number of event groups with statistically significant returns. The benchmark settings (REL and SIM of 90+) are chosen for only 4.2% of the groups, demonstrating the need for asymmetry in novelty and event relevance.

Overall, the case for asymmetry in the selection of event relevance and novelty filters is supported by evidence of higher risk-adjusted returns across the four universes – compared not only to the benchmark but also to the predecessor product. For more details, read our white paper “A Multi-Topic Approach To Building Quant Models”.
