How to use Natural Language Processing for Multi-Topic Quant Investing

In the face of an ever-increasing amount of financial news, investors need the right tools to cut through the noise to uncover the signal behind the latest move in the markets.

RavenPack comes to the aid of market professionals with its state-of-the-art Natural Language Processing (NLP) solution, RavenPack Analytics (RPA), which assigns news stories on the fly to one of nearly 6,900 event categories in 56 broader groups, covering anything from earnings and analyst ratings to mergers and litigation issues.

Hot off the press, our newest research paper “A Multi-Topic Approach To Building Quant Models” shows how to account for asymmetry via group-specific novelty and event relevance filters to achieve higher returns, both risk-adjusted and in absolute terms. This is accomplished by combining the brand-new Event Relevance Score (REL) with the revamped Event Similarity Days Score (SIM), both produced by RavenPack’s proprietary NLP techniques. These metrics allow investors to identify the most novel and relevant events in specific groups for trading. The REL is a score from 0 to 100 based on where in the news story an event is detected, with a higher score indicating a more relevant event, while the SIM gives the number of days since a similar event was last detected.

First, we build on knowledge acquired in a previous paper (“Introducing RavenPack Analytics for Equities”) to construct benchmarks based on highly novel and relevant events, i.e. those with REL and SIM of 90+. Table 1 shows the benchmark results (event volume-weighted across groups) across our two regions, the U.S. and Europe, and market capitalizations, “Large/Mid-Cap” and “Small-Cap”.

[Table 1: Benchmark results by region (U.S., Europe) and market capitalization (Large/Mid-Cap, Small-Cap)]
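The benchmark filter described above (REL and SIM of 90+) can be sketched in a few lines. This is a minimal illustration, not RavenPack's implementation: the `Event` class, its field names, and the sample records are hypothetical and do not reflect the actual RPA data schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical record layout; the real RPA field names may differ.
    entity: str  # company identifier
    group: str   # event group, e.g. "earnings"
    rel: int     # Event Relevance Score, 0-100
    sim: int     # Event Similarity Days: days since a similar event


def benchmark_filter(events, min_rel=90, min_sim=90):
    """Keep events that are both highly relevant and highly novel."""
    return [e for e in events if e.rel >= min_rel and e.sim >= min_sim]


# Made-up sample events for illustration only.
events = [
    Event("ACME", "earnings", rel=100, sim=200),
    Event("ACME", "equity-actions", rel=40, sim=120),   # novel, but low relevance
    Event("GLOBEX", "analyst-ratings", rel=95, sim=3),  # relevant, but not novel
]

selected = benchmark_filter(events)
print([e.group for e in selected])
```

Only the first sample event passes both thresholds, which is the symmetry the benchmark imposes on every group.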

RavenPack Analytics delivers solid risk-adjusted returns across the four universes, in particular for small-cap companies. Taking advantage of the vastly expanded taxonomy and new premium news sources, such as Benzinga Pro and FactSet, RPA yields a four-fold increase in detections over its predecessor, RavenPack News Analytics (“RPNA”) 4.0. This results in higher returns, both on an absolute and risk-adjusted basis, larger portfolios, and better hit ratios.

Impressive as the results above are, they are based on complete symmetry across all event groups. Intuitively, this seems like an overly restrictive constraint. Some groups may perform better when less relevant events are included, because events in these groups rarely carry a news story on their own. For example, consider the group Equity Actions. News about stock buybacks and reorganizations is typically buried deeper in a news story, in particular for smaller companies. This results in less coverage, as many readers are only exposed to the first part of a story, but the news may still move the price of a stock.

In Figure 4 we divide news into high and low event relevance with 90 being the delimiter. While high event relevance (i.e. 90+) yields higher Information Ratios for many groups, this is not true across the board. This observation supports the case for the inclusion of “less relevant” news in the trading signal construction.

[Figure 4: Information Ratios by event group for high (90+) and low event relevance]
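The high/low relevance comparison can be sketched as follows. Two assumptions are worth flagging: the Information Ratio is taken here as annualized mean return divided by annualized volatility, which may differ from the definition used in the paper, and each hypothetical `(group, rel, daily_return)` record is treated as one return observation, a simplification of a real portfolio return series.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def information_ratio(returns, periods_per_year=252):
    """Assumed definition: annualized mean return over annualized volatility."""
    if len(returns) < 2:
        return None  # not enough observations for a standard deviation
    sd = stdev(returns)
    if sd == 0:
        return None  # ratio is undefined without any variation
    return mean(returns) / sd * sqrt(periods_per_year)


def ir_by_relevance(records, delimiter=90):
    """Split each group's returns into high/low relevance and score both."""
    buckets = defaultdict(list)
    for group, rel, ret in records:
        bucket = "high" if rel >= delimiter else "low"
        buckets[(group, bucket)].append(ret)
    return {key: information_ratio(rets) for key, rets in buckets.items()}


# Made-up records: (group, REL, daily return). Not real data.
records = [
    ("earnings", 100, 0.012), ("earnings", 95, 0.004),
    ("earnings", 40, -0.003), ("earnings", 20, 0.001),
    ("equity-actions", 95, 0.002), ("equity-actions", 92, -0.001),
    ("equity-actions", 35, 0.006), ("equity-actions", 10, 0.004),
]

result = ir_by_relevance(records)
for key in sorted(result):
    print(key, round(result[key], 2))
```

Comparing the "high" and "low" entries per group is the kind of check that reveals whether a relevance cut-off of 90 helps or hurts a particular group.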

We proceed to build new trading signals based on customized novelty and relevance filters at the group level. We allow the event relevance threshold to vary from 0 to 90 in steps of 10 and the novelty threshold to take the values 1, 7, 30, and 90 days (i.e. one day, one week, one month, and three months). The optimal results are presented in the table below.

[Table: Optimal results with group-specific novelty and event relevance thresholds]
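The group-level search over thresholds can be sketched as a small grid search. The grid values come from the text (relevance 0 to 90 in steps of 10; novelty 1, 7, 30, and 90 days); everything else is an assumption for illustration: the `(rel, sim, return)` tuple layout, the simplified Information Ratio, and the choice of plain in-sample maximization as the selection rule, which the paper may not use.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev

REL_THRESHOLDS = range(0, 100, 10)  # 0, 10, ..., 90
SIM_THRESHOLDS = (1, 7, 30, 90)     # one day, one week, one month, three months


def information_ratio(returns, periods_per_year=252):
    """Assumed definition: annualized mean return over annualized volatility."""
    if len(returns) < 2:
        return None
    sd = stdev(returns)
    if sd == 0:
        return None
    return mean(returns) / sd * sqrt(periods_per_year)


def best_thresholds(group_events):
    """Return (ir, min_rel, min_sim) with the highest IR for one event group.

    group_events: list of hypothetical (rel, sim, daily_return) tuples.
    """
    best = None
    for min_rel, min_sim in product(REL_THRESHOLDS, SIM_THRESHOLDS):
        rets = [r for rel, sim, r in group_events
                if rel >= min_rel and sim >= min_sim]
        ir = information_ratio(rets)
        if ir is None:
            continue  # too few events, or no variation in returns
        if best is None or ir > best[0]:
            best = (ir, min_rel, min_sim)
    return best


# Made-up events for a single group: (REL, SIM, daily return).
group_events = [
    (95, 100, 0.020), (92, 95, 0.010), (50, 100, 0.030),
    (45, 95, 0.020), (30, 2, -0.050), (95, 3, -0.040),
]

best = best_thresholds(group_events)
print(best)
```

Running this per group, rather than once for the whole universe, is what lets each group end up with its own relevance and novelty settings. In practice the chosen thresholds would need out-of-sample validation, since maximizing over 40 combinations on the same data invites overfitting.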

RavenPack Analytics delivers superior performance in all four universes, both compared to the benchmark results and compared to optimized RPNA 4.0 results. The average improvement in Information Ratio across the four universes is 0.62 compared to RPNA 4.0 and is based on a 75% uptick in the number of event groups with statistically significant returns. The benchmark settings (REL and SIM of 90+) are chosen for only 4.2% of the groups, demonstrating the need for asymmetry in novelty and event relevance.

Overall, the evidence supports the case for asymmetry in the selection of event relevance and novelty filters, with higher risk-adjusted returns across the four universes compared not only to the benchmark but also to the predecessor. For a more thorough analysis, read our paper “A Multi-Topic Approach To Building Quant Models”.
