Charting Data Science Insights from 2023

January 15, 2024

Peter Hafez, Chief Data Scientist at RavenPack, looks back on a year of rich research insights and charts the course for 2024.


Peter Hafez

Chief Data Scientist


Originally published on LinkedIn

Stepping into a new year, I'm thrilled to share reflections on 12 months of extensive research and insights from the RavenPack Data Science team. Throughout 2023, we explored a wide range of research themes, each deepening our understanding of the dynamics that drive financial markets.

Within this exploration, the impact of Large Language Models (LLMs) stood out. Throughout 2023, we witnessed the continued maturation of LLMs, including GPT-4, Mistral, Llama 2, and more, and their growing influence on the financial industry. These models have not only improved the accuracy of language processing but have also reshaped how we extract and interpret financial information.

As we step into 2024, the role of LLMs in financial analytics is poised to evolve further, promising greater accuracy, efficiency, and depth in uncovering insights.

At RavenPack, we have applied LLMs to a wide range of tasks, from retrieval-augmented generation (RAG) to sentiment analysis and theme detection. Nonetheless, much remains to be understood about how these models can improve efficiency across use cases in both quantitative and discretionary investing. The question remains: Can they surpass traditional and more scalable NLP techniques in rigorous backtests? In a separate article, I unpack the hype and analyze the impact and limitations of LLMs in stock price prediction. Specifically, I address the findings from the paper 'Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models'.


Now, let us delve into the key themes that have shaped our research journey in 2023 and explore their implications for the financial industry in 2024.

Enhancing Language Models Using RavenPack Annotations

In 2023, we explored the convergence of data and prediction in financial research, revealing the synergies between RavenPack Annotations, the latest product from RavenPack, and the open-source FinBERT model, a language model that predates the more recently acclaimed LLMs.

RavenPack Annotations contain timestamped, sentence-level textual data enriched with a comprehensive set of point-in-time data. This includes entity detections, event information, and sentiment scores, each with the coordinates of where it appears in the text. The product offers a vast data collection that can easily be filtered to select customized, high-quality training datasets for countless targeted use cases, especially within the financial domain.

Our primary focus was predicting stock market movements, which we approached by fine-tuning the FinBERT model on RavenPack Annotations. We built a sentiment inference framework that converts raw sentence-level data into daily sentiment indicators, the inputs to the prediction step.
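As a rough illustration of the aggregation step, the sketch below averages sentence-level sentiment scores into one value per company per calendar day. The record fields (`entity`, `timestamp`, `score`), the company name and the plain mean are assumptions made for this example; the actual inference framework is not described in that level of detail.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_sentiment(records):
    """Average sentence-level sentiment into one score per entity per day.

    Each record is a hypothetical dict with keys:
      'entity'    - company identifier,
      'timestamp' - ISO 8601 string,
      'score'     - model sentiment in [-1, 1] (e.g. a fine-tuned FinBERT output).
    Returns {(entity, date): mean score}.
    """
    buckets = defaultdict(list)
    for rec in records:
        day = datetime.fromisoformat(rec["timestamp"]).date()  # calendar-day bucket
        buckets[(rec["entity"], day)].append(rec["score"])
    # Plain mean; relevance- or count-weighted averages are common alternatives.
    return {key: mean(scores) for key, scores in buckets.items()}


# Usage with made-up data for a hypothetical company "ACME".
sentences = [
    {"entity": "ACME", "timestamp": "2023-03-01T09:15:00", "score": 0.8},
    {"entity": "ACME", "timestamp": "2023-03-01T14:40:00", "score": -0.2},
    {"entity": "ACME", "timestamp": "2023-03-02T10:05:00", "score": 0.5},
]
print(daily_sentiment(sentences))
```

Real pipelines often cut the day at market close rather than midnight, so that each daily value only uses news available before the next session; that detail is left out here for brevity.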

Our results showed that using RavenPack Annotations to filter relevant and novel content at the company level robustly improved on the raw FinBERT model across diverse investment universes.
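The filtering step can be sketched as follows, assuming each annotated sentence carries a 0-100 relevance score and its text. The field names, the threshold of 75 and the exact-duplicate notion of novelty are hypothetical simplifications. They are not the RavenPack Annotations schema or its novelty logic.

```python
from datetime import datetime, timedelta


def filter_relevant_novel(records, min_relevance=75, novelty_days=1):
    """Keep sentences that are relevant to their entity and not recent repeats.

    Hypothetical fields: 'entity', 'timestamp' (ISO 8601), 'relevance' (0-100)
    and 'text'. A sentence is treated as novel if the same entity has not had
    identical normalized text within the previous `novelty_days` days.
    """
    last_seen = {}  # (entity, normalized text) -> time of last occurrence
    kept = []
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        ts = datetime.fromisoformat(rec["timestamp"])
        key = (rec["entity"], " ".join(rec["text"].lower().split()))
        previous = last_seen.get(key)
        last_seen[key] = ts
        if rec["relevance"] < min_relevance:
            continue  # entity only mentioned in passing
        if previous is not None and ts - previous < timedelta(days=novelty_days):
            continue  # repeat of a recent story, e.g. a syndicated copy
        kept.append(rec)
    return kept


sample = [
    {"entity": "ACME", "timestamp": "2023-05-02T08:00:00", "relevance": 90,
     "text": "ACME raises full-year guidance."},
    {"entity": "ACME", "timestamp": "2023-05-02T09:30:00", "relevance": 90,
     "text": "ACME  raises full-year guidance."},
    {"entity": "ACME", "timestamp": "2023-05-02T11:00:00", "relevance": 20,
     "text": "Peers such as ACME were also mentioned."},
]
print(len(filter_relevant_novel(sample)))
```

Production novelty measures usually compare stories by similarity rather than exact text, but the exact-match rule keeps the example short.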

2024 lookout

The utility of RavenPack Annotations extends beyond equities to various asset classes, offering a distinctive opportunity for training and fine-tuning text-based models with minimal data processing time and effort. In addition to supporting this research, RavenPack Annotations will fuel our internal efforts related to sentiment and thematic workflows, leveraging LLMs and targeted embeddings.

Thematic Investing: Innovation & Technology Adoption

In our exploration of thematic investing, we examined media attention surrounding R&D and firm-specific innovation and found it to be a robust predictor of global stock price outperformance. Our study, spanning 2011 to 2023, evaluated the profitability of equity trading strategies built on news-based metrics of firm-level innovation across regions. Using Natural Language Processing (NLP) techniques, we monitor in real time the volume of news related to specific innovative concepts, which provides a foundation for thematic investing. Sorting stocks into equally weighted portfolios, we consistently observe that companies drawing significant media attention for their innovation outperform their less innovative counterparts. This pattern holds across the Global, U.S., Europe, and Asia-Pacific universes, pointing to a priced equity risk factor.
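A minimal sketch of the portfolio sort for a single rebalance date follows. It assumes a precomputed innovation news-volume score per stock (`signal`) and each stock's next-period return. The function name, the quintile default and the tickers are illustrative only.

```python
from statistics import mean


def long_short_spread(signal, next_returns, n_groups=5):
    """Top-minus-bottom group return for one rebalance date.

    signal:       ticker -> innovation news-volume score (hypothetical metric)
    next_returns: ticker -> return over the following holding period
    Stocks are sorted by signal into n_groups buckets. Each bucket is
    equally weighted, so its return is the simple average of its members.
    """
    ranked = sorted(signal, key=signal.get)  # ascending: least attention first
    size = len(ranked) // n_groups
    if size == 0:
        raise ValueError("need at least n_groups stocks")
    bottom, top = ranked[:size], ranked[-size:]
    return mean(next_returns[t] for t in top) - mean(next_returns[t] for t in bottom)


# Usage with ten hypothetical tickers whose returns rise with the signal.
tickers = "ABCDEFGHIJ"
news_volume = {t: i for i, t in enumerate(tickers, start=1)}
fwd_ret = {t: 0.01 * i for i, t in enumerate(tickers, start=1)}
print(long_short_spread(news_volume, fwd_ret))
```

In a backtest, this calculation would be repeated at every rebalance date using only news published before the return window starts, and the resulting spreads would then be averaged or compounded.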

Diving deeper into the innovation landscape, we examined technology adoption within companies by tracking the skills mentioned in corporate job postings. Our research shows that companies emphasizing new technology skills in their hiring tend to outperform their peers from an investment standpoint. By analyzing the correlation between the volume of new technology skills sought in job postings and a company's subsequent stock price performance, we show how these skill mentions can serve as a potential alpha source for investors. Monitoring companies' hiring needs through their job postings offers distinctive insights into their financial health and strategic direction.
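A standard way to measure this kind of relationship across stocks is the Spearman rank correlation between the signal and subsequent returns, often called a rank information coefficient. The sketch below computes it using only the standard library. Whether the study used exactly this statistic is an assumption, and the skill counts and returns are made-up inputs.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(vx * vy)


def rank_ic(signal, next_returns):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return pearson(ranks(signal), ranks(next_returns))


skill_mentions = [3, 10, 0, 7, 5]  # hypothetical new-technology skill counts
next_returns = [0.01, 0.04, -0.02, 0.03, 0.02]
print(rank_ic(skill_mentions, next_returns))
```

A common way to judge whether a signal is a plausible alpha source is to average this coefficient over many dates and check that it is consistently positive.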

2024 lookout

In 2024 we aim to demonstrate the seamless utilization of our custom Job Factors Library to extract unique insights from millions of job postings that provide value to both quantitative and discretionary investors. This endeavor marks another step in our ongoing thematic journey.

Sector Rotation in European Markets: Navigating Earnings News and Controversies

In pursuit of effective stock selection, RavenPack's Data Science team has used Earnings Intelligence not only to assess individual stocks but also to construct sector scores, which form the basis of sector rotation strategies. Our research shows that the forward-looking sentiment embedded in earnings-related events captured in news, such as estimates, revisions, and guidance, helps predict future performance at the sector level.
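Converting stock-level earnings sentiment into sector scores can be sketched as a simple average per sector followed by a ranking. The equal-weight average, the `exclude` set (which could hold stocks flagged for corporate controversy, as discussed below) and all tickers and values are hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean


def sector_scores(stock_sentiment, stock_sector, exclude=frozenset()):
    """Average stock-level earnings sentiment into one score per sector.

    stock_sentiment: ticker -> earnings sentiment (hypothetical scale)
    stock_sector:    ticker -> sector name
    exclude:         tickers to drop before averaging (hypothetical flag set)
    """
    by_sector = defaultdict(list)
    for ticker, score in stock_sentiment.items():
        if ticker not in exclude:
            by_sector[stock_sector[ticker]].append(score)
    return {sector: mean(scores) for sector, scores in by_sector.items()}


def rotate(scores, k=1):
    """Return (overweight, underweight): the k best and k worst sectors."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[-k:]


sentiment = {"A": 0.6, "B": 0.4, "C": -0.1, "D": -0.3, "E": 0.2, "F": 0.0}
sector_of = {"A": "Energy", "B": "Energy", "C": "Tech", "D": "Tech",
             "E": "Health", "F": "Health"}
print(rotate(sector_scores(sentiment, sector_of, exclude={"A"}), k=1))
```

Weighting stocks by market capitalization inside each sector would be an equally reasonable design choice; the equal-weight mean simply keeps the example minimal.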

The efficacy of our sector rotation strategies stands out. They beat random selection and consistently outperform zero-skill strategies with high confidence. The strategy timed the Great Value Rotation well, detecting weakening earnings sentiment during the repricing of interest rates and amid global recession fears. This led to a shift from growth sectors toward value and high-dividend stocks, with an added boost from the energy sector driven by geopolitical tensions. Notably, excluding stocks entangled in corporate controversy further enhanced performance at the sector level.
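One way to interpret "outperforming zero-skill strategies" is as a Monte Carlo test. You simulate many strategies that pick sectors at random in each period and check what share of them the real strategy beats. The sketch below assumes that interpretation, equal weighting and a fixed number of sectors `k`. It is not necessarily the test used in the research, and the sector returns are made up.

```python
import random


def zero_skill_percentile(strategy_return, sector_returns, k, n_sims=10_000, seed=0):
    """Share of random k-sector strategies that the real strategy beats.

    sector_returns: list of periods, each a dict sector -> period return.
    Each simulated zero-skill strategy picks k sectors uniformly at random
    every period, weights them equally and compounds the result.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    beaten = 0
    for _ in range(n_sims):
        growth = 1.0
        for period in sector_returns:
            picks = rng.sample(sorted(period), k)
            growth *= 1 + sum(period[s] for s in picks) / k
        if strategy_return > growth - 1:
            beaten += 1
    return beaten / n_sims


periods = [
    {"Energy": 0.05, "Tech": -0.02, "Health": 0.01},
    {"Energy": 0.03, "Tech": -0.01, "Health": 0.0},
]
print(zero_skill_percentile(0.02, periods, k=1, n_sims=1000))
```

A value close to 1.0 would mean the strategy beat almost all random selections, which is one way to express outperformance with high confidence.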

While recent studies have focused on European markets, parallel success has been observed in the US.

2024 lookout

Our exploration extends beyond earnings. In early 2024, we plan to look more deeply into intangible assets and their associated sentiment for sector rotation. We also aim to expand sector rotation strategies into Asian markets, where we have already used earnings call transcript sentiment to successfully tilt the TOPIX index, paving the way for further work in this domain.

Trading the News with Dynamic Feature Hierarchy Trees

While earnings sentiment has traditionally guided predictions of outperformance, we also took a broader approach to extract additional value from our extensive dataset. In 2023, we explored the full range of events and analytics produced by RavenPack. Using a dynamic framework built on Feature Hierarchy Trees, we generated and fused signals that capture the evolving performance of features over time. The model's multidimensional structure, combined with an iterative dimensionality reduction process, keeps it transparent and provides valuable insights into feature behavior.
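Feature Hierarchy Trees are not described here in enough detail to reproduce. The sketch below therefore only illustrates the broader idea: track each feature's recent performance and fuse the signals from features that are currently working. The class, its parameters and the feature names are hypothetical. This is a deliberately simplified stand-in, not the model described above.

```python
from collections import deque
from statistics import mean


class RollingFeatureSelector:
    """Keep a trailing record of each feature's signed outcomes and fuse
    signals only from features whose recent record is positive.

    A hypothetical simplification, NOT the Feature Hierarchy Tree model.
    """

    def __init__(self, window=50, min_obs=10):
        self.window = window    # number of recent outcomes remembered per feature
        self.min_obs = min_obs  # minimum history before a feature can be used
        self.history = {}       # feature -> deque of signed forward returns

    def update(self, feature, direction, forward_return):
        """Record one past signal's outcome; direction is +1 (long) or -1 (short)."""
        hist = self.history.setdefault(feature, deque(maxlen=self.window))
        hist.append(direction * forward_return)

    def active_features(self):
        """Features with enough history and a positive trailing mean outcome."""
        return {f for f, h in self.history.items()
                if len(h) >= self.min_obs and mean(h) > 0}

    def fuse(self, todays_signals):
        """Average today's directions (+1/-1) from active features; 0.0 if none."""
        active = self.active_features()
        votes = [d for f, d in todays_signals.items() if f in active]
        return mean(votes) if votes else 0.0


sel = RollingFeatureSelector(window=5, min_obs=3)
for _ in range(3):
    sel.update("guidance", +1, 0.01)          # hypothetical event type that worked
    sel.update("analyst-ratings", +1, -0.02)  # hypothetical event type that did not
print(sel.active_features(), sel.fuse({"guidance": 1, "analyst-ratings": -1}))
```

The `window` parameter plays a role similar to the training-sample length mentioned later in this section. A short window reacts quickly to changing conditions, while a long window changes its selection less often.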

In particular, our model, segmented by region and market capitalization, outperforms benchmark models across various parameter configurations. It adapts to shifts in news content and market conditions over time. While conventional financial news such as earnings or analyst ratings produces robust signals, our model underscores the growing importance of monitoring a broader array of events: guidance events and the rising impact of web news content also emerge as significant predictors.

The model's resilience and flexibility shine in diverse parameter configurations, allowing us to fine-tune the balance between signal strength and volume by adjusting confidence and relevance thresholds. Its adaptability to different market conditions is evident, with US markets benefitting from shorter training samples for a more granular and dynamic approach, while European and Asian markets thrive on longer samples for increased signal volume.
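To explore the trade-off between signal strength and volume, you can tabulate two quantities for each pair of relevance and confidence thresholds: how many events survive, and their average signed forward return. The field names, threshold grids and events below are hypothetical.

```python
from statistics import mean


def threshold_tradeoff(events, relevance_levels, confidence_levels):
    """Map (relevance, confidence) thresholds -> (event count, mean signed return).

    Each event is a hypothetical dict with 'relevance', 'confidence',
    'direction' (+1/-1) and 'forward_return'. The mean is None when no
    event passes both thresholds.
    """
    table = {}
    for rel in relevance_levels:
        for conf in confidence_levels:
            kept = [e["direction"] * e["forward_return"] for e in events
                    if e["relevance"] >= rel and e["confidence"] >= conf]
            table[(rel, conf)] = (len(kept), mean(kept) if kept else None)
    return table


events = [
    {"relevance": 100, "confidence": 0.9, "direction": 1, "forward_return": 0.02},
    {"relevance": 80, "confidence": 0.6, "direction": 1, "forward_return": 0.005},
    {"relevance": 50, "confidence": 0.4, "direction": -1, "forward_return": 0.01},
]
for key, (count, avg) in threshold_tradeoff(events, [0, 90], [0.0, 0.8]).items():
    print(key, count, avg)
```

Thresholds chosen this way should be fixed on a training sample and then evaluated out of sample; otherwise the grid search itself overfits.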

2024 lookout

We plan to extend this framework by introducing greater feature complexity. This might include reversal signals, which often unfold over longer time horizons and could therefore help reduce signal turnover.

Trading Credit Events in Corporate Bond Markets

Using the RavenPack Taxonomy as a set of event building blocks for focused analysis, we also showed that carefully selected credit-related news events have a substantial impact on corporate bond prices. Conditioning announcements related to credit ratings, analyst ratings, and price targets on past performance and news polarity generates long-lasting outperformance, as news is gradually incorporated into prices. Notably, positive news following poor bond performance yields significant abnormal returns on the announcement day and sustained post-event outperformance, especially for credit and insider trading-related events. This suggests that unexpected announcements have a pronounced short-term effect on prices. While credit rating agencies and analysts tend to downgrade bond issuers after adverse news and negative past performance, bond markets tend to trade against positive news, consistent with the "buy the rumor, sell the news" phenomenon.
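The conditioning described above (positive news after poor past performance) can be sketched as a small event study. The sketch assumes that abnormal returns have already been computed as bond returns minus a benchmark, which the text does not specify, and that index 0 is the announcement day. Field names and values are hypothetical.

```python
from statistics import mean


def conditional_event_study(events, post_days=5):
    """Average abnormal returns for positive news that follows poor performance.

    Each event is a hypothetical dict with:
      'sentiment'   - news polarity (positive if > 0),
      'past_return' - bond return over a look-back window before the event,
      'abnormal'    - daily abnormal returns, index 0 = announcement day.
    Returns (mean day-0 abnormal return, mean cumulative abnormal return over
    days 1..post_days), or None if no event meets the condition.
    """
    selected = [e for e in events if e["sentiment"] > 0 and e["past_return"] < 0]
    if not selected:
        return None
    day0 = mean(e["abnormal"][0] for e in selected)
    car = mean(sum(e["abnormal"][1:post_days + 1]) for e in selected)
    return day0, car


events = [
    {"sentiment": 0.7, "past_return": -0.03,
     "abnormal": [0.004, 0.001, 0.001, 0.0, 0.002, 0.001]},
    {"sentiment": 0.5, "past_return": -0.01,
     "abnormal": [0.002, 0.0, 0.001, 0.001, 0.0, 0.001]},
    {"sentiment": -0.6, "past_return": -0.02,  # negative news: excluded
     "abnormal": [-0.005, -0.001, 0.0, 0.0, 0.0, 0.0]},
    {"sentiment": 0.4, "past_return": 0.02,    # good past performance: excluded
     "abnormal": [0.001, 0.0, 0.0, 0.0, 0.0, 0.0]},
]
print(conditional_event_study(events))
```

Separating the announcement-day effect from the post-event drift is what lets a study speak to both the short-term reaction and the gradual incorporation of news into prices.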

2024 lookout

The growing interest in incorporating sentiment data into corporate bond pricing models is a noteworthy trend, and it's an avenue we plan to delve deeper into in the coming year.

Nowcasting: A Sentiment-Based Approach

Our research in 2023 extended beyond equities and credit markets. Amid economic uncertainty, we explored nowcasting US inflation by deploying a Bayesian neural network fed with real-time analytics from RavenPack. The results underscored the pivotal role of sentiment data in predicting both inflation and inflation uncertainty. Building on this model, we designed multi-asset allocation strategies that combine an S&P 500 allocation with fixed income and commodities investments as hedges against inflation risk. By leveraging inflation nowcasts and RavenPack sentiment analytics, our approach not only outpaced the S&P 500 but also delivered strong risk-adjusted returns. A risk-targeted version added a further layer of stability, demonstrating the strategy's adaptability across diverse market conditions.
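Volatility targeting is one common way to implement a risk-targeted allocation. Exposure is scaled down when recent volatility is high and scaled up when it is low. The sketch below assumes that approach. The 10% annual target, 252 trading days and 2x leverage cap are illustrative values, not parameters from the study.

```python
from math import sqrt
from statistics import stdev


def risk_target_weight(recent_returns, target_vol=0.10, periods_per_year=252,
                       max_leverage=2.0):
    """Exposure that scales a portfolio toward a target annualized volatility.

    recent_returns: trailing daily returns of the strategy (at least two).
    All parameter defaults are illustrative assumptions.
    """
    realized = stdev(recent_returns) * sqrt(periods_per_year)  # annualized vol
    if realized == 0:
        return max_leverage
    # Less exposure when realized risk is high, more when low, capped above.
    return min(target_vol / realized, max_leverage)


calm = [0.0001, -0.0001] * 10       # made-up low-volatility return history
turbulent = [0.01, -0.01] * 10      # made-up high-volatility return history
print(risk_target_weight(calm), risk_target_weight(turbulent))
```

The leverage cap matters in calm markets: without it, a tiny realized volatility would imply an unrealistically large position.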

Expanding our focus, we nowcast both US and China GDP to capture economic activity in two of the world's largest economies, both driving forces of the global economy. Integrating real-time news analytics into our nowcasts gave us a distinct information advantage over more traditional macroeconomic inputs and significantly reduced out-of-sample forecasting errors. RavenPack Analytics makes nowcasts more timely and accurate, highlighting the importance of real-time news sentiment. RavenPack sentiment is particularly valuable in data-scarce environments. For example, our data is especially informative around Chinese New Year, when it compensates for the pause in traditional data releases. Tracking news-driven economic guidance ensures a continuous stream of information.
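Improvements in out-of-sample forecasting error are typically checked with an expanding-window comparison. At each date, each model is fitted only on past data, the current value is nowcast, and the root-mean-square errors are compared. In the sketch below, a historical-mean forecast stands in for the benchmark, and a one-variable regression on a sentiment index stands in for the news-augmented model. Both are simplifying assumptions, not the models used in the research, and the data is a toy series.

```python
from math import sqrt
from statistics import mean


def ols_fit(x, y):
    """Slope and intercept of a one-variable least-squares regression."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def rmse(errors):
    return sqrt(mean(e * e for e in errors))


def oos_comparison(sentiment, growth, min_train=8):
    """Expanding-window RMSE of a mean benchmark vs a sentiment regression.

    At step t both models see only observations before t, then nowcast
    growth[t]. sentiment[t] is assumed to be observable before growth[t]
    is released, which is the premise of nowcasting.
    """
    bench_err, news_err = [], []
    for t in range(min_train, len(growth)):
        x_train, y_train = sentiment[:t], growth[:t]
        bench_err.append(growth[t] - mean(y_train))
        slope, intercept = ols_fit(x_train, y_train)
        news_err.append(growth[t] - (intercept + slope * sentiment[t]))
    return rmse(bench_err), rmse(news_err)


sentiment = [i % 5 - 2 for i in range(20)]   # hypothetical news index
growth = [0.5 + 0.3 * s for s in sentiment]  # toy growth series driven by it
bench_rmse, news_rmse = oos_comparison(sentiment, growth)
print(bench_rmse, news_rmse)
```

With real data, neither error would be near zero. The quantity of interest is how much the news-augmented RMSE falls below the benchmark's.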

Harnessing News Sentiment for FX Futures Strategies

Lastly, in one of our client favorites, we explored whether sentiment derived from macroeconomic and foreign exchange news can predict future currency performance. Specifically, we examined sentiment momentum in country-specific macroeconomic news and sentiment overreaction in foreign exchange news as ways to rotate G7 currency pairs based on their sentiment. This rotation allowed us to construct long-short trend-following and mean-reverting futures-based strategies. Intriguingly, both the long and short legs of the sentiment-based strategies outperformed an equally weighted portfolio of G7 currencies and surpassed traditional price momentum and mean-reverting strategies. This suggests that sentiment and price act as independent factors. Mean-reverting strategies decayed faster and performed better over shorter horizons, while trend-following strategies decayed more slowly and performed better over longer horizons. This indicates that the two signals can complement each other across different time spans. The combined sentiment strategy consistently outperformed the individual strategies across nearly all effective holding periods, particularly when the allocation was concentrated in fewer currencies.
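Below is a minimal sketch of how sentiment scores could be turned into long-short currency positions, and how a trend-following book and a mean-reverting book might be combined. The scores, the currency list, the `n` parameter (smaller values mean a more concentrated allocation) and the equal averaging of the two books are all hypothetical.

```python
def positions(scores, n=1, mean_revert=False):
    """Equal-weight long-short positions from one sentiment score per currency.

    Trend-following: long the n highest scores, short the n lowest.
    Mean-reverting:  the reverse, betting that extreme sentiment overreacts.
    """
    if 2 * n > len(scores):
        raise ValueError("need at least 2 * n currencies")
    ranked = sorted(scores, key=scores.get, reverse=True)
    longs, shorts = ranked[:n], ranked[-n:]
    if mean_revert:
        longs, shorts = shorts, longs
    weights = {ccy: 0.0 for ccy in scores}
    for ccy in longs:
        weights[ccy] += 1.0 / n
    for ccy in shorts:
        weights[ccy] -= 1.0 / n
    return weights


def combine(*books):
    """Average several position maps (currency -> weight) into one book."""
    combined = {}
    for book in books:
        for ccy, w in book.items():
            combined[ccy] = combined.get(ccy, 0.0) + w / len(books)
    return combined


macro_momentum = {"EUR": 0.4, "JPY": -0.1, "GBP": 0.1, "CAD": -0.3}  # made up
fx_news = {"EUR": 0.6, "JPY": -0.5, "GBP": 0.0, "CAD": 0.2}          # made up
trend = positions(macro_momentum, n=1)
revert = positions(fx_news, n=1, mean_revert=True)
print(combine(trend, revert))
```

Where the two books disagree, as for EUR here, the combined position nets out. That netting is one reason a combined strategy can behave differently from either component.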

2024 lookout

In early 2024, we will make our forex factors commercially available. We also plan to expand the coverage of our sentiment indicators beyond the G7 countries over the course of the year.

We remain committed to pioneering research and pushing the boundaries of financial analysis. Here's to another year in the realm of Large Language Models in finance!
