December 22, 2025
Chief Data Scientist Peter Hafez on how 2025 shifted finance AI from demos and hype to robust, traceable systems built on thoughtful architecture.
If the past couple of years were defined by experimentation and exuberance around generative AI, 2025 marked a clear inflection point. For RavenPack, and for much of the financial industry, this became the year when enthusiasm gave way to accountability. Not the year of "AI everywhere," but the year of tools, agents, and systems that actually work. Across the industry, organizations are now asking harder questions. Demos are no longer enough; tangible, repeatable value matters. This shift has been especially pronounced in finance, where robustness, traceability, and near-deterministic behavior are essential. In practice, this has exposed the limits of purely generative systems and reinforced a lesson many of us already knew: intelligence compounds only when models, tools, and data are architected together.
Large Language Models are not a magic bullet. Building production-grade systems remains hard, incremental work, grounded in specific workflows and clearly defined objectives. In that sense, 2025 felt less like a revolution and more like a return to fundamentals, though with better tools.
Tools have replaced rules. Orchestration layers and agent frameworks will continue to commoditize, but durable advantages will sit elsewhere: in proprietary or premium datasets, well-designed, agent-ready tool libraries, and the institutional knowledge embedded in both. These assets define what a system can do reliably, not just what it can say.
Human judgment also remains essential. Asking the right questions, defining the right abstractions, and deciding where automation is "good enough" versus where certainty is required are still deeply human responsibilities. For organizations without a history of working with data and automation, reaching this realization has taken time. But the broader conclusion is now hard to ignore: progress is cultural before it is technological. The mental shift comes first.
We're often asked by clients: "Should we be applying generative AI to our investment process?"
It's an understandable question. It's also usually the wrong one.
A more productive question is: "How do we architect intelligence that compounds over time?"
The difference is subtle but decisive. The first frames AI as a bolt-on technology. The second treats intelligence as an evolving system, one that continuously integrates new tools, models, and data while preserving what already works. No single model dominates forever; every approach carries trade-offs. Organizations that internalize this reality tend to experiment earlier, discard faster, and scale more deliberately.
This is where we focus: not on whether to use a particular model, but on how to build workflows and systematic processes that remain robust as components change. As a quant, I gravitate toward near-deterministic systems. That bias has served me well in 2025, because it turns out you can get there, even with probabilistic models, if you design for convergence rather than precision at every step.
Much of our applied research in 2025 focused on clarifying where generative models add the most value. A useful lens is the distinction between high-reasoning and high-structure workflows.
These aren't binary categories. Systems exist on a spectrum, combining probabilistic reasoning with varying degrees of determinism. The challenge isn't to eliminate uncertainty, but to manage it explicitly.
High-reasoning steps, such as concept expansion, query enrichment, or thematic discovery, are where LLMs excel. Transforming a vague question like "Which companies are exposed to U.S.–China trade tensions?" into a structured set of transmission channels and related concepts dramatically improves recall in downstream search and retrieval.
High-structure workflows define what must happen every time a process runs. They enforce sequencing, guardrails, compliance, and reproducibility. Within these workflows, high-reasoning steps can still exist, but they're invoked deliberately, as tools, not as fully autonomous decision-makers.
For our clients (quant researchers looking for alpha, risk managers building systematic processes, and technologists integrating Bigdata Search into co-pilots and agentic frameworks), this distinction matters. You don't want your risk system making creative interpretations. But you do want your research workflows to surface non-obvious connections.
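To make the high-structure/high-reasoning split concrete, here is a minimal sketch of a pipeline in which the sequencing and guardrails are fixed, and the reasoning step is invoked deliberately as a tool. All names are hypothetical, and the LLM and search tools are stubbed out; this is an illustration of the pattern, not RavenPack's implementation.

```python
from typing import Callable

def run_screen(query: str,
               expand_concepts: Callable[[str], list[str]],
               search: Callable[[list[str]], list[dict]],
               min_relevance: float = 0.8) -> list[dict]:
    """High-structure workflow: fixed steps and guardrails, with one
    high-reasoning step (concept expansion) invoked as a tool."""
    # Step 1 (high-reasoning, invoked deliberately): expand the vague
    # query into concrete transmission channels and related concepts.
    concepts = expand_concepts(query)
    if not concepts:  # guardrail: never run a search on an empty expansion
        raise ValueError("concept expansion returned no terms")
    # Step 2 (deterministic): retrieve candidate documents.
    hits = search(concepts)
    # Step 3 (deterministic): enforce a relevance floor so reruns behave
    # reproducibly regardless of what the model returned upstream.
    return [h for h in hits if h.get("relevance", 0.0) >= min_relevance]

# Stubs standing in for an LLM call and a search API (both hypothetical).
expand_stub = lambda q: ["tariffs", "semiconductor supply chains",
                         "export controls"]
search_stub = lambda terms: [{"doc": "d1", "relevance": 0.91},
                             {"doc": "d2", "relevance": 0.55}]

print(run_screen("Which companies are exposed to U.S.-China trade tensions?",
                 expand_stub, search_stub))
```

The point of the structure is that the probabilistic component can be swapped, rerun, or audited without changing what the workflow guarantees on every run.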
One of our more instructive findings came from research into grounded mindmaps for concept expansion. By generating ensembles of mindmaps using reasoning models and aggregating them into a "super-mindmap," we observed strong convergence as the number of samples increased. Introducing simple probabilistic filters, such as excluding branches or nodes that appeared in fewer than a threshold number of maps, further reduced noise. Practically, this showed that the somewhat non-deterministic behavior of LLMs can be translated into near-deterministic outcomes when systems are designed for convergence across probabilistic components, rather than precision at each individual step. We found that reliable outlier detection was possible even with smaller ensembles, making this approach feasible at scale.
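The aggregation idea can be sketched in a few lines: sample an ensemble of mindmaps, count how often each node appears, and keep only nodes above a support threshold. This is an illustrative reconstruction under simplified assumptions (each mindmap reduced to a set of concept nodes), not the production system.

```python
from collections import Counter

def super_mindmap(samples: list[set[str]], min_support: float = 0.3) -> set[str]:
    """Aggregate an ensemble of sampled mindmaps (each a set of concept
    nodes) into a 'super-mindmap', keeping only nodes that appear in at
    least `min_support` of the samples -- a simple probabilistic filter
    that drives convergence as the ensemble grows."""
    counts = Counter(node for sample in samples for node in sample)
    cutoff = min_support * len(samples)
    return {node for node, c in counts.items() if c >= cutoff}

# Five sampled mindmaps for one query; the stray node appears in only
# one sample and is filtered out as an outlier.
samples = [
    {"tariffs", "semiconductors", "supply chains"},
    {"tariffs", "semiconductors", "export controls"},
    {"tariffs", "supply chains", "export controls"},
    {"tariffs", "semiconductors", "supply chains"},
    {"tariffs", "stray hallucination"},
]
print(sorted(super_mindmap(samples, min_support=0.4)))
```

As the number of samples increases, the surviving node set stabilizes, which is the near-deterministic behavior described above.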
This is the kind of result that excites me: taking something inherently fuzzy and building a systematic process around it that behaves predictably. The world is changing, and you have to adapt in order not to be left behind. But things are not as different as people may think. Diversification, ensemble methods, disciplined validation, benchmarking, the principles that made quant strategies work still apply. The building blocks have changed. The mindset hasn't.
Read more about how we used Large Reasoning Models with Grounding to build Dynamic Mind Maps.
In 2025, our Data Science team invested heavily in generative workflows aimed at measuring risk and thematic exposure across large universes. By combining grounded reasoning, mindmap-based concept expansion, and the Bigdata Search API, we constructed scalable pipelines that retrieved and verified relevant signals across news, earnings calls, and regulatory filings.
A key advantage of this approach is flexibility. Instead of relying solely on predefined taxonomies, clients can express their own conceptual view of the world, whether company-specific, thematic, or macro-driven, and see it operationalized consistently. LLMs play a critical role as verification layers, enabling validation at volumes where human review would be impractical.
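The verification-layer idea can be sketched as a filter over retrieved results: each snippet is checked against the client's concept before it counts as exposure. The verifier interface below is hypothetical, and a keyword stub stands in for the actual LLM call.

```python
from typing import Callable

def verify_hits(hits: list[dict],
                concept: str,
                verifier: Callable[[str, str], bool]) -> list[dict]:
    """LLM-as-verification-layer: keep only retrieved snippets that the
    verifier confirms actually reflect the client's concept, enabling
    validation at volumes where human review would be impractical."""
    return [h for h in hits if verifier(concept, h["snippet"])]

# Keyword stub standing in for an LLM relevance check (hypothetical).
llm_verifier = lambda concept, text: concept.split()[0].lower() in text.lower()

hits = [{"doc": "10-K",         "snippet": "Tariff risk on imported components"},
        {"doc": "earnings Q&A", "snippet": "Strong demand across Europe"}]
print(verify_hits(hits, "tariff exposure", llm_verifier))
```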
Throughout the year, we worked closely with clients to adapt these workflows to real investment and risk use cases, demonstrating that generative systems can be both expressive and controlled when embedded within structured pipelines.
Check out the use cases in our Resource center.
This year-end review also captures our 2025 quantitative research, extending beyond traditional RavenPack Analytics to translate qualitative signals into measurable performance.
Financial Times + RavenPack: Compounding Data Quality
Our partnership with the Financial Times allowed us to test a clear hypothesis: does integrating premium journalism with broad news coverage improve equity selection? Using the top 1,000 U.S. equities, we combined RavenPack Core News with FT content in a long–short, dollar-neutral strategy rebalanced daily. Backtests going back to 2010 showed approximately 120 basis points of annualized performance improvement and an increase in Information Ratio from 0.59 to 0.73 (2-day holding period).
The methodology was intentionally simple: sentiment from earnings-related events combined with sentiment across all FT event types, cross-sectionally normalized and equally weighted. The conclusion was equally clear: data quality compounds. Premium sources don't merely add coverage; they amplify signal strength.
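A toy sketch of the methodology for a single rebalance date: z-score each sentiment signal cross-sectionally, blend them with equal weights, then go long the top fraction and short the bottom fraction with equal weights so the book is dollar-neutral. The tickers and numbers are invented for illustration; this is the shape of the construction, not the backtest code.

```python
from statistics import mean, pstdev

def zscore(scores: dict[str, float]) -> dict[str, float]:
    """Cross-sectional z-score across the universe on one date."""
    mu, sd = mean(scores.values()), pstdev(scores.values())
    return {t: (s - mu) / sd for t, s in scores.items()}

def combined_signal(core: dict[str, float],
                    ft: dict[str, float]) -> dict[str, float]:
    """Equal-weight blend of the two normalized sentiment signals."""
    zc, zf = zscore(core), zscore(ft)
    return {t: 0.5 * (zc[t] + zf[t]) for t in core}

def ls_weights(signal: dict[str, float],
               top_frac: float = 0.2) -> dict[str, float]:
    """Equal-weight, dollar-neutral book: long the top fraction of
    names by signal, short the bottom fraction."""
    ranked = sorted(signal, key=signal.get, reverse=True)
    n = max(1, int(top_frac * len(ranked)))
    w = {t: 0.0 for t in signal}
    for t in ranked[:n]:
        w[t] = 1.0 / n    # long book
    for t in ranked[-n:]:
        w[t] = -1.0 / n   # short book; longs and shorts net to zero
    return w

# Toy one-day cross-section of five names (illustrative numbers only).
core = {"AAA": 0.8, "BBB": 0.1, "CCC": -0.2, "DDD": -0.5, "EEE": 0.3}
ft   = {"AAA": 0.6, "BBB": 0.2, "CCC": -0.4, "DDD": -0.3, "EEE": 0.1}
weights = ls_weights(combined_signal(core, ft), top_frac=0.2)
print(weights)
```

In the real strategy this construction would be repeated at each daily rebalance across the 1,000-name universe.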
Read more about the strategy here.
We introduced a proprietary dataset capturing professional opinion across U.S. mid- and large-cap equities from 2020 to 2025. By structuring analyst commentary to reflect expert reasoning frameworks, we created a scalable mid-frequency signal that traditional sentiment models overlook. Related research on analyst rating changes demonstrated consistent outperformance across global equity markets.
Our 2025 research on earnings calls focused on extracting signals from the unscripted interaction between management and analysts. Q&A Language Transparency quantified disclosure quality using linguistic features such as clarity and evasiveness, while Q&A Sentiment Analysis showed that sentiment expressed during analyst questioning predicts subsequent stock performance, often ahead of broader market recognition.
Both examples systematize what human analysts do intuitively, looking for tells in management behavior, reading between the lines, but doing it at scale, consistently, and without cognitive biases.
At the macro level, we demonstrated how sentiment signals from earnings and asset-related news enhance U.S. equity sector rotation. Treating sentiment as a leading indicator of earnings momentum enabled more adaptive, systematic sector allocation across investment horizons.
Our 2025 research reinforced a consistent theme: systematic advantage increasingly comes from interpreting complex, unstructured signals at scale. The line between systematic and discretionary investing continues to blur, driven by infrastructure that supports compounding intelligence rather than isolated models.
Our 2026 priorities extend this trajectory:
Multi-Modal Signal Integration: Combining text, audio, and visual data into unified signals across risk and alpha use cases, as new datasets are integrated into RavenPack Analytics, Annotations, and Bigdata.com.
Advanced Workflow Development: Expanding support for sophisticated, tool-driven pipelines that embed reasoning within structured, repeatable processes. This is where I see the most opportunity: building workflows that happen to use LLMs as components.
Bigdata.com at Scale: Advancing large-scale content generation (briefs, daily digests, and trend analysis), supported by improved relevance, novelty, and sentiment modeling, and by deeper integration with knowledge graphs and risk & thematic screeners. Our clients are building co-pilots and agentic systems. We provide the infrastructure and trustworthy data layer that makes them production-ready.
The next generation of alternative data analysis is already underway. Our focus remains pragmatic: partnering with clients to convert data intelligence into measurable, repeatable market advantage.
No shortcuts. No hype. Just systems that work.
The world is changing, and adaptation is not optional. But for those of us who've spent careers building systematic processes, the fundamentals remain surprisingly familiar. We're simply applying them to better tools, richer data, and more ambitious problems as we move from prediction to reasoning.
That's the journey we've been on in 2025. And frankly, it's just getting started.