How NLP Can Distil Value From Businesses’ Own Internal Data

RavenPack | February 21, 2020

In romantic comedies it’s not uncommon for the main character to realize that their best friend, overlooked out of familiarity, is actually the one they’ve been looking for all along. The point is that it is easy to overlook the value of what is sitting right in front of you.

In a similar way, we have found that a lot of the information companies have stored in their internal systems - in the form of emails, memos, PDFs, chat conversations and attachments - though seemingly merely a part of ‘process’, can actually be of great value when structured and analyzed using our platform’s Natural Language Processing (NLP) algorithm.

Alpha Generation

On Email and Instant Messages Content

In a recent collaboration with a discretionary hedge fund with $1 billion in assets under management (AUM), for example, RavenPack used its next-gen NLP technology to structure, analyze and draw meaningful insights from the data contained in the fund employees’ inboxes and Skype chat conversations.

The resulting data was so valuable that an investment strategy built on it beat the firm’s own overall rate of return.



“We demonstrated that there is alpha to be captured in the sea of internal digital content by systematically extracting, structuring and enriching the fund’s own content in real time to generate a tradeable investment strategy,” says Peter Hafez, Chief Data Scientist at RavenPack. “The study found strong long-only signals that persist for several weeks, offering fundamental investors a reasonable time frame to act on them.”

The study was also able to analyze individual analysts’ contributions to the fund’s performance and thereby distinguish high-value from low-value user accounts from an information ratio (IR) perspective (the IR measures the consistency of returns).

[Figure: user correlation NLP data]

The diagram above maps the marginal impact of the fund’s employees. The size of each node reflects the impact of the user account on the information ratio and the color reflects the net contribution. Green indicates that the IR would fall if the employee were removed from the network; red indicates that the IR would actually improve if that person were removed from the network.
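The idea of a marginal impact on the information ratio can be illustrated with a small leave-one-out calculation. The sketch below is only an illustration and not the method used in the study, which is not described here: it assumes each user account maps to a series of active returns, combines them with equal weights, and takes the IR as the mean active return divided by its sample standard deviation. The account names and return figures are hypothetical.

```python
from statistics import mean, stdev


def information_ratio(active_returns):
    """Mean active return divided by its sample standard deviation."""
    spread = stdev(active_returns)
    if spread == 0:
        raise ValueError("information ratio is undefined for constant returns")
    return mean(active_returns) / spread


def combined_returns(returns_by_user, exclude=None):
    """Equal-weight average of per-user active returns in each period."""
    series = [r for user, r in returns_by_user.items() if user != exclude]
    return [mean(period) for period in zip(*series)]


def marginal_impact(returns_by_user):
    """IR with every user included minus IR with one user removed."""
    base = information_ratio(combined_returns(returns_by_user))
    return {
        user: base - information_ratio(combined_returns(returns_by_user, exclude=user))
        for user in returns_by_user
    }


# Hypothetical per-period active returns attributed to three user accounts.
returns_by_user = {
    "analyst_a": [0.02, 0.01, 0.03, 0.02],
    "analyst_b": [0.01, -0.02, 0.02, -0.03],
    "analyst_c": [0.00, 0.01, -0.01, 0.02],
}

impact = marginal_impact(returns_by_user)

# Positive impact: the IR would fall without this user (green in the diagram).
# Negative impact: the IR would improve without this user (red in the diagram).
colors = {user: "green" if value > 0 else "red" for user, value in impact.items()}

for user, value in impact.items():
    print(f"{user}: marginal impact {value:+.3f} ({colors[user]})")
```

In this toy data set, removing analyst_b raises the combined IR, so that account would be drawn in red, while the other two would be drawn in green.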

The overall conclusion the RavenPack team reached was that employees on the periphery tended to be of higher value.



“Users with high marginal value tend to have fewer edges whilst those with low marginal value tend to have more edges,” says Peter Hafez.

Developing an optimized strategy using only high-value user accounts yielded an even higher return than a strategy using all users.

[Figure: user network NLP data]

On Analysts’ Reports

JP Morgan is also using NLP data to recycle internal content and generate fresh investment insights. The bank started out by using the approach to analyze research generated by its real estate funds but then went on to include over 100,000 research reports written by JP Morgan analysts about individual companies.

It is not the only investment bank using this approach. Morgan Stanley is also using NLP data to recycle its analysts’ reports, driving new alpha streams. Although sentiment data derived from analysts’ research reports was not enough to drive a stand-alone strategy, when overlaid with price target changes it outperformed the S&P 500.



“We took sentences out of our research reports and we used the sentences as records for sentiment training. We labeled each sentence as positive, negative or neutral. We created about 6,000 different records, completely based on our own research, to generate the sentiment model. Of course, there were multiple iterations of training of that model so that in the end we were able to reach an out-of-sample accuracy of the model of around 80%. The trading strategy based on the sentiment model beat the S&P,” says Yimei Guo, Global Head of Investment Research Technology at Morgan Stanley.
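The workflow described in the quote (label sentences as positive, negative or neutral, train a model, then measure accuracy on sentences the model has not seen) can be sketched in a few lines. The quote does not say which model Morgan Stanley used, so the example below substitutes a simple multinomial naive Bayes classifier with add-one smoothing purely as a stand-in. All sentences and labels are invented for illustration, and a real training set would be far larger.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase whitespace tokenizer; a real pipeline would do much more."""
    return sentence.lower().split()


def train(records):
    """Count labels and per-label word frequencies from (sentence, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in records:
        label_counts[label] += 1
        for token in tokenize(sentence):
            word_counts[label][token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab


def predict(model, sentence):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(sentence):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, records):
    """Share of records whose predicted label matches the given label."""
    hits = sum(predict(model, sentence) == label for sentence, label in records)
    return hits / len(records)


# Hypothetical labeled sentences standing in for research-report records.
train_records = [
    ("earnings beat expectations and margins improved", "positive"),
    ("we raise our estimates on strong demand", "positive"),
    ("revenue missed guidance and margins declined", "negative"),
    ("we cut our estimates on weak demand", "negative"),
    ("the company reports results next week", "neutral"),
    ("management will host an investor day", "neutral"),
]

# Held-out sentences, never used in training, for the out-of-sample check.
test_records = [
    ("strong demand improved margins", "positive"),
    ("weak revenue missed estimates", "negative"),
    ("the company will host an investor day", "neutral"),
]

model = train(train_records)
out_of_sample_accuracy = accuracy(model, test_records)
print(f"out-of-sample accuracy: {out_of_sample_accuracy:.0%}")
```

Keeping the evaluation sentences separate from the training sentences is what makes the accuracy figure "out-of-sample". A score measured on the training records themselves would overstate how well the model generalizes.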

But it is not just as a driver of investment strategies that internal data processing and evaluation can be of value. It can also be useful in other areas.

Risk and Compliance

In the field of risk and compliance, for example, NLP data can be used to monitor traders’ positions, decision-making and chat conversations; in the sales process it can be used as a tool to quantify and qualify leads; and in customer service it can be used to monitor and analyze customer chat interactions.

One application in the field of risk and compliance is to use RavenPack NLP data to monitor conversations between financial market professionals on Instant Messaging Systems, which financial institutions use to create secure encrypted environments for intra-company communication.

RavenPack Text Analytics Service

RavenPack is an established vendor with over a decade and a half’s experience in providing NLP data, specializing in the financial services sector. It was recently listed as a reliable vendor in a business intelligence report by Forrester, exploring text analytics and leading industry providers.

RavenPack Text Analytics Service enables users to transform any textual data into a strategic asset. Our NLP API supports over 1,000 data formats (emails, instant messages, files, HTML, …).

If you think your business may be sitting on a treasure trove of unstructured data that, managed in the right way, might yield valuable insights, then please take the next step and request a trial today to find out more about how we can help you make the most of the untapped potential you have in your hands.
