RavenPack Technology

Our State-of-the-Art NLP Infrastructure Produces Analytics Used by Some of the Largest Financial Institutions Worldwide

How to analyze

300 million unstructured documents per month

Over the past 20 years, RavenPack has built one of the largest civilian infrastructures for natural language processing at scale. Starting with news analytics and expanding to larger sets of business-relevant data sources and larger universes of tracked entities, the RavenPack infrastructure now processes millions of documents every day. The service turns heaps of unstructured textual data into structured insights augmented with analytics such as sentiment scores. Powered by proprietary technology, the infrastructure performs 5 key tasks:

Content Collection: Curate data from over 40,000 sources or from your own proprietary content
Text Extraction: Transform any document into a normalized textual format
Enrichment: Tag content with sentiment, entities, events, relevance and more
Self Service Data: Enable the selection and filtering of data to create custom datasets
Data Delivery: Make the data available via a self service data platform or real-time APIs
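
As a rough illustration only, the five tasks above can be pictured as stages chained into a single pipeline. The sketch below is a toy model and not RavenPack's implementation: every name in it (Document, collect, extract, enrich, select, deliver, run_pipeline) and the tiny word-list sentiment score are assumptions made up for this example.

```python
from dataclasses import dataclass, field

# Hypothetical toy lexicon, used only for illustration;
# a production system would rely on trained models instead.
POSITIVE = {"beats", "growth", "record"}
NEGATIVE = {"misses", "lawsuit", "loss"}


@dataclass
class Document:
    source: str
    text: str
    tags: dict = field(default_factory=dict)


def collect(raw_items, allowed_sources):
    """Content collection: keep only items from curated sources."""
    return [item for item in raw_items if item["source"] in allowed_sources]


def extract(item):
    """Text extraction: collapse whitespace into normalized plain text."""
    return Document(source=item["source"], text=" ".join(item["body"].split()))


def enrich(doc):
    """Enrichment: attach a crude lexicon-based sentiment score."""
    words = [w.strip(".,!?").lower() for w in doc.text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    doc.tags["sentiment"] = score
    return doc


def select(docs, min_sentiment):
    """Self-service selection: filter documents into a custom dataset."""
    return [d for d in docs if d.tags["sentiment"] >= min_sentiment]


def deliver(docs):
    """Data delivery: emit plain dictionaries, e.g. for an API response."""
    return [{"source": d.source, "text": d.text, **d.tags} for d in docs]


def run_pipeline(raw_items, allowed_sources, min_sentiment=0):
    """Chain the five stages in order."""
    collected = collect(raw_items, allowed_sources)
    enriched = [enrich(extract(item)) for item in collected]
    return deliver(select(enriched, min_sentiment))
```

Calling run_pipeline with a list of {"source": ..., "body": ...} dictionaries, a set of allowed source names, and a minimum sentiment returns only the enriched records that pass the filter. Each stage is a plain function, so any one of them could be swapped out without touching the others.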
The latest generation of our infrastructure:

RavenPack Edge

The latest generation of our infrastructure, Edge, is the outcome of over 5 years of technological research and development. Edge achieves an unparalleled breadth of coverage and depth of analytic insights.

Capable of processing up to 3 times as many documents, from over 40,000 sources, Edge produces analytics both in real time and across our deep historical archive. In addition, Edge tracks more than 12 million entities, a 25-fold increase over the prior generation product. Edge also benefits from an enhanced event taxonomy and incorporates new technology both to detect more events and to augment each event match by extracting more information from the document. The net result is nearly 5 times the number of records produced on a daily basis.

The RavenPack Data-as-a-Service platform scales both vertically and horizontally to maintain sub-second latency for the majority of documents flowing through the system.

RavenPack Edge is powered by

Machine Learning

Machine learning can be a powerful technique, particularly when coupled with a large and accurate training set. RavenPack’s traditional event sentiment, applied to our 20+ year archive, provides one of the most comprehensive sets of tagged sentiment on English-language news available anywhere. Using this curated, high-quality archive, RavenPack has been able to train a novel model and apply it to Edge, generating high-quality sentiment for each sentence of the entire document archive.

Explore

RavenPack's NLP Resources

February 26, 2021

Improving Sentiment Models With Better Inputs

Over the years, we’ve experimented with many different approaches and improvements to our core Named Entity Recognition (NER), Classification, and Sentiment analysis tasks...

July 26, 2021

Using Tied Autoencoders to Fine-tune and Reduce Sentence Embeddings

Being able to capture the context of a word or sentence provides insightful features for downstream tasks, like classification or named entity recognition. These context captures, called embeddings, are ubiquitous in current NLP approaches.

June 24, 2021

Exploring Content Acquisition

Gathering quality news is the first stage in turning unstructured data into actionable insights. Juan Sánchez Gómez, RavenPack’s head of Content Management, gives us a guided tour of that critical step.

June 24, 2021

Using AWS Lambda to deploy Machine Learning models at RavenPack

As Machine Learning engineers, there are occasions where we want to test our models in the real world, with real data and to receive real...

February 26, 2021

Using NER to detect relevant entities in finance

One of the key values of RavenPack for our customers is the ability of our products to deliver relevant information in real-time for their decision-making....

May 25, 2021

The Art and Science of Classifying News

Is the phrase “Jobs at Apple” talking about employment conditions at the U.S. tech giant or something related to its late co-founder? These are the sorts of problems that those who work in the field of NLP have to wrangle with on a daily basis.