Jason Cornez, Chief Technology Officer, RavenPack
May 22, 2018
View an extract of this session held at the London Big Data and Machine Learning Revolution event in April 2018. You can also access the full video and slides.
RavenPack automatically detects thousands of different types of market moving events in unstructured text documents. An enriched event captures more context from the document to provide more color about what the event means. We take a quick look at how events are detected now and what innovations are happening to help enrich the events the system can detect moving forward.
What's the buzz? That might also be informative. If you want to trade on very short horizons, you might want to filter for novelty. If you want to look at a longer time frame, you might want to see how broadly a story has been disseminated and how much coverage it has had, and that might give you additional information. We have sentiment techniques that are event aware: the system knows that it's earnings above expectations, and so on.
But what if it says earnings up versus stock price up? That will have very different implications in terms of predictability. So it's not enough to look for positive and negative language; you need the context. If we see it’s related to an earnings event, we will move the sentiment to be more extreme, based on the range of the upper and lower bounds, as opposed to if it's a stock price, where we have already.
We also look at more traditional techniques, like looking up combinations of words. What we find is that the event-based score, what we call the Event Sentiment Score, is generally very strongly predictive, but it's stronger on short horizons because the underlying thing is itself event driven to a large extent.
RavenPack processes over 100,000 documents every day. We do this in real time with very low latency: we publish structured data on the unstructured data within about 250 milliseconds. We run in the Amazon cloud 24/7, and it's always up. You have probably heard about RavenPack, and think of it, as a great data vendor, and I want to say that we also aspire, especially in my team, to be a world-class software company. So I want to go a little bit behind the scenes and tell you about that.
Well, there is a modern architecture.
Someone was actually asking me, “How did you move to the cloud so smoothly over the last five years?” Well, our architecture was always distributed. That means that from the start we had different pieces of software running on different pieces of hardware. It was also multithreaded, which means we can take advantage of computers that have lots of cores. This is different from some sort of monolithic software, where all you're doing when you move to the cloud is running it on someone else's computer instead of your own. Now, instead of running on our own set of computers, which was limited, we can run on Amazon's set of computers, which for many intents and purposes is kind of unlimited. So migrating to the cloud was easier for us than for many of our potential competitors and for many others in the software industry.
RavenPack Analytics (RPA) shows how the technology can be used to uncover profitable trading signals for energy futures.
Now with the cloud, we also have horizontal scaling, which means as we add more data or we have more subscribers to that data, we can easily get more computers onboard and basically everyone has the same level of service.
RavenPack tracks thousands of event types. These can be corporate events:
At RavenPack we identify all sorts of events; I think we're tracking nearly 7,000 at this point. There are a lot in equities, related to companies. There are also lots of global events, and in the future we can expand into events on sports or whatever. The technology is very agnostic about this, but the expertise of the company is clearly much more towards finance.
So when we talk about an event, what does that mean? It means we identify which entities participate in the event. There are certainly companies but also people, products, and commodities.
In a recent article, we discussed how being a commodity investor can be a roller-coaster ride, which was clear a few years ago when the prices of energy-related commodities collapsed, led by crude oil.
All of these things are what we call entities. Not only do we identify that they're in an event, but also what role they play. In lawsuit events it's really important to know who's the plaintiff and who's the defendant, and the same goes if it's a merger and acquisition. Similarly for ratings: who's the rater? Who's being rated? Clearly this makes a difference.
In addition to the entities, we also have other attributes of an event like dates, magnitude, sentiment, and trust. Trust is about whether it is a factual event or an opinion event. And finally we have event consolidation. In any document, say a news story, the headline might contain 'earnings up', and later you might find that earnings are up, for which quarter, and by how much. These are all the same event, so we take these little parts and we don't publish five events for the same story; we publish one event based on the best aggregation of the data. All of this makes the data more valuable and more actionable.
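The consolidation step described above can be sketched roughly as follows. This is only an illustration, not RavenPack's implementation: the `Detection` record, its fields, and the "first non-empty value wins" merge rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    """One partial detection inside a single document (hypothetical shape)."""
    event_type: str
    entity: str
    period: Optional[str] = None       # e.g. "Q1 2018"
    magnitude: Optional[float] = None  # e.g. percent change


def consolidate(detections):
    """Merge detections that share (event_type, entity) into one record each.

    Assumed merge rule: for every attribute, keep the first non-empty value.
    """
    merged = {}
    for d in detections:
        key = (d.event_type, d.entity)
        if key not in merged:
            merged[key] = Detection(d.event_type, d.entity, d.period, d.magnitude)
            continue
        m = merged[key]
        if m.period is None:
            m.period = d.period
        if m.magnitude is None:
            m.magnitude = d.magnitude
    return list(merged.values())


# Three mentions of the same event in one story (made-up data).
story = [
    Detection("earnings-up", "ACME"),                    # headline
    Detection("earnings-up", "ACME", period="Q1 2018"),  # first paragraph
    Detection("earnings-up", "ACME", magnitude=12.0),    # later paragraph
]
events = consolidate(story)
print(events)
```

With this rule the three mentions collapse into a single event that carries both the period and the magnitude.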
So an event type is composed of multiple roles, and each role allows for a particular type of data. We have a matching engine which, when it matches the text, is able to pull the data out of the text and put it into the appropriate role of the event type. Also, there are conditions which define a match.
For example, in a legal action one role is the plaintiff and one is the defendant, then there's the date of the legal action. There can be other roles too: is this legal action against a CEO or a CFO? A variety of roles. Then there’s earnings up, and this one has a condition. Two of the fields for earnings up are the actual earnings and the previous earnings, and for this event to match, actual has to be greater than previous. That of course is obvious, but if you think textually, the pattern, or how you would match this, is exactly the same for earnings down. You still have some company announcing some earnings and some previous earnings. You don't know if it's up or down until you actually extract the numbers, look at them, and make that determination.
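To make the earnings example concrete, here is a minimal sketch in which two event types share one textual pattern and differ only in their condition. The regular expression, the role names, and the `EVENT_TYPES` table are hypothetical and chosen just for illustration; the real templates and engine are not described in the talk.

```python
import re

# Hypothetical shared pattern: the text looks the same for "up" and "down".
PATTERN = re.compile(
    r"(?P<company>[A-Z][\w&. ]*?) reported earnings of \$(?P<actual>\d+(?:\.\d+)?)"
    r" per share, compared with \$(?P<previous>\d+(?:\.\d+)?)"
)

# Hypothetical event types: each is a condition over the extracted roles.
EVENT_TYPES = {
    "earnings-up": lambda roles: roles["actual"] > roles["previous"],
    "earnings-down": lambda roles: roles["actual"] < roles["previous"],
}


def detect(text):
    """Extract the roles, then let the conditions decide which event matched."""
    m = PATTERN.search(text)
    if m is None:
        return None
    roles = {
        "company": m.group("company"),
        "actual": float(m.group("actual")),
        "previous": float(m.group("previous")),
    }
    for name, condition in EVENT_TYPES.items():
        if condition(roles):
            return name, roles
    return None  # numbers extracted, but no condition held (e.g. equal values)


up = detect("Acme Corp reported earnings of $1.20 per share, compared with $0.95 a year earlier.")
down = detect("Acme Corp reported earnings of $0.80 per share, compared with $0.95 a year earlier.")
print(up[0], down[0])  # earnings-up earnings-down
```

The point of the sketch is that the direction of the event is decided only after the numbers have been pulled into their roles, not by the wording of the pattern.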
(Event type example)
A classifier is a piece of technology at RavenPack that detects something, and today we're talking about event detection. There's also one for entities, the things that I mentioned before. The event detection classifier, though, fundamentally takes some template, which is kind of like a regular expression, some pattern of text, and matches it against the story body. But we annotate that body, in particular with entity detections as well as with attributes, and an important thing is that the same text can be annotated in multiple ways.
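One way to picture matching a template against an annotated body is sketched below. Everything here is an assumption for illustration: the `LEXICON`, the `<TAG:text>` annotation syntax, and the template itself are invented, and the actual annotation format is not described in the talk.

```python
import re
from itertools import product

# Hypothetical entity lexicon: one surface string may carry several annotations.
LEXICON = {
    "Apple": ["COMPANY", "PRODUCT"],
    "Samsung": ["COMPANY"],
}

# Hypothetical template: a regular expression written over annotation tags.
TEMPLATE = re.compile(
    r"<COMPANY:(?P<plaintiff>\w+)> sues <COMPANY:(?P<defendant>\w+)>"
)


def annotated_variants(text):
    """Yield every rendering of the text with each known entity tagged one way."""
    choices = [
        [f"<{tag}:{tok}>" for tag in LEXICON[tok]] if tok in LEXICON else [tok]
        for tok in text.split()
    ]
    for combo in product(*choices):
        yield " ".join(combo)


def match(text):
    """Try the template against each annotated variant; return the filled roles."""
    for variant in annotated_variants(text):
        m = TEMPLATE.search(variant)
        if m:
            return {"plaintiff": m.group("plaintiff"), "defendant": m.group("defendant")}
    return None


print(match("Apple sues Samsung over patents"))
```

Because "Apple" has two possible annotations in this toy lexicon, the sentence has two annotated renderings, and only the one that reads it as a company satisfies the lawsuit template.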
RavenPack recently launched a new self-service data and visualization platform, allowing us to move beyond the quantitative community with our data offerings.
An important thing to note is that multiple templates can match the same set of text and there can be a scoring algorithm to either allow multiple to win, or to choose one to be the winner.
We have something like a Rete algorithm but ours was “invented” independently
Simultaneously matches against all templates
Multiple templates can match given text
Scoring to choose a winner
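The idea of several templates matching the same text, with a score picking the winner, can be sketched as below. This is a simplification: a plain loop stands in for the Rete-like simultaneous matching, and the templates and scores are made up, since the real scoring algorithm is not described.

```python
import re

# Hypothetical templates: (event type, pattern, score). A higher score here
# simply stands for a more specific pattern.
TEMPLATES = [
    ("earnings", re.compile(r"\bearnings\b"), 1),
    ("earnings-up", re.compile(r"\bearnings (?:rose|up|climbed)\b"), 2),
    ("stock-price-up", re.compile(r"\b(?:shares|stock) (?:rose|up|climbed)\b"), 2),
]


def all_matches(text):
    """Return (event type, score) for every template that matches the text."""
    lowered = text.lower()
    return [(name, score) for name, pattern, score in TEMPLATES if pattern.search(lowered)]


def winner(text):
    """Pick the single highest-scoring match; ties go to the earliest template."""
    matches = all_matches(text)
    if not matches:
        return None
    return max(matches, key=lambda m: m[1])[0]


headline = "Acme earnings rose 12% in the quarter"
print(all_matches(headline))  # [('earnings', 1), ('earnings-up', 2)]
print(winner(headline))       # earnings-up
```

Returning `all_matches` corresponds to letting multiple templates win, while `winner` corresponds to choosing exactly one.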
What are we doing now to make it even better and make the computer do a lot more?