Tick History with Ian Friedman, Chief Data Scientist, RoZetta Technology
RoZetta’s Tick History platform captures every transaction on every stock exchange and every futures, money and bond market around the world. That means data from over 650 markets and 2,000 contributors and vendors, feeding in every second of every day. In 2002, the Stock Market Crash wiped over $4 trillion in value off the global equity market, and its disastrous effects have led to greater transparency, accountability and compliance.
Tick History – a platform built and operated for Thomson Reuters – has transformed the way capital markets operate. It has changed the transparency of trading, decision-making, investment strategies and market surveillance and has made vast quantities of data instantly accessible to automated and high frequency trading enterprises.
When the New York Stock Exchange opens for business each day, Tick History is collecting about 2 million updates per second. That level of activity is sustained for about two hours, while markets around the world react to the NYSE opening. Over the next 8 hours Tick History collects and processes half a million updates per second.
That is Big Data!
How did it come about?
It was 1997. About 16% of adults were using the Internet and, when they did, it was with slow connections through telephone lines. The dotcom bubble was underway and increasing Internet access meant more and more people were buying and selling stocks on share markets around the world. And some organizations started to employ quantitative analysts to automate the entire trading process by leveraging sophisticated algorithms to make investment decisions in a split second. They needed lots of historical data to test their theories.
A group of universities from Australia and New Zealand banded together to form SIRCA Limited, a not-for-profit Australian company with the aim of developing and providing the global data and tools needed to enable financial research and innovation. They knew data volumes were accelerating globally and wanted to harness them for academia, particularly for microstructure research that examines the efficiency of financial markets.
The following year, after a chance meeting in a London pub, SIRCA learnt that recordings of every stock market transaction were being archived on tape media in a Reuters vault. This was Nirvana!
SIRCA realized the importance of this information and secured access to the tapes for academic use. Finding out what was on the tapes, and how to unlock it, was the next step.
Ian states: “Several years earlier, I had been researching exactly what was on those tapes – I was trying to build a similar service for Reuters.”
Our initial analysis of the tapes revealed 3 important lessons that still hold true today:
- Big Data today will be much bigger tomorrow. Plan to scale for future growth.
- Never throw anything away. You never know what might be important one day, so store every piece of data.
- Understand the problem you are trying to solve. Big Data is a waste of time unless there is a clear objective for the end user.
Early adopter of open source technologies
SIRCA began using open source technologies back in 2001, and this was critical to the success of Tick History.
Using open source technology, we were able to modify programs to suit our needs. Open source technologies allowed us to push the boundaries and fostered a culture of innovation.
Unlocking the data secrets
The fact that we were a small team – just five or six people – helped our success. We were innovative, responsive and one of the earliest adopters of agile thinking, but more importantly we had a tremendous drive to crack an almost unsolvable problem.
There were no technologies available that would scale for the Reuters data set, so we developed our own proprietary data processing system. We took a large problem, broke it into a lot of smaller problems and solved them concurrently across a cluster of CPU nodes. It was essentially a MapReduce system, but created years before Doug Cutting and Mike Cafarella developed their Hadoop technology.
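The divide-and-combine approach described above can be sketched in a few lines of Python. This is only an illustration, not SIRCA's proprietary system: the tick records, function names and chunk size are all hypothetical, and a thread pool stands in for the cluster of CPU nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical tick records: (instrument code, traded volume).
TICKS = [
    ("BHP.AX", 100), ("IBM.N", 250), ("BHP.AX", 50),
    ("VOD.L", 400), ("IBM.N", 150), ("VOD.L", 100),
]


def split_into_chunks(records, chunk_size):
    """Break one large problem into smaller, independent pieces."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: total the volume per instrument within a single chunk."""
    totals = Counter()
    for instrument, volume in chunk:
        totals[instrument] += volume
    return totals


def merge_totals(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def total_volume_by_instrument(records, chunk_size=2, workers=3):
    chunks = split_into_chunks(records, chunk_size)
    # Worker threads stand in for cluster nodes (an assumption for this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return dict(reduce(merge_totals, partials, Counter()))


result = total_volume_by_instrument(TICKS)
```

Because each chunk is processed independently, adding more workers (or, in a real deployment, more machines) lets the same code handle a larger data set without changing the map or reduce steps.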
The team also developed an automated request service that allowed academics to access the data over the Web. So now we had a very effective way of storing and processing the data and a very efficient way of making that data available.
Structured and unstructured data
The key to the success of Tick History is the quality and consistency of its data. It doesn’t start out that way.
The data that arrives from markets around the world is unstructured. Each market uses different methods of describing factors like instrument codes, time stamps, location codes and currencies, which means that none of the data is usable in its raw form. At this stage it is not possible to distinguish a government bond from a spot FX rate!
The team at SIRCA unraveled those inconsistencies and built a system that takes the raw data, cleanses it and transforms it into structured data that is comparable, relatable and consistent.
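As a rough illustration of that cleansing step, the sketch below maps records from two made-up markets onto one common structure. The field names, market formats and the `Tick` schema are assumptions invented for this example; they are not the actual Tick History formats.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Tick:
    """Hypothetical common schema that every market is mapped onto."""
    instrument: str
    asset_class: str
    timestamp_utc: datetime
    price: float
    currency: str


def from_market_a(raw: dict) -> Tick:
    # Assumed format: epoch milliseconds, price quoted in cents.
    return Tick(
        instrument=raw["sym"].upper(),
        asset_class="government_bond",
        timestamp_utc=datetime.fromtimestamp(raw["ts_ms"] / 1000, tz=timezone.utc),
        price=raw["px_cents"] / 100,
        currency=raw["ccy"].upper(),
    )


def from_market_b(raw: dict) -> Tick:
    # Assumed format: ISO 8601 local time with offset, FX pair as "BASE/QUOTE".
    base, quote = raw["pair"].split("/")
    local_time = datetime.fromisoformat(raw["time"])
    return Tick(
        instrument=base + quote,
        asset_class="spot_fx",
        timestamp_utc=local_time.astimezone(timezone.utc),
        price=float(raw["rate"]),
        currency=quote,
    )


# One normalizer per source; adding a market means adding one function here.
NORMALIZERS = {"market_a": from_market_a, "market_b": from_market_b}


def normalize(source: str, raw: dict) -> Tick:
    return NORMALIZERS[source](raw)


bond = normalize("market_a", {"sym": "au10y", "ts_ms": 1235948400000,
                              "px_cents": 10150, "ccy": "aud"})
fx = normalize("market_b", {"pair": "AUD/USD", "rate": "0.6410",
                            "time": "2009-03-02T10:00:00+11:00"})
```

Once both records share the same fields, units and time zone, they can be compared directly: here the two ticks turn out to carry the same UTC timestamp even though the raw inputs express it in different ways.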
What makes Tick History unique?
The sheer volume of data makes Tick History unique. That, and the fact that it is searchable – almost instantly.
It collects the data, cleans the data, analyses the data, stores the data and makes it available to clients within minutes via a searchable interface.
It seamlessly processes data requests from any number of clients in parallel, some small, some large and others incredibly large, and makes the results available in the shortest possible timeframe.
How big is the data?
It is big data in terms of its historic reach but also in terms of its depth.
Tick History captures every transaction across every financial asset class in every financial market around the world. It contains data from 70 million different financial instruments that have traded on markets since 1996. In 2005, the average daily Tick History file was 16 gigabytes compressed.
Today it is 200 gigabytes compressed, or 1.5 terabytes uncompressed. And it is still growing. The total size of the archive is about 2 petabytes, that is, 2 thousand million million bytes, or roughly 2^51 bytes.
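The figures above can be put in proportion with some quick arithmetic. The sketch below assumes decimal units (1 gigabyte = 10^9 bytes); the variable names are purely illustrative.

```python
import math

# Decimal (SI) units, assumed for this sketch.
GB = 10**9
PB = 10**15

daily_2005_compressed = 16 * GB
daily_today_compressed = 200 * GB
daily_today_uncompressed = 1500 * GB  # 1.5 terabytes
archive_total = 2 * PB

# How much larger the average daily file is now than in 2005.
growth_factor = daily_today_compressed / daily_2005_compressed

# How much space compression saves on a typical daily file.
compression_ratio = daily_today_uncompressed / daily_today_compressed

# The archive size expressed as a power of two.
archive_log2 = math.log2(archive_total)

print(f"Daily file growth since 2005: {growth_factor}x")
print(f"Compression ratio: {compression_ratio}:1")
print(f"Archive size: about 2^{archive_log2:.1f} bytes")
```

In other words, the daily file has grown 12.5-fold since 2005, compression shrinks it by a factor of 7.5, and 2 petabytes sits just below 2^51 bytes.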
The commercial shift
In 2005, RoZetta Technology (the commercial arm of SIRCA) and Reuters (now Thomson Reuters) reached a commercial arrangement and together launched the first global Tick History service.
Two years later an API was launched, providing Reuters clients with a searchable interface and delivering data within seven days of real time.
In 2009, data delivery shifted from arriving every second day on a flight from London to Sydney to a secure dedicated network between the two cities, with results made available within half an hour.
Tick History today
Today, Tick History is the largest financial database in the world and provides a vehicle for hedge funds and investment banks to generate several trillions of dollars of turnover each year.
More than 500 clients – including 80% of the world’s hedge funds and 90% of investment banks – subscribe to Tick History.
It offers searchable data from 1996 to today on all transactions on all financial instruments including equities, derivatives, commodities, money and fixed income – from around the world.
Value-added services such as Exchange-By-Day, Thomson Reuters DataScope for Equities, specialized Vendor Packages, the Hard Media Service for bulk data delivery and Reuters News Archives were integrated into the service to further help clients make daily investment decisions that affect global markets.
One cannot even begin to fathom how we would trade now without Tick History. Just getting access to historical tick-level data across global asset classes amazes analysts. For an asset class that impacts billions of people globally, we are thankful that Tick History allows us to:
- Effectively manage compliance requirements in today’s fluid regulatory environment
- Perform quantitative research and analytics
- Employ real-time algorithmic trading strategies in a cost-efficient manner.
Want to talk to us? Contact us.