The Data-Driven Hedge Fund

Mario Emmanuel

It is estimated that 80% of all equity trading on the NYSE is algorithmic. While the figure cannot be proven beyond a reasonable doubt, the widespread consensus around it points to fact rather than hypothesis.

Of that 80%, a large chunk belongs to passive investment, which, despite being automated, has certain particularities that lead me not to consider it truly algorithmic.

Passive investment is based on the old financial axiom: in the long term, markets are always bullish. While there is some truth in that assessment (indices usually remove their weak constituents), there are examples in developed economies in which this wisdom fails to hold true. A quick look at the long-term Nikkei chart — and let’s not forget its 1989 peak — will suffice. The assumption can be a dangerous proposition.

The remainder of this automated trading belongs to hedge funds, proprietary firms, and high-frequency trading (HFT) platforms, all of which use less risky approaches: hedged strategies, quantitative algorithms, statistics, and artificial intelligence. While there are many sound approaches to markets, the actual figures show that traded volume is based on math models, rather than on fundamental analysis or discretionary trading. In fact, data and models have largely replaced human problem-solving.

For hedge funds operating with models, most of the focus is on algorithms: artificial intelligence and machine learning are used to exploit market edges. These are the key assets of a modern hedge fund, the true revenue generators. What is not widely known or understood is the role of data and supporting operational systems. For those models to generate revenue, they need to be fed with accurate, up-to-date data, and their output must be translated into actual market trades. There is no room for error. The chain might look simple, but the operational challenges are massive, especially when your models are sophisticated and your trading universe is large. Building this perfect data pipeline requires determination, know-how and a strong commitment, attributes not always found in the technology industry.

An example: at Kaiju we are working on systems capable of analysing, in real time, instrument universes upwards of one million symbols. That means our systems need to process one million hieroglyphs without error, and do so with enough performance to seize the myriad opportunities the markets offer daily. The demand for mistake-free outcomes is a technological headache. Systems require a complexity and sophistication most technology is not built to handle.

The required throughput places a high premium on fast data processing. The huge amount of data and the lightning-quick pace are forcing hedge funds to abandon traditional information systems for “real-time” technologies: sophisticated processors that move data within milliseconds, sometimes microseconds. Data is to a modern hedge fund what a foundation is to a house: no one discusses it during a real estate transaction even though it makes the house itself possible. And so it is with data, the load-bearing structure of today’s hedge fund.

Data undergirds the very function of a hedge fund, where vast quantities of numbers travel from exchange venues into models that make buy and sell decisions. Contrary to what one might think, this need for perfect performance does not necessarily translate into complex systems. In fact, the "failure is not an option" paradigm forces systems to be designed around simplicity. Every step in the process (sometimes even each line of code) must be analysed and discarded if not needed. The removal of unnecessary elements leads to minimalistic systems that work like precision machinery. They are designed to do one task perfectly.
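To make the idea concrete, here is a minimal sketch in Python of what a single-purpose pipeline stage might look like. It is an illustration only and not a description of any production system: the `Tick` record, the `validate_tick` function and the 5-millisecond staleness threshold are all hypothetical names and values chosen for the example. The stage does one thing: it decides whether a market data record is accurate and fresh enough to reach a model.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tick:
    """Hypothetical market data record used only for this illustration."""
    symbol: str
    price: float
    size: int
    exchange_ts_ns: int  # exchange timestamp, nanoseconds since epoch


def validate_tick(tick: Tick, now_ns: int, max_age_ns: int = 5_000_000) -> Optional[Tick]:
    """Return the tick unchanged if it is usable, otherwise None.

    The 5 ms default for max_age_ns is an assumed value, not a recommendation.
    """
    if not tick.symbol:
        return None  # a record without a symbol cannot be routed to a model
    if not math.isfinite(tick.price) or tick.price <= 0.0:
        return None  # rejects NaN, infinite, zero and negative prices
    if tick.size <= 0:
        return None  # a trade must have a positive size
    age_ns = now_ns - tick.exchange_ts_ns
    if age_ns < 0 or age_ns > max_age_ns:
        return None  # timestamp in the future, or data too old to act on
    return tick
```

The function keeps no state, performs no I/O and has a single exit condition per rule, so each line can be reviewed, and removed if it turns out not to be needed.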

This approach is widely understood in the HFT industry, where custom hardware pipelines are the norm. It remains novel to many hedge funds, however, even as the pace and quantity of the data they process steadily increase.

Mario Emmanuel is an external consultant advising Kaiju on market data. He holds an M.Eng. in Electrical Engineering and has specialised in market data processing and quantitative analysis. His interests include actionable quantitative models using market volume data and the technologies required to process and store order flow market data. He has almost two decades of professional experience, including projects processing data in real time in Tier-1 telecom networks.