On Backtesting

Backtesting is one of the biggest statistical rabbit holes there is.

Backtesting is used by traders to evaluate a trading strategy or model by applying it to historical data. The idea is to see how the strategy would have performed in the past, under the assumption that if it worked before, it might work again in the future.

At first glance, backtesting seems like a straightforward and logical approach. If you have a strategy that would have turned a profit over the past few years, why wouldn't it work going forward?

The reality of backtesting, like that of building any model on real-world historical data, is far more nuanced. In markets, that nuance comes from the fact that the regimes that drive prices can change. A strategy that performs well during one period might fail miserably in another, simply because the underlying market dynamics have changed.

Let's assume a simple strategy: 'Buy an NVIDIA call option at the beginning of each week and sell it a week later'*, betting that the stock will go up in value every week. This 'strategy' would have backtested very well over the first 6 months of 2024, given the enormous rise of NVIDIA's stock.

If you take NVIDIA's stock price over those first 6 months and extrapolate it forward, most people will quickly (and rightly) tell you something along the lines of: past results are no guarantee of future results.

Most people know you can't just extrapolate an increasing stock price into the future. The same is true for more complex market dynamics, i.e. the 'market regime' on which many strategies rely.

So can you use anything about the past to give you some information about the future? In general, it is possible to find strategies that work, but it all depends on that mystical 'market regime'. In the NVIDIA case, the market regime could be the bullish AI momentum. Buying call options weekly only works well in that 'regime'.
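To make the regime dependence concrete, here is a minimal sketch of a toy backtest. It rests on loud assumptions: the prices are synthetic (not NVIDIA data), the strategy buys the underlying stock rather than a call option, and the two "regimes" are simply hypothetical weekly drift values picked for illustration. All function names are made up for this example.

```python
import random


def simulate_prices(weekly_drifts, weekly_vol, start=100.0, seed=0):
    """Generate synthetic weekly prices, one step per entry in weekly_drifts."""
    rng = random.Random(seed)
    prices = [start]
    for drift in weekly_drifts:
        weekly_return = rng.gauss(drift, weekly_vol)
        prices.append(prices[-1] * (1.0 + weekly_return))
    return prices


def backtest_weekly_long(prices):
    """Buy at the start of each week, sell one week later; return weekly returns."""
    return [prices[i + 1] / prices[i] - 1.0 for i in range(len(prices) - 1)]


def total_return(weekly_returns):
    """Compound a list of weekly returns into one total return."""
    growth = 1.0
    for r in weekly_returns:
        growth *= 1.0 + r
    return growth - 1.0


# Hypothetical regimes: 26 bullish weeks (+3% average), then 26 bearish weeks (-2% average).
drifts = [0.03] * 26 + [-0.02] * 26
prices = simulate_prices(drifts, weekly_vol=0.02)
returns = backtest_weekly_long(prices)

in_sample = total_return(returns[:26])       # the period the backtest "saw"
out_of_sample = total_return(returns[26:])   # the period after the regime changed

print(f"Backtest on first 26 weeks: {in_sample:+.1%}")
print(f"Same strategy, next 26 weeks: {out_of_sample:+.1%}")
```

The strategy and the code are identical in both halves; only the drift of the simulated market differs. A backtest run on the first half alone would say nothing about the second.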

Any backtest, or even a model predicting toilet paper sales, depends on the 'regime' its data is from. To be valid, your model or backtest must be applied in the same regime as the data it relies on. For example, the Covid-19 pandemic is a different regime for a model predicting how much toilet paper gets sold in supermarkets. A model can only interpret an event in terms of the data it has previously seen.

So you have to account for the real-world context in which your model is relevant.
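One crude way to check whether a new observation lies outside what a model has seen is a z-score against the training data. The sketch below assumes made-up weekly toilet paper sales figures and an arbitrary threshold of 3 standard deviations; the function name is hypothetical. It only catches a jump in the level of a single variable, so it is a sanity check rather than a regime detector.

```python
import statistics


def is_outside_training_range(training_values, new_value, z_threshold=3.0):
    """Return True if new_value is more than z_threshold sample standard
    deviations away from the mean of the training data."""
    mean = statistics.mean(training_values)
    stdev = statistics.stdev(training_values)
    z_score = abs(new_value - mean) / stdev
    return z_score > z_threshold


# Hypothetical weekly unit sales from "normal" weeks (invented numbers).
normal_weeks = [1000, 1040, 980, 1010, 995, 1025, 990, 1015]

ordinary_week = 1030   # looks like the training data
panic_week = 2400      # invented panic-buying week

print(is_outside_training_range(normal_weeks, ordinary_week))
print(is_outside_training_range(normal_weeks, panic_week))
```

A flag like this says the model is being asked about something it has never seen. It does not say why, which is where the real-world context comes in.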

In financial markets, this goes deeper. Market regimes can shift ever so slightly, and the shift may only become noticeable after a significant amount of time. The 'market regime' can also just be a rationalisation for overfitting. There must be a clear real-world reason why something works well or doesn't. If there isn't, there is a good chance your model will blow up at some point.

Some try to simulate new situations, e.g. the once-in-100-years recession event frequently used for stress testing. But how many once-in-100-years recession events are there to sample from? Is there enough data to say, with any statistical significance, what the parameters of a 3-sigma event are? What is the 'regime' of a recession event, and how does it impact the way markets change?
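The sample-size problem can be put into rough numbers. The sketch below assumes each year is independent and has the same 1-in-100 chance of the event, which is itself a regime assumption, and the 50 years of history is a hypothetical figure. Function names are made up for illustration.

```python
def expected_events(years_of_data, return_period_years):
    """Expected number of events in the sample under a constant annual probability."""
    return years_of_data / return_period_years


def prob_no_events(years_of_data, return_period_years):
    """Probability the sample contains no event at all, assuming independent years."""
    annual_prob = 1.0 / return_period_years
    return (1.0 - annual_prob) ** years_of_data


history_years = 50    # hypothetical length of the available data set
return_period = 100   # "once in 100 years"

n_expected = expected_events(history_years, return_period)
p_none = prob_no_events(history_years, return_period)

print(f"Expected events in {history_years} years: {n_expected}")
print(f"Probability of observing none: {p_none:.1%}")
```

With half an event expected, and a better-than-even chance of having observed none, there is very little to fit parameters to.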

The world is constantly progressing, evolving and changing. New regimes arrive, and any model that can detect them and adapt its behaviour accordingly is powerful. Backtests can provide heuristics and insight into how a strategy behaves, but the ultimate approach is to track why the strategy works and link that to an understanding of the regime that makes it possible.