Having survived our backtests on additional markets, as well as StrategyQuant’s walk-forward matrix, our GBPJPY trend strategy should be reasonably robust. In this final article on robustness testing, we will learn more about this strategy’s risk profile using StrategyQuant’s Monte Carlo simulations. This particular robustness workflow uses Monte Carlo simulation for informational purposes only; there is no filtering/shortlisting of strategies, although StrategyQuant has the capability to do so.

Figure 1: Our robustness workflow uses Monte Carlo simulations for informational purposes only

2. Monte Carlo Simulations

With random noise being a perpetual contaminant in every price dataset, your backtest statistics will never be exactly replicated in live trading. Monte Carlo simulations use repeated random sampling to estimate the probabilities of obtaining different backtest outcomes, giving you an idea of how your strategy’s performance might deteriorate in live trading.

There are two common trading applications of Monte Carlo simulations:

1. Randomizing your backtest’s trade sequence

2. Running a backtest with randomized changes in prices/strategy parameters

Both these options are available in StrategyQuant, and their usage will be demonstrated using our GBPJPY trend strategy.

2.2. Randomizing Your Trade Sequence

Randomizing your backtest’s trade sequence can provide a more reliable estimate of the drawdowns you will encounter in live trading. Every backtest, no matter how reliable, only represents a single run of your strategy over a certain historical period. Suppose you flip a fair coin 100 times and get 53 heads. If you flip it another 100 times, you are not likely to get exactly 53 heads again. In the same way, a second run of your strategy would be unlikely to reproduce the same order of wins and losses, even if the individual trades were identical.

By randomizing the sequence of trades, the Monte Carlo method simulates multiple equity curves from your single backtest. Since drawdown is affected by the sequence of losing trades, these multiple equity curves can be used to produce a drawdown distribution that is more reliable than a single drawdown value. Before we go into more detail about these distributions, a brief explanation of confidence levels is required.

The GBPJPY trend following strategy above produced a 372-trade backtest over the 2003-2020 period. By rearranging the sequence of trades, we can arrive at 372! (372*371*370…*3*2*1) different equity curves. This is an astronomical number which no computer can process in a reasonable time. We can achieve a compromise by simulating only a few hundred/thousand different equity curves, and then using confidence levels to quantify the uncertainty caused by this simplification.
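To get a feel for how large 372! is, the short Python sketch below counts its decimal digits instead of printing the number itself. Only the trade count comes from the backtest above; the variable names are illustrative.

```python
import math

n_trades = 372  # trade count from the backtest described above

# Number of distinct orderings of 372 trades, i.e. 372!
orderings = math.factorial(n_trades)

# Count the decimal digits rather than printing the whole number.
digits = len(str(orderings))
print(f"{n_trades}! has {digits} decimal digits")
```

Even at a billion orderings per second, a number with this many digits is far beyond exhaustive enumeration, which is why only a sample of orderings is simulated.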

Confidence levels are a concept borrowed from statistics, and are used to measure the degree of uncertainty in a sampling method. A confidence level refers to the probability that the sampled results contain the true value of a certain parameter. In trading, we are usually concerned with profits and drawdowns. The typical result from a Monte Carlo simulation would look as follows:

Figure 2: A typical Monte Carlo result showing performance metrics at each confidence level

If we use a 95% confidence level, it means there is a 5% probability that the actual profit will be smaller than $672, and that the drawdown will be larger than $220. The higher the confidence level, the more the metrics will deteriorate, but the higher the probability that those metrics will encompass your future performance.
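The idea can be sketched in a few lines of Python. This is a minimal illustration and not StrategyQuant's implementation: the trade list is made up, the function names are hypothetical, and the confidence level is read off as a simple nearest-rank percentile of the simulated drawdowns.

```python
import math
import random


def max_drawdown(pnl):
    """Largest peak-to-trough drop of the cumulative P&L curve."""
    equity = peak = worst = 0.0
    for trade in pnl:
        equity += trade
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def shuffled_drawdowns(pnl, runs=1000, seed=0):
    """Max drawdown of `runs` random reorderings, sorted ascending."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        order = list(pnl)
        rng.shuffle(order)  # same trades, different sequence
        results.append(max_drawdown(order))
    return sorted(results)


def at_confidence(sorted_values, level):
    """Nearest-rank percentile: `level` percent of runs were at or below this."""
    rank = math.ceil(len(sorted_values) * level / 100)
    return sorted_values[max(rank, 1) - 1]


# Hypothetical trade list: 372 made-up results, NOT the article's strategy.
demo_rng = random.Random(42)
trades = [demo_rng.choice([-10.0, -10.0, 25.0]) for _ in range(372)]

drawdowns = shuffled_drawdowns(trades)
for level in (50, 95, 100):
    print(f"{level}% confidence: drawdown <= {at_confidence(drawdowns, level):.2f}")
```

Reading the 95% row this way means that 5% of the simulated sequences produced a larger drawdown than the value shown.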

2.3. Setting up StrategyQuant’s Trade Sequence Randomizer

The setup options for our Monte Carlo simulation are shown in Figure 3 below:

Figure 3: Common Monte Carlo options include the sampling method and the number of simulations to run

1000 different equity curves will be simulated for our strategy. This should provide sufficiently reliable results without much computational burden. StrategyQuant offers two sampling methods: Exact and Resampling. In the exact sampling method (also known as sampling without replacement), each trade from the original backtest of 372 trades can only be sampled once. This preserves the strategy’s probability distribution, or its performance profile. The resampling method (sampling with replacement) allows each trade to be sampled more than once. This will alter your strategy’s probability distribution, and may be preferable if you expect market conditions to change drastically in future, or your original backtest only contains a small basket of trades.

Since our strategy was developed over 16 years, and 372 trades is a decent number, let’s stick with the exact sampling method.
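The difference between the two sampling methods can be illustrated with Python's standard library. This is only a sketch under assumed inputs: the trade values are invented and the function names are not StrategyQuant's.

```python
import random


def exact_sample(pnl, rng):
    """Sampling without replacement: every trade appears exactly once."""
    return rng.sample(pnl, k=len(pnl))


def resample(pnl, rng):
    """Sampling with replacement: a trade may appear several times or not at all."""
    return rng.choices(pnl, k=len(pnl))


# Hypothetical P&L per trade in whole dollars (integers keep the sums exact).
trades = [40, -15, -15, 60, -20, 35, -10, -25, 80, -30]
rng = random.Random(7)

shuffled = exact_sample(trades, rng)
bootstrapped = resample(trades, rng)

print("original net profit:", sum(trades))
print("exact sampling:     ", sum(shuffled))
print("resampling:         ", sum(bootstrapped))
```

Exact sampling only reorders the trades, so the net profit is always unchanged. Resampling draws a new set of trades each time, so both the net profit and the mix of winners and losers can differ from the original backtest.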

2.4. Monte Carlo Simulation Results (Randomized Trade Sequences)

The characteristic ‘straw broom’ charts produced from the simulations are shown below, together with a table showing performance metrics at various confidence levels.

Figure 4: Our Monte Carlo simulation produced 1000 equity curves, along with selected metrics at various confidence levels

The blue equity curve reflects the original backtest. Notice how the overall net profit remains the same for all the equity curves; this is a consequence of sampling without replacement. Ideally, you want the multiple equity curves to be grouped closely together, indicating consistency across the runs. Looking at the 95% confidence level, we can infer that there is a 5% probability that future drawdowns will be larger than $311. This is over twice as large as the drawdown in the original backtest. If you are using historical drawdowns to determine your strategy’s capital allocation, using the Monte Carlo-simulated drawdowns can help you determine a more conservative value. Capital allocation is discussed in more detail in the Portfolio Composition article.

Note that Monte Carlo simulations are probabilistic in nature, so the equity curves and metrics will vary slightly every time you run the test.

2.5. Limitations of Randomizing Your Trade Sequence

While Monte Carlo simulation can be a great tool for anticipating future risks, it has certain limitations. The following is a non-exhaustive list:

1. Overfitting cannot be detected

Monte Carlo simulations assume that the input trades from your backtest reflect your strategy’s true performance; only the sequence of trades is altered. If your backtest is the result of an overfitted strategy, your Monte Carlo performance metrics will be artificially good. This is a good example of ‘garbage in, garbage out.’ It is therefore best to input trades obtained from out-of-sample testing.

Similarly, if your original backtest only contains a handful of trades or covers a brief historical period, your estimations will lose their predictive value when market conditions change.

2. Serial correlations are not preserved

Serial correlations may exist in some strategies, whereby the outcome of a particular trade may affect the outcome of subsequent trades. This is especially prevalent if you are a trend-follower, considering that a large trend is likely to be followed by a period of consolidation. Your trades will thus tend to follow a cyclical pattern – one large winner is likely to be followed by a string of smaller losers.

Serial correlations may also be caused by abnormal market conditions. If you traded a basket of stocks during the 2008 financial crisis, your long positions would likely have been losers. Due to its random sampling, Monte Carlo simulation cannot capture the severity of such periods.

Since 1000 simulations is a sizeable number, the effects of these trade interdependencies should be minimal. Nonetheless, if you feel that serial correlations are important, you may consider doing Monte Carlo simulations on equity curve segments instead, whereby each segment would preserve the series of trades in the original backtest.

3. Market returns are assumed to be normally distributed

This assumption is used when computing the performance metrics at each confidence level. If your backtest contains a number of unusually large winners/losers, your metrics may be less accurate. This can be mitigated if your backtest contains a large number of trades, but regardless, it is best to treat your Monte Carlo results as estimations. A $1000 drawdown at the 100% confidence level does not mean your future drawdowns will never exceed $1000; there is no computational method/mathematical model that can entirely replicate the sophistication of the markets.
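The segment-based approach mentioned under the second limitation can be sketched as follows. This is an assumption about how such a test might be built, not a StrategyQuant feature: the trade list is cut into consecutive blocks of a fixed, arbitrarily chosen length, and only the order of the blocks is shuffled, so the sequence of trades inside each block stays intact.

```python
import random


def shuffle_in_blocks(pnl, block_len, rng):
    """Shuffle consecutive blocks of trades while keeping each block's order."""
    blocks = [pnl[i:i + block_len] for i in range(0, len(pnl), block_len)]
    rng.shuffle(blocks)
    return [trade for block in blocks for trade in block]


# Hypothetical example: the numbers 0..11 stand in for 12 trades in
# chronological order, so the preserved runs are easy to see.
trades = list(range(12))
reordered = shuffle_in_blocks(trades, block_len=4, rng=random.Random(3))
print(reordered)
```

Longer blocks preserve more of the original serial correlation but leave fewer distinct orderings to sample from, so the block length is a trade-off you would have to choose yourself.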

2.6. Randomizing Your Prices & Strategy Parameters

StrategyQuant allows you to rerun your backtest with randomized changes in prices or strategy parameters, or both. Both of these methods effectively ‘shake’ your strategy to evaluate how much curve fitting has occurred over your development, much like how you shake a ladder before climbing it. Overfitting is the enemy of robustness. If your strategy’s performance suffers drastically due to changing prices or parameters, it has likely fallen victim to overfitting.

I consider these simulations to be out-of-sample tests, although they are usually less stringent than the additional market tests and walk-forward optimizations described previously.

2.7. Setting up StrategyQuant’s Price/Parameter Randomizer

We will first randomize the prices, followed by our strategy’s parameters. The test setup is as follows:

Figure 5: The probability and magnitude of each change can be selected for each test

For price randomization, each bar in the price history has a 30% probability that one of its prices (open, high, low or close) will be changed. The maximum price change is 30% of the average true range (ATR). For example, if the ATR is 10 pips, the price may change by up to 3 pips.
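A rough Python sketch of this rule is shown below. It is an illustration under stated assumptions, not StrategyQuant's actual algorithm: the bar is a plain dictionary, the shift is drawn uniformly, and the last step, which keeps the high and low consistent with the other prices, is my own addition.

```python
import random


def randomize_bar(bar, atr, rng, prob=0.30, max_frac=0.30):
    """With probability `prob`, shift one OHLC price by up to `max_frac` * ATR."""
    new_bar = dict(bar)
    if rng.random() >= prob:
        return new_bar  # bar left untouched
    field = rng.choice(["open", "high", "low", "close"])
    new_bar[field] += rng.uniform(-max_frac * atr, max_frac * atr)
    # Assumption: keep the bar valid so high/low still bracket open/close.
    new_bar["high"] = max(new_bar.values())
    new_bar["low"] = min(new_bar.values())
    return new_bar


# Hypothetical GBPJPY bar; an ATR of 10 pips is 0.10 in price terms,
# so no price should move by more than 3 pips (0.03).
bar = {"open": 150.00, "high": 150.20, "low": 149.90, "close": 150.10}
rng = random.Random(1)
samples = [randomize_bar(bar, atr=0.10, rng=rng) for _ in range(2000)]
changed = sum(sample != bar for sample in samples)
print(f"{changed} of {len(samples)} bars were altered")
```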

For parameter randomization, each parameter has a 30% probability of being changed by up to 30%. For example, a 100-period moving average may have its lookback period changed to anything in the 70-130 range. A robust strategy should remain profitable over a wide range of input parameters.
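The same rule can be sketched for parameters. The parameter names and values below are hypothetical and the code is not StrategyQuant's implementation; it assumes a uniform change and rounds integer parameters back to whole numbers.

```python
import random


def randomize_params(params, rng, prob=0.30, max_frac=0.30):
    """With probability `prob`, scale each parameter by up to +/- `max_frac`."""
    result = {}
    for name, value in params.items():
        if rng.random() < prob:
            scaled = value * (1 + rng.uniform(-max_frac, max_frac))
            # Integer parameters (e.g. lookback periods) stay whole and >= 1.
            result[name] = max(1, round(scaled)) if isinstance(value, int) else scaled
        else:
            result[name] = value
    return result


# Hypothetical parameter set, for illustration only.
params = {"ma_period": 100, "stop_loss_atr": 2.0, "time_stop_bars": 20}
rng = random.Random(5)

for _ in range(3):
    print(randomize_params(params, rng))
```

With these settings, a changed `ma_period` of 100 always lands in the 70-130 range described above. Each randomized parameter set would then be used for a fresh backtest.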

2.8. Monte Carlo Simulation Results (Randomized Prices/Parameters)

The ‘straw broom’ for randomized prices is shown below.

Figure 6: Monte Carlo results with randomized prices

Unlike the trade sequence randomizer used previously, every equity curve here is the result of an independent backtest. Thus most simulations contain different numbers of trades and final equity values. Let’s compare performance using the return/drawdown ratio. As we progress through the confidence levels, this ratio tends to deteriorate significantly because we expect a simultaneous decrease in profit and increase in drawdown. The 4.24 return/drawdown ratio at the 95% confidence level is still 51% of the original value, which I consider a decent result. The results from the randomized parameters are next.

Figure 7: Monte Carlo results with randomized strategy parameters

The ‘straw brooms’ look quite similar to those obtained from the randomized prices, with one key difference: there is a much larger variation in the number of trades per simulation. This is due to changes in the length of the time stop and size of the stop loss. The return/drawdown ratio at the 95% confidence level is higher than that in the randomized price simulation; perhaps this test is less stringent. How about we randomize both prices and parameters simultaneously? The equity curve chart looks somewhat similar, but the return/drawdown ratio fell to 3.38. If you want a stricter robustness test, this is probably the way to go.

3. Conclusion

Two types of Monte Carlo simulations have been discussed in this article. The first involves randomizing your original backtest’s trade sequence, thereby creating multiple equity curves, each with a different maximum drawdown. This allows a drawdown distribution to be generated, which can help you arrive at a conservative capital allocation for the strategy. The second form of Monte Carlo simulation randomizes your price history and/or strategy parameters, effectively creating an out-of-sample test that evaluates your strategy’s sensitivity to changing markets.

Through these three articles on robustness testing, I hope I have demystified some of the common test methods available in today’s commercial software. The importance of strategy robustness cannot be overstated. In today’s rapidly changing markets, robustness should be the foremost concern of every developer.

The final trading strategy can be downloaded here.

In the next stage of our development workflow, we will conclude our individual strategy development by running tick-precision backtests on them.