RobinVOL parameter optimization and portfolio analysis

Abstract — The process of finding the best strategies and combining them into a portfolio can be tedious. Here we propose a methodology for automating and accelerating this task, using a simple and convenient programming language with a great ecosystem for data analysis and numerical computing: Python.
Index Terms — portfolio, optimization, analysis.

 

I. BACKTESTING

When trying to create the best portfolio, we must take into account as many strategies as we can, which means we must backtest the performance of as many configurations as possible. However, this may result in a very large number of configurations and, therefore, of backtests to be executed.

The most common approach to executing a backtest is to develop a strategy that takes as input a single market update (e.g., a new Japanese candlestick) and then decides whether it is interesting to perform any trades. This approach has a downside: the data must be fed sequentially into the strategy, which results in very poor performance when implemented in scripting languages.

However, with a bit of effort, we can develop strategies that work well, and fast, with vectorized operations, achieving very good performance even with slow (but convenient) programming languages such as Python.

In order to minimize the calculations, we load the historical data that we want to backtest into memory (something that can easily be done on modern, inexpensive hardware). This data is stored as NumPy [1] arrays for better efficiency.
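As an illustration, the loading step could be as simple as the following sketch; the CSV layout, column order and file name are assumptions made for the example, not the format actually used by the system:

```python
import numpy as np

def load_history(path):
    """Load a full price history into memory as NumPy arrays.

    Assumed (hypothetical) column layout: timestamp, open, high, low,
    close, volume.
    """
    data = np.genfromtxt(path, delimiter=",", skip_header=1)
    return {
        "open": data[:, 1],
        "high": data[:, 2],
        "low": data[:, 3],
        "close": data[:, 4],
    }

# Example usage with a hypothetical file name:
# history = load_history("EURUSD_H1.csv")
```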

Then, again to minimize calculations, we precalculate all the buy and sell returns for a given stop-loss and take-profit and for all bars in the history. While this operation cannot be vectorized easily, Numba [2] allows us to execute this code with nearly the same performance as C.
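A sketch of how this precalculation could look with Numba is shown below. The exit logic is deliberately simplified (the stop-loss is checked before the take-profit, and spreads or slippage are ignored), and the function name is ours; the sell-side returns would be computed symmetrically:

```python
import numpy as np
from numba import njit

@njit(cache=True)
def precompute_buy_returns(high, low, close, stop_loss, take_profit):
    """For every bar, scan forward until a long trade opened at that bar's
    close hits the stop-loss or the take-profit, and store its return."""
    n = close.shape[0]
    returns = np.zeros(n)
    for i in range(n - 1):
        entry = close[i]
        for j in range(i + 1, n):
            # Simplified exit logic: the stop-loss is checked first.
            if low[j] <= entry - stop_loss:
                returns[i] = -stop_loss / entry
                break
            if high[j] >= entry + take_profit:
                returns[i] = take_profit / entry
                break
    return returns
```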

Once the buy and sell returns have been calculated, a lot of backtests with different configuration parameters can be executed with vectorized and parallelized operations within a single machine. This allows us to execute hundreds of backtests per second.
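For instance, if each configuration can be reduced to a boolean entry-signal array over the same bars, many backtests collapse into a single vectorized operation on the precomputed returns. The signal generation shown in the usage comment is hypothetical:

```python
import numpy as np

def run_backtests(signals, bar_returns):
    """Run many backtests at once.

    signals     -- boolean matrix of shape (n_configs, n_bars); True where a
                   configuration opens a trade on that bar.
    bar_returns -- precomputed per-bar trade returns (see previous sketch).

    Returns the total (non-compounded) return of every configuration.
    """
    # Each row is a full backtest: mask the precomputed returns and sum them.
    return (signals * bar_returns).sum(axis=1)

# Hypothetical usage: evaluate many threshold configurations in one shot.
# signals = indicator_matrix > thresholds[:, None]
# totals = run_backtests(signals, buy_returns)
```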

Furthermore, the backtests can be easily parallelized across multiple machines in a pool of workers, dividing the backtests by symbol, period, take-profit and stop-loss.
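On a single machine this split can be expressed with a process pool, as in the sketch below; across several machines the same idea would typically be driven by a distributed task queue. The function body and parameter values are placeholders:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def run_backtest_batch(job):
    """Placeholder: run all configurations for one
    (symbol, period, take-profit, stop-loss) combination."""
    symbol, period, take_profit, stop_loss = job
    ...  # load data, precompute returns, run the vectorized backtests
    return {"job": job, "results": None}

if __name__ == "__main__":
    # Hypothetical grid of work units, one per worker job.
    symbols = ["EURUSD", "GBPUSD"]
    periods = ["H1", "H4"]
    take_profits = [0.0010, 0.0020]
    stop_losses = [0.0010, 0.0020]

    jobs = list(product(symbols, periods, take_profits, stop_losses))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_backtest_batch, jobs))
```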

II. SMOOTHING

In order to filter the strategies that are most interesting for a given period of time, we must evaluate their performance somehow. Typically, we could use a simple ratio between the cumulative return and the maximum drawdown, or the famous (and equally simple) Sharpe ratio.
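Both factors are straightforward to compute from a series of per-trade (or per-bar) returns. The sketch below assumes a zero risk-free rate and an annualization factor of 252, which are our choices for the example:

```python
import numpy as np

def return_over_drawdown(returns):
    """Ratio between the cumulative return and the maximum drawdown."""
    equity = np.cumsum(returns)
    drawdown = np.maximum.accumulate(equity) - equity
    max_dd = drawdown.max()
    return equity[-1] / max_dd if max_dd > 0 else np.inf

def sharpe_ratio(returns, periods_per_year=252):
    """Simple annualized Sharpe ratio with a zero risk-free rate."""
    if returns.std() == 0:
        return 0.0
    return np.sqrt(periods_per_year) * returns.mean() / returns.std()
```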
However, evaluating strategies based on a single factor might be dangerous, as it might happen that a strategy performed incredibly well for a given parameter configuration but performed really badly when those parameters were slightly modified.

In an ideal world, we could assume that:

  • The parameters of a strategy form a complete linear subspace.
  • An infinitesimal modification of a single parameter has an infinitesimal impact on the result over an infinite series of data.

Under these assumptions, a strategy that performs great should be surrounded by other strategies that perform well in the N-dimensional space formed by all the strategy's parameters.

Nevertheless, the backtests represent only a small subset of the possible parameter combinations, and are therefore largely incomplete (figure 1).

 


Fig. 1. A 2-D visualization of some backtest results.

This subset can be interpolated to obtain a denser subspace of solutions. Combining this interpolated result with a convolution filter, we can easily spot the areas that show genuinely outstanding performance (figure 2).
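One possible implementation of this step, using SciPy (not necessarily the exact interpolation method or kernel used here), interpolates the scattered backtest scores onto a regular 2-D grid and then smooths the grid with a small averaging kernel:

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.ndimage import uniform_filter

def smooth_scores(params, scores, resolution=100, kernel_size=5):
    """Interpolate scattered (param1, param2) -> score backtest results onto
    a dense grid and smooth it, so isolated lucky configurations stand out
    less than genuinely stable regions.

    params -- array of shape (n_backtests, 2), one parameter pair per row.
    scores -- array of shape (n_backtests,), the evaluation factor.
    """
    x = np.linspace(params[:, 0].min(), params[:, 0].max(), resolution)
    y = np.linspace(params[:, 1].min(), params[:, 1].max(), resolution)
    grid_x, grid_y = np.meshgrid(x, y)

    # Denser subspace of solutions obtained by interpolation.
    dense = griddata(params, scores, (grid_x, grid_y), method="linear")
    dense = np.nan_to_num(dense)

    # Convolution with a simple averaging kernel highlights stable areas.
    return uniform_filter(dense, size=kernel_size)
```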


Fig. 2. A 2-D visualization of the interpolated and convolved subspace.

III. CORRELATION ANALYSIS

With the backtest data, using a factor (e.g., the Sharpe ratio) to evaluate the strategies for a given period, and interpolating and convolving the results, we can filter the strategies that perform best.

Setting a threshold allows us to reduce the number of possible strategies for our portfolio. Still, there are probably many strategies that perform really well, and we would like to reduce their number further.

In order to do so, we calculate the correlation between the time series of the returns generated by each strategy. The idea is to build the portfolio from strategies while trying to minimize the correlation between them. Doing so reduces the probability of our portfolio breaking down when market conditions eventually change (something that will inevitably happen).
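With the return series of the candidate strategies aligned on the same time axis, the correlation matrix itself is a one-liner in NumPy; the random data below is only a placeholder for the real return series:

```python
import numpy as np

# strategy_returns: hypothetical array of shape (n_strategies, n_periods),
# one row per candidate strategy, aligned on the same time axis.
strategy_returns = np.random.default_rng(0).normal(size=(4, 500))

# Pairwise Pearson correlation between the strategies' return series.
corr = np.corrcoef(strategy_returns)
print(np.round(corr, 2))
```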

Calculating the correlation matrix of all the possible strategies is too expensive, so the idea is to reduce and split this work. When creating a portfolio, we first define the maximum number of strategies that we want it to contain. This sets an upper bound on the number of correlation calculations required when a new strategy is evaluated as a candidate for the portfolio.


Fig. 3. A small correlation matrix with an allowed maximum correlation factor of 0.6.

When the portfolio reaches its maximum number of strategies, new candidates are still evaluated and, if appropriate, replace existing members, as long as they contribute to lowering the overall portfolio correlation.
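The add-and-replace rule could be sketched as follows. The 0.6 threshold matches figure 3, while the specific replacement criterion (dropping the member most correlated with the rest when the candidate would lower the average correlation) is our simplification of the general idea:

```python
import numpy as np

def build_portfolio(candidates, max_size=10, max_corr=0.6):
    """Greedy portfolio construction.

    candidates -- dict mapping strategy name to its return series (1-D array).
    A candidate is added if its correlation with every current member stays
    below max_corr; once the portfolio is full, it may replace the member
    that is most correlated with the rest of the portfolio.
    """
    portfolio = {}
    for name, returns in candidates.items():
        corrs = {m: abs(np.corrcoef(returns, r)[0, 1])
                 for m, r in portfolio.items()}
        if corrs and max(corrs.values()) >= max_corr:
            continue  # too correlated with an existing member, skip it
        if len(portfolio) < max_size:
            portfolio[name] = returns
            continue

        # Portfolio is full: find the member most correlated with the others
        # and replace it if the candidate lowers the average correlation.
        def mean_corr_to_others(member):
            others = [r for m, r in portfolio.items() if m != member]
            return np.mean([abs(np.corrcoef(portfolio[member], r)[0, 1])
                            for r in others])

        worst = max(portfolio, key=mean_corr_to_others)
        candidate_mean = np.mean(list(corrs.values())) if corrs else 0.0
        if candidate_mean < mean_corr_to_others(worst):
            del portfolio[worst]
            portfolio[name] = returns
    return portfolio
```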

Although this greedy approach will not necessarily lead to the best possible portfolio, it is scalable and can still lead to a very good portfolio (figure 4), so it is a good trade-off.


Fig. 4. Example of the returns generated by a portfolio.

This process can also be easily parallelized by dividing the set of what we consider good backtests into several partial portfolios. The strategies in those portfolios can then be compared again, creating a single new portfolio out of them.

IV. FURTHER IMPROVEMENTS

While technically outside the scope of parameter optimization and portfolio analysis, the use of neural networks to improve the accuracy of the strategies included in the portfolios has been successful.

This part, however, is not yet automated in the process and will be studied further before being included, mostly because choosing the layers and other parameters of the neural networks is not easily automatable. Using deep learning and, especially, deep features might help create a single complex model that can be tuned for each case in an automated way.

V. CONCLUSIONS

Several conclusions have been reached with this study:

  • Python has proved very valuable for backtesting, achieving very high performance thanks to NumPy [1] and Numba [2].
  • Parallelization is very easily implemented with Python.
  • Interpolation and convolution help avoid adding strategies that perform particularly well for a given configuration but are very unstable and will therefore probably perform poorly under future market conditions.
  • Building a portfolio that tries to minimize the correlation between its strategies not only improves the cumulative returns curve, but also makes the system more robust to future market changes.

 

REFERENCES
[1] NumPy, http://www.numpy.org/
[2] Numba, http://numba.pydata.org/
[3] Profitable auto trading, http://www.profitableautotrading.com/
[4] Mechanical Forex, http://mechanicalforex.com/