
On Backtesting Performance and Out of Core Memory Execution

There have been two recent https://reddit.com/r/algotrading threads which are the inspiration for this article.

We'll, of course, be addressing these concepts with backtrader.

The 2M Candles

In order to do this, the first thing is to generate that number of candles. Given that the first poster talks about 77 stocks and 1.6M candles, this would amount to roughly 20,779 candles per stock, so to have nice round numbers we'll do the following:

  • Generate candles for 100 stocks

  • Generate 20,000 candles per stock

I.e.: 100 files totaling 2M candles.

The script

import numpy as np
import pandas as pd

COLUMNS = ['open', 'high', 'low', 'close', 'volume', 'openinterest']
CANDLES = 20000
STOCKS = 100

dateindex = pd.date_range(start='2010-01-01', periods=CANDLES, freq='15min')

for i in range(STOCKS):

    data = np.random.randint(10, 20, size=(CANDLES, len(COLUMNS)))
    df = pd.DataFrame(data * 1.01, dateindex, columns=COLUMNS)
    df = df.rename_axis('datetime')
    df.to_csv('candles{:02d}.csv'.format(i))

This generates 100 files, starting with candles00.csv and going all the way up to candles99.csv. The actual values are not important. Having the standard datetime, OHLCV components (and OpenInterest) is what matters.
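
As a quick sanity check (not part of the original workflow), one of the generated files can be read back with pandas to verify the shape and the column layout backtrader expects:

import pandas as pd

# load one of the generated files back, using the 'datetime' column as index
df = pd.read_csv('candles00.csv', index_col='datetime', parse_dates=True)
print(df.shape)          # expected: (20000, 6)
print(list(df.columns))  # ['open', 'high', 'low', 'close', 'volume', 'openinterest']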

The test system

  • Hardware/OS: A Windows 10 15.6" laptop with an Intel i7 and 32 Gbytes of memory will be used.

  • Python: CPython 3.6.1 and pypy3 6.0.0

  • Misc: an application running constantly and taking around 20% of the CPU. The usual suspects like Chrome (102 processes), Edge, Word, Powerpoint, Excel and some minor applications are also running

backtrader default configuration

Let's recall what the default run-time configuration for backtrader is:

  • Preload all data feeds if possible

  • If all data feeds can be preloaded, run in batch mode (named runonce)

  • Precalculate all indicators first

  • Go through the strategy logic and broker step-by-step
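
Expressed as explicit keyword arguments (a sketch only, showing the values backtrader uses when nothing is overridden), the defaults above correspond to:

import backtrader as bt

cerebro = bt.Cerebro()  # stdstats=True by default: standard observers are added
# ... adddata / addstrategy ...
# preload all feeds and precalculate indicators in vectorized batch ("runonce") mode
cerebro.run(preload=True, runonce=True)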

Executing the challenge in the default batch runonce mode

Our test script (see the bottom of this article for the full source code) will open those 100 files and process them with the default backtrader configuration.

$ ./two-million-candles.py
Cerebro Start Time:          2019-10-26 08:33:15.563088
Strat Init Time:             2019-10-26 08:34:31.845349
Time Loading Data Feeds:     76.28
Number of data feeds:        100
Strat Start Time:            2019-10-26 08:34:31.864349
Pre-Next Start Time:         2019-10-26 08:34:32.670352
Time Calculating Indicators: 0.81
Next Start Time:             2019-10-26 08:34:32.671351
Strat warm-up period Time:   0.00
Time to Strat Next Logic:    77.11
End Time:                    2019-10-26 08:35:31.493349
Time in Strategy Next Logic: 58.82
Total Time in Strategy:      58.82
Total Time:                  135.93
Length of data feeds:        20000

Memory Usage: A peak of 348 Mbytes was observed

Most of the time is actually spent preloading the data (76.28 seconds), with the rest spent in the strategy, which includes going through the broker in each iteration (58.82 seconds). The total time is 135.93 seconds.

Considering the entire run time, the performance is:

  • 14,713 candles/second (2,000,000 candles / 135.93 seconds)

Bottom line: the claim in the first of the two reddit threads above that backtrader cannot handle 1.6M candles is FALSE.

Doing it with pypy

Since the thread claims that using pypy didn't help, let's see what happens when using it.
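
A note on invocation: the logs below keep the generic ./two-million-candles.py call; assuming pypy3 is available on the PATH, the equivalent explicit call would simply be:

$ pypy3 ./two-million-candles.py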

$ ./two-million-candles.py
Cerebro Start Time:          2019-10-26 08:39:42.958689
Strat Init Time:             2019-10-26 08:40:31.260691
Time Loading Data Feeds:     48.30
Number of data feeds:        100
Strat Start Time:            2019-10-26 08:40:31.338692
Pre-Next Start Time:         2019-10-26 08:40:31.612688
Time Calculating Indicators: 0.27
Next Start Time:             2019-10-26 08:40:31.612688
Strat warm-up period Time:   0.00
Time to Strat Next Logic:    48.65
End Time:                    2019-10-26 08:40:40.150689
Time in Strategy Next Logic: 8.54
Total Time in Strategy:      8.54
Total Time:                  57.19
Length of data feeds:        20000

Holy Cow! The total time has gone down from 135.93 seconds to 57.19 seconds. The performance has more than doubled.

The performance: 34,971 candles/second

Memory Usage: a peak of 269 Mbytes was seen.

This is also an important improvement over the standard CPython interpreter.

Handling the 2M candles out of core memory

All of this can be improved if one considers that backtrader has several configuration options for the execution of a backtesting session, including optimizing the buffers and working only with the minimum needed set of data (ideally with buffers of just size 1, something which only happens in ideal scenarios).

The option to be used will be exactbars=True. From the documentation for exactbars (which is a parameter given to Cerebro during either instantiation or when invoking run)

  `True` or `1`: all “lines” objects reduce memory usage to the
  automatically calculated minimum period.

  If a Simple Moving Average has a period of 30, the underlying data
  will have always a running buffer of 30 bars to allow the
  calculation of the Simple Moving Average

  * This setting will deactivate `preload` and `runonce`

  * Using this setting also deactivates **plotting**

For the sake of maximum optimization and because plotting will be disabled, the following will be used too: stdstats=False, which disables the standard Observers for cash, value and trades (useful for plotting, which is no longer in scope)
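
As a sketch, those two options can be passed either when instantiating Cerebro or when calling run (the test script at the bottom simply forwards them from the command line):

import backtrader as bt

cerebro = bt.Cerebro(stdstats=False)  # no cash/value/trade observers
# ... adddata / addstrategy ...
cerebro.run(exactbars=True)  # minimal buffers: preload and runonce are deactivated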

$ ./two-million-candles.py --cerebro exactbars=True,stdstats=False
Cerebro Start Time:          2019-10-26 08:37:08.014348
Strat Init Time:             2019-10-26 08:38:21.850392
Time Loading Data Feeds:     73.84
Number of data feeds:        100
Strat Start Time:            2019-10-26 08:38:21.851394
Pre-Next Start Time:         2019-10-26 08:38:21.857393
Time Calculating Indicators: 0.01
Next Start Time:             2019-10-26 08:38:21.857393
Strat warm-up period Time:   0.00
Time to Strat Next Logic:    73.84
End Time:                    2019-10-26 08:39:02.334936
Time in Strategy Next Logic: 40.48
Total Time in Strategy:      40.48
Total Time:                  114.32
Length of data feeds:        20000

The performance: 17,494 candles/second

Memory Usage: 75 Mbytes (stable from the beginning to the end of the backtesting session)

Let's compare to the previous non-optimized run

  • Instead of spending over 76 seconds preloading data, backtesting starts immediately, because the data is not preloaded

  • The total time is 114.32 seconds vs 135.93. An improvement of 15.90%.

  • An improvement in memory usage of around 78% (75 Mbytes vs the previous peak of 348 Mbytes).

Note

We could have actually thrown 100M candles to the script and the amount of memory consumed would have remained fixed at 75 Mbytes

Doing it again with pypy

Now that we know how to optimize, let's do it the pypy way.

$ ./two-million-candles.py --cerebro exactbars=True,stdstats=False
Cerebro Start Time:          2019-10-26 08:44:32.309689
Strat Init Time:             2019-10-26 08:44:32.406689
Time Loading Data Feeds:     0.10
Number of data feeds:        100
Strat Start Time:            2019-10-26 08:44:32.409689
Pre-Next Start Time:         2019-10-26 08:44:32.451689
Time Calculating Indicators: 0.04
Next Start Time:             2019-10-26 08:44:32.451689
Strat warm-up period Time:   0.00
Time to Strat Next Logic:    0.14
End Time:                    2019-10-26 08:45:38.918693
Time in Strategy Next Logic: 66.47
Total Time in Strategy:      66.47
Total Time:                  66.61
Length of data feeds:        20000

The performance: 30,025 candles/second

Memory Usage: constant at 49 Mbytes

Comparing it to the previous equivalent run:

  • 66.61 seconds vs 114.32 or a 41.73% improvement in run time

  • 49 Mbytes vs 75 Mbytes or a 34.6% improvement.

Note

In this case pypy has not been able to beat its own time from the batch (runonce) mode, which was 57.19 seconds. This is to be expected because, when preloading, the calculations are done in vectorized mode, and that is where the JIT of pypy excels.

It has, in any case, still done a very good job and there is an important improvement in memory consumption

A complete run with trading

The script can create indicators (moving averages) and execute a short/long strategy on the 100 data feeds using the crossover of the moving averages. Let's do it with pypy and, knowing that it performs better in batch (runonce) mode, that is what will be used.
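
In essence, the per-data trading logic in the strategy's next method (excerpted from the full source at the bottom) boils down to:

for d, macross in self.macross.items():
    if macross > 0:    # fast moving average has crossed above the slow one
        self.order_target_size(data=d, target=1)   # go (or stay) long
    elif macross < 0:  # fast moving average has crossed below the slow one
        self.order_target_size(data=d, target=-1)  # go (or stay) short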

$ ./two-million-candles.py --strat indicators=True,trade=True
Cerebro Start Time:          2019-10-26 08:57:36.114415
Strat Init Time:             2019-10-26 08:58:25.569448
Time Loading Data Feeds:     49.46
Number of data feeds:        100
Total indicators:            300
Moving Average to be used:   SMA
Indicators period 1:         10
Indicators period 2:         50
Strat Start Time:            2019-10-26 08:58:26.230445
Pre-Next Start Time:         2019-10-26 08:58:40.850447
Time Calculating Indicators: 14.62
Next Start Time:             2019-10-26 08:58:41.005446
Strat warm-up period Time:   0.15
Time to Strat Next Logic:    64.89
End Time:                    2019-10-26 09:00:13.057955
Time in Strategy Next Logic: 92.05
Total Time in Strategy:      92.21
Total Time:                  156.94
Length of data feeds:        20000

The performance: 12,743 candles/second

Memory Usage: A peak of 1300 Mbytes was observed.

The execution time has obviously increased (indicators + trading), but why has the memory usage increased?

Before reaching any conclusions, let's run it creating indicators but without trading

$ ./two-million-candles.py --strat indicators=True
Cerebro Start Time:          2019-10-26 09:05:55.967969
Strat Init Time:             2019-10-26 09:06:44.072969
Time Loading Data Feeds:     48.10
Number of data feeds:        100
Total indicators:            300
Moving Average to be used:   SMA
Indicators period 1:         10
Indicators period 2:         50
Strat Start Time:            2019-10-26 09:06:44.779971
Pre-Next Start Time:         2019-10-26 09:06:59.208969
Time Calculating Indicators: 14.43
Next Start Time:             2019-10-26 09:06:59.360969
Strat warm-up period Time:   0.15
Time to Strat Next Logic:    63.39
End Time:                    2019-10-26 09:07:09.151838
Time in Strategy Next Logic: 9.79
Total Time in Strategy:      9.94
Total Time:                  73.18
Length of data feeds:        20000

The performance: 27,329 candles/second

Memory Usage: 600 Mbytes (doing the same in the optimized exactbars mode consumes only 60 Mbytes, but with an increase in execution time, as pypy cannot optimize as much)

With that in hand: memory usage really increases when trading. The reason is that Order and Trade objects are created, passed around and kept by the broker.

Note

Take into account that the data set contains random values, which generates a huge number of crossovers and hence an enormous amount of orders and trades. A similar behavior should not be expected for a regular data set.

Conclusions

The bogus claim

Already proven above to be bogus, because backtrader CAN handle 1.6 million candles and more.

General

  1. backtrader can easily handle 2M candles using the default configuration (with in-memory data pre-loading)

  2. backtrader can operate in a non-preloading, optimized mode, reducing buffers to the minimum for out-of-core-memory backtesting

  3. Even when backtesting in the optimized non-preloading mode, the increase in memory consumption comes from the administrative overhead (orders, trades) which the broker generates.

  4. Even when trading, using indicators and with the broker constantly getting in the way, the performance is 12,743 candles/second

  5. Use pypy where possible (for example if you don't need to plot)

Using Python and/or backtrader for these cases

With pypy, trading enabled and the random data set (a higher than usual number of trades), the entire set of 2M bars was processed in a total of:

  • 156.94 seconds, i.e.: almost 2 minutes and 37 seconds

Taking into account that this is done on a laptop running multiple other things simultaneously, it can be concluded that backtesting 2M bars is perfectly doable.

What about the 8000 stocks scenario?

Execution time would have to be scaled by 80, hence:

  • 12,560 seconds (or almost 210 minutes or 3 hours and 30 minutes) would be needed to run this random set scenario.
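
As back-of-the-envelope arithmetic (a sketch of the scaling, not a measurement):

total_100_stocks = 156.94        # measured above: 100 stocks, 2M candles, trading enabled
scale = 8000 / 100               # 80x more stocks
print(total_100_stocks * scale)  # ~12,555 seconds, i.e. around 3 hours and 30 minutes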

Even assuming a standard data set which would generate far fewer operations, one would still be talking about a backtest lasting hours (3 or 4)

Memory usage would also increase when trading, due to the broker activity, and would probably require several Gigabytes.

Note

One cannot simply multiply by 80 again here, because the sample script trades with random data and as often as possible. In any case, the amount of RAM needed would be significant.

As such, a workflow with only backtrader as the research and backtesting tool would seem far fetched.

A Discussion about Workflows

There are two standard workflows to consider when using backtrader

  • Do everything with backtrader, i.e.: research and backtesting all in one

  • Research with pandas, get a notion of whether the ideas are good, and then backtest with backtrader to verify with as much accuracy as possible, having possibly reduced huge data-sets to something more palatable for usual RAM scenarios.

Tip

One can imagine replacing pandas with something like dask for out-of-core-memory execution
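
A minimal sketch of that idea, assuming dask is installed and reusing the candle files generated above (the groupby is only a placeholder for an actual research step):

import dask.dataframe as dd

# lazily point at all 100 CSV files - nothing is loaded into RAM yet
ddf = dd.read_csv('candles*.csv', parse_dates=['datetime'])

# example out-of-core computation: yearly mean of the close price
result = ddf.groupby(ddf['datetime'].dt.year)['close'].mean().compute()
print(result)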

The Test Script

Here is the source code:

#!/usr/bin/env python
# -*- coding: utf-8; py-indent-offset:4 -*-
###############################################################################
import argparse
import datetime

import backtrader as bt


class St(bt.Strategy):
    params = dict(
        indicators=False,
        indperiod1=10,
        indperiod2=50,
        indicator=bt.ind.SMA,
        trade=False,
    )

    def __init__(self):
        self.dtinit = datetime.datetime.now()
        print('Strat Init Time:             {}'.format(self.dtinit))
        loaddata = (self.dtinit - self.env.dtcerebro).total_seconds()
        print('Time Loading Data Feeds:     {:.2f}'.format(loaddata))

        print('Number of data feeds:        {}'.format(len(self.datas)))
        if self.p.indicators:
            total_ind = self.p.indicators * 3 * len(self.datas)  # 2 MAs + 1 CrossOver per data
            print('Total indicators:            {}'.format(total_ind))
            indname = self.p.indicator.__name__
            print('Moving Average to be used:   {}'.format(indname))
            print('Indicators period 1:         {}'.format(self.p.indperiod1))
            print('Indicators period 2:         {}'.format(self.p.indperiod2))

            self.macross = {}
            for d in self.datas:
                ma1 = self.p.indicator(d, period=self.p.indperiod1)
                ma2 = self.p.indicator(d, period=self.p.indperiod2)
                self.macross[d] = bt.ind.CrossOver(ma1, ma2)

    def start(self):
        self.dtstart = datetime.datetime.now()
        print('Strat Start Time:            {}'.format(self.dtstart))

    def prenext(self):
        if len(self.data0) == 1:  # only 1st time
            self.dtprenext = datetime.datetime.now()
            print('Pre-Next Start Time:         {}'.format(self.dtprenext))
            indcalc = (self.dtprenext - self.dtstart).total_seconds()
            print('Time Calculating Indicators: {:.2f}'.format(indcalc))

    def nextstart(self):
        if len(self.data0) == 1:  # there was no prenext
            self.dtprenext = datetime.datetime.now()
            print('Pre-Next Start Time:         {}'.format(self.dtprenext))
            indcalc = (self.dtprenext - self.dtstart).total_seconds()
            print('Time Calculating Indicators: {:.2f}'.format(indcalc))

        self.dtnextstart = datetime.datetime.now()
        print('Next Start Time:             {}'.format(self.dtnextstart))
        warmup = (self.dtnextstart - self.dtprenext).total_seconds()
        print('Strat warm-up period Time:   {:.2f}'.format(warmup))
        nextstart = (self.dtnextstart - self.env.dtcerebro).total_seconds()
        print('Time to Strat Next Logic:    {:.2f}'.format(nextstart))
        self.next()

    def next(self):
        if not self.p.trade or not self.p.indicators:
            return  # trading needs the crossover indicators created in __init__

        for d, macross in self.macross.items():
            if macross > 0:
                self.order_target_size(data=d, target=1)
            elif macross < 0:
                self.order_target_size(data=d, target=-1)

    def stop(self):
        dtstop = datetime.datetime.now()
        print('End Time:                    {}'.format(dtstop))
        nexttime = (dtstop - self.dtnextstart).total_seconds()
        print('Time in Strategy Next Logic: {:.2f}'.format(nexttime))
        strattime = (dtstop - self.dtprenext).total_seconds()
        print('Total Time in Strategy:      {:.2f}'.format(strattime))
        totaltime = (dtstop - self.env.dtcerebro).total_seconds()
        print('Total Time:                  {:.2f}'.format(totaltime))
        print('Length of data feeds:        {}'.format(len(self.data)))


def run(args=None):
    args = parse_args(args)

    cerebro = bt.Cerebro()

    datakwargs = dict(timeframe=bt.TimeFrame.Minutes, compression=15)
    for i in range(args.numfiles):
        dataname = 'candles{:02d}.csv'.format(i)
        data = bt.feeds.GenericCSVData(dataname=dataname, **datakwargs)
        cerebro.adddata(data)

    cerebro.addstrategy(St, **eval('dict(' + args.strat + ')'))
    cerebro.dtcerebro = dt0 = datetime.datetime.now()
    print('Cerebro Start Time:          {}'.format(dt0))
    cerebro.run(**eval('dict(' + args.cerebro + ')'))


def parse_args(pargs=None):
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description=(
            'Backtrader Basic Script'
        )
    )

    parser.add_argument('--numfiles', required=False, default=100, type=int,
                        help='Number of files to read')

    parser.add_argument('--cerebro', required=False, default='',
                        metavar='kwargs', help='kwargs in key=value format')

    parser.add_argument('--strat', '--strategy', required=False, default='',
                        metavar='kwargs', help='kwargs in key=value format')


    return parser.parse_args(pargs)


if __name__ == '__main__':
    run()