Optimization improvements

Version 1.8.12.99 of backtrader includes an improvement in how data feeds and results are managed during multiprocessing.

Note

Both behaviors are enabled by default.

The behavior of these options can be controlled through two new Cerebro parameters:

optdatas (default: True)

If True and optimizing (and the system can preload and use runonce), data preloading will be done only once in the main process to save time and resources.
optreturn (default: True)

If True, the optimization results will not be full Strategy objects (with all datas, indicators, observers …) but an object with the following attributes (same as in Strategy):
- params (or p): the parameters the strategy had for the execution
- analyzers: the analyzers the strategy has executed
On most occasions, the analyzers and the params they were obtained with are all that is needed to evaluate the performance of a strategy. If detailed analysis of the generated values of (for example) indicators is needed, turn this off.

Data Feed Management

In an optimization scenario this is a likely combination of Cerebro parameters:

preload=True (default)

Data Feeds will be preloaded before running any backtesting code
runonce=True (default)

Indicators will be calculated in batch mode in a tight for loop, instead of step by step.

If both conditions are True and optdatas=True, then:

The Data Feeds will be preloaded in the main process before spawning new subprocesses (the ones in charge of executing the backtesting)

Results management

In an optimization scenario two things should play the most important role when evaluating the different parameters with which each Strategy was run:

strategy.params (or strategy.p)

The actual set of values used for the backtesting
strategy.analyzers

The objects in charge of providing the evaluation of how the Strategy has actually performed. Example:

SharpeRatio_A (the annualized SharpeRatio)

When optreturn=True, instead of returning full strategy instances, placeholder objects will be created which carry the two aforementioned attributes to let the evaluation take place.

This avoids passing back large amounts of generated data, such as the values produced by indicators during the backtesting.

Should the full strategy objects be needed, simply set optreturn=False when instantiating Cerebro or when calling cerebro.run.
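The saving comes from how much has to be serialized on the way back from a worker process, because multiprocessing pickles results to hand them to the parent. The following is a minimal sketch of that idea and not backtrader's actual implementation: FullStrategy, SlimResult and slim are hypothetical names, and the indicator values and analyzer content are made-up placeholders.

```python
import pickle


class FullStrategy(object):
    """Hypothetical stand-in for a strategy that keeps everything it computed."""

    def __init__(self, params, bars=252):
        self.params = self.p = dict(params)
        # Placeholder analyzer output (a real run would hold actual results)
        self.analyzers = {'sharpe_a': None}
        # Placeholder for the per-bar values generated by two indicators
        self.indicator_values = {
            'sma': [float(i) for i in range(bars)],
            'macd': [float(i) / 2.0 for i in range(bars)],
        }


class SlimResult(object):
    """Hypothetical placeholder carrying only params and analyzers."""

    def __init__(self, params, analyzers):
        self.params = self.p = params
        self.analyzers = analyzers


def slim(strategy):
    # Keep only what is needed to evaluate the run
    return SlimResult(strategy.params, strategy.analyzers)


full = FullStrategy({'smaperiod': 10, 'macdperiod1': 12})
light = slim(full)

# Size of what would travel between processes in each case
full_size = len(pickle.dumps(full))
light_size = len(pickle.dumps(light))
print('full: %d bytes, slim: %d bytes' % (full_size, light_size))
```

The gap between the two sizes grows with the number of bars and indicators, while the slim object stays roughly constant.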

Some test runs

The optimization sample in the backtrader sources has been extended to add control for optdatas and optreturn (actually to disable them)

Single Core Run

As a reference, this is what happens when the number of CPUs is limited to 1 and the multiprocessing module is not used:

$ ./optimization.py --maxcpus 1
==================================================
**************************************************
--------------------------------------------------
OrderedDict([(u'smaperiod', 10), (u'macdperiod1', 12), (u'macdperiod2', 26), (u'macdperiod3', 9)])
**************************************************
--------------------------------------------------
OrderedDict([(u'smaperiod', 10), (u'macdperiod1', 13), (u'macdperiod2', 26), (u'macdperiod3', 9)])
...
...
OrderedDict([(u'smaperiod', 29), (u'macdperiod1', 19), (u'macdperiod2', 29), (u'macdperiod3', 14)])
==================================================
Time used: 184.922727833
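
The first and last parameter sets in the output follow from the default ranges of the sample (see the argument defaults under Sample Code): each range(low, high) excludes its upper bound. A short sketch, assuming the optimization visits every combination of the four ranges (the order in which the combinations are visited may differ from the one produced here):

```python
import itertools

# Default (low, high) ranges taken from the sample's command line arguments
ranges = {
    'smaperiod': range(10, 30),
    'macdperiod1': range(12, 20),
    'macdperiod2': range(26, 30),
    'macdperiod3': range(9, 15),
}

# One dict of parameters per combination of the four ranges
combos = [dict(zip(ranges, values))
          for values in itertools.product(*ranges.values())]

print(len(combos))   # 20 * 8 * 4 * 6 combinations
print(combos[0])     # all lower bounds
print(combos[-1])    # all upper bounds minus one
```

With this many runs, any fixed per-run cost (loading the data, shipping results back) is paid thousands of times, which is what optdatas and optreturn target.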

Multiple Core Runs

Without limiting the number of CPUs, the Python multiprocessing module will try to use all of them. optdatas and optreturn will be selectively disabled in the runs below.

Both `optdatas` and `optreturn` active

The default behavior:

$ ./optimization.py
...
...
...
==================================================
Time used: 56.5889185394

The total improvement from using multiple cores together with the data feed and results improvements means going down from 184.92 to 56.59 seconds.

Take into account that the sample is using 252 bars and the indicators generate only values with a length of 252 points. This is just an example.

The real question is how much of this is attributable to the new behavior.

`optreturn` deactivated

Let’s pass full strategy objects back to the caller:

$ ./optimization.py --no-optreturn
...
...
...
==================================================
Time used: 67.056914007

The execution time has increased by 18.50% with respect to the default run; put the other way, optreturn provides a speed-up of 15.62%.

`optdatas` deactivated

Each subprocess is forced to load its own set of values for the data feeds:

$ ./optimization.py --no-optdatas
...
...
...
==================================================
Time used: 72.7238112637

The execution time has increased by 28.52% with respect to the default run; put the other way, optdatas provides a speed-up of 22.19%.

Both deactivated

Still using multicore but with the old non-improved behavior:

$ ./optimization.py --no-optdatas --no-optreturn
...
...
...
==================================================
Time used: 83.6246643786

The execution time has increased by 47.79% with respect to the default run; put the other way, having both options active provides a speed-up of 32.34%.

This shows that the use of multiple cores is the major contributor to the time improvement.
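
Each run above quotes two percentages for the same pair of timings: the increase relative to the faster run and the speed-up relative to the slower one. A small sketch of that arithmetic using the timings printed above (the function names are illustrative, and results can differ from the quoted figures in the last digit because of rounding):

```python
def increase_pct(fast, slow):
    """Extra time of the slow run, as a percentage of the fast run."""
    return (slow / fast - 1.0) * 100.0


def speedup_pct(fast, slow):
    """Time saved by the fast run, as a percentage of the slow run."""
    return (1.0 - fast / slow) * 100.0


single_core = 184.922727833
both_on = 56.5889185394
both_off = 83.6246643786
timings = [
    ('optreturn off', 67.056914007),
    ('optdatas off', 72.7238112637),
    ('both off', both_off),
]

# Each multi-core variant compared against the default (both options active)
for label, slow in timings:
    print('%-13s +%.2f%% / -%.2f%%' % (
        label, increase_pct(both_on, slow), speedup_pct(both_on, slow)))

# Multiple cores alone (old behavior) compared against the single core run
print('multicore only: -%.2f%%' % speedup_pct(both_off, single_core))
```

The last line puts the multi-core gain on the same scale as the other speed-ups, which is the comparison behind the statement that multiple cores are the major contributor.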

Note

The executions were done on a laptop with an i7-4710HQ (4 cores / 8 logical) and 16 GBytes of RAM under Windows 10 64-bit. Mileage may vary under other conditions.

Concluding

The greatest factor in time reduction during optimization is the use of the multiple cores
The sample runs with optdatas and optreturn show speed-ups of around 22.19% and 15.62% each (32.34% both together in the test)

Sample Usage

$ ./optimization.py --help
usage: optimization.py [-h] [--data DATA] [--fromdate FROMDATE]
                       [--todate TODATE] [--maxcpus MAXCPUS] [--no-runonce]
                       [--exactbars EXACTBARS] [--no-optdatas]
                       [--no-optreturn] [--ma_low MA_LOW] [--ma_high MA_HIGH]
                       [--m1_low M1_LOW] [--m1_high M1_HIGH] [--m2_low M2_LOW]
                       [--m2_high M2_HIGH] [--m3_low M3_LOW]
                       [--m3_high M3_HIGH]

Optimization

optional arguments:
  -h, --help            show this help message and exit
  --data DATA, -d DATA  data to add to the system
  --fromdate FROMDATE, -f FROMDATE
                        Starting date in YYYY-MM-DD format
  --todate TODATE, -t TODATE
                        Starting date in YYYY-MM-DD format
  --maxcpus MAXCPUS, -m MAXCPUS
                        Number of CPUs to use in the optimization
                          - 0 (default): use all available CPUs
                          - 1 -> n: use as many as specified
  --no-runonce          Run in next mode
  --exactbars EXACTBARS
                        Use the specified exactbars still compatible with preload
                          0 No memory savings
                          -1 Moderate memory savings
                          -2 Less moderate memory savings
  --no-optdatas         Do not optimize data preloading in optimization
  --no-optreturn        Do not optimize the returned values to save time
  --ma_low MA_LOW       SMA range low to optimize
  --ma_high MA_HIGH     SMA range high to optimize
  --m1_low M1_LOW       MACD Fast MA range low to optimize
  --m1_high M1_HIGH     MACD Fast MA range high to optimize
  --m2_low M2_LOW       MACD Slow MA range low to optimize
  --m2_high M2_HIGH     MACD Slow MA range high to optimize
  --m3_low M3_LOW       MACD Signal range low to optimize
  --m3_high M3_HIGH     MACD Signal range high to optimize

Sample Code

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import argparse
import datetime
import time

from backtrader.utils.py3 import range

import backtrader as bt
import backtrader.indicators as btind
import backtrader.feeds as btfeeds


class OptimizeStrategy(bt.Strategy):
    params = (('smaperiod', 15),
              ('macdperiod1', 12),
              ('macdperiod2', 26),
              ('macdperiod3', 9),
              )

    def __init__(self):
        # Add indicators to add load

        btind.SMA(period=self.p.smaperiod)
        btind.MACD(period_me1=self.p.macdperiod1,
                   period_me2=self.p.macdperiod2,
                   period_signal=self.p.macdperiod3)


def runstrat():
    args = parse_args()

    # Create a cerebro entity
    cerebro = bt.Cerebro(maxcpus=args.maxcpus,
                         runonce=not args.no_runonce,
                         exactbars=args.exactbars,
                         optdatas=not args.no_optdatas,
                         optreturn=not args.no_optreturn)

    # Add a strategy
    cerebro.optstrategy(
        OptimizeStrategy,
        smaperiod=range(args.ma_low, args.ma_high),
        macdperiod1=range(args.m1_low, args.m1_high),
        macdperiod2=range(args.m2_low, args.m2_high),
        macdperiod3=range(args.m3_low, args.m3_high),
    )

    # Get the dates from the args
    fromdate = datetime.datetime.strptime(args.fromdate, '%Y-%m-%d')
    todate = datetime.datetime.strptime(args.todate, '%Y-%m-%d')

    # Create the 1st data
    data = btfeeds.BacktraderCSVData(
        dataname=args.data,
        fromdate=fromdate,
        todate=todate)

    # Add the Data Feed to Cerebro
    cerebro.adddata(data)

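    # clock the start of the process
    # (time.clock() was removed in Python 3.8; time.time() measures
    # wall-clock time on both Python 2 and 3)
    tstart = time.time()

    # Run over everything
    stratruns = cerebro.run()

    # clock the end of the process
    tend = time.time()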

    print('==================================================')
    for stratrun in stratruns:
        print('**************************************************')
        for strat in stratrun:
            print('--------------------------------------------------')
            print(strat.p._getkwargs())
    print('==================================================')

    # print out the result
    print('Time used:', str(tend - tstart))


def parse_args():
    parser = argparse.ArgumentParser(
        description='Optimization',
        formatter_class=argparse.RawTextHelpFormatter,
    )

    parser.add_argument(
        '--data', '-d',
        default='../../datas/2006-day-001.txt',
        help='data to add to the system')

    parser.add_argument(
        '--fromdate', '-f',
        default='2006-01-01',
        help='Starting date in YYYY-MM-DD format')

    parser.add_argument(
        '--todate', '-t',
        default='2006-12-31',
        help='Starting date in YYYY-MM-DD format')

    parser.add_argument(
        '--maxcpus', '-m',
        type=int, required=False, default=0,
        help=('Number of CPUs to use in the optimization'
              '\n'
              '  - 0 (default): use all available CPUs\n'
              '  - 1 -> n: use as many as specified\n'))

    parser.add_argument(
        '--no-runonce', action='store_true', required=False,
        help='Run in next mode')

    parser.add_argument(
        '--exactbars', required=False, type=int, default=0,
        help=('Use the specified exactbars still compatible with preload\n'
              '  0 No memory savings\n'
              '  -1 Moderate memory savings\n'
              '  -2 Less moderate memory savings\n'))

    parser.add_argument(
        '--no-optdatas', action='store_true', required=False,
        help='Do not optimize data preloading in optimization')

    parser.add_argument(
        '--no-optreturn', action='store_true', required=False,
        help='Do not optimize the returned values to save time')

    parser.add_argument(
        '--ma_low', type=int,
        default=10, required=False,
        help='SMA range low to optimize')

    parser.add_argument(
        '--ma_high', type=int,
        default=30, required=False,
        help='SMA range high to optimize')

    parser.add_argument(
        '--m1_low', type=int,
        default=12, required=False,
        help='MACD Fast MA range low to optimize')

    parser.add_argument(
        '--m1_high', type=int,
        default=20, required=False,
        help='MACD Fast MA range high to optimize')

    parser.add_argument(
        '--m2_low', type=int,
        default=26, required=False,
        help='MACD Slow MA range low to optimize')

    parser.add_argument(
        '--m2_high', type=int,
        default=30, required=False,
        help='MACD Slow MA range high to optimize')

    parser.add_argument(
        '--m3_low', type=int,
        default=9, required=False,
        help='MACD Signal range low to optimize')

    parser.add_argument(
        '--m3_high', type=int,
        default=15, required=False,
        help='MACD Signal range high to optimize')

    return parser.parse_args()


if __name__ == '__main__':
    runstrat()