Optimization improvements
Version 1.8.12.99 of backtrader includes an improvement in how data feeds and results are managed during multiprocessing.
Note
The behavior for both has been made the default.
The behavior of these options can be controlled through two new Cerebro parameters:
- optdatas (default: True)
  If True and optimizing (and the system can preload and use runonce), data preloading will be done only once in the main process to save time and resources.
- optreturn (default: True)
  If True, the optimization results will not be full Strategy objects (with all datas, indicators, observers …) but objects with the following attributes (same as in Strategy):
  - params (or p): the parameters the strategy had for the execution
  - analyzers: the analyzers the strategy has executed
  In most cases, only the analyzers and the params they were run with are needed to evaluate the performance of a strategy. If a detailed analysis of the generated values of (for example) indicators is needed, turn this off.
Data Feed Management
In an optimization scenario this is a likely combination of Cerebro parameters:
- preload=True (default)
  Data Feeds will be preloaded before running any backtesting code
- runonce=True (default)
  Indicators will be calculated in batch mode in a tight for loop, instead of step by step.
If both conditions are True and optdatas=True, then:
- The Data Feeds will be preloaded in the main process before spawning new subprocesses (the ones in charge of executing the backtesting)
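The effect of optdatas can be pictured with a small single-process simulation. This is only a sketch under stated assumptions: CountingLoader, run_backtest and optimize are hypothetical stand-ins and not backtrader functions, the prices are synthetic, and no real subprocesses are spawned. It simply counts how often the feed is loaded when it is preloaded once compared with once per run.

```python
class CountingLoader(object):
    """Hypothetical stand-in for a data feed loader that counts its loads."""

    def __init__(self, n_bars=252):
        self.n_bars = n_bars
        self.calls = 0

    def load(self):
        self.calls += 1
        # Synthetic closing prices; a real feed would parse a file here
        return [100.0 + 0.1 * i for i in range(self.n_bars)]


def run_backtest(bars, period):
    """Stand-in for one backtest: the last simple moving average value."""
    window = bars[-period:]
    return sum(window) / len(window)


def optimize(loader, periods, preload_once):
    """Run one 'backtest' per period, loading the feed once or once per run."""
    shared = loader.load() if preload_once else None
    results = {}
    for period in periods:
        bars = shared if preload_once else loader.load()
        results[period] = run_backtest(bars, period)
    return results


periods = range(10, 30)

once = CountingLoader()
res_once = optimize(once, periods, preload_once=True)

per_run = CountingLoader()
res_per_run = optimize(per_run, periods, preload_once=False)

print('loads when preloading once:', once.calls)
print('loads when each run loads its own data:', per_run.calls)
print('same results:', res_once == res_per_run)
```

The results are identical in both modes; only the number of load operations changes, which is where the time saving comes from.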
Results management
In an optimization scenario two things should play the most important role when evaluating the different parameters with which each Strategy was run:
- strategy.params (or strategy.p)
  The actual set of values used for the backtesting
- strategy.analyzers
  The objects in charge of providing the evaluation of how the Strategy has actually performed. Example: SharpeRatio_A (the annualized SharpeRatio)
When optreturn=True, instead of returning full strategy instances, placeholder objects will be created which carry the two aforementioned attributes to let the evaluation take place.
This avoids passing back lots of generated data, for example the values generated by indicators during the backtesting.
Should the full strategy objects be needed, simply set optreturn=False during cerebro instantiation or when calling cerebro.run.
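To illustrate why a placeholder is cheaper to pass back, the following sketch compares the pickled size of a full result with that of a light one (objects travel between processes in pickled form). FullResult and LightResult are hypothetical classes made up for this example and are not backtrader's own implementation; the parameter and analyzer values are invented.

```python
import pickle


class FullResult(object):
    """Hypothetical stand-in for a full strategy with generated values."""

    def __init__(self, params, analyzers, indicator_values):
        self.p = self.params = params
        self.analyzers = analyzers
        self.indicator_values = indicator_values


class LightResult(object):
    """Hypothetical placeholder carrying only params and analyzers."""

    def __init__(self, full):
        self.p = self.params = full.params
        self.analyzers = full.analyzers


# Invented values, for illustration only
params = {'smaperiod': 10, 'macdperiod1': 12}
analyzers = {'sharpe': 0.5}
indicator_values = {
    'sma': [float(i) for i in range(252)],
    'macd': [float(i) / 2 for i in range(252)],
}

full = FullResult(params, analyzers, indicator_values)
light = LightResult(full)

# Size of what would have to be sent back to the main process
full_size = len(pickle.dumps(full))
light_size = len(pickle.dumps(light))

print('full result bytes:', full_size)
print('light result bytes:', light_size)
```

The light object still exposes p, params and analyzers, so the evaluation code does not need to change.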
Some test runs
The optimization sample in the backtrader sources has been extended to add control for optdatas and optreturn (actually to disable them).
Single Core Run
As a reference, this is what happens when the number of CPUs is limited to 1 and the multiprocessing module is not used:
$ ./optimization.py --maxcpus 1
==================================================
**************************************************
--------------------------------------------------
OrderedDict([(u'smaperiod', 10), (u'macdperiod1', 12), (u'macdperiod2', 26), (u'macdperiod3', 9)])
**************************************************
--------------------------------------------------
OrderedDict([(u'smaperiod', 10), (u'macdperiod1', 13), (u'macdperiod2', 26), (u'macdperiod3', 9)])
...
...
OrderedDict([(u'smaperiod', 29), (u'macdperiod1', 19), (u'macdperiod2', 29), (u'macdperiod3', 14)])
==================================================
Time used: 184.922727833
Multiple Core Runs
Without limiting the number of CPUs, the Python multiprocessing module will try to use all of them. optdatas and optreturn will be disabled in some of the runs below.
Both optdatas and optreturn active
The default behavior:
$ ./optimization.py
...
...
...
==================================================
Time used: 56.5889185394
The total improvement from using multiple cores together with the data feed and results improvements means going down from 184.92 to 56.58 seconds.
Take into account that the sample is using 252 bars and the indicators generate only values with a length of 252 points. This is just an example.
The real question is how much of this is attributable to the new behavior.
optreturn deactivated
Let’s pass full strategy objects back to the caller:
$ ./optimization.py --no-optreturn
...
...
...
==================================================
Time used: 67.056914007
The execution time has increased by 18.50% with respect to the default run (or, the other way round, a speed-up of 15.62% is in place with optreturn active).
optdatas deactivated
Each subprocess is forced to load its own set of values for the data feeds:
$ ./optimization.py --no-optdatas
...
...
...
==================================================
Time used: 72.7238112637
The execution time has increased by 28.52% with respect to the default run (or, the other way round, a speed-up of 22.19% is in place with optdatas active).
Both deactivated
Still using multicore but with the old non-improved behavior:
$ ./optimization.py --no-optdatas --no-optreturn
...
...
...
==================================================
Time used: 83.6246643786
The execution time has increased by 47.79% with respect to the default run (or, the other way round, a speed-up of 32.34% is in place with both options active).
This shows that the use of multiple cores is the major contributor to the time improvement.
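The percentages quoted for the runs can be re-derived from the measured times. The sketch below uses only the timings printed above; the helper names increase_pct and speedup_pct are made up for this example. The results agree with the figures in the text to within a couple of hundredths of a percentage point.

```python
# Measured times in seconds, copied from the runs above
timings = {
    'single core': 184.922727833,
    'both active': 56.5889185394,
    'no optreturn': 67.056914007,
    'no optdatas': 72.7238112637,
    'both deactivated': 83.6246643786,
}


def increase_pct(slow, fast):
    """Extra time of the slow run relative to the fast one, in percent."""
    return (slow / fast - 1.0) * 100.0


def speedup_pct(slow, fast):
    """Time saved by the fast run relative to the slow one, in percent."""
    return (1.0 - fast / slow) * 100.0


base = timings['both active']
for name in ('no optreturn', 'no optdatas', 'both deactivated'):
    slow = timings[name]
    print('{}: +{:.2f}% time, {:.2f}% speed-up when active'.format(
        name, increase_pct(slow, base), speedup_pct(slow, base)))

# Contribution of multiple cores alone: single core against multicore
# with both new options deactivated
multicore_speedup = speedup_pct(timings['single core'],
                                timings['both deactivated'])
print('multicore alone: {:.2f}% speed-up'.format(multicore_speedup))
```

Comparing the single core run against the multicore run with both options deactivated isolates the effect of the cores, and that saving is larger than the one obtained from the two options together.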
Note
The executions have been done on a laptop with an i7-4710HQ (4 cores / 8 logical) with 16 GBytes of RAM under Windows 10 64bit. The mileage may vary under other conditions.
Concluding
- The greatest factor in time reduction during optimization is the use of multiple cores
- The sample runs with optdatas and optreturn show speed-ups of around 22.19% and 15.62% each (32.34% both together in the test)
Sample Usage
$ ./optimization.py --help
usage: optimization.py [-h] [--data DATA] [--fromdate FROMDATE]
                       [--todate TODATE] [--maxcpus MAXCPUS] [--no-runonce]
                       [--exactbars EXACTBARS] [--no-optdatas]
                       [--no-optreturn] [--ma_low MA_LOW] [--ma_high MA_HIGH]
                       [--m1_low M1_LOW] [--m1_high M1_HIGH] [--m2_low M2_LOW]
                       [--m2_high M2_HIGH] [--m3_low M3_LOW]
                       [--m3_high M3_HIGH]

Optimization

optional arguments:
  -h, --help            show this help message and exit
  --data DATA, -d DATA  data to add to the system
  --fromdate FROMDATE, -f FROMDATE
                        Starting date in YYYY-MM-DD format
  --todate TODATE, -t TODATE
                        Starting date in YYYY-MM-DD format
  --maxcpus MAXCPUS, -m MAXCPUS
                        Number of CPUs to use in the optimization
                          - 0 (default): use all available CPUs
                          - 1 -> n: use as many as specified
  --no-runonce          Run in next mode
  --exactbars EXACTBARS
                        Use the specified exactbars still compatible with preload
                          0 No memory savings
                          -1 Moderate memory savings
                          -2 Less moderate memory savings
  --no-optdatas         Do not optimize data preloading in optimization
  --no-optreturn        Do not optimize the returned values to save time
  --ma_low MA_LOW       SMA range low to optimize
  --ma_high MA_HIGH     SMA range high to optimize
  --m1_low M1_LOW       MACD Fast MA range low to optimize
  --m1_high M1_HIGH     MACD Fast MA range high to optimize
  --m2_low M2_LOW       MACD Slow MA range low to optimize
  --m2_high M2_HIGH     MACD Slow MA range high to optimize
  --m3_low M3_LOW       MACD Signal range low to optimize
  --m3_high M3_HIGH     MACD Signal range high to optimize
Sample Code
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import argparse
import datetime
import time

from backtrader.utils.py3 import range

import backtrader as bt
import backtrader.indicators as btind
import backtrader.feeds as btfeeds


class OptimizeStrategy(bt.Strategy):
    params = (('smaperiod', 15),
              ('macdperiod1', 12),
              ('macdperiod2', 26),
              ('macdperiod3', 9),
              )

    def __init__(self):
        # Add indicators to add load
        btind.SMA(period=self.p.smaperiod)
        btind.MACD(period_me1=self.p.macdperiod1,
                   period_me2=self.p.macdperiod2,
                   period_signal=self.p.macdperiod3)


def runstrat():
    args = parse_args()

    # Create a cerebro entity
    cerebro = bt.Cerebro(maxcpus=args.maxcpus,
                         runonce=not args.no_runonce,
                         exactbars=args.exactbars,
                         optdatas=not args.no_optdatas,
                         optreturn=not args.no_optreturn)

    # Add a strategy
    cerebro.optstrategy(
        OptimizeStrategy,
        smaperiod=range(args.ma_low, args.ma_high),
        macdperiod1=range(args.m1_low, args.m1_high),
        macdperiod2=range(args.m2_low, args.m2_high),
        macdperiod3=range(args.m3_low, args.m3_high),
    )

    # Get the dates from the args
    fromdate = datetime.datetime.strptime(args.fromdate, '%Y-%m-%d')
    todate = datetime.datetime.strptime(args.todate, '%Y-%m-%d')

    # Create the 1st data
    data = btfeeds.BacktraderCSVData(
        dataname=args.data,
        fromdate=fromdate,
        todate=todate)

    # Add the Data Feed to Cerebro
    cerebro.adddata(data)

    # clock the start of the process
    # (time.time() is used because time.clock() was removed in Python 3.8)
    tstart = time.time()

    # Run over everything
    stratruns = cerebro.run()

    # clock the end of the process
    tend = time.time()

    print('==================================================')
    for stratrun in stratruns:
        print('**************************************************')
        for strat in stratrun:
            print('--------------------------------------------------')
            print(strat.p._getkwargs())
    print('==================================================')

    # print out the result
    print('Time used:', str(tend - tstart))


def parse_args():
    parser = argparse.ArgumentParser(
        description='Optimization',
        formatter_class=argparse.RawTextHelpFormatter,
    )

    parser.add_argument(
        '--data', '-d',
        default='../../datas/2006-day-001.txt',
        help='data to add to the system')

    parser.add_argument(
        '--fromdate', '-f',
        default='2006-01-01',
        help='Starting date in YYYY-MM-DD format')

    parser.add_argument(
        '--todate', '-t',
        default='2006-12-31',
        help='Starting date in YYYY-MM-DD format')

    parser.add_argument(
        '--maxcpus', '-m',
        type=int, required=False, default=0,
        help=('Number of CPUs to use in the optimization'
              '\n'
              '  - 0 (default): use all available CPUs\n'
              '  - 1 -> n: use as many as specified\n'))

    parser.add_argument(
        '--no-runonce', action='store_true', required=False,
        help='Run in next mode')

    parser.add_argument(
        '--exactbars', required=False, type=int, default=0,
        help=('Use the specified exactbars still compatible with preload\n'
              '  0 No memory savings\n'
              '  -1 Moderate memory savings\n'
              '  -2 Less moderate memory savings\n'))

    parser.add_argument(
        '--no-optdatas', action='store_true', required=False,
        help='Do not optimize data preloading in optimization')

    parser.add_argument(
        '--no-optreturn', action='store_true', required=False,
        help='Do not optimize the returned values to save time')

    parser.add_argument(
        '--ma_low', type=int,
        default=10, required=False,
        help='SMA range low to optimize')

    parser.add_argument(
        '--ma_high', type=int,
        default=30, required=False,
        help='SMA range high to optimize')

    parser.add_argument(
        '--m1_low', type=int,
        default=12, required=False,
        help='MACD Fast MA range low to optimize')

    parser.add_argument(
        '--m1_high', type=int,
        default=20, required=False,
        help='MACD Fast MA range high to optimize')

    parser.add_argument(
        '--m2_low', type=int,
        default=26, required=False,
        help='MACD Slow MA range low to optimize')

    parser.add_argument(
        '--m2_high', type=int,
        default=30, required=False,
        help='MACD Slow MA range high to optimize')

    parser.add_argument(
        '--m3_low', type=int,
        default=9, required=False,
        help='MACD Signal range low to optimize')

    parser.add_argument(
        '--m3_high', type=int,
        default=15, required=False,
        help='MACD Signal range high to optimize')

    return parser.parse_args()


if __name__ == '__main__':
    runstrat()
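To get a feeling for the workload, the size of the parameter grid for the default command line values can be counted as below. This is a sketch that uses itertools.product for counting only; the order in which combinations are actually visited during the optimization is not assumed here.

```python
import itertools

# Default low/high values from the sample's command line arguments;
# as in the sample, range(low, high) excludes the high value
ranges = [
    ('smaperiod', range(10, 30)),
    ('macdperiod1', range(12, 20)),
    ('macdperiod2', range(26, 30)),
    ('macdperiod3', range(9, 15)),
]

combos = list(itertools.product(*(values for _, values in ranges)))
total = len(combos)

print('combinations to run:', total)
print('smallest values:', min(combos))
print('largest values:', max(combos))
```

The smallest and largest combinations match the first and last parameter sets printed in the single core run above.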