Escape from OHLC Land

One of the key concepts applied during the conception and development of backtrader was flexibility. The metaprogramming and introspection capabilities of Python were (and still are) the basis to keep many things flexible whilst still being able to deliver.

An old post shows the extension concept.

Extending a datafeed

The basics:

from backtrader.feeds import GenericCSVData

class GenericCSV_PE(GenericCSVData):
    lines = ('pe',)  # Add 'pe' to already defined lines

Done. backtrader defines in the background the most usual lines: OHLC.

If we looked inside the final GenericCSV_PE class, the inherited plus newly defined lines would be:

('close', 'open', 'high', 'low', 'volume', 'openinterest', 'datetime', 'pe',)

This can be checked at any time with the method getlinealiases (applicable to DataFeeds, Indicators, Strategies and Observers).

The mechanism is flexible, and by poking a bit into the internals you could actually get anything, but it has proven not to be enough.

Ticket #60 asks about supporting High Frequency Data, i.e. Bid/Ask data, which implies that the predefined lines hierarchy in the form of OHLC is not enough. The Bid and Ask prices, volumes and number of trades can be made to fit into the existing OHLC fields, but it wouldn’t feel natural. And if one is only concerned with the Bid and Ask prices, too many fields would be left unused.

This called for a solution which has been implemented with Release 1.2.1.88. The idea can be summarized as:

Now it’s not only possible to extend the existing hierarchy, but also to replace the hierarchy with a new one

Only one constraint in place:

There must be a datetime field present (which will hopefully contain meaningful datetime information)

This is so because backtrader needs something for synchronization (multiple datas, multiple timeframes, resampling, replaying) just like Archimedes needed a lever.
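Why a shared datetime matters can be sketched with the standard library alone. The snippet below is not how backtrader synchronizes feeds internally; it is a minimal illustration, with two hypothetical feeds named fast and slow, of the fact that bars from several sources can only be put in order if each one carries a comparable timestamp.

```python
import heapq
from datetime import datetime

# Two hypothetical feeds, each already sorted by time.
# Every bar is a (datetime, feed_name, value) tuple.
fast = [
    (datetime(2010, 2, 3, 16, 53, 50), 'fast', 0.5346),
    (datetime(2010, 2, 3, 16, 53, 51), 'fast', 0.5343),
    (datetime(2010, 2, 3, 16, 53, 52), 'fast', 0.5543),
]
slow = [
    (datetime(2010, 2, 3, 16, 53, 50), 'slow', 1.10),
    (datetime(2010, 2, 3, 16, 53, 52), 'slow', 1.20),
]


def synchronize(*feeds):
    """Yield the bars of all feeds in datetime order."""
    # Only the datetime (element 0) is used as the sort key
    return heapq.merge(*feeds, key=lambda bar: bar[0])


merged = list(synchronize(fast, slow))
for dt, name, value in merged:
    print(dt.isoformat(), name, value)
```

Without the datetime element there would be nothing to merge on, which is the reason the constraint exists.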

Here is how it works:

from backtrader.feeds import GenericCSVData

class GenericCSV_BidAsk(GenericCSVData):
    linesoverride = True
    lines = ('bid', 'ask', 'datetime')  # Replace hierarchy with this one

Done.

Ok, not fully. But only because we are looking at loading the lines from a csv source. The hierarchy has actually already been replaced with the bid, ask, datetime definition thanks to the linesoverride=True setting.
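The difference between extending and replacing can be modelled in a few lines of plain Python. This is a toy model, not backtrader's actual metaclass machinery: the names LinesBase, OHLCFeed, WithPE and BidAsk are hypothetical, and the only behaviors taken from the text are that lines are appended by default, replaced when linesoverride is set, and that datetime must be present.

```python
class LinesBase:
    """Toy base class: merges the `lines` declared by each subclass."""
    lines = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only what this class itself declares, not what it inherits
        declared = tuple(cls.__dict__.get('lines', ()))
        override = cls.__dict__.get('linesoverride', False)

        # Start from the parent's lines unless the hierarchy is replaced
        inherited = () if override else tuple(super(cls, cls).lines)
        merged = inherited + tuple(n for n in declared if n not in inherited)

        if 'datetime' not in merged:
            raise TypeError('a datetime line is required')
        cls.lines = merged


class OHLCFeed(LinesBase):
    lines = ('close', 'open', 'high', 'low',
             'volume', 'openinterest', 'datetime')


class WithPE(OHLCFeed):
    lines = ('pe',)  # extend: appended to the inherited lines


class BidAsk(OHLCFeed):
    linesoverride = True  # replace: inherited lines are discarded
    lines = ('bid', 'ask', 'datetime')


print(WithPE.lines)
print(BidAsk.lines)
```

In this sketch the override flag is read from the class that declares it, so a further subclass of BidAsk would extend the bid, ask, datetime hierarchy rather than replace it again.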

The original GenericCSVData class parses a csv file and needs a hint as to where the fields corresponding to the lines are located. The original definition is:

class GenericCSVData(feed.CSVDataBase):
    params = (
        ('nullvalue', float('NaN')),
        ('dtformat', '%Y-%m-%d %H:%M:%S'),
        ('tmformat', '%H:%M:%S'),

        ('datetime', 0),
        ('time', -1),  # -1 means not present
        ('open', 1),
        ('high', 2),
        ('low', 3),
        ('close', 4),
        ('volume', 5),
        ('openinterest', 6),
    )

The new hierarchy-redefining-class can be completed with a light touch:

from backtrader.feeds import GenericCSVData

class GenericCSV_BidAsk(GenericCSVData):
    linesoverride = True
    lines = ('bid', 'ask', 'datetime')  # Replace hierarchy with this one

    params = (('bid', 1), ('ask', 2))

This indicates that Bid prices are field #1 in the csv stream and Ask prices are field #2. The datetime definition (field #0) is left untouched from the base class.

Crafting a small data file for the occasion helps:

TIMESTAMP,BID,ASK
02/03/2010 16:53:50,0.5346,0.5347
02/03/2010 16:53:51,0.5343,0.5347
02/03/2010 16:53:52,0.5543,0.5545
02/03/2010 16:53:53,0.5342,0.5344
02/03/2010 16:53:54,0.5245,0.5464
02/03/2010 16:53:54,0.5460,0.5470
02/03/2010 16:53:56,0.5824,0.5826
02/03/2010 16:53:57,0.5371,0.5374
02/03/2010 16:53:58,0.5793,0.5794
02/03/2010 16:53:59,0.5684,0.5688
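What the field positions mean can be checked without backtrader at all. The sketch below uses only the standard library to read the first rows of the file above; FIELDS, DTFORMAT and load_rows are hypothetical names for this illustration, and the datetime format is the one the test script uses as its default.

```python
import csv
import io
from datetime import datetime

SAMPLE = """TIMESTAMP,BID,ASK
02/03/2010 16:53:50,0.5346,0.5347
02/03/2010 16:53:51,0.5343,0.5347
02/03/2010 16:53:52,0.5543,0.5545
"""

# Line name -> column position, mirroring
# ('datetime', 0), ('bid', 1), ('ask', 2)
FIELDS = {'datetime': 0, 'bid': 1, 'ask': 2}
DTFORMAT = '%m/%d/%Y %H:%M:%S'


def load_rows(text, fields=FIELDS, dtformat=DTFORMAT):
    """Parse csv text into a list of dicts keyed by line name."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = []
    for record in reader:
        row = {}
        for name, pos in fields.items():
            raw = record[pos]
            if name == 'datetime':
                row[name] = datetime.strptime(raw, dtformat)
            else:
                row[name] = float(raw)
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
for i, row in enumerate(rows, 1):
    print('%4d: %s - Bid %.4f - %.4f Ask' % (
        i, row['datetime'].isoformat(), row['bid'], row['ask']))
```

Swapping the positions in FIELDS is all it would take to read a file whose columns come in a different order, which is the role the params play in the data feed class.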

Add a small test script to the equation (with some extra content for those who go directly to the samples in the sources; see the full code at the end):

$ ./bidask.py

And the output speaks for itself:

 1: 2010-02-03T16:53:50 - Bid 0.5346 - 0.5347 Ask
 2: 2010-02-03T16:53:51 - Bid 0.5343 - 0.5347 Ask
 3: 2010-02-03T16:53:52 - Bid 0.5543 - 0.5545 Ask
 4: 2010-02-03T16:53:53 - Bid 0.5342 - 0.5344 Ask
 5: 2010-02-03T16:53:54 - Bid 0.5245 - 0.5464 Ask
 6: 2010-02-03T16:53:54 - Bid 0.5460 - 0.5470 Ask
 7: 2010-02-03T16:53:56 - Bid 0.5824 - 0.5826 Ask
 8: 2010-02-03T16:53:57 - Bid 0.5371 - 0.5374 Ask
 9: 2010-02-03T16:53:58 - Bid 0.5793 - 0.5794 Ask
10: 2010-02-03T16:53:59 - Bid 0.5684 - 0.5688 Ask

Et voilà! The Bid/Ask prices have been properly read, parsed and interpreted, and the strategy has been able to access the .bid and .ask lines in the data feed through self.data.

Redefining the lines hierarchy raises a broad question, though: how to use the already predefined Indicators.

Example: the Stochastic is an indicator which relies on close, high and low prices to calculate its output

Even if we thought about Bid as the close (because it is the first line) there is only one other price element (Ask) and not two more. And conceptually Ask has nothing to do with high and low.

It is probable that someone working with these fields and operating (or researching) in the High Frequency Trading domain is not concerned with Stochastic as an indicator of choice.

Other indicators like the moving average are perfectly fine. They assume nothing about what the fields mean or imply and will happily take anything. As such one can do:
```
mysma = backtrader.indicators.SMA(self.data.bid, period=5)
```
And a moving average of the last 5 bid prices will be delivered.

The test script already supports adding a SMA. Let’s execute:

$ ./bidask.py --sma --period=3

The output:

 3: 2010-02-03T16:53:52 - Bid 0.5543 - 0.5545 Ask - SMA: 0.5411
 4: 2010-02-03T16:53:53 - Bid 0.5342 - 0.5344 Ask - SMA: 0.5409
 5: 2010-02-03T16:53:54 - Bid 0.5245 - 0.5464 Ask - SMA: 0.5377
 6: 2010-02-03T16:53:54 - Bid 0.5460 - 0.5470 Ask - SMA: 0.5349
 7: 2010-02-03T16:53:56 - Bid 0.5824 - 0.5826 Ask - SMA: 0.5510
 8: 2010-02-03T16:53:57 - Bid 0.5371 - 0.5374 Ask - SMA: 0.5552
 9: 2010-02-03T16:53:58 - Bid 0.5793 - 0.5794 Ask - SMA: 0.5663
10: 2010-02-03T16:53:59 - Bid 0.5684 - 0.5688 Ask - SMA: 0.5616
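The figures can be reproduced by hand. The following sketch recomputes a 3-period simple moving average over the bid column of the sample file in plain Python; the function sma here is a hypothetical helper, not the backtrader indicator. It also shows why the output starts at bar 3: no average exists until three values are available.

```python
# Bid prices from the sample csv file, in order
bids = [0.5346, 0.5343, 0.5543, 0.5342, 0.5245,
        0.5460, 0.5824, 0.5371, 0.5793, 0.5684]


def sma(values, period):
    """Return (bar_number, average) pairs once `period` values exist."""
    out = []
    for end in range(period, len(values) + 1):
        window = values[end - period:end]  # the last `period` values
        out.append((end, sum(window) / period))
    return out


result = sma(bids, 3)
for bar, value in result:
    print('%2d: SMA: %.4f' % (bar, value))
```

The eight averages match the SMA column above, which confirms that the indicator in the test script is operating on the bid line.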

Note

Plotting still relies on open, high, low, close and volume being present in the data feed.

Some cases can be covered directly by plotting with a Line on Close and taking just the 1st defined line in the object, but a sound model has yet to be developed for an upcoming version of backtrader.

The test script usage:

$ ./bidask.py --help
usage: bidask.py [-h] [--data DATA] [--dtformat DTFORMAT] [--sma]
                 [--period PERIOD]

Bid/Ask Line Hierarchy

optional arguments:
  -h, --help            show this help message and exit
  --data DATA, -d DATA  data to add to the system (default:
                        ../../datas/bidask.csv)
  --dtformat DTFORMAT, -dt DTFORMAT
                        Format of datetime in input (default: %m/%d/%Y
                        %H:%M:%S)
  --sma, -s             Add an SMA to the mix (default: False)
  --period PERIOD, -p PERIOD
                        Period for the sma (default: 5)

And the test script itself (included in the backtrader sources)

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import argparse

import backtrader as bt
import backtrader.feeds as btfeeds
import backtrader.indicators as btind


class BidAskCSV(btfeeds.GenericCSVData):
    linesoverride = True  # discard usual OHLC structure
    # datetime must be present and last
    lines = ('bid', 'ask', 'datetime')
    # datetime (always 1st) and then the desired order for
    params = (
        # (datetime, 0), # inherited from parent class
        ('bid', 1),  # default field pos 1
        ('ask', 2),  # default field pos 2
    )


class St(bt.Strategy):
    params = (('sma', False), ('period', 3))

    def __init__(self):
        if self.p.sma:
            # self.data resolves to its first defined line, 'bid' here
            self.sma = btind.SMA(self.data, period=self.p.period)

    def next(self):
        dtstr = self.data.datetime.datetime().isoformat()
        txt = '%4d: %s - Bid %.4f - %.4f Ask' % (
            (len(self), dtstr, self.data.bid[0], self.data.ask[0]))

        if self.p.sma:
            txt += ' - SMA: %.4f' % self.sma[0]
        print(txt)


def parse_args():
    parser = argparse.ArgumentParser(
        description='Bid/Ask Line Hierarchy',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    parser.add_argument('--data', '-d', action='store',
                        required=False, default='../../datas/bidask.csv',
                        help='data to add to the system')

    parser.add_argument('--dtformat', '-dt',
                        required=False, default='%m/%d/%Y %H:%M:%S',
                        help='Format of datetime in input')

    parser.add_argument('--sma', '-s', action='store_true',
                        required=False,
                        help='Add an SMA to the mix')

    parser.add_argument('--period', '-p', action='store',
                        required=False, default=5, type=int,
                        help='Period for the sma')

    return parser.parse_args()


def runstrategy():
    args = parse_args()

    cerebro = bt.Cerebro()  # Create a cerebro

    data = BidAskCSV(dataname=args.data, dtformat=args.dtformat)
    cerebro.adddata(data)  # Add the 1st data to cerebro
    # Add the strategy to cerebro
    cerebro.addstrategy(St, sma=args.sma, period=args.period)
    cerebro.run()


if __name__ == '__main__':
    runstrategy()