Skip to content

Pandas DataFeed Support

Amongst some minor enhancementss and some OrderedDict tweaks for better Python 2.6 support, the latest release from backtrader adds support for analyzing data from a Pandas Dataframe or Time Series.

Note

Obviously pandas and its dependencies have to be installed

This seems to be of concern to lots of people, who rely on the already available parsing code for different data sources (including CSV)

class PandasData(feed.DataBase):
    '''
    The ``dataname`` parameter inherited from ``feed.DataBase``  is the pandas
    Time Series
    '''

    params = (
        # Possible values for datetime (must always be present)
        #  None : datetime is the "index" in the Pandas Dataframe
        #  -1 : autodetect position or case-wise equal name
        #  >= 0 : numeric index to the colum in the pandas dataframe
        #  string : column name (as index) in the pandas dataframe
        ('datetime', None),

        # Possible values below:
        #  None : column not present
        #  -1 : autodetect position or case-wise equal name
        #  >= 0 : numeric index to the colum in the pandas dataframe
        #  string : column name (as index) in the pandas dataframe
        ('open', -1),
        ('high', -1),
        ('low', -1),
        ('close', -1),
        ('volume', -1),
        ('openinterest', -1),
    )

The above excerpt from the PandasData class shows the keys:

  • The dataname parameter to the class during instantiation holds the Pandas Dataframe

    This parameter is inherited from the base class feed.DataBase

  • The new parameters have the names of the regular fields in the DataSeries and follow these conventions

    • datetime (default: None)

    • None : datetime is the “index” in the Pandas Dataframe

    • -1 : autodetect position or case-wise equal name

    • = 0 : numeric index to the colum in the pandas dataframe

    • string : column name (as index) in the pandas dataframe

    • open, high, low, high, close, volume, openinterest (default: -1 for all of them)

    • None : column not present

    • -1 : autodetect position or case-wise equal name

    • = 0 : numeric index to the colum in the pandas dataframe

    • string : column name (as index) in the pandas dataframe

A small sample should be able to load the standar 2006 sample, having been parsed by Pandas, rather than directly by backtrader

Running the sample to use the exiting “headers” in the CSV data:

$ ./panda-test.py
--------------------------------------------------
               Open     High      Low    Close  Volume  OpenInterest
Date
2006-01-02  3578.73  3605.95  3578.73  3604.33       0             0
2006-01-03  3604.08  3638.42  3601.84  3614.34       0             0
2006-01-04  3615.23  3652.46  3615.23  3652.46       0             0

The same but telling the script to skip the headers:

$ ./panda-test.py --noheaders
--------------------------------------------------
                  1        2        3        4  5  6
0
2006-01-02  3578.73  3605.95  3578.73  3604.33  0  0
2006-01-03  3604.08  3638.42  3601.84  3614.34  0  0
2006-01-04  3615.23  3652.46  3615.23  3652.46  0  0

The 2nd run is using tells pandas.read_csv:

  • To skip the first input row (skiprows keyword argument set to 1)

  • Not to look for a headers row (header keyword argument set to None)

The backtrader support for Pandas tries to automatically detect if column names have been used or else numeric indices and acts accordingly, trying to offer a best match.

The following chart is the tribute to success. The Pandas Dataframe has been correctly loaded (in both cases)

image

The sample code for the test.

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import argparse

import backtrader as bt
import backtrader.feeds as btfeeds

import pandas


def runstrat():
    args = parse_args()

    # Create a cerebro entity
    cerebro = bt.Cerebro(stdstats=False)

    # Add a strategy
    cerebro.addstrategy(bt.Strategy)

    # Get a pandas dataframe
    datapath = ('../datas/sample/2006-day-001.txt')

    # Simulate the header row isn't there if noheaders requested
    skiprows = 1 if args.noheaders else 0
    header = None if args.noheaders else 0

    dataframe = pandas.read_csv(datapath,
                                skiprows=skiprows,
                                header=header,
                                parse_dates=True,
                                index_col=0)

    if not args.noprint:
        print('--------------------------------------------------')
        print(dataframe)
        print('--------------------------------------------------')

    # Pass it to the backtrader datafeed and add it to the cerebro
    data = bt.feeds.PandasData(dataname=dataframe)

    cerebro.adddata(data)

    # Run over everything
    cerebro.run()

    # Plot the result
    cerebro.plot(style='bar')


def parse_args():
    parser = argparse.ArgumentParser(
        description='Pandas test script')

    parser.add_argument('--noheaders', action='store_true', default=False,
                        required=False,
                        help='Do not use header rows')

    parser.add_argument('--noprint', action='store_true', default=False,
                        help='Print the dataframe')

    return parser.parse_args()


if __name__ == '__main__':
    runstrat()