
Loading Data

No data ... no fun! This should make it obvious that the actual starting point is getting a data feed ready, be it for backtesting or for trading.

Many beginners get stuck at this point and ask for help loading a specific format, most of the time simple CSV (Comma Separated Values) files. This is of course due to a lack of programming experience, a lack of Python experience and a lack of understanding of what is being loaded.

The goal of this section is to help bridge the initial problems of loading data, which should make it easier to get started writing a trading strategy.

Loading Data in Backtrader

Note

The default timeframe for loading a data feed is a 1-day bar. In backtrader jargon, this translates to the following named arguments for a data feed:

timeframe=bt.TimeFrame.Days, compression=1

If the data feed to be loaded has any timeframe/compression combination other than 1-day, IT MUST BE EXPLICITLY SPECIFIED.

Typically a data feed is created with code that resembles this snippet:

data = bt.feeds.MyChosenFeed(
    dataname=thefile_or_ticker_or_xxxx,
    param1=value1,
    ...
)

At this point in time, data contains NO actual data. It is only a reference which will be used later inside the system to load the data and use it. This may seem awkward at first, but there is a reason for it:

  • Supporting data sources which produce the data as a stream, like live data sources, or even non-live data sources which produce the data in chunks

People with experience in pandas tend to believe that the code above will have an effect similar to this:

df = pandas.read_csv(....)

And the answer is: No. That reads an entire file and creates a fixed size dataframe with which one can do many things. But such a construct will not support, for example, live streams.
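To make the difference concrete, here is a minimal standard-library sketch (no backtrader involved) contrasting an eager load, similar in spirit to pandas.read_csv, with a lazy object that only keeps a reference and produces rows when iterated. LazyFeed and eager_load are hypothetical names, and for simplicity the sample CSV text itself plays the role of dataname instead of a filename.

```python
import csv
import io

# Sample CSV text standing in for a file on disk
SAMPLE = """Date,Open,High,Low,Close,Volume,OpenInterest
2006-01-02,3578.73,3605.95,3578.73,3604.33,0,0
2006-01-03,3604.08,3638.42,3601.84,3614.34,0,0
"""


def eager_load(text):
    """Read everything at once into a fixed-size list (pandas-like)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    return list(reader)


class LazyFeed:
    """Hypothetical feed object that only stores a reference to its source.

    Nothing is parsed at construction time. Rows are produced one at a
    time when iterated, which is what would allow the same interface to
    sit on top of a stream that keeps growing.
    """

    def __init__(self, dataname):
        self.dataname = dataname  # only the reference is kept
        self.rows_read = 0

    def __iter__(self):
        reader = csv.reader(io.StringIO(self.dataname))
        next(reader)  # skip the header line
        for row in reader:
            self.rows_read += 1
            yield row


feed = LazyFeed(SAMPLE)
print(feed.rows_read)  # 0: creating the feed read nothing
first = next(iter(feed))
print(first[0], feed.rows_read)  # 2006-01-02 1
print(len(eager_load(SAMPLE)))  # 2
```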

Note

Even if the data is not loaded at creation time, as explained above, backtrader will immediately pre-load all the available data once the system is set in motion, if the data source supports it. For a CSV file, like the one in the pandas example above, that will always be the case.

Loading CSV Data Feeds

Let us start with loading data from some CSV files. This should be the easiest for anyone because:

  • The format can be read by a human being
  • Even if it is not a formal standard (and the exact format is not fixed), it is a de-facto standard
  • backtrader offers a customizable GenericCSVData which should be able to load (almost) any CSV file out there

Being a de-facto standard, the format usually has these components, in this order:

  • An initial line containing a header of textual representations, indicating what each column is

  • date (with or without time): this is usually a textual representation of the date string but can be also an integer or float representing a timestamp (like a Unix timestamp counting the seconds elapsed since Jan 1st, 1970)

  • time: if not present in date, a textual representation of the time of day

  • open: a float representing the first price

  • high: a float representing the highest price

  • low: a float representing the lowest price

  • close: a float representing the last price

  • volume: an integer or float representing the volume negotiated

  • openinterest: an integer representing the open positions (for futures)

The order of the price components follows the OHLC convention used to name the prices of a candlestick or bar.

With that in mind, the GenericCSVData feed provided by backtrader expects the following as a default:

  • An initial header line which will be ignored

  • A date field at position 0 with the default format: %Y-%m-%d, something like 2010-01-01

    For other formats, the Python documentation for datetime.strptime (strptime => string parsed (to) time) can be consulted, in particular its "strftime() and strptime() Format Codes" section.

  • The open, high, low, close, volume and openinterest fields follow at positions 1, 2, 3, 4, 5 and 6
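The default layout can be pictured as a mapping from field names to column positions. The following sketch is an illustration only, not backtrader's implementation: DEFAULT_POSITIONS and parse_line are hypothetical names, and the naive split(",") assumes the file has no quoted fields.

```python
from datetime import datetime

# Default column positions as described above (illustrative dict only)
DEFAULT_POSITIONS = {
    "datetime": 0,
    "open": 1,
    "high": 2,
    "low": 3,
    "close": 4,
    "volume": 5,
    "openinterest": 6,
}


def parse_line(line, positions=DEFAULT_POSITIONS, dtformat="%Y-%m-%d"):
    """Split one CSV line and pick each field from its configured column."""
    tokens = line.strip().split(",")
    bar = {"datetime": datetime.strptime(tokens[positions["datetime"]], dtformat)}
    for name, pos in positions.items():
        if name != "datetime":
            bar[name] = float(tokens[pos])  # prices and volume as floats
    return bar


bar = parse_line("2006-01-02,3578.73,3605.95,3578.73,3604.33,0,0")
print(bar["datetime"], bar["close"])  # 2006-01-02 00:00:00 3604.33
```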

Let us see some examples. The surrounding boilerplate code will not be shown, to focus only on loading the data and adding it to the system, so that it is later at the disposal of a trading strategy.

Daily data with an extra OpenInterest field

Date,Open,High,Low,Close,Volume,OpenInterest
2006-01-02,3578.73,3605.95,3578.73,3604.33,0,0
2006-01-03,3604.08,3638.42,3601.84,3614.34,0,0

This can be loaded and added into backtrader with the following code:

data = bt.feeds.GenericCSVData(dataname=thefilename)
cerebro.adddata(data)

This format matches the default configuration of the GenericCSVData feed, and therefore the only thing to provide is the filename, as the named argument dataname (this named argument is common to all data feeds in backtrader).

Daily with OHLC and Volume

Date,Open,High,Low,Close,Volume
2006-01-02,3578.73,3605.95,3578.73,3604.33,0
2006-01-03,3604.08,3638.42,3601.84,3614.34,0

The format simply removes the OpenInterest field.

data = bt.feeds.GenericCSVData(dataname=thefilename, openinterest=-1)
cerebro.adddata(data)

The openinterest=-1 tells the data feed not to look for that field.
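The effect of a negative position can be sketched in plain Python: the field is simply never read from the line, so a 6-column file no longer needs a seventh value. Filling the missing field with NaN is an assumption made in this sketch, and parse_values is a hypothetical helper.

```python
import math

# Positions for the 6-column file above: there is no OpenInterest column,
# so its position is -1 (mirroring openinterest=-1)
POSITIONS = {"open": 1, "high": 2, "low": 3, "close": 4,
             "volume": 5, "openinterest": -1}


def parse_values(line, positions, nullvalue=float("nan")):
    """Pick numeric fields by position; negative positions are skipped."""
    tokens = line.strip().split(",")
    values = {}
    for name, pos in positions.items():
        if pos < 0:
            values[name] = nullvalue  # field absent: tokens are never indexed
        else:
            values[name] = float(tokens[pos])
    return values


vals = parse_values("2006-01-02,3578.73,3605.95,3578.73,3604.33,0", POSITIONS)
print(vals["close"], math.isnan(vals["openinterest"]))  # 3604.33 True
```

With the default position 6 instead of -1, indexing the 6-token line would raise an IndexError, which is why the field has to be switched off explicitly.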

Intraday with single datetime field

Datetime,Open,High,Low,Close,Volume
2006-01-02T09:01:00,3602.00,3603.00,3597.00,3599.00,5699
2006-01-02T09:02:00,3600.00,3601.00,3598.00,3599.00,894

The number of fields with respect to the previous example has not changed, but the format of the timestamp has.

data = bt.feeds.GenericCSVData(
    dataname=thefilename,
    dtformat='%Y-%m-%dT%H:%M:%S',
    openinterest=-1,
    timeframe=bt.TimeFrame.Minutes,
    compression=1,
)
cerebro.adddata(data)

Changes:

  • dtformat is changed to %Y-%m-%dT%H:%M:%S to match the content of the data, which looks like 2006-01-02T09:01:00

  • Addition of the timeframe and compression named arguments to indicate that the data feed is no longer a daily one but a minute one, with 1 minute per bar.
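Before passing a dtformat to the feed, it can be worth checking it against a sample value with datetime.strptime directly, the Standard Library function whose format codes dtformat uses. A quick check, assuming the first timestamp of the sample file:

```python
from datetime import datetime

# dtformat used in the example above
DTFORMAT = "%Y-%m-%dT%H:%M:%S"

dt = datetime.strptime("2006-01-02T09:01:00", DTFORMAT)
print(dt)  # 2006-01-02 09:01:00

# The daily format %Y-%m-%d does not match the full string
try:
    datetime.strptime("2006-01-02T09:01:00", "%Y-%m-%d")
except ValueError as exc:
    print("daily format fails:", exc)
```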

Note

It may be obvious to a human being that the data feed is a 1-minute one just by looking at the first few lines of the CSV file, but the platform cannot know it. Guessing it (without failing in the attempt) would mean scanning the entire file, and it would also have to account for the periods in which the markets are closed (overnight and during weekends) to discard those long gaps in the data.

Intraday with different fields for date and time

Date,Time,Open,High,Low,Close,Volume
2006-01-02,09:01:00,3602.00,3603.00,3597.00,3599.00,5699
2006-01-02,09:02:00,3600.00,3601.00,3598.00,3599.00,894

In this case the extra time field shifts all the other fields, hence the need to be a lot more verbose:

data = bt.feeds.GenericCSVData(
    dataname=thefilename,
    time=1,
    tmformat='%H:%M:%S',
    open=2,
    high=3,
    low=4,
    close=5,
    volume=6,
    openinterest=-1,
    timeframe=bt.TimeFrame.Minutes,
    compression=1,
)
cerebro.adddata(data)

The changes:

  • Specify time=1 to say that the time field is present, at position 1

  • Set tmformat to %H:%M:%S to match the time format in the file

  • Increase the default positions for open, high, low, close and volume by 1, i.e.: an offset of 1 with regard to the default, due to the presence of the time field
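One way to picture what happens with separate date and time columns is to join both fields and both formats, then parse the result in one go. In this sketch the "T" separator and the POSITIONS and parse_datetime names are assumptions for illustration; it is not meant to reproduce backtrader's internals.

```python
from datetime import datetime

# Column layout for the Date,Time,... file above (illustrative dict)
POSITIONS = {"date": 0, "time": 1, "open": 2, "high": 3,
             "low": 4, "close": 5, "volume": 6}


def parse_datetime(tokens, dtformat="%Y-%m-%d", tmformat="%H:%M:%S"):
    """Join the separate date and time columns, then parse them together."""
    text = tokens[POSITIONS["date"]] + "T" + tokens[POSITIONS["time"]]
    return datetime.strptime(text, dtformat + "T" + tmformat)


tokens = "2006-01-02,09:01:00,3602.00,3603.00,3597.00,3599.00,5699".split(",")
print(parse_datetime(tokens), float(tokens[POSITIONS["volume"]]))
# 2006-01-02 09:01:00 5699.0
```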

Intraday with a numeric timestamp

Timestamp,Open,High,Low,Close,Volume
1554897600000,5197.01,5245.0,5195.33,5234.42,1635.59772
1554901200000,5234.42,5246.39,5216.17,5231.85,1342.13614

This is where one needs a clear understanding of what the timestamp actually contains, because unless it is a Unix timestamp (in seconds), it will need some processing.

The GenericCSVData data feed in backtrader features a nice trick that makes it possible to parse a datetime by specifying one of the following for the dtformat named argument:

  • 1: The value is a Unix timestamp of type int representing the number of seconds since Jan 1st, 1970

  • 2: The value is a Unix timestamp of type float representing the number of seconds since Jan 1st, 1970

  • callable: if anything callable in Python (a function, a method, a lambda or a class instance with a __call__ method) is given as dtformat, the callable will be executed with the value to parse and the returned value will be used as the timestamp
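The three options can be pictured as a small dispatch that turns dtformat into a conversion function. make_converter and _from_unix are hypothetical helpers, treating a string as a strptime format is an assumption that matches the earlier examples, and datetime.fromtimestamp(..., tz=timezone.utc) is used instead of datetime.utcfromtimestamp because the latter is deprecated as of Python 3.12.

```python
from datetime import datetime, timezone


def _from_unix(seconds):
    # Naive UTC datetime, the same value datetime.utcfromtimestamp returns
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)


def make_converter(dtformat):
    """Hypothetical helper turning a dtformat value into a parse function."""
    if isinstance(dtformat, str):
        return lambda x: datetime.strptime(x, dtformat)  # format string
    if callable(dtformat):
        return dtformat  # user-supplied conversion, used as-is
    if dtformat == 1:
        return lambda x: _from_unix(int(x))  # integer seconds
    if dtformat == 2:
        return lambda x: _from_unix(float(x))  # float seconds
    raise ValueError(f"unsupported dtformat: {dtformat!r}")


print(make_converter(1)("1554897600"))  # 2019-04-10 12:00:00
```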

In the case above, let us see what happens if we pass dtformat=1 as in:

data = bt.feeds.GenericCSVData(
    dataname=thefilename,
    dtformat=1,
    openinterest=-1,
)
cerebro.adddata(data)

The actual output would have been:

...
    self._dtconvert = lambda x: datetime.utcfromtimestamp(int(x))
OSError: [Errno 22] Invalid argument

Although one could imagine that backtrader is broken, the message is actually telling us that the Standard Library method datetime.utcfromtimestamp cannot cope with the input (for example 1554897600000).

If one calculates the number of seconds elapsed since the Unix Epoch for a present-day date, the result has 10 digits, but the timestamps in the sample above have 13. This is a clear hint that:

  • milliseconds have been added to the timestamp, i.e.: it counts milliseconds rather than seconds.

Note

A Unix timestamp stored as a signed 32-bit counter cannot exceed 2147483647 (reached in January 2038) and therefore cannot have more than 10 digits in decimal format.
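The digit counts are easy to verify with a few lines of standard-library Python, using the first timestamp of the sample file:

```python
from datetime import datetime, timezone

ms_stamp = 1554897600000   # first timestamp from the sample file
max_int32 = 2**31 - 1      # largest signed 32-bit value

print(len(str(ms_stamp)))               # 13
print(max_int32, len(str(max_int32)))   # 2147483647 10

# The largest signed 32-bit value is reached in January 2038
print(datetime.fromtimestamp(max_int32, tz=timezone.utc))
# 2038-01-19 03:14:07+00:00

# Dropping the millisecond part gives back a 10-digit seconds count
print(len(str(ms_stamp // 1000)))       # 10
```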

Because a callable can be used, let us use one that divides the value by 1000, which amounts to removing the last 3 digits:

from datetime import datetime

data = bt.feeds.GenericCSVData(
    dataname=thefilename,
    dtformat=(lambda x: datetime.utcfromtimestamp(int(x) / 1000)),
    timeframe=bt.TimeFrame.Minutes,
    compression=60,
    openinterest=-1,
)
cerebro.adddata(data)
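The conversion performed by the lambda can be checked on its own, outside backtrader. This sketch uses a named function, from_unix_ms (a hypothetical name), and a timezone-aware datetime.fromtimestamp call, which yields the same naive UTC values as datetime.utcfromtimestamp without the deprecation warning that recent Python versions emit for the latter.

```python
from datetime import datetime, timezone


def from_unix_ms(value):
    """Parse a millisecond Unix timestamp (given as text) into naive UTC."""
    seconds = int(value) / 1000  # scale milliseconds down to seconds
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)


for raw in ("1554897600000", "1554901200000"):
    print(from_unix_ms(raw))
# 2019-04-10 12:00:00
# 2019-04-10 13:00:00
```

Since any callable is accepted, a function like this could presumably be passed as dtformat=from_unix_ms in place of the lambda.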

And the output is now:

2019-04-10 12:00:00
2019-04-10 13:00:00

Note

The avid reader has surely noticed the introduction of

    timeframe=bt.TimeFrame.Minutes,
    compression=60,

when loading the data feed with the callable. Had that not been done, the output would have been:

2019-04-10 23:59:59.999989
2019-04-10 23:59:59.999989

i.e.: instead of 60-minute timestamps, they would be daily ones. Remember that the default configuration for any backtrader feed is a 1-day timeframe/compression combination. Because of this, and lacking any further information, the timestamp is pushed to the end of the day. Although this may not seem useful, it actually is when mixing intraday and daily timeframes, because a daily bar must always come after the intraday bars of the same day.

Had it not been known in advance that the timeframe was 60 minutes, one could have experimented by setting timeframe=bt.TimeFrame.Minutes, compression=1, looking at the output and concluding what the timestamp actually meant.
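Instead of trial and error, the bar spacing can also be estimated from the raw timestamps. The sketch below is a heuristic, not something backtrader does (the platform deliberately does not guess the timeframe): bar_spacing_minutes is a hypothetical helper that takes the smallest gap between consecutive millisecond timestamps, so that overnight and weekend gaps do not distort the result.

```python
def bar_spacing_minutes(raw_stamps):
    """Smallest gap between consecutive millisecond timestamps, in minutes.

    Needs at least two timestamps. Taking the minimum rather than the
    average ignores long gaps such as nights and weekends.
    """
    stamps = sorted(int(x) for x in raw_stamps)
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    return min(gaps) / 1000 / 60


print(bar_spacing_minutes(["1554897600000", "1554901200000"]))  # 60.0
```

The result suggests timeframe=bt.TimeFrame.Minutes with compression=60, which would still need to be confirmed against the data source.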

Note

The Standard Library datetime module has no notion of end-of-day as a time. The best that backtrader can do is to push the bar to the closest thing to midnight, without getting into the next day.
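The idea of the "closest thing to midnight" can be illustrated with datetime.time.max. This is only an illustration: the value printed by backtrader above (23:59:59.999989) differs in the last digits from time.max, so the sketch does not reproduce backtrader's exact arithmetic.

```python
from datetime import date, datetime, time

day = date(2019, 4, 10)

# time.max is the latest representable time of day: 23:59:59.999999
end_of_day = datetime.combine(day, time.max)
print(end_of_day)  # 2019-04-10 23:59:59.999999

# Still on the same calendar day, and later than any intraday bar of it
print(end_of_day.date() == day, end_of_day > datetime(2019, 4, 10, 13, 0))
# True True
```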