Rebalancing with the Conservative Formula

The Conservative Formula approach is presented in this paper: The Conservative Formula in Python: Quantitative Investing made Easy

It is one of many possible rebalancing approaches, but one that is easy to grasp. A summary of the approach:

x stocks are selected from a universe of Y (100 of 1000)
The selection criteria are
- Low volatility
- High Net Payout Yield
- High Momentum
- Rebalancing every month

With this in mind let's go and present a possible implementation in backtrader

The data

Even if one has a winning strategy, nothing will actually be won if no data is available for the strategy. This means one has to consider what the data looks like and how to load it.

A set of CSV ("comma-separated values") files is assumed to be available, with the following features

ohlcv monthly data
With an extra field after the v containing the Net Payout Yield (npy), to have an ohlcvn data set.

The format of the CSV data will therefore look like this

date, open, high, low, close, volume, npy
2001-12-31, 1.0, 1.0, 1.0, 1.0, 0.5, 3.0
2002-01-31, 2.0, 2.5, 1.1, 1.2, 3.0, 5.0
...

I.e.: one row per month. The data loader engine can now be prepared, for which a simple extension of the generic built-in CSV loader delivered with backtrader will be created.

class NetPayOutData(bt.feeds.GenericCSVData):
    lines = ('npy',)  # add a line containing the net payout yield
    params = dict(
        npy=6,  # npy field is at column index 6 (0-based), i.e. the 7th column
        dtformat='%Y-%m-%d',  # fixed date format: yyyy-mm-dd
        timeframe=bt.TimeFrame.Months,  # fixed timeframe: monthly data
        openinterest=-1,  # -1 indicates there is no openinterest field
    )

And that is it. Notice how easy it has been to add a point of fundamental data to the ohlcv data stream.

By using the expression lines=('npy',). The other usual fields (open, high, ...) are already part of GenericCSVData
By indicating the loading position with params = dict(npy=6). The other fields have a predefined position.

The timeframe has also been updated in the parameters to reflect the monthly nature of the data.

Note

See Docs - Data Feeds Reference - GenericCSVData for the actual fields and loading positions (which can all be customized)

The data loader will have to be properly instantiated with a file name, but that's something for later, when a standard boilerplate is presented below to have a complete script.
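
To experiment with the loader before real data is at hand, a file in the ohlcvn layout can be produced with the standard library alone. The following is only a sketch: the helper names (make_ohlcvn_csv, npy_column) and the sample values are made up for illustration and are not part of backtrader.

```python
import csv
import io

HEADER = ['date', 'open', 'high', 'low', 'close', 'volume', 'npy']


def make_ohlcvn_csv(rows):
    """Return CSV text with one row per month in the ohlcvn layout."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


def npy_column(text, index=6):
    """Read back the field found at the given 0-based column index."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [float(row[index]) for row in reader]


sample = make_ohlcvn_csv([
    ('2001-12-31', 1.0, 1.0, 1.0, 1.0, 0.5, 3.0),
    ('2002-01-31', 2.0, 2.5, 1.1, 1.2, 3.0, 5.0),
])
npy_values = npy_column(sample)  # the values stored in the npy column
```

Index 6 is the seventh field of each row, which is the position the npy=6 parameter points the loader at.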

The Strategy

Let's put the logic into a standard backtrader strategy. To make it as generic and customizable as possible, the same params approach will be used as before with the data.

Before delving into the strategy, let's consider one of the points from the quick summary

x stocks are selected from a universe of Y

The strategy itself is not in charge of adding stocks to the universe, but it is in charge of the selection. One could be in a situation in which only 50 stocks have been added and still try to select 100 if x and Y are fixed in the code. To cope with such situations, the following will be done:

Have a selcperc parameter with a value of 0.10 (i.e.: 10%), to indicate the fraction of stocks to be selected from the universe.

This means that if 1000 are present, only 100 will be selected and if the universe consists of 50 stocks, only 5 will be selected.
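
The arithmetic can be checked with a few lines of plain Python. This is a sketch only, and selection_count is a hypothetical helper name that mirrors the int(...) truncation used later in the strategy.

```python
def selection_count(universe_size, selcperc=0.10):
    """Number of stocks to select; int() truncates towards zero."""
    return int(universe_size * selcperc)


# universe size -> number of stocks selected
counts = {n: selection_count(n) for n in (1000, 50, 9)}
```

Because of the truncation, a universe of fewer than 10 stocks selects nothing with the default of 0.10, and any later division by that count would fail. A larger selcperc or a bigger universe avoids the problem.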

As for the formula ranking the stock, it looks like this:

(momentum * net payout) / volatility

Which means that those with higher momentum, higher payout and lower volatility will have a higher score.

For momentum the RateOfChange indicator (aka ROC) will be used, which measures the ratio of change in prices over a period.

The net payout is already part of the data feed.

To calculate the volatility, the StandardDeviation of the n-periods return of the stock (n-periods, because things will be kept as parameters) will be used.
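
To make the three ingredients concrete, here is a sketch in plain Python, without backtrader. The function names and the sample prices are invented for illustration, and the population standard deviation (statistics.pstdev) is an assumption about how the volatility is measured.

```python
import statistics


def pct_changes(closes, period=1):
    """Returns over `period` bars: close / earlier close - 1."""
    return [closes[i] / closes[i - period] - 1.0
            for i in range(period, len(closes))]


def rate_of_change(closes, period):
    """Momentum: relative change of the last close versus `period` bars ago."""
    past = closes[-1 - period]
    return (closes[-1] - past) / past


def conservative_score(closes, npy, rperiod=1, vperiod=3, mperiod=3):
    """(momentum * net payout) / volatility for the most recent bar."""
    returns = pct_changes(closes, rperiod)
    volatility = statistics.pstdev(returns[-vperiod:])
    momentum = rate_of_change(closes, mperiod)
    return momentum * npy / volatility  # fails if volatility is exactly 0


# Two made-up price histories with the same payout yield
score_steady = conservative_score([100, 102, 105, 107, 110], npy=3.0)
score_choppy = conservative_score([100, 90, 120, 95, 110], npy=3.0)
```

The steady series ends up with the higher score even though its momentum is smaller, because its returns vary far less. A stock with perfectly constant returns would have zero volatility and break the division, which a real implementation would have to guard against.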

With this information, the strategy can already be initialized with the right parameters and the setup of the indicators and calculations which will be later used in each monthly iteration.

First the declaration and the parameters

class St(bt.Strategy):
    params = dict(
        selcperc=0.10,  # percentage of stocks to select from the universe
        rperiod=1,  # period for the returns calculation, default 1 period
        vperiod=36,  # lookback period for volatility - default 36 periods
        mperiod=12,  # lookback period for momentum - default 12 periods
        reserve=0.05  # 5% reserve capital
    )

Notice that something not mentioned above has been added: a parameter reserve=0.05 (i.e. 5%), which is used to calculate the percentage allocation per stock while keeping reserve capital in the bank. Although for a simulation one could conceivably want to use 100% of the capital, doing so can hit the usual problems, such as price gaps and floating point precision, and end up missing some of the market entries.

Before anything else, a small logging method is created, which will allow logging how the portfolio is rebalanced.

    def log(self, arg):
        print('{} {}'.format(self.datetime.date(), arg))

At the beginning of the __init__ method, the number of stocks to rank is calculated and the reserve capital parameter is applied to determine the per stock percentage of the bank.

    def __init__(self):
        # calculate 1st the amount of stocks that will be selected
        self.selnum = int(len(self.datas) * self.p.selcperc)

        # allocation perc per stock
        # reserve kept to make sure orders are not rejected due to
        # margin. Prices are calculated when known (close), but orders can only
        # be executed next day (opening price). Price can gap upwards
        self.perctarget = (1.0 - self.p.reserve) / self.selnum
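
As a quick check of the allocation arithmetic, the following plain Python sketch spreads the non-reserved capital evenly over the selected stocks. The helper name per_stock_target and the explicit guard are additions for illustration and are not part of the strategy.

```python
def per_stock_target(selnum, reserve=0.05):
    """Fraction of the portfolio value allocated to each selected stock."""
    if selnum <= 0:
        raise ValueError('at least one stock must be selected')
    return (1.0 - reserve) / selnum


target = per_stock_target(100)  # share of the portfolio per stock
invested = target * 100         # total share invested across 100 stocks
```

With 100 stocks and a 5% reserve, each stock is assigned a bit less than 1% of the portfolio value and about 95% is invested in total.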

And finally the initialization is over with the calculation of the per stock indicators for volatility and momentum, which are then applied in the per stock ranking formula calculation.

        # returns, volatilities and momentums
        rs = [bt.ind.PctChange(d, period=self.p.rperiod) for d in self.datas]
        vs = [bt.ind.StdDev(ret, period=self.p.vperiod) for ret in rs]
        ms = [bt.ind.ROC(d, period=self.p.mperiod) for d in self.datas]

        # simple rank formula: (momentum * net payout) / volatility
        # the highest ranked: low vol, large momentum, large payout
        self.ranks = {d: d.npy * m / v for d, v, m in zip(self.datas, vs, ms)}

It's now time to iterate each month. The ranking is available in the self.ranks dictionary. The key/value pairs have to be sorted for each iteration, to get which items have to go and which ones have to be part of the portfolio (remain or be added)

    def next(self):
        # sort data and current rank
        ranks = sorted(
            self.ranks.items(),  # get the (d, rank), pair
            key=lambda x: x[1][0],  # use rank (elem 1) and current time "0"
            reverse=True,  # highest ranked 1st ... please
        )

The iterable is sorted in reverse order, because the ranking formula delivers higher scores for the highest ranked stocks.
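
The same sorting pattern on plain floats, as a sketch: the names and scores are invented, and plain numbers stand in for the per-stock rank objects, so item[1] is used directly instead of x[1][0].

```python
scores = {'AAA': 1.7, 'BBB': 4.2, 'CCC': -0.3, 'DDD': 2.9}

# (name, score) pairs, highest score first
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

selnum = 2
top = dict(ranked[:selnum])     # the selected stocks
bottom = dict(ranked[selnum:])  # everything else
```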

Rebalancing is now due.

Rebalancing 1: Get Top Ranked and the stocks with open positions

        # put top ranked in dict with data as key to test for presence
        rtop = dict(ranks[:self.selnum])

        # For logging purposes of stocks leaving the portfolio
        rbot = dict(ranks[self.selnum:])

A bit of Python trickery is happening here, because a dict is being used. The reason is that if the top ranked stocks were put in a list the operator == would be used internally by Python to check for presence with the operator in. And although improbable it would be possible for two stocks to have the same value on the same day. When using a dict a hash value is used when checking for presence of an item as part of the keys.

Note: For logging purposes rbot (ranked bottom) is also created with the stocks not present in rtop.
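
The difference between the two membership tests can be reproduced with a small stand-in class. This is a sketch under assumptions: Stock is hypothetical, it compares equal by current value as described above, and it keeps the default identity-based hash.

```python
class Stock:
    """Hypothetical stand-in for a data feed that compares by current value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    # defining __eq__ removes the default hash, so restore the identity hash
    __hash__ = object.__hash__


held = Stock('AAA', 10.0)
other = Stock('BBB', 10.0)  # a different stock with the same value today

in_list = other in [held]       # list membership falls back to ==
in_dict = other in {held: 1.0}  # dict membership checks the hash first
```

The list reports the second stock as present although it was never added, while the dict does not.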

To later discriminate between stocks that have to leave the portfolio, those which simply have to be rebalanced and the newly top ranked, a current list of stocks in the portfolio is prepared.

        # prepare quick lookup list of stocks currently holding a position
        posdata = [d for d, pos in self.getpositions().items() if pos]

Rebalancing 2: Sell those no longer top ranked

Just like in the real world, in the backtrader ecosystem selling before buying is a must to ensure enough cash is there.

        # remove those no longer top ranked
        # do this first to issue sell orders and free cash
        for d in (d for d in posdata if d not in rtop):
            self.log('Leave {} - Rank {:.2f}'.format(d._name, rbot[d][0]))
            self.order_target_percent(d, target=0.0)

Stocks currently with an open position and no longer top ranked are sold (i.e. target=0.0).

Note

A simple self.close(data) would have sufficed here, rather than explicitly stating the target percentage.

Rebalancing 3: Issue a target order for all top ranked stocks

The total portfolio value changes over time and those stocks already in the portfolio may have to slightly increase/reduce the current position to match the expected percentage. order_target_percent is an ideal method to enter the market, because it does automatically calculate whether a buy or a sell order is needed.

        # rebalance those already top ranked and still there
        for d in (d for d in posdata if d in rtop):
            self.log('Rebal {} - Rank {:.2f}'.format(d._name, rtop[d][0]))
            self.order_target_percent(d, target=self.perctarget)
            del rtop[d]  # remove it, to simplify next iteration
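
The idea behind a target order can be sketched in plain Python. This is a simplified model and not backtrader's code: target_order is a hypothetical helper, only whole shares are traded and commissions are ignored.

```python
def target_order(portfolio_value, price, current_size, target):
    """Return (side, size) needed to move a position to the target share."""
    target_value = portfolio_value * target
    current_value = current_size * price
    delta = target_value - current_value
    size = int(abs(delta) // price)  # whole shares only
    if size == 0:
        return None, 0  # already close enough to the target
    return ('buy' if delta > 0 else 'sell'), size


trim = target_order(100000.0, 50.0, current_size=30, target=0.01)
enter = target_order(100000.0, 50.0, current_size=0, target=0.01)
leave = target_order(100000.0, 50.0, current_size=30, target=0.0)
```

The caller only states the desired percentage, and the side and size of the order follow from the current position.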

Rebalancing the stocks that already have a position is done before adding the new ones to the portfolio, as the new ones will only issue buy orders and consume cash. Having removed the existing stocks from rtop with del rtop[d] after rebalancing them, the remaining stocks in rtop are those which will be newly added to the portfolio.

        # issue a target order for the newly top ranked stocks
        # do this last, as this will generate buy orders consuming cash
        for d in rtop:
            self.log('Enter {} - Rank {:.2f}'.format(d._name, rtop[d][0]))
            self.order_target_percent(d, target=self.perctarget)
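
The three rebalancing steps amount to splitting the universe into three groups. The sketch below shows that split on plain names. plan_rebalance and the sample tickers are hypothetical, and no orders are issued.

```python
def plan_rebalance(ranked, selnum, held):
    """Split names into (leave, rebalance, enter).

    ranked: names sorted from highest to lowest score
    held:   names currently holding a position
    """
    top = set(ranked[:selnum])
    leave = [n for n in held if n not in top]      # sold first, frees cash
    rebalance = [n for n in held if n in top]      # adjusted to the target
    enter = [n for n in ranked[:selnum] if n not in held]  # bought last
    return leave, rebalance, enter


leave, rebalance, enter = plan_rebalance(
    ranked=['BBB', 'DDD', 'AAA', 'CCC'], selnum=2, held=['AAA', 'BBB'])
```

Processing the groups in this order means cash from the leaving stocks becomes available before the newly entering ones consume it.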

Running it all and Evaluating it!

Having a data loader class and the strategy is not enough. Just like with any other framework, some boilerplate is needed. The following code makes it possible.

def run(args=None):
    args = parse_args(args)

    cerebro = bt.Cerebro()

    # Data feed kwargs
    dkwargs = dict(**eval('dict(' + args.dargs + ')'))

    # Parse from/to-date
    dtfmt, tmfmt = '%Y-%m-%d', 'T%H:%M:%S'
    if args.fromdate:
        fmt = dtfmt + tmfmt * ('T' in args.fromdate)
        dkwargs['fromdate'] = datetime.datetime.strptime(args.fromdate, fmt)

    if args.todate:
        fmt = dtfmt + tmfmt * ('T' in args.todate)
        dkwargs['todate'] = datetime.datetime.strptime(args.todate, fmt)

    # add all the data files available in the directory datadir
    for fname in glob.glob(os.path.join(args.datadir, '*')):
        data = NetPayOutData(dataname=fname, **dkwargs)
        cerebro.adddata(data)

    # add strategy
    cerebro.addstrategy(St, **eval('dict(' + args.strat + ')'))

    # set the cash
    cerebro.broker.setcash(args.cash)

    cerebro.run()  # execute it all

    # Basic performance evaluation ... final value ... minus starting cash
    pnl = cerebro.broker.get_value() - args.cash
    print('Profit ... or Loss: {:.2f}'.format(pnl))

Where the following is done:

Parsing the arguments and making them available (this is obviously optional, as everything can be hardcoded, but good practices are good practices)
Creating a cerebro engine instance. Yes, this is Spanish for "brain" and is the part of the framework in charge of coordinating the orchestral maneuvers in the dark. Although it can accept several options, the defaults should suffice for most use cases.
Loading the data files: a simple directory scan of args.datadir is done, and all files are loaded with NetPayOutData and added to the cerebro instance
Adding the strategy
Setting the cash, which defaults to 1,000,000. Given that the use case is for 100 stocks in a universe of 1000, it seems fair to have some cash to spare. It is also an argument which can be changed.
And calling cerebro.run()
Finally the performance is evaluated
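
The date handling and the k1=v1,k2=v2 arguments in run rely on two small tricks that can be tried in isolation. The sketch below uses hypothetical helper names (parse_dt, parse_kwargs) and example values. Note that eval executes whatever it is given and should only see trusted input.

```python
import datetime

DTFMT, TMFMT = '%Y-%m-%d', 'T%H:%M:%S'


def parse_dt(text):
    """Parse YYYY-MM-DD with an optional THH:MM:SS part."""
    # ('T' in text) is a bool: str * True keeps the string, str * False is ''
    fmt = DTFMT + TMFMT * ('T' in text)
    return datetime.datetime.strptime(text, fmt)


def parse_kwargs(text):
    """Turn 'k1=v1,k2=v2' into a dict by evaluating a dict(...) call."""
    return eval('dict(' + text + ')')  # trusted input only


day = parse_dt('2005-01-31')
moment = parse_dt('2005-01-31T09:30:00')
strat_kwargs = parse_kwargs('selcperc=0.20,vperiod=24')
no_kwargs = parse_kwargs('')  # the default empty string gives an empty dict
```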

To make it possible to run things with different parameters straight from the command line, an argparse enabled boilerplate is presented below, with the entire code

Performance Evaluation

A naive performance evaluation is added in the form of the final resulting value, i.e.: the final net asset value minus the starting cash.

The backtrader ecosystem offers a set of built-in performance analyzers which could also be used, like: SharpeRatio, Variability-Weighted Return, SQN and others. See Docs - Analyzers Reference

The complete script

And finally the bulk of the work presented as a whole. Enjoy!

import argparse
import datetime
import glob
import os.path

import backtrader as bt


class NetPayOutData(bt.feeds.GenericCSVData):
    lines = ('npy',)  # add a line containing the net payout yield
    params = dict(
        npy=6,  # npy field is at column index 6 (0-based), i.e. the 7th column
        dtformat='%Y-%m-%d',  # fixed date format: yyyy-mm-dd
        timeframe=bt.TimeFrame.Months,  # fixed timeframe: monthly data
        openinterest=-1,  # -1 indicates there is no openinterest field
    )


class St(bt.Strategy):
    params = dict(
        selcperc=0.10,  # percentage of stocks to select from the universe
        rperiod=1,  # period for the returns calculation, default 1 period
        vperiod=36,  # lookback period for volatility - default 36 periods
        mperiod=12,  # lookback period for momentum - default 12 periods
        reserve=0.05  # 5% reserve capital
    )

    def log(self, arg):
        print('{} {}'.format(self.datetime.date(), arg))

    def __init__(self):
        # calculate 1st the amount of stocks that will be selected
        self.selnum = int(len(self.datas) * self.p.selcperc)

        # allocation perc per stock
        # reserve kept to make sure orders are not rejected due to
        # margin. Prices are calculated when known (close), but orders can only
        # be executed next day (opening price). Price can gap upwards
        self.perctarget = (1.0 - self.p.reserve) / self.selnum

        # returns, volatilities and momentums
        rs = [bt.ind.PctChange(d, period=self.p.rperiod) for d in self.datas]
        vs = [bt.ind.StdDev(ret, period=self.p.vperiod) for ret in rs]
        ms = [bt.ind.ROC(d, period=self.p.mperiod) for d in self.datas]

        # simple rank formula: (momentum * net payout) / volatility
        # the highest ranked: low vol, large momentum, large payout
        self.ranks = {d: d.npy * m / v for d, v, m in zip(self.datas, vs, ms)}

    def next(self):
        # sort data and current rank
        ranks = sorted(
            self.ranks.items(),  # get the (d, rank), pair
            key=lambda x: x[1][0],  # use rank (elem 1) and current time "0"
            reverse=True,  # highest ranked 1st ... please
        )

        # put top ranked in dict with data as key to test for presence
        rtop = dict(ranks[:self.selnum])

        # For logging purposes of stocks leaving the portfolio
        rbot = dict(ranks[self.selnum:])

        # prepare quick lookup list of stocks currently holding a position
        posdata = [d for d, pos in self.getpositions().items() if pos]

        # remove those no longer top ranked
        # do this first to issue sell orders and free cash
        for d in (d for d in posdata if d not in rtop):
            self.log('Leave {} - Rank {:.2f}'.format(d._name, rbot[d][0]))
            self.order_target_percent(d, target=0.0)

        # rebalance those already top ranked and still there
        for d in (d for d in posdata if d in rtop):
            self.log('Rebal {} - Rank {:.2f}'.format(d._name, rtop[d][0]))
            self.order_target_percent(d, target=self.perctarget)
            del rtop[d]  # remove it, to simplify next iteration

        # issue a target order for the newly top ranked stocks
        # do this last, as this will generate buy orders consuming cash
        for d in rtop:
            self.log('Enter {} - Rank {:.2f}'.format(d._name, rtop[d][0]))
            self.order_target_percent(d, target=self.perctarget)


def run(args=None):
    args = parse_args(args)

    cerebro = bt.Cerebro()

    # Data feed kwargs
    dkwargs = dict(**eval('dict(' + args.dargs + ')'))

    # Parse from/to-date
    dtfmt, tmfmt = '%Y-%m-%d', 'T%H:%M:%S'
    if args.fromdate:
        fmt = dtfmt + tmfmt * ('T' in args.fromdate)
        dkwargs['fromdate'] = datetime.datetime.strptime(args.fromdate, fmt)

    if args.todate:
        fmt = dtfmt + tmfmt * ('T' in args.todate)
        dkwargs['todate'] = datetime.datetime.strptime(args.todate, fmt)

    # add all the data files available in the directory datadir
    for fname in glob.glob(os.path.join(args.datadir, '*')):
        data = NetPayOutData(dataname=fname, **dkwargs)
        cerebro.adddata(data)

    # add strategy
    cerebro.addstrategy(St, **eval('dict(' + args.strat + ')'))

    # set the cash
    cerebro.broker.setcash(args.cash)

    cerebro.run()  # execute it all

    # Basic performance evaluation ... final value ... minus starting cash
    pnl = cerebro.broker.get_value() - args.cash
    print('Profit ... or Loss: {:.2f}'.format(pnl))


def parse_args(pargs=None):
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description=('Rebalancing with the Conservative Formula'),
    )

    parser.add_argument('--datadir', required=True,
                        help='Directory with data files')

    parser.add_argument('--dargs', default='',
                        metavar='kwargs', help='kwargs in k1=v1,k2=v2 format')

    # Defaults for dates
    parser.add_argument('--fromdate', required=False, default='',
                        help='Date[time] in YYYY-MM-DD[THH:MM:SS] format')

    parser.add_argument('--todate', required=False, default='',
                        help='Date[time] in YYYY-MM-DD[THH:MM:SS] format')

    parser.add_argument('--cerebro', required=False, default='',
                        metavar='kwargs', help='kwargs in k1=v1,k2=v2 format')

    parser.add_argument('--cash', default=1000000.0, type=float,
                        help='Starting cash')

    parser.add_argument('--strat', required=False, default='',
                        metavar='kwargs', help='kwargs in k1=v1,k2=v2 format')

    return parser.parse_args(pargs)


if __name__ == '__main__':
    run()