Beating The Random Entry

There have been recently some threads on reddit/r/algotrading about the success in replicating published algotrading strategies. First

I've reproduced 130+ research papers about "predicting the stock market", coded them from scratch and recorded the results. Here's what I've learnt

Because it has been deleted here a quick summary:
- The strategies do not work
- If the authors claim a given strategy stopped working due to alpha decay, the tests were run against past data and it still did not work
- Bottomline: It's all overfitting, p-hacking or a tiny alpha which doesn't need to decay because commissions do already destroy the alpha.

A replica to the replication problem was written by Artem Kaznatcheev in:

Allegory of the replication crisis in algorithmic trading

Followed by:

Overfitting

The first two were theoretical (even if the first mentioned having implemented the 130 strategies), with "Overfitting" offering actual code.

With so much going on, what about trying to replicate something which was published but not as a paper, and like in the "overfitting" case, with a practical approach. Something published in a well known book.

The target: "Trying to Beat Random Entry". This is a section in Part 3, Chapter 8 of this book:

A structured approach to engage into algotrading is presented in the book with a special emphasis on: position sizing and management of the position (i.e.: when to actually exit a trade). This being a lot more important than, for example, the entry setup, which apparently most people consider the main driver.

In Chapter 8, Van K. Tharp talks to Tom Basso and says: "From what you are saying it sounds like you could make money consistently with a random entry as long as you have good exits and size your positions intelligently.” to which the answer was that he probably could.

Rules:

Coin-Flip based entry
Always in the market - Long or Short
Reentry as soon as the exit signal was given
Volatility of the market is determined by a 10-day Exponential Moving Average of the Average True Range
Trailing stop at distance from the close 3-times the volatility
The stop could only move in the direction of the trade
Fixed Stake (1 contract) or 1% risk model (Chapter 12 in the book)

Results

Tested against 10 markets
Fixed Stake: made money 80% of the time
1% Risk Model: made money 100% of the time
Reliability level: 38% (percentage of winning trades)

The missing bits:

Tested markets
Tested periods
If always in the market means closing the trade "today" and re-entering "tomorrow" or if it means issuing both a the closing/re-opening orders at the same time.

This is actually the easiest to overcome.

For the last two items, the book says the conversation took place in 1991 and futures were used. To be fair to the book, futures data from before 1991 will be used. 1-day price bars are also assumed, given the mention of the 10-day Exponential Moving Average

The most obvious problem seems to get the algorithm right, but in this case the book does a good job at describing the simple algorithm and the results. To complete it, let's summarize the "Percentage Risk Model" (aka "Model 3" in the book) from Chapter 12.

Maximal Loss: limited to x% of the account value (i.e.: the percentage risk)
Risk per Contract : with the given algorithm, it will be the initial stop distance (3 times the volatility) multiplied by the multiplier of the future
Amount of contracts: Maximal Loss / Risk Per Contract

The replication details

The data

The CL (Crude Oil) futures data from 1985 to 1990 (i.e.: 6 full years) will be used. The contract specification has a:

Tick size: 0.01 (i.e.: 100 ticks per point)
Cost per tick: $10

With this in mind, we will use a multiplier of 1000 per 1 point (100 ticks/point x 10 dollars/tick = 1000 dollars)

Commission

A 2.00 monetary units per contract per trip will be used (IB-like)

Some implementation Details

The coin flip is modeled as an indicator, to make it easy to visualize where the flips have taken place (and if several entries, for example, go in the same direction, which is to be expected from randomness)

To also have a good visualization of the stop and how it moves, the stop price calculation and logic are also embedded in a indicator. Notice that the stop calculation logic has two different stages

When a trade starts the stop price has to be set at the given distance from the close with no relation to previous stop prices
When a trade is running, the stop price is adjusted if possible to follow the trend.

Charting

The code produces two types of charts

A chart containing the details a single test run (--plot) option. It makes most sense to use it when running a single iteration (--iterations 1)
A scatter chart showing the Profit and Loss of the runs.

Sample of #1

Sample of #2 for 10 runs

Sample invocations of the script

A single run with fixed size stake of 1 and plotting

./vanktharp-coinflip.py --years 1985-1990 --fixedsize --sizer stake=1 --iterations 1 --plot

**** Iteration:    1
-- PNL: 10482.00
--   Trades 49 - Won 22 - %_Won: 0.45
**** Summary of Runs
-- Total       :        1
-- Won         :        1
-- % Won       : 1.00
**** Summary of Trades
-- Total       :       49
-- Total Won   :       22
-- % Total Won : 0.45

100 runs with a 1% risk model, 10 iterations and a scatter chart

(The output is shortened for practical purposes)

$ ./vanktharp-coinflip.py --years 1985-1990 --percrisk --sizer percrisk=0.01 --iterations 100 --scatter
**** Iteration:    1
-- PNL: -18218.00
--   Trades 60 - Won 24 - %_Won: 0.40
**** Iteration:    2
...
...
**** Iteration:  100
-- PNL: 111366.00
--   Trades 50 - Won 26 - %_Won: 0.52
**** Summary of Runs
-- Total       :      100
-- Won         :       50
-- % Won       : 0.50
**** Summary of Trades
-- Total       :     5504
-- Total Won   :     2284
-- % Total Won : 0.41

Test Runs Mix

10 test runs of 100 iterations were done, mixing the following variables:

Fixed size stake of 1 or percentage risk model with 1%
Executing entry/exit in the same bar or in consecutive bars

Summary of results

In average, 49% of the runs made a profit. Fixed size remained close to 50% in the test runs and the percentage risk model varied more with a test run scoring a minimum of 39% profitable runs and another scoring a maximum of 65% profitable runs (out of 10)
In average, 39% of the trades were profitable (small deviation)

Recalling what the book said:

80% of profitable runs when using a fixed size stake of 1
100% of profitable runs with the 1% percentage risk model
38% of profitable trades

It would therefore seem that:

Only the last of the items was replicated.

Conclusion

As pointed out by Artem Kaznatcheev, the replication crisis can be due to:

Using the wrong data set
Failing to correctly implement the algorithm

Or it could well be that the original implementation did not follow its own rules or that not all details were published.

Note

In any case I do still personally recommend reading the book. Failing to replicate that specific case does not mean the book is not a good read and shows a practical approach at algotrading.

The complete script

Enjoy!

Code Also available at:

https://gist.github.com/mementum/364173371634b226120dc39708cd9610

#!/usr/bin/env python
# -*- coding: utf-8; py-indent-offset:4 -*-
###############################################################################
# Copyright (C) 2019 Daniel Rodriguez - MIT License
#  - https://opensource.org/licenses/MIT
#  - https://en.wikipedia.org/wiki/MIT_License
###############################################################################
import argparse
import random

import pandas as pd

import backtrader as bt


def read_dataframe(filename, years):
    colnames = ['ticker', 'period', 'date', 'time',
                'open', 'high', 'low', 'close', 'volume', 'openinterest']

    colsused = ['date',
                'open', 'high', 'low', 'close', 'volume', 'openinterest']

    df = pd.read_csv(filename,
                     skiprows=1,  # using own column names, skip header
                     names=colnames,
                     usecols=colsused,
                     parse_dates=['date'],
                     index_col='date')

    if years:  # year or year range specified
        ysplit = years.split('-')

        # left side limit
        mask = df.index >= ((ysplit[0] or '0001') + '-01-01')  # support -YYYY

        # right side liit
        if len(ysplit) > 1:  # multiple or open ended (YYYY-ZZZZ or YYYY-)
            if ysplit[1]:  # open ended if not years[1] (YYYY- format)
                mask &= df.index <= (ysplit[1] + '-12-31')
        else:  # single year specified YYYY
            mask &= df.index <= (ysplit[0] + '-12-31')

        df = df.loc[mask]  # select the given date range

    return df


# DEFAULTS - CAN BE CHANGED VIA COMMAND LINE OPTIONS
COMMINFO_DEFAULT = dict(
    stocklike=False,  # Futures-like
    commtype=bt.CommissionInfo.COMM_FIXED,  # fixed price per asset
    commission=2.0,  # Standard IB Price for futures
    mult=1000.0,  # multiplier
    margin=2000.0,  # $50 x 50 => $2500
)


class PercentRiskSizer(bt.Sizer):
    '''Sizer modeling the Percentage Risk sizing model of Van K. Tharp'''
    params = dict(percrisk=0.01)  # 1% percentage risk

    def _getsizing(self, comminfo, cash, data, isbuy):
        # Risk per 1 contract
        risk = comminfo.p.mult * self.strategy.stoptrailer.stop_dist[0]
        # % of account value to risk
        torisk = self.broker.get_value() * self.p.percrisk
        return torisk // risk  # size to risk


class CoinFlip(bt.Indicator):
    lines = ('coinflip',)
    HEAD, TAIL = 1, 0

    def next(self):
        self.l.coinflip[0] = 0.5  # midway
        pass

    def flip(self):
        # self.l.coinflip[0] = cf = random.randrage(-1, 2, 2)  # -1 or 1
        self.l.coinflip[0] = cf = random.randint(0, 1)
        return cf

    def head(self, val=None):
        if val is None:
            return self.lines[0] == self.HEAD

        return val == self.HEAD


class StopTrailer(bt.Indicator):
    _nextforce = True  # force system into step by step calcs

    lines = ('stop_long', 'stop_short',)
    plotinfo = dict(subplot=False, plotlinelabels=True)

    params = dict(
        atrperiod=14,
        emaperiod=10,
        stopfactor=3.0,
    )

    def __init__(self):
        self.strat = self._owner  # alias for clarity

        # Volatility which determines stop distance
        atr = bt.ind.ATR(self.data, period=self.p.atrperiod)
        emaatr = bt.ind.EMA(atr, period=self.p.emaperiod)
        self.stop_dist = emaatr * self.p.stopfactor

        # Running stop price calc, applied in next according to market pos
        self.s_l = self.data - self.stop_dist
        self.s_s = self.data + self.stop_dist

    def next(self):
        # When entering the market, the stop has to be set
        if self.strat.entering > 0:  # entering long
            self.l.stop_long[0] = self.s_l[0]
        elif self.strat.entering < 0:  # entering short
            self.l.stop_short[0] = self.s_s[0]

        else:  # In the market, adjust stop only in the direction of the trade
            if self.strat.position.size > 0:
                self.l.stop_long[0] = max(self.s_l[0], self.l.stop_long[-1])
            elif self.strat.position.size < 0:
                self.l.stop_short[0] = min(self.s_s[0], self.l.stop_short[-1])


class St1(bt.Strategy):
    SHORT, NONE, LONG = -1, 0, 1

    params = dict(
        atrperiod=14,  # measure volatility over x days
        emaperiod=10,  # smooth out period for atr volatility
        stopfactor=3.0,  # actual stop distance for smoothed atr
        verbose=False,  # print out debug info
        samebar=True,  # close and re-open on samebar
    )

    def __init__(self):
        self.coinflip = CoinFlip()

        # Trailing Stop Indicator
        self.stoptrailer = st = StopTrailer(atrperiod=self.p.atrperiod,
                                            emaperiod=self.p.emaperiod,
                                            stopfactor=self.p.stopfactor)

        # Exit Criteria (Stop Trail) for long / short positions
        self.exit_long = bt.ind.CrossDown(self.data,
                                          st.stop_long, plotname='Exit Long')
        self.exit_short = bt.ind.CrossUp(self.data,
                                         st.stop_short, plotname='Exit Short')

    def start(self):
        self.entering = 0
        self.start_val = self.broker.get_value()

    def stop(self):
        self.stop_val = self.broker.get_value()
        self.pnl_val = self.stop_val - self.start_val
        self.log('Start Value: {:.2f}', self.start_val)
        self.log('Final Value: {:.2f}', self.stop_val)
        self.log('PNL   Value: {:.2f}', self.pnl_val)

    def notify_trade(self, trade):
        if trade.size > 0:
            self.log('Long  Entry at: {:.2f}', trade.price)
        elif trade.size < 0:
            self.log('Short Entry at: {:.2f}', trade.price)
        else:  # not trade.size - trade is over
            self.log('Trade PNL: {:.2f}', trade.pnlcomm)

    def next(self):
        self.logdata()

        # logic
        closing = None
        if self.position.size > 0:  # In the market - Long
            self.log('Long Stop Price: {:.2f}', self.stoptrailer.stop_long[0])
            if self.exit_long:
                closing = self.close()

        elif self.position.size < 0:  # In the market - Short
            self.log('Short Stop Price {:.2f}', self.stoptrailer.stop_short[0])
            if self.exit_short:
                closing = self.close()

        self.entering = self.NONE
        if not self.position or (closing and self.p.samebar):
            # Not in the market or closing pos and reenter in samebar
            if self.coinflip.flip():
                self.entering = self.LONG if self.buy() else self.NONE
            else:
                self.entering = self.SHORT if self.sell() else self.NONE

    def logdata(self):
        if self.p.verbose:  # logging
            txt = []
            txt += ['{:.2f}'.format(self.position.size)]
            txt += ['{:.2f}'.format(self.data.open[0])]
            txt += ['{:.2f}'.format(self.data.high[0])]
            txt += ['{:.2f}'.format(self.data.low[0])]
            txt += ['{:.2f}'.format(self.data.close[0])]
            self.log(','.join(txt))

    def log(self, txt, *args):
        if self.p.verbose:
            out = [self.datetime.date().isoformat(), txt.format(*args)]
            print(','.join(out))


def runstrat(args):
    cerebro = bt.Cerebro()

    # Data feed kwargs
    dataargs = dict(dataname=read_dataframe(args.data, args.years))
    dataargs.update(eval('dict(' + args.dargs + ')'))
    cerebro.adddata(bt.feeds.PandasData(**dataargs))

    # Strategy
    cerebro.addstrategy(St1, **eval('dict(' + args.strat + ')'))

    # Broker
    brokerargs = dict(cash=args.cash)
    brokerargs.update(eval('dict(' + args.broker + ')'))
    cerebro.broker = bt.brokers.BackBroker(**brokerargs)

    # Commission
    commargs = COMMINFO_DEFAULT
    commargs.update(eval('dict(' + args.commission + ')'))
    cerebro.broker.setcommission(**commargs)

    # Sizer
    szcls = PercentRiskSizer if args.percrisk else bt.sizers.FixedSize
    cerebro.addsizer(szcls, **(eval('dict(' + args.sizer + ')')))

    # Analyze the trades
    cerebro.addanalyzer(bt.analyzers.TradeAnalyzer, _name='trades')

    # Execute
    strats = cerebro.run(**eval('dict(' + args.cerebro + ')'))

    if args.plot:  # Plot if requested to
        cerebro.plot(**eval('dict(' + args.plot + ')'))

    return strats[0]


def run(args=None):
    args = parse_args(args)

    results = []
    sum_won_trades = 0
    sum_total_trades = 0

    for i in range(0, args.iterations):
        strat = runstrat(args)
        pnl = strat.pnl_val
        results.append(pnl)
        trades = strat.analyzers.trades.get_analysis()

        print('**** Iteration: {:4d}'.format(i + 1))
        print('-- PNL: {:.2f}'.format(pnl))
        total_trades = trades.total.closed
        total_won = trades.won.total
        perc_won = total_won / total_trades
        print('--   Trades {} - Won {} - %_Won: {:.2f}'.format(
            total_trades, total_won, perc_won))

        sum_won_trades += total_won
        sum_total_trades += total_trades

    total = len(results)
    won = sum(1 for x in results if x > 0)
    print('**** Summary of Runs')
    print('-- Total       : {:8d}'.format(total))
    print('-- Won         : {:8d}'.format(won))
    print('-- % Won       : {:.2f}'.format(won / total))

    perc_won = sum_won_trades / sum_total_trades
    print('**** Summary of Trades')
    print('-- Total       : {:8d}'.format(sum_total_trades))
    print('-- Total Won   : {:8d}'.format(sum_won_trades))
    print('-- % Total Won : {:.2f}'.format(perc_won))

    if args.scatter:
        import numpy as np
        import matplotlib.pyplot as plt
        x = np.linspace(min(results), max(results), num=len(results))
        y = np.asarray(results)
        plt.scatter(x, y)
        plt.show()


def parse_args(pargs=None):
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description='Van K. Tharp/Basso Random Entry Scenario',
    )

    parser.add_argument('--iterations', default=1, type=int,
                        help='Number of iterations to run the system')

    pgroup = parser.add_argument_group(title='Data Options')
    pgroup.add_argument('--data', default='cl-day-001.txt',
                        help='Data to read in')

    pgroup.add_argument('--years', default='',
                        help='Formats: YYYY-ZZZZ / YYYY / YYYY- / -ZZZZ')

    parser.add_argument('--dargs', required=False, default='',
                        metavar='kwargs', help='kwargs in key=value format')

    pgroup = parser.add_argument_group(title='Cerebro Arguments')
    pgroup.add_argument('--cerebro', default='', metavar='kwargs',
                        help='Cerebro kwargs in key=value format')

    pgroup = parser.add_argument_group(title='Commission Arguments')
    pgroup.add_argument('--commission', default=str(COMMINFO_DEFAULT),
                        metavar='kwargs',
                        help='CommInfo kwargs in key=value format')

    pgroup = parser.add_argument_group(title='Broker Arguments')
    pgroup.add_argument('--broker', default='', metavar='kwargs',
                        help='Broker kwargs in key=value format')

    pgroup.add_argument('--cash', default=1000000.0, type=float,
                        help='Default cash')

    pgroup = parser.add_argument_group(title='Strategy Arguments')
    pgroup.add_argument('--strat', default='', metavar='kwargs',
                        help='Strategy kwargs in key=value format')

    pgroup = parser.add_argument_group(title='Sizer Options')
    pgroup.add_argument('--sizer', default='', metavar='kwargs',
                        help='Sizer kwargs in key=value format')

    pgroup = pgroup.add_mutually_exclusive_group()
    pgroup.add_argument('--percrisk', action='store_true',
                        help='Use Percrisk Sizer')

    pgroup.add_argument('--fixedsize', action='store_true',
                        help='Use Fixed Statke Sizer')

    pgroup = parser.add_argument_group(title='Plotting Options')
    pgroup.add_argument('--plot', default='', nargs='?', const='{}',
                        metavar='kwargs', help='kwargs in key=value format')

    pgroup.add_argument('--scatter', action='store_true',
                        help='Plot a scatter diagram of PNL results')

    return parser.parse_args(pargs)


if __name__ == '__main__':
    run()