Data Filters – backtrader
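Some time ago, Ticket #23 made me think about a potential improvement on the discussion held in the context of that ticket.
In the ticket I added a DataFilter class, but it was overly complicated. It was in fact reminiscent of the complexity built into DataResampler and DataReplayer, the classes used to implement the functionality of the same names.
As a result, and for a couple of versions now, backtrader has supported adding a filter (call it a processor if you wish) to data feeds. Resampling and replaying were reimplemented internally using that functionality, and everything seems less complicated (although it still is complicated).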
Filters at work
Given an existing data feed/source, use the addfilter method of the data feed:
    data = MyDataFeed(name=myname)
    data.addfilter(filter, *args, **kwargs)
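Obviously, the filter must conform to a given interface, which is either:
- A callable that accepts this signature:
      callable(data, *args, **kwargs)
or
- A class that can be instantiated and called
  - During instantiation, the __init__ method must support the signature:
        def __init__(self, data, *args, **kwargs)
  - The __call__ and last methods must support these signatures:
        def __call__(self, data)
        def last(self, data)
The callable/instance is called for each bar produced by the data source.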
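As a minimal sketch of the two shapes described above, the snippet below defines a plain callable and a class-based filter and drives both with a tiny stand-in feed. StubFeed, passthrough, BarCounter and drive are hypothetical names used only for illustration. None of them is part of backtrader, and the driver loop is an assumption about how a feed might invoke its filters, not the library's actual code.

```python
class StubFeed:
    """Hypothetical stand-in for a data feed; not a backtrader class."""

    def __init__(self, values):
        self.pending = list(values)   # values still to be delivered
        self.delivered = []           # values already delivered


def passthrough(data, *args, **kwargs):
    """Callable-style filter: looks at the data and leaves it untouched."""
    return False  # False: the data stream was not touched


class BarCounter:
    """Class-style filter: instantiated once, then called for every bar."""

    def __init__(self, data, *args, **kwargs):
        self.count = 0

    def __call__(self, data):
        self.count += 1
        return False  # the stream is left as it is

    def last(self, data):
        # Assumed to run once at the end, in case the filter still has
        # something to deliver; this filter has nothing pending
        return False


def drive(feed, filters):
    """Assumed driver loop: run every filter on every delivered value."""
    while feed.pending:
        feed.delivered.append(feed.pending.pop(0))
        for flt in filters:
            flt(feed)
    for flt in filters:
        if hasattr(flt, 'last'):
            flt.last(feed)


feed = StubFeed([10.0, 11.0, 12.0])
counter = BarCounter(feed)
drive(feed, [passthrough, counter])
print(counter.count, feed.delivered)
```

With a real feed, the class itself (not an instance) would be handed to addfilter, as the sample script in this post does with SessionFilter.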
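A better solution for Ticket #23
That ticket called for:
- A relative volume indicator on an intraday basis
- Handling intraday data that may be missing
- Handling data from outside regular trading hours (pre/post-market)
Implementing a couple of filters alleviates the situation for the backtesting environment.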
Filtering out pre/post-market data
The following filter (already available in backtrader) comes to the rescue:
    class SessionFilter(with_metaclass(metabase.MetaParams, object)):
        '''
        This class can be applied to a data source as a filter and will filter out
        intraday bars which fall outside of the regular session times (ie: pre/post
        market data)

        This is a "non-simple" filter and must manage the stack of the data (passed
        during init and __call__)

        It needs no "last" method because it has nothing to deliver
        '''
        def __init__(self, data):
            pass

        def __call__(self, data):
            '''
            Return Values:

              - False: data stream was not touched
              - True: data stream was manipulated (bar outside of session times and
                removed)
            '''
            if data.sessionstart <= data.datetime.tm(0) <= data.sessionend:
                # Both ends of the comparison are in the session
                return False  # say the stream is untouched

            # bar outside of the regular session times
            data.backwards()  # remove bar from data stack
            return True  # signal the data was manipulated
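The filter uses the session start/end times embedded in the data to filter bars:
- If the datetime of the new bar is within the session times, False is returned to indicate that the data is untouched
- If the datetime is outside the range, the data source is sent backwards, effectively erasing the last produced bar, and True is returned to indicate that the data stream was manipulated
Note: calling data.backwards() is probably too low-level; filters should have an API that deals with the internals of the data stream.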
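To make the two outcomes concrete, here is a small sketch of the same decision run against a stand-in feed. StubFeed, its stack attribute and its forward method are hypothetical and exist only for this illustration. The real filter reads the bar time from the feed's datetime line instead.

```python
import datetime


class StubFeed:
    """Hypothetical stand-in for a data feed; not a backtrader class."""

    def __init__(self, sessionstart, sessionend):
        self.sessionstart = sessionstart
        self.sessionend = sessionend
        self.stack = []               # bars delivered so far

    def forward(self, dt):
        self.stack.append(dt)         # deliver a new bar

    def backwards(self):
        self.stack.pop()              # erase the last delivered bar


def session_filter(data):
    """Same decision as the session filter, written against StubFeed."""
    bar_time = data.stack[-1].time()
    if data.sessionstart <= bar_time <= data.sessionend:
        return False                  # stream untouched
    data.backwards()                  # bar outside the session: remove it
    return True                       # stream was manipulated


feed = StubFeed(datetime.time(9, 30), datetime.time(17, 30))
day = datetime.date(2006, 1, 2)
results = []
for hhmm in [(9, 1), (9, 30), (17, 30), (20, 4)]:
    feed.forward(datetime.datetime.combine(day, datetime.time(*hhmm)))
    results.append(session_filter(feed))

print(results)                                      # True where a bar was removed
print([dt.strftime('%H:%M') for dt in feed.stack])  # bars that survived
```

Both session boundaries are inclusive, so the 09:30 and 17:30 bars survive while the 09:01 and 20:04 bars are removed.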
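The sample script at the end of this post can be run with or without filtering. The first run is 100% unfiltered, with no session times specified:
    $ ./data-filler.py --writer --wrcsv
Look at the beginning and end of the first day: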
    ===============================================================================
    Id,2006-01-02-volume-min-001,len,datetime,open,high,low,close,volume,openinterest,Strategy,len
    1,2006-01-02-volume-min-001,1,2006-01-02 09:01:00,3602.0,3603.0,3597.0,3599.0,5699.0,0.0,Strategy,1
    2,2006-01-02-volume-min-001,2,2006-01-02 09:02:00,3600.0,3601.0,3598.0,3599.0,894.0,0.0,Strategy,2
    ...
    ...
    581,2006-01-02-volume-min-001,581,2006-01-02 19:59:00,3619.0,3619.0,3619.0,3619.0,1.0,0.0,Strategy,581
    582,2006-01-02-volume-min-001,582,2006-01-02 20:00:00,3618.0,3618.0,3617.0,3618.0,242.0,0.0,Strategy,582
    583,2006-01-02-volume-min-001,583,2006-01-02 20:01:00,3618.0,3618.0,3617.0,3617.0,15.0,0.0,Strategy,583
    584,2006-01-02-volume-min-001,584,2006-01-02 20:04:00,3617.0,3617.0,3617.0,3617.0,107.0,0.0,Strategy,584
    585,2006-01-02-volume-min-001,585,2006-01-03 09:01:00,3623.0,3625.0,3622.0,3624.0,4026.0,0.0,Strategy,585
    ...
The session runs from 09:01:00 to 20:04:00 on 2 January 2006.
Now run with a SessionFilter, telling the script to use 09:30 and 17:30 as the session start/end times:
    $ ./data-filler.py --writer --wrcsv --tstart 09:30 --tend 17:30 --filter
    ===============================================================================
    Id,2006-01-02-volume-min-001,len,datetime,open,high,low,close,volume,openinterest,Strategy,len
    1,2006-01-02-volume-min-001,1,2006-01-02 09:30:00,3604.0,3605.0,3603.0,3604.0,546.0,0.0,Strategy,1
    2,2006-01-02-volume-min-001,2,2006-01-02 09:31:00,3604.0,3606.0,3604.0,3606.0,438.0,0.0,Strategy,2
    ...
    ...
    445,2006-01-02-volume-min-001,445,2006-01-02 17:29:00,3621.0,3621.0,3620.0,3620.0,866.0,0.0,Strategy,445
    446,2006-01-02-volume-min-001,446,2006-01-02 17:30:00,3620.0,3621.0,3619.0,3621.0,1670.0,0.0,Strategy,446
    447,2006-01-02-volume-min-001,447,2006-01-03 09:30:00,3637.0,3638.0,3635.0,3636.0,1458.0,0.0,Strategy,447
    ...
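The data output now starts at 09:30 and ends at 17:30. Pre/post-market data has been filtered out.
Filling in missing data
A closer inspection of the output shows the following: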
    ...
    61,2006-01-02-volume-min-001,61,2006-01-02 10:30:00,3613.0,3614.0,3613.0,3614.0,112.0,0.0,Strategy,61
    62,2006-01-02-volume-min-001,62,2006-01-02 10:31:00,3614.0,3614.0,3614.0,3614.0,183.0,0.0,Strategy,62
    63,2006-01-02-volume-min-001,63,2006-01-02 10:34:00,3614.0,3614.0,3614.0,3614.0,841.0,0.0,Strategy,63
    64,2006-01-02-volume-min-001,64,2006-01-02 10:35:00,3614.0,3614.0,3614.0,3614.0,17.0,0.0,Strategy,64
    ...
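The data for minutes 10:32 and 10:33 is missing. As this was the first trading day of the year, there may have been no trading at all in those minutes, or the data feed may have failed to capture the data.
For the purposes of Ticket #23, and to be able to compare the volume of a given minute with that of the same minute on the previous day, we'll fill in the missing data.
backtrader already has a SessionFiller which, as expected, fills in missing data. The code is long and more complex than the filter's (the full implementation is in the backtrader sources), but let's look at the class/params definition: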
    class SessionFiller(with_metaclass(metabase.MetaParams, object)):
        '''
        Bar Filler for a Data Source inside the declared session start/end times.

        The fill bars are constructed using the declared Data Source ``timeframe``
        and ``compression`` (used to calculate the intervening missing times)

        Params:

          - fill_price (def: None):

            If None is passed, the closing price of the previous bar will be
            used. To end up with a bar which for example takes time but it is not
            displayed in a plot ... use float('Nan')

          - fill_vol (def: float('NaN')):

            Value to use to fill the missing volume

          - fill_oi (def: float('NaN')):

            Value to use to fill the missing Open Interest

          - skip_first_fill (def: True):

            Upon seeing the 1st valid bar do not fill from the sessionstart up to
            that bar
        '''
        params = (('fill_price', None),
                  ('fill_vol', float('NaN')),
                  ('fill_oi', float('NaN')),
                  ('skip_first_fill', True))
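The core idea behind the fill_price and fill_vol parameters can be sketched in a few lines of plain Python. This is not the SessionFiller implementation: fill_missing_minutes is a hypothetical helper that assumes one-minute bars given as (datetime, close, volume) tuples, and it ignores session boundaries, open interest and skip_first_fill.

```python
import datetime


def fill_missing_minutes(bars, fill_vol=float('nan')):
    """Insert one-minute filler bars between consecutive bars.

    bars: list of (datetime, close, volume) tuples sorted by time.
    Filler bars reuse the previous close as their price.
    """
    one_minute = datetime.timedelta(minutes=1)
    filled = []
    for dt, close, volume in bars:
        if filled:
            prev_dt, prev_close, _ = filled[-1]
            missing = prev_dt + one_minute
            while missing < dt:
                # No bar for this minute: synthesize one
                filled.append((missing, prev_close, fill_vol))
                missing += one_minute
        filled.append((dt, close, volume))
    return filled


day = datetime.date(2006, 1, 2)
bars = [
    (datetime.datetime.combine(day, datetime.time(10, 31)), 3614.0, 183.0),
    (datetime.datetime.combine(day, datetime.time(10, 34)), 3614.0, 841.0),
]
filled = fill_missing_minutes(bars, fill_vol=0.0)
for dt, close, volume in filled:
    print(dt.strftime('%H:%M'), close, volume)
```

The two input bars are the 10:31 and 10:34 bars from the output above; the sketch adds bars for 10:32 and 10:33 carrying the previous close and the chosen fill volume.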
The sample script can now filter and fill data:
    $ ./data-filler.py --writer --wrcsv --tstart 09:30 --tend 17:30 --filter --filler
    ...
    62,2006-01-02-volume-min-001,62,2006-01-02 10:31:00,3614.0,3614.0,3614.0,3614.0,183.0,0.0,Strategy,62
    63,2006-01-02-volume-min-001,63,2006-01-02 10:32:00,3614.0,3614.0,3614.0,3614.0,0.0,,Strategy,63
    64,2006-01-02-volume-min-001,64,2006-01-02 10:33:00,3614.0,3614.0,3614.0,3614.0,0.0,,Strategy,64
    65,2006-01-02-volume-min-001,65,2006-01-02 10:34:00,3614.0,3614.0,3614.0,3614.0,841.0,0.0,Strategy,65
    ...
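Minutes 10:32 and 10:33 are now present. The script uses the last known closing price to fill the price values and sets the volume to 0. The open interest is left at the SessionFiller default (NaN), which shows up as an empty field in the CSV output. The script accepts a --fvol parameter to set the fill volume to any value (including NaN).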
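A short check of how a NaN fill volume travels through the command line, as a sketch that repeats only the --fvol definition from the script's parse_args. The parser below is otherwise a stand-alone assumption for illustration.

```python
import argparse
import math

parser = argparse.ArgumentParser()
parser.add_argument('--fvol', required=False, default=0.0, type=float)

default_args = parser.parse_args([])
nan_args = parser.parse_args(['--fvol', 'NaN'])

# float() accepts the string 'NaN', so type=float needs no special casing
print(default_args.fvol, math.isnan(default_args.fvol))
print(nan_args.fvol, math.isnan(nan_args.fvol))
```

The math.isnan result is what the script later passes on as volisnan when it adds the relative volume indicator.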
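Completing Ticket #23
With SessionFilter and SessionFiller the following has been accomplished:
- Pre/post-market data is not delivered
- No data (at the given timeframe) is missing
The synchronization discussed in Ticket #23 is no longer needed to implement a RelativeVolume indicator, because all days have exactly the same number of bars (in the example, every minute from 09:30 to 17:30, both included).
Bearing in mind that the script's default is to set the missing volume to 0, a simple RelativeVolume indicator can be developed: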
    import backtrader as bt


    class RelativeVolume(bt.Indicator):
        csv = True  # show up in csv output (default for indicators is False)

        lines = ('relvol',)
        params = (
            ('period', 20),
            ('volisnan', True),
        )

        def __init__(self):
            if self.p.volisnan:
                # if missing volume will be NaN, do a simple division
                # the end result for missing volumes will also be NaN
                relvol = self.data.volume(-self.p.period) / self.data.volume
            else:
                # Else do a controlled Div with a built-in function
                relvol = bt.DivByZero(
                    self.data.volume(-self.p.period),
                    self.data.volume,
                    zero=0.0)

            self.lines.relvol = relvol
It is smart enough to avoid division by zero by using a built-in helper in backtrader.
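The arithmetic of the indicator can be approximated in plain Python, which makes the guarded division easier to see. This is only a sketch: relative_volume is a hypothetical helper working on a list rather than on backtrader lines, and it keeps the indicator's ordering (the volume from period bars ago divided by the current volume).

```python
import math


def relative_volume(volumes, period, zero=0.0):
    """Divide the volume `period` bars ago by the current volume.

    Returns NaN until `period` bars of history exist, and `zero` when
    the current volume is 0 (the guarded division).
    """
    out = []
    for i, current in enumerate(volumes):
        if i < period:
            out.append(float('nan'))      # not enough history yet
        elif current == 0:
            out.append(zero)              # avoid dividing by zero
        else:
            out.append(volumes[i - period] / current)
    return out


# Two "days" of three bars each; the second bar of day 1 was filled with 0
# and the third bar of day 2 has a volume of 0
volumes = [183.0, 0.0, 841.0, 56.0, 313.0, 0.0]
relvol = relative_volume(volumes, period=3)
print(relvol)
```

A zero in the earlier bar simply gives a ratio of 0, while a zero in the current bar is the case that needs the guard.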
Putting all the pieces together in the next invocation of the script:
    $ ./data-filler.py --writer --wrcsv --tstart 09:30 --tend 17:30 --filter --filler --relvol
    ===============================================================================
    Id,2006-01-02-volume-min-001,len,datetime,open,high,low,close,volume,openinterest,Strategy,len,RelativeVolume,len,relvol
    1,2006-01-02-volume-min-001,1,2006-01-02 09:30:00,3604.0,3605.0,3603.0,3604.0,546.0,0.0,Strategy,1,RelativeVolume,1,
    2,2006-01-02-volume-min-001,2,2006-01-02 09:31:00,3604.0,3606.0,3604.0,3606.0,438.0,0.0,Strategy,2,RelativeVolume,2,
    ...
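As expected, the RelativeVolume indicator produces no output during the first bars. The period is calculated in the script as ((17:30 - 09:30) * 60) + 1, that is, 8 hours * 60 + 1 = 481 bars. Let's look directly at the relative volume for 10:32 and 10:33 on the second day, given that on the first day the volume values for those minutes were filled with 0: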
    ...
    543,2006-01-02-volume-min-001,543,2006-01-03 10:31:00,3648.0,3648.0,3647.0,3648.0,56.0,0.0,Strategy,543,RelativeVolume,543,3.26785714286
    544,2006-01-02-volume-min-001,544,2006-01-03 10:32:00,3647.0,3648.0,3647.0,3647.0,313.0,0.0,Strategy,544,RelativeVolume,544,0.0
    545,2006-01-02-volume-min-001,545,2006-01-03 10:33:00,3647.0,3647.0,3647.0,3647.0,135.0,0.0,Strategy,545,RelativeVolume,545,0.0
    546,2006-01-02-volume-min-001,546,2006-01-03 10:34:00,3648.0,3648.0,3647.0,3648.0,171.0,0.0,Strategy,546,RelativeVolume,546,4.91812865497
    ...
As expected, it is 0 in both cases.
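The period arithmetic and the row numbers in the output can be cross-checked with a few lines of Python. The period expression is the one used in the sample script; row_number is a hypothetical helper that assumes every session has been filtered and filled to exactly the same number of one-minute bars.

```python
import datetime

dtstart = datetime.datetime.strptime('09:30', '%H:%M')
dtend = datetime.datetime.strptime('17:30', '%H:%M')

# Bars per session: minutes between start and end, both ends included
period = ((dtend - dtstart).seconds // 60) + 1


def row_number(day_index, hhmm):
    """1-based row of a minute bar when every session has `period` bars."""
    bar_time = datetime.datetime.strptime(hhmm, '%H:%M')
    offset = (bar_time - dtstart).seconds // 60
    return day_index * period + offset + 1


print(period)
print(row_number(0, '10:32'), row_number(1, '10:32'))
```

The 10:32 bars of the first and second day land on rows 63 and 544 of the filled output, exactly one period apart, which is why looking back by the period compares the same minute of consecutive days.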
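Conclusion
The filter mechanism in the data sources opens up the possibility of fully manipulating the data stream. Use it with caution.
Script code and usage
The script is available as a sample in the backtrader sources: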
    usage: data-filler.py [-h] [--data DATA] [--filter] [--filler] [--fvol FVOL]
                          [--tstart TSTART] [--tend TEND] [--relvol]
                          [--fromdate FROMDATE] [--todate TODATE] [--writer]
                          [--wrcsv] [--plot] [--numfigs NUMFIGS]

    DataFilter/DataFiller Sample

    optional arguments:
      -h, --help            show this help message and exit
      --data DATA, -d DATA  data to add to the system
      --filter, -ft         Filter using session start/end times
      --filler, -fl         Fill missing bars inside start/end times
      --fvol FVOL           Use as fill volume for missing bar (def: 0.0)
      --tstart TSTART, -ts TSTART
                            Start time for the Session Filter (HH:MM)
      --tend TEND, -te TEND
                            End time for the Session Filter (HH:MM)
      --relvol, -rv         Add relative volume indicator
      --fromdate FROMDATE, -f FROMDATE
                            Starting date in YYYY-MM-DD format
      --todate TODATE, -t TODATE
                            Starting date in YYYY-MM-DD format
      --writer, -w          Add a writer to cerebro
      --wrcsv, -wc          Enable CSV Output in the writer
      --plot, -p            Plot the read data
      --numfigs NUMFIGS, -n NUMFIGS
                            Plot using numfigs figures
Source code:
    from __future__ import (absolute_import, division, print_function,
                            unicode_literals)

    import argparse
    import datetime
    import math

    # The above could be sent to an independent module
    import backtrader as bt
    import backtrader.feeds as btfeeds
    import backtrader.utils.flushfile
    import backtrader.filters as btfilters

    from relativevolume import RelativeVolume


    def runstrategy():
        args = parse_args()

        # Create a cerebro
        cerebro = bt.Cerebro()

        # Get the dates from the args
        fromdate = datetime.datetime.strptime(args.fromdate, '%Y-%m-%d')
        todate = datetime.datetime.strptime(args.todate, '%Y-%m-%d')

        # Get the session times to pass them to the indicator
        # datetime.time has no strptime ...
        dtstart = datetime.datetime.strptime(args.tstart, '%H:%M')
        dtend = datetime.datetime.strptime(args.tend, '%H:%M')

        # Create the 1st data
        data = btfeeds.BacktraderCSVData(
            dataname=args.data,
            fromdate=fromdate,
            todate=todate,
            timeframe=bt.TimeFrame.Minutes,
            compression=1,
            sessionstart=dtstart,  # internally just the "time" part will be used
            sessionend=dtend,  # internally just the "time" part will be used
        )

        if args.filter:
            data.addfilter(btfilters.SessionFilter)

        if args.filler:
            data.addfilter(btfilters.SessionFiller, fill_vol=args.fvol)

        # Add the data to cerebro
        cerebro.adddata(data)

        if args.relvol:
            # Calculate backward period - tend tstart are in same day
            # + 1 to include last moment of the interval dstart <-> dtend
            td = ((dtend - dtstart).seconds // 60) + 1
            cerebro.addindicator(RelativeVolume,
                                 period=td,
                                 volisnan=math.isnan(args.fvol))

        # Add an empty strategy
        cerebro.addstrategy(bt.Strategy)

        # Add a writer with CSV
        if args.writer:
            cerebro.addwriter(bt.WriterFile, csv=args.wrcsv)

        # And run it - no trading - disable stdstats
        cerebro.run(stdstats=False)

        # Plot if requested
        if args.plot:
            cerebro.plot(numfigs=args.numfigs, volume=True)


    def parse_args():
        parser = argparse.ArgumentParser(
            description='DataFilter/DataFiller Sample')

        parser.add_argument('--data', '-d',
                            default='../../datas/2006-01-02-volume-min-001.txt',
                            help='data to add to the system')

        parser.add_argument('--filter', '-ft', action='store_true',
                            help='Filter using session start/end times')

        parser.add_argument('--filler', '-fl', action='store_true',
                            help='Fill missing bars inside start/end times')

        parser.add_argument('--fvol', required=False, default=0.0,
                            type=float,
                            help='Use as fill volume for missing bar (def: 0.0)')

        parser.add_argument('--tstart', '-ts',
                            # default='09:14:59',
                            # help='Start time for the Session Filter (%H:%M:%S)')
                            default='09:15',
                            help='Start time for the Session Filter (HH:MM)')

        parser.add_argument('--tend', '-te',
                            # default='17:15:59',
                            # help='End time for the Session Filter (%H:%M:%S)')
                            default='17:15',
                            help='End time for the Session Filter (HH:MM)')

        parser.add_argument('--relvol', '-rv', action='store_true',
                            help='Add relative volume indicator')

        parser.add_argument('--fromdate', '-f',
                            default='2006-01-01',
                            help='Starting date in YYYY-MM-DD format')

        parser.add_argument('--todate', '-t',
                            default='2006-12-31',
                            help='Starting date in YYYY-MM-DD format')

        parser.add_argument('--writer', '-w', action='store_true',
                            help='Add a writer to cerebro')

        parser.add_argument('--wrcsv', '-wc', action='store_true',
                            help='Enable CSV Output in the writer')

        parser.add_argument('--plot', '-p', action='store_true',
                            help='Plot the read data')

        parser.add_argument('--numfigs', '-n', default=1,
                            help='Plot using numfigs figures')

        return parser.parse_args()


    if __name__ == '__main__':
        runstrategy()