Why is my backtest so slow?

kc975943 Posts: 55
edited November 2017 in Under The Hood

This is not a frivolous question. My code is structured around using on_timer function that has several once-a-day scheduled events. The log file below shows they are only executed once a day, and in this particular instance I am only looking at the day when no trading is scheduled. According to the log the whole thing should take no more than 10 seconds, yet the actual runtime is upwards of 200 seconds. Is there something that I need to do to optimize the code? Relevant snippets and log follow:

def on_start(self, md, order, service, account):
    t = datetime.now()
    if md.symbol == self.one_symbol:
        service.add_time_trigger(md.market_close_time - service.time_interval(minutes=30),
                                 timer_id="calcAndTrade")
        service.add_time_trigger(md.market_close_time - service.time_interval(minutes=30),
                                 timer_id="modelFit")
        service.add_time_trigger(md.market_open_time, timer_id="riskCalc")
    print(service.time_to_string(service.system_time), " - on_start: ",
          (datetime.now() - t).total_seconds())



def on_timer(self, event, md, order, service, account):
    t1 = datetime.now()
    if event.timer_id == 'calcAndTrade':
        self.calcAndTrade(md, order, service, account)
    elif event.timer_id == 'modelFit':
        self.model_fit(md, service)
    elif event.timer_id == 'riskCalc':
        # 63 days of daily history, then per-symbol std of the 'dp' column
        df = self._getHistoryAll(md, 63, 'daily')
        self.risk_data.append(df.groupby('symbol')['dp'].std())
    print(service.time_to_string(service.system_time), " - ", event.timer_id, ": ",
          (datetime.now() - t1).total_seconds())

2015-01-02 09:30:00.000000 - on_start: 0.001431
2015-01-02 09:30:00.000000 - riskCalc : 3.355212
2015-01-02 15:30:00.000000 - calcAndTrade : 0.00023
Obtained history for 500 symbols, total size = (125567, 11)
After pivoting: (251, 500)
Remaining tickers after filtering for missing data: 500
2015-01-02 15:30:00.000000 - modelFit : 2.521188
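
The per-handler timings in the snippets above are computed by subtracting `datetime.now()` values. For this kind of wall-clock measurement, `time.perf_counter` (a monotonic, high-resolution clock) is the safer stdlib basis. A standalone sketch of the same pattern, with placeholder work and no platform APIs assumed:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Measure elapsed wall time around a block using a monotonic clock.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.6f} s")

# Usage: wrap any handler body, e.g. a stand-in for riskCalc.
with timed("riskCalc"):
    total = sum(i * i for i in range(100_000))  # placeholder work
```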

Comments

  • ptunney Posts: 246

    At a first stab I would say make sure you only subscribe to the services you need...

    https://app.cloudquant.com/#/glossary/145

    Add the following to your on_start:

        service.clear_event_triggers()
        service.add_event_trigger([md.symbol], [])
    

    My suspicion is that your one symbol is quite a busy one and that, running on CQ Elite, you are getting called a lot on other events.

    I would be interested to see the results after shutting them down.

    Excellent post BTW - very detailed.

  • Thanks for the suggestion. These are good to know. Unfortunately, it had absolutely no impact on the observed runtime.

  • ptunney Posts: 246

    You have not said how many symbols you are subscribing to in is_symbol_qualified, and you only show the timing for one symbol.
    I assume from your output that you are looking at all symbols and selecting 500?

    For every backtest day the system will spin up 8,000+ copies of your script, one for each US equity.
    If False is returned from **is_symbol_qualified**, or if a **service.terminate()** command is sent during the trading day, then that script is shut down.

    I would assume you are running the same functions on a number of symbols.

    If you provide me with the unique submission hash from one of your results pages, I can have the developers look to see if they can provide more information.

  • ptunney Posts: 246


    Also, you can schedule a backtest to stop and start multiple times. Just be sure you are not getting into or out of a position at the time you shut it down.

  • I am using the S&P500 list + SPY

    @classmethod
    def on_strategy_start(cls, md, service, account):
        sec_list = service.symbol_list.get_handle('9a802d98-a2d7-4326-af64-cea18f8b5d61')
        cls.universe = ['SPY']
        for md_info in md:
            if service.symbol_list.in_list(sec_list,md_info.symbol):
                cls.universe.append(md_info.symbol)
        cls.one_symbol='SPY'
    
    @classmethod
    def is_symbol_qualified(cls, symbol, md, service, account):
        # Return only the symbols that I am interested in trading or researching.
        return symbol in cls.universe
    

    Thanks for your help with this. I ran a short test a few days long: hash id = 910eb22d49a1f01417c3d54dca7052d1

  • ptunney Posts: 246
    edited November 2017

    Again, I don't want to pry too much into what you are doing, but if you just want to be able to trade the S&P 500 + SPY then you only need:

    @classmethod
    def is_symbol_qualified(cls, symbol, md, service, account):
        sp500=service.symbol_list.in_list(service.symbol_list.get_handle('9a802d98-a2d7-4326-af64-cea18f8b5d61'),symbol)
        return symbol == 'SPY' or sp500

    That will return True if the symbol is SPY or any of the S&P 500 symbols and False for all others, so you will only be running 501 scripts.
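
    Stripped of the platform calls, the qualification check above reduces to a set-membership test. A standalone sketch with a stand-in symbol list (the three tickers are illustrative, not the real S&P 500 list):

```python
# Stand-in for the symbol-list service: a plain set of tickers.
SP500 = {"AAPL", "MSFT", "XOM"}  # illustrative subset, not the real list

def is_symbol_qualified(symbol: str) -> bool:
    # True for SPY or any S&P 500 member, False for everything else,
    # mirroring `symbol == 'SPY' or sp500` in the post above.
    return symbol == "SPY" or symbol in SP500

print(is_symbol_qualified("SPY"))   # True
print(is_symbol_qualified("AAPL"))  # True
print(is_symbol_qualified("ZZZZ"))  # False
```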
  • OK, thanks for the tip. I see you are using a function that checks whether a ticker is in the list. However, the code I am currently using also does that, so it doesn't seem like the problem is that the script is running for 8,000 securities.
    Perhaps, if there were an example that trades the S&P 500 universe and runs much faster than 3-5 minutes a day, I would be able to identify the bottleneck?

  • ptunney Posts: 246
    edited November 2017
    @classmethod
    def is_symbol_qualified(cls, symbol, md, service, account):
        sp500 = service.symbol_list.in_list(service.symbol_list.get_handle('9a802d98-a2d7-4326-af64-cea18f8b5d61'), symbol)
        return symbol == 'SPY' or sp500

    def on_start(self, md, order, service, account):
        order.algo_buy(self.symbol, algorithm="arca_moo_buy", intent="init", order_quantity=1000)
        order.algo_sell(self.symbol, algorithm="arca_moc_sell", intent="none", order_quantity=1000)

    That took 172 seconds when run on 1/3/2017.

    It's a little bit of a cheat but it works.

    Now, there was a delay before it started running.
    We try to use all facilities as much as possible, and there can be up to a five-minute delay before a model starts if everything is at capacity.

    If your model does not hold positions over multiple days, then it can be parallelized on the system. For example...

    I ran the same model for the first six months of 2017:

        Submission 123 new jobs             : 12:05:24
        First job changed to running        : 12:06:37
        Final job completed                 : 12:17:28
        Average time per job from scorecard : 102.05 seconds
        Total job time                      : 00:12:04 / 123 = 5.88 seconds per job

    So during development it can take a little longer to run your single-day jobs but, once you have everything in order, parallelized backtest speeds are fine.

    My own model reports backtest speeds from the scorecard at 2-3 minutes per day. That is for a complex script and hundreds of symbols, but I never hold multi-day positions, so I can parallelize.

  • ptunney Posts: 246
    edited November 2017

    Also...

    Have you considered running the profiler?

    You can profile on up to 5 symbols at once and see where the time is being spent in your code.
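
    The platform's profiler aside, the same question ("where is the time going?") can be answered locally with the stdlib `cProfile`/`pstats` modules. A minimal sketch profiling a hypothetical handler (the `risk_calc` stand-in is not from the original script):

```python
import cProfile
import io
import pstats

def risk_calc():
    # Stand-in for an expensive once-a-day handler such as riskCalc.
    return sum(i * i for i in range(200_000))

prof = cProfile.Profile()
prof.enable()
result = risk_calc()
prof.disable()

# Dump the top functions by cumulative time to see where the time goes.
buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(3)
print(buf.getvalue())
```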

    Note:
    Your method of constructing the S&P 500 list is probably faster than mine.
    Mine does the list check for every one of the 8,000+ symbols;
    you create your list once in on_strategy_start!

    Nice job!
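
    The "build the list once" point generalizes: a membership structure constructed once in on_strategy_start gets checked thousands of times, so it pays to make it a set rather than a list. A standalone sketch of the two lookup approaches, with stand-in symbol names and only indicative timings:

```python
import time

symbols = [f"SYM{i}" for i in range(8000)]                # stand-in for the 8,000+ US equities
universe_list = [f"SYM{i}" for i in range(0, 8000, 16)]   # stand-in 500-name universe
universe_set = set(universe_list)                         # built once; O(1) lookups

t0 = time.perf_counter()
hits_list = sum(1 for s in symbols if s in universe_list)  # list scan: O(n) per lookup
t_list = time.perf_counter() - t0

t0 = time.perf_counter()
hits_set = sum(1 for s in symbols if s in universe_set)    # hash lookup: O(1) per lookup
t_set = time.perf_counter() - t0

print(hits_list, hits_set)  # same membership answer either way
```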
