Limitations on the amount of sentiment that can be retrieved

I am trying to get a week's worth of sentiment for the S&P 500 universe. Following the original example, I have this code in on_strategy_start:

cls.sources = {'SM_TW': '!sentiment/sma/tw/15min',
               'SM_ST': '!sentiment/sma/st/15min'}
cls.sent_scores = {symbol: np.zeros(3) for symbol in cls.universe}
cls.buzz = {symbol: 0 for symbol in cls.universe}
source = 'SM_TW'
hours = 168
for symbol in cls.universe:
    sent_data = []
    buzz_data = []
    data = service.query_data(cls.sources[source],
                              symbol,
                              start_timestamp=(service.time() - service.time_interval(hours=hours)))
    for item in data:
        if item['s-score'] != 0:
            sent_data.append(item['s-score'])
            buzz_data.append(item['s-buzz'])
    # some extra code that reduces the above arrays down to 3-4 numbers for given symbol
    # and populates the cls.sent_scores and cls.buzz dictionaries

When I run it for one day, after quite a long time I get a console message that simply says "Killed", which presumably means it ran out of memory.
Is there a better way to get weekly sentiment?

Comments

  • ptunneyptunney Posts: 246
    edited January 2019

    As an initial suggestion, without having tested anything, I would say the easiest thing you could do is separate the data sources.
    Why would you combine StockTwits and Twitter?
    Just pull one, or at the most one at a time.
    I would regard them as quite different beasts and both will generate quite different signals.

  • I am only using one source in the above :) (line 5).

  • ptunneyptunney Posts: 246
    edited January 2019

    Unfortunately, at the moment, there is no consistent way to get large amounts of data.
    The issue is pulling random data from a large source in many steps.
    With Liberator we give users the ability to hand a list of symbols to a single query and get a block of results back.
    service.query_data is meant for smaller calls, normally just before a trading decision: "should I get in or not?"
    Most of the sentiment data is a rolling value anyway, so there is no need to pull historical data unless you want to detect the trend.
    We are looking at adding the SMA Twitter and StockTwits data to the Liberator query, but yours was the first use case identified, so there is no option in place at the moment.
    If you pull just one sentiment source (StockTwits or Twitter), for fewer symbols and a shorter period of time, you will probably have success with service.query_data.

    So something like this should probably work as a starting point. Here the query is run in on_strategy_start (ideally it would be a single Liberator query run once in on_strategy_start):

    from cloudquant.interfaces import Strategy
    
    class SMA_on_strat_start(Strategy):
    
        @classmethod
        def on_strategy_start(cls, md, service, account):
            days = 7
            # Look up the symbol-list handle once rather than once per symbol.
            sp100_handle = service.symbol_list.get_handle('5af64352-6bc4-47cf-a73d-09925dab62bb')
            # sp500_handle = service.symbol_list.get_handle('9a802d98-a2d7-4326-af64-cea18f8b5d61')
            sp100 = [item.symbol for item in md if service.symbol_list.in_list(sp100_handle, item.symbol)]
            start_time = md.market_open_time - service.time_interval(hours=24 * days)
            for my_symbol in sp100:
                # One query per symbol; only the record count is printed.
                sma_st = service.query_data('!sentiment/sma/st/15min', my_symbol, start_timestamp=start_time)
                print my_symbol, len(sma_st)
    

    Or, if you prefer to do it in on_start for each individual symbol:

    from cloudquant.interfaces import Strategy
    import traceback
    
    class SMA_on_start(Strategy):
    
        @classmethod
        def is_symbol_qualified(cls, symbol, md, service, account):
            # sp500 = service.symbol_list.in_list(service.symbol_list.get_handle('9a802d98-a2d7-4326-af64-cea18f8b5d61'), symbol)
            sp100 = service.symbol_list.in_list(service.symbol_list.get_handle('5af64352-6bc4-47cf-a73d-09925dab62bb'), symbol)
            return sp100  # or symbol == "SPY"
    
        def on_start(self, md, order, service, account):
            start_time = md.market_open_time - service.time_interval(hours=24 * 7)
            sma_st = service.query_data('!sentiment/sma/st/15min', md.symbol, start_timestamp=start_time)
            # Count the records without keeping a second copy of them.
            counter = 0
            for item in sma_st:
                counter += 1
                # print "sma_st history", str(item),
            print service.time_to_string(service.system_time), self.symbol, "SMA Stocktwits HISTORY", counter
            sma_st = None  # drop the reference once finished so the data can be freed
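
    If all you need in the end is three or four numbers per symbol, as in the original question, you could reduce the records while iterating instead of appending them to lists. The following is only a rough sketch and has not been run on the platform: summarize_sentiment is a hypothetical helper (not part of the CloudQuant API), and it assumes each record can be indexed with 's-score' and 's-buzz' as in the question's code. The sample list is made-up stand-in data.

```python
def summarize_sentiment(records):
    """Reduce an iterable of sentiment records to a few numbers in one pass.

    Assumption: each record supports record['s-score'] and record['s-buzz'],
    as in the question's code. Only running totals are kept, so this function
    adds no per-record storage of its own.
    """
    count = 0
    score_sum = 0.0
    buzz_sum = 0.0
    last_score = 0.0
    for record in records:
        score = record['s-score']
        if score == 0:
            continue  # skip zero scores, as the question's code does
        count += 1
        score_sum += score
        buzz_sum += record['s-buzz']
        last_score = score
    if count == 0:
        return {'count': 0, 'mean_score': 0.0, 'mean_buzz': 0.0, 'last_score': 0.0}
    return {'count': count,
            'mean_score': score_sum / count,
            'mean_buzz': buzz_sum / count,
            'last_score': last_score}


# Stand-in data for illustration only; real records would come from service.query_data.
sample = [{'s-score': 0.5, 's-buzz': 1.0},
          {'s-score': 0.0, 's-buzz': 9.0},
          {'s-score': 1.5, 's-buzz': 3.0}]
summary = summarize_sentiment(sample)
```

    In on_start this would take the place of the counter loop, for example summary = summarize_sentiment(sma_st), after which the query result can be dropped as before. Whether that is enough to stay within memory limits depends on how much data the query itself returns, which this sketch does not change.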
    