

Universe Selection

In this notebook we will use Zipline's Pipeline API to select our daily universe of stocks.

Universe size

Due to the size of the US stock market, universe selection is an important first step for any Zipline strategy that targets US stocks. The limiting factor for universe size is real-time data collection. There are about 8,000 listed US stocks, but your data provider may impose concurrent ticker limits that prevent you from streaming real-time data for all of them at once. Even if your data provider supports unlimited streaming, database size and performance are still important considerations. (See the usage guide for more on this topic.) Therefore, a recommended first task is to use the Pipeline API to screen the entire US stock universe each day and select a smaller universe of candidate stocks for which to request real-time data and make trading decisions.

Pipeline screen rules

Ideally, we would screen for stocks with opening gaps, since those are the stocks we want to trade. This won't work, however, because detecting an opening gap requires knowing the opening price, which in live trading is only available from real-time data once the market has opened. The pipeline screen must therefore use only rules that can be evaluated before the market opens. We will use the following rules:

Test the number of candidate stocks

One benefit of running a pipeline interactively is that we can see how many candidate stocks pass the screen and iteratively refine our rules until the resulting universe is a suitable size.

First, we set the default bundle:

Then, we define a pipeline whose screen is based on the above universe selection rules. We also compute several columns that, while not yet necessary at this exploratory stage, will be useful later in identifying gaps: the moving average, the prior day's low, and the standard deviation of the stock's closing price (which will be used to make sure the gap is sufficiently large).
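To make the three extra columns concrete, here is a minimal, library-free sketch that computes them for a single stock. It is not the notebook's Pipeline definition. The function name `gap_inputs`, the 20-day window, the sample data, and the use of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev

def gap_inputs(closes, lows, window=20):
    """Return (moving average, prior day's low, std dev of closes).

    `closes` and `lows` are daily values ordered oldest to newest and ending
    with the most recent completed session, so every input is known before
    the market opens. The 20-day default window is an assumption.
    """
    if len(closes) < window or not lows:
        raise ValueError("need at least `window` closes and one low")
    recent = closes[-window:]
    # stdev() is the sample standard deviation (an illustrative choice).
    return mean(recent), lows[-1], stdev(recent)

# Hypothetical data: 20 daily closes and lows for one stock.
closes = [100.0 + i for i in range(20)]  # 100.0 through 119.0
lows = [c - 1.5 for c in closes]

mavg, prior_low, std = gap_inputs(closes, lows)
print(mavg, prior_low, round(std, 3))
```

In the strategy itself these values would come from pipeline columns computed for every stock each day. The sketch only shows what each column represents.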

Then we run the pipeline over a representative date range and plot the number of securities that passed the screen each day:
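The counting step can be sketched without any plotting or Zipline dependencies. The example below assumes the screen results are available as (date, symbol) pairs, one per security per day. The helper name `count_per_day` and the sample data are hypothetical.

```python
from collections import Counter
from datetime import date

def count_per_day(passed):
    """Count how many securities passed the screen on each date.

    `passed` is an iterable of (date, symbol) pairs.
    Returns a dict mapping each date to its count, in date order.
    """
    counts = Counter(day for day, _symbol in passed)
    return dict(sorted(counts.items()))

# Hypothetical screen results for two trading days.
passed = [
    (date(2020, 1, 2), "AAA"),
    (date(2020, 1, 2), "BBB"),
    (date(2020, 1, 3), "AAA"),
    (date(2020, 1, 3), "BBB"),
    (date(2020, 1, 3), "CCC"),
]

daily_counts = count_per_day(passed)
for day, n in daily_counts.items():
    print(day, n)

# The peak daily count is the number to compare against a ticker limit.
print("peak:", max(daily_counts.values()))
```

The per-day counts are what would be plotted, and the peak value indicates whether the universe fits within a given real-time data limit.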

If the number of candidate securities is too high for your real-time data limits, you can experiment with the pipeline screen until the number is more suitable.

Also keep in mind that you need not collect real-time data for every security that passes your screen. You could further filter the pipeline output in before_trading_start() and only initiate real-time data collection for this subset of securities. For example, if you wanted to impose a hard 100-ticker limit on real-time data collection, but your pipeline might return more than 100 securities, you could rank the pipeline output by one of your columns and take the top 100:
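A minimal sketch of the rank-and-truncate step follows. It assumes one day's pipeline output has been converted to a list of dicts, one per security. The helper name `top_n_by_column` and the `dollar_volume` column are hypothetical; substitute whichever column you want to rank by.

```python
import heapq

def top_n_by_column(rows, column, n=100):
    """Return the n rows with the largest value in `column`, highest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[column])

# Hypothetical pipeline output for one day: 250 securities passed the screen.
rows = [{"symbol": f"S{i:03d}", "dollar_volume": float(i)} for i in range(250)]

# Keep only the 100 highest-ranked securities for real-time data collection.
selected = top_n_by_column(rows, "dollar_volume", n=100)
print(len(selected), selected[0]["symbol"], selected[-1]["symbol"])
```

If the pipeline output is a pandas DataFrame, its `nlargest` method performs the same selection directly on the chosen column.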

Code Reuse

As an optional step to facilitate code reuse, we copy our pipeline code to pipeline.py, which allows us to import it in other notebooks and in our Zipline strategy without retyping it every time.


Next Up

Part 3: Interactive Strategy Development