
Disclaimer

Pairs Selection Pipeline

The previous notebook showed that the performance of the example pair, GLD-GDX, deteriorated over time. In this notebook we try to mitigate that problem by exploring a more dynamic approach to identifying and selecting pairs.

We follow a 4-step process:

  1. Filter the universe of ETFs to those meeting our dollar volume requirements
  2. Run the Johansen test on all possible pairs to identify cointegrating pairs in-sample
  3. Run in-sample backtests on all cointegrating pairs to find the best performing pairs
  4. Select the 5 best performing pairs for an out-of-sample backtest

Variables

Setting the following variables appropriately will allow the remainder of the notebook to be adapted to different universes, liquidity filters, or date ranges.

Step 1: Filter by dollar volume

First, we filter the universe of ETFs to include only securities with an average dollar volume of at least $80 million (USD) in the in-sample period.

117 ETFs meet our threshold.
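As a rough illustration of the filter described above, the sketch below computes average dollar volume (close times volume, averaged over the period) and keeps the symbols that meet the threshold. It uses only the standard library and made-up data; the function names, the input layout, and the sample symbols are assumptions for illustration, not the notebook's actual code, which would typically operate on a pandas DataFrame of prices.

```python
from statistics import mean

MIN_DOLLAR_VOLUME = 80_000_000  # threshold from the text: $80M average dollar volume


def average_dollar_volume(closes, volumes):
    """Mean of daily close * volume over the period."""
    return mean(close * volume for close, volume in zip(closes, volumes))


def filter_by_dollar_volume(prices, min_dollar_volume=MIN_DOLLAR_VOLUME):
    """Return the symbols whose average dollar volume meets the threshold.

    `prices` maps symbol -> (list of closes, list of volumes).
    """
    return sorted(
        symbol
        for symbol, (closes, volumes) in prices.items()
        if average_dollar_volume(closes, volumes) >= min_dollar_volume
    )


# Made-up data for illustration only
sample_prices = {
    "AAA": ([100.0, 102.0, 101.0], [1_000_000, 1_200_000, 900_000]),
    "BBB": ([20.0, 21.0, 19.0], [500_000, 400_000, 600_000]),
}
liquid_symbols = filter_by_dollar_volume(sample_prices)
print(liquid_symbols)
```

Here only the hypothetical symbol AAA clears the $80M threshold; BBB averages roughly $10M per day and is dropped.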

Step 2: Find Cointegrating Pairs

Next, we combine the liquid ETFs into all possible pairs:

This results in 6,786 combinations (117 × 116 / 2).
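Forming all possible pairs can be sketched with `itertools.combinations` from the standard library. The symbols below are placeholders and the helper names are our own; the point is only that n securities yield n × (n − 1) / 2 unordered pairs.

```python
from itertools import combinations


def all_pairs(symbols):
    """Return every unordered pair of distinct symbols."""
    return list(combinations(symbols, 2))


def count_pairs(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


symbols = ["GLD", "GDX", "SPY", "QQQ"]  # illustrative symbols only
pairs = all_pairs(symbols)
print(len(pairs), pairs[0])
```

Because the count grows quadratically with the size of the universe, the liquidity filter in Step 1 has a large effect on how many Johansen tests have to be run.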

Then we use the coint_johansen function from the statsmodels library to identify pairs that cointegrate with at least 90% confidence:

In the get_hedge_ratio method in the Moonshot strategy, we used the Johansen test to obtain hedge ratios but ignored the test statistics and critical values (i.e. we don't actually test for cointegration in the pairs backtest itself). Here, we do the opposite: we don't need the hedge ratios but rather check the test statistics against the critical values to determine if there is cointegration.
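The check described above can be sketched as follows. `coint_johansen` returns trace statistics (`lr1`) and maximum-eigenvalue statistics (`lr2`), together with their critical values (`cvt` and `cvm`), whose columns correspond to the 90%, 95%, and 99% levels. The decision rule shown here (every statistic must meet or exceed its critical value), the `det_order=0` and `k_ar_diff=1` settings, and the function names are assumptions made for this sketch rather than the notebook's confirmed code.

```python
def is_cointegrated(trace_stats, trace_crit_values, eigen_stats, eigen_crit_values):
    """True if every trace and max-eigenvalue statistic meets or exceeds its critical value."""
    return all(s >= c for s, c in zip(trace_stats, trace_crit_values)) and all(
        s >= c for s, c in zip(eigen_stats, eigen_crit_values)
    )


def johansen_pair_is_cointegrated(pair_closes, confidence_level=90):
    """Run the Johansen test on a two-column array of closing prices.

    statsmodels is imported lazily so the decision rule above can be
    used (and tested) without it.
    """
    from statsmodels.tsa.vector_ar.vecm import coint_johansen

    # Columns of cvt/cvm hold the 90%, 95%, and 99% critical values
    col = {90: 0, 95: 1, 99: 2}[confidence_level]
    result = coint_johansen(pair_closes, det_order=0, k_ar_diff=1)
    return is_cointegrated(
        result.lr1, result.cvt[:, col], result.lr2, result.cvm[:, col]
    )


# Hand-picked numbers for illustration; not real test output
print(is_cointegrated([20.1, 4.2], [13.4, 2.7], [15.9, 4.2], [12.3, 2.7]))
print(is_cointegrated([9.8, 1.1], [13.4, 2.7], [8.7, 1.1], [12.3, 2.7]))
```

A looser rule, such as checking only the first trace statistic, would admit more pairs; which rule is appropriate is a judgment call.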

We find there are 81 cointegrating pairs:

Step 3: Run In-Sample Backtests on All Cointegrating Pairs

Having identified all cointegrating pairs, the next step is to run an in-sample backtest on each cointegrating pair. The in-sample backtest period is subsequent to the cointegration test period, but we still consider it in-sample because we will use the performance results from the in-sample backtests to select a portfolio of pairs for out-of-sample testing.

First, we download symbols and names from the securities master database to help us know what we're testing:

To run backtests on so many pairs, we take advantage of Moonshot's ability to set strategy parameters dynamically using the params argument. We use Moonchart to calculate the Sharpe ratios of each pair strategy:

We sort by Sharpe ratio and show the 5 best performers:
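The notebook relies on Moonchart for the Sharpe ratios. Purely to illustrate the ranking step, the sketch below computes an annualized Sharpe ratio from daily returns using the textbook formula (mean over standard deviation, scaled by the square root of 252, with a zero risk-free rate assumed) and keeps the top performers. The return series, pair names, and helper names are made up, and this is not Moonchart's implementation.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    volatility = stdev(daily_returns)
    if volatility == 0:
        return 0.0
    return mean(daily_returns) / volatility * sqrt(periods_per_year)


def best_pairs(returns_by_pair, n=5):
    """Return the n (pair, sharpe) tuples with the highest Sharpe ratios."""
    sharpes = {pair: sharpe_ratio(r) for pair, r in returns_by_pair.items()}
    return sorted(sharpes.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up daily returns for illustration only
returns_by_pair = {
    ("AAA", "BBB"): [0.010, -0.002, 0.004, 0.003],
    ("AAA", "CCC"): [-0.010, 0.002, -0.004, 0.001],
    ("BBB", "CCC"): [0.001, 0.000, 0.002, -0.001],
}
for pair, sharpe in best_pairs(returns_by_pair, n=2):
    print(pair, round(sharpe, 2))
```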

Step 4: Out-of-sample Backtest

Having found the 5 best performing pairs, we create a Moonshot strategy for each pair and run an out-of-sample backtest on the portfolio of pairs.

To avoid a lot of typing, we use the code below to print out the Moonshot subclasses, which we can then copy and paste into pairs.py:
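One way to generate such subclasses is with a string template, as in the sketch below. The base class name `PairsStrategy`, the class naming scheme, the `CODE` format, and the placeholder symbols and security IDs are all assumptions for illustration; adapt them to whatever pairs.py actually defines.

```python
CLASS_TEMPLATE = '''class {class_name}(PairsStrategy):

    CODE = "{code}"
    SIDS = {sids!r}
'''


def render_pair_strategy(symbol_a, symbol_b, sid_a, sid_b):
    """Return Python source for one pair strategy subclass."""
    return CLASS_TEMPLATE.format(
        class_name=f"{symbol_a}_{symbol_b}_Pair",
        code=f"pairs-{symbol_a.lower()}-{symbol_b.lower()}",
        sids=[sid_a, sid_b],
    )


# Placeholder symbols and security IDs for illustration only
best = [("AAA", "BBB", "SID_AAA", "SID_BBB"), ("CCC", "DDD", "SID_CCC", "SID_DDD")]
source = "\n".join(render_pair_strategy(*row) for row in best)
print(source)
```

The printed text is plain Python source, so it only becomes usable once pasted into a module where the base class is defined.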

Having copied the above code into pairs.py, we are ready to run the out-of-sample backtest:

The tear sheet shows good performance in the first two years out of sample, followed by deteriorating performance. This may indicate that we need to re-run the cointegration tests and in-sample backtests on a forward basis and update our portfolio of best pairs every year or two.
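Re-running the pipeline on a forward basis amounts to a walk-forward schedule: a cointegration test period, then an in-sample backtest period, then an out-of-sample period, with the whole window rolled forward each time the portfolio is refreshed. The sketch below generates such a schedule in whole years. The period lengths and the date range are arbitrary placeholders, not settings taken from this notebook.

```python
def walk_forward_windows(first_year, last_year, coint_years, insample_years, oos_years):
    """Yield (cointegration, in-sample, out-of-sample) year ranges, rolled forward.

    Each range is an inclusive (start_year, end_year) tuple. The three periods
    are consecutive and non-overlapping, and each window starts oos_years
    after the previous one.
    """
    start = first_year
    while True:
        coint = (start, start + coint_years - 1)
        insample = (coint[1] + 1, coint[1] + insample_years)
        oos = (insample[1] + 1, insample[1] + oos_years)
        if oos[1] > last_year:
            break
        yield coint, insample, oos
        start += oos_years


# Period lengths here are arbitrary placeholders, not the notebook's settings
for window in walk_forward_windows(2010, 2020, coint_years=3, insample_years=3, oos_years=2):
    print(window)
```

Stepping forward by the out-of-sample length means the out-of-sample periods tile the calendar without gaps or overlap, so their results can be stitched into one continuous track record.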

