## 13.1 – Tracking the pair data

We have finally covered all the background theory required for pair trading. I know most of you have been waiting for this moment ☺

In this last and final chapter of pair trading, we will take up an example of a live trade and discuss factors that influence the trade.

Here is a quick recap of pre-trade theory –

- Basic overview of linear regression and how to perform one
- Linear regression requires you to regress a dependent variable Y on an independent variable X
- The output of linear regression includes the intercept, slope, residuals, standard error, and the standard error of the intercept
- The decision to classify a stock as dependent (Y) or independent (X) depends on the error ratio
- Error ratio is defined as the ratio of the standard error of the intercept to the standard error
- We calculate the error ratio twice, interchanging X and Y. The combination which offers the lowest error ratio defines which stock is assigned as X and which one as Y
- The residuals obtained from the regression should be stationary. If they are stationary, then we can conclude that the two stocks are co-integrated
- If the stocks are cointegrated, then they move together
- Stationarity of a series can be evaluated by running an ADF test
- The ADF value (the p-value of the ADF test) of an ideal pair should be less than 0.05

Over the last few chapters, we have discussed each of these points in great detail. These points help us understand which pairs are worth considering for pair trading. In a nutshell, we take any two stocks (from the same sector), run a linear regression on them, check the error ratio, and identify which stock is X and which is Y. We then run an ADF test on the residuals of the pair. A pair is considered worth tracking (and trading) only if the ADF value is 0.05 or lower. If the pair qualifies, we track the residuals on a daily basis and try to spot trading opportunities.

A pair trade opportunity arises when –

- The residuals hit -2 standard deviations (-2SD). This is a long signal on the pair, so we buy Y and sell X
- The residuals hit +2 standard deviations (+2SD). This is a short signal on the pair, so we sell Y and buy X

Having said that, I generally prefer to initiate the trade when the residuals hit 2.5 SD or thereabouts. Once the trade is initiated, the stop loss is -3 SD for long trades and +3 SD for short trades, and the target is -1 SD and +1 SD for long and short trades respectively. This also means that once you initiate a pair trade, you will have to track the residual value to know where it lies and plan your trades. Of course, we will discuss more on this later in this chapter.
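The entry, stop loss, and target rules above can be written down as a small decision function. This is only a sketch: the function names are made up for illustration, the default thresholds are the 2.5 / 3 / 1 SD levels mentioned above, and the rule of not entering once the residual is already beyond the stop loss is my own assumption.

```python
def entry_signal(z, entry=2.5, stop=3.0):
    """Return 'long', 'short', or None for today's residual z-score.

    Assumption: no fresh entry is taken once z is already beyond the stop.
    """
    if -stop < z <= -entry:
        return "long"    # buy Y, sell X
    if entry <= z < stop:
        return "short"   # sell Y, buy X
    return None


def manage_position(side, z, target=1.0, stop=3.0):
    """Return 'target', 'stop', or 'hold' for an open pair position."""
    if side == "long":
        if z <= -stop:
            return "stop"
        if z >= -target:
            return "target"
    elif side == "short":
        if z >= stop:
            return "stop"
        if z <= target:
            return "target"
    else:
        raise ValueError("side must be 'long' or 'short'")
    return "hold"


print(entry_signal(-2.54))             # long
print(manage_position("long", -2.1))   # hold
print(manage_position("long", -0.9))   # target
```

Keeping entry and exit in separate functions mirrors the workflow: the first is checked once when scanning for trades, the second every time the residual is refreshed.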

## 13.2 – Note for the programmers

In **chapter 11**, I introduced the ‘Pair Data’ sheet. This sheet is an output of the Pair Trading Algo. The pair trading algo basically does the following –

- Downloads the closing prices of the underlying for the last 200 days. You can get these from NSE’s bhavcopy and even automate the download by running a script.
- The list of stocks and their sector classification is prepared beforehand, hence the download is more organized
- Runs a series of regressions and calculates the ‘error ratio’ for each regression. For example, if we are talking about RBL Bank and Kotak Bank, then the regression module would regress RBL (X) and Kotak (Y), and then Kotak (X) and RBL (Y). The combination which has the lowest error ratio is considered and the other combination is ignored
- The ADF test is applied on the residuals of the combination which has the lowest error ratio.
- A report (pair data) is generated with all the viable X-Y combinations, and their respective intercept, beta, ADF value, standard error, and sigma are noted. I know we have not discussed sigma yet, I will shortly.

If you are a programmer, I would suggest you use this as a guideline to develop your own pair trading algo.
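As a starting point, here is a minimal sketch of the regression and error ratio steps in plain Python. The function names, the stock labels, and the price series are all made up for illustration, and the ADF step is left out because it needs a statistics library. Treat it as one possible way to structure the module, not as the algo used to produce the pair data sheet.

```python
import math


def ols(x, y):
    """Regress y on x and return the numbers the pair data report relies on."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    residuals = [yi - (intercept + beta * xi) for xi, yi in zip(x, y)]
    # Standard error of the regression, with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    se_intercept = sigma * math.sqrt(1.0 / n + mean_x ** 2 / sxx)
    return {
        "intercept": intercept,
        "beta": beta,
        "residuals": residuals,
        "sigma": sigma,
        "se_intercept": se_intercept,
        "error_ratio": se_intercept / sigma,
    }


def pick_direction(name_a, prices_a, name_b, prices_b):
    """Try both X/Y assignments and keep the one with the lower error ratio."""
    a_as_x = ols(prices_a, prices_b)  # A is X, B is Y
    b_as_x = ols(prices_b, prices_a)  # B is X, A is Y
    if a_as_x["error_ratio"] <= b_as_x["error_ratio"]:
        return {"x": name_a, "y": name_b, **a_as_x}
    return {"x": name_b, "y": name_a, **b_as_x}


# Hypothetical closing prices, only to exercise the functions.
stock_a = [100, 102, 101, 105, 107, 110, 108, 111]
stock_b = [250, 256, 252, 263, 266, 275, 271, 277]

best = pick_direction("STOCK_A", stock_a, "STOCK_B", stock_b)
print(best["x"], best["y"], round(best["beta"], 3), round(best["error_ratio"], 3))
```

In a full version, the residuals of the chosen direction would then go through the ADF test, and only pairs that pass would be written to the report.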

Anyway, in chapter 11, I had briefly explained how to read the data from the Pair data, but I guess it’s time to dig into the details of this output sheet. Here is the snapshot of the Pair data excel sheet –

Look at the highlighted data. The Y stock is Bajaj Auto and X stock is TVS. Now because this combination is present in the report, it implies – Bajaj as Y and TVS as X has a lower standard error ratio, which further implies that Bajaj as X and TVS as Y is not a viable pair owing to higher error ratio, hence you will not find this combination (Bajaj as X and TVS as Y) in this report.

Along with identifying which one is X and Y, the report also gives you the following information –

- Intercept – 1172.72
- Beta – 2.804
- ADF value – 0.012
- Std_err – -0.77
- Sigma – 103.94

I’m assuming (and hopeful) you are aware of the first three variables i.e intercept, Beta, and ADF value so I won’t get into explaining this all over again. I’d like to quickly talk about the last two variables.

Standard Error (or Std_err) as mentioned in the report is essentially the ratio of today’s residual to the standard error of the residual. Please note, this can get a little confusing because there are two standard errors we are talking about. The second standard error is the standard error of the residual, which is reported in the regression output. Let me explain this with an example.

Have a look at the snapshot below –

This is the regression output summary of Yes Bank versus South Indian Bank. I’ve highlighted standard error (22.776). This is the standard error of the residuals. Do recollect, we have discussed this earlier in this module.

The second highlight is 20.914, which is the residual.

The std_err in the report is simply the ratio –

std_err = Today’s residual / Standard error of the residual

= 20.914 / 22.776

≈ 0.918

Yes, I agree calling this number std_err is not the best choice, but please bear with it for now ☺

This number tells me where today’s residual is positioned in the context of the standard normal distribution, and it is the key trigger for the trade. A long position is initiated if this number reads -2.5 or lower, with -3.0 as the stop loss. A short position is initiated if this number reads +2.5 or higher, with +3.0 as the stop loss. For a long trade, the target is hit when the number rises back to -1; for a short trade, the target is hit when it falls back to +1.

This also means, the std_err number has to be calculated on a daily basis and tracked to identify trading opportunities. More on this in a bit.
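For the programmers, the daily calculation is tiny. The sketch below assumes the residual is defined as Y minus the value predicted by the regression line (intercept + beta * X); the function names are made up, and the only figures taken from the chapter are the residual of 20.914 and the standard error of 22.776 from the Yes Bank and South Indian Bank example.

```python
def todays_residual(y_price, x_price, intercept, beta):
    """Residual of today's prices against the fitted regression line."""
    return y_price - (intercept + beta * x_price)


def std_err(residual, sigma):
    """The report's std_err (z-score): residual over its standard error."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return residual / sigma


# Figures from the Yes Bank / South Indian Bank regression output.
print(round(std_err(20.914, 22.776), 3))  # 0.918
```

Since the intercept, beta, and sigma come from the pair data report, only the two closing prices change from one day to the next.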

The sigma value in the pair data report is simply the standard error of the residual, which in the above case is 22.776.

So now if you read through the pair data sheet, you should be able to understand the details completely.

Alright, let us jump to the trade now ☺

## 13.3 – Live example

I have been running the pair trading algo to look for opportunities, and I found one on 10th May 2018. Here is the snapshot of the pair data; you can download the same towards the end of this chapter. Do recollect, this pair data was generated using the closing prices of 10th May.

Look at the data highlighted in red. This is Tata Motors Ltd as Y (dependent) and Tata Motors DVR as X (independent).

The ADF value reads 0.0179 (less than the threshold of 0.05), and I think this is an excellent ADF value. Do recollect, an ADF value of less than 0.05 indicates that the residual is stationary, which is exactly what we are looking for.

The std_err reads -2.54, which means the residual has diverged sufficiently away from the mean, and therefore one can look at setting up a long trade. Since this is a long trade, one is required to buy the dependent stock (Tata Motors) and short the independent stock (Tata Motors DVR). This trade was supposed to be taken on the morning of 11th May (Friday), but for some reason, I was unable to place the trade. However, I did take the trade on 14th May (Monday) morning at a slightly worse rate. Nevertheless, the intention was to showcase the trade and not really chase the P&L.

Here are the trade execution details –

You may have two questions at this point. Let me list them for you –

**Question** – Did I actually execute the trade without checking the prices? As in, I didn’t even look at what price the stocks were trading at, or at support, resistance, RSI, etc. Is that not required?

**Answer** – No, none of that is required. The only thing that matters is where the residual is trading, which is exactly what I looked for.

**Question –** On what basis did I choose to trade 1 lot each? Why can’t I trade 2 lots of TM and 3 lots of TMD?

**Answer** – Well, this depends on the beta of the pair. We use the beta to identify the number of shares of X and Y that keeps the position **beta neutral**. Beta neutrality states that for every 1 share of Y, we need beta shares of X. For example, for Tata Motors (Y) and Tata Motors DVR (X), the beta is 1.59. This means that for every 1 share of Tata Motors (Y), I need 1.59 shares of Tata Motors DVR (X).

Going by this proportion, the lot size of Tata Motors (Y) is 1500, so we need 1500*1.59 or 2385 shares of Tata Motors DVR (X). The lot size of Tata Motors DVR is 2400, quite close to 2385, hence I decided to go with 1 lot each. But I’m aware this trade is slightly skewed towards the short side, since I’m shorting an additional 15 shares of Tata Motors DVR.
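The beta neutrality arithmetic is easy to script. The sketch below uses the lot sizes and beta quoted above. The function name and the rule of rounding X to the nearest whole lot are my own illustrative choices, not something the pair data sheet prescribes.

```python
def beta_neutral_size(y_lot, x_lot, beta, y_lots=1):
    """Size the X leg so that it holds roughly beta shares per share of Y."""
    if beta <= 0:
        raise ValueError("this sizing only makes sense for a positive beta")
    y_shares = y_lots * y_lot
    x_required = y_shares * beta
    # Assumption: round to the nearest whole futures lot, with at least one lot.
    x_lots = max(1, round(x_required / x_lot))
    x_shares = x_lots * x_lot
    return {
        "y_shares": y_shares,
        "x_required": x_required,
        "x_lots": x_lots,
        "x_shares": x_shares,
        "mismatch": x_shares - x_required,  # positive: X leg is oversized
    }


size = beta_neutral_size(y_lot=1500, x_lot=2400, beta=1.59)
print(round(size["x_required"]), size["x_lots"], round(size["mismatch"]))  # 2385 1 15
```

The `mismatch` field makes the skew explicit, so you can decide whether it is small enough to ignore or needs to be corrected with a few shares in the spot market.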

Also, please note, because of this constraint, we cannot really trade pairs if the beta is –ve, at least, not always.

Remember, I initiated this trade when the residual value was -2.54. The idea was to keep the position open and wait until either the target (-1 on the residual) or the stop loss (-3 on the residual) was hit. Until then, it was just a waiting game.

To track the position live, I’ve developed a basic excel tracker. Of course, if you are a programmer, you can do much better with these accessories, but given my limited abilities, I put up a basic position tracker in excel. Here is the snapshot; you can download this sheet from the link posted below.

The position tracker has all the basic information about the pair. I’m guessing this is a fairly easy sheet to understand. I’ve designed it in such a way that upon entering the current values of X & Y, the latest z-score is calculated along with the P&L. I’d encourage you to play around with this sheet, even better if you can build one yourself ☺
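If you would rather build the tracker in code, a minimal sketch could look like the one below. The class and field names are made up. The beta and lot sizes are the ones from this trade, but the intercept, sigma, and all prices are hypothetical placeholders, so the printed numbers do not correspond to the actual trade.

```python
from dataclasses import dataclass


@dataclass
class PairPosition:
    side: str         # "long" = long Y / short X, "short" = short Y / long X
    intercept: float
    beta: float
    sigma: float      # standard error of the residuals
    y_qty: int
    x_qty: int
    y_entry: float
    x_entry: float

    def z_score(self, y_now, x_now):
        """Latest z-score of the residual, using the stored regression values."""
        residual = y_now - (self.intercept + self.beta * x_now)
        return residual / self.sigma

    def pnl(self, y_now, x_now):
        """Combined P&L of both legs at the current prices."""
        direction = 1 if self.side == "long" else -1
        y_leg = direction * (y_now - self.y_entry) * self.y_qty
        x_leg = -direction * (x_now - self.x_entry) * self.x_qty
        return y_leg + x_leg


# Hypothetical intercept, sigma, and prices, for illustration only.
pos = PairPosition(side="long", intercept=10.0, beta=1.59, sigma=5.0,
                   y_qty=1500, x_qty=2400, y_entry=315.5, x_entry=200.0)

print(round(pos.z_score(315.5, 200.0), 2))  # z-score at entry
print(round(pos.z_score(320.0, 198.0), 2))  # z-score at the latest prices
print(round(pos.pnl(320.0, 198.0), 2))      # P&L at the latest prices
```

Note that the regression values are frozen at entry here, which is the same simplification the excel tracker makes.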

Once the position is taken, all one has to do is track the z-score of the residual. This means you have to keep tracking the current values and the respective z-scores. This is exactly what I did. In fact, for the sake of this chapter, my colleague, Faisal, logged all the values (except for the 14th and 15th). Here are the logs –

As you can see, the current values were tracked and the latest z-score was calculated several times a day. The position was open for nearly 7 trading sessions, and this is quite common with pair trading. I’ve experienced positions that were open for nearly 22-25 trading sessions. But here is the thing – as long as your math is right, you just have to wait for the target or SL to trigger.

Finally, on the morning of 23rd May, the z-score reverted to the target level and there was a window of opportunity to close this trade. Here is the snapshot –

Notice that the gains in Tata Motors DVR are much larger than the loss in Tata Motors. In fact, when we take the trade, we never know which of the two positions will make us the money. The idea, however, is that one of them will move in our favor and the other won’t (or may). It is just not possible to identify in advance which one will be the breadwinner.

The position tracker for the final day (23rd May) looked like this –

The P&L was roughly Rs.14,000/-, not bad I’d say for a relatively low-risk trade.☺

## 13.4 – Final words on Pair Trading

Alright guys, over the last 13 chapters, we have discussed everything I know about pair trading. I personally think this is a very exciting way of trading compared to blind speculative trading. Although less risky, pair trading has its own share of risk and you need to be aware of it. One of the common ways to lose money is when the pair continues to diverge after you initiate the position, leaving you with a deep loss. Further, the margin requirements are slightly higher since there are two contracts you are dealing with. This also means you need to have some buffer money in your account to accommodate daily M2M.

There could be situations where you will need to take a position in the spot market as well. For example, on 23rd May, there was a signal to go short on Allahabad Bank (Y) and long on Union Bank (X). The z-score was 2.64 and the beta for this pair is 0.437.

Going by beta neutrality, for every 1 share of Allahabad Bank (Y), I need 0.437 shares of Union Bank (X). The lot size of Allahabad Bank is 10,000, which implies I need to buy 4370 shares of Union Bank. However, the lot size of Union Bank is 4000, hence I had to buy the remaining 370 shares in the spot market.
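The split between futures and spot can be worked out in a couple of lines. This sketch uses the beta of 0.437 and the lot sizes quoted above, which gives 4,370 shares of Union Bank; the function name is made up for illustration.

```python
def split_futures_and_spot(required_shares, lot_size):
    """Return (whole futures lots, leftover shares to take in the spot market)."""
    return divmod(required_shares, lot_size)


required = round(10_000 * 0.437)  # shares of X per 1 lot of Y
lots, spot_shares = split_futures_and_spot(required, 4000)
print(required, lots, spot_shares)  # 4370 1 370
```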

Well, I hope the trade is successful ☺

I know most of you would want the pair data sheet made available. We are working on making this sheet available to you on a daily basis so that you can track the pairs. Meanwhile, I would suggest you try and build this algo yourself. If you have concerns, please post it below and I will be happy to assist.

If you don’t know how to program then you have no option but to find someone who knows programming and convince him or her that there is money to be made, this is exactly what I did ☺

Lastly, I would like to leave you with a thought –

- We run a linear regression of Stock A with Stock B to figure out if the two stocks are cointegrated with their residuals being stationary
- What if the residuals of Stock A with Stock B are not stationary, but the residuals of Stock A with Stocks B and C as a combined entity are?

Beyond pair trading lies something called multivariate regression. By no stretch of the imagination is this easy to understand, but let me tell you, if you can graduate to this arena, the game is different.

Download the Position Tracker and Pair Datasheet below:

### Key takeaways from this chapter

- The trigger to trade a pair comes from the residual’s current value
- Check for beta neutrality of the pair to identify the number of shares required in X and Y
- If the beta of the pair is negative, then it may not be possible to set up the trade
- Once the trade is initiated, track the z-score movement to know where the position currently stands
- The price of the futures does not really matter, the emphasis is only on the z-score

Hey Karthik,

how do we position size the trade? Is it possible for us to know the amount we would lose if the trade goes wrong?

You will have to go by beta neutrality to position size this trade. Have explained this in the chapter.

It is hard to know how much you may lose in advance as the trade is on the z-score.

hey Karthik,

I understood the beta neutrality part of position sizing. My question is this; let me take an example from above. Let’s say 1 lot of Tata Motors is equal to 1 lot of Tata Motors DVR. In this case, if I want to trade 2 lots of each scrip, how do I know if it is safe for me to take 2 lots, assuming I have the funds to take the position? That’s why I wanted to know if there is any way to know the amount of loss we would suffer in case the SL hits. Assume I am willing to risk 5% of my account in one trade. If trading 1 lot of both has a SL at 2.5%, then I can probably do 2 lots, right? Is there any rough approximation for finding out the loss? I am not worried about winning because we are entering at the 2SD level.

Also, in the trade you mentioned above, you entered the Tata Motors and Tata Motors DVR trade on the 14th and took profit on the 23rd. Assume you had not got the profit signal and the trade was still open, and there was a quarterly result announcement on the 24th. What do you recommend we do? Should we close the trade or keep it open in a situation like this?

Thank you.

I get your concern, Nikhil. Unfortunately, there is no way this can translate to a Rupee value. Or maybe I need to do some research, I will certainly get back to you on this. But here is what I have observed, whenever this position makes a loss (assuming its initiated at 2.5SD), it is usually in the range of 10-12K per beta neutral pair. This is purely from my observation and no concrete science backing this.

Thanks Karthik.

What about the 2nd question on result announcement part?

Hello sir

I have a doubt. Are the regression parameters “static” or “dynamic” in your basic excel tracker sheet? As we know these estimators (coefficients, std. error….) have variance. Everytime we regress the securities the values of the estimators get varied. So taking the parameters of the eqn. y=mx+c as constant and calculating Z-Score based on earlier calculated estimators might give some error, i guess.

What is your view, sir?

Thank you very much for this most awaited chapter, sir!

Varsity student

You are right on that. Given this, you may also want to look at the variance or standard error of the intercept as well. However, these trades are open for only a few days, so unless there is a drastic move in the share price, you will not experience big changes in the variables.

When we run linear regression using python statsmodels library we don’t get STD error of residuals in the output. Is there any formula to calculate the STD error using residual values?

You actually don’t need the standard error of residuals, you only need the residuals. The standard deviation of the residuals is the standard error of residuals.
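A minimal sketch of that calculation in plain Python is below. One detail worth knowing: a regression summary usually reports the standard error with n - 2 degrees of freedom (two fitted parameters, the intercept and the slope), while a plain sample standard deviation divides by n - 1, so the two differ slightly on short series. The function name and the residual values are made up for illustration. As far as I know, a fitted statsmodels OLS result exposes the residuals as `resid` and the residual mean square as `mse_resid`, whose square root is this same number.

```python
import math
import statistics


def residual_standard_error(residuals, n_params=2):
    """Standard error of the residuals with n - n_params degrees of freedom."""
    n = len(residuals)
    if n <= n_params:
        raise ValueError("not enough residuals for the degrees of freedom")
    return math.sqrt(sum(r * r for r in residuals) / (n - n_params))


# Hypothetical residuals that sum to zero, as regression residuals do.
residuals = [1.5, -2.0, 0.5, -1.0, 1.0]

rse = residual_standard_error(residuals)
sample_sd = statistics.stdev(residuals)  # divides by n - 1
print(round(rse, 4), round(sample_sd, 4))
```

With 200 data points the gap between the two is negligible, so either works for computing the z-score.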

How to calculate z score

You can divide the residual by standard error of residual to get the Z-score.

Sir, what is the use of the Z-score? I am not getting it.

Z-Score tells you where the variable is with respect to its mean. For example, a Z-score of 2.5 indicates that the variable is 2.5 standard deviations away from the mean.

Thanks a lot sir

Welcome!

Can this method be used in the cash market as well, or only in futures? I have less capital, and for trial purposes I would like to pair small quantities.

It’s best if you do it in futures because the strategy requires you to short stocks as well, and carrying a short position overnight is possible only in futures.

Hello sir.. was eagerly waiting for it. A big thanks. 👍

Happy learning, Thirumal!

Thanks for educating us sir 🙂 . It would be of great help if you could share/made available the pair data with ‘P’ values on weekly basis.

Thanks for everything again.

I’m trying to figure a way out to share the excel sheet on a daily basis. Hopefully it should be possible 🙂

Hello sir

When i regressed TM(Y) & TMD(X) over one year data set(249 data), it didn’t show any trading opportunity on 11/05/18. While comparing the two regression analysis i noticed that the regression done on 200 set of data generated better results than that of one year data analysis. Also the ADF value was lesser in the former(200 day data) regression.

1) Is 200 day data optimum for data analysis? Can we regress 100 set of data too?

2) The ADF value I got after regressing the same set of data as done here on Varsity is 0.0567, while your value is 0.017. I used both the Schwarz info criterion and the Akaike info criterion with a lag length of 14 (it was automatic). Which criterion should I use, and what should be the optimum lag length to run the ADF test? I had asked this question almost 3 weeks ago and you told me to wait.

Thank you, sir 🙂

Varsity student

1) I remember running some sort of optimization years ago and realized 200-day data is a sweet spot, sticking to it since then

2) The lag is usually set to the square root of the number of data points…so a lag of 14 is good (sqr root of 200). Btw, I’m really not sure why there is a difference in ADF values.

Hi karthik,

Had one question. If Lot size of X is , say 800, and Y is 1000.

From Linear Reg. slope comes out to be very low, say 0.001.

Then how do we calculate hedge ratio in this scenario?

This would be tricky as the beta is near zero. You cannot really trade this pair.

Karthik,

In your live trade example, the trade was open for almost 7 days. How is it that the institutional traders did not spot this opportunity to jump in and close the divergence more quickly? Am I missing something?

Thanks and regards,

Samir

Samir, as retail traders, we have the advantage of liquidity (or maybe illiquidity). Institutions want to trade a few hundred lots with low slippage… a couple of lots here and there won’t matter to them, but it would make a big difference to an individual.

hii

I am new to trading and not so familiar with the ADF test, but I have a software package known as EViews which can run the ADF test. I don’t know how to run it or how to enter the data in the software; if someone knows, please let me know.

Thnx in advance..

Sir, have you tried pair trading using 2 ATM options? Since historically option premiums data are not available I couldn’t test the effectiveness of it.

No, have not really tried pair trading with options. There would be other dynamics when you trade options.

Hi karthik sir,

Thanks for your lessons on pair trading. I want to know is it possible to do pair trade in commodities as well like GOLD:SILVER, COPPER:NICKEL, LEAD:ALLUMINIUM?

Yes, you certainly can.

Dear karthik,

There is one website, pairtradepro. Com which is giving pair trade signals.

Can you say anything about it?

I’ve never really tried it, Ashwin. Hence cant comment on it.

I was testing my algo on the trade that you took on 23rd May

x = UnionBank

y = ALBK

beta = 0.437

for that trade I am getting adf_p_value = 0.3637

am I missing something here sir?

also sir for TATAMOTORS–TATAMTRDVR trade what was your look back period sir? because for that also my backtest values are not matching

Look back period is 200 days.

but I am gettings same backtest value for ICICI HDFC price data that was provided in the earlier chapters sir, is it possible for you to share closing price data for TATAMOTORS–TATAMTRDVR, I think that would help me a lot.

Mani, go back 200 trading days from 10th May, that would be my closing price data. Also, this data was obtained from NSE bhav copy.

What lag did you select for the ADF test, Mani? I think the algo we use has a lag of 14.

Look back period is 200 days.

Sir what I found is that there is a slight change in beta = 1.67, because of which intercept = 21.96, residual=-19.21 and sigma=5.82 is also different. Because of all these changes adf is different

Value that I got from my algo is same as excel output, so I guess I can continue.

Is my conclusion good enough sir?

Hmm, maybe Mani. Request you to check this across multiples examples to be doubly sure.

Hello sir

When I run the regression b/w Unionbank & ALBK for past 200 days beginning from 2nd of Aug’17, the trading signal generated on 22nd May. The important parameters are as follows:

Unionbank(X) and ALBK(Y)

BETA: 0.437861306

INTERCEPT: 7.503194585 (19% of ALBK price)

STD_ERR: 2.714924731

Z-SCORE : -2.64

ADF: 0.0601(Lag length 14,Schwarz info criterion)

The trading signal was generated on 22nd May to go long on ALBK and short on Unionbank. Oh my God! It is just opposite of what you traded. I have cross checked my regression analysis. Please help me, sir.

Thanks

Kumar, you are right. I’m guilty of an error in the trade information that I put up. I misread this as 2.64 instead of -2.64, but I think I lucked out because the Z-score expanded over the next few days and I was actually in profit when I realized my mistake.

Btw, great job on this one 🙂

Sir here kumar used price data from 2nd of Aug’17 to 22nd May’18, if so doesn’t it workout to be more than 200 days? I think I am missing something sir?

I guess it’s around 200, need to check the data points, Mani.

I get it sir, trading days.

I made a mistake in my code that calculates number of trading days sir. So was using less data points.

Glad you could rectify it.

Thank you, sir 🙂

While updating the pair data sheet, luckily i got the following trading signal on 4th June’18.

BANKBARODA(Y) & INDIANB(X) (200 trading day data)

Beta= 0.272

Intercept= 61.9 (45% of Y)

Z-Score= -2.89

ADF= 0.0311(lag length 14)

Lot size Baroda=4000, Lot size Indianbank=2000

If i take 8000 Baroda then according to the eqn the lot size of Indianbank should be 2176.

What is your view, sir?

Except for the intercept, everything is convincing enough 🙂

Did you take the trade? If not, at least track it on paper and please share the results here. I’m not tracking this.

Hello sir

I didn’t take the trade but i am tracking it on paper & definitely will share it here. Btw, its my first pair trade though it is on paper 🙂

I have a question, sir. When i checked the 100 trading day data for the same pair, the ADF value was 0.5135 and Zscore was -3.58. Is it convincing that we should check this way or should stick with 200 data set?

Thank you 🙂

I remember doing some optimization on look-back period and if I remember right, I figured 200 days is the best look back period. You may want to do this yourself once 🙂

I have taken 2 lots (8000) of Baroda and 1 lot (2000) of Indianbank. But for 8000 of Baroda the lot size of Indianbank should be 2176 or for 2000 of Indian bank the Baroda should be 7352.

Thanks

Sure, as long as you ensure its beta neutral.

While optimizing lookback, is net profit only criteria or any other factors need to be considered sir?

You can optimize this across any criteria – frequency of signals, profitability, risk, ADF etc

Hello sir,

can adf p_value be used for any timeseries data to confirm its normal distribution property sir, does it have any limitations?

Yes, it can be applied on any series on which you want to check the stationarity.

if a time series data passes stationarity test, does it mean that it is normally distributed?

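Not necessarily. Stationarity only tells you that properties such as the mean and variance of the series do not change over time; it does not by itself imply that the series is normally distributed.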

ok sir thanks

Ohh beta “neutrality”, I have been wracking my brains to understand what beta natality could be, please fix it, sir. Also, could you share the updated pair trading sheet?

Beta is a measure of volatility between two stocks, so when you beta neutral two stocks, you are essentially trying to minimize the volatility between them.

I meant its been misspelled so I was trying to figure out what beta natality could be…also where can we find the bi-weekly update to the pairs data?

I get it now. Will make the change 🙂

I’ll have the sheet updated tomorrow. Thanks.

Sir, what is the use of sigma?

With due permission from Karthik sir, I will try to explain sigma,

Sigma is the standard deviation of residual, which we get while we hedge Y in terms of X.

Hope I am right!!!

That’s perfect, Mani!

Thanks Mani..

Sigma tells you the standard deviation of the residual.

Thanks a lot sir

Welcome!

Great chapter, really enjoyed it. Also are you going to be giving the pair data sheet?

Will try and upload the sheet today.

Hi Sir,

Could you please provide the data excel for a single pair from the ‘Pair-Data.csv’ . For example please provide the data excel for Hero.MotoCorp.Ltd as YStock and Bajaj.Auto.Ltd as XStock. So that the values (intercept, beta, adf_test_P.val etc) generated at my end

can be verified against the Pair-Data.csv

That would be difficult, Manoj. I’m trying to set up a daily source of pair data to be made available on the site. This may take some time.

Hi Sir,

Ok, understood. In that case could you please run the adf test (using your plugin) against the ‘Chapter-10_Residuals.xlsx’ excel file from chapter 10 and provide the intercept, beta, adf_test_P.val etc. So that I can check the values generated at my side against the values generated using your plugin. I need to verify this only once to make sure that the logic at my side is correct.

Sure, Manoj. I will try and do that over the weekend.