## 13.1 – Tracking the pair data

We have finally covered all the background theory required for pair trading. I know most of you have been waiting for this moment ☺

In this final chapter on pair trading, we will walk through a live trade and discuss the factors that influence it.

Here is a quick recap of pre-trade theory –

- Basic overview of linear regression and how to perform one
- Linear regression regresses a dependent variable Y against an independent variable X
- The output of linear regression includes the intercept, slope, residuals, standard error, and the standard error of the intercept
- Whether a stock is classified as dependent (Y) or independent (X) depends on the error ratio
- The error ratio is defined as the standard error of the intercept divided by the standard error
- We calculate the error ratio twice, interchanging X and Y. The combination with the lower error ratio defines which stock is X and which one is Y
- The residuals obtained from the regression should be stationary. If they are stationary, then we can conclude that the two stocks are co-integrated
- If the stocks are cointegrated, then they move together
- Stationarity of a series can be evaluated by running an ADF test
- The ADF value (the p-value of the ADF test) of an ideal pair should be less than 0.05

Over the last few chapters, we have discussed each point in great detail. These points help us understand which pairs are worth considering for pair trading. In a nutshell, we take any two stocks (from the same sector), run a linear regression on them, check the error ratio, and identify which stock is X and which is Y. We then run an ADF test on the residuals of the pair. A pair is considered worth tracking (and trading) only if the ADF value is 0.05 or lower. If the pair qualifies, we track the residuals on a daily basis and try to spot trading opportunities.
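The pair-selection steps above can be sketched in a few lines of Python. This is a minimal, stdlib-only sketch that assumes two equally long lists of daily closing prices; `ols` and `choose_x_y` are hypothetical helper names, not the algo behind the Pair Data sheet, and the ADF test on the chosen residuals is left out.

```python
import math
from statistics import fmean


def ols(x, y):
    """Simple OLS of y on x.

    Returns intercept, slope (beta), residuals, the standard error of the
    regression (n - 2 degrees of freedom, as in Excel's regression output)
    and the standard error of the intercept.
    """
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    std_err = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    std_err_intercept = std_err * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return intercept, slope, residuals, std_err, std_err_intercept


def choose_x_y(name_a, prices_a, name_b, prices_b):
    """Regress each stock on the other and keep the combination whose
    error ratio (std error of intercept / std error) is lower."""
    best = None
    for x_name, x, y_name, y in [(name_a, prices_a, name_b, prices_b),
                                 (name_b, prices_b, name_a, prices_a)]:
        intercept, beta, residuals, std_err, se_intercept = ols(x, y)
        error_ratio = se_intercept / std_err
        if best is None or error_ratio < best["error_ratio"]:
            best = {"x": x_name, "y": y_name, "intercept": intercept,
                    "beta": beta, "sigma": std_err,
                    "error_ratio": error_ratio, "residuals": residuals}
    return best
```

The returned residuals are what you would then feed to an ADF test; the intercept, beta and sigma are the same quantities that appear as columns in the pair data report.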

A pair trade opportunity arises when –

- The residuals hit -2 standard deviations (-2SD). This is a long signal on the pair, so we buy Y and sell X
- The residuals hit +2 standard deviations (+2SD). This is a short signal on the pair, so we sell Y and buy X

Having said so, I generally prefer to initiate the trade when the residuals hit 2.5SD or thereabouts. Once the trade is initiated, the stop loss is -3SD for long trades and +3SD for short trades, and the target is -1SD and +1SD for long and short trades respectively. This also means that once you initiate a pair trade, you will have to track the residual value to know where it lies and plan your trades accordingly. Of course, we will discuss more on this later in this chapter.
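The entry, stop-loss and target levels can be written down as a small decision function. This is a sketch only: `pair_action` is a hypothetical helper, `z` is today's residual divided by the standard error of the residual, and the 2.5/3/1 levels are the ones described above, passed in as parameters so you can experiment with them.

```python
def pair_action(z, position=None, entry=2.5, stop=3.0, target=1.0):
    """Suggest an action for today's z-score of the residual.

    position: None (flat), "long" (long Y, short X) or "short" (short Y, long X).
    """
    if position is None:
        if z <= -entry:
            return "enter long: buy Y, sell X"
        if z >= entry:
            return "enter short: sell Y, buy X"
        return "no trade"
    if position == "long":
        if z <= -stop:          # residual kept diverging
            return "exit long: stop loss"
        if z >= -target:        # residual reverted towards the mean
            return "exit long: target"
        return "hold long"
    if position == "short":
        if z >= stop:
            return "exit short: stop loss"
        if z <= target:
            return "exit short: target"
        return "hold short"
    raise ValueError(f"unknown position: {position!r}")


print(pair_action(-2.54))  # enter long: buy Y, sell X
```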

## 13.2 – Note for the programmers

In **chapter 11**, I introduced the ‘Pair Data’ sheet. This sheet is an output of the Pair Trading Algo. The pair trading algo basically does the following –

- Downloads the closing prices for the last 200 trading days of each underlying. You can get these from NSE’s bhavcopy; in fact, you can automate the download with a script.
- The list of stocks and their sector classification is prepared in advance, so the download is more organized
- Runs a series of regressions and calculates the ‘error ratio’ for each regression. For example, for RBL Bank and Kotak Bank, the regression module regresses once with RBL as X and Kotak as Y, and once with Kotak as X and RBL as Y. The combination with the lower error ratio is kept and the other is ignored
- Applies the ADF test to the residuals of the combination with the lower error ratio
- Generates a report (pair data) listing all the viable X-Y combinations along with their intercept, beta, ADF value, standard error, and sigma. I know we have not discussed sigma yet; I will shortly.
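The "organized download" step boils down to pairing stocks only within their own sector. A minimal sketch, assuming a hypothetical `{stock: sector}` mapping (the names and sector labels below are illustrative, not NSE's classification):

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(sector_of):
    """sector_of maps stock name -> sector.

    Returns (sector, stock_1, stock_2) for every unordered pair of stocks in
    the same sector. Each pair is later regressed twice (X and Y swapped), and
    only the combination with the lower error ratio goes to the ADF step.
    """
    by_sector = defaultdict(list)
    for stock, sector in sector_of.items():
        by_sector[sector].append(stock)
    pairs = []
    for sector in sorted(by_sector):
        for a, b in combinations(sorted(by_sector[sector]), 2):
            pairs.append((sector, a, b))
    return pairs


# Hypothetical sector map for illustration only
sectors = {"Kotak Bank": "Banks", "RBL Bank": "Banks", "Yes Bank": "Banks",
           "Bajaj Auto": "Auto", "TVS": "Auto"}
pairs = candidate_pairs(sectors)
print(len(pairs), "pairs ->", 2 * len(pairs), "regressions")
# prints: 4 pairs -> 8 regressions
```

Restricting pairs to a sector keeps the number of regressions manageable: it grows with the square of the sector size rather than the square of the whole stock list.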

If you are a programmer, I would suggest you use this as a guideline to develop your own pair trading algo.

Anyway, in chapter 11, I had briefly explained how to read the data from the Pair data, but I guess it’s time to dig into the details of this output sheet. Here is the snapshot of the Pair data excel sheet –

Look at the highlighted data. The Y stock is Bajaj Auto and the X stock is TVS. Because this combination is present in the report, it implies that Bajaj as Y and TVS as X has the lower error ratio. This in turn implies that Bajaj as X and TVS as Y is not a viable pair owing to its higher error ratio, so you will not find that combination in this report.

Along with identifying which one is X and Y, the report also gives you the following information –

- Intercept – 1172.72
- Beta – 2.804
- ADF value – 0.012
- Std_err – -0.77
- Sigma – 103.94

I’m assuming (and hoping) you are aware of the first three variables, i.e., intercept, beta, and ADF value, so I won’t explain them all over again. I’d like to quickly talk about the last two variables.

Standard Error (or std_err), as mentioned in the report, is essentially the ratio of today’s residual to the standard error of the residual. This can get a little confusing because there are two standard errors we are talking about. The second one is the standard error of the residual, which is reported in the regression output. Let me explain this with an example.

Have a look at the snapshot below –

This is the regression output summary of Yes Bank versus South Indian Bank. I’ve highlighted standard error (22.776). This is the standard error of the residuals. Do recollect, we have discussed this earlier in this module.

The second highlight is 20.914, which is the residual.

The std_err in the report is simply a ratio of –

Today’s residual / Standard Error of the residual

= 20.914/22.776

≈ 0.9182

Yes, I agree calling this number std_err is not the best choice, but please bear with it for now ☺

This number tells me where today’s residual lies relative to its distribution, measured in standard deviations from the mean. It is the key trigger for the trade. A long position is initiated if this number is -2.5 or lower, with -3.0 as the stop loss. A short position is initiated if this number reads +2.5 or higher, with the stop loss at +3.0. For a long position, the target is hit when the number rises to -1 or higher; for a short position, the target is hit when it falls to +1 or lower.

This also means, the std_err number has to be calculated on a daily basis and tracked to identify trading opportunities. More on this in a bit.

The sigma value in the pair data report is simply the standard error of the residual, which in the above case is 22.776.
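Putting the two together, the daily std_err is just today's residual divided by sigma. A minimal sketch, where `todays_std_err` is a hypothetical helper and the prices and parameters in the usage line are made up to keep the arithmetic easy:

```python
def todays_std_err(y_price, x_price, intercept, beta, sigma):
    """Return today's residual and the report's 'std_err' (residual / sigma).

    intercept, beta and sigma come from the pair data report; y_price and
    x_price are today's closing prices of the Y and X stocks.
    """
    residual = y_price - (intercept + beta * x_price)
    return residual, residual / sigma


# Hypothetical prices and parameters, chosen only to make the arithmetic easy
residual, z = todays_std_err(y_price=200.0, x_price=100.0,
                             intercept=10.0, beta=2.0, sigma=5.0)
print(residual, z)  # -10.0 -2.0

# The Yes Bank numbers: residual 20.914 over standard error 22.776
print(round(20.914 / 22.776, 4))  # 0.9182
```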

So now if you read through the pair data sheet, you should be able to understand the details completely.

Alright, let us jump to the trade now ☺

## 13.3 – Live example

I have been running the pair trading algo to look for opportunities, and I found one on 10th May 2018. Here is the snapshot of the pair data; you can download the same towards the end of this chapter. Do recollect, this pair data was generated using the closing prices of 10th May.

Look at the data highlighted in red. This is Tata Motors Ltd as Y (dependent) and Tata Motors DVR as X (independent).

The ADF value reads 0.0179 (less than the threshold of 0.05), and I think this is an excellent ADF value. Do recollect, an ADF value of less than 0.05 indicates that the residual is stationary, which is exactly what we are looking for.

The std_err reads -2.54, which means the residual has diverged sufficiently away from the mean, and therefore one can look at setting up a long trade. Since this is a long trade, one is required to buy the dependent stock (Tata Motors) and short the independent stock (Tata Motors DVR). This trade was supposed to be taken on the morning of 11th May (Friday), but for some reason, I was unable to place the trade. However, I did take the trade on the morning of 14th May (Monday) at a slightly worse rate; nevertheless, the intention was to showcase the trade and not really chase the P&L.

Here are the trade execution details –

You may have two questions at this point. Let me list them for you –

**Question** – Did I actually execute the trade without checking prices? As in, I didn’t even look at the stock prices, support, resistance, RSI, etc. Is that not required?

**Answer** – No, none of that is required. The only thing that matters is where the residual is trading, which is exactly what I looked at.

**Question** – On what basis did I choose to trade 1 lot each? Why can’t I trade 2 lots of TM and 3 lots of TMD?

**Answer** – Well, this depends on the beta of the pair. We use the beta to identify the number of shares of X and Y so that the position is **beta neutral**. Since the residual is Y - (intercept + beta × X), trading the residual means holding beta shares of X for every 1 share of Y. For example, in the Tata Motors (Y) and Tata Motors DVR (X) pair, the beta is 1.59. This means that for every 1 share of Tata Motors (Y), I need to have 1.59 shares of Tata Motors DVR (X).

Going by this proportion, since the lot size of Tata Motors (Y) is 1500, we need 1500 × 1.59, or 2385 shares of Tata Motors DVR (X). The lot size of Tata Motors DVR is 2400, quite close to 2385, hence I decided to go with 1 lot each. But I’m aware that the position is slightly skewed, since I’m short 15 more shares of Tata Motors DVR than beta neutrality requires.
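The beta-neutral sizing arithmetic can be sketched as a small function. `beta_neutral_legs` is a hypothetical helper; it assumes a positive beta and simply rounds to the nearest whole lot of X, leaving the leftover mismatch for you to decide on (accept it, or top it up in the spot market).

```python
def beta_neutral_legs(beta, lot_y, lot_x, lots_y=1):
    """Size the X leg against a chosen number of Y lots.

    Beta neutrality: hold beta shares of X for every share of Y. Returns the
    X shares needed, the nearest whole number of X lots, and the shortfall
    (positive = more X shares still needed; negative = whole lots overshoot).
    """
    if beta <= 0:
        raise ValueError("this sketch only handles a positive beta")
    shares_y = lots_y * lot_y
    shares_x_needed = beta * shares_y
    lots_x = round(shares_x_needed / lot_x)
    shortfall = shares_x_needed - lots_x * lot_x
    return shares_x_needed, lots_x, shortfall


# Tata Motors (Y, lot 1500) vs Tata Motors DVR (X, lot 2400), beta 1.59
need, lots, short = beta_neutral_legs(1.59, 1500, 2400)
print(round(need), lots, round(short))  # 2385 1 -15
```

A negative shortfall of 15 here is the extra 15 shares of Tata Motors DVR that one full lot carries beyond the beta-neutral requirement.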

Also, please note that because of this constraint, we cannot really trade pairs with a negative beta, at least not always.

Remember, I initiated this trade when the residual value was -2.54. The idea was to keep the position open until either the target (-1 on the residual) or the stop loss (-3 on the residual) was hit. Until then, it was just a waiting game.

To track the position live, I’ve developed a basic Excel tracker. Of course, if you are a programmer, you can do much better, but given my limited abilities, I put together a basic position tracker in Excel. Here is the snapshot; you can download this sheet from the link posted below.

The position tracker has all the basic information about the pair. I’m guessing this is a fairly easy sheet to understand. I’ve designed it in such a way that upon entering the current prices of X & Y, the latest z-score and the P&L are calculated. I’d encourage you to play around with this sheet; even better if you can build one yourself ☺
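The P&L part of such a tracker is simple arithmetic on the two legs. A minimal sketch, where `pair_pnl` is a hypothetical helper and the prices in the usage line are made up (they are not the actual prices of this trade):

```python
def pair_pnl(direction, qty_y, qty_x, entry_y, entry_x, now_y, now_x):
    """Mark-to-market P&L of a pair position, leg by leg.

    direction "long": long Y, short X (residual at -2.5SD or lower).
    direction "short": short Y, long X (residual at +2.5SD or higher).
    """
    if direction not in ("long", "short"):
        raise ValueError(f"unknown direction: {direction!r}")
    sign = 1 if direction == "long" else -1
    pnl_y = sign * qty_y * (now_y - entry_y)
    pnl_x = -sign * qty_x * (now_x - entry_x)
    return pnl_y, pnl_x, pnl_y + pnl_x


# Hypothetical entry and current prices, NOT the actual trade's prices
print(pair_pnl("long", 1500, 2400, 300.0, 150.0, 298.0, 144.0))
# (-3000.0, 14400.0, 11400.0)
```

As in the live trade, one leg can lose while the other earns more; the sketch makes no attempt to predict which leg that will be.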

Once the position is taken, all one has to do is track the z-score of the residual. This means you have to keep tracking the prices and the respective z-scores. This is exactly what I did. In fact, for the sake of this chapter, my colleague Faisal logged all the values (except for the 14th and 15th). Here are the logs –

As you can see, the current values were tracked and the latest z-score was calculated several times a day. The position was open for nearly 7 trading sessions, and this is quite common with pair trading. I’ve had positions open for nearly 22 to 25 trading sessions. But here is the thing – as long as your math is right, you just have to wait for the target or SL to trigger.

Finally, on the morning of 23rd May, the z-score reached the target level and there was a window of opportunity to close this trade. Here is the snapshot –

Notice that the gain in Tata Motors DVR is much larger than the loss in Tata Motors. In fact, when we take the trade, we never know which of the two positions will make us the money. The idea, however, is that one of them will move in our favor and the other may not. It is just not possible to identify in advance which one will be the breadwinner.

The position tracker for the final day (23rd May) looked like this –

The P&L was roughly Rs.14,000/-, not bad I’d say for a relatively low-risk trade.☺

## 13.4 – Final words on Pair Trading

Alright guys, over the last 13 chapters, we have discussed everything I know about pair trading. I personally think this is a very exciting way of trading, far better than blind speculative trading. Although less risky, pair trading has its own share of risk, and you need to be aware of it. One of the common ways to lose money is when the pair continues to diverge after you initiate the position, leaving you with a deep loss. Further, the margin requirements are slightly higher since you are dealing with two contracts. This also means you need to keep some buffer money in your account to accommodate daily M2M.

There could be situations where you will need to take a position in the spot market as well. For example, on 23rd May, there was a signal to go short on Allahabad Bank (Y) and long on Union Bank (X). The z-score was 2.64 and the beta for this pair is 0.437.

Going by beta neutrality, for every 1 share of Allahabad Bank (Y), I need 0.437 shares of Union Bank (X). The lot size of Allahabad Bank is 10,000, which implies I need to buy 4,370 shares of Union Bank. However, the lot size of Union Bank is 4,000, hence I had to buy the remaining 370 shares in the spot market.

Well, I hope this trade is successful ☺

I know most of you would want the pair data sheet made available. We are working on making this sheet available to you on a daily basis so that you can track the pairs. Meanwhile, I would suggest you try and build this algo yourself. If you have concerns, please post them below and I will be happy to assist.

If you don’t know how to program, then you have no option but to find someone who knows programming and convince him or her that there is money to be made; this is exactly what I did ☺

Lastly, I would like to leave you with a thought –

- We run a linear regression of Stock A with Stock B to figure out whether the two stocks are cointegrated, i.e., whether their residuals are stationary
- What if the residuals of Stock A against Stock B are not stationary, but the residuals of Stock A against Stock B & C as a combined entity are?

Beyond pair trading lies something called multivariate regression. By no stretch of the imagination is this easy to understand, but let me tell you, if you can graduate to this arena, the game is different.

Download the Position Tracker and Pair Datasheet below:

### Key takeaways from this chapter

- The trigger to trade a pair comes from the residual’s current value
- Check for beta neutrality of the pair to identify the number of shares required in X and Y
- If the beta of the pair is negative, then it may not be possible to set up the trade
- Once the trade is initiated, track the z-score movement to manage the position
- The price of the futures does not really matter, the emphasis is only on the z-score

Hey Karthik,

How do we position size the trade? Is it possible for us to know the amount we would lose if the trade goes wrong?

You will have to go by beta neutrality to position size this trade. I have explained this in the chapter.

It is hard to know how much you may lose in advance as the trade is on the z-score.

Hey Karthik,

I understood the beta neutrality part of position sizing. My question is this; let me take an example from above. Let’s say 1 lot of Tata Motors is equal to 1 lot of Tata Motors DVR. In this case, if I want to trade 2 lots of each scrip, how do I know if it is safe for me to take 2 lots, assuming I have the funds? That’s why I wanted to know if there is any way to know the loss we would suffer in case the SL hits. Assume I am willing to risk 5% of my account on one trade. If trading 1 lot of both has an SL at 2.5%, then I can probably do 2 lots, right? Is there any rough way to approximate the loss? I am not worried about winning because we are entering at the 2SD level. Also, in the trade you mentioned above, you entered Tata Motors and Tata Motors DVR on the 14th and took profit on the 23rd. Assume you had not got the profit signal and the trade was still open, and there was a quarterly result announcement on the 24th. What do you recommend we do? Should we close the trade or keep it open in a situation like this?

Thank you.

I get your concern, Nikhil. Unfortunately, there is no way this can be translated to a rupee value. Or maybe I need to do some research; I will certainly get back to you on this. But here is what I have observed: whenever this position makes a loss (assuming it is initiated at 2.5SD), it is usually in the range of 10-12K per beta neutral pair. This is purely from my observation, with no concrete science backing it.

Thanks Karthik.

What about the 2nd question on result announcement part?

Hello sir

I have a doubt. Are the regression parameters “static” or “dynamic” in your basic Excel tracker sheet? As we know, these estimators (coefficients, std. error….) have variance. Every time we regress the securities, the values of the estimators vary. So taking the parameters of the equation y = mx + c as constant and calculating the z-score from previously calculated estimators might introduce some error, I guess.

What is your view, sir?

Thank you very much for this most awaited chapter, sir!

Varsity student

You are right on that. Given this, you may also want to look at the variance or standard error of the intercept as well. However, these trades are open only for a few days; unless there is a drastic move in the share price, you will not see big changes in the variables.

When we run a linear regression using the Python statsmodels library, we don’t get the standard error of residuals in the output. Is there any formula to calculate it from the residual values?

You actually don’t need the standard error of residuals, you only need the residuals. The standard deviation of the residuals is the standard error of residuals.
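To make the reply above concrete, here is a stdlib-only sketch (`residual_spread` is a hypothetical helper). One nuance: the sample standard deviation divides by n - 1, while the "Standard Error" in a simple regression's output divides the sum of squared residuals by n - 2. OLS residuals with an intercept have mean zero, so the two differ only by a factor of sqrt((n - 1)/(n - 2)), about 0.25% for 200 data points. If you are using statsmodels, the fitted results expose the residuals as `resid`, and `np.sqrt(results.mse_resid)` should match the n - 2 figure.

```python
import math
from statistics import stdev


def residual_spread(residuals):
    """Return (sample standard deviation, regression standard error) of OLS
    residuals. stdev divides by n - 1; the regression standard error divides
    the sum of squared residuals by n - 2 (slope and intercept each use one
    degree of freedom)."""
    n = len(residuals)
    sd = stdev(residuals)
    se = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return sd, se


sd, se = residual_spread([1.0, -1.0, 2.0, -2.0])
print(round(sd, 4), round(se, 4))  # 1.8257 2.2361
```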

How do we calculate the z-score?

You can divide the residual by standard error of residual to get the Z-score.

Sir, what is the use of the Z-score? I’m not getting it.

Z-Score tells you where the variable is with respect to its mean. For example, a Z-score of 2.5 indicates that the variable is 2.5 standard deviations away from the mean.

Thanks a lot sir

Welcome!

Can this method be used in the cash market as well, or only in futures? I have limited capital and, for trial purposes, would like to pair with small quantities.

It’s best if you do it in futures because the strategy requires you to short stocks as well, which is possible only in futures.

Hello sir, I was eagerly waiting for this. A big thanks.

Happy learning, Thirumal!

Thanks for educating us, sir 🙂. It would be of great help if you could share the pair data with ‘P’ values on a weekly basis.

Thanks for everything again.

I’m trying to figure a way out to share the excel sheet on a daily basis. Hopefully it should be possible 🙂

Hello sir

When I regressed TM (Y) and TMD (X) over a one-year data set (249 data points), it didn’t show any trading opportunity on 11/05/18. Comparing the two regression analyses, I noticed that the regression on 200 data points generated better results than the one-year analysis. Also, the ADF value was lower in the former (200-day) regression.

1) Is 200 days of data optimal for the analysis? Can we regress 100 data points too?

2) The ADF value I got after regressing the same data set as on Varsity is 0.0567, while your value is 0.017. I used both the Schwarz Info criterion and the Akaike Info criterion with a lag length of 14 (it was automatic). Which criterion should I use, and what should be the optimum lag length to run the ADF test? I had asked this question almost 3 weeks ago and you told me to wait.

Thank you, sir 🙂

Varsity student

1) I remember running some sort of optimization years ago and realized that 200 days of data is the sweet spot; I have stuck to it since then.

2) The lag is usually set to the square root of the number of data points, so a lag of 14 is good (square root of 200). Btw, I’m really not sure why there is a difference in ADF values.

Hi Karthik,

I had one question. Say the lot size of X is 800 and Y is 1000.

From the linear regression, the slope comes out to be very low, say 0.001.

How do we calculate the hedge ratio in this scenario?

This would be tricky as the beta is near zero. You cannot really trade this pair.

Karthik,

In your live trade example, the trade was open for almost 7 days. How is it that the institutional traders did not spot this opportunity to jump in and close the divergence more quickly? Am I missing something?

Thanks and regards,

Samir

Samir, as retail traders, we have the advantage of liquidity (or maybe illiquidity). Institutions want to trade a few hundred lots with low slippage… a couple of lots here and there won’t matter to them, but it would make a big difference to an individual.

Hi,

I am new to trading and not very familiar with the ADF test, but I have software called EViews which can run the ADF test. I don’t know how to use it or how to enter data into it. If someone knows, please let me know.

Thanks in advance.

Sir, have you tried pair trading using 2 ATM options? Since historical option premium data is not available, I couldn’t test its effectiveness.

No, have not really tried pair trading with options. There would be other dynamics when you trade options.

Hi Karthik sir,

Thanks for your lessons on pair trading. I want to know if it is possible to pair trade commodities as well, like GOLD:SILVER, COPPER:NICKEL, LEAD:ALUMINIUM?

Yes, you certainly can.

Dear Karthik,

There is a website, pairtradepro.com, which gives pair trade signals.

Can you say anything about it?

I’ve never really tried it, Ashwin. Hence I can’t comment on it.

I was testing my algo on the trade that you took on 23rd May

x = UnionBank

y = ALBK

beta = 0.437

for that trade I am getting adf_p_value = 0.3637

Am I missing something here, sir?

Also sir, for the TATAMOTORS–TATAMTRDVR trade, what was your look-back period? For that one too, my backtest values are not matching.

Look back period is 200 days.

But I am getting the same backtest values for the ICICI–HDFC price data provided in the earlier chapters, sir. Is it possible for you to share the closing price data for TATAMOTORS–TATAMTRDVR? I think that would help me a lot.

Mani, go back 200 trading days from 10th May, that would be my closing price data. Also, this data was obtained from NSE bhav copy.

What lag did you select for the ADF test, Mani? I think the algo we use has a lag of 14.

Look back period is 200 days.

Sir, what I found is that there is a slight change in beta (1.67), because of which the intercept (21.96), residual (-19.21) and sigma (5.82) are also different. Because of all these changes, the ADF value is different.

The value I got from my algo is the same as the Excel output, so I guess I can continue.

Is my conclusion good enough, sir?

Hmm, maybe, Mani. I request you to check this across multiple examples to be doubly sure.

Hello sir

When I run the regression b/w Unionbank & ALBK for past 200 days beginning from 2nd of Aug’17, the trading signal generated on 22nd May. The important parameters are as follows:

Unionbank(X) and ALBK(Y)

BETA: 0.437861306

INTERCEPT: 7.503194585 (19% of ALBK price)

STD_ERR: 2.714924731

Z-SCORE : -2.64

ADF: 0.0601(Lag length 14,Schwarz info criterion)

The trading signal was generated on 22nd May to go long on ALBK and short on Unionbank. Oh my God! It is just opposite of what you traded. I have cross checked my regression analysis. Please help me, sir.

Thanks

Kumar, you are right. I’m guilty of the trade information I put up. I misread this as 2.64 instead of -2.64, but I think I lucked out because the z-score expanded over the next few days and I was actually in profit when I realized my mistake.

Btw, great job on this one 🙂

Sir, here Kumar used price data from 2nd Aug’17 to 22nd May’18. If so, doesn’t it work out to more than 200 days? I think I am missing something, sir.

I guess it’s around 200; I need to check the data points, Mani.

I get it sir, trading days.

I made a mistake in my code that calculates the number of trading days, sir, so I was using fewer data points.

Glad you could rectify it.

Thank you, sir 🙂

While updating the pair data sheet, I luckily got the following trading signal on 4th June’18.

BANKBARODA(Y) & INDIANB(X) (200 trading day data)

Beta= 0.272

Intercept= 61.9 (45% of Y)

Z-Score= -2.89

ADF= 0.0311(lag length 14)

Lot size Baroda=4000, Lot size Indianbank=2000

If I take 8000 Baroda, then according to the equation, the quantity of Indian Bank should be 2176.

What is your view, sir?

Except for the intercept, everything is convincing enough 🙂

Did you take the trade? If not, at least track it on paper and please share the results here. I’m not tracking this.

Hello sir

I didn’t take the trade, but I am tracking it on paper and will definitely share it here. Btw, it’s my first pair trade, though it is on paper 🙂

I have a question, sir. When I checked 100 trading days of data for the same pair, the ADF value was 0.5135 and the z-score was -3.58. Is it worth checking this way, or should we stick with the 200-day data set?

Thank you 🙂

I remember doing some optimization on look-back period and if I remember right, I figured 200 days is the best look back period. You may want to do this yourself once 🙂

Hi Mayank,

Thanks for constantly sharing your insights on the pairs. I checked the z-score and am getting -2.09 for 200 days. Not sure where the error is. Can we connect at deepucal at gmail dot com?

Thanks

Deepu

I have taken 2 lots (8000) of Baroda and 1 lot (2000) of Indian Bank. But for 8000 Baroda, the Indian Bank quantity should be 2176, or for 2000 Indian Bank, the Baroda quantity should be 7352.

Thanks

Sure, as long as you ensure it’s beta neutral.

While optimizing the look-back period, is net profit the only criterion, or do other factors need to be considered, sir?

You can optimize this across any criteria – frequency of signals, profitability, risk, ADF etc

Hello sir,

Can the ADF p-value be used on any time-series data to confirm its normal distribution property, sir? Does it have any limitations?

Yes, it can be applied on any series on which you want to check the stationarity.

if a time series data passes stationarity test, does it mean that it is normally distributed?

I’m not very sure about this, but yes, I’m inclined to believe this is true.

ok sir thanks

Ohh, beta “neutrality”! I’ve been racking my brains trying to understand what beta natality could be; please correct it, sir. Also, could you share the updated pair trading sheet?

Beta is a measure of volatility between two stocks, so when you beta neutral two stocks, you are essentially trying to minimize the volatility between them.

I meant it has been misspelled, so I was trying to figure out what beta natality could be… Also, where can we find the bi-weekly update to the pair data?

I get it now. Will make the change 🙂

I’ll have the sheet updated tomorrow. Thanks.

Sir, what is the use of sigma?

With due permission from Karthik sir, I will try to explain sigma,

Sigma is the standard deviation of the residual, which we get when we regress Y on X.

Hope I am right!!!

That’s perfect, Mani!

Thanks Mani..

Sigma tells you the standard deviation of the residual.

Thanks a lot sir

Welcome!

Great chapter, really enjoyed it. Also are you going to be giving the pair data sheet?

Will try and upload the sheet today.

Hi Sir,

Could you please provide the data Excel for a single pair from ‘Pair-Data.csv’? For example, please provide the data Excel for Hero.MotoCorp.Ltd as YStock and Bajaj.Auto.Ltd as XStock, so that the values (intercept, beta, adf_test_P.val, etc.) generated at my end can be verified against Pair-Data.csv.

That would be difficult, Manoj. I’m trying to set up a daily source of pair data to be made available on the site. This may take some time.

Hi Sir,

Ok, understood. In that case, could you please run the ADF test (using your plugin) on the ‘Chapter-10_Residuals.xlsx’ Excel file from chapter 10 and provide the intercept, beta, adf_test_P.val, etc., so that I can check the values generated at my side against those generated by your plugin? I need to verify this only once to make sure that the logic at my side is correct.

Sure, Manoj. I will try and do that over the weekend.

Hello Sir,

I am confused: for tracking the position after executing the trade, should we calculate the std_err using the current sigma, beta and intercept (by re-running the regression with the latest 200-day look-back data), or the initial values we used to execute the trade?

Because the beta, sigma and intercept values will change the next day if we add the recent data to the 200-day look-back period, and so will the std_err.

In the position tracker excel which you provided the beta, intercept and sigma are kept same for 14th and 23rd of May.

Please, help me clear this doubt.

Thank you, in advance.

Once the position is taken, the new data set will exclude the oldest data and include the latest data point. The position tracker considers the pair data at the time of trade trigger.

Thank you so much for the very easy-to-understand language in the explanations. I went through all the content here on Varsity, and I also took a stock trading course with an institute. Anyway, my little confusion is this: when I was on the Options chapters, I did paper trading and drew a few conclusions about my way of trading. Now, after this chapter, I’m in a small dilemma about whether to trade options or pairs. I used to trade equity; after learning options, I felt equity day trading is more about luck than rational trading. Pair trading and its calculations seem more rational. Please help me understand which is the more rational way to trade, options or pair trading. Happy I found Varsity, thank you sir.

Kiran, both are very different because the instruments and their risk-reward profiles are very different. I’d suggest you pick up both techniques and eventually decide which one suits you better. Good luck.

Dear All, I wish to know: in the 8K option strategy, how do I choose?

Sir, I am unable to download 200 days of closing price data from NSE’s bhavcopy. It provides day-wise data for all securities. Kindly guide me: is there any link to download the NSE bhavcopy for 200 days for a particular stock together?

I’d suggest you download the eq data, Vikas.

Sir, can you post the exact link? Even I am unable to find the 200-day report of stocks.

We have not updated this, Karthik.

Hello Sir, thank you for the strategy. Can I do this in equity instead of futures? The only reason is capital.

You cannot really short in equity, so you can’t.

Hello Karthik,

Thank you for the wonderful explanation. In some cases, I have observed that the std. error went down to -4 as a result of certain events, a case in point being YESBANK from Sept 21 to 28, which saw a drastic drop.

Do you recommend going long when I see the error much lower or higher than the -2.5/2.5 trigger, or should I avoid it due to volatility? (The error has gone back to around -2 now, so if I had taken the trade I would be in profit.) On the other hand, if I had taken the trade at -2.5, I would have lost money as it went down to -4. Is there any other parameter to look at to avoid this?

These are fundamentally driven events, Sudharsan. The best case for pair trading is when all else is equal. Given this, I’d not initiate a trade even if I saw a 4SD move against the backdrop of fundamental events.

Hi Karthik,

Thanks for this wonderful module, got to learn another very useful trading system. Towards the end of this chapter, you mentioned something about multivariate regression, I really wanted to know more about this. I have good or decent enough background in linear algebra, regression analysis, and programming, hence, I really want to dig deeper into this subject. Could you please provide some pointers to get me started on this? Thanks for all your help!

Glad you really liked the module 🙂

Unfortunately, there is not much content on Multivariate and I’m really not sure if we should put up anything, it could be a little hard for the readers here.

“What if Stock A with Stock B is not stationary, but instead Stock A is stationary with stock B & C as a combined entity?”

Does the statement mean that we have two independent variables and one dependent variable, and if we regress stock A (dependent) against stock B and stock C combined, we get one column of residuals and can then check for cointegration using the equation Y = mX + nZ + constant, where Y is stock A, X is B and Z is C, m and n are the betas for B and C, and the constant may be considered equivalent to the intercept???

Exactly! This is how multivariate regression for trading stock pairs works.
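The regression described above can be sketched with NumPy's least-squares solver. This is a minimal illustration under stated assumptions, not the method used in the module: the function name `multivariate_residuals` is hypothetical, and the inputs are assumed to be equal-length arrays of closing prices. The residual column it returns is what you would then run through the ADF test, exactly as in the two-stock case.

```python
import numpy as np


def multivariate_residuals(y, x, z):
    """Fit y = m*x + n*z + c by ordinary least squares (hypothetical helper).

    y is the dependent stock (Stock A); x and z are the two independent
    stocks (B and C). Returns (m, n, c, residuals).
    """
    y = np.asarray(y, dtype=float)
    # Design matrix: one column per independent stock plus a column of ones for the constant
    design = np.column_stack([np.asarray(x, float), np.asarray(z, float), np.ones_like(y)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    m, n, c = coef
    residuals = y - design @ coef  # the single residual column to test for stationarity
    return m, n, c, residuals
```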

Hi Karthik,

Can you help me with a link on the NSE site that gives the sectoral categorization of stocks and their EOD data?

I need to look for this myself, Ravi. Will share the link if I find it.

Thanks Karthik, I thought it was readily available, as you mentioned in the point below. I tried to look it up but couldn't find it. We can build this ourselves, but if NSE has already done it and it is readily available, that's one less thing to automate 🙂

“2. The list of stocks and their sector classification is already done. Hence the download is more organized”

That was the plan, Ravi, but I had a few operational issues to deal with. Will check on what best can be done 🙂

Hi Ravi, any progress on the sector-wise classification of F&O stocks?

Hi Karthik,

First of all, I loved the way you explained this complex subject.

I'm a bit confused about selecting the lot sizes of X and Y. If Y = m*X (m is the slope/beta), then 1 lot of X is equivalent to m lots of Y (substitute X = 1 in the equation Y = mX). But you have taken it the other way round. Am I missing something when you say "beta neutral"?

Thanks and cheers. 🙂

Govil, Y = Beta * X, restating this

X = Y divided by Beta

Not sure if I’m missing your point.

Thanks for the reply Karthik,

You are right, X = Y divided by m. So for one lot of X you need m lots of Y (substitute X = 1 in Y = mX). BUT in 13.3 LIVE EXAMPLE of Tata Motors and Tata Motors DVR, this is what is written (copying directly):

“The beta neutrality states that for every 1 stock of Y, we need to have beta*X stock of X”. This applies to the equation X = mY: substituting Y = 1 lot, you get X = m lots. As per my understanding, it is not the Y = mX equation. Please correct me if I'm wrong. Thanks.

Govil, let me go through this again.
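One way to check the direction of the hedge ratio discussed above: if the regression Y = βX + c is run on share prices, then holding 1 share of Y long and β shares of X short gains or loses exactly the change in the residual, because ΔY - βΔX = Δ(Y - βX - c). The sketch below verifies this numerically. It assumes a price regression; the function names, prices and intercept are hypothetical, and lot-size rounding is ignored.

```python
def residual(y_price, x_price, beta, intercept):
    """Residual of the price regression Y = beta * X + intercept."""
    return y_price - (beta * x_price + intercept)


def hedged_pnl(y_start, y_end, x_start, x_end, beta, shares_y=1.0):
    """P&L of being long `shares_y` shares of Y and short `beta * shares_y` shares of X."""
    shares_x = beta * shares_y  # hedge ratio expressed in shares of X per share of Y
    return shares_y * (y_end - y_start) - shares_x * (x_end - x_start)


# Hypothetical prices and intercept, with the beta of 1.59 quoted in the comments
beta, intercept = 1.59, 10.0
pnl = hedged_pnl(300.0, 310.0, 190.0, 192.0, beta)
residual_change = residual(310.0, 192.0, beta, intercept) - residual(300.0, 190.0, beta, intercept)
print(round(pnl, 2), round(residual_change, 2))  # 6.82 6.82
```

Under this share-based reading, a Y lot of 1500 shares would pair with about 1.59 × 1500 ≈ 2385 shares of X, which in practice has to be rounded to available lots.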

Hello,

How do I convert MIS to CNC before square-off?

Thanks in advance.

Check this – https://support.zerodha.com/category/trading-and-markets/margin-leverage-and-product-and-order-types/articles/how-to-convert-mis-to-cnc-nrml-and-vice-versa

Hello Karthik,

Thank you for the interesting series. I have 2 questions:

1: Given that X and Y are stock prices, shouldn't beta neutrality be applied on nominal exposure? If Tata Motors (Y) vs Tata Motors DVR (X) has a BETA of 1.59, shouldn't it mean that the exposure on Tata Motors should be equal to 1.59 times the exposure on Tata Motors DVR? So if the nominal value of one lot of Tata Motors is (1500*331.65) Rs 497,475, the exposure I need to take in DVR is 1.59*497,475 = 790,985, which translates into 4064 shares of DVR @ 194.65. Please clarify.

2: In computing the Z-score, shouldn't we see how far the current data point is from the mean rather than use the absolute value? So the Z-score should be (current value of residual - mean of last 200 observations) / standard deviation of the 200 observations of the residual. Please clarify.

Many thanks,

Vipin

Given that X and Y are stock prices, shouldn't beta neutrality be applied on nominal exposure? If Tata Motors (Y) vs Tata Motors DVR (X) has a BETA of 1.59, shouldn't it mean that the exposure on Tata Motors should be equal to 1.59 times the exposure on Tata Motors DVR? → This is correct.

So if the nominal value of one lot of Tata Motors is (1500*331.65) Rs 497,475, the exposure I need to take in DVR is 1.59*497,475 = 790,985, which translates into 4064 shares of DVR @ 194.65. Please clarify. → Beta adjustment should happen on the stock price, not the contract value. Remember, price is already factored in when computing the Beta.

In computing the Z-score, shouldn't we see how far the current data point is from the mean rather than use the absolute value? So the Z-score should be (current value of residual - mean of last 200 observations) / standard deviation of the 200 observations of the residual. Please clarify. → This is how the Z-score is calculated. Not sure if I'm missing the point here.
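A minimal sketch of the Z-score described above, assuming prices are stored oldest first (so the last element is today) and using a 200-observation window. The function name is hypothetical, and `np.polyfit` stands in for the Excel regression.

```python
import numpy as np


def latest_residual_zscore(y, x, window=200):
    """Z-score of today's residual, regressing the last `window` closes of Y on X.

    Prices must be ordered oldest first, so the last element is today.
    """
    y = np.asarray(y, dtype=float)[-window:]
    x = np.asarray(x, dtype=float)[-window:]
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
    residuals = y - (slope * x + intercept)
    # (current residual - mean of residuals) / standard deviation of residuals
    return (residuals[-1] - residuals.mean()) / residuals.std(ddof=1)
```

Re-running it each day with the latest close appended gives the daily Z-score to compare against the trigger levels used in this module.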

Dear Karthik,

I have 2 queries.

1. Initially it was said that the data available (look-back period) should be at least 1 year, and preferably 2 years, but at the end of the module, it is said that 200 days of data is best. Which statement is correct?

2. If I see today that two stocks are co-integrated and their residuals are stationary, does that mean they will remain co-integrated (with stationary residuals) in the future too? Or do we need to check co-integration and stationarity at frequent intervals?

1) 200 data points is roughly a year, Arun. So stick to at least a year.

2) Not necessarily; stationarity may break down with price movement (drastic price moves).

Regarding Point No. 2, does that mean I need to run the ADF test DAILY until I find a p-value < 0.05?

Yes sir, it's better that way.

Dear Karthik,

Some stocks don't have futures, only spot, e.g. Andhra Bank, Bandhan Bank, etc., hence they don't have lot sizes.

Do we need to avoid such stocks, or is there a way to pair trade them?

Hmm, as long as you only need to go long on this stock and short the other stock's futures, it should be okay.

Dear Karthik,

My apologies, I didn't get it. Can you please elaborate, perhaps with an example?

Meaning, the X and Y ordering should be such that the non-derivative stock always needs to be bought and the derivative one needs to be shorted. If this is taken care of, then you can look at the combination of a non-derivative stock + a derivative stock.

Sir, how do we do this on an intraday basis?

Should we use the last 200 EOD prices and track the Z-score at a suitable intraday frequency to look for opportunities, or should we use the last 200 candles of the corresponding intraday frequency?

I'd suggest the last 200 days of EOD candles, Kushal.

Hi, what's your take on USDINR & EURINR pair trading? Have you analysed or tracked the pair?

I’ve not really looked into this.

Dear Nithin, I want your advice on which online technical analysis course you would recommend as best suited for a professional who wants to understand technical analysis from beginner to advanced level.

Dharmendra, frankly whatever you need to know is already available online on Varsity. Why do you want to spend money?

Hi, when I downloaded that Excel file, I can only see the variables for that particular date and time. What can I do to run the Excel file for any given date and time?

That sheet is just to demo the trade, Kaushik. It does not run on its own.

Can you tell me how you calculated the ADF value and error ratio for all pairs of stocks in Excel?

Kaushik, it's explained in the chapter itself.

I recently read a thesis by Hakon Andersen & Hakon Tronvoll in which they did pair trading using PCA (Principal Component Analysis) and density-based clustering. Which one is better, the linear regression method or the PCA method? Can you add and explain the PCA method on the Zerodha Varsity website?

I've never used the PCA method, Kaushik. Rather, I'm not sure about the technique to do so. But if I were to guess, PCA would be nice, since PCA's emphasis is only on the factors which explain the maximum variance, ignoring the other factors.

Dear Karthik

At the end of the chapter you talked about trading using multi variate regression. Could you please point to a source where I can read this?

Thank you so much.

Unfortunately, I’m unable to find a good/reliable source for this, Prafulla.

Thanks

Welcome!

Hi Karthik

Do we not have to run Correlations between all the stocks of a sector, say all stocks in nifty bank, BEFORE doing the regression analysis?

We did that in the first case.

Thank you.

Should we not run the regression only on those stocks that are statistically significantly correlated (> 0.75)?

Should we run regression on all stocks without looking at their correlation?

Thanks

Is the stationarity of the residuals a necessary and sufficient condition of a statistically significant correlation?

Would running correlations a priori not reduce the number of regressions that need to be run?

Would running correlations a priori exclude some trading opportunities that would otherwise have been spotted by NOT doing a correlation analysis and only looking at the regression and cointegration of the pair?

Thanks

I'd suggest you not look at correlations if you are using this approach to pair trade.

You can, this really depends on the program that you have developed.
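The trade-off discussed above can be sketched as a pair generator with an optional correlation filter. With `min_corr=None` every pair in the sector goes on to the regression step; with a threshold (0.75 here, the figure from the question) fewer regressions are run, at the cost of possibly skipping pairs that would have passed the cointegration test. The function name and input layout (a dict of equal-length close-price arrays) are assumptions for illustration.

```python
from itertools import combinations

import numpy as np


def candidate_pairs(prices, min_corr=None):
    """List (symbol_a, symbol_b) pairs to send to the regression step.

    prices: dict mapping symbol -> 1-D array of closes over the same dates.
    min_corr: optional correlation threshold; None keeps every pair.
    """
    pairs = []
    for a, b in combinations(sorted(prices), 2):
        if min_corr is not None:
            corr = np.corrcoef(prices[a], prices[b])[0, 1]  # Pearson correlation of closes
            if corr < min_corr:
                continue  # skipped: this pair will never be regressed
        pairs.append((a, b))
    return pairs
```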

Thanks

Hello sir, a small correction: in the log of 18th May, 3.30 PM, the values of X and Y are entered the wrong way round. I was going through it and it showed me a profit of 518350. For a second I had a big smile, haha!

Ah, need to check this, Kiran. Thanks for pointing this out.

Sir, is the pair data sheet available for download updated, or do we have to download 200 days of values, regress them, and do it ourselves? Thank you.

It's not updated, Kiran. We stopped doing it long ago.

Okay. Will you suggest any vendors or sites where we can subscribe and get latest pair data updates? Thank You.

I’m not too sure of vendors who provide this data, Kiran.

Sir, I used some online information for calculating ADF, wherein the regression is run between the delta of the residuals and the t-1 residual. Is this the right way? Since you have stopped updating the pair data, most of the calculations must have changed. When I did Eicher (Y) and Bajaj Auto (X), I got the following:

Intercept = 12263

Std error = 3274.478

Slope = 4.26

ADF = 19.14%

So I have two questions to ask:

1. Should I change Bajaj Auto to Y and Eicher to X? What will be the risk? (I guess, as you mentioned in the first few chapters, that you may get 2 or 3 trades in one pair a year, so swapping X and Y may not help. It also means one must wait for the ADF to become favorable over time.)

Or

2. Where can I get an ADF plugin for Excel? (I have no idea how to use Python or do the same in R, even if a plugin is available.)

Regards,

Narendra

Delta of residuals and t-1 residual. Is this the right way? → I don't think so, Narendra, but I can't point a finger and say why. Will get back on this.

1) The decision of which one is X and which is Y really depends on how strongly one can explain the other (in terms of daily variation). Go ahead, do it, and check what results you get.

2) This even I'm not sure 🙂

Please see the link below. I used this method to find the ADF. For your reference:

https://www.quantinsti.com/blog/augmented-dickey-fuller-adf-test-for-a-pairs-trading-strategy
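For reference, the regression the linked article describes (the change in the residual regressed on the previous day's residual) is the Dickey-Fuller regression; the augmented (ADF) version adds lagged changes of the residual as extra regressors. A minimal NumPy sketch of the plain version follows; the function name is hypothetical. It returns only the t-statistic: a strongly negative value points to stationarity, but turning it into a p-value needs MacKinnon's tables, which statsmodels' `adfuller` applies for you. For a series tested with a constant, the 5% critical value is about -2.86; residuals from an estimated pair regression need stricter (more negative) critical values.

```python
import numpy as np


def dickey_fuller_tstat(residuals):
    """t-statistic of gamma in: delta_e[t] = alpha + gamma * e[t-1] + u[t] (no lagged terms)."""
    e = np.asarray(residuals, dtype=float)
    delta = np.diff(e)   # e[t] - e[t-1]
    lagged = e[:-1]      # e[t-1]
    X = np.column_stack([np.ones_like(lagged), lagged])  # constant and previous level
    coef, *_ = np.linalg.lstsq(X, delta, rcond=None)
    u = delta - X @ coef
    sigma2 = u @ u / (len(delta) - X.shape[1])  # error variance of this regression
    se_gamma = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_gamma
```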

The second highlight is 20.914, which is the residual.

The std_err in the report is simply a ratio of –

Today’s residual / Standard Error of the residual

= 20.92404/22.776

= 0.91822

Sir, I got confused with this calculation. In the paragraph you wrote that you want to find the position of the current residual in the distribution of residuals. If I am correct, you are trying to calculate the Z-score of the residual. I may be wrong, but the Z-score is (data point - mean of the distribution) / std of the distribution, whereas above you directly divided the residual by the std. Is it something else you are trying to calculate? Please clarify.

Sahil, yes, the idea is to figure out the position of the residual with respect to its average. In other words, the Z-score of the residual.
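On the mean question: residuals from a least-squares fit that includes an intercept always average to zero, so subtracting the mean changes nothing, and today's residual divided by the standard error is effectively its Z-score. The small gap that remains comes from dividing by n - 2 (regression standard error) rather than n - 1 (sample standard deviation). A quick check on synthetic prices (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)
x = 200 + rng.normal(0, 2, 200).cumsum()  # synthetic X closes
y = 1.6 * x + 15 + rng.normal(0, 5, 200)  # synthetic Y closes
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

print(residuals.mean())  # zero up to floating-point error

n = len(residuals)
std_err = np.sqrt(residuals @ residuals / (n - 2))  # regression standard error
z_without_mean = residuals[-1] / std_err            # today's residual / standard error
z_with_mean = (residuals[-1] - residuals.mean()) / residuals.std(ddof=1)
print(z_without_mean, z_with_mean)  # differ only by a factor of about sqrt((n-1)/(n-2))
```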

Sir, as you have mentioned in all the chapters, to decide which stock should be Y and which should be X, we look at the error ratio, which is the std error of the intercept / std error of the residuals. But I couldn't get the intuition behind it. For example, the simple intuition for R-squared is that it tells how much of Y is explained by X (correct me if I'm wrong). So how is the error ratio helping?

Sahil, the idea is to pick a pair (by identifying x and y), in such a way that one variable explains the maximum of the other.
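A minimal sketch of the error-ratio rule as stated in this module: run the regression both ways and keep the ordering with the lower ratio of intercept standard error to regression standard error. Function names are hypothetical. In a two-variable regression this ratio reduces algebraically to √(1/n + mean(X)²/Sxx), where Sxx is the sum of squared deviations of X from its mean, so it is driven by the series placed on the X side: the lower its mean relative to its spread, the lower the ratio.

```python
import numpy as np


def error_ratio(y, x):
    """Std error of the intercept / std error of the regression, for Y regressed on X."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    se_regression = np.sqrt(residuals @ residuals / (n - 2))
    sxx = ((x - x.mean()) ** 2).sum()
    se_intercept = se_regression * np.sqrt(1.0 / n + x.mean() ** 2 / sxx)
    return se_intercept / se_regression


def choose_y_and_x(name_a, prices_a, name_b, prices_b):
    """Return (y_name, x_name) for the ordering with the lower error ratio."""
    a_on_b = error_ratio(prices_a, prices_b)  # A as Y, B as X
    b_on_a = error_ratio(prices_b, prices_a)  # B as Y, A as X
    return (name_a, name_b) if a_on_b <= b_on_a else (name_b, name_a)
```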

Sir, I know you must be busy, but it would be great if you could also do a chapter on PCA. If you have come across any resource which explains PCA without heavy maths, please share it.

Sahil, here is a chapter on PCR – https://zerodha.com/varsity/chapter/max-pain-pcr-ratio/

Sir, I was asking for a reference on principal component analysis, please share.

We don't have anything on principal component analysis yet. Will try and see what best we can do about this.

Please clarify: do we subtract the mean from the residual value to get the Z-score or not?

Yes, that's right, Shashank. The idea is to figure out the Z-score of the residual. I think someone had a similar query, and I have answered it. Can you please run through the comments?

Can we use the error ratio concept to decide which one is Stock A and which is Stock B in the first pair trading method you explained? If not, how do we decide Stock A and Stock B to calculate the ratio Stock A/Stock B?

Hmm, since this is largely based on the correlation technique, it does not really matter which is Stock A and which is Stock B.

Since I am not a coder, I looked for free sources to calculate the ADF test and found an Excel plugin called Real Statistics (http://www.real-statistics.com/). If you could spare some time to look into it and write an article on how to use the plugin, it would be helpful.

Thanks, I will try and do that as soon as time permits.

I took 10-minute timeframe data for 60 days in Excel and got nearly 1640 cells.

Correlation of BPCL vs HPCL: 0.72

But the density curve reaches 0.000234 below or 0.99986 above, near to 1.

My question is: what is the difference between collecting data for 2 years and 10-minute timeframe data for 60 days?

Which one should I follow, sir?

I’d suggest you stick to daily EOD data, simply because the noise component in your intraday data is quite high.

In chapter 15 you shared an Excel sheet on how to calculate intrinsic value:

Share Price (INR) = (F23-F26)*10^7/F29

Share Price (INR) = (Total PV of cash flow-Net Debt)*10^7/Number of Shares

In the formula, is 10 the 10%? Also, I can't understand the 7 which you used.

Please clarify my doubt on the 10 (if it is not 10%) and the 7.

Alekha, are you sure you are referring to a chapter in this module? I’m unable to get the context here.
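If the sheet follows the usual Indian convention of stating cash flows and net debt in INR crore (an assumption, since the sheet itself is not shown here), then `10^7` is the crore-to-rupee conversion (1 crore = 10,000,000) rather than a discount rate: multiplying by it turns the equity value into rupees before dividing by the absolute number of shares. A sketch with hypothetical inputs:

```python
CRORE = 10 ** 7  # 1 crore = 10,000,000 rupees


def share_price(total_pv_crore, net_debt_crore, number_of_shares):
    """Per-share value when PV and net debt are in INR crore and the share count is absolute."""
    equity_value_rupees = (total_pv_crore - net_debt_crore) * CRORE
    return equity_value_rupees / number_of_shares


# Hypothetical inputs: PV of Rs 1,000 crore, net debt of Rs 200 crore, 8 crore shares
print(share_price(1000, 200, 8 * CRORE))  # 100.0
```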

Karthik,

First of all, thanks a lot for this knowledge series on trading analysis, and especially on pair trading. Love this piece; it is so simple to understand.

Request you to kindly tell me where we can download 200 days of historical data in one go for all equities. NSE's Bhavcopy has daily data, which needs to be compiled for 200 days. Please help me find where to get this data. Thanks

Hiren, I'm glad you liked the content. Bhavcopy is the best source, but you will have to figure out a way to build a script which will download and compile it in one go.
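A minimal standard-library sketch of the compile step, assuming one CSV file per trading day, named so that sorting the file names puts them in date order, with `SYMBOL`, `SERIES` and `CLOSE` columns as in the older NSE equity bhavcopy layout. Check the column names against the files you actually download, since the format has changed over time. Function names are hypothetical.

```python
import csv
from pathlib import Path


def add_day(closes, rows, series="EQ"):
    """Append one day's closing prices to `closes` (dict: symbol -> list of closes).

    rows: iterable of dicts with SYMBOL, SERIES and CLOSE keys (assumed column names).
    """
    for row in rows:
        if row.get("SERIES", series).strip() != series:
            continue  # keep only the chosen series (e.g. regular equity "EQ")
        closes.setdefault(row["SYMBOL"].strip(), []).append(float(row["CLOSE"]))
    return closes


def compile_folder(folder, series="EQ"):
    """Read every *.csv in `folder` in sorted file-name order, one file per trading day."""
    closes = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as f:
            add_day(closes, csv.DictReader(f), series)
    return closes
```

Symbols that skip a day (suspensions, new listings) end up with shorter lists, so align the series by date before running the regression.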

Sir,

I run a pair trading algo myself and found some pairs like…

x = RELIANCE

y = IOC

beta = -0.1058

intercept = 280.036

closing price of IOC on 30 Aug = 122.45

Now, the closing price of IOC on 30 Aug is 122.45. As you pointed out in the very last chapter of pair trading, the intercept is the portion of the Y stock's value (IOC in this case) that the model can't explain. If we apply this logic here, it implies that the regression model can't explain 280.036 of IOC's (Y stock) price of 122.45 (around -128%). I know this is very tricky. Can you please clarify this situation, as I am getting the same thing with all pairs which have a negative beta?

Thanks

Kevin, -ve beta is a tricky situation, and from experience, the model breaks when we face such a situation. For this reason, I avoid trading such pairs.

Karthik,

Since I have no programming experience, I request you to kindly share the last 300 days of data for all equity securities; I will add the daily data myself going forward. Or, if this is not feasible, please share a script with instructions. Thanks in advance!

You can always download the same from the NSE’s bhav copy or you could subscribe to data services from a data vendor.

Hi Karthik,

With respect to negative Beta.

UBL (Y) = -0.49 × Mcdowell (X)

Maybe it works out to 4 lots of UBL to 1350 shares of Mcdowell.

I know we avoid trading negative beta. what are your comments on this position sizing?

Thanks

Sunil, I’d suggest you ignore pairs with -ve beta. It is very hard to get the positions right in such cases.

Hi karthik,

One last question with respect to today’s residual/sigma.

The Excel data I imported contains close prices in chronological order, i.e. the oldest close in the first row and the latest in the last row.

After running the linear regression, which residual do I use? The example you showed uses the residual of observation 1 as the numerator.

In my calculation, should I use the last row, observation 200? Kindly correct me if I am doing something wrong here.

Yes, I’d suggest you use the last row.

Hi Karthik,

I am wholeheartedly thankful to you and team Zerodha for such wonderful educational efforts.

My query is:

When I initiated the trade, the Z-score for my pair was 2.56 and the ADF test p-value was < 5% [all other parameters were within limits as per your explanation].

But after 2 days, the Z-score is 2.93 and the ADF test p-value is 5.5% [crossing the critical value of 5%].

So is it worth holding the trade until the Z-score reaches 3 (SL) or 1 (target), or should I exit the trade since the ADF test p-value is no longer favorable?

Depends on your risk appetite, Sorabh. I tend to hold till 3SD, but there are times when I've gone ahead with gut feel and cut/booked earlier than 3SD, and that has proved to be the right call.

Hello Sir

I tried your pair trading method by writing an algorithm in Python. The algorithm follows these steps:

1. Extract Stock prices for all Stocks (for which Futures are available) for past 200 trading sessions

2. Run Linear Regression on all possible combinations of pairs, identify the independent variable & apply ADF test

3. Identify all pairs having Z-Score either greater than 2.5 or less than -2.5

4. Track daily the Z-Score basis the regression outputs

But more than 50% of the time, the prices diverged, i.e. the Z-score crossed 3 or -3.

I also backtested the approach, calculating the Z-score over the next 15 trading sessions using the past 200 trading sessions' prices, but found no visible pattern pointing to a high success rate.

Am I doing something wrong or this trading system doesn’t actually work ?

Aseem, these are the broad steps. You will have to start calibrating this for results. For example, try -1.5 and +1.5 with 2 as SL or something like that. This is true with all strategies, you will have to calibrate the parameters and find your edge 🙂
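The calibration idea above can be made concrete by keeping the thresholds as parameters. The sketch below is hypothetical: the function name is illustrative, the defaults mirror figures mentioned in these comments (enter at ±2.5, book at ±1, stop at ±3), and "long the pair" means buy Y, sell X, as in this module.

```python
def pair_signal(z, position, entry=2.5, target=1.0, stop=3.0):
    """Action for today's Z-score given the current position.

    position: 0 = flat, +1 = long the pair (buy Y, sell X), -1 = short the pair (sell Y, buy X).
    """
    if position == 0:
        if z <= -entry:
            return "enter_long"   # residual stretched below its mean
        if z >= entry:
            return "enter_short"  # residual stretched above its mean
        return "wait"
    if position == 1:
        if z <= -stop:
            return "stop_loss"    # divergence kept widening
        if z >= -target:
            return "take_profit"  # residual has reverted towards the mean
        return "hold"
    # position == -1
    if z >= stop:
        return "stop_loss"
    if z <= target:
        return "take_profit"
    return "hold"
```

Backtesting with, say, `entry=1.5, stop=2.0` is then a matter of changing arguments rather than code.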

Thanks for the quick reply.

But wouldn't using different Z-scores invalidate the hypothesis we were using?

We were taking a Z-score bracket of 2.5 & -2.5 since it is highly unlikely to reach that level, and it would come back to a Z-score of 0 with high probability, making it profitable.

I created a simulation to track the Z-Scores of eligible Pairs and found no discernible pattern which has high probability of profitable trade. The Z-Scores were diverging or converging randomly.

I can share the simulated data with you, if you want.

You are right on the Z score, but we are trying to establish the possible pattern for stocks. Some may just trade within -1.5 to +1.5, who knows.

Btw, when markets are trending, almost all stocks move in the same direction, due to which you will get a high R² score and a low p-value. One should be cautious about this. Instead of a fixed number of data points, I'd suggest a window where the stock pair's trend had deviated, but not less than 100 points, to avoid sampling error. Ideally, you should perform the ADF test for all lags; however, multiple experiments have shown that performing it up to a lag of cube root of 3 is good enough. This is more a thumb rule followed by traders than a mathematical proof.

Trending markets do make it tougher to recognise actual trading opportunities. Since my complete model depends on historical data from the past 200 trading sessions, my output would be wrong if the window is wrongly selected.

I am actually new to trading; can you please elaborate on your suggestions?

1. You mentioned taking data where the stock pair deviated by at least 100 points. Does this mean you want the actual difference to be more than 100, and that I should consider only those data points for my regression model instead of continuous daily data?

2. You talked about using different lags in the ADF test. Currently I am using the adfuller function in Python from the statsmodels library. By default it uses a lag value based on a formula, or we can input it explicitly. While researching, I found that using AIC is the best option for choosing the lag. You said to use up to the cube root of 3; does that mean using multiple lags up to that number? Please clarify.
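On the lag question: searching over lags 0 to some maximum and keeping the one with the lowest AIC is what statsmodels' `adfuller` does with `autolag='AIC'` (its default), while `autolag=None` uses exactly `maxlag` lags. To show what that search involves, here is a NumPy sketch (hypothetical function names) that fits the ADF regression for each lag on a common sample, picks the lowest AIC, and refits at the chosen lag. It returns the test statistic only, not a p-value, and its numbers may differ slightly from statsmodels'.

```python
import numpy as np


def _adf_fit(e, lags, trim):
    """ADF regression with a constant and `lags` lagged differences.

    The first `trim` differences are dropped so that fits with different lag
    counts share one sample (trim must be >= lags). Returns (t_stat, aic).
    """
    d = np.diff(e)                        # d[j] = e[j+1] - e[j]
    y = d[trim:]                          # change in the residual
    cols = [np.ones(len(y)), e[trim:-1]]  # constant and previous level
    for i in range(1, lags + 1):
        cols.append(d[trim - i:len(d) - i])  # change lagged by i periods
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef
    n, k = X.shape
    ssr = u @ u
    se_gamma = np.sqrt(ssr / (n - k) * np.linalg.inv(X.T @ X)[1, 1])
    aic = n * np.log(ssr / n) + 2 * k     # Gaussian AIC up to an additive constant
    return coef[1] / se_gamma, aic


def adf_with_aic_lag(series, max_lag):
    """Pick the lag in 0..max_lag with the lowest AIC, then refit at that lag."""
    e = np.asarray(series, dtype=float)
    best = min(range(max_lag + 1), key=lambda k: _adf_fit(e, k, max_lag)[1])
    t_stat, _ = _adf_fit(e, best, best)
    return t_stat, best
```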

Hi sir, gold vs silver: the correlation is 0.78, but the ADF test is 0.832 and the Z-score was 2.70. Is it trade-worthy, sir?

Probably 🙂

Brilliant stuff Karthik! I read multiple places about stat arb but yours is the best explanation.

I followed your steps, but I'm getting some weird beta values. In addition to negative ones, which I'm ignoring, I also get combinations that seem highly unlikely, e.g. 1 TataMotors to 0.2 AshokLey, i.e. 5 TaMo to 1 AL. Are such values normal, or do you think there is something wrong with my process? Thanks in advance

What is the time frame you are looking at? This may be possible if there are crazy movements in stocks. I'd suggest you skim through the data set once. Also do a hygiene check on the data, and ensure it is adjusted for all sorts of corporate actions.
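A minimal hygiene check along these lines: flag days where the close moves by more than a chosen fraction (20% here, an arbitrary threshold) and inspect them against corporate action announcements; genuine moves such as the YESBANK drop mentioned earlier in the comments will be flagged too. For a confirmed split, dividing the earlier prices by the split ratio puts the series on one basis. Function names are hypothetical.

```python
import numpy as np


def flag_suspicious_jumps(closes, threshold=0.2):
    """Indices t where the close moved more than `threshold` (as a fraction) from day t-1."""
    c = np.asarray(closes, dtype=float)
    change = c[1:] / c[:-1] - 1.0
    return (np.nonzero(np.abs(change) > threshold)[0] + 1).tolist()


def adjust_for_split(closes, split_index, ratio):
    """Divide closes before `split_index` by `ratio` (e.g. 2 for a 1:2 split)."""
    return [p / ratio if i < split_index else p for i, p in enumerate(closes)]


closes = [100.0, 101.0, 50.5, 51.0]
print(flag_suspicious_jumps(closes))   # [2]
print(adjust_for_split(closes, 2, 2))  # [50.0, 50.5, 50.5, 51.0]
```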

Based on your next chapter, it appears that such ratios are common. However, I'd appreciate it if you could confirm based on your experience.

Yup, it is.