13.1 – Tracking the pair data

We have finally reached a point where we are through with all the background theory knowledge required for Pair Trading. I know most of you have been waiting for this moment ☺

In this last and final chapter of pair trading, we will take up an example of a live trade and discuss factors that influence the trade.

Here is a quick recap of pre-trade theory –

  1. Basic overview of linear regression and how to perform one
  2. Linear regression requires you to regress an independent variable X against a dependent variable Y
  3. The output of linear regression includes the intercept, slope, residuals, standard error, and the standard error of the intercept
  4. The decision to classify a stock as dependent (Y) and independent (X) depends on the error ratio
  5. Error ratio is defined as the ratio of standard error of intercept/standard error
  6. We calculate the error ratio by interchanging both X and Y. The combination which offers the lowest error ratio will define which stock is assigned as X and which one as Y
  7. The residuals obtained from the regression should be stationary. If they are stationary, then we can conclude that the two stocks are co-integrated
  8. If the stocks are cointegrated, then they move together
  9. Stationarity of a series can be evaluated by running an ADF test
  10. The ADF value (the p-value of the ADF test) of an ideal pair should be less than 0.05

Over the last few chapters, we have discussed each point in great detail. These points help us understand which pairs are worth considering for pair trading. In a nutshell, we take any two stocks (from the same sector), run a linear regression on them, check the error ratio and identify which stock is X and which is Y. We then run an ADF test on the residuals of the pair. A pair is considered worth tracking (and trading) only if the ADF value is 0.05 or lower. If the pair qualifies, we then track the residuals on a daily basis and try to spot trading opportunities.
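For the programmers, here is a minimal sketch of the regression and error ratio steps from the recap, written in plain Python. The function names (regress, pick_x_and_y) and the sample prices are my own illustrations, not part of the pair trading algo, and the ADF test is deliberately left out since it needs a statistics library.

```python
import math

def regress(x, y):
    """Simple linear regression of y on x (y = intercept + slope * x)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the residuals, with n - 2 degrees of freedom.
    std_err = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    # Standard error of the intercept.
    se_intercept = std_err * math.sqrt(1 / n + x_mean ** 2 / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
        "std_err": std_err,
        "se_intercept": se_intercept,
        # Undefined for a perfect fit (std_err == 0); real prices never fit perfectly.
        "error_ratio": se_intercept / std_err,
    }

def pick_x_and_y(prices_a, prices_b):
    """Regress both ways and keep the combination with the lower error ratio."""
    a_as_x = regress(prices_a, prices_b)
    b_as_x = regress(prices_b, prices_a)
    if a_as_x["error_ratio"] <= b_as_x["error_ratio"]:
        return "A is X, B is Y", a_as_x
    return "B is X, A is Y", b_as_x

# Made-up closing prices, purely for illustration.
stock_a = [1, 2, 3, 4, 5]
stock_b = [2.1, 3.9, 6.2, 7.8, 10.1]
label, result = pick_x_and_y(stock_a, stock_b)
print(label, round(result["error_ratio"], 4))
```

The residuals returned for the chosen combination are what you would then feed into the ADF test.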

A pair trade opportunity arises when –

  1. The residuals hit -2 standard deviations (-2SD). This is a long signal on the pair, so we buy Y and sell X
  2. The residuals hit +2 standard deviations (+2SD). This is a short signal on the pair, so we sell Y and buy X

Having said so, I generally prefer to initiate the trade when the residuals hit 2.5 SD or thereabouts (-2.5 SD for a long, +2.5 SD for a short). Once the trade is initiated, the stop loss is -3 SD for long trades and +3 SD for short trades, and the target is -1 SD and +1 SD for long and short trades respectively. This also means that once you initiate a pair trade, you will have to track the residual value to know where it lies and plan your trades. Of course, we will discuss more on this later in this chapter.
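The entry, stop loss and target levels above can be written down as a couple of small functions. This is only a sketch: the names are hypothetical, and not opening a fresh trade once the residual is already beyond the stop loss level is my own assumption rather than a rule stated in this chapter.

```python
ENTRY_SD = 2.5   # preferred entry level
STOP_SD = 3.0    # stop loss level
TARGET_SD = 1.0  # target level

def entry_signal(z):
    """Return 'long', 'short' or None for a residual expressed in SDs."""
    # Assumption: skip entries that are already at or beyond the stop loss.
    if -STOP_SD < z <= -ENTRY_SD:
        return "long"    # buy Y, sell X
    if ENTRY_SD <= z < STOP_SD:
        return "short"   # sell Y, buy X
    return None

def exit_signal(side, z):
    """Return 'stop', 'target' or 'hold' for an open pair position."""
    if side == "long":
        if z <= -STOP_SD:
            return "stop"
        if z >= -TARGET_SD:
            return "target"
    elif side == "short":
        if z >= STOP_SD:
            return "stop"
        if z <= TARGET_SD:
            return "target"
    else:
        raise ValueError("side must be 'long' or 'short'")
    return "hold"

print(entry_signal(-2.54))        # long
print(exit_signal("long", -2.0))  # hold
print(exit_signal("long", -1.0))  # target
```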

13.2 – Note for the programmers

In chapter 11, I introduced the ‘Pair Data’ sheet. This sheet is an output of the Pair Trading Algo. The pair trading algo basically does the following –

  1. Downloads the last 200 days of closing prices of the underlying. You can get this from NSE’s bhavcopy and, in fact, automate the same by running a script.
  2. The list of stock and its sector classification is already done. Hence the download is more organized
  3. Runs a series of regressions and calculates the ‘error ratio’ for each regression. For example, if we are talking about RBL Bank and Kotak Bank, then the regression module would regress RBL (X) and Kotak (Y) and Kotak (X) and RBL (Y). The combination which has the lowest error ratio is considered and the other combination is ignored
  4. The ADF test is applied on the residuals, for the combination which has the lowest error ratio.
  5. A report (pair data) is generated with all the viable X-Y combinations, and their respective intercept, beta, ADF value, standard error, and sigma are noted. I know we have not discussed sigma yet, I will shortly.

If you are a programmer, I would suggest you use this as a guideline to develop your own pair trading algo.
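As a starting point, here is a rough sketch of the bookkeeping around steps 1 to 3: trimming each price series to the last 200 closes and listing every pair of stocks within the same sector, so that each pair can then be regressed both ways. The helper names and the hand-typed sector map are assumptions for illustration only. Downloading the bhavcopy and running the ADF test are not shown.

```python
from itertools import combinations

LOOKBACK = 200  # number of closing prices used per stock

def last_n_closes(closes, n=LOOKBACK):
    """Keep only the most recent n closing prices (oldest first)."""
    return closes[-n:]

def candidate_pairs(sector_of):
    """Yield (sector, stock_a, stock_b) for every pair within a sector."""
    by_sector = {}
    for stock, sector in sector_of.items():
        by_sector.setdefault(sector, []).append(stock)
    for sector in sorted(by_sector):
        for a, b in combinations(sorted(by_sector[sector]), 2):
            yield sector, a, b

# Hand-typed sector map, just to show the shape of the input.
sector_of = {
    "RBL Bank": "Banks",
    "Kotak Bank": "Banks",
    "Yes Bank": "Banks",
    "Bajaj Auto": "Auto",
    "TVS": "Auto",
}

for sector, a, b in candidate_pairs(sector_of):
    # Each pair would be regressed as (a=X, b=Y) and (b=X, a=Y),
    # keeping whichever combination has the lower error ratio.
    print(sector, a, b)
```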

Anyway, in chapter 11, I had briefly explained how to read the data from the Pair data, but I guess it’s time to dig into the details of this output sheet.  Here is the snapshot of the Pair data excel sheet –

Look at the highlighted data. The Y stock is Bajaj Auto and X stock is TVS. Now because this combination is present in the report, it implies – Bajaj as Y and TVS as X has a lower standard error ratio, which further implies that Bajaj as X and TVS as Y is not a viable pair owing to higher error ratio, hence you will not find this combination (Bajaj as X and TVS as Y) in this report.

Along with identifying which one is X and Y, the report also gives you the following information –

  1. Intercept – 1172.72
  2. Beta – 2.804
  3. ADF value – 0.012
  4. Std_err – -0.77
  5. Sigma – 103.94

I’m assuming (and hopeful) you are aware of the first three variables i.e intercept, Beta, and ADF value so I won’t get into explaining this all over again. I’d like to quickly talk about the last two variables.

Standard Error (or Std_err) as mentioned in the report is essentially a ratio of today’s residual over the standard error of the residual. Please note, this can get a little confusing because there are two standard errors we are talking about. The 2nd standard error is the standard error of the residual, which is reported in the regression output. Let me explain this with an example.

Have a look at the snapshot below –

This is the regression output summary of Yes Bank versus South Indian Bank. I’ve highlighted standard error (22.776). This is the standard error of the residuals. Do recollect, we have discussed this earlier in this module.

The second highlight is 20.914, which is the residual.

The std_err in the report is simply a ratio of –

Today’s residual / Standard Error of the residual

= 20.92404/22.776

≈ 0.92

Yes, I agree calling this number std_err is not the best choice, but please bear with it for now ☺

This number tells me how today’s residual is positioned in the context of the standard normal distribution. This is the number which is the key trigger for the trade. A long position is initiated if this number reads -2.5 or lower, with -3.0 as the stop loss. A short position is initiated if this number reads +2.5 or higher, with a stop loss at +3.0. In case of a long, the target is at -1 or higher, and in case of a short, the target is +1 or lower.

This also means, the std_err number has to be calculated on a daily basis and tracked to identify trading opportunities. More on this in a bit.

The sigma value in the pair data report is simply the standard error of the residual, which in the above case is 22.776.
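To make the std_err calculation concrete, here is a small sketch. It assumes the residual is computed the usual way, as Y minus the fitted value (intercept + beta * X). The function names are hypothetical, and the example uses the highlighted residual (20.914) and standard error (22.776) from the Yes Bank regression output.

```python
def residual(y_price, x_price, intercept, beta):
    """Today's residual: actual Y minus the value predicted from X."""
    return y_price - (intercept + beta * x_price)

def std_err_ratio(todays_residual, sigma):
    """The 'std_err' column of the report: residual / standard error of residuals."""
    return todays_residual / sigma

z = std_err_ratio(20.914, 22.776)
print(round(z, 3))  # prints 0.918
```

Since sigma and the regression parameters come from the pair data report, the only inputs that change from day to day are the latest prices of X and Y.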

So now if you read through the pair data sheet, you should be able to understand the details completely.

Alright, let us jump to the trade now ☺

13.3 – Live example

I have been running the pair trading algo to look for opportunities, and I found one on 10th May 2018. Here is the snapshot of the pair data; you can download the same towards the end of this chapter. Do recollect, this pair data was generated using the closing prices of 10th May.

Look at the data highlighted in red. This is Tata Motors Ltd as Y (dependent) and Tata Motors DVR as X (independent).

The ADF value reads 0.0179 (less than the threshold of 0.05), and I think this is an excellent ADF value. Do recollect, an ADF value of less than 0.05 indicates that the residual is stationary, which is exactly what we are looking for.

The std_err reads -2.54, which means the residual has diverged sufficiently away from the mean, and therefore one can look at setting up a long trade. Since this is a long trade, one is required to buy the dependent stock (Tata Motors) and short the independent stock (Tata Motors DVR). This trade was supposed to be taken on 11th May morning (Friday), but for some reason, I was unable to place the trade. However, I did take the trade on 14th May (Monday) morning at a slightly worse rate. Nevertheless, the intention was to showcase the trade and not really chase the P&L.

Here are the trade execution details –

You may have two questions at this point. Let me list them for you –

Question – Did I actually execute the trade without checking for prices? As in, I didn’t even look at what price the stocks were trading at, and I didn’t look at support, resistance, RSI etc. Is it not required?

Answer – No, none of that is required. The only thing that matters is where the residual is trading, which is exactly what I looked for.

Question – On what basis did I choose to trade 1 lot each? Why can’t I trade 2 lots of TM and 3 lots of TMD?

Answer – Well, this depends on the beta of the pair. We will use the beta to identify the number of shares of X and Y, to ensure we are beta neutral in this position. Beta neutrality states that for every 1 share of Y, we need to have beta shares of X. For example, in the Tata Motors (Y) and Tata Motors DVR (X) pair, the beta is 1.59. This means, for every 1 share of Tata Motors (Y), I need to have 1.59 shares of Tata Motors DVR (X).

Going by this proportion, the lot size of Tata Motors (Y) is 1500, so we need 1500*1.59 or 2385 shares of Tata Motors DVR (X). The lot size of Tata Motors DVR is 2400, quite close to 2385, hence I decided to go with 1 lot each. But I’m aware this trade is slightly skewed towards the short side, since I’m shorting an additional 15 shares of Tata Motors DVR.
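The beta neutrality arithmetic for this pair can be sketched as below. The function name is hypothetical; the lot sizes (1500 and 2400) and the beta (1.59) are the ones quoted in this section.

```python
def beta_neutral_x_lots(y_lot_size, x_lot_size, beta, y_lots=1):
    """Shares of X needed for the given lots of Y, and the nearest whole lots of X."""
    y_shares = y_lots * y_lot_size
    x_shares_needed = beta * y_shares
    x_lots = round(x_shares_needed / x_lot_size)
    # Positive mismatch: the X leg is larger than beta neutrality asks for.
    mismatch = x_lots * x_lot_size - x_shares_needed
    return x_shares_needed, x_lots, mismatch

needed, lots, mismatch = beta_neutral_x_lots(1500, 2400, 1.59)
print(round(needed), lots, round(mismatch))  # prints 2385 1 15
```

If the rounding gives 0 lots of X, or the mismatch is a large fraction of a lot, the pair cannot be set up cleanly with futures alone.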

Also, please note, because of this constraint, we cannot really trade pairs if the beta is negative, at least, not always.

Remember, I initiated this trade when the residual value was -2.54. The idea was to keep the position open and wait until either the target (-1 on residual) or the stop loss (-3 on residual) was hit. Until then, it was just a waiting game.

To track the position live, I’ve developed a basic excel tracker. Of course, if you are a programmer, you can do much better with these accessories, but given my limited abilities, I put up a basic position tracker in excel. Here is the snapshot, of course, you can download this sheet from the link posted below.

The position tracker has all the basic information about the pair. I’m guessing this is a fairly easy sheet to understand. I’ve designed it in such a way that upon entering the current values of X & Y, the latest Z score is calculated and also the P&L. I’d encourage you to play around this sheet, even better if you can build one yourself ☺
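If you would rather build the tracker in code, here is one possible sketch of the same idea: feed in the current values of X and Y, and get back the latest z-score and the P&L. The class and field names are my own. The quantities (1500 and 2400) and the beta (1.59) come from this chapter, but every price, the intercept and the sigma below are made-up placeholders, not the actual trade data.

```python
from dataclasses import dataclass

@dataclass
class PairPosition:
    side: str        # "long" = long Y / short X, "short" = short Y / long X
    y_entry: float   # entry price of Y
    x_entry: float   # entry price of X
    y_qty: int       # shares of Y
    x_qty: int       # shares of X
    intercept: float
    beta: float
    sigma: float     # standard error of the residuals

    def z_score(self, y_now, x_now):
        """Latest residual divided by sigma, using the parameters at entry."""
        return (y_now - (self.intercept + self.beta * x_now)) / self.sigma

    def pnl(self, y_now, x_now):
        """Combined P&L of both legs at the current prices."""
        direction = 1 if self.side == "long" else -1
        y_leg = direction * (y_now - self.y_entry) * self.y_qty
        x_leg = -direction * (x_now - self.x_entry) * self.x_qty
        return y_leg + x_leg

# Placeholder numbers for illustration only.
pos = PairPosition("long", y_entry=169.0, x_entry=100.0,
                   y_qty=1500, x_qty=2400,
                   intercept=20.0, beta=1.59, sigma=4.0)
print(round(pos.z_score(169.0, 100.0), 2))  # z-score at entry
print(round(pos.z_score(171.0, 96.0), 2))   # z-score after both prices move
print(pos.pnl(171.0, 96.0))                 # P&L at the new prices
```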

Once the position is taken, all one has to do is track the z-score of the residual. This means you have to keep tracking the values and the respective z-scores. This is exactly what I did. In fact, for the sake of this chapter, my colleague, Faisal, logged all the values (except for the 14th and 15th). Here are the logs –

As you can see, the current values were tracked and the latest z-score was calculated several times a day. The position was open for nearly 7 trading sessions, and this is quite common with pair trading. I’ve experienced positions that were open for nearly 22-25 trading sessions. But here is the thing – as long as your math is right, you just have to wait for the target or SL to trigger.

Finally, on 23rd May morning, the z-score dropped to the target level and there was a window of opportunity to close this trade. Here is the snapshot –

Notice, the gains in Tata Motors DVR are much larger than the loss in Tata Motors. In fact, when we take the trade, we will never know which of the two positions will make us the money. The idea, however, is that one of them will move in our favor and the other won’t (or may). It is, however, just not possible to identify in advance which one will be the breadwinner.

The position tracker for the final day (23rd May) looked like this –

The P&L was roughly Rs.14,000/-, not bad I’d say for a relatively low-risk trade.☺

13.4 – Final words on Pair Trading

Alright guys, over the last 13 chapters, we have discussed everything I know about pair trading. I personally think this is a very exciting way of trading compared to blind speculative trading. Although less risky, a pair trade has its own share of risk and you need to be aware of it. One of the common ways to lose money is when the pair continues to diverge after you initiate the position, leaving you with a deep loss. Further, the margin requirements are slightly higher since there are two contracts you are dealing with. This also means you need to have some buffer money in your account to accommodate daily M2M.

There could be situations where you will need to take a position in the spot market as well. For example on 23rd May, there was a signal to go short on Allahabad Bank (Y) and long on Union Bank (X). The z-score was 2.64 and the beta for this pair is 0.437.

Going by beta neutrality, for every 1 share of Allahabad Bank (Y), I need 0.437 shares of Union Bank (X). The lot size of Allahabad Bank is 10,000, which implies I need to buy about 4,370 shares of Union Bank (10,000 × 0.437). However, the lot size of Union Bank is 4000, hence I had to buy the remaining 370 shares in the spot market.

Well, I hope this trade is successful ☺
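The split between futures lots and spot shares in the Allahabad Bank and Union Bank example can be sketched as follows. The helper name is hypothetical, and the share count uses the rounded beta of 0.437 quoted above.

```python
def split_futures_and_spot(shares_needed, lot_size):
    """Whole futures lots, plus the leftover shares to be bought in spot."""
    lots, spot_shares = divmod(shares_needed, lot_size)
    return lots, spot_shares

# 1 lot of Allahabad Bank (10,000 shares) at a beta of 0.437.
x_needed = round(10_000 * 0.437)
print(x_needed)                                 # prints 4370
print(split_futures_and_spot(x_needed, 4_000))  # prints (1, 370)
```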

I know most of you would want the pair data sheet made available. We are working on making this sheet available to you on a daily basis so that you can track the pairs. Meanwhile, I would suggest you try and build this algo yourself. If you have concerns, please post it below and I will be happy to assist.

If you don’t know how to program then you have no option but to find someone who knows programming and convince him or her that there is money to be made, this is exactly what I did ☺

Lastly, I would like to leave you with a thought –

  1. We run a linear regression of Stock A with Stock B to figure out if the two stocks are cointegrated with their residuals being stationary
  2. What if Stock A with Stock B is not stationary, but instead Stock A is stationary with stock B & C as a combined entity?

Beyond pair trading lies something called multivariate regression. By no stretch of the imagination is this easy to understand, but let me tell you, if you can graduate to this arena, the game is different.

Download the Position Tracker and Pair Datasheet below:

Download Position Tracker

Download Pair Datasheet

Key takeaways from this chapter

  1. The trigger to trade a pair comes from the residual’s current value
  2. Check for beta neutrality of the pair to identify the number of stock required in X and Y
  3. If the beta of the pair is negative, then it may not be possible to set up the trade
  4. Once the trade is initiated, check the z-score movement to track its current position
  5. The price of the futures does not really matter, the emphasis is only on the z-score


  1. Nikil says:

    Hey Karthik,
    how do we position size the trade? Is it possible for us to know the amount we would lose if the trade goes wrong?

    • Karthik Rangappa says:

      You will have to go by beta neutrality to position size this trade. Have explained this in the chapter.

      It is hard to know how much you may lose in advance as the trade is on the z-score.

      • Nikil says:

        hey Karthik,
        I understood the beta neutrality part of position sizing. My question is, let me take an example from above. Let’s say 1 lot of Tata Motors is equal to 1 lot of Tata Motors DVR. In this case, if I want to trade 2 lots of each scrip, how do I know if it is safe for me to take 2 lots, assuming I have funds to take the position? That’s why I wanted to know if there is any way we could know the amount of loss we would suffer in case the SL hits. Assume I am willing to risk 5% of my account in one trade. If trading 1 lot of both has a SL at 2.5%, then I can probably do 2 lots, right? Is there any rough approximation for finding out the loss? I am not worried about winning because we are entering at the 2SD level. And from the trade you mentioned above, you entered the trade of Tata and Tata DVR on the 14th and profit was taken on the 23rd. Assume you had not got the profit signal and the trade was open, and there was a quarterly result announcement on the 24th, what do you recommend us to do? Should we close the trade or keep it open in a situation like this?
        Thank you.

        • Karthik Rangappa says:

          I get your concern, Nikhil. Unfortunately, there is no way this can translate to a Rupee value. Or maybe I need to do some research, I will certainly get back to you on this. But here is what I have observed, whenever this position makes a loss (assuming its initiated at 2.5SD), it is usually in the range of 10-12K per beta neutral pair. This is purely from my observation and no concrete science backing this.

  2. KUMAR MAYANK says:

    Hello sir
    I have a doubt. Are the regression parameters “static” or “dynamic” in your basic excel tracker sheet? As we know these estimators (coefficients, std. error….) have variance. Everytime we regress the securities the values of the estimators get varied. So taking the parameters of the eqn. y=mx+c as constant and calculating Z-Score based on earlier calculated estimators might give some error, i guess.
    What is your view, sir?
    Thank you very much for this most awaited chapter, sir!
    Varsity student

    • Karthik Rangappa says:

      You are right on that. Given this, you may also want to look at the variance or standard error of the intercept as well. However, these trades are open for few days, unless there is a drastic move in the share price, you will not experience big changes in the variables.

  3. Sumil says:

    When we run linear regression using python statsmodels library we don’t get STD error of residuals in the output. Is there any formula to calculate the STD error using residual values?

    • Karthik Rangappa says:

      You actually don’t need the standard error of residuals, you only need the residuals. The standard deviation of the residuals is the standard error of residuals.

  4. Vijay says:

    How to calculate z score

  5. Vijay says:

    Can this method be used in the cash market also, or in futures only? I have less capital, and for trial purposes I could pair with small quantities.

    • Karthik Rangappa says:

      Its best if you do it in futures because the strategy requires you to short stocks as well, which is possible only in futures.

  6. Thirumal Sharma says:

    Hello sir.. was eagerly waiting for it. A big thanks.

    • Karthik Rangappa says:

      Happy learning, Thirumal!

      • Thirumal Sharma says:

        Thanks for educating us sir 🙂 . It would be of great help if you could share/made available the pair data with ‘P’ values on weekly basis.

        Thanks for everything again.

        • Karthik Rangappa says:

          I’m trying to figure a way out to share the excel sheet on a daily basis. Hopefully it should be possible 🙂

  7. KUMAR MAYANK says:

    Hello sir
    When i regressed TM(Y) & TMD(X) over one year data set(249 data), it didn’t show any trading opportunity on 11/05/18. While comparing the two regression analysis i noticed that the regression done on 200 set of data generated better results than that of one year data analysis. Also the ADF value was lesser in the former(200 day data) regression.
    1) Is 200 day data optimum for data analysis? Can we regress 100 set of data too?

    2) The ADF value i got after regressing the same set of data as done here on Varsity is 0.0567 while your value is 0.017. I used both Schwarz Info criterion & Akaike Info criterion with the lag length of 14 (it was automatic). Which criterion should i use and what should be the optimum lag length to run ADF test? I had asked this question almost 3 weeks ago and you told me to wait.
    Thank you, sir 🙂
    Varsity student

    • Karthik Rangappa says:

      1) I remember running some sort of optimization years ago and realized 200-day data is a sweet spot, sticking to it since then
      2) The lag is usually set to the square root of the number of data points…so a lag of 14 is good (sqr root of 200). Btw, I’m really not sure why there is a difference in ADF values.

  8. Sumil says:

    Hi karthik,

    Had one question. If Lot size of X is , say 800, and Y is 1000.
    From Linear Reg. slope comes out to be very low, say 0.001.
    Then how do we calculate hedge ratio in this scenario?

  9. SAMIR says:

    In your live trade example, the trade was open for almost 7 days. How is it that the institutional traders did not spot this opportunity to jump in and close the divergence more quickly? Am I missing something?
    Thanks and regards,

    • Karthik Rangappa says:

      Samir, as retail traders, we have the advantage of liquidity (or maybe illiquidity). Institutes want to trade few 100 of lots with low slippages… a couple of lots here and there won’t matter to them, but it would make a big difference to an individual.

  10. ricky patel says:

    i am new at trading and not so familiar with ADF test but i have software known as E VIEWS which will test ADF. but i don’t know how to run it. how to enter data in the software? if someone knows, let me know.

    Thnx in advance..

  11. Mani says:

    Sir, have you tried pair trading using 2 ATM options? Since historically option premiums data are not available I couldn’t test the effectiveness of it.

    • Karthik Rangappa says:

      No, have not really tried pair trading with options. There would be other dynamics when you trade options.

  12. Puran parakash bhojak says:

    Hi karthik sir,
    Thanks for your lessons on pair trading. I want to know is it possible to do pair trade in commodities as well like GOLD:SILVER, COPPER:NICKEL, LEAD:ALUMINIUM?

  13. Ashwin says:

    Dear karthik,
    There is one website, pairtradepro.com, which is giving pair trade signals.
    Can you say anything about that?

  14. Mani says:

    I was testing my algo on the trade that you took on 23rd May
    x = UnionBank
    y = ALBK
    beta = 0.437
    for that trade I am getting adf_p_value = 0.3637

    am I missing something here sir?

    • Mani says:

      also sir for TATAMOTORS–TATAMTRDVR trade what was your look back period sir? because for that also my backtest values are not matching

    • Mani says:

      but I am getting same backtest value for ICICI HDFC price data that was provided in the earlier chapters sir, is it possible for you to share closing price data for TATAMOTORS–TATAMTRDVR, I think that would help me a lot.

      • Karthik Rangappa says:

        Mani, go back 200 trading days from 10th May, that would be my closing price data. Also, this data was obtained from NSE bhav copy.

    • Karthik Rangappa says:

      What lag did you select for the ADF test, Mani? I think the algo we use has a lag of 14.

    • Karthik Rangappa says:

      Look back period is 200 days.

      • Mani says:

        Sir what I found is that there is a slight change in beta = 1.67, because of which intercept = 21.96, residual=-19.21 and sigma=5.82 is also different. Because of all these changes adf is different
        Value that I got from my algo is same as excel output, so I guess I can continue.
        Is my conclusion good enough sir?

  15. KUMAR MAYANK says:

    Hello sir
    When I run the regression b/w Unionbank & ALBK for past 200 days beginning from 2nd of Aug’17, the trading signal generated on 22nd May. The important parameters are as follows:
    Unionbank(X) and ALBK(Y)
    BETA: 0.437861306
    INTERCEPT: 7.503194585 (19% of ALBK price)
    STD_ERR: 2.714924731
    Z-SCORE : -2.64
    ADF: 0.0601(Lag length 14,Schwarz info criterion)
    The trading signal was generated on 22nd May to go long on ALBK and short on Unionbank. Oh my God! It is just opposite of what you traded. I have cross checked my regression analysis. Please help me, sir.

    • Karthik Rangappa says:

      Kumar, you are right. I’m guilty of this trade information that I put up. I misread this as 2.64 instead of -2.64, but I think I lucked out because the Z-score expanded over the next few days and I actually was in profit when I realized my mistake.

      Btw, great job on this one 🙂

      • Mani says:

        Sir here kumar used price data from 2nd of Aug’17 to 22nd May’18, if so doesn’t it workout to be more than 200 days? I think I am missing something sir?

      • Kumar Mayank says:

        Thank you, sir 🙂
        While updating the pair data sheet, luckily i got the following trading signal on 4th June’18.
        BANKBARODA(Y) & INDIANB(X) (200 trading day data)
        Beta= 0.272
        Intercept= 61.9 (45% of Y)
        Z-Score= -2.89
        ADF= 0.0311(lag length 14)
        Lot size Baroda=4000, Lot size Indianbank=2000
        If i take 8000 Baroda then according to the eqn the lot size of Indianbank should be 2176.
        What is your view, sir?

        • Karthik Rangappa says:

          Except for the intercept, everything is convincing enough 🙂
          Did you take the trade? If not, at least track it on paper and please share the results here. I’m not tracking this.

          • Kumar Mayank says:

            Hello sir
            I didn’t take the trade but i am tracking it on paper & definitely will share it here. Btw, its my first pair trade though it is on paper 🙂
            I have a question, sir. When i checked the 100 trading day data for the same pair, the ADF value was 0.5135 and Zscore was -3.58. Is it convincing that we should check this way or should stick with 200 data set?
            Thank you 🙂

          • Karthik Rangappa says:

            I remember doing some optimization on look-back period and if I remember right, I figured 200 days is the best look back period. You may want to do this yourself once 🙂

        • deepu says:

          Hi Mayank,
          Thanks for constantly sharing your insights on the pairs. I check the z-score and getting -2.09 for 200 days. Not sure where the error is ? Can we connect on at deepucal at gmail dot com.


  16. Kumar Mayank says:

    I have taken 2 lots (8000) of Baroda and 1 lot (2000) of Indianbank. But for 8000 of Baroda the lot size of Indianbank should be 2176 or for 2000 of Indian bank the Baroda should be 7352.

  17. Mani says:

    While optimizing lookback, is net profit only criteria or any other factors need to be considered sir?

    • Karthik Rangappa says:

      You can optimize this across any criteria – frequency of signals, profitability, risk, ADF etc

  18. Mani says:

    Hello sir,
    can adf p_value be used for any timeseries data to confirm its normal distribution property sir, does it have any limitations?

  19. Labeeb says:

    Ohh beta “neutrality”, been racking my brains to understand what beta natality could be, please fix it, sir. Also, could you share the updated pair trading sheet?

    • Karthik Rangappa says:

      Beta is a measure of volatility between two stocks, so when you beta neutral two stocks, you are essentially trying to minimize the volatility between them.

      • Labeeb says:

        I meant its been misspelled so I was trying to figure out what beta natality could be…also where can we find the bi-weekly update to the pairs data?

        • Karthik Rangappa says:

          I get it now. Will make the change 🙂
          I’ll have the sheet updated t’row. Thanks.

  20. Vinay says:

    Sir, what is the use of sigma?

  21. Shivansh Juneja says:

    Great chapter, really enjoyed it. Also are you going to be giving the pair data sheet?

  22. Manoj says:

    Hi Sir,
    Could you please provide the data excel for a single pair from the ‘Pair-Data.csv’ . For example please provide the data excel for Hero.MotoCorp.Ltd as YStock and Bajaj.Auto.Ltd as XStock. So that the values (intercept, beta, adf_test_P.val etc) generated at my end
    can be verified against the Pair-Data.csv

    • Karthik Rangappa says:

      That would be difficult, Manoj. I’m trying to set up a daily source of pair data to be made available on the site. This may take some time.

      • Manoj says:

        Hi Sir,

        Ok, understood. In that case could you please run the adf test (using your plugin) against the ‘Chapter-10_Residuals.xlsx’ excel file from chapter 10 and provide the intercept, beta, adf_test_P.val etc. So that I can check the values generated at my side against the values generated using your plugin. I need to verify this only once to make sure that the logic at my side is correct.

  23. Pritam Shetty says:

    Hello Sir,
    I am confused as to, for tracking position after executing the trade, we will have to calculate the std_err by using the current price based sigma , beta and intercept (by doing regression by replacing the old with new data for past 200 look back period) or the initial values which we used to execute the trade?
    Because, the beta, sigma and the intercept values will change the next day if we add the recent data in the 200 look back period and so will the std_err.
    In the position tracker excel which you provided the beta, intercept and sigma are kept same for 14th and 23rd of May.
    Please, help me clear this doubt.
    Thank you, in advance.

    • Karthik Rangappa says:

      Once the position is taken, the new data set will exclude the oldest data and include the latest data point. The position tracker considers the pair data at the time of trade trigger.

  24. kiran says:

    Thank you so much for very easy understanding language in explanation. I went through all the content here on varsity, also i went through a course for stock trading with one institute. Anyways my little confusion is when i was in Options chapter i did paper trading and made few conclusion on my trading way. Now after this chapter im in a small dilemma whether to trade options or the pair trading. First i use to trade equity, after learning options i felt equity day trading is more on luck than rational trading. Pair trading and the calculation are more rational now. Please help me understanding which is more right way to trade and more rational between options and pair trading. Happy i found Varsity,thank you sir.

    • Karthik Rangappa says:

      Kiran, both are very different because the instruments and their risk-reward profile is very different. I’d suggest you pick up both techniques and eventually decide which one suits you better. Good luck.

  25. SANDEEP says:

    Dear All I wish to know in 8k option strategy how do choose??

  26. vikas says:

    sir unable to download 200 days closing price data from NSE’s bhavcopy. it is providing day wise data for all securities. kindly guide is there any link to download nse bhavcopy 200 days for particular stock together

  27. kiran says:

    Hello Sir, thank you for strategy. Can i do this in equity than futures? the only reason being capital.

  28. Sudharsan says:

    Hello Karthik,

    Thank you for the wonderful explanation. In some cases, I have observed that the std. error went down to -4 as a result of certain events, case in point being YESBANK from Sept 21 to 28, was a drastic drop.

    Do you recommend going long when I see the error being much lower or higher than -2.5/2.5 trigger or do you think i should avoid it due to volatility. ( the error has gone back to around -2 now so if i took the trade I would have profit). On the other hand, If i took the trade at -2.5 I would have lost money going down to -4. Any idea if there is any other parameter to look at to avoid this?

    • Karthik Rangappa says:

      These are fundamentally driven events, Sudharsan. The best case for pair trading is when all else is equal. Given this, I’d not initiate a trade even if I saw a 4SD in the backdrop of fundamental events.

  29. Nobal says:

    Hi Karthik,
    Thanks for this wonderful module, got to learn another very useful trading system. Towards the end of this chapter, you mentioned something about multivariate regression, I really wanted to know more about this. I have good or decent enough background in linear algebra, regression analysis, and programming, hence, I really want to dig deeper into this subject. Could you please provide some pointers to get me started on this? Thanks for all your help!

    • Karthik Rangappa says:

      Glad you really liked the module 🙂

      Unfortunately, there is not much content on Multivariate and I’m really not sure if we should put up anything, it could be a little hard for the readers here.

  30. Kaushal says:

    “What if Stock A with Stock B is not stationary, but instead Stock A is stationary with stock B & C as a combined entity?”
    Does the statement mean that we have two independent variables and one dependent variable? If we regress stock A (dependent) against stocks B and C combined, we get one column of residuals, which we can then check for cointegration. The equation would be Y = mX + nZ + constant, where Y is stock A, X is B, and Z is C; m and n are the betas for B and C, and the constant is equivalent to the intercept. Is that right?
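
One plausible reading of the question above is an ordinary least-squares fit with two regressors, whose single residual column can then be tested for stationarity like any other. The sketch below only illustrates that arithmetic; the function name `fit_two_regressors` and the numbers are made up, and it is not the author's method.

```python
def fit_two_regressors(y, x, z):
    """Least-squares fit of y = m*x + n*z + c; returns (m, n, c, residuals)."""
    count = len(y)
    mean_y, mean_x, mean_z = sum(y) / count, sum(x) / count, sum(z) / count
    dx = [v - mean_x for v in x]
    dz = [v - mean_z for v in z]
    dy = [v - mean_y for v in y]
    sxx = sum(a * a for a in dx)
    szz = sum(a * a for a in dz)
    sxz = sum(a * b for a, b in zip(dx, dz))
    sxy = sum(a * b for a, b in zip(dx, dy))
    szy = sum(a * b for a, b in zip(dz, dy))
    det = sxx * szz - sxz * sxz
    if det == 0:
        raise ValueError("x and z are perfectly collinear")
    # Solve the 2x2 normal equations for the two betas.
    m = (sxy * szz - szy * sxz) / det
    n = (szy * sxx - sxy * sxz) / det
    c = mean_y - m * mean_x - n * mean_z
    residuals = [yi - (m * xi + n * zi + c) for yi, xi, zi in zip(y, x, z)]
    return m, n, c, residuals


# Made-up prices where stock A is exactly 2*B + 3*C + 5.
stock_b = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0]
stock_c = [5.0, 4.0, 6.0, 7.0, 5.0, 8.0]
stock_a = [2 * b + 3 * c + 5 for b, c in zip(stock_b, stock_c)]
m, n, c, residuals = fit_two_regressors(stock_a, stock_b, stock_c)
print(m, n, c)
```

With real prices the residuals would not be zero; that residual column is what a stationarity test would then be run on.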

  31. Ravi says:

    HI Karthik,
    Can you help me with link on NSE site that gives sectorial categorization of stocks and their EOD data ?

    • Karthik Rangappa says:

      I need to look for this myself, Ravi. Will share the link if I find.

      • Ravi says:

        Thanks Karthik, I thought it is available handy as you mentioned in the point below,. i tried to look it up but couldn’t find. we can form this ourselves but its already done by NSE and readily available, one less thing to automate 🙂

        “2.The list of stock and its sector classification is already done. Hence the download is more organized”

        • Karthik Rangappa says:

          That was the plan, Ravi, but I had few operational issues to deal with. Will check on what best can be done 🙂

    • Sandeep says:

      Hi Ravi, any progress made on the sector wise classification of F&O stocks ?

  32. Govil says:

    Hi Karthik,
    First of all loved the way you explained this complex subject.
    I’m a bit confused about selecting the lot sizes of X and Y. If Y = m*X (m is the slope/beta), then 1 lot of X is equivalent to m lots of Y (substitute X=1 in the equation Y=mX). But you have taken it the other way round. Am I missing something when you say “beta neutral”?
    Thanks and cheers. 🙂

    • Karthik Rangappa says:

      Govil, Y = Beta * X, restating this
      X = Y divided by Beta

      Not sure if I’m missing your point.

      • Govil Bhole says:

        Thanks for the reply Karthik,
        You are right, X = Y divided by m. So for one lot of X you need m lots of Y (substitute X=1 in Y=mX). BUT in 13.3 LIVE EXAMPLE of Tata Motors and Tata Motors DVR, this is what is written (copying directly):
        “The beta neutrality states that for every 1 stock of Y, we need to have beta*X stock of X”. This is applicable to the equation X=mY: by substituting Y=1 lot, you get X=m lots. As per my understanding, it’s not the Y=mX equation. Please correct me if I’m wrong. Thanks.
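
One way to check the sizing in this exchange: with Y = beta*X + c, the residual changes by (change in Y) minus beta times (change in X). A position of 1 share of Y against beta shares of X on the opposite side therefore gains or loses exactly the change in the residual. The sketch below applies that to the 1,500-share Tata Motors lot and the 1.59 beta quoted in this thread. The function names are made up, and the 2,400-share lot size for X is an assumed figure used only for illustration.

```python
def hedge_shares_of_x(shares_of_y, beta):
    """Shares of X that offset `shares_of_y` shares of Y when Y = beta * X + c."""
    return shares_of_y * beta


def lots_of_x(shares_of_y, beta, lot_size_x):
    """Nearest whole number of X lots, plus the leftover share imbalance."""
    needed = hedge_shares_of_x(shares_of_y, beta)
    lots = max(1, round(needed / lot_size_x))
    imbalance = lots * lot_size_x - needed  # > 0 means more X shares than beta asks for
    return lots, imbalance


needed = hedge_shares_of_x(1500, 1.59)
lots, imbalance = lots_of_x(1500, 1.59, lot_size_x=2400)  # 2400 is an assumed lot size
print(round(needed), lots, round(imbalance))
```

Because futures trade in whole lots, the imbalance is rarely zero, so the position ends up slightly tilted towards one leg.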

  33. Amit says:

    how to convert MIS to CNC before squareoff.?

    thanks in advance

  34. Vipin says:

    Hello Karthik,

    Thank you for interesting series. I have 2 questions:

    1: Given that X and Y are stock prices, shouldn’t the beta neutrality be applied on nominal exposure? If Tata Motors (Y) – Tata Motors DVR (X) has a beta of 1.59, shouldn’t it mean that the exposure on Tata Motors should be equal to 1.59 times the exposure on Tata Motors DVR? So if the nominal value of one lot of Tata Motors is (1500*331.65) Rs 497,475, the exposure I need to take in DVR is 1.59*497,475 = 790,985, which translates into 4064 shares of DVR @ 194.65. Please clarify.

    2: In computing the Z score, shouldn’t we see how far the current data point is from the mean rather than use the absolute value? The Z score should be (current value of residual – mean of last 200 observations) / standard deviation of 200 observations of the residual. Please clarify.

    Many thanks,

    • Karthik Rangappa says:

      Given that X and Y are stock prices, shouldn’t the beta neutrality be applied on nominal exposure? If Tata Motors (Y) – Tata Motors DVR (X) has a beta of 1.59, shouldn’t it mean that the exposure on Tata Motors should be equal to 1.59 times the exposure on Tata Motors DVR? ——-> This is correct.

      So if the nominal value of one lot of Tata Motors is (1500*331.65) Rs 497,475, the exposure I need to take in DVR is 1.59*497,475 = 790,985, which translates into 4064 shares of DVR @ 194.65. Please clarify. ————-> Beta adjustment should happen on the stock price, not the contract value. Remember, price is already factored in when computing the beta.

      In computing the Z score, shouldn’t we see how far the current data point is from the mean rather than use the absolute value? The Z score should be (current value of residual – mean of last 200 observations) / standard deviation of 200 observations of the residual. Please clarify. —————> This is how the Z score is calculated. Not sure if I’m missing the point here.

  35. Arun says:

    Dear Karthik,

    I have 2 queries.
    1. Initially it was said that the data available (look-back period) should be at least 1 year, and better if it is 2 years, but at the end of the module it is said that 200 days of data is best. Which statement is correct?
    2. If I see today that two stocks are co-integrated and their residuals are stationary, does it mean that they will remain co-integrated (with stationary residuals) in the future too? Or do we need to check co-integration and stationarity at frequent intervals?

  36. Arun says:

    Dear Karthik,

    Some of the stocks don’t have Futures but only Spot. for ex. Andhra Bank, Bandhan Bank etc. hence they don’t have lot sizes.
    Do we need to avoid such stocks or there is a way of pair trading in them?

    • Karthik Rangappa says:

      Hmm, as long as you only need to go long on this and short the other stock’s future, it should be ok.

      • Arun says:

        Dear Karthik,

        My apologies, I didn’t get it. can u plz elaborate, probably with example?

        • Karthik Rangappa says:

          Meaning, the X and Y ordering should be such that the non-derivative stock always needs to be bought and the derivative one needs to be shorted. If this is taken care of, then you can look at the combination of non-derivative stock + derivative stock.

  37. Kaushal says:

    Sir, how do we do this on intraday basis?
    Should we use last 200 EOD prices and track z score at suitable intraday frequency to look for opportunities or should we use last 200 candles of corresponding intraday frequency?

  38. Mihirsinh Jagdishsinh Parmar says:

    Hi, what’s your take on USDINR & EURINR pair trading? Have you analysed or tracked the pair?


  39. Dharmendra says:

    Dear Nithin, I want your advice on technical courses available online. Which one, as per your recommendation, is best suited for a professional who wants to understand technical analysis from the beginning to an advanced level?

    • Karthik Rangappa says:

      Dharmendra, frankly whatever you need to know is already available online on Varsity. Why do you want to spend money?

  40. kaushik ramnath says:

    Hi when I downloaded that excel file I can just see the variables confined to that particular date and time. What can I do to run the excel file for any given date and time?

    • Karthik Rangappa says:

      That sheet is just to demo the trade, Kaushik. It does not run on its own.

      • kaushik ramnath says:

        Can u tell me how u calculated adf value and error ratio for all pairs of stock in excel?

        • Karthik Rangappa says:

          Kaushik, its explained in the chapter itself.

          • kaushik ramnath says:

            I recently read a thesis paper by Hakon Andersen & Hakon Tronvoll in which they did pair trading using PCA (Principal Component Analysis) and density-based clustering. Which one is better, the linear regression method or the PCA method? Can you add and explain the PCA method on the Zerodha Varsity website?

          • Karthik Rangappa says:

            I’ve never used PCA method, Kaushik. Rather, I’m not sure about the technique to do so. But if I were to guess, PCA would be nice, since PCA emphasis is only on factors which explain the maximum variance, ignoring the other factors.

  41. Prafulla says:

    Dear Karthik

    At the end of the chapter you talked about trading using multi variate regression. Could you please point to a source where I can read this?

    Thank you so much.

  42. Prafulla says:

    Hi Karthik

    Do we not have to run Correlations between all the stocks of a sector, say all stocks in nifty bank, BEFORE doing the regression analysis?
    We did that in the first case.

    Thank you.

    • Prafulla says:

      Should we not run regression on only those stocks that are statistically significantly correlated >.75?
      Should we run regression on all stocks without looking at their correlation?


      • Prafulla says:

        Is the stationarity of the residuals a necessary and sufficient condition of a statistically significant correlation?
        Would running correlations apriori not reduce the number of regressions needed to be run?
        Would running correlations apriori exclude some trading opportunities that would otherwise have been spotted by NOT doing a correlation analysis and only looking at the regression and cointegration of the pair?


      • Karthik Rangappa says:

        I’d suggest you to not look at correlations if you are using this approach to pair trade.

    • Karthik Rangappa says:

      You can, this really depends on the program that you have developed.

  43. kiran says:

    Hello sir, small correction in log of 18th may, 3.30 PM values of X and Y are entered opposite. I was going through and it showed me profit of 518350. For a second i had a big smile,haha!

  44. kiran says:

    Sir pair data sheet which is available to download is it updated or we have to download 200 day values, regress it and do it ourselves? Thank you.

  45. Narendra Bande says:

    Sir, I used some online information for calculating ADF wherein the regression is run between the delta of residuals and the t-1 residual. Is this the right way? Because you have stopped updating the pair data, most of the calculations must have changed. When I did Eicher (Y) and Bajaj Auto (X) I got the following:
    Intercept = 12263
    Std error = 3274.478
    Slope = 4.26
    ADF = 19.14%

    So I have two questions to ask:
    1. Should I change Bajaj Auto to Y and Eicher to X? What will be the risk? (As you mentioned in the first few chapters, one may get 2 or 3 trades in one pair a year, so I guess changing X to Y and vice versa may not help. It also means one must wait for the ADF to become favorable over time.)
    2. Where can I get an ADF plugin for Excel? (I have no idea how to use Python or do the same in R even if there is a plugin available.)

    • Karthik Rangappa says:

      Delta of residuals and t-1 residual. Is this the right way? — I dont think so, Narendra, but I cant point a finger and say why. Will get back on this.

      1) The decision of which one is X and Y really depends on how strongly one can explain the other (in terms of daily variation). Go ahead and do it and check what results you get
      2) This even I’m not sure 🙂
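
For reference, regressing the change in the residual on the previous residual is the setup of the basic (unaugmented) Dickey-Fuller regression; the augmented version adds lagged changes as extra regressors. The sketch below computes only the test statistic for that basic regression, using a made-up mean-reverting series. The function name `dickey_fuller_stat` is illustrative. Turning the statistic into a p-value requires Dickey-Fuller tables rather than the usual t-distribution, which this sketch does not attempt.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of gamma in: delta_e[t] = alpha + gamma * e[t-1] + noise."""
    lagged = series[:-1]
    delta = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(delta)
    mean_l = sum(lagged) / n
    mean_d = sum(delta) / n
    sll = sum((l - mean_l) ** 2 for l in lagged)
    sld = sum((l - mean_l) * (d - mean_d) for l, d in zip(lagged, delta))
    gamma = sld / sll
    alpha = mean_d - gamma * mean_l
    ssr = sum((d - alpha - gamma * l) ** 2 for l, d in zip(lagged, delta))
    se_gamma = math.sqrt(ssr / (n - 2) / sll)
    return gamma / se_gamma


# Made-up mean-reverting series: each value is pulled back towards zero.
rng = random.Random(0)
mean_reverting = [0.0]
for _ in range(300):
    mean_reverting.append(0.3 * mean_reverting[-1] + rng.gauss(0, 1))

stat = dickey_fuller_stat(mean_reverting)
print(stat)
```

A strongly negative statistic points towards stationarity; a statistic near zero does not.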

  46. Narendra Bande says:

    please see the weblink. I used this method to find ADF. For your ref.


  47. sahil swaroop says:

    The second highlight is 20.914, which is the residual.

    The std_err in the report is simply a ratio of –

    Today’s residual / Standard Error of the residual

    = 20.92404/22.776

    = 0.91822
    Sir, I got confused by this calculation. In the paragraph you have written that you want to find the position of the current residual in the distribution of residuals. If I am not mistaken, you are trying to calculate the z-score of the residual. I may be wrong, but the z-score is (data point – mean of the distribution) / standard deviation of the distribution, whereas above you have directly divided the residual by the standard error. Are you trying to calculate something else? Please clarify.

    • Karthik Rangappa says:

      Sahil, yes…the idea is to figure out the position of residual wrt to its average. In other words, the Zscore of the residual.
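
To make the arithmetic in this exchange concrete, here is a minimal sketch. It relies on one property of least-squares regression with an intercept: the residuals average to zero, so dividing the latest residual by the standard error gives the same number as subtracting the mean first. The helper names and the prices are made up, and the latest observation is assumed to be the last row. The regression's standard error divides by n - 2, so it differs slightly from a plain standard deviation of the residuals.

```python
import math


def ols_fit(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def latest_zscore(x, y):
    """Return (mean residual, residual/std_err, (residual - mean)/std_err)."""
    intercept, slope = ols_fit(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    n = len(residuals)
    mean_resid = sum(residuals) / n  # zero up to rounding error
    std_err = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    z_plain = residuals[-1] / std_err
    z_centered = (residuals[-1] - mean_resid) / std_err
    return mean_resid, z_plain, z_centered


# Made-up closing prices, oldest first.
x = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 108.0]
y = [201.0, 205.5, 202.0, 211.0, 213.5, 214.0, 221.0, 215.0]
mean_resid, z_plain, z_centered = latest_zscore(x, y)
print(mean_resid, z_plain, z_centered)
```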

  48. sahil swaroop says:

    Sir, as you have mentioned in all the chapters, to decide which stock of the pair should be Y and which should be X, we look at the error ratio, which is the standard error of the intercept / standard error of the residuals. But I couldn’t get the intuition behind it. For example, the simple intuition for R-squared is that it tells how much of Y is explained by X (I could be wrong on this, please correct me). So how is the error ratio helping?

    • Karthik Rangappa says:

      Sahil, the idea is to pick a pair (by identifying x and y), in such a way that one variable explains the maximum of the other.
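
As a rough illustration of the selection step, the sketch below computes the error ratio (standard error of the intercept divided by the standard error of the regression) for both orderings and keeps the lower one. The function names and prices are made up. Under the textbook formula for the intercept's standard error, the ratio reduces algebraically to sqrt(1/n + mean(X)^2 / Sxx), where Sxx is the sum of squared deviations of X. That may help with the intuition: it is smaller when X varies a lot relative to its own average level.

```python
import math


def error_ratio(x, y):
    """Standard error of the intercept / standard error of the regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(ssr / (n - 2))
    se_intercept = std_err * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return se_intercept / std_err


def pick_x_and_y(a, b, name_a="A", name_b="B"):
    """Return (x_name, y_name, ratio) for the ordering with the lower error ratio."""
    ratio_a_as_x = error_ratio(a, b)
    ratio_b_as_x = error_ratio(b, a)
    if ratio_a_as_x <= ratio_b_as_x:
        return name_a, name_b, ratio_a_as_x
    return name_b, name_a, ratio_b_as_x


# Made-up closing prices, oldest first.
a = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 108.0]
b = [201.0, 205.5, 202.0, 211.0, 213.5, 214.0, 221.0, 215.0]
x_name, y_name, ratio = pick_x_and_y(a, b)
print(x_name, y_name, ratio)
```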

  49. sahil swaroop says:

    Sir, I know you must be busy, but it would be great if you could also do a chapter on PCA. If you have come across a resource that explains PCA without heavy maths, please share it.

  50. Shashank Sinha says:

    The second highlight is 20.914, which is the residual.

    The std_err in the report is simply a ratio of –

    Today’s residual / Standard Error of the residual

    = 20.92404/22.776

    = 0.91822
    Sir, I got confused by this calculation. In the paragraph you have written that you want to find the position of the current residual in the distribution of residuals. If I am not mistaken, you are trying to calculate the z-score of the residual. I may be wrong, but the z-score is (data point – mean of the distribution) / standard deviation of the distribution, whereas above you have directly divided the residual by the standard error. Are you trying to calculate something else? Please clarify.

    Please clarify: do we subtract the mean from the residual value to get the z-score, or not?

    • Karthik Rangappa says:

      Yes, thats right Shashank. The idea is to figure the z-score of the residual. I guess someone had a similar query, have answered the same. Can you please run through the comments?

  51. avinash pudi says:

    Can we use the error ratio concept to decide which one is stock A and which is stock B in the first method of pair trading you explained? If not, how can we decide stock A and stock B to calculate the ratio Stock A/Stock B?

    • Karthik Rangappa says:

      Hmm, but since this is largely based on the correlation technique, it does not really matter which is Stock A and which is Stock B.

  53. avinashpudi says:

    Since I am not a coder, I looked for free sources to calculate the ADF test and found an Excel plugin called Real Statistics (http://www.real-statistics.com/). If you can spare some time to look into it and write an article on how to use the plugin, it would be helpful.

  54. jaya says:

    I have taken data on a 10-minute time frame for 60 days in Excel and got nearly 1640 cells.
    The correlation of BPCL vs HPCL is 0.72,
    but the density curve reaches below 0.000234 or above 0.99986, close to 1.
    My question is: what is the difference between collecting data for 2 years and using a 10-minute time frame for 60 days?
    Which one should I follow, sir?

    • Karthik Rangappa says:

      I’d suggest you stick to daily EOD data, simply because the noise component in your intraday data is quite high.

  55. Alekha says:

    In chapter 15 you shared an Excel sheet showing how to calculate intrinsic value:
    Share Price (INR) = (F23-F26)*10^7/F29
    Share Price (INR) = (Total PV of cash flow-Net Debt)*10^7/Number of Shares
    I assumed the 10 in the formula is the 10%, but then I am unable to understand the 7 that you used.
    Please clarify my doubt on the 10 (if not 10%) and the 7.

    • Karthik Rangappa says:

      Alekha, are you sure you are referring to a chapter in this module? I’m unable to get the context here.

  56. Hiren says:


    First of all thanks alot for this knowledge bite series on trading analysis and specially on pair trading. Love this piece, so simple to understand.

    Request you to kindly tell from where we can download 200 day (historical) data in one go for all equities? On NSE’s Bhavcopy, it is daily data, which needs to be compiled for 200 days. Please help where to get this data. Thanks

    • Karthik Rangappa says:

      Hiren, I’m glad you liked the content. Bhavcopy is the best source, but you will have to figure a way to build a script which will compile and download in 1 go.
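
A minimal sketch of what such a compiling script might look like, assuming the daily files have already been downloaded. The column names `SYMBOL`, `SERIES` and `CLOSE` are assumptions about the file layout and should be checked against the actual files. The function name and sample rows are made up, and the sketch does not download anything or adjust for corporate actions.

```python
import csv
import io
from collections import defaultdict


def compile_closes(daily_files, series="EQ"):
    """Merge daily bhavcopy-style CSVs into {symbol: [(date, close), ...]}.

    daily_files: iterable of (date_label, text_file_object) pairs, oldest first.
    """
    history = defaultdict(list)
    for date_label, handle in daily_files:
        for row in csv.DictReader(handle):
            if row["SERIES"].strip() != series:
                continue  # keep only the chosen series (assumed to be equities)
            history[row["SYMBOL"].strip()].append((date_label, float(row["CLOSE"])))
    return dict(history)


# Tiny in-memory stand-ins for two daily files (made-up numbers).
day1 = io.StringIO("SYMBOL,SERIES,CLOSE\nHDFCBANK,EQ,1000.5\nICICIBANK,EQ,350.0\n")
day2 = io.StringIO("SYMBOL,SERIES,CLOSE\nHDFCBANK,EQ,1010.0\nICICIBANK,EQ,352.5\n")
history = compile_closes([("2018-05-17", day1), ("2018-05-18", day2)])
print(history["HDFCBANK"])
```

With files on disk, each `io.StringIO(...)` would be replaced by `open(path, newline="")`, looping over 200 daily files in date order.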

  57. Kevin says:

    I run pair trading algo myself and found some pair like…
    x = RELIANCE
    y = IOC
    beta =-0.1058
    intercept = 280.036
    closing price of IOC on 30 Aug = 122.45

    The closing price of IOC on 30 Aug is 122.45. As you pointed out in the very last chapter on pair trading, the intercept is the portion of the Y stock’s value (IOC in this case) that the model can’t estimate. If we apply this logic here, it implies that the regression model can’t predict 280.036 of IOC’s price of 122.45 (or around -128%). I know this is very tricky. Can you please clarify this situation, as I am getting the same thing with all pairs that have a negative beta?


    • Karthik Rangappa says:

      Kevin, -ve beta is a tricky situation, and from experience, the model breaks when we face such a situation. For this reason, I avoid trading such pairs.

  58. Hiren says:


    Since I have no programming experience, request you to kindly share last 300 days data for all equity based securities, will add the daily data myself going forward. Or if this is not feasible then please share some script with instructions. Thanks in advance!

    • Karthik Rangappa says:

      You can always download the same from the NSE’s bhav copy or you could subscribe to data services from a data vendor.

  59. Sunil says:

    Hi Karthik,

    With respect to negative Beta.
    UBL (Y) = -0.49 (Mcdowell)

    Maybe it works out to 4 lots of UBL to 1350 shares of McDowell.

    I know we avoid trading negative beta. What are your comments on this position sizing?


  60. Sunil says:

    Hi karthik,
    One last question with respect to today’s residual/sigma.

    The Excel data I imported contains close prices in chronological order, i.e. the oldest close in the first row and the latest in the last row.

    After running the linear regression, which residual do I use? The example shown by you uses the observation 1 residual as the numerator.

    In my calculation, should I use the last row, observation 200? Kindly correct me if I am doing something wrong here.

  61. Sorabh Pathan says:

    Hi Karthik,
    I am wholeheartedly thankful to you and team Zerodha for such a wonderful educational effort.
    My query is:
    When I initiated the trade, the Z-Score for my pair was 2.56 and the ADF test p-value was < 5% [all other parameters are as per your explanation],
    but after 2 days the Z-Score is 2.93 and the ADF test p-value is 5.5% [crossing the critical value of 5%].
    So is it worth holding the trade till the Z-Score reaches 3 (SL) or 1 (target), or should I exit the trade as the ADF test p-value is not favorable now?

    • Karthik Rangappa says:

      Depends on your risk appetite, Sorabh. I tend to hold till 3SD, but there are times when I’ve gone ahead with gut feel and cut/booked before 3SD, and that has proved to be the right call.

  62. ASEEM GOYAL says:

    Hello Sir

    I tried using your pair trading method by writing an algorithm in Python. The algorithm followed the following steps :
    1. Extract Stock prices for all Stocks (for which Futures are available) for past 200 trading sessions
    2. Run Linear Regression on all possible combinations of pairs, identify the independent variable & apply ADF test
    3. Identify all pairs having Z-Score either greater than 2.5 or less than -2.5
    4. Track daily the Z-Score basis the regression outputs

    But more than 50% of times, the prices were diverging i.e. the Z-Score crossed 3 or -3.
    I also backtested the approach and calculated next 15 trading session Z-Score for past 200 Trading session prices but found no visible pattern for high success rate.

    Am I doing something wrong or this trading system doesn’t actually work ?

    • Karthik Rangappa says:

      Aseem, these are the broad steps. You will have to start calibrating this for results. For example, try -1.5 and +1.5 with 2 as SL or something like that. This is true with all strategies, you will have to calibrate the parameters and find your edge 🙂

  63. Aseem Goyal says:

    Thanks for the quick reply.

    But using different Z-Scores wouldn’t invalidate the hypothesis we were using ?
    We were taking Z-Score bracket of 2.5 & -2.5 since it is highly unlikely to reach that level & it would come back to Z-Score of 0 with high probability, making it profitable.

    I created a simulation to track the Z-Scores of eligible Pairs and found no discernible pattern which has high probability of profitable trade. The Z-Scores were diverging or converging randomly.

    I can share the simulated data with you, if you want.

    • Karthik Rangappa says:

      You are right on the Z score, but we are trying to establish the possible pattern for stocks. Some may just trade within -1.5 to +1.5, who knows.

      Btw, when markets are trending, almost all stocks move in the same direction, due to which you will get high r2 score and low p-value. One should be cautious about it. Instead of number of data points, I’d suggest a window where the stock pair trend had deviated but not less than 100 points to avoid sampling error. Ideally, you should perform adf test for all lags, however, multiple experiments have shown that performing up to a lag of cube root of 3 is good enough. This is more a thumb rule followed by traders rather than a mathematical proof.

  64. Aseem goyal says:

    Trending markets do make it tougher to recognise actual trading opportunities. Since my complete model depends on historical data of the past 200 trading sessions, my output would be wrong if the window is selected wrongly.
    I am actually new to trading, so can you please elaborate on your suggestions?
    1. You spoke about taking data where the stock pair has deviated, with at least 100 points. Does this mean you want the actual difference to be more than 100, and that I should consider only those data points for my regression model instead of continuous daily data?
    2. You talked about using different lags in the ADF test. Currently I am using the adfuller function from the statsmodels library in Python. By default it uses a lag value based on a formula, or we can input it explicitly. While researching, I found that using AIC is the best option for choosing the lag. You said to use up to the cube root of 3; does that mean using multiple lags up to that number? Please clarify.

  65. jaya says:

    Hi sir, the gold vs silver correlation is 0.78, but the ADF test value is 0.832 and the z-score is 2.70. Is it trade-worthy, sir?

  66. Pavan Kulkarni says:

    Brilliant stuff Karthik! I read multiple places about stat arb but yours is the best explanation.
    I followed your steps but I’m getting some weird beta values. In addition to negative ones, which I’m ignoring, I also get combinations that are highly impossible. E.g. 1 TataMotors to 0.2 AshokLey i.e. 5 TaMo to 1 AL. Are such values normal or do you think there is something wrong with my process? Thanks in advance

    • Karthik Rangappa says:

      What is the time frame you are looking at? This maybe possible if there are crazy movements in stocks. I’d suggest you skim through the data set once. Also do a hygiene check on the data, ensure its clean for all sorts of corporate action.

  67. Pavan Kulkarni says:

    Based on your next chapter, it appears like such ratios are common. However, will appreciate if you can confirm based on your experience.

  68. Dhinson says:

    Thank you. Your explanations are really easy to understand and follow. You’ve done more than a great job. I just have a question, I would assume you have collected lots of data about this technique, what is the win:loss ratio using this technique? Thank you in advance.

    • Karthik Rangappa says:

      Its got a decent success ratio, unfortunately, I cannot plug in a % here. Just that its a bit complex to implement.

  69. Mahesh says:

    Doing great job. All the best.

    If I am not wrong, Z score and Std_err are same right?

  70. Geetank says:

    For non-programmers, does the pair data sheet which you made available at the end correctly tell us which pairs we can track? Also, how do we get the P value, since you said it’s not available online? Is there by any chance some way to get the P value? Last question: is the standard error (today’s residual / std error of residual) the same as the Z score?

    • Karthik Rangappa says:

      Unfortunately, the plan to support with updated values did not go through, hence you will have to calculate the P values yourself. Yes, they are the same.

  71. Geetank says:

    The pair data sheet which you shared is an output from pair trading algo right?
    If someone wants to perform whole process in excel then he/she has to perform the following steps:
    – Download clear data from NSE Bhavcopy.
    – Run Linear regression;identify suitable X &Y through error ratio.
    – Then we check whether the residual time series is stationary, which can be checked through the p value or the 3 conditions, i.e. mean, SD, autocorrelation. (Can that P value be computed directly in Excel through some formula, maybe the T distribution?)
    – Now here is the big hurdle: how to check the p value of a pair?
    – Also, there are so many pairs in your data sheet. Am I supposed to follow the same process as above to get the right pairs and form my own pair sheet, or is the one which you provided back in 2018 still good to use?

    Lastly, I am extremely sorry that i am buzzing you up with lots of queries and at the same time extremely grateful that you are so humble to read each and every comment and respond it back.

    Thank you.

    • Karthik Rangappa says:

      1. Download clean data from NSE
      2. Run linear regression on stock pairs from the same industry, and identify suitable X & Y from the error ratio
      3. Check if the P-value of beta is < 0.05, then perform the ADF test on the residuals; if the P-value of the ADF test is also < 0.05, the pair is eligible for trading
      4. If the standard error indicates a trade (i.e. if it is above +2 SD or below -2 SD), take the positions
      5. Track the trade and close it

      Perform this analysis every day. The sheet given in 2018 is valid for that day/week; you can't use it today.
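
The entry and exit part of these steps can be sketched as a small decision function. The default thresholds (enter at 2.5, stop at 3, target at 1) are the figures quoted elsewhere in this thread, not fixed rules; they are parameters precisely because, as discussed, they may need calibrating. The function name and labels are made up. A high positive z-score is taken to mean Y is rich relative to X, so the pair is shorted (short Y, long X), and the reverse for a low negative z-score.

```python
def pair_signal(z, entry=2.5, stop=3.0, target=1.0, position=None):
    """Classify today's z-score.

    position: None (no open trade), "short_pair" (short Y, long X)
    or "long_pair" (long Y, short X).
    """
    if position is None:
        if entry <= z < stop:
            return "short_pair"
        if -stop < z <= -entry:
            return "long_pair"
        return "wait"  # inside the band, or already beyond the stop level
    if position == "short_pair":
        if z >= stop:
            return "stop_loss"
        if z <= target:
            return "book_target"
        return "hold"
    if position == "long_pair":
        if z <= -stop:
            return "stop_loss"
        if z >= -target:
            return "book_target"
        return "hold"
    raise ValueError("unknown position: %r" % (position,))


print(pair_signal(2.56))                          # entry check with no open trade
print(pair_signal(2.93, position="short_pair"))   # two days into a short-pair trade
```

This covers only the z-score rule; the ADF and beta p-value checks in step 3 would sit in front of it.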

  72. Amit says:

    Hi Karthik,

    First of all, thank you so much for the awesome content, this is gold!

    Q1: How are you calculating the standard_error? (I know it’s today’s residual divided by the standard error of the regression model.) Should we run the regression every day with the last 200 records and calculate the standard error for today’s price using the newly run regression model?

    Q2: When we identify a trading opportunity and keep track of the z-scores, how are you calculating the z-score, again should we run a new regression every other day or just use the last regression model when we identified the trading opportunity to calculate the residuals (and then z-score)?


    • Karthik Rangappa says:

      The regression relation between stocks is not stable; it keeps changing with time. Hence it’s recommended to perform regression analysis regularly [preferably daily, or at least weekly].

  73. Saood says:

    Hi Karthik,
    Suppose I initiate a trade with a low ADF value (<0.05). However, during the trade the ADF value rises above 0.05. Should I close the trade, or disregard the ADF value as we have already entered the trade?

  74. Saood says:

    Hi Karthick,
    Thanks for the reply.
    I have two questions
    1. Assuming that we calculate the ADF value every day during the trade, there should be a threshold for the ADF value. If it goes beyond the threshold, we should book a loss and exit the trade. My guess is this should be greater than 0.05, otherwise even a slight change in the ADF value (say 0.05 -> 0.6 and back to 0.05 the next day) would make us keep hopping between entering and exiting the trade. What do you think the threshold ADF value to exit the trade should be?
    2. The regression parameters are also not stable and would vary each day we re-perform the analysis. Suppose we enter a trade at a z-score of 2 and wait for the z-score to come down to 1. Is it possible that by the time it reaches 1, the regression parameters have changed drastically and we are making a loss even though the z-score is 1 at the time of exiting the trade?

    • Karthik Rangappa says:

      1) Hard to say, because this can drift from 0.5 to 0.6 to 0.8, which would only increase the losses right? So how would you accommodate for that scenario?
      2) Yup, part of the trade. Parameters change as and when the price changes. It is just that the change is drastic when there is a drastic movement in price.

  75. Saood says:

    I guess if the ADF value goes beyond the threshold (say 0.2), I will immediately take a loss and exit the trade. If the ADF value is 0.2, chances are it might not converge at all. I hope I am correct on this. 🙂

  76. kumar says:

    This really piqued my interest in trading. Are there any other strategies that I could read about?

    Thank you 🙂

  77. jaya says:

    Sir, as you mentioned, “sigma value will teach you again”. By when can we expect this, sir?

  78. Ganesh Patel says:

    Hi Karthik,
    This is a lockdown period when I am reading you. Thanks for writing the best tutorial on the internet. loved reading Varsity.

    now I need technical assistance here, do you know API or endpoint connection to get excel data (at the start of the chapter) of all stock prices and their sectors?


  79. Ganesh Patel says:

    Thank you so much!! I will surely look into it.

  80. Rushiraj Bhusare says:

    in the live trading example, you said that the trade is skewed towards the long side, but isn’t it skewed to the short side as you have sold a greater number of shares than the beta demands?

  81. ronit says:

    hey karthik,
    Could you please confirm that is it unusual to get p_value much smaller than 0.05, I am getting 36 pairs with p_values less than 0.01 with the historical close price of last 200 days

    row  beta     corr  intercept  p_val  residue_error  residue_mean  sector  xstock      ystock
    2    3.162    0.92  702.674    0.009  89.343         -0.0          Auto    AMARAJABAT  BAJAJ-AUTO
    7    0.245    0.93  65.888     0.002  24.059         0.0           Auto    HEROMOTOCO  AMARAJABAT
    8    0.899    0.96  215.872    0.003  18.769         0.0           Auto    M&M         AMARAJABAT
    10   4.698    0.93  191.660    0.000  24.011         0.0           Auto    MOTHERSUMI  AMARAJABAT
    16   0.190    0.92  39.171     0.001  3.544          0.0           Auto    BHARATFORG  APOLLOTYRE
    22   45.280   0.92  882.464    0.001  174.972        0.0           Auto    APOLLOTYRE  MARUTI
    24   282.852  0.93  32816.837  0.000  1039.562       -0.0          Auto    APOLLOTYRE  MRF
    43   0.864    0.96  726.155    0.008  64.844         -0.0          Auto    HEROMOTOCO  BAJAJ-AUTO
    84   3.394    0.95  720.147    0.004  78.234         0.0           Auto    M&M         HEROMOTOCO
    308  0.584    0.92  373.481    0.000  12.201         -0.0          FMCG    UBL         MCDOWELL-N
    344  1.078    0.94  79.684     0.000  5.656          0.0           Metal   NMDC        COALINDIA
    389  0.029    0.94  9.636      0.003  1.287          0.0           misc    CENTURYTEX  ACC
    410  12.333   0.95  152.284    0.000  15.783         -0.0          misc    ACC         NIFTY
    414  57.262   0.92  1633.833   0.000  92.470         -0.0          misc    ACC         TATACHEM
    460  16.068   0.92  716.933    0.010  93.179         -0.0          misc    ADANIENT    TATACHEM

    Thanks 🙂

    • Karthik Rangappa says:

      Ronit, on what basis have you selected the pairs? ACC and Tata Chem? Bharath Forge and Apollo tyre?

  82. ronit says:

    16 0.190 0.92 39.171 0.001 3.544 0.0 “Auto” BHARATFORG APOLLOTYRE
    They both belong to the same Auto Index.

    I used this website:

    to find out which futures scrips belong to the same group.

    Some stocks didn’t have a group, so I grouped them together (I shouldn’t have done that, my bad):
    414 57.262 0.92 1633.833 0.000 92.470 -0.0 “misc” ACC TATACHEM

    Could you please share your groupings of stocks? I searched the internet and the NSE website but couldn’t find any.
    It would help me a lot.

    Other than this, do you think there is an error in the calculations, or is it common to get such low p-values between stocks of the same index?

    • Karthik Rangappa says:

      Ronit, group them by intuition and then work out the finer details. For example, I know HDFC Bank and ICICI Bank are a good pair. But are HDFC and Corporation Bank a good pair? No, because one is a private sector bank and the other is a PSU. The banks’ sizes are not comparable either. Similarly, Mindtree and HCL may be a good pair; Mindtree and Airtel may not. So you have to put intuition behind pairing.

      A low p-value is possible.

  83. ronit says:

    Hi Karthik, I appreciate the response.
    Could you please share data that categorizes stocks into groups?
    Anything would work: a link to a source, a Google Sheet, a website, or an email to my ID: [email protected].
    I have already tried to group these stocks, but I’m not able to do a good job. Please help.

  84. ronit jain says:

    Hi Karthik,
    Quantsapp recently released a pair trading tool on their app, and they use a 20-day lookback to calculate the SD rather than the previous 200 days.
    I tried their backtesting tool with both 20-day and 200-day lookbacks for the SD, and 20 gave better results.
    1) Could you please check which lookback period would be appropriate?
    2) If we choose 20 days, does that mean we have to do the calculations daily? With 200 days we were updating our sheet every 2 weeks, but that won’t work with 20 days, right?

    • Karthik Rangappa says:

      I’m not familiar with the app, so I’m not really sure, Ronit. Ideally, yes, you need to run the calculations daily. Once you take the trade, you freeze it.

  85. ronit jain says:

    Thanks, Karthik,
    I would also like to ask your views on using
    spread as: stock(a) – stock(b)
    spread as: stock(a)/stock(b)
    rather than the residuals (y_predicted-y) which we use?

  86. ronit jain says:

    Hi Karthik,
    As naked futures require a huge margin, how about doing the spread with options?
    I have a doubt here, though.
    For a pair x and y,
    given x > y,
    I have two choices:
    1) Do a debit spread for both: buy a call spread for y and buy a put spread for x
    2) Do a credit spread for both: sell a put spread for y and sell a call spread for x
    Or maybe mix and match, selling in one and buying in the other, in line with our assumption of the future direction.
    Please guide. Thanks a lot.

    • Karthik Rangappa says:

      I’ve never really thought of this from an options perspective. But options can be tricky as the Greeks come into play. Pair trading is a pure directional play (delta); you don’t want to complicate it with other parameters. However, since I’ve not really thought this through, I’m not sure 🙂

  87. Jayneel raichura says:

    Can you show how to find ADF values using a paid plugin in Excel?

  88. Jayneel raichura says:

    Are you still regularly uploading the pair datasheet?
    If not, are there any other traders who compute it and post it online?

  89. Parallax says:

    Hey Karthik,
    Once we have a pair ready and a signal is generated, we can initiate our positions using just stocks as well, right? My concern is that I might not have enough capital to initiate multiple pair trades simultaneously with futures if the signals arise.

  90. Parallax says:

    Hey Karthik,
    I coded up the algo to get the pairs, and I have around 143 of them for stocks in the Nifty 100. My question is, how should I validate the resulting pairs? Is there a source against which I could validate my results? I need a way to make sure that they are indeed correlated and that there’s no error in the implementation.

    • Karthik Rangappa says:

      You will have to validate the pairs quantitatively. I’m not sure about any other source to do this 🙂

  91. Nitul Rupareliya says:

    Sir, I understand beta neutrality, but what should we do if we are required to sell in the spot market for beta neutrality?

  92. darryl says:

    Hello Karthik,
    Can you please throw some light on how multivariate regression works?

  93. Manhar says:

    Hello Karthik,

    Is there any third-party software company or website that provides pair trading data on a daily basis by subscription?

    Thanks in advance

  94. Ishant Gupta says:

    Here, the Standard Error (Std_err)/Z-score (or standard score, as it is called in statistics) is defined as the ratio of (today’s residual) to (standard error of the residual). I suspect it should be (today’s residual – mean residual) over (standard error of the residual) to make sense.

    Kindly correct me if I am wrong.
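
A quick numerical check of this point. For an ordinary least-squares regression that includes an intercept, the residuals average to zero by construction, so subtracting the mean residual leaves the z-score unchanged. The sketch below assumes that setup; the prices are invented for illustration, and `fit_residuals` is a hypothetical helper, not something taken from the chapter's sheet.

```python
from math import sqrt

def fit_residuals(x, y):
    """Regress y on x (with intercept); return residuals and their standard error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    # Standard error of the regression: sqrt(SSE / (n - 2)) for one regressor.
    std_err = sqrt(sum(r * r for r in residuals) / (n - 2))
    return residuals, std_err

# Invented prices, oldest first, so the last entry plays the role of "today".
x_prices = [50.0, 50.5, 50.2, 52.0, 53.1, 54.0]
y_prices = [100.0, 102.0, 101.0, 105.0, 107.0, 110.0]

residuals, std_err = fit_residuals(x_prices, y_prices)
mean_residual = sum(residuals) / len(residuals)

z_plain = residuals[-1] / std_err                       # residual / std error
z_centered = (residuals[-1] - mean_residual) / std_err  # with the mean subtracted
print(mean_residual, z_plain, z_centered)
```

The two z-scores agree up to floating-point noise. They would differ if the regression were forced through the origin, or if the residuals were measured against a line fitted on a different window of data.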

  95. Kumar says:

    Hi Karthik,

    First of all, Guru Namaskar. I have really become a fan of your teachings.
    Indeed, your articles are well organized, the choice of words is easy to follow, and the metaphors and analogies you use to explain jargon make the nuances of trading and investing easy to understand.


  96. Chaitan says:

    Thanks for sharing all these details.

    I’m a bit curious to learn about multivariate regression. Could you please help me with more details?


    • Karthik Rangappa says:

      That is a slightly complicated task, Chaitan. Not many would want to look into it, hence I refrained from discussing it.

  97. Robin says:

    Hi Sir,

    Thank you for sharing in such detail.
    Can you please let me know what date range of data was used as input to the code for the pair data sheet (https://zerodha.com/varsity/chapter/live-example/)?

    I have written code that downloads the last 200 days of data from Yahoo Finance, but I am getting different p-values. I just want to verify my results against yours.

    Best Regards,

    • Karthik Rangappa says:

      Unfortunately, I don’t have the exact dates, but from the trade snapshot, I can see it’s around May 2018.

  98. Robin says:

    Hi Sir,

    From the comments I could figure out that the dates are 23-Aug-2017 to 11-June-2018. My coeff, intercept, std_err and sigma match your sheet updated on 12th June, but the p-value does not. I tried a lag of 14 (square root of 200) as well as 5 (cube root of 200), along with no constant and no trend.
    I have used both the statsmodels.tsa.stattools.adfuller function and the arch.unitroot.ADF function in Python.

    Would it be possible for you to share which function you are using in your code to calculate the ADF p-value? I just want to verify my results before taking a trade.


    • Karthik Rangappa says:

      Robin, unfortunately, I don’t have the code. I had to use my friend’s code, and he obliged since this was for educational purposes 🙂

  99. Devendra says:

    What actually is beta? Is it just the slope of the regression of Y on X?
    Also, are we calculating the z-score of the residuals only? That would mean z = [(current residual) – (average of the 200 residuals)] / (standard error of the residual).
    Am I correct?

  100. devendra says:

    And does the mean of the residuals tend to zero?

  101. Mahesh Ananthaiah says:

    Dear Sir,

    Consider two scenarios (both meet all the criteria of a pair trade):
    1. I have been tracking pairs for a while. For one pair, the standard error is 1.61 after yesterday’s close and 2.54 after today’s close. Can we initiate the trade, or what is the probability that it reaches 3 SD, in which case we should not trade the pair?

    2. The standard error after yesterday’s close is 3.23, and after today’s close it is 2.72. Can we initiate the trade, or will the pair go back to 3 SD?

    I have also been practicing multivariate regression in Excel, using NumXL for ADF. The same doubts arise for multivariate regression too.
    Please clarify. Thank you.

    • Karthik Rangappa says:

      Assuming all pair trade conditions are met:

      1. The chance that it will go from 2.5 SD to 3 SD is about 1%, and the chance of it going to 1.5 SD is much higher.

      2. I can’t guarantee how it moves, but the chance that the SD moves closer to 1 is higher than the chance of it moving above 3.
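
The "about 1%" figure is consistent with the share of a normal distribution that lies between 2.5 and 3 standard deviations from the mean, counting both sides. The sketch below computes those areas with Python's standard library. It assumes the residuals are normally distributed, which is an assumption and not something the ADF test establishes, and it describes static areas under the curve rather than a forecast of where a given pair moves next.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_2_5 = share_within(2.5)
within_3 = share_within(3.0)
between_2_5_and_3 = within_3 - within_2_5  # both tails combined

print(f"within +/- 2.5 SD    : {within_2_5:.2%}")
print(f"within +/- 3 SD      : {within_3:.2%}")
print(f"between 2.5 and 3 SD : {between_2_5_and_3:.2%}")
```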

  102. Chandu says:

    Sir, can I just follow the first pair trading method that you discussed in the module?
    I have no idea about programming or algos.

  103. Ravi says:

    Hello Karthik,
    This article explains all the concepts of linear regression very effectively. Do you still share such trades somewhere, or is there any place in Zerodha where we can scan for and pull such pair trades?

  104. Avi says:

    Sir, I understood the concept, but I’m stuck at the ADF test. How do I approach a programmer and ask him to create a system for me when he has zero knowledge of pair trading? My brother knows Python; what should I ask him?

  105. Avi says:

    Sir, I just noticed that our linear regression test only gives us a p-value.
    Is that enough to know whether a pair is stationary or not, given that the probability is <5%?

  106. Dhawal says:

    While calculating std_err = today’s residual value / std error = 20.92/22.77:

    In the regression report, the first residual value corresponds to the oldest data, as the series is in oldest-to-newest order. So for today’s residual value, should we not take the residual from the last observation instead of the first?

  107. RC says:

    Hi Karthik,
    Could you kindly help me with the following doubts?
    1) The beta/hedge ratio of the two stocks can be negative (most probably for unrelated sectors), and hedging in that case might be done by either buying both or selling both. Is that correct? A negative beta within the same sector sounds strange; should we avoid such a trade?
    2) If the intercept of the regression is negative, we should discard that X and Y combination, as a negative intercept (fourth quadrant) doesn’t make sense realistically for predicting the price of Y. Am I correct?
    3) When the error ratios of the two X, Y combinations are very close, is there any threshold (say, a difference of more than 10% in error ratio between the two combinations) beyond which we choose one combination over the other?

    Best Regards,

    • Karthik Rangappa says:

      1) Yup, true if it’s -ve
      2) Yes. But do try swapping Y and X and running the regression again
      3) Hmm, a close range should be OK. You can go with either.

  108. RC says:

    Hi Karthik,
    Thank you for the prompt response 🙂
    One more doubt: between R1 (std error of intercept / std error of residuals) and R2 (intercept / actual value of Y), which ratio should be given more weightage for decision making?
    For a pair of stocks, I am finding the following values:
    R1 = 0.64 and R2 = 18% when regressing Y on X
    R1 = 0.72 and R2 = 0.10% when regressing X on Y
    Which one should I ideally choose?

    • Karthik Rangappa says:

      Tricky, since both are quite close. I’d suggest you establish the entire model and see which performs better when backtested.
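
For reference, the error ratio from the chapter recap (standard error of the intercept divided by the standard error of the regression) can be computed for both orderings of a pair as sketched below. This is a minimal illustration using textbook simple-regression formulas; the prices, the names `stock_p` and `stock_q`, and the `error_ratio` helper are all hypothetical.

```python
from math import sqrt

def error_ratio(x, y):
    """Regress y on x; return (std error of intercept) / (std error of regression)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    std_err = sqrt(sse / (n - 2))                        # standard error of the regression
    # Textbook formula: SE(intercept) = std_err * sqrt(1/n + x_bar^2 / Sxx)
    se_intercept = std_err * sqrt(1 / n + x_bar ** 2 / sxx)
    return se_intercept / std_err

# Invented closing prices for two hypothetical stocks P and Q.
stock_p = [100.0, 102.0, 101.0, 105.0, 107.0, 110.0]
stock_q = [50.0, 50.5, 50.2, 52.0, 53.1, 54.0]

ratio_q_on_p = error_ratio(stock_p, stock_q)  # Q as Y, P as X
ratio_p_on_q = error_ratio(stock_q, stock_p)  # P as Y, Q as X

# The ordering with the lower error ratio decides which stock is Y.
y_name = "Q" if ratio_q_on_p < ratio_p_on_q else "P"
print(ratio_q_on_p, ratio_p_on_q, y_name)
```

When the two ratios are close, as in the question above, this comparison alone does not separate the orderings clearly, which fits the suggestion to backtest both.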

  109. Vishal says:

    Hi Karthik,
    I have implemented the above-mentioned strategy in Python. I found some weird trades while backtesting it: for some trades, the P&L was actually increasing when the Z-score (residual/std_err_of_residual) was approaching the stop loss, and in some cases the P&L was turning negative when the Z-score was approaching the mean. Can this happen, or have I made a mistake in the implementation (I’ve reviewed the code many times)?

    • Karthik Rangappa says:

      Congrats! I wish I had the skills to program stuff 🙂

      Profits can’t increase when the trade approaches the SL; I think there is some error here. In fact, the opposite should happen. I guess you will have to go through the code and identify this.

  110. Vishal says:

    Edit: Excuse the grammatical errors above (I can’t edit) 😛.

    Please see the following trade as an example:


    COEFF: 0.785840
    INTERCEPT: 220.509276
    RESIDUAL_STD_ERR: 102.657799

    ADF (p-value): 0.020033

    Entry: 2021-04-01
    STD_ERR(Z-score): -2.599084
    AUBANK(Y) LONG_ENTRY_PRICE : 1267.90 QTY: 500

    Exit: 2021-04-30
    STD_ERR(Z-score): -0.477996

    • Karthik Rangappa says:

      Yes, this is a possible situation, wherein one of the stocks moves in the opposite direction while the other stays flat. I’ve experienced this too. Not much can be done about it. But you held this trade for 30 days? Didn’t you get a better exit during these days?

  111. Vishal Agrawal says:

    No, no, this trade is from a backtest.

  112. Vijay G says:

    Thank you, Karthik sir. I have also completed this module, and I took lots of notes from your pair trading module. Thank you for your help, sir.

  113. Shubham Kaushal says:

    Hello Karthik,
    I, as a learner, would like to say that it has been splendid reading your articles on various topics. Thank you for such simple explanations!
    At the end of this chapter, you mentioned “multivariate regression.” Can you suggest to me some sources/references for it? I found research papers by googling, but I would like to read on some practical side of this.

    Thanks again!

    • Karthik Rangappa says:

      Thanks for the kind words, Shubham. I’m glad you liked the content. Unfortunately, I could not find any good articles on this topic, but I’m sure it’s hidden in a few online forums; you will need to google it 🙂

  114. Anivrat says:

    Hi Karthik,
    I have implemented the entire algorithm, but there’s something I want to clarify. A lot of companies do not have the entire trading history. For example, COFORGE got added to the index in Aug 2020, and since I am taking data from Jan 2020 for the rest of the IT companies, the regression throws an error because the two data sets are not of equal length. So I entered NULL values for all the earlier dates on which a company’s data is not available. Do I need to change the regression model as well? What do you suggest?

    • Karthik Rangappa says:

      Null values won’t help as they still skew the results (I think). But what other regression model will you use?

  115. Uday Mudholkar says:

    Hi Karthik
    I have some questions regarding the market data used in your example spreadsheets. I am trying to write my own algorithm without Excel or any other spreadsheet app. I have a different source of data than the one you mentioned in your article – NSE’s bhavcopy.

    1. I downloaded the data for AXISBANK for the period given in your spreadsheet (2015-12-04 to 2017-12-04), and my data matched yours exactly. But when I downloaded data for ICICIBANK and HDFCBANK, the data does not match. For example, your closing price for ICICIBANK on 2015-12-04 is 261.45, while I get 237.68. The following is that particular data point from my source:


    The same is the case for HDFCBANK. Can you shed some light on why this might be?
    2. I have checked the NSE website, and they have a paid model with an annual fee. Until I complete my algorithm, I don’t want to pay a yearly fee. Do you know if they have a ‘pay as you go’ model? The source I am using has a PAYG model.
    3. Is there any other source of data that you can recommend other than bhavcopy?

    Thanks & regards
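
One quick diagnostic for a mismatch like this is the ratio between the two quoted closes. The sketch below uses only the two ICICIBANK numbers from the comment. A ratio that lands on a round figure often indicates that one series has been adjusted for a corporate action (a bonus or split) and the other has not; whether that is the cause here is an assumption, not something confirmed in the thread.

```python
sheet_close = 261.45    # ICICIBANK close on 2015-12-04 as quoted from the spreadsheet
source_close = 237.68   # same date, as quoted from the commenter's data source

ratio = sheet_close / source_close
print(f"ratio between the two sources: {ratio:.4f}")

# If the ratio stays constant across all dates before some cut-off date,
# the two series most likely differ by a single adjustment factor.
```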

  116. Anivrat says:

    Hi Karthik,
    This is with reference to the above comment regarding NULL values. I am not sure which other model to use, so for now I have changed the algorithm a bit. For example, if company A has 500 entries and company B has 350 entries, I take the latest 350 entries of company A along with all 350 entries of company B and run my algorithm normally. Is this fine? Are there any corner cases you would want to point out?
    Anyway, thank you so much for these awesome blogs. From not knowing what NIFTY is to doing well in the market on a regular basis, I have come a long way all because of you and your amazing teachings. Thanks a lot for this!

    • Karthik Rangappa says:

      Yes, Anivrat, that’s a reasonable way to move ahead. Take the 350 data points that match both the time series. I’m glad you liked the content and find it helpful. Good luck 🙂
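
A small sketch of taking only the data points that match in both time series. It matches on dates rather than on row counts, which also guards against a day that is missing from one series only. It assumes each series is held as a dictionary keyed by ISO date strings (which sort chronologically); the function name and the prices are hypothetical.

```python
def align_on_common_dates(series_a, series_b):
    """Keep only the dates present in both series, in chronological order."""
    common_dates = sorted(set(series_a) & set(series_b))
    values_a = [series_a[d] for d in common_dates]
    values_b = [series_b[d] for d in common_dates]
    return common_dates, values_a, values_b

# Invented closes; company B starts trading one day later than company A.
company_a = {"2020-08-03": 100.0, "2020-08-04": 101.5, "2020-08-05": 102.0, "2020-08-06": 101.0}
company_b = {"2020-08-04": 50.2, "2020-08-05": 50.9, "2020-08-06": 51.4}

dates, a_values, b_values = align_on_common_dates(company_a, company_b)
print(dates, a_values, b_values)
```

The aligned lists have equal length and can be passed straight into the regression.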

  117. Arun says:

    Where can I get stock data at hourly granularity for a specific date? What I’m getting from bhavcopy has a minimum granularity of one day (OHLC).

  118. Sushil says:

    The residual calculated by the regression and the one obtained by subtracting the predicted price of Y from the actual price of Y (as done in the Position Tracker sheet) come out different every time. Why is that? Am I missing something, or is it happening to everyone? Please reply.

  119. Anuraag says:

    Hi Karthik,
    I want to understand how to clean the data with respect to results, splits and bonuses. Can you please help me understand 1) how to spot them in past data and 2) how to clean them?


    • Karthik Rangappa says:

      Anuraag, you can consider subscribing to an authorized data vendor for this, you will get clean data from the vendor.

  120. Dhananjay says:

    Don’t we have to check whether the 2 stocks have a correlation of more than +0.75, as we did in the first pair trading method?

  121. Dhananjay says:

    I ran the ADF test in R using the urca package. It’s working fine, but I don’t know whether the results are correct. I did visual backtesting and it worked, but again, it was only one trade, so I can’t just stamp it as correct.

    I request you to add 1-2 chapters on this. It would be very helpful.

    Thank you!

  122. Dhananjay says:

    So if my trades are working, then I can say that the calculations are right and go ahead and deploy it?

    • Karthik Rangappa says:

      Yes, but keep an eye on factors that you think are driving your profits or dragging your losses.

  123. Abhinav says:

    Hello Karthik,

    Quick question –
    Why is the stoploss at +/- 3 SD?
    Does that not mean that there is a 99.7% chance of reverting to the mean?
    Is that not a better place to initiate a trade than +/- 2 SD?
    Please explain.

    Thanks and Regards,

  124. Priyanshu Gupta says:

    Dear Sir, I have one question on the options topic.
    Suppose I bought an OTM call option (2-3 strikes away from ATM). From the two examples that you wrote in one of the modules (1. During the 2009 election your friend made a whopping 28 lakhs; 2. On 24th Aug 2015, the market fell by 5.92%, but the premium of the CE above the strike price did not decline, rather it increased), I tried to come up with a conclusion and want you to confirm whether I am right.
    The conclusion is:
    Buy an OTM call option if you are certain that on a certain day the market is going to move by a big percentage, but you are not sure in which direction.
    What this does is:
    If the market goes up, our OTM call option will turn into an ITM option and we will make a decent profit.
    If the market goes down, then, as we have bought an OTM option, we have 2 things:
    1. The OTM option has a low delta.
    2. Volatility increases on that day => vega increases => premium increases.
    In both cases we will make some profit.

    • Karthik Rangappa says:

      If you are buying CE, especially a naked CE, then it implies that you are bullish. So you need to be sure about the direction. But that said, yes, if the volatility increases, then so does the premium of the options. But then, this is a bet on volatility, not the direction if you realize.

  125. Chaitanya LVK says:

    Hi Karthik,

    Great explanation, and apologies for waking up the dormant comments section :). One thing is causing a little confusion, though. When you are tracking a pair trade regularly after initiating it, are you running the linear regression every time you want to know where the trade stands? For example, in the “logs” screenshot above, you have 4 updates on 21st May. Was the regression run 4 times to get the “std_err”? I am assuming that value is the z-score you are referring to.

    Thoroughly loving your explanations!

  126. Dhananjay says:

    Please clarify if this is correct:

    1) I use beta to decide the appropriate hedge ratio between the two entities
    2) The std error [(latest residual) / (residual std error)] is what I will track for trading opportunities. When it hits +2.5 or more I will go short on the pair, and when it hits -2.5 I will go long on the pair
    3) Once I enter, I have to keep track of the z-score. The z-score is nothing but [(latest residual) / (residual std error)]. Once the z-score drops or rises to the desired level, I will square off the positions

    I am confused about the z-score specifically. Please let me know if my interpretation of the z-score is correct!
    z-score = (latest residual value) / (residual std error)
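
The rules listed in this comment can be written down as a small decision function. This is a sketch under stated assumptions: the ±2.5 entry level comes from the comment, the ±3 stop loss is the level other commenters in this thread refer to, and the target of 1.0 is a placeholder chosen only for illustration. Shorting the pair means selling Y and buying X, as in the HDFC/ICICI question later in the thread. Function names are hypothetical.

```python
def pair_signal(z_score, entry=2.5):
    """Entry decision from the latest z-score (residual / residual std error)."""
    if z_score >= entry:
        return "short pair (sell Y, buy X)"
    if z_score <= -entry:
        return "long pair (buy Y, sell X)"
    return "no trade"

def exit_reason(z_score, side, stop=3.0, target=1.0):
    """Exit check for an open trade; side is 'short' or 'long'.

    target=1.0 is an assumed placeholder, not a level taken from this thread.
    """
    # Flip the sign for long trades so both sides share one set of comparisons.
    signed = z_score if side == "short" else -z_score
    if signed >= stop:
        return "stop loss"
    if signed <= target:
        return "target"
    return "hold"

print(pair_signal(2.6))           # entry on the short side
print(exit_reason(2.0, "short"))  # still between target and stop
print(exit_reason(-3.2, "long"))  # long trade moved further against us
```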

  127. Dhananjay says:

    Thank you for your reply!

    Also, I would like to share that I developed a simple system in Python to check whether two given stocks form a valid pair. If yes, it gives all the parameters of the pair equation. Feeling good! 😊

  128. Dhananjay says:

    Is it okay to use the same system on 15-minute data or other lower time frames?
    If yes, are there any other considerations I need to take into account for lower time frames?

  129. Harish says:

    Hi Sir, it is a great service you are doing in this forum. Thanks for that. My query: suppose we identified a pair using 2 years of historical data on, say, 15 Mar 2022, so the intercept and slope are as of 15 Mar 2022, and we entered a trade on 01 Apr 2022. From 2nd Apr onwards, do I need to recalculate the slope and intercept to arrive at the z-score, or can I keep the intercept and slope as of 15 Mar 2022 to calculate the z-score? Please clarify.

  130. Harish says:

    Do I need to rerun the calculation on 2 years of data daily, or can I keep the intercept and slope the same and continue to calculate the z-score daily until the trade is closed? Please clarify.
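
One reading of the advice earlier in this thread, that a trade is frozen once it is taken, is to hold the slope, intercept and residual standard error fixed at entry and only plug in new prices each day. The sketch below illustrates that reading; it is an interpretation, not a confirmed description of the author's sheet, and all parameter values and prices are invented.

```python
def z_score(y_price, x_price, slope, intercept, residual_std_err):
    """Residual of Y against the frozen regression line, in std-error units."""
    predicted_y = slope * x_price + intercept
    return (y_price - predicted_y) / residual_std_err

# Hypothetical parameters, frozen on the day the trade was taken.
frozen = {"slope": 1.5, "intercept": 20.0, "residual_std_err": 10.0}

# Hypothetical (x_close, y_close) pairs for the days after entry.
daily_closes = [(100.0, 195.0), (101.0, 190.0), (102.0, 183.0)]

z_by_day = [z_score(y, x, **frozen) for x, y in daily_closes]
print(z_by_day)
```

Re-running the regression every day instead would move the line itself, so the z-score of an open trade could change even when prices do not.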

  131. Harish says:

    Thanks, Karthik. I have combined a few tools, Oracle and Python, to get this output. In fact, it is like magic to analyse and get the result for the whole market at a glance. I have read this chapter at least 10 to 15 times to understand each line and extract the depth of knowledge you explained in these chapters. Thanks for that; I shall ping you once I set up the complete system.

  132. Gaurav Kumar says:

    Hi Karthik,
    I’m trying to build a pair trading system. The pair passes the ADF test. My problem is that the residual is crossing 3 SD and sometimes going beyond 5-6 SD, whereas 99.7% of data points should fall between -3 SD and +3 SD. What should I do?
    Any tips?

    • Karthik Rangappa says:

      Gaurav, here are a few things you can try –

      1) Check if the data is clean. Adjusted for splits and bonuses.
      2) If it is just a few points, maybe you can ignore them?
      3) If you don’t want to ignore it, maybe backtest for performance and see how it results.

      Let me know.

  133. Rahul says:

    Hi Karthik,

    Is there any book you could suggest that goes deeper into pairs trading? I see that you had mentioned earlier in 2021 that there aren’t any good sources for Multivariate regression, have you found anything on the same since then? If not, is there a book that you could recommend to study Multivariate regression, I’m sure you must’ve gained this knowledge from a written resource.

  134. Ansh says:

    Hello sir. Consider that in the pair trading strategy I use the last 200 data points and find a trading opportunity on 18th May 2022 (the value crosses 2.5). The next day, the total number of data points becomes 201. In some cases, the previous day’s value now becomes close to 2.4, and the system recommends executing the trade today instead (today’s value crosses 2.5). The problem is that I am writing code for this strategy, and this small detail leads to large changes in the P&L. What should I do?

    • Karthik Rangappa says:

      I’m not sure if I understand this right. So with the change in SD value, you initiate and close trades, right? So at more than 2.5 you’d have initiated a trade and lesser than 2.4 you’d close resulting in a profit or loss?

  135. Ansh says:

    Sir, I’ll expand on my example:
    Consider that on 18th May 2022 the SD value crosses 2.5. I get the SD value using the last 200 trading points. For instance, these are some sample SD values as of 18th May:
    15th May- 2.3
    16th May- 2.39
    17th May- 2.45
    18th May- 2.51

    Now, on 19th May, should I use the last 201 trading points (including the new data point obtained on 19th May)? If I do that then consider the following sample values:
    15th May- 2.28
    16th May- 2.35
    17th May- 2.37
    18th May- 2.46
    19th May- 2.53

    Now, the code changes the date for executing the trade from 18th May to 19th May when calculating the P&L. How should I proceed?

  136. ABC says:

    Great stuff, Karthik. Can you help with some material on multivariate regression?

  137. Rahul says:

    What are some free reliable sources of historical data?

  138. Anirban Basak says:


    We should trade only when the residuals hit +/- 2 SD. However, you have also mentioned that we should trade when the standard error reaches +/- 2.5.

    So I would like to know: does +/- 2 SD correspond directly to +/- 2.5 standard error, so that we only have to track the standard error to spot a pair trading opportunity? In that case, there is no need to separately track when the residuals hit +/- 2 SD, right? Am I missing something here?

  139. Anirban Basak says:


    Are you still able to update the pair data sheet/position tracker at regular intervals? If so, could you kindly share the link so that I can track and bet accordingly? I am a bit lost.

  140. Anirban Basak says:


    In the second method of pair trading, the most challenging part is the setup in Excel/programming. In the meantime, since I am in the learning phase, can I start pair trading with the first method instead? How are the results of that method?

    I thought of slowly moving to the 2nd method once the setup is complete in Excel/program. Please suggest.

    • Karthik Rangappa says:

      That’s also OK. The 2nd method is preferred, but I understand it is way more complicated 🙂

  141. Manish Maurya says:

    How can I use bhavcopy to make an automated Excel sheet?

  142. Anirban Basak says:


    In the RVT method, you have automated fetching the inputs for ystock, xstock, intercept, beta, adf_test_pvalue, std err and sigma.
    In this context, I would like to understand the following:

    (i) If today two stocks A and B satisfy A = mB + c based on the error ratio, is there a probability that they satisfy B = mA + c tomorrow? If so, is it necessary to track “ystock, xstock” every day? And if that is the case, should we also track the error ratio of the pairs every day to figure out the dependent and independent variables?

    (ii) Based on every day’s closing price, the “intercept, beta, adf_test_pvalue, std err and sigma” will also change. So, do we need to track all of these on a daily basis?

    Please help with the above understanding.

    • Karthik Rangappa says:

      1) Yes, you will have to track the stocks every day, especially at times when volatility has increased and price changes are rapid.
      2) Usually, traders have a program that tracks the daily price changes and updates these values.

  143. Anirban Basak says:


    In RVT, you have mentioned trading the residuals once they hit +/- 2.5 SD and closing once the target/stop loss is hit. However, in the end you have suggested tracking the std error until it attains a value of +/- 2.5 and then closing once the target/stop loss is hit. In this context, I need the following clarification:

    (i) Does it imply that hitting +/- 2.5 SD is equivalent to the standard error hitting a value of +/- 2.5?

  144. Anirban Basak says:


    We need to calculate the standard error on a daily basis, and we know std err = today’s residual value / std err of the residual. Now, which data point should we take as today’s residual value? Is it the residual shown against Observation No. 1 or the one shown last in the observations? Please advise.

  145. Anirban Basak says:


    (i) Suppose the ADF value for the residuals of a pair is 0.01 on a certain day. Can it move above 0.05 on some other day?

    (ii) Suppose on a certain day I find the std error value to be +2.75 and the intercept within a good tolerance. However, the ADF is 0.1. Should we take the trade?

    (iii) Suppose on a certain day I find the std error value to be +2.75, the ADF to be 0.01 and the intercept within a good tolerance, and I take the trade. On the following days, do we necessarily have to track the ADF value/intercept, or, since the trade is already taken, do we only have to track the std error until the stop loss/target is hit?

    • Karthik Rangappa says:

      1) All these values can change if intraday prices change drastically.
      2) You’d be breaking the system rules if you look for other triggers. In any trading system, you need to have consistency in terms of initiating, risk management, and closure of trade. If you are going with ADF as a trigger to initiate a trade, then I’d suggest you stick to that itself.
      3) Yes, to keep an eye on how prices have changed and therefore its statistical impact.

  146. Anirban Basak says:


    (i) Please let me know if I am correct: in Mark Whistler’s method, the two stocks should be similar companies/businesses, and the correlation between them should be positive.

    (ii) Please let me know if I am correct: in the RVT method, if the p-value of the ADF test is less than or equal to 0.05, then the residual series is stationary and the pair can be recognized as one to trade.

  147. Anirban Basak says:


    Reposting this, in case you can elaborate a little more:

    Suppose on a certain day I find the std error value to be +2.75, the ADF to be 0.01 and the intercept within a good tolerance, and I take the trade. On the following days, do we necessarily have to track the ADF value/intercept, or, since the trade is already taken, do we only have to track the std error until the stop loss/target is hit?

    I mean to say that, while taking the trade, the ADF (less than 0.05), std err and intercept are all favorable. As per the rule, we have to track the std err for the reversion. Now, on the next day, the ADF becomes 0.1. Should we exit, or, since the trade is already taken, do we just keep tracking the std err?

    Please help.

  148. Anirban Basak says:


    This is with regard to pair trading.

    Suppose we took a trade on 25-MAR-2023 with the current month futures, where 27-MAR-2023 is the current month expiry date. We have the equation HDFC = f(ICICI). The std err touched +2.7, so we short the pair: we sell HDFC and buy ICICI with beta neutrality. However, 27-MAR passes and the std err has not converged. In this case, should we do the following?

    (i) Buy HDFC and sell ICICI futures of the current month just before expiry, then again sell HDFC and buy ICICI futures of APRIL (which is now the current month) with the same beta neutrality, and keep tracking until it hits either the target or the stop loss.

    (ii) If the above is correct, can we make a loss during the rollover? If yes, is there any way to mitigate or overcome it?

    (iii) If we want to avoid a rollover, is it feasible to trade the pair with next month’s futures?

    (iv) Should the same approach be followed for a calendar spread too, if it doesn’t hit the target/stop loss?

    • Karthik Rangappa says:

      1) Yes, you can roll over if neither the target nor the SL is hit.
      2) Yes, that is a possibility too. One thing to note is that when you roll over, your futures prices are different and your spread may not be the same, so do watch out for that. Try your best to avoid situations like rollover.
      3) Yes, but again, check the spread before executing.
      4) Yes.

  149. Anirban Basak says:


    While calculating the std err (= today's residual / standard error of residual), you have taken today's residual value from Observation No. 1. However, I feel today's residual should come from the last observation, not the first. Could you kindly help me understand this? Otherwise, my whole understanding may lead to bad results.
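
A minimal Python sketch of the calculation in question, for anyone who wants to check the ordering themselves. It assumes the prices are sorted oldest first, so today's residual is the last element; if your sheet is sorted newest first, today's residual sits in the first row instead. The function name `latest_z_score` and the sample prices are hypothetical, and the standard error of the residual is assumed here to be sqrt(SSR / (n - 2)), the "Standard Error" figure that a regression output typically reports.

```python
import numpy as np


def latest_z_score(x, y):
    """Regress y on x and return (z_score, residuals, std_err).

    x and y are closing prices ordered oldest first, so the last
    element of each array is today's close (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Ordinary least squares fit: y = slope * x + intercept
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)

    # Standard error of the regression with two fitted parameters
    n = len(x)
    std_err = np.sqrt(np.sum(residuals ** 2) / (n - 2))

    # Today's residual is the LAST element because data is oldest first
    return residuals[-1] / std_err, residuals, std_err


# Hypothetical closing prices, oldest first
x_prices = [100, 102, 101, 105, 107, 110]
y_prices = [200, 205, 201, 212, 213, 225]

z, residuals, std_err = latest_z_score(x_prices, y_prices)
print(f"today's z-score: {z:.2f}")
```

Whichever way the data is sorted, the point is the same: the residual used in the numerator must belong to the most recent trading day.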

  150. Anirban Basak says:


    You suggested avoiding rollover as much as possible. So, in either pair trading or a calendar spread, would it be better to trade the near month or the far month, especially for liquid assets (index/stock futures like HDFC, TCS, etc.)? What would your advice be here?

    • Karthik Rangappa says:

      That's right, pick the instrument based on the time to expiry. If there is a lot of time to expiry, pick the current month series, which is the most liquid. Otherwise, select the next month series.

  151. Terminator says:

    Which library do I need to get daily closing stock prices from NSE,
    and how do I automate scripts in Excel on a daily basis, for example to download daily prices or the bhavcopy?

  152. Krishna says:

    Hi Karthik,
    How many data points do we consider for forming the pairs? Is there a definite answer, and why did you consider almost a year of data for the pair formation?

    • Karthik Rangappa says:

      A year's worth of data is reasonable. One of the important factors to consider is how the pair has been behaving recently. Given this, there is no point going back into earlier years.

  153. Krishna says:

    Hi Karthik,
    I am trying to automate pairs trading in Python. I need help building the conditions for backtesting. I use the past 12 months of data to fix the pairs and the coefficients, and the next 6 months as the trading period. I get negative PnLs because some pairs' trades are never settled: the threshold condition is not reached by the end of the trading period. What should I do about it?

    My idea: Should I stop opening trades at month 5 out of the 6 months? Will it work?

    Do you have any idea to help me with this?

    Thanks in advance,

    • Karthik Rangappa says:

      Maybe try to optimize the threshold conditions instead of restricting trades in the 5th month? The reason is that most of these trades are short term in nature, so restricting trades in the 5th month may mean you lose signals. What if all the signals are clustered around the 5th month for a given pair?
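
To make the suggestion above concrete, here is a rough Python sketch of a backtest loop in which the entry, target, and stop thresholds are plain parameters you can sweep, and any trade still open at the end of the trading period is closed and tagged separately. The function name `backtest_z`, the default thresholds (2.5, 1.0, 3.0), and the sample z-score series are all assumptions for illustration, not values taken from this thread.

```python
def backtest_z(z_scores, entry=2.5, target=1.0, stop=3.0):
    """Walk a z-score series and return a list of trades.

    Each trade is (entry_index, exit_index, side, reason), where side is
    -1 for a short pair trade (z at or above +entry) and +1 for a long
    pair trade (z at or below -entry).
    """
    trades = []
    position = None  # (entry_index, side) while a trade is open

    for i, z in enumerate(z_scores):
        if position is None:
            # Only enter between the entry and stop levels
            if entry <= z < stop:
                position = (i, -1)
            elif -stop < z <= -entry:
                position = (i, +1)
            continue

        start, side = position
        # Flip the sign for longs so both sides share one set of checks
        signed = -side * z
        if signed <= target:
            trades.append((start, i, side, "target"))
            position = None
        elif signed >= stop:
            trades.append((start, i, side, "stop"))
            position = None

    # Trade still open when the period ends: close it on the last bar
    if position is not None:
        start, side = position
        trades.append((start, len(z_scores) - 1, side, "period_end"))

    return trades


# Hypothetical daily z-scores for one pair over a trading period
sample = [0.5, 2.6, 2.0, 0.9, -2.7, -3.1, 2.8, 2.2]
for trade in backtest_z(sample):
    print(trade)
```

Tagging the forced exits as `period_end` lets you measure how much of the negative PnL comes from unsettled trades, and then rerun the same loop with different `entry`, `target`, and `stop` values to see which combination leaves fewer trades hanging.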

  154. Krishna says:

    Hi Karthik,
    Should I use log prices in the cointegration test, or will normal prices suffice? Can you also explain why?

    • Karthik Rangappa says:

      I remember using normal prices, but I don't have a reason to explain why I didn't take log prices 🙂

      But maybe you can use both to see which one fits the model better?

  155. Krishna says:

    Hi Karthik,
    How do I calculate the return for a pair? Can you explain using an example?
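
One common convention, offered only as a hedged sketch since the thread gives no worked answer: add up the P&L of both legs and divide by the capital deployed. Here the denominator is the gross notional at entry, which is an assumption; you could equally divide by the margin blocked. The function name `pair_pnl`, the leg labels, and all quantities and prices are hypothetical.

```python
def pair_pnl(legs):
    """Return (total_pnl, return_on_gross_notional) for a pair trade.

    Each leg is a dict with:
      side  : +1 for long, -1 for short
      qty   : number of shares (lot size x lots)
      entry : entry price
      exit  : exit price
    """
    # A long leg gains when price rises, a short leg gains when it falls
    pnl = sum(leg["side"] * leg["qty"] * (leg["exit"] - leg["entry"]) for leg in legs)

    # Gross notional at entry, used here as the capital base (an assumption)
    gross = sum(leg["qty"] * leg["entry"] for leg in legs)
    return pnl, pnl / gross


# Hypothetical short pair trade: short Y, long X
legs = [
    {"name": "Y", "side": -1, "qty": 500, "entry": 1000, "exit": 980},
    {"name": "X", "side": +1, "qty": 1000, "entry": 400, "exit": 395},
]

total, ret = pair_pnl(legs)
print(f"P&L: {total}, return on gross notional: {ret:.4%}")
```

In this made-up case the short leg makes 10,000, the long leg loses 5,000, and the pair nets 5,000 before costs. Brokerage, taxes, and slippage are ignored here.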

  156. Suman Kundu says:

    “What if Stock A with Stock B is not stationary, but instead Stock A is stationary with stock B & C as a combined entity?”

    Can you give me a practical example of this statement? I can't quite follow it.

  157. Suman Kundu says:

    One more question, sir,

    We all see events like stock splits or dividends that disturb the data. How do we clean up the data in such cases?

    I know this is the data science part, but I just want to know the basics.

  158. Krishna says:

    Hi Karthik,
    From your experience, do you think the cointegration method is better than the basic ratio method?
