Module 10 Trading Systems

Chapter 11

PTM2, C4 – The ADF test


11.1 – Cointegration of two time series

I guess this chapter will get a little complex. We will be skimming the surface of some higher-order statistical theory. I will try my best to stick to practical stuff and avoid the fluff. I’ll try and explain these things from a trading point of view, but I’m afraid some amount of theory will be necessary for you to know.

Given the path ahead, I think it is necessary to take stock of our learnings so far and put some order to them. Hence, let me just summarize our journey so far –

  1. From Chapter 1 to 7, we discussed a very basic version of a pair trade. We discussed this simply to lay a strong foundation for the higher-order pair trading technique, generally known as the relative value trade
  2. The relative value trade requires the use of linear regression
  3. In linear regression, we regress the dependent variable Y against the independent variable X
  4. When we regress, some of the outputs of interest are the intercept, slope, residuals, standard error, and the standard error of the intercept
  5. The decision to classify a stock as dependent or independent depends on the error ratio
  6. We calculate the error ratio by interchanging X and Y. The combination which offers the lowest error ratio defines which stock is X and which one is Y

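To make steps 5 and 6 concrete, here is a minimal sketch of the error ratio calculation in Python. The function name and the use of numpy are my assumptions; the error ratio is taken, as defined earlier in this module, as the standard error of the intercept divided by the standard error of the regression.

```python
import numpy as np

def error_ratio(x, y):
    """Regress y on x; return (std error of intercept) / (std error of regression).

    As per this module's definition, a lower error ratio is better.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)            # ordinary least squares fit
    residuals = y - (slope * x + intercept)
    # standard error of the regression (residual standard error)
    se = np.sqrt(np.sum(residuals ** 2) / (n - 2))
    # standard error of the intercept
    se_intercept = se * np.sqrt(np.sum(x ** 2) / (n * np.sum((x - x.mean()) ** 2)))
    return se_intercept / se

# Run it both ways; the orientation with the lower ratio fixes which stock is X and which is Y:
# ratio_1 = error_ratio(stock_a, stock_b)   # stock_a as X, stock_b as Y
# ratio_2 = error_ratio(stock_b, stock_a)   # roles interchanged
```

The commented lines at the end show the intended usage with two price series; `stock_a` and `stock_b` are placeholders for your own data.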
I hope you have read and understood everything that we have discussed up to this point. If not, I’d suggest you read the chapters again, get clarity, and then proceed.

Recollect, in the previous chapter, we discussed the residuals. In fact, I also mentioned that the bulk of the focus going forward will be on the residuals. It is time we study the residuals in more detail and try to establish the kind of behavior they exhibit. In our attempt to do this, we will be introduced to two new terms – cointegration and stationarity.

Generally speaking, if two time series are ‘cointegrated’ (stock X and stock Y in our case), it means the two stocks move together, and if at all there is a deviation from this movement, it is either temporary or can be attributed to a stray event, and one can expect the two time series to revert to their regular orbit, i.e. converge and move together again. This is exactly what we want while pair trading. In other words, the pair that we choose to pair trade on should be cointegrated.

So the question is – how do we evaluate if the two stocks are cointegrated?

Well, to check if two stocks are cointegrated, we first run a linear regression on the two stocks, then take the residuals obtained from the regression, and check if the residuals are ‘stationary’.

If the residuals are stationary, it implies that the two stocks are cointegrated. If the two stocks are cointegrated, they move together, and therefore the ‘pair’ is ripe for tracking pair trading opportunities.

Here is an interesting way to look at this – one can take any two time series and apply regression; the regression algorithm will always throw out an output. How would one know if the output is reliable? This is where stationarity comes into play. The regression equation is valid if and only if the residuals are stationary. If the residuals are not stationary, the regression relationship shouldn’t be used.

Speculating and setting up trades on a co-integrated time series is a lot more meaningful and is independent of market direction.

So, essentially, this boils down to figuring out if the residuals are stationary or not.

At this point, I can straight away show you how to check if the residuals are stationary or not; there is a simple test called the ‘ADF test’ to do this – frankly, this is all you need to know. However, I think you are better off if you spend a few minutes to understand what ‘stationarity’ really means (without actually deep diving into the quant theory).

So, read the following section only if you are curious to know more; else, go to the section which talks about the ADF test.

11.2 – Stationary and non-stationary series

A time series is considered ‘stationary’ if it satisfies three simple statistical conditions. If the time series partially satisfies these conditions, say 2 out of 3 or 1 out of 3, then the stationarity is considered weak. If none of the three conditions are satisfied, then the time series is ‘non-stationary’.

The three simple statistical conditions are –

  • The mean of the series should be the same or within a tight range
  • The standard deviation of the series should be within a range
  • There should be no autocorrelation within the series – this means any particular value in the time series, say the value at position ‘n’, should not be dependent on any value before ‘n’. We will talk more about this at a later stage.

While pair trading, we only look for pairs which exhibit complete stationarity. Non-stationary series or weak stationary series will not work for us.

I guess it is best to take up an example (like a sample time series) and figure out what the above three conditions really mean and hopefully, that will help you understand ‘stationarity’ better.

For the sake of this example, I have two time series, with 9000 data points in each. I’ve named them Series A and Series B, and on this data, I will evaluate the above three stationarity conditions.

Condition 1 – The mean of the series should be same or within a tight range

To evaluate this, I will split each of the time series data into 3 parts and calculate the respective mean for each part. The mean for all three different parts should be around the same value. If this is true, then I can conclude that the mean will more or less be the same even when new data points flow in the future.

So let us go ahead and do this. To begin with, I’m splitting the Series A data into three parts and calculating its respective means, here is how it looks –

Like I mentioned, I have 9000 data points in Series A and Series B. I have split Series A data points into 3 parts and as you can see, I’ve even highlighted the starting and ending cells for these parts.

The means for all three parts are similar, clearly satisfying the first condition.
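The same split-and-compare exercise can be sketched in a few lines of Python. The simulated series below is my assumption, standing in for Series A, since the actual data isn’t reproduced in the text:

```python
import numpy as np

# hypothetical stand-in for Series A (a stationary-looking series of 9000 points)
rng = np.random.default_rng(1)
series_a = rng.normal(50, 15, 9000)

# split into 3 equal parts and compare the part-wise means (condition 1)
parts = np.array_split(series_a, 3)
means = [round(p.mean(), 2) for p in parts]
print(means)   # all three values should hover around the same level
```

Replacing `.mean()` with `.std()` in the same loop gives the corresponding check for the standard deviation, which is the second condition discussed below.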

I’ve done the same thing for Series B, here is how the mean looks –

Now as you can see, the mean for Series B swings quite wildly, thereby not satisfying the first condition for stationarity.

Condition 2 – The standard deviation should be within a range

I’m following the same approach here – I will calculate the standard deviation for all three parts of both series and observe the values.

Here is the result obtained for Series A –

The standard deviation oscillates between 14-19%, which is quite ‘tight’ and therefore qualifies the 2nd stationarity condition.

Here is how the standard deviation works out for Series B –

Notice the difference? The range of the standard deviation for Series B is quite random – Series B is clearly not a stationary series. Series A, however, looks stationary at this point. We still need to evaluate the last condition, i.e. the autocorrelation bit, so let us go ahead and do that.

Condition 3 – There should be no autocorrelation within the series

In layman’s terms, autocorrelation is a phenomenon where a value in the time series depends on one or more of the values before it. For a series to be stationary, there should be no such dependence.

For example, have a look at the snapshot below –

The 9th value in Series A is 29, and if there is no autocorrelation in this series, the value 29 is not really dependent on any of the values before it, i.e. the values from cell 2 to cell 8.

But the question is how do we establish this?

Well, there is a technique for this.

Assume there are 10 data points. I take the data from Cell 1 to Cell 9 and call this Series X; I then take the data from Cell 2 to Cell 10 and call this Series Y. Now, I calculate the correlation between Series X and Series Y. This is called the 1-lag correlation. For no autocorrelation, this correlation should be close to 0.

I can do this for a 2-lag as well – i.e. between Cell 1 to Cell 8, and Cell 3 to Cell 10; again, the correlation should be close to 0. If this is true, then it is safe to assume that the series is not autocorrelated, and hence the 3rd condition for stationarity is satisfied.
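The lag check above can be sketched in Python as well. The simulated series are my assumptions: plain noise stands in for a stationary series like Series A, and a random walk (a cumulative sum of that noise) stands in for a non-stationary, autocorrelated series like Series B.

```python
import numpy as np

def lag_correlation(series, lag):
    """Correlation between the series and itself shifted forward by `lag` values."""
    s = np.asarray(series, dtype=float)
    return np.corrcoef(s[:-lag], s[lag:])[0, 1]

rng = np.random.default_rng(2)
stationary = rng.normal(0, 1, 9000)   # hypothetical 'Series A'
trending = np.cumsum(stationary)      # random walk, behaves like 'Series B'

print(lag_correlation(stationary, 1))  # close to 0 -> no autocorrelation
print(lag_correlation(stationary, 2))  # close to 0 at 2-lag as well
print(lag_correlation(trending, 1))    # close to 1 -> strongly autocorrelated
```

Notice how each value of the random walk is essentially its previous value plus noise, which is exactly why its lag correlation sits near 1.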

I’ve calculated 2 lag correlation for Series A, and here is how it looks –

Remember, I’m subdividing Series A into two parts and creating two subseries i.e series X and series Y. The correlation is calculated on these two subseries. Clearly, the correlation is close to zero and with this, we can safely conclude that Time Series A is stationary.

Let’s do this for Series B as well.

I’ve taken a similar approach, and the correlation as you can see is quite close to 1.

So, as you can see, all the conditions for stationarity are met for Series A – which means the time series is stationary – while Series B is not.

I know that I’ve taken a rather unconventional approach to explaining stationarity and co-integration. After all, no statistical explanation is complete without those scary looking formulas. But this is a deliberate approach and I thought this would be the best possible way to discuss these topics, as eventually, our goal is to learn how to pair trade efficiently and not really deep dive into statistics.

Anyway, you could be wondering whether it is really required for you to do all of the above to figure out if the time series (the residuals) is indeed stationary. Well, like I said before, this is not required.

We only need to look at the results of something called the ‘ADF test’ to establish if the time series is stationary or not.

11.3 – The ADF test

The augmented Dickey-Fuller test, or the ADF test, is perhaps one of the best techniques to test the stationarity of a time series. Remember, in our case, the time series in consideration is the residual series.

Basically, the ADF test does everything that we discussed above, including a multiple-lag process to check the autocorrelation within the series. Here is something you need to know – the output of the ADF test is not a definitive ‘Yes, this is a stationary series’ or ‘No, this is not a stationary series’. Rather, the output of the ADF test is a probability: it tells us the probability of the series not being stationary.

For example, if the output of the ADF test on a time series is 0.25, this means the series has a 25% chance of not being stationary, or in other words, there is a 75% chance of the series being stationary. This probability number is also called the ‘P value’.

To consider a time series stationary, the P value should be 0.05 (5%) or lower. This essentially means the probability of the time series being stationary is 95% or higher.

Alright, so how do you run an ADF test?

Frankly, this is a highly complex process and unfortunately, I could not find a single source online which will help you run an ADF test for free. I do have an excel sheet (which has a paid plugin) to run an ADF test, but unfortunately, I cannot share it here. If I could, I would have.

If you are a programmer, I’ve been told that there are Python plugins easily available to run an ADF test, so you could try that.

But if you are a non-programmer like me, then you will be stuck at this stage. So here is what I will do – once a week or every 15 days, I will try and upload a ‘Pair Data’ sheet, which will contain the following information for the best possible combinations of pairs –

  1. You will know which stock is X and which stock is Y
  2. You will know the intercept and Beta of this combination
  3. You will also know the p-value of the combination

The look back period for generating this is 200 trading days. I’ve restricted this just to banking stocks, but hopefully, I can include more sectors going forward. To help you understand this better, here is the snapshot of the latest Pair Datasheet for banking stocks –

The first line suggests that Federal Bank as Y and PNB as X is a viable pair. This also means that regressions with Federal as Y and PNB as X, and with Federal as X and PNB as Y, were conducted, the error ratios for both combinations were calculated, and it was found that Federal as Y and PNB as X had the lower error ratio.

Once the order has been figured out (as in which one is Y and which one is X), the intercept and beta for the combination have also been calculated. Finally, the ADF test was conducted and the P value was calculated. If you see, the P value for Federal Bank as Y and PNB as X is 0.365.

In other words, this is not a combination you should be dealing with as the probability of the residuals being stationary is only 63.5%.

In fact, if you look at the snapshot above, you will find only 2 pairs which have the desired p-value i.e Kotak and PNB with a P value of 0.01 and HDFC and PNB with a P value of 0.037.

The p values don’t usually change overnight. Hence, for this reason, I check for p-value once in 15 or 20 days and try and update them here.

I think we have learned quite a bit in this chapter. A lot of information discussed here could be new for most of the readers. For this reason, I will summarize all the things you should know about Pair trading at this point –

  1. The basic premise of pair trading
  2. Basic overview of linear regression and how to perform one
  3. In linear regression, we regress the dependent variable Y against the independent variable X
  4. When we regress, some of the outputs of interest are the intercept, slope, residuals, standard error, and the standard error of the intercept
  5. The decision to classify a stock as dependent or independent depends on the error ratio
  6. We calculate the error ratio by interchanging X and Y. The combination which offers the lowest error ratio defines which stock is X and which one is Y
  7. The residuals obtained from the regression should be stationary. If they are stationary, then we can conclude that the two stocks are co-integrated
  8. If the stocks are cointegrated, then they move together
  9. Stationarity of a series can be evaluated by running an ADF test.

If you are not clear on any of the points above, then I’d suggest you give this another shot and start reading from Chapter 7.

In the next chapter, we will try and take up an example of a pair trade and understand its dynamics.

You can download the Pair Data sheet, updated on 11th April 2018.

Lastly, this module (and this chapter, in particular) could not have been possible without the inputs from my good friend and an old partner, Prakash Lekkala. So I guess, we all need to thank him 🙂

Key takeaways from this chapter –

  1. If two stocks move together, then they are also cointegrated
  2. You can pair trade on stocks which are cointegrated
  3. If the residuals obtained from linear regression are stationary, then it implies the two stocks are cointegrated
  4. A time series is considered stationary if the series has a constant mean, constant standard deviation, and no autocorrelation
  5. The check for stationarity can be done by an ADF test
  6. The p-value of the ADF test should be 0.05 or lower for the series to be considered stationary.


  1. Deepu says:

    Thanks Karthik.
    Excellent writeup!!!
    Is it possible to have the complete list of upcoming chapters to know where we are with regard this pair trade journey ?
    Can we expect a few more chapters this month ? Sorry for being greedy…


    • Karthik Rangappa says:

      Deepu, glad you liked it.

      I don’t plan for it in advance, but generally, go with the flow. To give you a rough idea, the next step would be to take up an example of a trade and try and put all the learning together. Hopefully, that will be exciting enough 🙂

      • Deepu says:

        Thanks Karthik.

        Request you to share 5-6 examples so that it covers most of the leanings. Further what is the ADF plugin cost and where to buy it from ? Please share the details.


        • Akshay Hire says:

          You can use R Studio package to run ADF test. There is a package called “urca” in r studio which enables this test.

        • Karthik Rangappa says:

          The idea is to share a couple of live examples. Will share the other details as we progress.

          • swapnil says:

            Dear Karthik,

            As you said I have stuck on ADF test. no friend with programming knowledge.
            If I need to buy a paid plugin then how much it will cost?
            Any other way to do the ADF test?

          • Karthik Rangappa says:

            Swapnil, unfortunately, I have not evaluated any paid versions. So cannot really comment. Thanks.

  2. Anil Gowda says:

    I’m glad to know new learning with your guidance. Seriously Its very educative and informative.
    Thanks for Enlightenment us.

  3. KM says:

    Thanks prakash lekkala sir and karthik sir for your effort..

  4. Muralidhar says:

    Thank You Karthik sir,
    Even though ADF test is not available , you have taught us how to calculate Stationarity using excel by dividing the data in to parts and calculate Mean,SD and 2 Lag correlation.But please mention how much variation in Mean,SD which would represent ‘p’value of 0.05 (rough estimate).

    • Akshay Hire says:

      You can run ADF test in R software, load package called “urca” in R. It’s really easy in R.

    • Karthik Rangappa says:

      For mean – I’d suggest a tight variation, not more than 3-5 points difference. For SD, technically you will have to look at the standard error of the standard deviation, but then, it may just get a little overboard. Stick to 5-10% at the most. This should result in a p-value less than 0.05.

  5. Aditya says:

    Can you please upload the PDF of all the chapters shared so far?

  6. Mainak Mukherjee says:

    Hello Karthik,

    Thanks to you and Prakash for taking the pain to make us understand this chapter. Overall I am thoroughly enjoying this module. However, I have few questions in my mind while going thru’ this chapter. Hope you can clarify the doubts here.

    1. You mentioned that the look back period is 200 trading days. When I am calculating the pair (let’s say PNB as x and Kotak Bank as Y), the Intercept coefficient I am arriving at is in the vicinity of 1111. However, in the sheet you shared it is around 1099. My data range is starting from 23rd June, 2017 till 13th Apr, 2018. Am I missing anything here. I am following the same procedure which you mentioned in chapter 9.

    2. When I am calculating the p-value (using the python in-built packages), for the period as mentioned above – it is coming around .40 instead of .01. Not sure why such a huge difference. Can you please elaborate if there are any additional parameters go into calculating the p-value in your case.


    • Karthik Rangappa says:

      1) How did you source the data? Did you get it from Pi? Make sure its clean for splits and bonuses, if any
      2) Not sure about this, will try and see why this could be happening.

      • Mainak Mukherjee says:

        I took the data from Yahoo finance. Generally it’s adjusted for split and bonuses. But I will take it from Pi and do the calculation once again.

        • Karthik Rangappa says:

          Ok. Also, we have considered the data from 20th June 2017 to 10th apr 2018. The intercept difference is due to that I guess. Also, as you may have figured, in most ADF functions, one needs to give a lag. In our case its 5. Recommend value is the cube root of the length of data points (or thereabouts). Since we had 200 data points, cube root is 5.8, decided to go with 5.

  7. Akash Patel says:

    Thanks Prakash and Kartik..
    For p.value i use amibroker. Cointegration is not inbuilt indicator for p value so we have to outsource the data to pythone from ami . For that search “how to calculate cointegration in amibroker” on, there is v.good step by step explanation on that.
    I find nifty/banknifty, ambujacem/acc and tatamtrdvr/tatamotors very stationary pairs to trade even on 60min chart too..
    I keep searching stocks in same sectors only.

    the p value for axis/icici showing 0.00 all time i look, what does it mean? Is it 100% probability that its mean reverting?

    And once again thanks u both of you.

  8. akash patel says:


    – while studying co-integration i find web pages for pure calculation of how to calculate co-integration, frankly i could not understand any of math symbol and calculation.

    – regarding 0.5 or 0.05 about p-value, 0.05 is confirm. but the afl i m using with ami is simply outsource the data to python servers and displays the coint value to amibroker afl window. i think we should divide the displayed value with 10. because some good pair with my experiment showing the coint value of 0.20…so i m taking it as 0.02…and its working fine. any python coder can crack its afl and throw some more light about it. in this my data starts from 12/4/17 to 26/4/18 almost 252 days value is 0.08 so i will take it as 0.008 and in this tatamtrdvr/tatamotors pair coint showing 0.18 so i m taking it as 0.018…u don’t believe that this pair is so tight spread that i m trading it on 60min basis and touch wood earning good money….pl click below link for hourly chart profit in hourly chart is less than day chart but there are plenty of trading opportunities..(on 5lac both side u can earn around 2500-brkrg in 2-3 days)

    previously i started all good stocks pair, then after experience i narrow down it to good banking stock (total 104 pairs possible), but since last 6 month i narrowed down it further and i trade only nf/bnf, tatamtrdvr/tatamotors, hdfc/hdfcbank and acc/ambuja only.

    besides theory any experience trader can tell after watching 2-5 yrs of daily chart that this pair is regularly mean-reverting or not?

    after all earning consistent money is anyone’s motto here…


    • Karthik Rangappa says:

      Hey, Akash thanks for the insights. Nothing beats practical market experience 🙂
      Btw, what makes you divide the p-value by 10?
      I was not aware of TM DVR future, I’m sure it opens a window of opportunity.

      • akash patel says:

        Hi, Kartik,
        dividing p-value by 10 was just my simple logic/intuition from motors/dvr exp…the long term chart showing its very tight spread and it regularly cross its mean, and coint value showing me 0.11 to 0.45 range, if we took this value this pair is not reliable for pair trade, but if we divide it by 10 we will get 0.011 and 0.045 and that would be v.good for pair trade. and second example was bn/n its coint showed me 0.15 to 0.68 but practically if we see that is also verygood pair. thats why i came on that conclusion that i should divide it with 10. i dont have any knowledge to write or read and understand AFL language… i believe in KISS. and i never regret for it. but i still not understand why axis/icici coint showing 0.00 value? that project is still under process…will let you know. mean time i m searching who has sound AFL coding understanding to crack the python AFL. hope this will help.

  9. Joshan says:

    Hi Kartik,

    I m a regular student of Zerodha Varsity. I am wondering whether it is possible for you to have a separate chapter for “How Stock/Financial Market Operates” which cover basically the mechanism of stock market like Market Makers, Clearing Agents, etc. ( as there are many other components who operate on the back stage of Market and I’m just mentioning couple of them that I know. Hoping you will cover the rest) How they operate and who they are on the context of Indian market.

    There is not so much stuff available online also on this subject. I personally think that one should have knowledge about the mechanism which will broad our knowledge and I believe knowledge is Power.

    Best regards,

  10. Sundeep says:

    Sir I know this is not a proper question but this is just eating away at me. A few days back Airtel had announced its results and it was bad. But it was better than what the market was expecting. Still the next day, the share went up. What do you think caused this?

    • Karthik Rangappa says:

      Sundeep, the same thing happened y’day with Axis. The past was bad but the future looks good. Remember, future is what the stock markets always looks at 🙂

      • Sundeep says:

        Sir what you just said only brings me a few more questions sir. I’m sorry to pester you like this.
        1. You once said when you give market good news and bad news, it always reacts to good news first. By that logic, don’t you think the shares of Airtel and Axis should have gone down?
        2. In the hindsight, do you think you could have predicted that even if those two companies posted bad earnings, it is going to go up? (I’m just asking that to see if seasoned traders can do that, since I had no clue it was a possibility. )
        3. Is there a method to associate a particular news to its reaction to stock price. For example, in the above case, what were the factors that led to stock moving up?

        I hugely appreciate what you’re doing to help fellow traders like myself. Varsity is a treasure and I encourage my friends to read it as well. Thanks in advance for the answer.

        • Karthik Rangappa says:

          1) The market always looks at futures, Sundeep. So, they expect a better outlook for these stocks as they believe the worst could be over. But your guess is as good as mine
          2) My colleague actually had a bet with another colleague that the stocks would go up the next day 🙂
          3) This is largely depended on your experience reading the markets

          Happy to note that Sundeep, keep learning 🙂

  11. Madhavan says:

    Fantastic series Karthik. I had not been here for a while and had to skim through to get here. You have successfully managed to keep it as a easy read. Hats off.

    Considering that I am starting my journey as a full time trader n a month, I see myself coming here more often.
    Do you have tools within kite to figure out cointegration and other analysis ? Are you considering bringing any capability around it?

    • Karthik Rangappa says:

      Glad to note that, Madhavan 🙂

      Unfortunately, we don’t have coint tools within Kite. Trying my best to figure out an alternative.

  12. Sundeep says:

    Sir I have a very personal question to ask you. But since it relates to mindset of a good trader I decided to ask you anyway. How do you feel when your fellow trader made more money, assuming you started out with same amount of capital. I know I felt really bad when it happened to me. How do you deal with that?

    • Karthik Rangappa says:

      Sundeep, this is personal. The way I react maybe different from the way another. I think you should be happy since you can always check with your friend on what went right for him and learn from his success. End of the day, the only way to move ahead in markets is by having an open mind to learn and adapt. Good luck and keep learning 🙂

  13. Sundeep says:

    Sir you’ve written exhaustive text on trading using Technical and Quantitative methods. Can you write a module on trading using Fundamental analysis (based on earnings or news). If not, can you give some methodology on how to learn them ?

  14. KUMAR MAYANK says:

    Hello sir
    We are eagerly waiting for the next chapter.

  15. Sumon Sadhukhan says:

    Thanks for providing wonderful modules in Varsity. I have some queries listed below:
    When the next chapter will come?
    How much time it will take to complete the entire module and how many more chapters will be added?
    Can you please name some reference books or resources for a deeper understanding of Trading system and coding one by himself?


  16. KUMAR MAYANK says:

    Hello sir
    I have have installed EViews statistical package for one year trial period 🙂 In the “lag length” drop menu of ADF test section there are many options available like Schwarz Info Criterion, Hann-Quin criterion, Modified Akaik, T- static each giving different P value for the same max lag of 15. You may see it here:
    Even it gives value below the threshold value of 0.05 the header reads as Null Hypothesis: Residual has a unit root. If the series has a unit root how could it be a stationary series? I have taken a screen shot here:
    Varsity student

  17. Sundeep says:

    Sir who is your favorite trader? The one may be you try to emulate?

    • Karthik Rangappa says:

      I’m fortunate enough to sit with the best trader I know, Nithin Kamath 🙂
      Lots of learning, not just about trades in markets but also trades in real life 🙂

  18. KUMAR MAYANK says:

    Hello sir
    My question is about updating the pair data everyday. I run regression analysis and copy the residual data and paste in another sheet where I analyse density curve. I’m repeating the same actions everyday. Can you suggest me some smart way to keep my excel sheet updated?
    Next question: Residual, i get everyday, slightly differs than that of previous day albeit the difference is at third or fourth places after decimal. Should i paste the whole set of data or instead add one day data to the already existing column?
    Varsity student

    • Karthik Rangappa says:

      I understand, Kumar. This actually needs some programming help and unfortunately, I can be of very little help in that perspective. You can update the latest close to get the latest position of the residuals.

      • KUMAR MAYANK says:

        Hello sir
        Weel, my quest for updating data fast has got some success. I learnt to use macros but it runs on the fixed amount of data. I mean if i recorded to perform on 255 set of data then it can’t run on 256 set of data.
        Now my excel sheet has become dynamic. Whenever i add new data (today’s close price) the oldest data in the column gets deleted on its own and i have the same number of data but different a starting date.
        I want to know if there is any issue with such dynamic updating of data. Hope you got me 🙂
        Varsity student

  19. Mani says:

    Hello sir for testing my algo, can you tell me the ADF test p_value of HDFC and ICICI pair, I am getting 0.007, is it correct?

  20. Vinay says:

    Sir what is the name of the excel plugin that you use to perform ADF test.

    • Karthik Rangappa says:

      I’m not sure if there is an excel plugin, Vinay.

      • Vinay says:

        So can both method be used in intraday as well? What will be the data series in that case.Will it be 15 min close price in case of 15min chart.Which period would be more reliable daily or intraday? And what should be the profit expectection in case of intraday in percentage terms? Lastly are there some other pair trading method apart from the two (btw I found these two mehod very informative and practical)u showed us and can u suggest some books or reading for same.Thanks in advance

        • Karthik Rangappa says:

          No Vinay, I would not suggest you do this for intraday. These pairs trades need time to evolve and this happens over 3-4 days. However, I have opened and closed pair trades on intraday basis, but this has happened due to luck and not design.

  21. PS says:

    Hi Karthik,

    Good series on Pair Trading. Got me hooked!!
    Just a small query..when you say data should be adjusted for bonus, split, dividend etc..Where do we get such data..I am Importing the data from NSE website..So can it be considered clean or else can you give any other source to obtain clean data?

    • Karthik Rangappa says:

      Yes, Pranay, NSE Bhavcopy usually has clean data.

      • Deepu says:

        Hi Karthik,

        I checked the NSE bhavcopy, which is published daily, and it does not have the adjusted price if you go back and pull the same file from earlier dates; for example, TCS, whose price recently changed due to a bonus issue.

        Can you please share the link or tell me where to get the adjusted price?

        Thanks for the assistance.


        • Karthik Rangappa says:

          In that case, you need to evaluate a dedicated data vendor who will clean up the data for you. Check

  22. Selvakumar S P says:

    Hi Karthik,
    What is the input for the ADF test?

    The 200-day stock price data, or calculated data such as the intercept or residuals?

    • Karthik Rangappa says:

      The ADF test checks for stationarity on the residuals. So yes, the input for the ADF test is the residual series.
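[Editor's note] A minimal Python sketch of the point above, not the author's Excel workflow: the ADF test is run on the residuals of the regression, so the first step is to regress Y on X and extract that residual series. The price values below are made-up illustrative numbers, not real HDFC/ICICI data.

```python
# Sketch: compute the OLS residuals that serve as the input to the ADF test.
# Pure Python, no external libraries; prices are hypothetical.

def ols_residuals(x, y):
    """Regress y = intercept + slope * x and return the residual series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical closing prices for the independent (x) and dependent (y) stock
x = [100.0, 101.5, 99.8, 102.2, 103.0, 101.1]
y = [201.0, 203.2, 199.5, 204.8, 206.1, 202.4]

resid = ols_residuals(x, y)  # this series is what the ADF test is run on
# By construction, OLS residuals (with an intercept) sum to ~0
print(abs(sum(resid)) < 1e-9)
```

In practice the residual series would then be passed to an ADF implementation (e.g. adf.test() in R, as discussed in the comments below) to get the p-value.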

  23. Pratik says:

    Hi Karthik,

    Really thankful for all your efforts. I am learning loads from it.

    I am trying to reproduce all the steps which you mentioned in this blog. I have downloaded the excel sheet in which you have provided HDFC Bank and ICICI Bank data from 4th Dec 2015 until 4th Dec 2017. I have calculated slope, intercept, Standard Error and Standard Error of Intercept and finally the Error Ratio.

    To calculate the p-value on the time series of residuals, I have used the R language. There is a function adf.test() which executes the ADF test on the given data. However, when I run the test, I receive the following:
    p-value for the residuals with ICICI Bank as Y = 0.03729
    p-value for the residuals with HDFC Bank as Y = 0.08545

    In your post you mentioned the other results, e.g. the slope, intercept, standard error, standard error of the intercept, and finally the error ratio, so I could compare those to verify that my calculations are correct.

    Can you please run the test on the same data and confirm whether the p-values I have received are correct? For reference, I run the adf.test(c(the time series residual data here….)) function without passing any other arguments. There are arguments by which the lag parameter can be defined, but I was not sure about that, so I ran the function with the default arguments. Can you or someone from your team confirm whether the values I have received are correct? If not, how exactly are you using R to get the p-values?

    Thanks and regards,


    • Karthik Rangappa says:

      Pratik, can you download the pair data sheet here –, I think this is for 12th June 2018. Compare your results with other pairs as well, besides HDFC Bank and ICICI Bank.

      • Pratik says:

        I got the latest Excel sheet on 12th Jun 2018. Thanks for the pointer. Can you please clarify the from and to dates of the data used for the calculations in this sheet, so that I can use the same data and match the results precisely? Currently I am using the HDFC Bank and ICICI Bank data from 4th Dec 2015 until 4th Dec 2017 which you shared in an Excel sheet.

      • Pratik says:

        By reading your note for programmers in the following chapter, I understood that you are using the last 200 days of data. Going by the latest pair data Excel sheet you pointed me to in your comment, I figured out that you are using the last 200 days of data, from 23rd Aug 2017 till 12th Jun 2018. Now when I run my calculations on it, I can match the values of beta, intercept, Std. Error and Sigma precisely to the decimal point. That gives me confidence that my calculations are correct. However, when I pass the HDFC residual time series to the adf.test function in R, the outcome is:

        Dickey-Fuller = -3.1394, Lag order = 5, p-value = 0.09956
        alternative hypothesis: stationary

        The p-value does not match the Excel sheet value of 0.2073132413. Can you please clarify how you are executing the ADF test to get that number? If anyone on your team can tell me how to get to this number using R, that would be great.
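[Editor's note] One likely reason for mismatched p-values is the lag order. R's tseries::adf.test defaults, per its documentation, to k = trunc((n − 1)^(1/3)) lags, which for 200 observations gives 5 (matching the "Lag order = 5" in the R output above), while a different implementation such as an Excel add-in may choose a different lag and therefore report a different p-value for the same residual series. A small Python check of that default, offered as a sketch rather than a definitive account of the Excel tool's internals:

```python
# Sketch: the default lag order used by R's tseries::adf.test,
# k = trunc((n - 1)^(1/3)), where n is the number of observations.
# Different lag choices generally produce different ADF p-values.

def default_adf_lag(n):
    """Default ADF lag order for n observations (tseries::adf.test convention)."""
    return int((n - 1) ** (1 / 3))

print(default_adf_lag(200))  # → 5, matching "Lag order = 5" in the R output above
```

So before comparing p-values across tools, it is worth forcing both to use the same lag order (the k argument in adf.test).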

        • Karthik Rangappa says:

          Pratik, I’m beginning to sense my HDFC data could be wrong. If true, then this is a mistake and the whole premise of this chapter could be wrong –

          But the essence still holds valid. Anyway, I’m in the process of figuring it out 🙂

          • Manoj says:

            Hi Sir,

            Did you get a chance to look into the ADF test data and parameters? I am also facing the same issue as reported in the post above. When I checked against the latest pair data Excel sheet, I could also match all the values (beta, intercept, Std. Error and Sigma) precisely to the decimal point (from 23rd Aug 2017 till 12th Jun 2018), except the p-value. Could you please check on this?

          • Karthik Rangappa says:

            Manoj, swamped with work. I’ll try and do this as soon as I can. Thanks.
