## 11.1 – Co-integration of two time series

I guess this chapter will get a little complex. We will be skimming the surface of some higher-order statistical theory. I will try my best to stick to practical stuff and avoid the fluff, and I'll try to explain these things from a trading point of view, but I'm afraid some amount of theory is necessary for you to know.

Given the path ahead, I think it is necessary to revisit what we have learned so far and put some order to it. So let me summarize our journey so far –

- From Chapter 1 to 7, we discussed a very basic version of a pair trade. We discussed this simply to lay a strong foundation for the higher-order pair trading technique, which is generally known as the relative value trade.
- The relative value trade requires the use of linear regression.
- In linear regression, we regress a dependent variable, Y, against an independent variable, X.
- When we regress, the outputs of interest are the intercept, slope, residuals, standard error, and the standard error of the intercept.
- The decision on which stock is dependent and which is independent depends on the error ratio.
- We calculate the error ratio by interchanging X and Y. The combination that offers the lower error ratio defines which stock is X and which one is Y.
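The ordering step in the last two points can be sketched in plain Python for readers who prefer code. This sketch rests on one assumption. It takes the error ratio to be the standard error of the intercept divided by the standard error of the residuals. That is our reading of the earlier chapters, so check it against your own sheet. The names `ols`, `error_ratio` and `choose_order` are hypothetical helpers and do not come from any library.

```python
import math


def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, residuals, std_err_residuals, std_err_intercept).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the regression, using n - 2 degrees of freedom
    se_resid = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    # Standard error of the intercept coefficient
    se_intercept = se_resid * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return intercept, slope, residuals, se_resid, se_intercept


def error_ratio(x, y):
    """Assumed definition: std error of intercept / std error of residuals."""
    _, _, _, se_resid, se_intercept = ols(x, y)
    return se_intercept / se_resid


def choose_order(prices_a, prices_b):
    """Try both orderings and keep the one with the lower error ratio.

    Returns (x_label, y_label, ratio) with labels 'A' or 'B'.
    """
    ratio_ab = error_ratio(prices_a, prices_b)  # A as X, B as Y
    ratio_ba = error_ratio(prices_b, prices_a)  # B as X, A as Y
    if ratio_ab <= ratio_ba:
        return "A", "B", ratio_ab
    return "B", "A", ratio_ba
```

To use it, pass two equal-length lists of closing prices. `choose_order` tells you which series to treat as X and which as Y, and returns the winning ratio. The series must not fit a perfect straight line. If it does, the residual standard error is zero and the ratio is undefined.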

I hope you have read and understood everything that we have discussed up to this point. If not, I’d suggest you read the chapters again, get clarity, and then proceed.

Recall that in the previous chapter, we discussed the residuals. I also mentioned that the bulk of the focus going forward will be on the residuals. It is time we study the residuals in more detail and establish the kind of behavior they exhibit. In doing so, we will be introduced to two new terms – cointegration and stationarity.

Generally speaking, if two time series (stock X and stock Y in our case) are ‘cointegrated’, it means that the two stocks move together. If there is a deviation from this movement, it is either temporary or can be attributed to a stray event, and one can expect the two time series to revert to their regular orbit, i.e. converge and move together again. This is exactly what we want while pair trading, which means the pair we choose to trade should be cointegrated.

So the question is – how do we evaluate if the two stocks are cointegrated?

Well, to check if the two stocks are cointegrated, we first run a linear regression on the two stocks, then take the residuals obtained from the regression, and check if the residuals are ‘stationary’.
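Here is a minimal sketch of the first half of this check. It regresses Y on X by ordinary least squares and keeps the residuals, i.e. the part of Y that the regression line does not explain. The function name `regression_residuals` and the price lists are hypothetical. The stationarity test itself comes later in this chapter.

```python
def regression_residuals(x, y):
    """Fit y = intercept + slope * x by OLS and return (intercept, slope, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = actual Y minus the Y predicted by the regression line
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


if __name__ == "__main__":
    # Hypothetical closing prices for stock X and stock Y
    x_prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0]
    y_prices = [50.0, 52.0, 50.0, 54.0, 55.0, 55.0, 58.0]
    intercept, slope, residuals = regression_residuals(x_prices, y_prices)
    print(round(intercept, 2), round(slope, 3))
    print([round(r, 2) for r in residuals])
```

An OLS fit with an intercept always produces residuals that average to zero. So the stationarity question is not about their average level. It is about whether they keep returning to zero over time.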

If the residuals are stationary, it implies that the two stocks are cointegrated. If the two stocks are cointegrated, they move together, and therefore the ‘pair’ is ripe for tracking pair trading opportunities.

Here is an interesting way to look at this – one can take any two time series and apply regression, and the regression algorithm will always throw out an output. How would one know if the output is reliable? This is where stationarity comes into play. The regression equation is valid if and only if the residuals are stationary. If the residuals are not stationary, the regression relation shouldn’t be used.

Speculating and setting up trades on a co-integrated time series is a lot more meaningful and is independent of market direction.

So, essentially, this boils down to figuring out if the residuals are stationary or not.

At this point, I could straight away show you how to check if the residuals are stationary. There is a simple test called the ‘ADF test’ to do this, and frankly, this is all you need to know. However, I think you are better off spending a few minutes to understand what ‘stationarity’ really means (without deep diving into the quant side).

So, read the following section only if you are curious to know more; otherwise, skip to the section on the ADF test.

## 11.2 – Stationary and non-stationary series

A time series is considered ‘stationary’ if it satisfies three simple statistical conditions. If the time series only partially satisfies these conditions, like 2 out of 3 or 1 out of 3, the stationarity is considered weak. If none of the three conditions are satisfied, the time series is ‘non-stationary’.

The three simple statistical conditions are –

- The **mean** of the series should be the same or within a tight range
- The **standard deviation** of the series should be within a range
- There should be no **autocorrelation** within the series – this means any particular value in the time series, say value ‘n’, should not be dependent on any value before ‘n’. We will talk more about this at a later stage.

While pair trading, we only look for pairs that exhibit complete stationarity. Non-stationary or weakly stationary series will not work for us.

I guess it is best to take up an example (a sample time series) and figure out what the above three conditions really mean; hopefully, that will help you understand ‘stationarity’ better.

For the sake of this example, I have two time series, with 9000 data points each. I’ve named them Series A and Series B, and on this data, I will evaluate the above three stationarity conditions.

**Condition 1 – The mean of the series should be same or within a tight range**

To evaluate this, I will split each of the time series data into 3 parts and calculate the respective mean for each part. The mean for all three different parts should be around the same value. If this is true, then I can conclude that the mean will more or less be the same even when new data points flow in the future.
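The three-part split for condition 1 can be sketched with Python's standard `statistics` module. The helper names are hypothetical. So is the `tolerance` argument of `means_within`. The chapter gives no numeric cut-off for "around the same value", so you choose it.

```python
import random
import statistics


def split_into_parts(series, parts=3):
    """Split a series into `parts` consecutive chunks of (nearly) equal length."""
    size = len(series) // parts
    chunks = [series[i * size:(i + 1) * size] for i in range(parts - 1)]
    chunks.append(series[(parts - 1) * size:])  # last chunk takes any remainder
    return chunks


def part_means(series, parts=3):
    """Mean of each consecutive chunk, for checking condition 1."""
    return [statistics.mean(chunk) for chunk in split_into_parts(series, parts)]


def means_within(series, tolerance, parts=3):
    """True if all part means lie within `tolerance` of each other (hypothetical rule)."""
    means = part_means(series, parts)
    return max(means) - min(means) <= tolerance


if __name__ == "__main__":
    # Hypothetical 9000-point series, split into three 3000-point parts as in the text
    rng = random.Random(1)
    series_a = [20 + rng.gauss(0, 3) for _ in range(9000)]
    print([round(m, 2) for m in part_means(series_a)])
```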

So let us go ahead and do this. To begin with, I’m splitting the Series A data into three parts and calculating the respective means; here is how it looks –

Like I mentioned, I have 9000 data points in Series A and Series B. I have split Series A data points into 3 parts and as you can see, I’ve even highlighted the starting and ending cells for these parts.

The means for all three parts are similar, clearly satisfying the first condition.

I’ve done the same thing for Series B, here is how the mean looks –

As you can see, the mean for Series B swings quite wildly, so Series B fails the first condition for stationarity.

**Condition 2 – The standard deviation of the series should be within a range**

I’m following the same approach here – I will calculate the standard deviation for all three parts of both series and observe the values.
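The same split works for condition 2. Here is a minimal standard-library sketch. `part_stdevs` and `stdev_range` are hypothetical helper names. The sketch uses the sample standard deviation (`statistics.stdev`), which we assume matches a spreadsheet's STDEV.S.

```python
import statistics


def part_stdevs(series, parts=3):
    """Sample standard deviation of each consecutive chunk (condition 2)."""
    size = len(series) // parts
    chunks = [series[i * size:(i + 1) * size] for i in range(parts - 1)]
    chunks.append(series[(parts - 1) * size:])  # last chunk takes any remainder
    return [statistics.stdev(chunk) for chunk in chunks]


def stdev_range(series, parts=3):
    """Gap between the largest and smallest part standard deviation."""
    stdevs = part_stdevs(series, parts)
    return max(stdevs) - min(stdevs)


if __name__ == "__main__":
    sample = [1, 2, 3, 10, 20, 30, 5, 5, 5]  # hypothetical, deliberately unstable
    print(part_stdevs(sample), stdev_range(sample))
```

A narrow `stdev_range` corresponds to the "tight" band the chapter describes for Series A. A wide one resembles Series B.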

Here is the result obtained for Series A –

The standard deviation oscillates between 14-19%, which is quite ‘tight’ and therefore satisfies the 2nd stationarity condition.

Here is how the standard deviation works out for Series B –

Notice the difference? The standard deviation for Series B varies quite randomly. Series B is clearly not a stationary series, while Series A looks stationary at this point. However, we still need to evaluate the last condition, i.e. the autocorrelation bit, so let us go ahead and do that.

**Condition 3 – There should be no autocorrelation within the series**

In layman's terms, autocorrelation is a phenomenon where a value in the time series depends on the values before it. For a stationary series, we want no autocorrelation, i.e. each value should not really depend on the values before it.

For example, have a look at the snapshot below –

The 9th value in Series A is 29, and if there is no autocorrelation in this series, the value 29 is not really dependent on any values before it, i.e. the values from cell 2 to cell 8.

But the question is how do we establish this?

Well, there is a technique for this.

Assume there are 10 data points. Take the data from Cell 1 to Cell 9 and call this Series X; now take the data from Cell 2 to Cell 10 and call this Series Y. Then calculate the correlation between Series X and Y. This is called the 1-lag correlation, and it should be close to 0.

I can do this for a 2-lag as well, i.e. between Cell 1 to Cell 8 and Cell 3 to Cell 10; again, the correlation should be close to 0. If this is true, then it is safe to assume that the series is not autocorrelated, and hence the 3rd condition for stationarity is satisfied.
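The lag-correlation technique above can be written out in a few lines. This sketch computes a plain Pearson correlation by hand so that it only needs the standard library. The names `pearson` and `lag_correlation` are hypothetical. With `lag=1` it compares cells 1 to n-1 against cells 2 to n. With `lag=2` it compares cells 1 to n-2 against cells 3 to n, exactly as described.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lag_correlation(series, lag):
    """Correlate the series with itself shifted by `lag` positions."""
    earlier = series[:-lag]  # "Series X" in the text
    later = series[lag:]     # "Series Y" in the text
    return pearson(earlier, later)


if __name__ == "__main__":
    rng = random.Random(7)
    noise = [rng.gauss(0, 1) for _ in range(5000)]  # hypothetical, no autocorrelation
    trend = list(range(1, 5001))                    # strongly autocorrelated
    print(round(lag_correlation(noise, 1), 3), round(lag_correlation(noise, 2), 3))
    print(round(lag_correlation(trend, 1), 3))
```

Values near 0 point to no autocorrelation, like Series A. Values near 1, like the steady trend here, resemble Series B.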

I’ve calculated the 2-lag correlation for Series A, and here is how it looks –

Remember, I’m creating two subseries from Series A, i.e. Series X and Series Y, and the correlation is calculated between these two subseries. Clearly, the correlation is close to zero, and with this, we can conclude that Series A is stationary.

Let’s do this for Series B as well.

I’ve taken a similar approach, and as you can see, the correlation is quite close to 1.

So, all the conditions for stationarity are met for Series A, which means the time series is stationary, while Series B is not.

I know that I’ve taken a rather unconventional approach to explaining stationarity and co-integration. After all, no statistical explanation is complete without those scary looking formulas. But this is a deliberate approach and I thought this would be the best possible way to discuss these topics, as eventually, our goal is to learn how to pair trade efficiently and not really deep dive into statistics.

Anyway, you could be wondering if you really need to do all of the above to figure out if the time series (the residuals) is stationary. Like I said before, this is not required.

We only need to look at the results of something called the ‘ADF test’ to establish if the time series is stationary or not.

## 11.3 – The ADF test

The augmented Dickey-Fuller (ADF) test is perhaps one of the best techniques to test for the stationarity of a time series. Remember, in our case, the time series under consideration is the residuals series.

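Basically, the ADF test formalizes the checks we discussed above, including a multiple-lag process to check for autocorrelation within the series. Here is something you need to know – the output of the ADF test is not a definitive ‘Yes – this is a stationary series’ or ‘No – this is not a stationary series’. Rather, the output of the ADF test is a probability. Strictly speaking, the test starts from the assumption (the null hypothesis) that the series has a unit root, i.e. is not stationary. The output tells us how likely a result like ours would be if that assumption were true. For practical purposes, we read it loosely as the probability of the series not being stationary.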

For example, if the ADF test on a time series outputs 0.25, this means the series has a 25% chance of not being stationary, or in other words, a 75% chance of being stationary. This probability number is also called ‘the p-value’.

To consider a time series stationary, the p-value should be 0.05 (5%) or lower. This essentially means the probability of the time series being stationary is 95% or higher.

Alright, so how do you run an ADF test?

Frankly, this is a fairly complex process, and unfortunately, I could not find a single source online that will help you run an ADF test for free. I do have an Excel sheet (with a paid plugin) to run an ADF test, but unfortunately, I cannot share it here. If I could, I would have.

If you are a programmer, I’ve been told that there are Python libraries readily available to run an ADF test, so you could try that.
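If you do try Python, libraries such as statsmodels provide the full ADF test. One example is `statsmodels.tsa.stattools.adfuller`, which accepts `maxlag` and `autolag` arguments and returns the p-value among its outputs. To show roughly what such a test computes without any third-party package, here is a minimal, standard-library-only sketch of the basic Dickey-Fuller regression with a constant. It is a simplification in two ways. First, it adds no lagged-difference terms, which are the "augmented" part. Second, it does not produce a p-value. Instead, it compares the statistic with an approximate large-sample 5% critical value of -2.86, which you should treat as an assumption and check against a proper table. The function names are hypothetical.

```python
import math
import random

# Approximate large-sample 5% critical value of the Dickey-Fuller test
# when the regression includes a constant (assumed for illustration)
DF_CRITICAL_5PCT = -2.86


def dickey_fuller_stat(series):
    """t-statistic of gamma in  dy[t] = alpha + gamma * y[t-1] + e[t].

    Basic (non-augmented) Dickey-Fuller regression with a constant.
    The null hypothesis is gamma = 0 (a unit root, i.e. non-stationary);
    a strongly negative statistic is evidence against it.
    """
    lagged = series[:-1]                                                # y[t-1]
    diffs = [cur - prev for prev, cur in zip(series[:-1], series[1:])]  # dy[t]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, diffs))
    gamma = sxy / sxx
    alpha = mean_y - gamma * mean_x
    ssr = sum((y - alpha - gamma * x) ** 2 for x, y in zip(lagged, diffs))
    se_gamma = math.sqrt(ssr / (n - 2) / sxx)  # OLS standard error of gamma
    return gamma / se_gamma


def looks_stationary(series, critical=DF_CRITICAL_5PCT):
    """Reject the unit-root null at roughly the 5% level."""
    return dickey_fuller_stat(series) < critical


if __name__ == "__main__":
    rng = random.Random(42)
    noise = [rng.gauss(0, 1) for _ in range(500)]  # mean-reverting, like good residuals
    walk, level = [], 0.0
    for step in noise:
        level += step
        walk.append(level)                         # random walk, non-stationary
    print("noise:", round(dickey_fuller_stat(noise), 2), looks_stationary(noise))
    print("walk: ", round(dickey_fuller_stat(walk), 2), looks_stationary(walk))
```

The intuition is as follows. A mean-reverting series pulls back after each move, so today's change is negatively related to yesterday's level, and gamma is well below zero. A random walk shows no such pull, so gamma stays near zero. In practice, you would run this on the regression residuals.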

But if you are a non-programmer like me, you will be stuck at this stage. So here is what I will do: once every week or 15 days, I will try and upload a ‘Pair Data’ sheet, which will contain the following information for the best possible combination of pairs –

- You will know which stock is X and which stock is Y
- You will know the intercept and Beta of this combination
- You will also know the p-value of the combination

The lookback period for generating this is 200 trading days. I’ve restricted this to banking stocks for now, but hopefully I can include more sectors going forward. To help you understand this better, here is a snapshot of the latest Pair Data sheet for banking stocks –

The first line suggests Federal Bank as Y and PNB as X. This means that two regressions were conducted, one with Federal as Y and PNB as X and one with Federal as X and PNB as Y. The error ratio was calculated for both combinations, and Federal as Y with PNB as X had the lower error ratio.

Once the order has been figured out (as in which one is Y and which one is X), the intercept and Beta for the combination are calculated. Finally, the ADF test is conducted and the p-value calculated. As you can see, the p-value for Federal Bank as Y and PNB as X is 0.365.

In other words, this is not a combination you should be dealing with, as the probability of the residuals being stationary is only 63.5%.

In fact, if you look at the snapshot above, you will find only 2 pairs with the desired p-value, i.e. Kotak and PNB with a p-value of 0.01, and HDFC and PNB with a p-value of 0.037.
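The cut-off arithmetic in the last few paragraphs is easy to automate if you maintain your own list of pairs. This sketch uses only the three p-values quoted above. The `1 - p` figure follows this chapter's loose reading of the p-value. The names `assess_pair` and `pair_p_values` are hypothetical.

```python
P_VALUE_THRESHOLD = 0.05  # stationarity cut-off used in this chapter


def assess_pair(p_value, threshold=P_VALUE_THRESHOLD):
    """Return (passes_threshold, loose 'chance of being stationary')."""
    return p_value <= threshold, 1 - p_value


# ADF p-values quoted in the text from the Pair Data sheet snapshot
pair_p_values = {
    "Federal Bank (Y) / PNB (X)": 0.365,
    "Kotak / PNB": 0.01,
    "HDFC / PNB": 0.037,
}

if __name__ == "__main__":
    for pair, p in pair_p_values.items():
        passes, chance = assess_pair(p)
        print(f"{pair}: p = {p}, ~{chance:.1%} (loose reading), passes = {passes}")
```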

The p-values don’t usually change overnight. For this reason, I check the p-values once every 15 or 20 days and try to update them here.

I think we have learned quite a bit in this chapter. A lot of information discussed here could be new for most of the readers. For this reason, I will summarize all the things you should know about Pair trading at this point –

- The basic premise of pair trading
- Basic overview of linear regression and how to perform one
- In linear regression, we regress a dependent variable, Y, against an independent variable, X.
- When we regress, the outputs of interest are the intercept, slope, residuals, standard error, and the standard error of the intercept.
- The decision on which stock is dependent and which is independent depends on the error ratio.
- We calculate the error ratio by interchanging X and Y. The combination that offers the lower error ratio defines which stock is X and which one is Y.
- The residuals obtained from the regression should be stationary. If they are stationary, then we can conclude that the two stocks are co-integrated
- If the stocks are cointegrated, then they move together
- Stationarity of a series can be evaluated by running an ADF test.

If you are not clear on any of the points above, then I’d suggest you give this another shot and start reading from Chapter 7.

In the next chapter, we will try and take up an example of a pair trade and understand its dynamics.

You can **download the Pair Data** sheet, updated on 11th April 2018.

Lastly, this module (and this chapter, in particular) could not have been possible without the inputs from my good friend and an old partner, **Prakash Lekkala**. So I guess, we all need to thank him 🙂

### Key takeaways from this chapter –

- If two stocks are cointegrated, then they move together
- You can pair trade on stocks that are cointegrated
- If the residuals obtained from linear regression are stationary, it implies the two stocks are cointegrated
- A time series is considered stationary if the series has a constant mean, constant standard deviation, and no autocorrelation
- The check for stationarity can be done with an ADF test
- The p-value of the ADF test should be 0.05 (5%) or lower for the series to be considered stationary.

Thanks Karthik.

Excellent writeup!!!

Is it possible to have the complete list of upcoming chapters, to know where we are with regard to this pair trading journey?

Can we expect a few more chapters this month ? Sorry for being greedy…

Regards

Deepu

Deepu, glad you liked it.

I don’t plan for it in advance, but generally, go with the flow. To give you a rough idea, the next step would be to take up an example of a trade and try and put all the learning together. Hopefully, that will be exciting enough 🙂

Thanks Karthik.

Request you to share 5-6 examples so that they cover most of the learnings. Further, what does the ADF plugin cost, and where can I buy it? Please share the details.

Regards

Deepak

You can use R (for example, in RStudio) to run the ADF test. There is a package called “urca” in R which enables this test.

Thanks, Akshay. Yes, I’m aware R has a package for this; will have a look at urca.

The idea is to share a couple of live examples. Will share the other details as we progress.

Dear Karthik,

As you said, I am stuck at the ADF test, and I have no friend with programming knowledge.

If I need to buy a paid plugin then how much it will cost?

Any other way to do the ADF test?

Swapnil, unfortunately, I have not evaluated any paid versions. So cannot really comment. Thanks.

I’m glad to learn new things with your guidance. It's seriously very educative and informative.

Thanks for enlightening us.

Happy learning, Anil!

I would say, it’s very addictive too along with being educative and informative.

Thanks a lot, Karthik.

Glad to hear that, Arbit 🙂

Keep learning!

Thanks, Prakash Lekkala sir and Karthik sir, for your effort.

Most welcome!

Thank You Karthik sir,

Even though an ADF test tool is not available, you have taught us how to check stationarity using Excel by dividing the data into parts and calculating the mean, SD, and 2-lag correlation. But please mention how much variation in mean and SD would correspond to a p-value of 0.05 (a rough estimate).

You can run ADF test in R software, load package called “urca” in R. It’s really easy in R.

Dear Akshay,

Is this software free? If yes please suggest from where to download it.

You can download an ADF test Excel add-in from the link below

http://www.web-reg.de/adf_addin.html#

Dear Sunil, thanks a lot for the link. Please elaborate on how to use it in Excel. Thanks in advance.

For the mean, I’d suggest a tight variation, not more than a 3-5 point difference. For SD, technically you would have to look at the standard error of the standard deviation, but that may get a little overboard. Stick to a 5-10% variation at the most. This should result in a p-value of less than 0.05.

Can you please upload the PDF of all the chapters shared so far?

Thanks

The module will be converted to PDF once it is completed.

Hello Karthik,

Thanks to you and Prakash for taking the pains to make us understand this chapter. Overall, I am thoroughly enjoying this module. However, I have a few questions in my mind after going through this chapter. I hope you can clarify my doubts here.

1. You mentioned that the look back period is 200 trading days. When I am calculating the pair (let’s say PNB as x and Kotak Bank as Y), the Intercept coefficient I am arriving at is in the vicinity of 1111. However, in the sheet you shared it is around 1099. My data range is starting from 23rd June, 2017 till 13th Apr, 2018. Am I missing anything here. I am following the same procedure which you mentioned in chapter 9.

2. When I calculate the p-value (using Python packages) for the period mentioned above, it comes to around 0.40 instead of 0.01. Not sure why there is such a huge difference. Can you please elaborate on whether any additional parameters go into calculating the p-value in your case?

Thanks

1) How did you source the data? Did you get it from Pi? Make sure it's clean for splits and bonuses, if any

2) Not sure about this, will try and see why this could be happening.

I took the data from Yahoo Finance. Generally, it’s adjusted for splits and bonuses. But I will take it from Pi and do the calculation once again.

Ok. Also, we have considered the data from 20th June 2017 to 10th Apr 2018. The intercept difference is due to that, I guess. Also, as you may have figured, in most ADF functions, one needs to give a lag. In our case, it's 5. The recommended value is the cube root of the number of data points (or thereabouts). Since we had 200 data points, the cube root is 5.8, so we decided to go with 5.

Thanks. I will use 5 then.

Sure, good luck, Mainak.

Thanks Prakash and Kartik..

For the p-value, I use AmiBroker. Cointegration is not an inbuilt indicator there, so we have to send the data from AmiBroker to Python. For that, search “how to calculate cointegration in amibroker” on marketcalls.in; there is a very good step-by-step explanation there.

I find nifty/banknifty, ambujacem/acc and tatamtrdvr/tatamotors to be very stationary pairs to trade, even on a 60-min chart.

I keep searching for stocks in the same sectors only.

@kartik,

The p-value for axis/icici shows 0.00 every time I look. What does it mean? Is there a 100% probability that it's mean reverting?

And once again, thanks to both of you.

Akash – I’m not sure about the article you have mentioned, maybe I should give it a read. Also, the term ‘p-value’ is a generic term, make sure you are reading this in the right context.

hi karthik

AmiBroker gives me data in this format. Copy the links below and paste them into another tab to see the AmiBroker screens.

http://prntscr.com/j9xawy

http://prntscr.com/j9xc40

http://prntscr.com/j9xcgu

http://prntscr.com/j9xct9

Above are my favorite pairs. One can overlook the correlation data, as I calculate a 63-trading-day correlation with an AmiBroker built-in function, but I took 252 trading days to calculate cointegration.

Below is a link to a correlation table, which can be run in AmiBroker with a simple AFL.

http://prntscr.com/j9xeih

I have a cointegration AFL too, but it's not running properly; otherwise, we could see cointegration in tabular form for a selected watchlist. So I keep checking cointegration for individual pairs only.

Hope the above info will be useful to other friends too.

Forgot to share the link for AmiBroker users…

https://www.marketcalls.in/amibroker/how-to-compute-cointegration-using-amibroker-and-python.html

@karthik,

Why is it written that the p-value should be less than 0.5? Can you throw some light on this? Maybe it's taking a % value?

http://prntscr.com/j9xmxp

Thanks for a nice series of chapters.

When there is a break in correlation, there is a trade opportunity in a well-cointegrated pair, I think. Here is the AmiBroker screen for correlation.

http://prntscr.com/j9zaoy

Interesting, need to validate this, Akash.

I don’t know why it should be less than 0.5. I’d prefer less than 0.1 or even 0.05.

I hope so too 🙂

Btw, is there any insight into how the Cointegration is calculated?

@kartik

greetings

– While studying cointegration, I found web pages on how to calculate cointegration; frankly, I could not understand any of the math symbols and calculations.

– Regarding 0.5 or 0.05 for the p-value, 0.05 is confirmed. But the AFL I'm using with AmiBroker simply sends the data to Python servers and displays the coint value in the AmiBroker AFL window. I think we should divide the displayed value by 10, because some good pairs in my experiments show a coint value of 0.20, so I'm taking it as 0.02, and it's working fine. Any Python coder can crack its AFL and throw some more light on it.

– http://prntscr.com/jahlrd In this, my data runs from 12/4/17 to 26/4/18, almost a 252-day period. The coint value is 0.08, so I will take it as 0.008, and

– http://prntscr.com/jahp5z In this, the tatamtrdvr/tatamotors pair shows a coint value of 0.18, so I'm taking it as 0.018. You won't believe it, but this pair has such a tight spread that I'm trading it on a 60-min basis and, touch wood, earning good money. Please click the link below for the hourly chart

– http://prntscr.com/jahsfo Profit on the hourly chart is less than on the daily chart, but there are plenty of trading opportunities (on 5 lakh on both sides, you can earn around 2500 minus brokerage in 2-3 days)

Previously, I traded pairs across all good stocks. Then, with experience, I narrowed it down to good banking stocks (104 pairs possible in total), and for the last 6 months, I have narrowed it down further and trade only nf/bnf, tatamtrdvr/tatamotors, hdfc/hdfcbank and acc/ambuja.

Besides theory, can't any experienced trader tell, after watching 2-5 years of the daily chart, whether a pair is regularly mean-reverting or not?

After all, earning consistent money is everyone's motto here…

thanks.

Hey, Akash thanks for the insights. Nothing beats practical market experience 🙂

Btw, what makes you divide the p-value by 10?

I was not aware of TM DVR future, I’m sure it opens a window of opportunity.

Hi, Kartik,

Dividing the p-value by 10 was just my simple logic/intuition from my Motors/DVR experience. The long-term chart shows a very tight spread that regularly crosses its mean, yet the coint value shows a 0.11 to 0.45 range. If we took this value as-is, this pair would not be reliable for pair trading, but if we divide it by 10, we get 0.011 to 0.045, which would be very good for pair trading. The second example was BN/N: its coint value showed 0.15 to 0.68, but practically, it is also a very good pair. That's why I came to the conclusion that I should divide it by 10. I don't have the knowledge to write, read or understand the AFL language… I believe in KISS, and I have never regretted it. But I still don't understand why axis/icici coint shows a 0.00 value; that project is still in progress, will let you know. Meanwhile, I'm searching for someone with a sound understanding of AFL coding to crack the Python AFL. Hope this helps.

Interesting, Akash. Will certainly look at this. Meanwhile, do post any observations here!

Hi Kartik,

I am a regular student of Zerodha Varsity. I am wondering whether it is possible for you to have a separate chapter on “How the Stock/Financial Market Operates”, which would basically cover the mechanism of the stock market, like market makers, clearing agents, etc. There are many other components that operate backstage in the market; I'm just mentioning the couple I know and hope you will cover the rest. The chapter could cover how they operate and who they are, in the context of the Indian market.

There is not much material available online on this subject either. I personally think one should know about these mechanisms, which will broaden our knowledge, and I believe knowledge is power.

Best regards,

I agree, Joshan. Have you looked at this module – https://zerodha.com/varsity/module/introduction-to-stock-markets/ , will help you with some basic information.

Sir, I know this is not a proper question, but it is eating away at me. A few days back, Airtel announced its results, and they were bad, but better than what the market was expecting. Still, the next day, the share went up. What do you think caused this?

Sundeep, the same thing happened yesterday with Axis. The past was bad, but the future looks good. Remember, the future is what the stock market always looks at 🙂

Sir, what you just said only raises a few more questions. I’m sorry to pester you like this.

1. You once said when you give market good news and bad news, it always reacts to good news first. By that logic, don’t you think the shares of Airtel and Axis should have gone down?

2. In hindsight, do you think you could have predicted that even if those two companies posted bad earnings, their stocks were going to go up? (I’m just asking to see if seasoned traders can do that, since I had no clue it was a possibility.)

3. Is there a method to associate a particular piece of news with its effect on the stock price? For example, in the above case, what were the factors that led to the stock moving up?

I hugely appreciate what you’re doing to help fellow traders like myself. Varsity is a treasure and I encourage my friends to read it as well. Thanks in advance for the answer.

1) The market always looks at the future, Sundeep. So, they expect a better outlook for these stocks, as they believe the worst could be over. But your guess is as good as mine

2) My colleague actually had a bet with another colleague that the stocks would go up the next day 🙂

3) This largely depends on your experience reading the markets

Happy to note that Sundeep, keep learning 🙂

Fantastic series, Karthik. I had not been here for a while and had to skim through to get here. You have successfully managed to keep it an easy read. Hats off.

Considering that I am starting my journey as a full-time trader in a month, I see myself coming here more often.

Question:

Do you have tools within Kite to figure out cointegration and do other analysis? Are you considering bringing in any capability around it?

Glad to note that, Madhavan 🙂

Unfortunately, we don’t have coint tools within Kite. Trying my best to figure out an alternative.

Sir I have a very personal question to ask you. But since it relates to mindset of a good trader I decided to ask you anyway. How do you feel when your fellow trader made more money, assuming you started out with same amount of capital. I know I felt really bad when it happened to me. How do you deal with that?

Sundeep, this is personal. The way I react may be different from the way another person does. I think you should be happy, since you can always check with your friend on what went right for him and learn from his success. At the end of the day, the only way to move ahead in markets is to keep an open mind to learn and adapt. Good luck and keep learning 🙂

Sir, you’ve written exhaustive text on trading using technical and quantitative methods. Can you write a module on trading using fundamental analysis (based on earnings or news)? If not, can you suggest a methodology for learning it?

Here is what we have, for now, Sundeep – https://zerodha.com/varsity/module/fundamental-analysis/

It's best you use FA for long-term investment and not really get into trading based on FA.

Sir, I agree FA is best used for the long term. But I think FA-based plays give the highest yield. What are your thoughts on that? Can you share why you think FA can’t be used for short-term plays?

You do have great setups based on PE expansion/contraction. For this, you need to keep track of daily PE. Further, you can even place trades based on quarterly results, employing options.

Yes sir that is exactly what I am doing right now. But the returns are all over the place. And I need some guidance. What books are there on the subject or if you can give me few pointers it would be very very helpful. Thank you.

Have you checked Ganapathy Vidyamurthy’s book on Pair Trading? Check this – https://www.amazon.com/Pairs-Trading-Quantitative-Methods-Analysis/dp/0471460672

Hello sir

We are eagerly waiting for the next chapter.

Thanks

Here it is – https://zerodha.com/varsity/chapter/trade-identification/

Hello,

Thanks for providing wonderful modules in Varsity. I have some queries listed below:

When will the next chapter come?

How much time it will take to complete the entire module and how many more chapters will be added?

Can you please name some reference books or resources for a deeper understanding of trading systems and for coding one yourself?

Thanks

Here is the next chapter – https://zerodha.com/varsity/chapter/trade-identification/

This module may be completed by May 2018. We still have delta hedging with options to complete.

Check out Ganapathy Vidyamurthy’s book on pair trading; it's one of the best on the topic.

Hello sir

I have installed the EViews statistical package for a one-year trial period. In the “lag length” drop-down menu of the ADF test section, there are many options available, like Schwarz Info Criterion, Hannan-Quinn criterion, Modified Akaike and t-statistic, each giving a different p-value for the same max lag of 15. You may see it here:

http://prntscr.com/jdu0xn

Even when it gives a value below the threshold of 0.05, the header reads “Null Hypothesis: Residual has a unit root”. If the series has a unit root, how could it be a stationary series? I have taken a screenshot here:

http://prntscr.com/jdu7m4

Thanks

Varsity student

I’ll get back to you on this soon, Mayank.

Thank you very much sir.

Welcome, Mayank!

Sir who is your favorite trader? The one may be you try to emulate?

I’m fortunate enough to sit with the best trader I know, Nithin Kamath 🙂

Lots of learning, not just about trades in markets but also trades in real life 🙂

That’s good to know sir. But I always thought Nikhil was the better trader. Nitin’s own words.

Hello sir

My question is about updating the pair data every day. I run the regression analysis, copy the residual data and paste it into another sheet, where I analyse the density curve. I repeat the same actions every day. Can you suggest a smart way to keep my Excel sheet updated?

Next question: the residuals I get each day differ slightly from those of the previous day, albeit only in the third or fourth decimal place. Should I paste the whole set of data, or add just one day's data to the existing column?

Thanks

Varsity student

I understand, Kumar. This needs some programming help and, unfortunately, I can be of very little help there. You can update the latest close to get the latest position of the residuals.

Hello sir

Well, my quest to update the data faster has had some success. I learnt to use macros, but a macro runs on a fixed amount of data: if I recorded it on 255 rows, it can’t run on 256 rows.

Now my Excel sheet is dynamic. Whenever I add new data (today’s close price), the oldest row in the column is deleted automatically, so I keep the same number of data points but with a different starting date.

Is there any issue with updating the data dynamically like this? Hope that makes sense 🙂

Thanks

Varsity student

This is not an issue, Mayank. You need to ensure you are looking at the latest 200-day data.
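For readers who outgrow Excel macros, the same rolling 200-day window can be kept in a few lines of Python. This is a minimal sketch, not the logic of any sheet discussed here; the class and method names are hypothetical. Because the intercept and slope are re-estimated on a window that moves by one day, every residual in the column shifts slightly each day, which explains the small day-to-day differences in older residuals mentioned above.

```python
from collections import deque

# Hypothetical window length: the reply above suggests the latest 200 trading days.
WINDOW = 200


class RollingPair:
    """Keep only the latest `window` closing prices for stock X and stock Y.

    Appending today's close automatically drops the oldest close once the
    window is full, mirroring the self-updating Excel sheet described above.
    """

    def __init__(self, window=WINDOW):
        self.x = deque(maxlen=window)
        self.y = deque(maxlen=window)

    def add_day(self, close_x, close_y):
        # A deque with maxlen discards its leftmost (oldest) item when full.
        self.x.append(close_x)
        self.y.append(close_y)

    def ready(self):
        # Only re-run the regression once a full window of data is available.
        return len(self.x) == self.x.maxlen


# Usage: feed 205 days of made-up closes; the window keeps days 5 to 204.
pair = RollingPair()
for day in range(205):
    pair.add_day(100.0 + day, 200.0 + 2 * day)
print(len(pair.x), pair.x[0], pair.ready())  # 200 105.0 True
```

After each `add_day`, the regression (and the ADF test on its residuals) would be re-run on `pair.x` and `pair.y`.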

Thank you, sir 🙂

Welcome, Mayank!

Hello sir, to test my algo, can you tell me the ADF test p-value for the HDFC and ICICI pair? I am getting 0.007; is that correct?

Based on the data you used in the previous chapters, sir.

The ADF p-value was around 0.04. I think 0.0007 is not correct.

If

- x = ICICI Bank, y = HDFC Bank, then adf_p_value = 0.0397, err_ratio = 0.4
- x = HDFC Bank, y = ICICI Bank, then adf_p_value = 0.0077, err_ratio = 0.2

Are the above values correct, sir?

The first one seems more like it.

ok sir thanks

Welcome!

Sir, what is the name of the Excel plugin you use to perform the ADF test?

I’m not sure there is an Excel plugin, Vinay.

So can both methods be used intraday as well? What will the data series be in that case; will it be the 15-minute close price for a 15-minute chart? Which period is more reliable, daily or intraday? What should the profit expectation be for intraday, in percentage terms? Lastly, are there other pair trading methods apart from the two you showed us (by the way, I found these two methods very informative and practical), and can you suggest some books or reading on them? Thanks in advance.

No, Vinay, I would not suggest doing this intraday. These pair trades need time to evolve, and this happens over 3-4 days. I have opened and closed pair trades on an intraday basis, but that happened by luck, not by design.

Hi Karthik,

Good series on Pair Trading. Got me hooked!!

Just a small query. When you say the data should be adjusted for bonuses, splits, dividends, etc., where do we get such data? I am importing the data from the NSE website. Can it be considered clean, or can you suggest another source for clean data?

Yes, Pranay, NSE Bhavcopy usually has clean data.

Hi Karthik,

I checked the NSE Bhavcopy, which is published daily. It does not have adjusted prices if you go back and pull the file for earlier dates; take TCS, for example, whose price recently changed due to a bonus issue.

Can you please share the link or tell me where to get adjusted prices?

Thanks for the assistance.

Regards

Deepu

In that case, you need to evaluate a dedicated data vendor who will clean up the data for you. Check truedata.in.

Hi Karthik,

What is the input for the ADF test?

Is it the 200-day stock price data, or some calculated data such as the intercept or the residuals?

The ADF test checks the residuals for stationarity. So yes, the input for the ADF test is the residuals.
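As a minimal sketch of what “the residuals” means here, assuming prices are plain Python lists (oldest first) and Y is regressed on X by ordinary least squares: each residual is the actual Y minus the Y predicted by the fitted line, and that list is what goes into the ADF test. The function names and prices are hypothetical.

```python
def ols_fit(x, y):
    """Least-squares fit of y = intercept + slope * x (Y regressed on X)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length and at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def regression_residuals(x, y):
    """Residuals e_t = y_t - (intercept + slope * x_t); this series is the ADF input."""
    intercept, slope = ols_fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Usage with made-up closes (X = independent stock, Y = dependent stock).
x_closes = [100.0, 101.5, 99.8, 102.3, 103.1, 101.9, 104.4]
y_closes = [250.2, 253.9, 249.1, 255.8, 258.0, 254.7, 261.3]
resid = regression_residuals(x_closes, y_closes)
print([round(r, 3) for r in resid])
```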

Hi Karthik,

Really thankful for all your efforts. I am learning loads from it.

I am trying to reproduce all the steps you mentioned in this blog. I downloaded the Excel sheet with HDFC Bank and ICICI Bank data from 4th Dec 2015 to 4th Dec 2017, and calculated the slope, intercept, standard error, standard error of the intercept and, finally, the error ratio.

To calculate the p-value on the residual time series, I used R. The function adf.test() runs the ADF test on the given data. When I run the test, I get the following:

P value for residual of ICICI bank(Y) = 0.03729

P value for residual of HDFC bank(Y) = 0.08545

Your post lists the other results (slope, intercept, standard error, standard error of the intercept and the error ratio), so I could compare those to verify that my calculations are correct.

Can you please run the test on the same data and confirm whether the p-values I received are correct? For reference, I call adf.test(c(The time series residual data here….)) without passing any other arguments. There are arguments to set the lag parameter, but I was not sure about them, so I ran the function with the default arguments. Can you or someone from your team confirm whether my values are correct? If not, how exactly should R be used to get the p-values?

Thanks and regards,

Pratik

Pratik, can you download the pair data sheet here – https://zerodha.com/varsity/chapter/live-example-2/? I think this is for 12th June 2018. Compare your results for other pairs as well, besides HDFC Bank and ICICI.

I got the latest Excel sheet, for 12th Jun 2018. Thanks for the pointer. Can you please clarify the from and to dates of the data used to calculate this sheet, so that I can use the same data and match the results precisely? Currently I am using the HDFC Bank and ICICI Bank data from 4th Dec 2015 to 4th Dec 2017 that you shared in an Excel sheet.

The sheet was generated on 12th June, so you will have to go back 200 trading days.

From your note for programmers in the following chapter, I understood that you use the last 200 days of data. Based on the pair data Excel sheet you pointed me to, I figured out that the window runs from 23rd Aug 2017 to 12th Jun 2018. When I run my calculations on it, I can match the beta, intercept, Std. Error and Sigma values precisely, to the decimal. That gives me confidence that my calculations are correct. However, when I pass the HDFC residual time series to the adf.test function in R, the outcome is:

Dickey-Fuller = -3.1394, Lag order = 5, p-value = 0.09956

alternative hypothesis: stationary

The p-value does not match the Excel sheet value of 0.2073132413. Can you please clarify how you are running the ADF test to get this number? If anyone on your team can tell me how to get this number using R, that would be great.

Pratik, I’m beginning to sense my HDFC data could be wrong. If true, then this is a mistake and the whole premise of this chapter could be wrong – https://zerodha.com/varsity/chapter/live-example-2/

But the essence still holds. Anyway, I’m in the process of figuring it out 🙂
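One possible source of the mismatch, offered as a hypothesis rather than a confirmed answer: as far as I know, R’s adf.test() (from the tseries package) always fits the test regression with a constant and a linear trend, and by default uses k = trunc((n - 1)^(1/3)) lagged differences. For 200 residuals that gives 5, which matches the “Lag order = 5” in the output above. A tool that uses a different specification (for example, no constant and no trend, discussed further down in these comments) or a different lag rule will report a different p-value for the same residuals. In Python, statsmodels’ adfuller exposes these choices through its regression argument ("c", "ct", "n"; older releases spell the last one "nc") and its maxlag/autolag arguments. The snippet only reproduces the default lag rule; the function name is hypothetical.

```python
import math


def r_adf_default_lag(n_obs):
    """Default lag order of R's tseries::adf.test: trunc((n - 1) ** (1/3))."""
    if n_obs < 2:
        raise ValueError("need at least 2 observations")
    return math.trunc((n_obs - 1) ** (1 / 3))


print(r_adf_default_lag(200))  # 5, matching "Lag order = 5" above
```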

Hi Sir,

Did you get a chance to look into the ADF test data and parameters? I am facing the same issue reported in the post above. Checking against the latest pair data Excel sheet (23rd Aug 2017 to 12th Jun 2018), I could also match all the values (beta, intercept, Std. Error and Sigma) precisely, to the decimal, except the p-value. Could you please check on this?

Manoj, swamped with work. I’ll try and do this as soon as I can. Thanks.

Sir, I am also facing the same issue: everything except the p-value matches. Whenever you get the time... thanks in advance.

Sir, how do I calculate the beta in the final pair trade sheet?

The beta is included in the regression output, have explained how to run a regression here – https://zerodha.com/varsity/chapter/linear-regression/

Hey Karthik,

Won’t the p-values output by the regression function of the ‘Analysis ToolPak’ add-in in Excel suffice for our purposes?

Hmm, not really, because we are looking at the stationarity of the residuals.

Dear Karthik,

I found a free Excel add-in for the ADF test. I have tested it but don’t know how useful the output is.

Can you please test it once? I just want to know whether it is the free ADF test we are looking for.

To download it, click the link below.

http://www.web-reg.de/adf_addin.html#

Really appreciate your support.

Will try and check this by the weekend, Swapnil. Thanks.

Dear Karthik,

Any update on the Excel plug-in?

Swapnil, unfortunately, no. But I’d suggest you look at the comments; one of the readers has been kind enough to publish a free algo.

But Karthik, I don’t know how to use Python. Checking the Excel ADF plug-in will not take much of your time. Please help me; I have been stuck on the ADF test for the last 2 months and have finally found an Excel plug-in. Just run it once.

Please.

Sure, Swapnil, will try and do that.

Python Code Link – ideone.com/z3Xg6M

Sample Output – https://image.ibb.co/bCPp79/Output.png

Just feed any number of data points to it and get your output within a fraction of a second 😉

Hey, this is interesting. Will check this out 🙂

Dear Karthik,

Have you tested it?

Regards,

Swapnil.

Yes, I’ve traded this strategy quite a bit 🙂

Thank you very much Karthik for your efforts.

Can you please help me with the outputs?

If possible, please give an example of how to use this output.

Regards,

Swapnil

Swapnil, this is kind of stuck. I’ll try and put this up. By the way, did you check out the code one of the readers put up here?

Dear Karthik,

As someone informed me, RStudio with the urca package is an excellent tool for the ADF test, and only a very simple program needs to be written (credit: YouTube). I think you should share this solution with other Varsity readers, as it will be of great help to all of them. Instead of leaving users with the statement that they will be stuck at the ADF test, it is better to give them this solution. It will add to the great cause you are already running.

Needless to say, thanks for this lovely module on pair trading. It was thrilling.

You do have a point, Arun. Have you used this? If yes, would you mind sharing it with others?

I have used it, but I have not verified the results yet. I request you to please run it and verify it against your pair data sheet, if possible. I’m sharing a YouTube link that will work for everyone.

https://m.youtube.com/watch?v=mkHtP0nONJY

RStudio is open-source software, easy to download and install for free.

A sincere request, Karthik: please verify the results and let us know if it is useful. Thanks in advance.

Will do, Arun. Thanks.

Hi Karthik,

Quick question: suppose the p-value of a pair was around 0.30 three months back and is now around 0.05. Is it tradeable?

Any guidance on the p-value? Do we need to look at only the current p-value, or the historical ones too?

What is the reason for such a scenario in a pair?

Thanks in advance.

Deeo

It’s good to see the historical p-value, but I’d pay more attention to the current value. The reason could be a drastic variation in prices.

Dear Karthik,

The ADF output largely depends on which of the 3 ADF test specifications is used:

1. no constant and no trend

2. constant and no trend

3. constant and trend

Which specification should we select? All 3 give different p-values, as each has different critical values.

Constant and no trend, but I’ll get back to you on this.
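For reference, the three specifications in the question are usually written as follows for a residual series $e_t$ with $p$ lagged differences (the lag length is what the EViews drop-down mentioned earlier chooses). This is standard textbook notation, not notation taken from this chapter. In all three, the null hypothesis is $\gamma = 0$ (a unit root, i.e. non-stationary) and the alternative is $\gamma < 0$ (stationary). That is why EViews prints “Null Hypothesis: Residual has a unit root”: a p-value below 0.05 means that null is rejected. One commonly cited rationale for (1) on regression residuals is that residuals from a regression with an intercept already average to zero.

```latex
\begin{aligned}
\text{(1) no constant, no trend:}\quad & \Delta e_t = \gamma\, e_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta e_{t-i} + \varepsilon_t \\
\text{(2) constant, no trend:}\quad & \Delta e_t = \alpha + \gamma\, e_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta e_{t-i} + \varepsilon_t \\
\text{(3) constant and trend:}\quad & \Delta e_t = \alpha + \beta t + \gamma\, e_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta e_{t-i} + \varepsilon_t \\
& H_0:\ \gamma = 0 \ (\text{unit root}) \qquad H_1:\ \gamma < 0 \ (\text{stationary})
\end{aligned}
```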

Two more queries, Karthik:

1. Running the script downloads data for the spot price, but we are interested in futures. Do we do the analysis on spot but take the trade in futures?

2. Should the trade be initiated ONLY when the z-score touches roughly +2.5/-2.5, or can it also be initiated when it touches 2.6, 2.7 or 2.8?

1) Yup, remember it’s not about absolute price levels.

2) Any level that you think is fair. I usually prefer a z-score closer to 3.

What will the target and stop-loss be for a z-score of 3?

Also, did you check “constant & no trend” as asked earlier?

At a z-score of 3, you’d initiate the trade with probably 3.1 as the SL (remember, usually 3 itself is the SL). The target will be closer to the mean.
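The entry, stop-loss and target levels in the reply above can be sketched as small rule functions. This is an illustration under stated assumptions, not the author’s trading code: it assumes residual = Y - (intercept + slope × X), so a large positive z means Y is rich relative to X (sell Y, buy X). The labels “short pair”/“long pair”, the function names and the default target of 0 (the mean) are hypothetical choices; the 3 and 3.1 levels come from the reply.

```python
def z_score(todays_residual, residual_std_err):
    """Today's residual in units of the residuals' standard error."""
    if residual_std_err <= 0:
        raise ValueError("standard error must be positive")
    return todays_residual / residual_std_err


def entry_signal(z, entry=3.0):
    """Assumes residual = Y - (intercept + slope * X)."""
    if z >= entry:
        return "short pair"  # Y rich vs X: sell Y, buy X
    if z <= -entry:
        return "long pair"   # Y cheap vs X: buy Y, sell X
    return None


def exit_signal(z, position, stop=3.1, target=0.0):
    """Return 'stop-loss', 'target' or None for an open position.

    `target` is how close to the mean (zero) z must come to book the profit.
    """
    if position == "short pair":
        if z >= stop:
            return "stop-loss"
        if z <= target:
            return "target"
    elif position == "long pair":
        if z <= -stop:
            return "stop-loss"
        if z >= -target:
            return "target"
    else:
        raise ValueError("position must be 'short pair' or 'long pair'")
    return None


# Usage with the MARUTI/TATAMTRDVR row from the reader's table further down.
z = z_score(-264.8458, 444.2069)
print(round(z, 4), entry_signal(z))  # -0.5962 None
```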

Hi Karthik,

What is the purpose of identifying X and Y, and how does it help in trading the pair?

After all, we decide which one to buy and which to sell based on the deviation.

Please advise.

Thanks

Deepu

We need them to value one stock versus the other. Remember, at the core of the strategy is linear regression where we try and explain the price of stock y (dependent) by using the price of stock x (independent).

Thanks for the reply Karthik.

For any pair, why do we regress twice, first with X as independent and Y as dependent, and then vice versa? How does this help?

Regards

Deepu

This is to figure out the best possible combination of X and Y (dependent and independent). I think I have explained this in detail earlier in the chapter.
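To make the “regress both ways” step concrete, here is a minimal sketch. It assumes the error ratio is the standard error of the intercept divided by the standard error of the regression (the “Standard Error” figure in Excel’s regression output), and that the direction with the lower ratio decides X and Y, as summarised at the start of this chapter. Check the definition against the earlier chapter before relying on it; the function names and sample prices are made up.

```python
import math


def regression_stats(x, y):
    """Regress y on x and return the outputs the error ratio needs."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length and at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the regression, with n - 2 degrees of freedom.
    std_err = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    if std_err == 0:
        raise ValueError("perfect fit: standard error is zero")
    # Standard error of the intercept coefficient.
    se_intercept = std_err * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "std_err": std_err,
        "se_intercept": se_intercept,
        "error_ratio": se_intercept / std_err,
    }


def choose_x_and_y(prices_a, prices_b, name_a="A", name_b="B"):
    """Regress both ways and keep the direction with the lower error ratio."""
    a_as_x = regression_stats(prices_a, prices_b)  # X = A, Y = B
    b_as_x = regression_stats(prices_b, prices_a)  # X = B, Y = A
    if a_as_x["error_ratio"] <= b_as_x["error_ratio"]:
        return name_a, name_b, a_as_x
    return name_b, name_a, b_as_x


# Usage with made-up closes.
a = [10.0, 11.0, 12.0, 13.0, 15.0, 14.0, 16.0, 18.0, 17.0, 19.0]
b = [21.0, 22.5, 24.1, 25.9, 30.2, 28.1, 32.3, 36.1, 34.0, 38.2]
x_name, y_name, stats = choose_x_and_y(a, b)
print("X =", x_name, "Y =", y_name, "error ratio =", round(stats["error_ratio"], 4))
```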

Dear Karthik,

Thanks for all the enlightenment on financial modelling for pair trades. I have a few questions on the data for Hero MotoCorp and Bajaj Auto. I am checking for a possible pair trade between the two companies, with 246 days of data covering the past 356 days up to 14th December 2018.

- Running the ADF test in Excel with the constant-only specification for a unit root at levels, I got a p-value (prob) of 0.153961, higher than 0.01.
- Running the same test for a unit root in first differences, the ADF result indicated stationarity, with a p-value (prob) of 0.000000.

How should this be treated, given there is stationarity in first differences but not at levels? Should the ADF result be considered only at levels, with the differences (1st, 2nd) ignored?

Kindly advise,

Regards,

Vijay

There are many types of trading systems:

1. News-based

2. Single candlestick patterns like engulfing candles and dojis, or patterns like double or triple tops and bottoms, head and shoulders, cup and handle, etc.

3. Combining indicators like RSI, MACD, SuperTrend, Bollinger Bands and so on

4. Moving average crossovers

5. Trend following using support and resistance (S and R)

6. Trend reversal or bottom fishing

7. Based on the open interest and price relationship

8. Pair trading

9. Swing trading or intraday

10. Options trading

11. Just watching price action

Many more…

Which is your favourite? I mean, in which one have you been able to generate the most consistent profits?

I’ve had varied success across these strategies. I prefer 4,5,8, and 10 🙂

Dear Karthik,

Requesting your views on my earlier query (17th December) on the ADF test results for stationarity at levels and in first differences, please.

Regards,

Vijay

Vijay, apologies for missing this query. This needs a slightly lengthy reply. I will try and draft one by weekend. Thanks.

Dear Karthik,

Thanks for your acknowledgement, shall look forward to the response at your convenience.

Regards,

Vijay

I have this in mind, will get back as soon as possible, Vijay. Thanks.

Dear Karthik,

Congratulations on becoming the biggest stock brokerage firm in India, and also the fastest to reach the top. I was wondering whether you had a chance to ponder my query posted earlier.

Regards,

Vijay

Thanks, Vijay. Hope to learn and grow from here 🙂

Apologies for not posting a reply earlier. Can you please share how you did the ADF test, along with the test result file? Maybe you can mail it to [email protected], addressed to me. Thanks.

Dear Karthik,

Thanks for the response. As you grow, I hope more education keeps trickling down to us, like Zerodha Varsity. I shall surely mail you the details along with the data file for your perusal.

Regards,

Vijay

Sure, Vijay. Good luck to you.

I have a Zerodha account. You mentioned that you would upload the pair data sheet once every week or 15 days, but I cannot find it anywhere. Please help me find it so that I can trade based on a better p-value.

Rohit, somehow this does not seem to work. I tried it for the initial few days. By the way, some readers were kind enough to provide free code; maybe you should try using that.

Thanks for the response. How can I trade successfully? Which type of trade: options, futures or pair trading?

If you are a complete newbie, I’d suggest you try with paper trading, then few shares in equity spot before transitioning to F&O.

I have been trading all types of instruments, i.e. futures, options and equity spot, for the last ten years, but I have not been able to make profits consistently and ultimately lost money. I know charts and have tried different systems without success. So please suggest which method I should trade. I know trading is not a holy grail, but profits can be made consistently. I am interested in short-term trading, not long-term. Please give me suggestions.

Well, Rohit, how can I tell you that? In fact, how can anyone tell you that? You need to keep exploring and tweaking your methods until you become consistent and profitable.

Sir, thanks again. Is the ADF test done on the residuals?

Yup, that’s right Ashok.

Thank you very much sir….

Good luck!

This is the latest result, excluding all combinations where the time series is not stationary.

If anyone can confirm that these p-values look correct, it will be very helpful.

| Y-Stock | X-Stock | Intercept | Slope/Beta | p-Value | Today’s residual | Sigma/Std Err of Residuals | Std Err-DecisionMaker | Is the time series stationary? |
|---|---|---|---|---|---|---|---|---|
| MARUTI | TATAMTRDVR | 4107.6928 | 32.5515 | 0.0426 | -264.8458 | 444.2069 | -0.5962 | True |
| SYNDIBANK | ALBK | 21.8551 | 0.3551 | 0.0263 | 1.0936 | 3.6487 | 0.2997 | True |
| UNIONBANK | ALBK | 39.9479 | 0.9308 | 0.0328 | 3.8069 | 6.6037 | 0.5765 | True |
| SYNDIBANK | BANKBARODA | 14.0244 | 0.1977 | 0.0054 | 1.6022 | 2.7753 | 0.5773 | True |
| SBIN | CANBK | 108.0908 | 0.6893 | 0.0436 | 15.0078 | 8.893 | 1.6876 | True |
| CANBK | UNIONBANK | 54.7387 | 2.4862 | 0.0067 | -4.7415 | 8.8502 | -0.5358 | True |
| SBIN | UNIONBANK | 141.8861 | 1.7622 | 0.0495 | 11.2014 | 10.249 | 1.0929 | True |
| INDUSINDBK | YESBANK | 1096.8594 | 2.3158 | 0.0185 | 105.3708 | 75.3098 | 1.3992 | True |
| SOUTHBANK | YESBANK | 9.1192 | 0.0277 | 0.0194 | 1.1547 | 1.7954 | 0.6432 | True |

This appears right to me, Ashok. Do compare the results with the sample set.

Hi Karthik,

I could not understand what you meant by Beta here.

As mentioned above in your comments, I also could not find anything in this chapter with respect to Beta:

https://zerodha.com/varsity/chapter/linear-regression/

By beta, do you mean the slope of the straight-line equation?

Kulbir

Yes, by beta, I mean the slope of a straight line. I’d suggest you read this chapter to understand beta better – https://zerodha.com/varsity/chapter/hedging-futures/

Hello,

Is the p-value in the regression output the same as the one we get from the ADF test? If yes, there are two rows that show a p-value; which one should we consider?

THANKS

No, the ADF is a test for stationarity of the residual series.

@Karthik, I found a free add-in for conducting the ADF test (it needs to be installed in Excel). If possible, kindly review it once, since I don’t have a paid tool and can’t verify the result.

http://www.real-statistics.com/time-series-analysis/autoregressive-processes/augmented-dickey-fuller-test/

Hey, thanks Himansu. Will review and let you know. Thanks.

Dear Karthik,

If you don’t know how to use the ADF test, then what is the use of writing articles about it for readers? I have checked many sites to find the ADF test, but none of them explained how to use it. Please help me.

Rohit, have you read through the chapter? I’ve explained how to use the ADF test, but I’ve not shared a tool to run it, as most of these are paid tools. But I think Himanshu was kind enough to share a free resource to run the ADF test (see the comment above).

Dear Karthik,

I have opened the site recommended by Himanshu, but I don’t understand how to use it. And he asked you to review it. Is the ADF test Himanshu mentioned reliable? Please help me.

I’ve yet to review it, Rohit; will try and do that sometime soon.

Please, can anybody help me find an ADF test? I have tried the ADF tests on the different sites recommended by the esteemed readers above, but I either could not understand them or could not find them. Please help me. Even if it is paid, tell me. Anybody who gives me a proper ADF test will be highly appreciated.

Dear Karthik,

Please refer to your reply dated 14 Dec 2018, 11.11 am, where you answered someone’s question about the ADF test parameter by saying it should be constant and no trend. This has confused me, because I asked the same question some days back and you answered that the parameter should be no constant and no trend, which is a contradiction. Please tell me the actual answer: what should the parameter be?

Rohit – no trend, no constant for pair trading.

Dear Karthik,

What should the minimum ADF value be, if all parameters are OK? I have seen that an ADF value below 0.01 is tricky, because at that point both stocks move in the same direction by almost the same percentage every day. What do you think? Please reply.

The answer to this depends on each pair. I’ve found that it’s hard to assign 1 value and stick to it. Perhaps it makes sense to evaluate which ADF value works for the given pair. A simple backtest should help you here.

One more question. Sir, is an ADF value below 0.05 always considered a good trade? What if the ADF value is 0.1?

Again, it’s best if you custom fit this to each pair.

Hi Karthik,

How often do we check the residuals? Do we do it every day with EOD data, or would once in 3 to 4 days suffice?

When I compare the latest residual value with the Std Error (the ratio) on a daily basis, using linear regression in Excel, do I take the residual of Observation 1?

Based on experience, if I am running manual tests in Excel to find the ADF p-value, how long will this ADF value persist for that pair, and when should I recheck it?

Thanks

It largely remains the same unless the latest prices change drastically or the oldest data point leaving the time series is a large value. By the way, it is worth taking a look at the residuals whenever you run the regression.

So you recommend running the regression every day with the EOD close price?

And thanks for the reply, Karthik.

I would.

Hi Karthik,

Running the ADF test with “constant and no trend” and with “no constant and no trend” returns a different number of pairs, and today’s standard error calculation varies quite a bit between the two.

“Constant and no trend” returns fewer pairs than “no constant and no trend”.

Which one do you feel is the right setup for the ADF test?

Thanks

Sunil, my research points to no constant and no trend as the right way.

Thankfully, I’ve completed CFA L2, so this chapter was a breeze for me.

Regarding the DF test, don’t we first need to check whether the equation has a unit root? Checking for a unit root is fairly simple: we regress the change in X (X2 – X1) on the lagged value of X, check the t-statistic against the DF table, and hope that the null hypothesis is rejected.

I could have explained it better, but it needs a longer, notation-heavy explanation, so I’m not sure how to do it here.

Umer, isn’t that what the ADF test does?
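The regression Umer describes is the plain Dickey-Fuller regression; the ADF test is the same regression with lagged differences added to absorb autocorrelation. Below is a minimal stdlib sketch of the plain DF t-statistic in the no constant, no trend form recommended above, assuming zero lagged differences. It does not compute a p-value (that needs MacKinnon’s response-surface tables, which packages such as statsmodels implement), and the critical values in the code are the usual asymptotic Dickey-Fuller ones for this case. When the series tested is the residual of an estimated regression between two stocks, the Engle-Granger critical values that apply are more negative than these, so treat the comparison as illustrative. The function names and the simulated series are hypothetical.

```python
import math
import random


def df_t_stat(series):
    """Dickey-Fuller t-statistic: no constant, no trend, no lagged differences.

    Regresses the change d_t = e_t - e_(t-1) on the lagged level e_(t-1):
        d_t = gamma * e_(t-1) + u_t
    and returns gamma / se(gamma). A clearly negative value argues against a unit root.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    lagged = series[:-1]
    diffs = [curr - prev for prev, curr in zip(series, series[1:])]
    sum_sq_lag = sum(v * v for v in lagged)
    if sum_sq_lag == 0:
        raise ValueError("lagged values are all zero")
    gamma = sum(lag * d for lag, d in zip(lagged, diffs)) / sum_sq_lag
    ssr = sum((d - gamma * lag) ** 2 for lag, d in zip(lagged, diffs))
    sigma2 = ssr / (len(diffs) - 1)  # one estimated parameter (gamma)
    if sigma2 == 0:
        raise ValueError("degenerate series: the regression fits perfectly")
    return gamma / math.sqrt(sigma2 / sum_sq_lag)


def simulate_ar1(phi, n, seed=0):
    """Illustrative series e_t = phi * e_(t-1) + noise (|phi| < 1 mean-reverts)."""
    rng = random.Random(seed)
    values = [0.0]
    for _ in range(n - 1):
        values.append(phi * values[-1] + rng.gauss(0.0, 1.0))
    return values


# Approximate asymptotic DF critical values for the no-constant, no-trend case.
CRITICAL_VALUES = {"1%": -2.58, "5%": -1.95, "10%": -1.62}

t = df_t_stat(simulate_ar1(0.5, 200, seed=7))
print(round(t, 2), t < CRITICAL_VALUES["5%"])
```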