13.1 – Tracking the pair data
We have finally covered all the background theory required for Pair Trading. I know most of you have been waiting for this moment ☺
In this last chapter on pair trading, we will take up an example of a live trade and discuss the factors that influence it.
Here is a quick recap of pre-trade theory –
- Basic overview of linear regression and how to perform one
- Linear regression requires you to regress a dependent variable Y against an independent variable X
- The output of linear regression includes the intercept, slope, residuals, standard error, and the standard error of the intercept
- The decision to classify a stock as dependent (Y) or independent (X) depends on the error ratio
- Error ratio is defined as the standard error of the intercept divided by the standard error
- We calculate the error ratio by interchanging X and Y. The combination which offers the lower error ratio defines which stock is assigned as X and which one as Y
- The residuals obtained from the regression should be stationary. If they are stationary, then we can conclude that the two stocks are co-integrated
- If the stocks are cointegrated, then they move together
- Stationarity of a series can be evaluated by running an ADF test
- The ADF value of an ideal pair should be less than 0.05
Over the last few chapters, we have discussed each point in great detail. These points help us understand which pairs are worth considering for pair trading. In a nutshell, we take any two stocks (from the same sector), run a linear regression on them, check the error ratio, and identify which stock is X and which is Y. We then run an ADF test on the residuals of the pair. A pair is considered worth tracking (and trading) only if the ADF value is 0.05 or lower. If the pair qualifies, we track the residuals on a daily basis and try to spot trading opportunities.
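The X/Y selection step described above can be sketched in Python. This is a minimal, pure-Python illustration that uses no statistics library. It assumes the error ratio is the standard error of the intercept divided by the standard error of the residuals. The function names `ols` and `choose_orientation` are hypothetical, and the ADF step is left out.

```python
import math


def ols(x, y):
    """Simple linear regression y = intercept + slope * x (ordinary least squares)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the residuals, using n - 2 degrees of freedom
    std_err = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    # Standard error of the intercept
    std_err_intercept = std_err * math.sqrt(1 / n + x_mean ** 2 / sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "residuals": residuals,
        "std_err": std_err,
        "std_err_intercept": std_err_intercept,
        "error_ratio": std_err_intercept / std_err,
    }


def choose_orientation(prices_a, prices_b):
    """Regress both ways and keep the combination with the lower error ratio.

    Returns (Y label, X label, regression result of the chosen combination).
    """
    a_on_b = ols(prices_b, prices_a)  # A as Y, B as X
    b_on_a = ols(prices_a, prices_b)  # B as Y, A as X
    if a_on_b["error_ratio"] <= b_on_a["error_ratio"]:
        return "A", "B", a_on_b
    return "B", "A", b_on_a
```

With real data, `prices_a` and `prices_b` would be the two stocks' closing prices over the same dates, oldest first. The residuals of the returned regression are the series you would then pass to an ADF test.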
A pair trade opportunity arises when –
- The residuals hit -2 standard deviations (-2SD). This is a long signal on the pair, so we buy Y and sell X
- The residuals hit +2 standard deviations (+2SD). This is a short signal on the pair, so we sell Y and buy X
Having said so, I generally prefer to initiate the trade when the residuals hit 2.5 SD or thereabouts. Once the trade is initiated, the stop loss is -3 SD for long trades and +3 SD for short trades, and the target is -1 SD and +1 SD for long and short trades respectively. This also means that once you initiate a pair trade, you have to track the residual value to know where it lies and plan your trades. We will discuss this further later in this chapter.
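The entry, stop-loss and target rules above can be written as a small state machine over the daily z-score. This is only a sketch of the rules as described in this chapter: entry at ±2.5, stop loss at ±3 and target at ±1. The function name `next_action` and the action strings are hypothetical.

```python
ENTRY_Z = 2.5   # preferred trigger in this chapter (2.0 is the textbook level)
STOP_Z = 3.0    # stop loss distance from the mean, in standard deviations
TARGET_Z = 1.0  # target distance from the mean, in standard deviations


def next_action(z, position):
    """Return (action, new_position) for today's z-score.

    position is None (flat), "long" (long Y, short X) or "short" (short Y, long X).
    """
    if position is None:
        if z <= -ENTRY_Z:
            return "enter long: buy Y, sell X", "long"
        if z >= ENTRY_Z:
            return "enter short: sell Y, buy X", "short"
        return "no trade", None
    if position == "long":
        if z <= -STOP_Z:
            return "exit long: stop loss", None
        if z >= -TARGET_Z:
            return "exit long: target", None
        return "hold long", "long"
    if position == "short":
        if z >= STOP_Z:
            return "exit short: stop loss", None
        if z <= TARGET_Z:
            return "exit short: target", None
        return "hold short", "short"
    raise ValueError(f"unknown position: {position!r}")


# Hypothetical sequence of daily z-scores
position = None
for z in [-1.2, -2.54, -2.7, -1.8, -0.95]:
    action, position = next_action(z, position)
    print(z, action)
```

For a long pair, the target is reached when the z-score climbs back to -1 or above. For a short pair, it is reached when the z-score falls to +1 or below.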
13.2 – Note for the programmers
In chapter 11, I introduced the ‘Pair Data’ sheet. This sheet is an output of the Pair Trading Algo. The pair trading algo basically does the following –
- Downloads the closing prices of the underlyings for the last 200 days. You can get this data from NSE’s bhavcopy; in fact, you can automate the download by running a script.
- The list of stocks and their sector classification is already prepared, hence the download is more organized
- Runs a series of regressions and calculates the ‘error ratio’ for each regression. For example, if we are talking about RBL Bank and Kotak Bank, then the regression module would regress RBL (X) against Kotak (Y), and Kotak (X) against RBL (Y). The combination with the lower error ratio is considered and the other combination is ignored
- The ADF test is applied to the residuals of the combination which has the lower error ratio.
- A report (pair data) is generated with all the viable X-Y combinations, noting their respective intercept, beta, ADF value, standard error, and sigma. I know we have not discussed sigma yet; I will shortly.
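For programmers, the steps listed above can be sketched as a loop over every stock pair within a sector. This is not the author’s algo, only an outline under stated assumptions. Python’s standard library has no ADF test, so the sketch takes a hypothetical `adf_pvalue` function as a parameter. In practice you might wrap something like statsmodels’ `adfuller`, whose second return value is the p-value. The names `build_pair_data`, `closes` and `sectors` are also hypothetical.

```python
import math
from itertools import combinations


def ols(x, y):
    """OLS of y on x; returns intercept, beta, residuals, sigma and error ratio."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    beta = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - beta * x_mean
    residuals = [yi - intercept - beta * xi for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # std error of residuals
    se_intercept = sigma * math.sqrt(1 / n + x_mean ** 2 / sxx)
    return intercept, beta, residuals, sigma, se_intercept / sigma


def build_pair_data(closes, sectors, adf_pvalue, max_adf=0.05):
    """closes: {symbol: [closing prices, oldest first]}; sectors: {sector: [symbols]}.

    adf_pvalue: any function that returns the ADF p-value of a series (assumption).
    """
    report = []
    for sector, symbols in sectors.items():
        for a, b in combinations(symbols, 2):
            # Regress both ways; keep the orientation with the lower error ratio
            candidates = [(a, b, ols(closes[b], closes[a])),   # a as Y, b as X
                          (b, a, ols(closes[a], closes[b]))]   # b as Y, a as X
            y, x, (intercept, beta, residuals, sigma, _) = min(
                candidates, key=lambda c: c[2][4])
            adf = adf_pvalue(residuals)
            if adf > max_adf:
                continue  # residuals not stationary enough: not worth tracking
            report.append({
                "sector": sector, "Y": y, "X": x,
                "intercept": intercept, "beta": beta, "adf": adf,
                "std_err": residuals[-1] / sigma,  # today's residual / sigma
                "sigma": sigma,
            })
    return report
```

Each row carries the same columns as the pair data sheet: Y, X, intercept, beta, ADF value, std_err and sigma.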
If you are a programmer, I would suggest you use this as a guideline to develop your own pair trading algo.
Anyway, in chapter 11, I had briefly explained how to read the data from the Pair data, but I guess it’s time to dig into the details of this output sheet. Here is the snapshot of the Pair data excel sheet –
Look at the highlighted data. The Y stock is Bajaj Auto and the X stock is TVS. Because this combination is present in the report, it implies that Bajaj as Y and TVS as X has the lower error ratio. This further implies that Bajaj as X and TVS as Y is not a viable pair owing to its higher error ratio, hence you will not find this combination (Bajaj as X and TVS as Y) in this report.
Along with identifying which one is X and Y, the report also gives you the following information –
- Intercept – 1172.72
- Beta – 2.804
- ADF value – 0.012
- Std_err – -0.77
- Sigma – 103.94
I’m assuming (and hoping) you are aware of the first three variables, i.e. intercept, beta, and ADF value, so I won’t explain them all over again. I’d like to quickly talk about the last two variables.
Standard Error (or Std_err) as mentioned in the report is essentially the ratio of today’s residual to the standard error of the residual. This can get a little confusing because there are two standard errors we are talking about. The second one is the standard error of the residual, which is reported in the regression output. Let me explain this with an example.
Have a look at the snapshot below –
This is the regression output summary of Yes Bank versus South Indian Bank. I’ve highlighted standard error (22.776). This is the standard error of the residuals. Do recollect, we have discussed this earlier in this module.
The second highlight is 20.914, which is the residual.
The std_err in the report is simply the ratio of –
Today’s residual / Standard error of the residual
= 20.914 / 22.776
≈ 0.918
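The same arithmetic in code form. This is a minimal sketch: `std_err_from_residual` and `todays_std_err` are hypothetical helper names. The second function assumes today’s residual is Y’s close minus (intercept + beta × X’s close), using the intercept and beta from the regression.

```python
def std_err_from_residual(residual, sigma):
    """The report's 'std_err': today's residual in units of the residual standard error."""
    return residual / sigma


def todays_std_err(y_close, x_close, intercept, beta, sigma):
    """Residual of today's closes against the fitted line, divided by sigma."""
    residual = y_close - (intercept + beta * x_close)
    return std_err_from_residual(residual, sigma)


# Yes Bank vs South Indian Bank example: residual 20.914, standard error 22.776
print(round(std_err_from_residual(20.914, 22.776), 3))  # 0.918
```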
Yes, I agree calling this number std_err is not the best choice, but please bear with it for now ☺
This number tells me where today’s residual lies in the context of the normal distribution, and it is the key trigger for the trade. A long position is initiated if this number is -2.5 or lower, with -3.0 as the stop loss. A short position is initiated if this number reads +2.5 or higher, with a stop loss at +3.0. For a long position, the target is -1 or higher, and for a short position, the target is +1 or lower.
This also means, the std_err number has to be calculated on a daily basis and tracked to identify trading opportunities. More on this in a bit.
The sigma value in the pair data report is simply the standard error of the residual, which in the above case is 22.776.
So now if you read through the pair data sheet, you should be able to understand the details completely.
Alright, let us jump to the trade now ☺
13.3 – Live example
I have been running the pair trading algo to look for opportunities, and I found one on 10th May 2018. Here is the snapshot of the pair data; you can download it towards the end of this chapter. Do recollect, this pair data was generated using the closing prices of 10th May.
Look at the data highlighted in red. This is Tata Motors Ltd as Y (dependent) and Tata Motors DVR as X (independent).
The ADF value reads 0.0179 (less than the threshold of 0.05), which I think is an excellent ADF value. Do recollect, an ADF value of less than 0.05 indicates that the residual is stationary, which is exactly what we are looking for.
The std_err reads -2.54, which means the residual has diverged sufficiently away from the mean, and therefore one can look at setting up a long trade. Since this is a long trade, one is required to buy the dependent stock (Tata Motors) and short the independent stock (Tata Motors DVR). This trade was supposed to be taken on the morning of 11th May (Friday), but for some reason, I was unable to place it. However, I did take the trade on the morning of 14th May (Monday) at a slightly worse rate; nevertheless, the intention was to showcase the trade and not really chase the P&L.
Here are the trade execution details –
You may have two questions at this point. Let me list them for you –
Question – Did I actually execute the trade without checking the prices? As in, I didn’t even look at the stock prices, support, resistance, RSI, etc. Is that not required?
Answer – No, none of that is required. The only thing that matters is where the residual is trading, which is exactly what I looked for.
Question – On what basis did I choose to trade 1 lot each? Why can’t I trade 2 lots of TM and 3 lots of TMD?
Answer – Well, this depends on the beta of the pair. We use the beta to identify the number of shares of X and Y that keeps the position beta neutral. Beta neutrality states that for every 1 share of Y, we need to have beta shares of X. For example, in the Tata Motors (Y) and Tata Motors DVR (X) pair, the beta is 1.59. This means that for every 1 share of Tata Motors (Y), I need to have 1.59 shares of Tata Motors DVR (X).
Going by this proportion, since the lot size of Tata Motors (Y) is 1500, we need 1500*1.59 = 2385 shares of Tata Motors DVR (X). The lot size of Tata Motors DVR is 2400, quite close to 2385, hence I decided to go with 1 lot each. But I’m aware this trade is slightly skewed towards the short side, since I’m selling an additional 15 shares of Tata Motors DVR.
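The beta-neutral sizing arithmetic can be sketched as follows. `beta_neutral_lots` is a hypothetical helper. It assumes the number of X lots is simply rounded to the nearest whole lot, and it refuses non-positive betas in line with the caveat about negative beta.

```python
def beta_neutral_lots(lot_y, beta, lot_x, lots_y=1):
    """Return (X shares needed, X lots to trade, share mismatch) for lots_y lots of Y."""
    if beta <= 0:
        raise ValueError("beta must be positive to size a beta-neutral pair this way")
    shares_x_needed = lots_y * lot_y * beta
    # Round to the nearest whole lot of X, but trade at least one lot
    lots_x = max(1, round(shares_x_needed / lot_x))
    # Positive mismatch: more X shares than beta neutrality requires
    mismatch = lots_x * lot_x - shares_x_needed
    return shares_x_needed, lots_x, mismatch


# Tata Motors (Y, lot 1500) vs Tata Motors DVR (X, lot 2400), beta 1.59
needed, lots_x, extra = beta_neutral_lots(lot_y=1500, beta=1.59, lot_x=2400)
print(round(needed), lots_x, round(extra))  # 2385 1 15
```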
Also, please note that because of this constraint, we cannot always trade a pair when the beta is negative.
Remember, I initiated this trade when the residual value was -2.54. The idea was to keep the position open until either the target (-1 on the residual) or the stop loss (-3 on the residual) was hit. Until then, it was just a waiting game.
To track the position live, I’ve developed a basic Excel tracker. Of course, if you are a programmer, you can build much better tools, but given my limited abilities, I put together a basic position tracker in Excel. Here is the snapshot; you can download this sheet from the link posted below.
The position tracker has all the basic information about the pair, and I’m guessing it is a fairly easy sheet to understand. I’ve designed it in such a way that upon entering the current values of X and Y, the latest Z-score and the P&L are calculated. I’d encourage you to play around with this sheet, even better if you can build one yourself ☺
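Here is a minimal sketch of the two numbers such a tracker computes from the current prices. It is not a reproduction of the Excel sheet. `track_position` is a hypothetical name, quantities are signed (positive for a long leg, negative for a short leg), and the inputs in the usage line are made-up illustration values, not the Tata Motors trade prices.

```python
def track_position(y_now, x_now, intercept, beta, sigma,
                   qty_y, y_entry, qty_x, x_entry):
    """Latest z-score of the residual and mark-to-market P&L of an open pair.

    Quantities are signed: positive for a long leg, negative for a short leg.
    A long pair (buy Y, sell X) therefore has qty_y > 0 and qty_x < 0.
    """
    residual = y_now - (intercept + beta * x_now)
    z_score = residual / sigma
    pnl = qty_y * (y_now - y_entry) + qty_x * (x_now - x_entry)
    return z_score, pnl


# Hypothetical numbers, for illustration only
z, pnl = track_position(y_now=100, x_now=40, intercept=10, beta=2, sigma=5,
                        qty_y=10, y_entry=95, qty_x=-20, x_entry=42)
print(z, pnl)  # 2.0 90
```

Re-running this with each new pair of prices gives the same kind of log of z-scores and P&L that the tracker produces.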
Once the position is taken, all one has to do is track the z-score of the residual. This means you have to keep tracking the values and the respective z-scores. This is exactly what I did. In fact, for the sake of this chapter, my colleague, Faisal, logged all the values (except for the 14th and 15th). Here are the logs –
As you can see, the current values were tracked and the latest z-score was calculated several times a day. The position was open for nearly 7 trading sessions, which is quite common with pair trading. I’ve had positions stay open for nearly 22 to 25 trading sessions. But here is the thing: as long as your math is right, you just have to wait for the target or SL to trigger.
Finally, on the morning of 23rd May, the z-score moved up to the target level and there was a window of opportunity to close this trade. Here is the snapshot –
Notice that the gain in Tata Motors DVR is much larger than the loss in Tata Motors. In fact, when we take the trade, we never know which of the two positions will make us the money. The idea is that one of them will move in our favor and the other may or may not. However, it is just not possible to identify in advance which one will be the breadwinner.
The position tracker for the final day (23rd May) looked like this –
The P&L was roughly Rs.14,000/-, not bad I’d say for a relatively low-risk trade.☺
13.4 – Final words on Pair Trading
Alright guys, over the last 13 chapters, we have discussed everything I know about pair trading. I personally think this is a very exciting way of trading compared to blind speculative trading. Although less risky, pair trading has its own share of risk, and you need to be aware of it. One of the common ways to lose money is when the pair continues to diverge after you initiate the position, leaving you with a deep loss. Further, the margin requirements are slightly higher since you are dealing with two contracts. This also means you need to have some buffer money in your account to accommodate the daily M2M.
There could be situations where you will need to take a position in the spot market as well. For example, on 23rd May, there was a signal to go short on Allahabad Bank (Y) and long on Union Bank (X). The z-score was 2.64 and the beta for this pair was 0.437.
Going by beta neutrality, for every 1 share of Allahabad Bank (Y), I need 0.437 shares of Union Bank (X). The lot size of Allahabad Bank is 10,000, which implies I need to buy 4,370 shares of Union Bank. However, the lot size of Union Bank is 4,000, hence I had to buy the remaining 370 shares in the spot market.
Well, I hope this trade is successful ☺
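Splitting the X leg into whole futures lots plus a spot-market remainder is a single `divmod`. This is a sketch with a hypothetical helper name. It assumes the remainder is bought in the cash market, as Union Bank is here, so it only fits the case where the topped-up leg is a buy.

```python
def futures_plus_spot(lot_y, beta, lot_x, lots_y=1):
    """Return (X shares needed, whole X futures lots, X shares to buy in the spot market)."""
    if beta <= 0:
        raise ValueError("beta must be positive to size a beta-neutral pair this way")
    shares_x = round(lots_y * lot_y * beta)  # beta-neutral X shares, rounded to whole shares
    lots_x, spot_x = divmod(shares_x, lot_x)
    return shares_x, lots_x, spot_x


# Allahabad Bank (Y, lot 10,000) vs Union Bank (X, lot 4,000), beta 0.437
print(futures_plus_spot(lot_y=10000, beta=0.437, lot_x=4000))  # (4370, 1, 370)
```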
I know most of you would want the pair data sheet made available. We are working on making this sheet available to you on a daily basis so that you can track the pairs. Meanwhile, I would suggest you try and build this algo yourself. If you have concerns, please post them below and I will be happy to assist.
If you don’t know how to program then you have no option but to find someone who knows programming and convince him or her that there is money to be made, this is exactly what I did ☺
Lastly, I would like to leave you with a thought –
- We run a linear regression of Stock A with Stock B to figure out if the two stocks are cointegrated with their residuals being stationary
- What if the residuals of Stock A against Stock B are not stationary, but the residuals of Stock A against Stocks B and C, taken as a combined entity, are?
Beyond pair trading lies something called multivariate regression. By no stretch of the imagination is this easy to understand, but let me tell you, if you can graduate to this arena, the game is different.
Download the Position Tracker and Pair Datasheet below:
Key takeaways from this chapter
- The trigger to trade a pair comes from the residual’s current value
- Check for beta neutrality of the pair to identify the number of shares required in X and Y
- If the beta of the pair is negative, then it may not be possible to set up the trade
- Once the trade is initiated, track the z-score to manage the position
- The price of the futures does not really matter, the emphasis is only on the z-score
Hey Karthik,
how do we position size the trade? Is it possible for us to know the amount we would lose if the trade goes wrong?
You will have to go by beta neutrality to position size this trade. I have explained this in the chapter.
It is hard to know how much you may lose in advance as the trade is on the z-score.
hey Karthik,
I understood the beta neutrality part of position sizing. My question is, let me take an example from above. Let’s say 1 lot of Tata Motors is equal to 1 lot of Tata Motors DVR. In this case, if I want to trade 2 lots of each scrip, how do I know if it is safe for me to take 2 lots, assuming I have the funds for the position? That’s why I wanted to know if there is any way we could know the amount of loss we would suffer in case the SL hits. Assume I am willing to risk 5% of my account on one trade. If trading 1 lot of both has an SL at 2.5%, then I can probably do 2 lots, right? Is there any rough approximation for finding out the loss? I am not worried about winning because we are entering at the 2SD level. And from the trade you mentioned above, you entered the Tata Motors and Tata Motors DVR trade on the 14th and profit was taken on the 23rd. Assume you had not got the profit signal and the trade was still open, and there was a quarterly result announcement on the 24th. What do you recommend we do? Should we close the trade or keep it open in a situation like this?
Thank you.
I get your concern, Nikhil. Unfortunately, there is no way to translate this into a rupee value. Or maybe I need to do some research; I will certainly get back to you on this. But here is what I have observed: whenever this position makes a loss (assuming it’s initiated at 2.5SD), it is usually in the range of 10-12K per beta-neutral pair. This is purely from my observation, with no concrete science backing it.
Thanks Karthik.
What about the 2nd question on result announcement part?
Hello sir
I have a doubt. Are the regression parameters “static” or “dynamic” in your basic Excel tracker sheet? As we know, these estimators (coefficients, std. error, etc.) have variance. Every time we regress the securities, the values of the estimators vary. So taking the parameters of the equation y = mx + c as constant and calculating the Z-score based on earlier estimates might introduce some error, I guess.
What is your view, sir?
Thank you very much for this most awaited chapter, sir!
Varsity student
You are right on that. Given this, you may also want to look at the variance, or the standard error, of the intercept. However, these trades are open for only a few days; unless there is a drastic move in the share price, you will not see big changes in these variables.
When we run a linear regression using Python’s statsmodels library, we don’t get the standard error of the residuals in the output. Is there any formula to calculate it from the residual values?
You actually don’t need the standard error of residuals, you only need the residuals. The standard deviation of the residuals is the standard error of residuals.
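In practice, "standard deviation of the residuals" can mean slightly different things depending on the divisor used. The sketch below compares three common versions using Python’s `statistics` module. With around 200 daily observations the differences are small, but they are not zero. The regression output’s standard error uses n - 2 degrees of freedom. In statsmodels, that figure should be available as the square root of the fitted results’ `mse_resid` attribute. `residual_spread_measures` is a hypothetical helper name.

```python
import math
import statistics


def residual_spread_measures(residuals, n_params=2):
    """Compare common 'standard deviation of residuals' conventions.

    OLS residuals with an intercept have mean zero, so the three differ
    only in the divisor applied to the sum of squared residuals.
    """
    n = len(residuals)
    ssr = sum(r * r for r in residuals)
    return {
        "pstdev": statistics.pstdev(residuals),             # divides by n
        "stdev": statistics.stdev(residuals),               # divides by n - 1 (Excel STDEV)
        "regression_se": math.sqrt(ssr / (n - n_params)),   # divides by n - 2
    }


# Residuals from a tiny example regression (they sum to zero)
print(residual_spread_measures([0.1, 0.2, -0.7, 0.4]))
```

Whichever version you pick, use the same one consistently when converting today’s residual into a z-score.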
How do we calculate the z-score?
You can divide the residual by standard error of residual to get the Z-score.
Sir, what is the use of the Z-score? I’m not getting it.
Z-Score tells you where the variable is with respect to its mean. For example, a Z-score of 2.5 indicates that the variable is 2.5 standard deviations away from the mean.
Thanks a lot sir
Welcome!
Can this method be used in the cash market too, or only in futures? I have less capital, and for trial purposes I would like to trade the pair in small quantities.
It’s best to do it in futures because the strategy requires you to short stocks as well, which is possible only in futures.