## 8.1 – A straight relationship

Today happens to be the 14th of February. People around me are excited about Valentine’s Day and busy celebrating love and relationships. I think Valentine’s Day is a packaged affair, meant to boost the revenues of restaurants, jewelers, and gift shops, but then, that’s just me and my random thoughts.

Anyway, given it’s Valentine’s Day, I thought it would be the perfect time to discuss relationships. Don’t worry, I’m not going to bore you with a clichéd love story or give you unsolicited advice on maintaining a great relationship. Rather, I’ll talk to you about two sets of numbers and how you can measure the relationship between them, if one exists at all.

In the process, I’ll attempt to take you back to your school days, well, at least back to your high school math class ☺

A quick recap here – in Chapters 1 to 7 of this module, we discussed a rather simple pair trading technique, as taught by Mark Whistler. From this chapter onwards, we will discuss a slightly more advanced pair trading technique. This is also called ‘**Statistical Arbitrage**’ or ‘**Relative Value Trading**’ (RVT for short).

So here we go.

Do you remember the time your math teacher discussed the equation of a **straight line** in class? If you were like me, you’d have promptly ignored the lecture and looked out of the window, quietly rebelling against mainstream education.

But then, if only the teacher had said ‘learn this, you’ll make money off it someday’, the interest level would have been totally different!

Anyway, life always gives you a second chance, so this time around, pay attention, and hopefully, you will make some money off it ☺

The equation of a straight line reads something like this –

y = mx + ε

**Click here** for a detailed explanation, or continue reading for a bare-bones explanation.

Before we discuss the equation, a quick note on the notation used –

- y = dependent variable
- m = slope
- x = independent variable
- ε = intercept (the constant term)

The equation states that the value of the dependent variable ‘y’ can be derived from the independent variable ‘x’ by multiplying x by the slope ‘m’ and adding the intercept ‘ε’ to this product.

Sounds confusing? I guess so ☺

Let me elaborate on this. And before you start wondering why we are discussing the straight line equation instead of relative value trading (RVT), rest assured that this concept has deep relevance to RVT!

Consider two fitness freaks; let’s call them FF1 and FF2. Of the two, FF2 is the kind of guy who always goes a step beyond whatever FF1 does. So if FF1 does 5 pushups, FF2 does 10. If FF1 does 20 pull-ups, FF2 does 40. So on and so forth. Here is a table of how many pushups they did from Monday to Saturday –

| Day | FF1 | FF2 |
|---|---|---|
| Monday | 30 | 60 |
| Tuesday | 15 | 30 |
| Wednesday | 40 | 80 |
| Thursday | 20 | 40 |
| Friday | 10 | 20 |
| Saturday | 15 | ??? |

Now, if you were to guess the number of pushups FF2 would do on Saturday, what would it be? I guess it’s a no-brainer: 30.

This also means that the number of pushups FF2 does depends on the number of pushups FF1 does. FF1 does not really bother about FF2; he goes ahead and does as many pushups as his body permits. FF2, on the other hand, does twice as many pushups as FF1.

So this makes FF2 the dependent variable and FF1 the independent variable. In terms of the straight line equation, FF2 = y and FF1 = x.

FF2 = FF1*m + ε

In simple English, the equation reads like this –

The number of pushups FF2 does is equal to the number of pushups FF1 does, multiplied by a certain number, plus a constant.

That certain number is called the slope (m), which happens to be 2, and the constant ε happens to be 0. So the equation is –

FF2 = FF1*2 + 0
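To make the arithmetic concrete, here is a minimal Python sketch that checks every day in the table against FF2 = FF1*2 + 0 and then predicts Saturday. The function name `straight_line` is an illustrative choice, not something from the original text.

```python
# Pushup counts from the table above, Monday to Friday.
ff1 = [30, 15, 40, 20, 10]
ff2 = [60, 30, 80, 40, 20]


def straight_line(x, m, eps):
    """Return y = m*x + eps for a value x, slope m and constant eps."""
    return m * x + eps


m, eps = 2, 0  # slope and constant read off the table

# Every observed day sits exactly on the line FF2 = FF1*2 + 0.
for x, y in zip(ff1, ff2):
    assert straight_line(x, m, eps) == y

# Saturday: FF1 does 15 pushups.
print(straight_line(15, m, eps))  # 30
```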

I hope this is fairly clear now. Let me copy paste the definition I had posted earlier –

*The straight line equation states that the value of the dependent variable ‘y’ can be derived from the independent variable ‘x’ by multiplying x by the slope ‘m’ and adding the intercept ‘ε’ to this product.*

Now, think about another case –

There are two hungry men; let’s call them H1 and H2. Just as FF2 follows FF1, H2 eats twice as many parathas as H1, plus 1.5 more. For example, if H1 eats 2 parathas, H2 will eat 4 plus another 1.5. H2 always makes sure he eats those extra 1.5 parathas, no matter how full he is.

So here is a table with the number of parathas these two hungry men ate over the last 6 days –

| Day | H1 | H2 |
|---|---|---|
| Monday | 2 | 5.5 |
| Tuesday | 1.5 | 4.5 |
| Wednesday | 1 | 3.5 |
| Thursday | 3 | 7.5 |
| Friday | 3.5 | 8.5 |
| Saturday | 4 | ??? |

If you notice, H2 (who is really hungry, all the time) eats twice as much as H1, plus 1.5 parathas extra. So on Saturday, he will eat –

4*2 + 1.5 = 9.5 parathas!

Remember, the number of parathas H2 eats depends on how many parathas H1 eats. H1, on the other hand, eats till he is satisfied. Given this, let us construct a straight line equation for these two hungry men, just the way we did for the two fitness freaks.

H2 = H1*2 + 1.5

Here, H2 is the dependent variable, whose value is dependent on H1. 2 is the slope, and 1.5 is the constant.
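The slope and the constant do not have to be guessed. When all the points sit on one straight line, any two rows pin it down: the slope is the change in H2 divided by the change in H1, and the constant is whatever is left over once slope × H1 is accounted for. A minimal Python sketch, where the helper `line_through` is hypothetical and only meant for illustration:

```python
# Paratha counts from the table above, Monday to Friday.
h1 = [2, 1.5, 1, 3, 3.5]
h2 = [5.5, 4.5, 3.5, 7.5, 8.5]


def line_through(p, q):
    """Return (slope, constant) of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)  # change in y per unit change in x
    eps = y1 - m * x1          # what remains of y once m*x is accounted for
    return m, eps


# Work out the line from just two days: Monday and Tuesday.
m, eps = line_through((h1[0], h2[0]), (h1[1], h2[1]))
print(m, eps)  # 2.0 1.5

# The remaining days should also fall on H2 = H1*2 + 1.5.
for x, y in zip(h1, h2):
    assert abs(m * x + eps - y) < 1e-9

# Saturday: H1 eats 4 parathas.
print(m * 4 + eps)  # 9.5
```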

Before we proceed, let’s make a small change to the paratha example. Think of ‘Y’ as a diet-conscious person who eats with ‘X’. Every day, irrespective of how hungry or full Y is, he eats just 1.5 parathas. Not a morsel more, not a morsel less.

So if X eats 3 parathas, Y eats 1.5; if X eats 5, Y eats 1.5; if X eats 2.5, Y still eats 1.5. So on and so forth. So what do you think the equation would be?

y = x*0 + 1.5

The slope here is 0; hence, y is not really dependent on x. In fact, the value of y is a constant 1.5, which is quite obvious. Hopefully, by now you get the point on how two sets of numbers can be related.
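Applying the same two-row calculation to the diet-conscious example returns a slope of exactly zero, which is why x drops out of the equation. A minimal Python sketch, using the three days described above (the variable names are illustrative):

```python
# X's parathas on three days, and the diet-conscious Y's on the same days.
x_days = [3, 5, 2.5]
y_days = [1.5, 1.5, 1.5]

# Slope between the first two days: change in Y divided by change in X.
m = (y_days[1] - y_days[0]) / (x_days[1] - x_days[0])
eps = y_days[0] - m * x_days[0]
print(m, eps)  # 0.0 1.5

# With a zero slope, y = x*0 + 1.5 no matter how large x gets.
for x in [0, 3, 100]:
    assert m * x + eps == 1.5
```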

Now forget the fitness, forget the parathas, I’ll give you two sets of random numbers –

| X | Y |
|---|---|
| 10 | 3 |
| 12 | 6 |
| 8 | 4 |
| 9 | 17 |
| 20 | 36 |
| 18 | 22 |

X is the independent variable and Y is the dependent variable. Given this, do you see a relationship between these two sets of numbers? Eyeballing the numbers suggests that there is no relationship between X and Y, definitely not like the ones in the two examples above. But this does not mean that there is no relationship between the two at all. It’s just that the relationship is not obvious to the naked eye.

So how do we establish the relationship between the two? To be more precise, how do we figure out the values of the slope ‘m’ and the constant ‘ε’?

Well, say hello to linear regression!

I’ll introduce the same to you in the next chapter.
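As a rough preview, and not necessarily the exact approach the next chapter will take, the Python sketch below first shows that, unlike the earlier examples, different pairs of rows give different slopes. It then fits a line using ordinary least squares, which is one common way of doing linear regression. The helper name `least_squares_line` is hypothetical.

```python
# The X and Y numbers from the table above.
xs = [10, 12, 8, 9, 20, 18]
ys = [3, 6, 4, 17, 36, 22]

# Unlike the earlier examples, two rows do not pin down a single line:
# the slope depends on which pair of rows you pick.
print((ys[1] - ys[0]) / (xs[1] - xs[0]))  # rows 1 and 2 give 1.5
print((ys[3] - ys[2]) / (xs[3] - xs[2]))  # rows 3 and 4 give 13.0


def least_squares_line(xs, ys):
    """Return (slope, constant) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = how x and y move together, divided by how much x spreads out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    eps = mean_y - m * mean_x
    return m, eps


m, eps = least_squares_line(xs, ys)
print(round(m, 3), round(eps, 3))
```

None of the six points has to lie exactly on the fitted line; m and ε describe the line that fits them best in the least-squares sense. On Python 3.10 and later, the standard library’s `statistics.linear_regression(xs, ys)` computes the same ordinary least-squares slope and intercept.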

## Key takeaways from this chapter

- A straight line equation can define the relationship between two variables
- Of the two variables, one is dependent and the other is independent
- The slope of a straight line equation, represented by ‘m’, helps you identify the extent by which the independent variable has to be scaled
- The term ε represents a constant term
- If the slope is zero, then y = ε
- Sometimes, the relationship between two variables is not obvious
- When the relationship is not obvious, one can identify the relationship by employing a statistical technique called ‘Linear regression’.

Sir, excellent write-up as usual. Don’t get me wrong, but after waiting eagerly for a week, I think the chapters are way too short.

Sundeep, I’m glad you liked the chapter. I had to stop this chapter here, as the explanation of linear regression would have made it very lengthy, so I decided to keep that for a new chapter.

Loved the short and sweet chapter. At first glance, I wanted to run away because of the math. But I stayed. It was very easy to understand. Thank you so so much, Karthik Rangappa Ji.

Thanks, Michael. The next one (hopefully) will get more interesting!

Please stay tuned and happy learning 🙂

Very nice, sir. Waiting for the next article.

Yes, hopefully in the next week!

Hi Karthick, please provide a Kindle version of all your brilliant write-ups. That would be immensely beneficial.

Will keep a note of that, Vishnu.

Sir, are all mean-reverting strategies a variant of pairs trading?

They belong to the same family.

What are a few stat arbitrage strategies other than mean-reverting ones?

Risk arb, Merger Arbitrage, and all strategies where the corporate action is involved.

Sir, I’d guess arbitrage based on M&A or corporate actions doesn’t happen very often, so it’s not feasible in the long term as the signals would be infrequent. Can you name one or two stat arb strategies, other than pairs trading, that could produce trading signals at least 5 to 6 times per year?

Option-based delta hedging is another strategy. I will discuss this in the same module shortly.

Hi,

I personally learnt a lot from Zerodha Varsity. What I like best is the approach taken to explain things to the layman, using real-world examples of trading activities (special mentions: short selling and the futures market explanation).

Also, I would like to know the various statistical and data science techniques which can be used for trading.

Thanking you,

Rajarshi.

Glad you liked the contents here, Rajarshi!

This entire module is dedicated to stats based trading strategies.

Karthik, You have a link on getting detailed explanation on linear. After reading 6 pages, came out and saw your example of fitness freak & hungry parathas :D. Why were you not my maths teacher? 🙂

Rohit, I was never a good maths student 🙂

I will talk about linear regression in the next chapter, which should be out next week. Happy learning!

Karthik, I read most of your writings and like them very much.

Here, you are mixing up terminology. You are calling the ‘slope’ the ‘intercept’ and the ‘intercept’ the error. Please correct.

Sure, let me read this again. Slope = beta = M.

Thank you so much! I’ve made the corrections.

Sir I’ve a question. If one has learnt thoroughly whatever is written in varsity till now, mastered everything, what level of expertise do you think one would have, in respect to trading alone? Around 30-40%? Thanks in advance.

Sundeep, it really depends on how well you translate the learning to practice in the market.

Sir, I can understand what you’re saying. But I just wanted to know, if this were a university syllabus, what the level of completion would be, including the latest chapter. I’m just trying to gauge the depth of things to come. Thanks for the amazing work you’re doing, sir. On behalf of all Varsity readers, I’d like to say that we’re greatly indebted to you.

Sundeep, this will get a little more challenging going forward, but I’ll try my best to keep everything restricted to markets. Happy learning 🙂

I like the way you explain things. I guarantee no teacher can explain in such a simple language.

10/10 full marks.

Haha, thanks so much for the kind words and full marks 🙂

Hello sir,

What is the risk-reward ratio for this kind of trading?

It is around 1:5 reward to risk of 1.

Karthik, I’ve got to say that I have never read such a lucid explanation of the slope of a line, even after having done it in high school and in courses on Coursera! You indeed are a superhero! I’m waiting for the Financial Modelling module to solve all my doubts with your out-of-the-world explanations. Please, Karthik.

Thanks a lot Karthik for everything. Wishing you a long life full of happiness!

🙂

I feel happy to read your comment, thanks for the kind words 🙂

I’ll try and put up the content on financial modelling sometime soon. Thanks.

Hi Karthik, thanks for the beautiful modules. Varsity is going to change the way retail traders in India are going to approach the market.

My question is about the comparison between the previous method and this one. May I know why we need this method when we can work with the previous one? Thanks.

These are two different methods of doing the same thing. For example, if you wish to value a security, you can do so using either a DCF valuation or a relative valuation. Both methods give you a perspective on valuation; it’s just that DCF is more elaborate and incorporates all aspects of the company. The same goes for these two methods. If I were to pick one, I’d pick the second method.

Noted. Thanks a lot for the quick response.

I have one more question. I had an opportunity to analyse different pairs using a small script I developed. What I have noticed is that the hard limit of +/- 2.5 standard deviations is not suitable for many pairs. I see some pairs revert to the mean as soon as they hit, say, +1.7 standard deviations. I have also noticed a bias between the positive and negative sides. With my limited knowledge, this seems to lead to situations where we may miss entries and exits. To overcome this, I have written my script to select the 85th percentile standard deviation on the positive and negative sides as my trigger points, i.e., I set the entry points at the standard deviation (Z-score) below which 85% of the deviations fall.

Can you please share your feedback on my approach?

Thanks.

Bhalaji, I completely understand what you are saying. Yes, it is difficult to get signals at 2.5 each and every time, and I’ve also noticed pairs converging/diverging at 1.8 or so. I’d suggest you go with this on a case-by-case basis. For some pairs, 1.8 works, and for others, maybe 2.5.

Hi Karthik, thanks for acknowledging. I am trying my 85 percentile approach. I will keep us all posted on my learnings. Thanks.

Good luck!

Hi Sir,

I loved your articles and am always eager to learn more about creating strategies and developing models.

Good luck, and happy reading 🙂

You could be a great math teacher 🙂

Happy learning, Raghu!