4.1 – Correlation and its types
I have to mention this at this point. The pair trading technique we are discussing now is discussed in a book called, ‘Trading Pairs’, by Mark Whistler. I like this book for the fact that it got me hooked to Pair trading and over time as my interest grew, I explored the strategy beyond Mark Whistler’s techniques. Needless to say, I will discuss those techniques later in this module. At this point, my intention is to take you through the exact learning path I underwent learning pair trading.
Towards the end of the previous chapter, we introduced the concept of correlation and the way one can analyze the correlation values. We will take that discussion forward now and understand how to calculate the correlation between two stocks, on excel. As you may have guessed by now, the calculation of Correlation between two stocks is the key in pair trading.
For the sake of this example, I’ve considered Axis Bank and ICICI Bank. Both are Private sector banks and have similar business backgrounds, hence intuition says that the two stocks should be highly correlated.
At this point, I have downloaded the closing price of Axis Bank and ICICI Bank from 4th Dec 2015 to 4th Dec 2017, roughly 2 years of trading data or about 496 data points.
Before we proceed, a quick note on data –
- Make sure you are dealing with the same number of data points. For example, if you have 400 data point for Stock A, then you need to ensure you have the same number of data points for Stock B, corresponding to same dates.
- Make sure the data is cleaned for corporate actions such as bonus/splits etc
As you can see from the above image, besides ICICI and Axis, I have also downloaded the data for BPCL, HPCL, and HDFC Bank. You can use this data to build and test other correlations.
Anyway, at this stage, the only data we have is the date and the closing price of the stock. We will go ahead and calculate the daily returns. I guess you are familiar with the daily return calculation, we have discussed this several time in the previous module.
The daily return can be calculated as
= [today’s closing price / previous day’s closing price] – 1
I’ve calculated this for both ICICI and Axis Bank –
Now, correlation can be calculated on the basis of two parameters –
- The daily closing price
- The daily return series
The daily closing price correlation requires you to calculate the correlation based on the closing prices of two stock. I’m not a big fan of calculating correlation on closing prices, but then let’s just go ahead and do this for time being.
To do this in excel, simply use the ‘=Correl()’, function on the daily closing prices. I’m running this calculation on a new sheet, which is labeled it as ‘Pair Data’.
Here is the snapshot –
The correlation between the closing prices of ICICI Bank and Axis bank is 0.51. Not particularly a great correlation, but we can live with this for now. Do recollect, our gut said the two banks could be highly correlated as they have similar business backgrounds, but the number is painting a slightly different picture 🙂
We will now run the correlation on the daily % return series for the two stock. I’ve already calculated this % return, I’ll just have to run the correl function now.
Again, not a very encouraging number, but that is ok for now.
Some traders, run the correlation on the absolute per day change calculated as ‘Today’s stock price – yesterday’s stock price’. Again, I’m not a big fan of this. But let me just go ahead and introduce the same to you –
In all the above calculations, I’ve run the correlation of Axis Bank versus ICICI Bank, the results obtained will be same if I had opted to calculate the correlation of ICIC Bank versus Axis. Generally speaking, the correlation between A and B is the same as Correlation between B and A.
In this method of trading pairs, the correlation number is considered sacred. Ideally speaking, the number should be above 0.75. Clearly, that is not the case with ICICI and Axis, but then as I mentioned earlier, we can live with it.
4.2 – Setting up the datasheet
In the previous chapter, we discussed three variables concerning the pairs namely the spread, differential, and the ratios. Let us go ahead and calculate these variables on the two stocks we are studying. We will do this on a separate sheet within the same workbook and name the sheet as the ‘Data Sheet’. Here is the snapshot –
The calculation of these variables is quite straightforward, I’ve explained this in the previous chapter.
Different types of Pair Trading works at different complexities levels. We will deal with basic stats for this version of pair trading. Given this, we will now define 3 most commonly used statistic variables.
4.3 – Basic stats
I’ll discuss 3 basic statistical terms at this stage. These are basic terms which play a very crucial role in pair trading. I’m fairly certain that you’d have learned these in your high school math, even otherwise this is quite basic and you can pick it up anytime.
To help you understand these jargons better, I’ve come up with a set of arbitrary runs scored by batsmen across 10 cricket matches –
Mean – Also called the arithmetic average, represents the average of a set of numbers. You can calculate the average by taking the sum of all the observations by the total number of observations.
So if I were to find the average in the above example, I’d total up all the scores and divide it by 10 (10 being the total number of observations).
Mean (Average) = 626/10
On excel, you can simply use the ‘=Average ()’ function to calculate the average of any set of numbers.
Median – The median number represents the middle number of the data series when the data series is arranged in its numerical order. If there are even set of numbers (which is the case here), then we have to take the average of the middle two numbers to calculate the mean. However, if there are an odd number of data points, then we simply take the middle data point as the median.
So let me rearrange the data points in its numerical order –
23, 34, 44, 51, 55, 65, 72, 82, 100, 100
Since there are even numbers of observation, I’ll take the middle two numbers i.e 55 and 65, their average represents the median.
Median = (55 + 65)/2
The excel function to calculate median is ‘=Median()’.
The mean and median when viewed together gives a sense of the trend. More on this later.
Mode – The mode of a data series is simply that data point which occurs the most number of times in the series. Clearly, 100 is repeating twice, with no other number appearing more than once, and that makes it the mode of the data series.
The excel function to calculate Mode is ‘=Mode()’.
In the next chapter, we will use these function in excel and understand its relevance to pair trading.
Download the excel sheet used in this chapter here.
Key takeaways from this chapter
- Care has to be taken to ensure the data is clean and adjusted for corporate actions
- Close correlation is the correlation when calculated on the closing prices of stocks
- The % return correlation is the correlation when calculated on the daily returns of the stock
- Mean is the arithmetic average of the data series
- Median is the middle observation of a data series.
- If the data series has even number of observations, then the median is the average of the middle two observations
- If the data series has odd number of observations, then the median is the middle observation
- The mode of a data series is that value which repeats the highest number of times
- The mean and median, when viewed together to each other, offers great insight into the data trend.