The "Best Line" Fit to Data ... the Error and Beta: a continuation of Part II

We'll summarize what we found in Parts I and II:

We're fitting a straight line, y = Mx + K, to a collection of N points (xn, yn) ... the line is called the Regression Line.
We're using the notation:
Σx = x1 + x2 + ... + xN
and
Σxy = x1y1 + x2y2 + ... + xNyN
etc. etc.

M and K are chosen so as to minimize the Mean Squared Error:
Error2 = (1/N)Σ{yn - (M xn + K) }2

That requirement gives:
M = { N Σxy - Σx Σy } / { N Σx2 - ( Σx )2 }

K = { Σx2 Σy - Σx Σxy } / { N Σx2 - ( Σx )2 }
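Those formulas are easy to check in plain Python. Here's a minimal sketch (the sample points are made up for illustration, chosen to lie exactly on y = 2x + 1 so the answer is known in advance):

```python
# A sketch of the least-squares formulas above. The sample points are
# made-up illustrative data lying exactly on y = 2x + 1.

def best_line(xs, ys):
    """Return (M, K) for the regression line y = M*x + K."""
    N = len(xs)
    Sx = sum(xs)                                 # Σx
    Sy = sum(ys)                                 # Σy
    Sxy = sum(x * y for x, y in zip(xs, ys))     # Σxy
    Sxx = sum(x * x for x in xs)                 # Σx²
    D = N * Sxx - Sx ** 2                        # common denominator
    M = (N * Sxy - Sx * Sy) / D
    K = (Sxx * Sy - Sx * Sxy) / D
    return M, K

M, K = best_line([0, 1, 2, 3], [1, 3, 5, 7])     # recovers M = 2, K = 1
```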



Figure 1
We saw that the slope of the "best fit" line can be written:
[1]

M = COVAR[x,y] / SD2[x]
    where COVAR[x,y] = (1/N)Σxy - {(1/N)Σx} { (1/N)Σy} = Mean[xy] - Mean[x]Mean[y] is the Covariance of x and y
    and SD2[x] = (1/N)Σx2 - {(1/N)Σx}2 = Mean[x2] - (Mean[x])2 is the Variance or (Standard Deviation)2 of the set of x's

    (See stat-stuff.htm#3)

K = {Mean[x2]Mean[y] - Mean[x]Mean[xy]} / SD2[x]
= {(SD2[x] +(Mean[x])2)Mean[y] - Mean[x](COVAR[x,y] +Mean[x]Mean[y])} / SD2[x]
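Formula [1] can also be checked numerically. Here's a sketch using the Mean-based definitions (the data is made up; the line for K uses the algebraically equivalent form K = Mean[y] - M·Mean[x], which follows from the expression above):

```python
# Checking that M = COVAR[x,y] / SD²[x] agrees with the Σ-based formula.
# The data is illustrative, not from the article.

def mean(v):
    return sum(v) / len(v)

def covar(xs, ys):
    # Mean[xy] - Mean[x]·Mean[y]
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def var(xs):
    # Mean[x²] - (Mean[x])² ... that's SD²[x]
    return mean([x * x for x in xs]) - mean(xs) ** 2

xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 9.0, 12.0]

M = covar(xs, ys) / var(xs)
K = mean(ys) - M * mean(xs)     # equivalent to the Mean[x²]... expression above

# the Σ-based version of M, for comparison
N = len(xs)
M_sums = (N * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
         (N * sum(x * x for x in xs) - sum(xs) ** 2)
```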

>Why are you doing this again?
Although we've identified the "best line" fit to the data, we haven't yet determined the minimum Error itself.

>The minimum error?
Yes, the minimum of Error ... remember? The slope and intercept of the "best line", that's M and K, were chosen to minimize Error.

So we write:
Error2= (1/N)Σ{y - (M x+ K)}2     where we're dropping the subscripts for sanitary reasons
= (1/N)Σ{y2 - 2y(M x+ K) + (M x+ K)2}
= (1/N)Σy2 - 2(M/N)Σxy - 2(K/N)Σy + (M2/N)Σx2 + (2MK/N)Σx + (K2/N)Σ(1)

>Ugh.
Do you see all those Means?
>Ugh!
The Error can be expressed in terms of five Means.
In fact, Error can be expressed in terms of the statistical parameters of the x- and y-sets and their Covariance ... like so:
To simplify we'll let:

    Mean[x] =X,   SD[x] = A
    Mean[y] =Y,   SD[y] = B
    COVAR[x,y] = C

then, using [1]:
    Mean[xy] = COVAR[x,y] + Mean[x]Mean[y] = C + XY
    Mean[x2] = SD2[x] + (Mean[x])2 = A2 + X2
    Mean[y2] = SD2[y] + (Mean[y])2 = B2 + Y2

so we can write
    M = C / A2
    K = ((A2+X2)Y - X(C+XY) ) / A2 = (A2Y - CX) / A2


so
Error2 = (1/N)Σy2 - 2(M/N)Σxy - 2(K/N)Σy + (M2/N)Σx2 + (2MK/N)Σx + (K2/N)Σ(1)
= (B2+Y2) -2(C/A2)(C+XY) -2(A2Y - CX)Y/A2 +(C/A2)2(A2+X2) +2(C/A2)(A2Y-CX)X/A2+(A2Y-CX)2/A4
    where Σ(1) = 1+1+1+...+1 = N

>zzzZZZ
Patience! We just simplify. Lots of stuff cancels out and we get (finally!) Error2 = B2 - C2 / A2:
We place this in a position of eminence:
[2]

Error2= SD2[y] - COVAR2[x,y] / SD2[x]
= SD2[y] (1 - COVAR2[x,y] / SD2[x]SD2[y])
= SD2[y] (1 - r2)
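Rather than wading through that algebra by hand, formula [2] can be verified numerically: compute the mean squared error of the best line directly, then compare it with SD2[y](1 - r2). A sketch, with made-up data:

```python
import math

# Checking formula [2]: the minimum Mean Squared Error equals SD²[y]·(1 - r²).
# The data is illustrative.

def mean(v):
    return sum(v) / len(v)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.7]

mx, my = mean(xs), mean(ys)
C = mean([x * y for x, y in zip(xs, ys)]) - mx * my    # COVAR[x,y]
A2 = mean([x * x for x in xs]) - mx * mx               # SD²[x]
B2 = mean([y * y for y in ys]) - my * my               # SD²[y]

M = C / A2
K = my - M * mx
mse_direct = mean([(y - (M * x + K)) ** 2 for x, y in zip(xs, ys)])

r = C / math.sqrt(A2 * B2)                             # Pearson correlation
mse_formula = B2 * (1 - r ** 2)
```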

>Huh?
[3]

r = COVAR[x,y] / SD[x]SD[y] is the Pearson correlation


The square of r is called ...

>R-squared?
Yes. If the correlation r = 1 or -1, then the Error is zero. The points (x1, y1), (x2, y2) etc. lie right on that "best line".
  • r = +1 means perfect (linear) correlation
  • r = 0 means no correlation
  • r = -1 means perfect inverse correlation
   

>And for zero correlation then ... uh ...
Then the Error is just the Standard Deviation of the set of y's.

For example, stare at the charts in Figure 2.
The values of x1, x2, etc. are the same for both charts.
In fact, the Pearson correlation is also the same for both charts (namely r = 0.99).
The difference is in the volatility of the set of y's:
SD[y] is larger for the lower chart ... hence the Error is larger.

In fact, it's larger in proportion to the Standard Deviation.
(But that's just because r happens to be the same for both charts.)

In general, changing the Standard Deviation of the y's will change r as well, so we can't conclude that the Error is smaller just because r is larger.

>But it helps.
Yes. It helps.


Figure 2
>So that Error, SD2[y] (1 - r2) ... does it have a name?
Uh ... not that I know of. How about calling it Error?
>Very funny.
Besides, we're calling SD2[y] (1 - r2) the Error2   because it's the Mean Squared Error.

One thing that's a little bothersome is that the Error isn't symmetrical in x and y.
>Huh?
Although the correlation r is unchanged if the x's and y's are switched, the Error does change. That seems strange, doesn't it? I mean, if you want to know the error in fitting a straight line to (x,y) data, why should the resultant error depend upon which variable you choose as x and which as y? Measuring the vertical distance to that "best line" gives the y's a special role.
We could introduce symmetry by calculating the average of the two Errors, when the x's and y's are switched:

      symmetrical Error2 = (1/2)(SD2[x] + SD2[y] )(1 - r2)
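That averaged version is easy to confirm: fit y on x, fit x on y, and average the two Mean Squared Errors. A sketch with made-up data:

```python
import math

# Checking that the average of the two regression MSEs (y-on-x and x-on-y)
# equals (1/2)(SD²[x] + SD²[y])(1 - r²). Data is illustrative.

def mean(v):
    return sum(v) / len(v)

def mse_of_fit(xs, ys):
    """Mean squared vertical error of the regression of y on x."""
    mx, my = mean(xs), mean(ys)
    C = mean([x * y for x, y in zip(xs, ys)]) - mx * my
    A2 = mean([x * x for x in xs]) - mx * mx
    M = C / A2
    K = my - M * mx
    return mean([(y - (M * x + K)) ** 2 for x, y in zip(xs, ys)])

xs = [1.0, 2.0, 4.0, 5.0, 9.0]
ys = [1.5, 3.1, 5.2, 7.9, 11.0]

avg_mse = 0.5 * (mse_of_fit(xs, ys) + mse_of_fit(ys, xs))

mx, my = mean(xs), mean(ys)
A2 = mean([x * x for x in xs]) - mx * mx
B2 = mean([y * y for y in ys]) - my * my
C = mean([x * y for x, y in zip(xs, ys)]) - mx * my
r = C / math.sqrt(A2 * B2)
formula = 0.5 * (A2 + B2) * (1 - r * r)
```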
Or (and this one I like better), we could take as Error2 the Mean Squared perpendicular distance of the points to that "best line". That'd give another symmetrical Error:
[4]

Error2 = (1 - r2)SD2[x]SD2[y]/(SD2[x] + SD2[y] )


Figure 3

>So does that Error have a name?
Uh ... not that I know of. How about calling it another ...?
>Forget it. So ... how about the slope and intercept?
You mean M and K? So, what about them?
>Do they change when you interchange x and y?
Uh ... good point. They do change.

Since the covariance, C, doesn't change, M is either C/A2 = COVAR[x,y]/SD2[x] or C/B2 = COVAR[x,y]/SD2[y].

So the moral of this story is just this:
When doing a "best line" fit to (x,y) data, be wise in choosing which variable is x and which is y.


There's this other thing called Beta, namely:
[5]

Beta[x,y] = COVAR[x,y] / SD2[x]


It's used to determine whether two time series (say the monthly S&P 500 returns and the returns for Microsoft) tend to move up or down together and ...

>Hey! That Beta is just the slope of that "best line" fit ... isn't it?
Yes. Beta[x,y] = M.
If the two sets, x and y, are daily (weekly? monthly?) Returns*, then a Beta[x,y] of 1.5 means that increases in the y-Returns tend to be 1.5 times the increases in the x-Returns. That means that the y-Returns tend to change more than the x-Returns. That means ...

>But that Beta depends upon which set of returns you choose for x and y, right?
Yes. It could be COVAR[x,y] / SD2[x] or COVAR[x,y] / SD2[y], so one normally uses Beta to determine the relationship of stock returns to the Market ... in which case the x-Returns are, say, the TSE300 or the S&P500 or some other "benchmark" set of returns.

>So Beta = 1.5 means your stock is 1.5 times more volatile than the market, eh?
Uh ... not exactly. It's SD that measures volatility, not Beta.
>But I've read that "Beta measures the volatility of a stock compared to the volatility of the market".
Yeah, I've read that too. (Check google.)
In fact, Beta measures how changes in returns are related ... not the returns themselves.
Remember, the slope is (change in y) / (change in x).
For example, if Market returns increase from 8% to 10% (that's a 2% increase) then you might expect your stock returns to increase by 1.5(2) = 3%.
>Assuming Beta = 1.5, right?
Right. We could say that the stock will participate in Market moves, but more or less, depending upon Beta.

Figure 4

In fact, we have:
[6]

Beta[x,y] = r SD[y] / SD[x]

since COVAR[x,y] = r SD[x] SD[y].
So Beta is the ratio of volatilities multiplied by the Pearson correlation.
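Formula [6] is easy to verify in code. The two return series below are hypothetical percentages, not real market data; the last line also computes Beta of a series with itself, which should come out to 1:

```python
import math

# Checking formula [6]: Beta = COVAR[x,y] / SD²[x] = r · SD[y] / SD[x].
# Hypothetical return series (%), not real market data.

def mean(v):
    return sum(v) / len(v)

def sd(v):
    return math.sqrt(mean([t * t for t in v]) - mean(v) ** 2)

def covar(xs, ys):
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

market = [1.2, -0.5, 0.8, 2.0, -1.1]   # hypothetical benchmark returns (x)
stock = [2.0, -1.0, 1.5, 2.9, -2.2]    # hypothetical stock returns (y)

beta = covar(market, stock) / sd(market) ** 2
r = covar(market, stock) / (sd(market) * sd(stock))
beta_via_r = r * sd(stock) / sd(market)

beta_self = covar(market, market) / sd(market) ** 2   # a series vs itself: 1
```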

Here are some examples:

Note that if the "best fit" line happens to pass through the origin, then the slope is just y / x for any point (x, y) on the line.

>Pass through the origin? That means the Intercept = 0, right?
Yes, and the intercept also has a name. It's called ...
>It's called K, right?
Well ... uh, investment gurus call it Alpha.

Notice an interesting thing: if we're measuring the Beta of a set of returns with itself (so y = x) then Beta[x,x] = COVAR[x,x] / SD2[x] = 1.
(That just says that the "best fit" line, namely y = x, has slope = 1.)
That means that Beta of the Market is 1 ... since we'd be comparing the Market with itself!

Bloomberg (and others) define Beta as the slope of the "best line" fit when you plot excess returns: the stock against the Market.
>Excess?
Yes, the actual return less some risk-free return such as Money Market or maybe T-bills ... but we'll stick with the actual returns and forget using the excess.
>Does it make a difference?
Not really. Subtracting a constant risk-free rate from the returns gives the same value for
Beta = COVAR[x,y] / SD2[x] ... since neither COVAR[x,y] nor SD[x] changes when every x and y is reduced by the same constant.
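That invariance is quick to demonstrate: compute Beta from actual returns, then again from excess returns, and compare. A sketch with hypothetical data:

```python
# Checking that subtracting a constant risk-free rate from both return
# series leaves Beta unchanged. Data is hypothetical.

def mean(v):
    return sum(v) / len(v)

def beta(xs, ys):
    C = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)  # COVAR
    A2 = mean([x * x for x in xs]) - mean(xs) ** 2                   # SD²[x]
    return C / A2

market = [1.2, -0.5, 0.8, 2.0, -1.1]
stock = [2.0, -1.0, 1.5, 2.9, -2.2]

rf = 0.3    # constant risk-free rate per period (hypothetical)
b_actual = beta(market, stock)
b_excess = beta([x - rf for x in market], [y - rf for y in stock])
```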

>How about an example?
Consider Microsoft vs the S&P 500 (which is our Market).
We look at the daily returns over the last few weeks and get a "best fit" line like Figure 5.
>So Beta = 1.097 ... so MSFT is 1.097 times more volatile, right?
We'll check. The parameters turn out to be:
  • Correlation r = 0.4818
  • SD[MSFT] = 1.520%     SD[S&P] = 0.668%
  • Ratio of Volatilities = SD[MSFT] / SD[S&P] = 1.520 / 0.668 = 2.275
    ... so MSFT was 2.275 times more volatile, over the past few weeks
  • Beta = 1.097 (the slope of the "best fit" line)
    Note that Beta = r SD[MSFT] / SD[S&P] = (0.4818)(2.275) = 1.097
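That last check is easy to reproduce, using just the figures quoted above:

```python
# Reproducing the arithmetic above with the figures quoted in the text.
r = 0.4818
ratio = 1.520 / 0.668      # SD[MSFT] / SD[S&P] ... about 2.275
beta = r * ratio           # about 1.097, the slope of the "best fit" line
```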

>So why do they say that Beta is a comparison of volatilities?
Probably because it provides a simple explanation ... even though it ain't true


Figure 5
>But if Beta is less than 1, then surely the stock is less volatile than the market, eh?
Think so?

Figure 6 shows the daily returns over the past few weeks for Eastman Kodak versus the S&P.
Beta is less than 1 (it's 0.81) so one might conclude that EK was less volatile than the S&P.

In fact, EK was twice as volatile! Look at that upper chart.

>The ratio of volatilities was 2?
Pretty close. SD[EK] = 1.377% and SD[S&P] = 0.668% and 1.377 / 0.668 = 2.06

In fact, the Correlation between EK and S&P is r = 0.393 which is pretty small (hence decreases the value of Beta). That's why Beta = (0.393) * (Ratio of Volatilities) is smaller than the ratio of volatilities.

>Beta is small ... even tho' the ratio of volatilities is large, eh?
Exactly.

>So why do they say that Beta is a comparison of volatilities?
I give up. Why?


Figure 6
* Beta is usually calculated using Monthly returns ... I think!

See also Capital Asset Pricing Model and CAPM & Sharpe and CAPM spreadsheet and Beta