What Can We Learn from Regression Residuals?

Regression Basics

I recently sat in on a seminar about regression applications in business. I was somewhat surprised by what was included and what was left out. Many applied statistics courses and textbooks put a lot of emphasis (perhaps too much) on the confidence intervals of the coefficient estimates, but do not spend much time explaining the intervals you can construct around a model's predictions! There is a lot we can learn from a regression model's residuals. The metric we will be focusing on today is the estimated residual standard deviation, σ^ (the 'hat' is supposed to be above sigma, but Medium does not have strong support for math expressions).

The estimated residual standard deviation does more than determine the R-squared coefficient. In fact, it tells us about the accuracy of our predictions. Let's say we regress students' standardized test scores on family income, and we find that the standard deviation of the residuals is 7 points. This tells us that we can predict a student's test score give or take about 7 points. It also helps us construct an interval around our model's predictions (strictly speaking, a prediction interval rather than a confidence interval). To explain this further, we have to talk about the sampling distribution of the estimated residual variance, σ^².

The aforementioned sampling distribution is centered around the population residual variance, σ²: the scaled quantity (n − p)σ^²/σ² follows a chi-squared distribution with n − p degrees of freedom, where n is the number of observations and p the number of estimated parameters. Knowing this, and assuming the residuals are roughly normal, about 95 percent of them should fall within plus or minus 2σ^ of zero, and about 68 percent within plus or minus σ^. Let's use Python and some real data to see how this knowledge can be useful.
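The original post embeds the data-loading and model-fitting code as a gist that is not reproduced here. Below is a minimal sketch of that setup; the file name scores.csv and the columns score and income are placeholders standing in for whatever data the author actually used.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: standardized test scores and family income.
# The real dataset from the post is not shown, so the file name and
# column names here are placeholders.
df = pd.read_csv("scores.csv")

# Fit an ordinary least squares regression of score on income.
model = smf.ols("score ~ income", data=df).fit()
print(model.summary())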

We can access the model's residuals through the .resid attribute of the fitted results object returned by statsmodels.formula.api.ols(...).fit(), meaning we can get σ^ like this:
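The embedded snippet is not reproduced in the text; here is a sketch of what it likely computes, continuing from the fitted model above:

# Residuals of the fitted model.
resid = model.resid

# Estimated residual standard deviation, sigma-hat: the residual sum of
# squares divided by n - p, then square-rooted. Equivalently,
# np.sqrt(model.mse_resid).
p = len(model.params)
sigma_hat = np.sqrt(np.sum(resid ** 2) / (len(resid) - p))
print(sigma_hat)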

In our case, this command outputs 18.34194887504922 (far more decimal places than we need). First, let's write a custom prediction function:
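The author's function is embedded as a gist and not shown in the text; here is a hand-rolled sketch for a single-predictor model, under the same placeholder column names (results.predict() would do the same job):

def predict_scores(results, income):
    # Point prediction from the fitted line: intercept + slope * income.
    # 'income' can be a single number or an array-like of incomes.
    intercept = results.params["Intercept"]
    slope = results.params["income"]
    return intercept + slope * np.asarray(income)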

From this we get point predictions for whatever new observations we pass in.
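For example, with some made-up income values (the post shows the author's actual output, which is not reproduced here):

# Predictions for a few hypothetical income values.
new_incomes = np.array([30_000, 60_000, 90_000])
point_preds = predict_scores(model, new_incomes)
print(point_preds)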

How confident are we in these predictions? We can find out like this:
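The interval code is also embedded rather than printed in the text; here is a sketch that applies the plus-or-minus 2σ^ rule to the predictions above:

# Approximate 95 percent interval around each prediction, treating
# sigma_hat as the standard error of a prediction (see the caveat below).
lower = point_preds - 2 * sigma_hat
upper = point_preds + 2 * sigma_hat

for x, lo, hi in zip(new_incomes, lower, upper):
    print(f"income={x}: predicted score roughly between {lo:.1f} and {hi:.1f}")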

It should be mentioned that we are treating our fitted regression as if it were the true model, ignoring the uncertainty in the estimated coefficients. Thus, we are treating σ^ as the standard error of a prediction. In reality, the true standard error will be a bit higher than σ^, especially for predictions far from the center of the data.

This method can be useful, especially for financial forecasting with OLS. The model does not necessarily have to be linear; you can apply the same technique to exponential or logarithmic models. Anyway, thanks for reading! I hope this information is useful to someone.
