3.1 Simple Linear Regression


Simple linear regression is a very straightforward approach for predicting a quantitative response $Y$ on the basis of a single predictor variable $X$. It assumes that there is approximately a linear relationship between $X$ and $Y$.

$$ Y \approx \beta_0 + \beta_1X $$

In this model, $\beta_0$ and $\beta_1$ are two unknown constants that represent the intercept and slope terms in the linear model. Together, they are known as the model coefficients or parameters. Once we have estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ of the coefficients, the prediction of $Y$ for a given value $x$ of $X$ becomes

$$ \hat{y}=\hat{\beta}_0+ \hat{\beta}_1x $$
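As a minimal sketch of the prediction equation, the snippet below plugs a value of $x$ into hypothetical fitted coefficients (the numbers are illustrative, not from any real fit):

```python
# Hypothetical fitted coefficients, for illustration only.
beta0_hat = 2.0   # estimated intercept
beta1_hat = 3.0   # estimated slope

def predict(x):
    """Predicted response y-hat = beta0_hat + beta1_hat * x."""
    return beta0_hat + beta1_hat * x

y_hat = predict(1.5)  # -> 2.0 + 3.0 * 1.5 = 6.5
```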

3.1.1 Estimating the Coefficients

Our goal is to estimate coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ such that the resulting line is as close as possible to the data points. There are several ways to measure this closeness, but by far the most common approach is to minimize the least squares criterion.


Let $\hat{y}_i = \hat{\beta}_0+\hat{\beta}_1x_i$ be the prediction for $Y$ based on the $i$th value of $X$. Then $e_i=y_i-\hat{y}_i$ is the $i$th residual — the difference between the $i$th observed response and the $i$th predicted response. We define the residual sum of squares (RSS) as follows:

$$ RSS=e^2_1 + e^2_2+\cdots+e^2_n $$
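To make the RSS definition concrete, the sketch below computes the residuals and their sum of squares for a small toy data set (all values are made up for illustration):

```python
# Toy observed responses y_i and fitted values y-hat_i (illustrative only).
ys     = [1.0, 2.0, 1.3, 3.7]
y_hats = [0.8, 2.1, 1.5, 3.0]

# Residuals e_i = y_i - y-hat_i.
residuals = [y - yh for y, yh in zip(ys, y_hats)]

# RSS = e_1^2 + e_2^2 + ... + e_n^2.
rss = sum(e ** 2 for e in residuals)  # -> 0.04 + 0.01 + 0.04 + 0.49 = 0.58
```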

The least squares approach chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize this $RSS$. Using some calculus, one can show that the minimizers are

$$ \hat{\beta}_1=\frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, $$

$$ \hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x} $$
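The two formulas above translate directly into code. This is a minimal pure-Python sketch of the closed-form least squares estimates; the example data are chosen to lie exactly on the line $y = 2 + 3x$, so the estimates recover those coefficients:

```python
def least_squares(xs, ys):
    """Closed-form simple linear regression estimates.

    Returns (beta0_hat, beta1_hat) from the least squares formulas:
    beta1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    beta0_hat = y_bar - beta1_hat * x_bar
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    beta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
            / sum((x - x_bar) ** 2 for x in xs)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Data generated exactly from y = 2 + 3x recovers those coefficients.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 8.0, 11.0, 14.0]
b0, b1 = least_squares(xs, ys)  # -> (2.0, 3.0)
```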

3.1.2 Assessing the Accuracy of the Coefficient Estimates

Assuming that the true relationship between $X$ and $Y$ takes the form $Y = f(X) + \epsilon$ for some unknown function $f$, we can approximate $f$ by a linear function, giving

$$ Y=\beta_0+\beta_1X+\epsilon $$
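A simulation like the one plotted below can be sketched as follows. The true parameter values ($\beta_0 = 2$, $\beta_1 = 3$) and the noise distribution are assumptions chosen purely for illustration; the point is that fitting the least squares formulas to data generated from the model recovers estimates close to the true coefficients:

```python
import random

random.seed(0)  # reproducible draw

# Hypothetical true parameters, chosen only for this illustration.
beta0_true, beta1_true = 2.0, 3.0
n = 100

xs = [random.uniform(-2.0, 2.0) for _ in range(n)]
# Each response follows Y = beta0 + beta1 * X + epsilon,
# with epsilon drawn from a standard normal distribution.
ys = [beta0_true + beta1_true * x + random.gauss(0.0, 1.0) for x in xs]

# Fit via the closed-form least squares formulas.
x_bar, y_bar = sum(xs) / n, sum(ys) / n
beta1_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
            / sum((x - x_bar) ** 2 for x in xs)
beta0_hat = y_bar - beta1_hat * x_bar
# beta0_hat and beta1_hat should land near 2.0 and 3.0.
```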

[Figure: plot of 100 simulated observations]

The above graph plots 100 random values of $X$, with each $Y$ generated as