There are two main reasons that we may wish to estimate $f$ : prediction and inference.
In many situations, a set of inputs $X$ is readily available, but the output $Y$ cannot be easily obtained. In this setting, since the error term averages to zero, we can predict $Y$ using
$$ \hat{Y}=\hat{f}(X) $$
Here, $\hat{f}$ is often treated as a black box, in the sense that one is not typically concerned with its exact form, provided that it yields accurate predictions for $Y$.
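As a minimal sketch of this black-box view (the simulated data and the choice of a random forest are illustrative assumptions, not from the text), any fitted model can play the role of $\hat{f}$:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Simulated stand-in for a setting where inputs X are easy to obtain
# but the response Y is expensive to measure.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.3, size=200)

# Fit f-hat on the available (X, Y) pairs; its internal form is irrelevant
# so long as it predicts accurately -- the "black box" view.
f_hat = RandomForestRegressor(random_state=0).fit(X, y)

X_new = rng.uniform(0, 10, size=(5, 3))   # new inputs with unknown Y
y_hat = f_hat.predict(X_new)              # Y-hat = f-hat(X_new)
```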
The accuracy of $\hat{Y}$ as a prediction for $Y$ depends on two quantities, which we will call the reducible error and the irreducible error.
In general, $\hat{f}$ will not be a perfect estimate of $f$, and this inaccuracy introduces some error. This error is *reducible*, because we can potentially improve the accuracy of $\hat{f}$ by using a more appropriate statistical learning method to estimate $f$.
However, even with a perfect estimate of $f$, our prediction would still contain some error, since
$$ Y=f(X)+\epsilon, $$
where $\epsilon$ is a random error term that, by definition, cannot be predicted using $X$. This variability is the *irreducible* error.
$$ E(Y-\hat{Y})^2 = E[f(X)+\epsilon - \hat{f}(X)]^2 = \underbrace{[f(X)-\hat{f}(X)]^2}_{\text{Reducible}} + \underbrace{\mathrm{Var}(\epsilon)}_{\text{Irreducible}} $$
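A short simulation can make this decomposition concrete. In a simulation the true $f$ is known by construction, so the two terms can be estimated separately (a sketch; the sinusoidal $f$, the noise level $\sigma = 0.5$, and the deliberately misspecified linear $\hat{f}$ are all illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
sigma = 0.5                        # sd of epsilon, so Var(epsilon) = 0.25

def f(x):
    return np.sin(2 * x)           # the true f, known here by construction

# Training data generated as Y = f(X) + epsilon
X = rng.uniform(0, 3, size=(1000, 1))
y = f(X[:, 0]) + rng.normal(0, sigma, size=1000)

# A deliberately misspecified (linear) f-hat, so the reducible error is nonzero
f_hat = LinearRegression().fit(X, y)

# Large test set to approximate the expectations
X_t = rng.uniform(0, 3, size=(100_000, 1))
y_t = f(X_t[:, 0]) + rng.normal(0, sigma, size=100_000)
pred = f_hat.predict(X_t)

total = np.mean((y_t - pred) ** 2)               # E[(Y - Y-hat)^2]
reducible = np.mean((f(X_t[:, 0]) - pred) ** 2)  # E[(f(X) - f-hat(X))^2]
print(f"total={total:.3f}  reducible + Var(eps)={reducible + sigma**2:.3f}")
```

The two printed values should agree closely: the overall prediction error splits into the part a better $\hat{f}$ could remove and the $\mathrm{Var}(\epsilon)$ floor no method can go below.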
The second reason is inference. In this situation, we wish to estimate $f$, but our goal is not necessarily to make predictions for $Y$; rather, we want to understand the relationship between the predictors and the response. We may want to answer questions such as:

- Which predictors are associated with the response?
- What is the relationship between the response and each predictor?
- Can the relationship between $Y$ and each predictor be adequately summarized by a linear equation, or is it more complicated?
There are two broad classes of statistical learning methods: parametric and non-parametric.
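To hint at the distinction (a sketch under illustrative assumptions; linear regression and $k$-nearest neighbors are just standard representatives of each class):

```python
import numpy as np
from sklearn.linear_model import LinearRegression   # parametric
from sklearn.neighbors import KNeighborsRegressor   # non-parametric

rng = np.random.default_rng(2)
X = rng.uniform(0, 3, size=(300, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(0, 0.3, size=300)

# Parametric: assumes a functional form up front, so estimating f reduces
# to estimating a small, fixed number of coefficients (here: intercept, slope).
linear = LinearRegression().fit(X, y)

# Non-parametric: no fixed form; f-hat(x) is the average response of the
# k training points nearest to x, so its shape is driven by the data.
knn = KNeighborsRegressor(n_neighbors=10).fit(X, y)
```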