There are two main reasons that we may wish to estimate $f$ : prediction and inference.
In many situations, a set of inputs $X$ is readily available, but the output $Y$ cannot be easily obtained. In this setting, since the error term averages to zero, we can predict $Y$ using
$$ \hat{Y}=\hat{f}(X) $$
Here, $\hat{f}$ is often treated as a black box, in the sense that one is not typically concerned with its exact form, provided that it yields accurate predictions for $Y$.
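As a minimal sketch of this black-box view (the simulated data and the choice of a random forest are illustrative assumptions, not from the text), any fitted model can play the role of $\hat{f}$:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Simulated stand-in for a setting where inputs X are easy to obtain
# but the response Y is expensive to measure.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.3, size=200)

# Fit f-hat on the available (X, Y) pairs; its internal form is irrelevant
# so long as it predicts accurately -- the "black box" view.
f_hat = RandomForestRegressor(random_state=0).fit(X, y)

X_new = rng.uniform(0, 10, size=(5, 3))   # new inputs with unknown Y
y_hat = f_hat.predict(X_new)              # Y-hat = f-hat(X_new)
```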
The accuracy of $\hat{Y}$ as a prediction for $Y$ depends on two quantities, which we will call the reducible error and the irreducible error.
In general, $\hat{f}$ will not be a perfect estimate of $f$, and this inaccuracy introduces some error. This error is *reducible*, because we can potentially improve the accuracy of $\hat{f}$ by using a more appropriate statistical learning method to estimate $f$.
However, even with a perfect estimate of $f$, our prediction would still contain some error, since
$$ Y=f(X)+\epsilon, $$
where $\epsilon$ is a random error term that, by definition, cannot be predicted using $X$. This variability is the *irreducible* error.
$$ E(Y-\hat{Y})^2 = E[f(X)+\epsilon - \hat{f}(X)]^2 = \underbrace{[f(X)-\hat{f}(X)]^2}_{\text{Reducible}} + \underbrace{\mathrm{Var}(\epsilon)}_{\text{Irreducible}} $$
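A short simulation can make this decomposition concrete. In a simulation the true $f$ is known by construction, so the two terms can be estimated separately (a sketch; the sinusoidal $f$, the noise level $\sigma = 0.5$, and the deliberately misspecified linear $\hat{f}$ are all illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
sigma = 0.5                        # sd of epsilon, so Var(epsilon) = 0.25

def f(x):
    return np.sin(2 * x)           # the true f, known here by construction

# Training data generated as Y = f(X) + epsilon
X = rng.uniform(0, 3, size=(1000, 1))
y = f(X[:, 0]) + rng.normal(0, sigma, size=1000)

# A deliberately misspecified (linear) f-hat, so the reducible error is nonzero
f_hat = LinearRegression().fit(X, y)

# Large test set to approximate the expectations
X_t = rng.uniform(0, 3, size=(100_000, 1))
y_t = f(X_t[:, 0]) + rng.normal(0, sigma, size=100_000)
pred = f_hat.predict(X_t)

total = np.mean((y_t - pred) ** 2)               # E[(Y - Y-hat)^2]
reducible = np.mean((f(X_t[:, 0]) - pred) ** 2)  # E[(f(X) - f-hat(X))^2]
print(f"total={total:.3f}  reducible + Var(eps)={reducible + sigma**2:.3f}")
```

The two printed values should agree closely: the overall prediction error splits into the part a better $\hat{f}$ could remove and the $\mathrm{Var}(\epsilon)$ floor no method can go below.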
The second reason is inference. In this situation, we wish to estimate $f$, but our goal is not necessarily to make predictions for $Y$; rather, we want to understand the relationship between the predictors and the response. We may want to answer questions such as:

- Which predictors are associated with the response?
- What is the relationship between the response and each predictor?
- Can the relationship between $Y$ and each predictor be adequately summarized by a linear equation, or is it more complicated?
There are two broad classes of statistical learning methods: parametric and non-parametric.
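To hint at the distinction (a sketch under illustrative assumptions; linear regression and $k$-nearest neighbors are just standard representatives of each class):

```python
import numpy as np
from sklearn.linear_model import LinearRegression   # parametric
from sklearn.neighbors import KNeighborsRegressor   # non-parametric

rng = np.random.default_rng(2)
X = rng.uniform(0, 3, size=(300, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(0, 0.3, size=300)

# Parametric: assumes a functional form up front, so estimating f reduces
# to estimating a small, fixed number of coefficients (here: intercept, slope).
linear = LinearRegression().fit(X, y)

# Non-parametric: no fixed form; f-hat(x) is the average response of the
# k training points nearest to x, so its shape is driven by the data.
knn = KNeighborsRegressor(n_neighbors=10).fit(X, y)
```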