Maximum Likelihood For Dummies

Welcome! For the data science enthusiast and layperson alike, this is the first post in a series on Statistical Learning. The emphasis here is on concepts, with some snippets of code where helpful. Let's jump into the first topic: Maximum Likelihood Estimation.

In data science and machine learning, a central problem is finding a parameterization of the sampling distribution of our data. In other words, the dataset is drawn from some unknown distribution, governed by an unknown parameter:

$$x_1, x_2, \ldots, x_n \sim p(x \mid \theta), \qquad \theta \text{ unknown}$$

We will discuss why this is an important problem, the procedures for solving it, and some connections to powerful machine learning algorithms such as linear regression.

As an example, imagine you were flipping a special coin, with an unknown probability of getting Heads or Tails. In other words, we have the following parameterization:

$$P(H) = \theta, \qquad P(T) = 1 - \theta$$

From experience, we expect the parameter to equal 1/2, meaning we have an equal chance of getting heads or tails (a "fair" coin). However, what if you flipped this special coin 5 times and observed the following sequence: HHHHT? Would you expect this from an equal-probability coin? If you kept flipping this coin, generating more observations, and still saw a disproportionate number of heads, you would probably start to believe the coin is not fair. In fact, you would believe that P(H) is a lot higher than P(T). This is the core idea behind inferring a parameter from our data observations, where the data is sampled according to some underlying distribution. We would say that, for a fair coin, the likelihood of observing HHHHT is not high. Likelihood can be described as the probability of observing the dataset given a parameterization:

$$L(\theta) = P(\mathcal{D} \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$

Likelihood Function

The goal of maximum likelihood estimation (MLE) is then to estimate the value of the parameter as the value that maximizes the probability (likelihood) of our data.
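In symbols, the MLE is the argmax of the likelihood over all candidate parameter values:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} P(\mathcal{D} \mid \theta)$$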

Of course, this idea of likelihood extends beyond simple coin flipping. It actually gives us a problem-solving procedure for distribution parameter estimation. Let's outline the process in general, then walk through examples for a typical Normal distribution and the coin flipping problem mentioned above:

1. First, we write down the likelihood function: the joint probability of the observed dataset, viewed as a function of the unknown parameter. For independent samples, this is just the product of the individual densities.

Example: Let's walk through the Normal distribution with each of these steps. Let's use MLE to estimate the mean (treating the variance as known):

$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

2. Next, we simplify this likelihood function by taking the logarithm, which gives the log-likelihood function. Remember, the goal is to maximize the likelihood. Since the logarithm is monotonically increasing, maximizing the log of the likelihood automatically maximizes the likelihood itself (for a monotonically increasing f, f(b) > f(a) exactly when b > a). In practice, we almost always skip directly to maximizing the log-likelihood as a proxy for maximizing the actual likelihood, since they are equivalent problems.

Example: Let's calculate the log-likelihood for the Normal distribution example we started earlier:

$$\log L(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

3. Finally, we remember the good old days of calculus, where we have a function and want to find its maximum. This requires taking the derivative and finding where it equals zero. As a note, we could also use any numerical optimization technique, such as gradient descent. Either way, we have to actually calculate the derivative. The resulting argmax of the optimization problem is the parameter value that maximizes the likelihood.

Example: Let’s continue the Normal distribution example by solving for the optimal mean parameter:

$$\frac{\partial}{\partial \mu}\log L(\mu) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0 \quad\Longrightarrow\quad \hat{\mu}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

MLE estimate for mean parameter

So, we derived the maximum likelihood estimate for the mean parameter, and guess what? It turns out to be the empirical mean! This confirms the intuition that if we want to place a Gaussian around our data, the most likely Gaussian is centered on the actual mean of the data.
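To see this numerically, here is a minimal sketch (the true mean, noise scale, and sample size below are made up for illustration) that minimizes the negative log-likelihood with SciPy and compares the result to the empirical mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative sample from a Normal with (assumed) true mean 3.0, sigma 1.5.
rng = np.random.default_rng(0)
sigma = 1.5
data = rng.normal(loc=3.0, scale=sigma, size=1000)

def neg_log_likelihood(mu):
    # Negative log-likelihood of a Normal with known sigma;
    # maximizing the likelihood = minimizing this quantity.
    return (0.5 * np.sum((data - mu) ** 2) / sigma**2
            + 0.5 * len(data) * np.log(2 * np.pi * sigma**2))

result = minimize_scalar(neg_log_likelihood)
print(result.x)      # numerical MLE for the mean...
print(data.mean())   # ...agrees with the empirical mean
```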

Let's go back to the coin flip example. What value of P(H), our unknown parameter, makes the sequence HHHHT the most likely? Let's follow the procedure:

$$L(\theta) = \theta^4 (1-\theta), \qquad \frac{d}{d\theta}\log L(\theta) = \frac{4}{\theta} - \frac{1}{1-\theta} = 0 \quad\Longrightarrow\quad \hat{\theta}_{\text{MLE}} = \frac{4}{5}$$

Coin Flip MLE

So, the maximum likelihood estimate for P(H) is 4/5. This should not be surprising, because our data sequence was HHHHT, so the ratio of heads was 4/5. In other words, if we choose P(H) as 4/5, then we are most likely to produce the data sequence HHHHT if we flip the coin 5 times.
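As a quick sanity check, here is a small sketch that evaluates the likelihood of HHHHT over a grid of candidate values for P(H) and reports where it peaks:

```python
import numpy as np

# Likelihood of the sequence HHHHT as a function of p = P(H):
# four heads and one tail, so L(p) = p^4 * (1 - p).
p = np.linspace(0, 1, 1001)
likelihood = p**4 * (1 - p)

print(p[np.argmax(likelihood)])  # -> 0.8, i.e. 4/5
```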

Notice that during the example for a normally distributed dataset we came across something interesting. The maximum likelihood estimation came down to optimizing a log-likelihood function that looked an awful lot like a least-squares error function. This is actually by design! Minimizing the least-squares error for a linear model is equivalent to maximum likelihood estimation of the coefficients in the linear regression model. To see this, recall the main assumption of linear regression: the regression labels y are generated according to a linear function plus Gaussian noise:

$$y_i = \mathbf{w}^\top \mathbf{x}_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2)$$

$$\log L(\mathbf{w}) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2$$

Linear Regression is MLE

So, the MLE problem for this generating distribution is equivalent to minimizing the squared error. This was one of the earliest applications of MLE in practice. Similar results hold for logistic regression, where the MLE of the logistic coefficients is equal to a minimizer of the logistic loss function.
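Here is a minimal sketch of that equivalence (the synthetic slope, intercept, and noise level are made up for illustration): the coefficients from ordinary least squares match those from directly minimizing the Gaussian negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative synthetic data: y = 2*x + 1 + Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-5, 5, size=200)
X = np.column_stack([x, np.ones_like(x)])  # design matrix with intercept
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=200)

# Ordinary least squares solution.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# MLE under the Gaussian-noise assumption: minimize the negative
# log-likelihood, which (up to constants) is the sum of squared errors.
def neg_log_likelihood(w, sigma=0.5):
    residuals = y - X @ w
    return 0.5 * np.sum(residuals**2) / sigma**2

w_mle = minimize(neg_log_likelihood, x0=np.zeros(2)).x

print(w_ols)  # slope and intercept from least squares
print(w_mle)  # matches the least-squares coefficients
```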

We have walked through this powerful statistical parameter estimation tool and outlined the general procedure. It has many uses in machine learning, such as the formulation of many regression problems. It's worth noting that there are times when this MLE procedure fails, such as when the log-likelihood isn't concave (so setting the derivative to zero doesn't guarantee a global maximum) or isn't differentiable. This happens in mixture models, where a new procedure called Expectation-Maximization has to be used; we will expand on that in a continuation of this series on statistical learning.
