© Copyright Quantopian Inc.
© Modifications Copyright QuantRocket LLC
Licensed under the Creative Commons Attribution 4.0.
Developed by Delaney Granizo-Mackenzie and Andrei Kirilenko as part of the Masters of Finance curriculum at MIT Sloan.
In this tutorial notebook, we'll sample data from normal and exponential distributions, compute maximum likelihood estimates (MLEs) of their parameters, fit a normal distribution to real stock returns, and test whether those returns are actually normally distributed.
First, we need to import some libraries.
import math
import matplotlib.pyplot as plt
import numpy as np
import scipy
import scipy.stats
We'll start by sampling some data from a normal distribution.
TRUE_MEAN = 40
TRUE_STD = 10
X = np.random.normal(TRUE_MEAN, TRUE_STD, 1000)
Now we'll define functions that, given our data, will compute the MLE for the $\mu$ and $\sigma$ parameters of the normal distribution.
Recall that
$$\hat\mu = \frac{1}{T}\sum_{t=1}^{T} x_t$$

$$\hat\sigma = \sqrt{\frac{1}{T}\sum_{t=1}^{T}{(x_t - \hat\mu)^2}}$$

def normal_mu_MLE(X):
    # Get the number of observations
    T = len(X)
    # Sum the observations
    s = sum(X)
    return 1.0/T * s

def normal_sigma_MLE(X):
    T = len(X)
    # Get the mu MLE
    mu = normal_mu_MLE(X)
    # Sum the square of the differences
    s = sum( np.power((X - mu), 2) )
    # Compute sigma^2
    sigma_squared = 1.0/T * s
    return math.sqrt(sigma_squared)
Now let's try our functions out on our sample data and see how they compare to the built-in np.mean and np.std.
print("Mean Estimation")
print(normal_mu_MLE(X))
print(np.mean(X))
print("Standard Deviation Estimation")
print(normal_sigma_MLE(X))
print(np.std(X))
Mean Estimation
40.10868841891504
40.10868841891503
Standard Deviation Estimation
9.967034665881854
9.967034665881858
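Note that np.std divides by $T$ by default (ddof=0), which is exactly the MLE formula above; passing ddof=1 instead gives the unbiased sample estimator that divides by $T-1$. A quick check:
print(np.std(X, ddof=0))  # population formula; matches normal_sigma_MLE(X)
print(np.std(X, ddof=1))  # unbiased sample formula; divides by T - 1, so slightly larger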
Now let's estimate both parameters at once with scipy's built-in fit() function.
mu, std = scipy.stats.norm.fit(X)
print("mu estimate:", str(mu))
print("std estimate:", str(std))
mu estimate: 40.10868841891503
std estimate: 9.967034665881858
Now let's plot the distribution PDF along with the data to see how well it fits. We can do that by accessing the pdf provided in scipy.stats.norm.pdf.
pdf = scipy.stats.norm.pdf
# We would like to plot our data along an x-axis ranging from 0-80 with 80 intervals
# (increments of 1)
x = np.linspace(0, 80, 81)
plt.hist(X, bins=x, density=True, label='Observed Data')
plt.plot(x, pdf(x, loc=mu, scale=std), label='Fitted Distribution PDF')
plt.xlabel('Value')
plt.ylabel('Observed Frequency')
plt.legend();
Let's do the same thing, but for the exponential distribution. We'll start by sampling some data.
TRUE_LAMBDA = 5
X = np.random.exponential(TRUE_LAMBDA, 1000)
numpy defines the exponential distribution as $$\frac{1}{\lambda}e^{-\frac{x}{\lambda}}$$
So we need to invert the MLE from the lecture notes. There it is

$$\hat\lambda = \frac{T}{\sum_{t=1}^{T} x_t}$$

Here it's just the reciprocal, so

$$\hat\lambda = \frac{\sum_{t=1}^{T} x_t}{T}$$

def exp_lamda_MLE(X):
    T = len(X)
    s = sum(X)
    return s/T
print("lambda estimate:", str(exp_lamda_MLE(X)))
lambda estimate: 5.239531351645476
# The scipy version of the exponential distribution has a location parameter
# that can skew the distribution. We ignore this by fixing the location
# parameter to 0 with floc=0
_, l = scipy.stats.expon.fit(X, floc=0)
pdf = scipy.stats.expon.pdf
x = range(0, 80)
plt.hist(X, bins=x, density=True, label='Observed Data')
plt.plot(x, pdf(x, scale=l), label='Fitted Distribution PDF')
plt.xlabel('Value')
plt.ylabel('Observed Frequency')
plt.legend();
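Because we fixed the location at 0, the scale that scipy estimates is just the sample mean, so it should agree with our manual MLE. A quick sanity check using the variables defined above:
# The fitted scale with floc=0 is the sample mean, which is exactly exp_lamda_MLE(X)
print("scipy scale estimate:", l)
print("manual MLE estimate:", exp_lamda_MLE(X))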
Now we'll fetch some real returns and try to fit a normal distribution to them using MLE.
from quantrocket.master import get_securities
from quantrocket import get_prices
aapl_sid = get_securities(symbols="AAPL", vendors='usstock').index[0]
prices = get_prices('usstock-free-1min', data_frequency='daily', sids=aapl_sid, fields='Close', start_date='2014-01-01', end_date='2015-01-01')
prices = prices.loc['Close'][aapl_sid]
# This will give us the number of dollars returned each day
absolute_returns = np.diff(prices)
# This will give us the percentage return over the last day's value
# the [:-1] notation gives us all but the last item in the array
# We do this because there are no returns on the final price in the array.
returns = absolute_returns/prices[:-1]
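For reference, pandas can compute the same percentage returns directly with pct_change (a minimal equivalent sketch, assuming prices is the Series built above):
# Equivalent calculation as a pandas Series; dropna() removes the NaN for the first day,
# which has no prior price to compare against
returns_pd = prices.pct_change().dropna()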
Let's use scipy's fit function to get the $\mu$ and $\sigma$ MLEs.
mu, std = scipy.stats.norm.fit(returns)
pdf = scipy.stats.norm.pdf
x = np.linspace(-1,1, num=100)
h = plt.hist(returns, bins=x, density=True)
l = plt.plot(x, pdf(x, loc=mu, scale=std))
Of course, this fit is meaningless unless we've first tested whether the returns actually follow a normal distribution. We can test this using the Jarque-Bera normality test. The Jarque-Bera test will reject the hypothesis of a normal distribution if the p-value is below a chosen cutoff (commonly 0.05).
from statsmodels.stats.stattools import jarque_bera
jarque_bera(returns)
(866.6489261082853, 6.450254076465886e-189, -0.14337787515083608, 12.098604328561175)
jarque_bera(np.random.normal(0, 1, 100))
(1.844145388244916, 0.3976938884502871, 0.2366822260779503, 2.532537714233625)
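The returns' p-value is essentially zero, so we reject the hypothesis that they are normally distributed, while the simulated normal sample is not rejected. A minimal sketch of applying a 0.05 cutoff programmatically:
# jarque_bera returns (JB statistic, p-value, skew, kurtosis)
_, p_value, _, _ = jarque_bera(returns)
if p_value < 0.05:
    print("Reject normality: the returns do not look normally distributed.")
else:
    print("Cannot reject normality for the returns.")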
This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian") or QuantRocket LLC ("QuantRocket"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, neither Quantopian nor QuantRocket has taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information believed to be reliable at the time of publication. Neither Quantopian nor QuantRocket makes any guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.