QuantRocket logo

© Copyright Quantopian Inc.
© Modifications Copyright QuantRocket LLC
Licensed under the Creative Commons Attribution 4.0.

Disclaimer

Fundamental Factor Models

By Beha Abasi, Maxwell Margenot, and Delaney Mackenzie

What are Fundamental Factor Models?

Fundamental data refers to the metrics and ratios measuring the financial characteristics of companies derived from the public filings made by these companies, such as their income statements and balance sheets. Examples of factors drawn from these documents include market cap, net income growth, and cash flow.

This fundamental data can be used in many ways, one of which is to build a linear factor model. Given a set of $k$ fundamental factors, we can represent the returns of an asset, $R_t$, as follows:

$$R_t = \alpha_t + \beta_{t, F_1}F_1 + \beta_{t, F_2}F_2 + ... + \beta_{t, F_k}F_k + \epsilon_t$$

where each $F_j$ represents a fundamental factor return stream. These return streams are from portfolios whose value is derived from its respective factor.

Fundamental factor models try to determine characteristics that affect an asset's risk and return. The most difficult part of this is determining which factors to use. Much research has been done on determining significant factors, and what makes things even more difficult is that the discovery of a significant factor often leads to its advantage being arbitraged away! This is one of the reasons why fundamental factor models, and linear factor models in general, are so prevalent in modern finance. Once you have found significant factors, you need to calculate the exposure an asset's return stream has to each factor. This is similar to the calculation of risk premia discussed in the CAPM lecture.

In using fundamental data, we run into the problem of having factors that may not be easily compared due to their varying units and magnitudes. To resolve this, we take two different approaches to bring the data onto the same level - portfolio construction to compare return streams and normalization of factor values.

Approach One: Portfolio Construction

The first approach consists of using the fundamental data as a ranking scheme and creating a long-short equity portfolio based on each factor. We then use the return streams associated with each portfolio as our model factors.

One of the most well-known examples of this approach is the Fama-French model. The Fama-French model, and later the Carhart four factor model, adds market cap, book-to-price ratios, and momentum to the original CAPM, which only included market risk.

Historically, certain groups of stocks were seen as outperforming the market, namely those with small market caps, high book-to-price ratios, and those that had previously done well (i.e., they had momentum). Empirically, Fama & French found that the returns of these particular types of stocks tended to be better than what was predicted by the security market line of the CAPM.

In order to capture these phenomena, we will use those factors to create a ranking scheme that will be used in the creation of long short equity portfolios. The factors will be $SMB$, measuring the excess return of small market cap companies minus big, $HML$, measuring the excess return of companies with high book-to-price ratios versus low, $MOM$, measuring the excess returns of last month's winners versus last month's losers, and $EXMRKT$ which is a measure of the market risk.

In general, this approach can be used as an asset pricing model or to hedge our portfolios. The latter uses Fama-Macbeth regressions to calculate risk premia, as demonstrated in the CAPM lecture. Hedging can be achieved through a linear regression of portfolio returns on the returns from the long-short factor portfolios. Below are examples of both.

Portfolio Construction as an Asset Pricing Model

First we import the relevant libraries.

Use pipeline to get all of our factor data that we will use in the rest of the lecture.

Now we can go through the data and build the factor portfolios we want

Now that we've constructed our portfolios, let's look at our performance if we were to hold each one.

Now, as we did in the CAPM lecture, we'll calculate the risk premia on each of these factors using the Fama-Macbeth regressions.

Our asset returns data is asset and date specific, whereas our factor portfolio returns are only date specific. Therefore, we'll need to spread each day's portfolio return across all the assets for which we have data for on that day.

Returns Prediction

As discussed in the CAPM lecture, factor modeling can be used to predict future returns based on current fundamental factors. As well, it could be used to determine when an asset may be mispriced in order to arbitrage the difference, as shown in the CAPM lecture.

Modeling future returns is accomplished by offsetting the returns in the regression, so that rather than predict for current returns, you are predicting for future returns. Once you have a predictive model, the most canonical way to create a strategy is to attempt a long-short equity approach.

Portfolio Construction for Hedging

Once we've determined that we are exposed to a factor, we may want to avoid depending on the performance of that factor by taking out a hedge. This is discussed in more detail in the Beta Hedging and Risk Factor Exposure lectures. The essential idea is to take the exposure your return stream has to a factor, and short the proportional value. So, if your total portfolio value was $V$, and the exposure you calculated to a certain factor return stream was $\beta$, you would short $\beta V$ amount of that return stream.

The following is an example using the Fama-French factors we used before.

Let's look at a graph of our two portfolio return streams

We'll check for normality, homoskedasticity, and autocorrelation in this model. For more information on the tests below, check out the Violations of Regression Models lecture.

For normality, we'll run a Jarque-Bera test, which tests whether our data's skew/kurtosis matches that of a normal distribution. As the standard, we'll reject the null hypothesis that our data is normally distributed if our p-value falls under our confidence level of 5%.

To test for heteroskedasticity, we'll run a Breush-Pagan test, which tests whether the variance of the errors in a linear regression is related to the values of the independent variables. In this case, our null hypothesis is that the data is homoskedastic.

Autocorrelation is tested for using the Durbin-Watson statistic, which looks at the lagged relationship between the errors in a regression. This will give you a number between 0 and 4, with 2 meaning no autocorrelation.

Based on the Jarque-Bera p-value, we would reject the null hypothesis that the data is normally distributed. This means there is strong evidence that our data follows some other distribution.

The test for homoskedasticity suggests that the data is heteroskedastic. However, we need to be careful about this test as we saw that our data may not be normally distributed.

Finally, the Durbin-Watson statistic can be evaluated by looking at the critical values of the statistic. At a confidence level of 95% and 4 explanatory variables, we cannot reject the null hypothesis of no autocorrelation.

Approach Two: Factor Value Normalization

Another approach is to normalize factor values for each day and see how predictive of that day's returns they were. This is also known as cross-sectional factor analysis. We do this by computing a normalized factor value $b_{a,j}$ for each asset $a$ in the following way.

$$b_{a,j} = \frac{F_{a,j} - \mu_{F_j}}{\sigma_{F_j}}$$

$F_{a,j}$ is the value of factor $j$ for asset $a$ during this time, $\mu_{F_j}$ is the mean factor value across all assets, and $\sigma_{F_j}$ is the standard deviation of factor values over all assets. Notice that we are just computing a z-score to make asset specific factor values comparable across different factors.

The exceptions to this formula are indicator variables, which are set to 1 for true and 0 for false. One example is industry membership: the coefficient tells us whether the asset belongs to the industry or not.

After we calculate all of the normalized scores during time $t$, we can estimate factor $j$'s returns $F_{j,t}$, using a cross-sectional regression (i.e. at each time step, we perform a regression using the equations for all of the assets). Specifically, once we have returns for each asset $R_{a,t}$, and normalized factor coefficients $b_{a,j}$, we construct the following model and estimate the $F_j$s and $a_t$

$$R_{a,t} = a_t + b_{a,F_1}F_1 + b_{a, F_2}F_2 + \dots + b_{a, F_K}F_K$$

You can think of this as slicing through the other direction from the first analysis, as now the factor returns are unknowns to be solved for, whereas originally the coefficients were the unknowns. Another way to think about it is that you're determining how predictive of returns the factor was on that day, and therefore how much return you could have squeezed out of that factor.

Following this procedure, we'll get the cross-sectional returns on 2016-11-22, and compute the coefficients for all assets:

We can take the fundamental data we got from the pipeline call above.

Problem: The Data is Weirdly Distributed

Notice how there are big outliers in the dataset that cause the z-scores to lose a lot of information. Basically the presence of some very large outliers causes the rest of the data to occupy a relatively small area. We can get around this issue using some data cleaning techniques, such as winsorization.

Winsorization

Winzorization takes the top $n\%$ of a dataset and sets it all equal to the least extreme value in the top $n\%$. For example, if your dataset ranged from 0-10, plus a few crazy outliers, those outliers would be set to 0 or 10 depending on their direction. The following is an example.

We'll apply the same technique to our data and grab the returns to all the assets in our universe. Then we'll run a linear regression to estimate $F_j$.

To expand this analysis, you would simply loop through days, running this every day and getting an estimated factor return.


Next Lecture: Portfolio Analysis with pyfolio

Back to Introduction


This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian") or QuantRocket LLC ("QuantRocket"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, neither Quantopian nor QuantRocket has taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information believed to be reliable at the time of publication. Neither Quantopian nor QuantRocket makes any guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.