Volatility forecasting is a topic of central importance, because volatility is the most fundamental measure of risk and a basic input for a large number of practitioners in the financial sector. It is used widely in many financial applications, e.g. Value-at-Risk (VaR) analyses and expected shortfall calculations. For a risk manager it is of great value to know the next day's volatility, in order to adjust portfolio positions and calculate the measures mentioned above.
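To illustrate the link, a one-day volatility forecast maps directly into parametric VaR and expected shortfall under a normality assumption; the sketch below uses a hypothetical volatility value and is not part of this paper's analysis.

```python
# Minimal sketch: one-day 99% VaR and expected shortfall from a volatility
# forecast, assuming zero-mean, normally distributed returns. The volatility
# value is hypothetical.
from scipy.stats import norm

sigma = 0.015  # hypothetical next-day volatility forecast (1.5%)
alpha = 0.99
z = norm.ppf(alpha)

var_99 = sigma * z                         # VaR_alpha = sigma * z_alpha
es_99 = sigma * norm.pdf(z) / (1 - alpha)  # ES_alpha under normality
print(f"1-day 99% VaR: {var_99:.4%}, ES: {es_99:.4%}")
```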
So how do we forecast volatility, when it is of such vital importance? The first step is to find models that fit the attributes of volatility. Traditionally, this has been a major area of research, and many different approaches have been proposed. Engle (1982)[8] proposed the ARCH model, which captures the time-varying volatility of returns, i.e. volatility clustering, an important attribute of volatility. This inspired Bollerslev (1986)[4] to develop the more general GARCH model, which has similar properties. Both of these models, and variations thereof, have been applied widely and remain in use to this day.
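As a brief illustration of this model class, the sketch below fits a GARCH(1,1) and produces a one-step-ahead volatility forecast using the Python `arch` package; the returns series is simulated and purely illustrative, not the specification estimated in this paper.

```python
# Minimal sketch: fit a GARCH(1,1) to a (simulated) daily return series and
# forecast the next day's volatility. Assumes the `arch` package is installed.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = rng.standard_normal(1000)  # placeholder for daily returns

model = arch_model(returns, vol="GARCH", p=1, q=1, mean="Zero")
result = model.fit(disp="off")

# One-step-ahead conditional variance, converted to volatility
forecast = result.forecast(horizon=1)
sigma_next = float(np.sqrt(forecast.variance.values[-1, 0]))
print(f"Next-day volatility forecast: {sigma_next:.4f}")
```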
As time has passed, more and more data have become available, and nowadays high-frequency price data are available for many different kinds of assets, even down to the millisecond frequency. Theoretical results in Merton (1980)[11] and Nelson (1992)[12] suggest that there is excess information in the higher frequencies that our models should be able to take advantage of when forecasting volatility. Nevertheless, when high-frequency intraday data are fed directly into ARCH- and GARCH-type models, their performance is not satisfactory.
This is mainly due to the microstructure noise found in high-frequency financial data, e.g. bid-ask spreads. To circumvent this, Andersen et al. (2003)[1] suggested the realized volatility measure, in which the intraday returns are squared and summed for each day. This is a way to take advantage of the information in the higher frequencies without being prone to the modeling issues caused by microstructure noise.
Factor models, or principal component analysis, have exciting capabilities in financial econometrics, owing to their potential to reveal the latent common components that drive stock prices and volatilities. If we can pinpoint these hidden common financial attributes, we should be able to use them to make more accurate volatility forecasts.
2 Data and Realized Measures
The data is the same as used in Bollerslev et al. (2016)[5] and has been made publicly available by Andrew J. Patton. It is obtained from the NYSE TAQ database and consists of very high-frequency price data for 27 different Dow Jones stocks. The price data has been transformed into 5-minute realized volatilities. This procedure has several steps. First, the intraday log returns are calculated from the prices, and the returns are then aggregated to 5-minute intervals. After the aggregation, the realized volatility for a given day, t, is calculated as follows:
\[
RV_t = \sum_{j=1}^{m} r_j^2
\]
Here r_j is the return of the stock in the j-th intraday interval and m is the number of intraday intervals; for a 5-minute aggregation, m equals 78 for a given trading day. It might seem counterintuitive to aggregate the returns, since information is lost and we want to take advantage of the high-frequency data. However, there is a trade-off to be made: very high-frequency data often contains microstructure noise, such as bid-ask spreads, which we want to remove, and this can be done by aggregation. The choice of 5-minute aggregation might seem arbitrary, and more sophisticated realized measures certainly exist, the pre-averaging estimator and the realized kernels to name a few. Nevertheless, Liu et al. (2015)[10] conclude that it is difficult to find other realized measures that significantly outperform the 5-minute measure in accuracy; hence, that is the one used here.
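The computation above can be sketched as follows, assuming a pandas Series of intraday prices with a datetime index for a single trading day; the function and variable names are illustrative, not taken from the dataset's construction code.

```python
# Minimal sketch of the 5-minute realized volatility computation described
# above. Assumes `prices` is a pandas Series of intraday prices indexed by
# timestamp for one trading day; names are illustrative.
import numpy as np
import pandas as pd

def realized_volatility(prices: pd.Series, freq: str = "5min") -> float:
    """Sum of squared 5-minute log returns for a single trading day."""
    # Aggregate to 5-minute intervals by taking the last observed price
    sampled = prices.resample(freq).last().dropna()
    # Intraday log returns over the aggregated intervals
    returns = np.log(sampled).diff().dropna()
    # RV_t = sum_{j=1}^{m} r_j^2
    return float((returns ** 2).sum())
```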
The data spans from the 22nd of April 1997 to the 31st of December 2013 and contains 4202 trading days for 27 different Dow Jones stocks. The names of the stocks can be found in Table 1 below.
Ticker  Name                      Ticker  Name                 Ticker  Name
AXP     American Express          IBM     IBM                  NKE     Nike
BA      Boeing                    INTC    Intel                PFE     Pfizer
CAT     Caterpillar               JNJ     Johnson & Johnson    PG      Procter & Gamble
CSCO    Cisco                     JPM     JPMorgan Chase       TRV     Travelers Companies, Inc.
CVX     Chevron                   KO      Coca-Cola            UNH     UnitedHealth
DD      Du Pont                   MCD     McDonald's           UTX     United Technologies
DIS     The Walt Disney Company   MMM     3M                   VZ      Verizon
GE      General Electric          MRK     Merck                WMT     Wal-Mart
HD      The Home Depot            MSFT    Microsoft            XOM     Exxon Mobil Corporation

Table 1: Stock names and tickers
As seen in the table above, the data consists of stocks from various industries, ranging from fast-food restaurants and health companies to aircraft manufacturers and tech companies, which means that the correlations between most of the stocks should not be that large. This might serve as an advantage for the factor model, because if there had been large correlations between many of the stocks, the first factor would mostly pick out that co-movement. With smaller correlations, the factor model should be able to find linear combinations of the stocks with more variance, and thereby greater explanatory power.
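As a sketch of how such common components could be extracted, the snippet below applies PCA to a panel of daily realized volatilities and reports the share of variance explained by each factor; the `rv` DataFrame, its dimensions, and the number of factors are assumptions for illustration.

```python
# Minimal sketch: extract common factors from a panel of realized volatilities
# with PCA. Assumes `rv` is a pandas DataFrame of shape (days x stocks), e.g.
# 4202 x 27; names and the choice of three factors are illustrative.
import pandas as pd
from sklearn.decomposition import PCA

def common_factors(rv: pd.DataFrame, n_factors: int = 3):
    # Standardize each stock's series so that no single stock dominates
    standardized = (rv - rv.mean()) / rv.std()
    pca = PCA(n_components=n_factors)
    factors = pca.fit_transform(standardized)  # shape: (days, n_factors)
    # Share of total variance explained by each latent component
    return factors, pca.explained_variance_ratio_
```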