Estimated time 60 minutes plus independent work.
In this lab we will begin to explore time series data and series operations such as smoothing,
differencing, and using lags in a model.
Part 1 Data Smoothing and filtering
- Please open lab6b.dta. These data were obtained from the EIA and the U.S. BEA and contain
information on energy consumption by fuel and inflation-adjusted GDP, expressed as an index
number, from 1948 to 2020.
- Stata likes time series data to be in a table with the rows representing years and the columns
representing the variables in the model. The data from the EIA are already in this format but I
had to join them with the BEA data using the “transpose” function in Excel because the BEA
reported GDP with each year as a column and the components of GDP in the rows.
Although I have prepared the data for you already, if you are not familiar with the copy and
transpose function, please take a moment to look it up. In either case, include a brief instruction
on how to copy and transpose in Excel in your writeup for future use.
- With the data file open, it is now necessary to tell the software that these are time series data.
This is done using the command .tsset; use help to look this command up and write out the basic
command syntax.
- Visualize GDP and total energy consumption. Since these are time series data, a scatter plot
may not be very useful. .tsline will generate a line plot; generate line plots of GDP and total
energy consumption, then use the .twoway prefix as we did in prior labs to plot GDP and total
energy together. Assess your plots: does either suggest a time trend? Describe the general trend
and the shape of these data trends. What function would best describe them?
- Make a log transform of the GDP data using .gen to see if it produces a more linear pattern.
Label your new transformed variable lgdp. Plot this new variable on its own and with total
energy consumption. Based on visual analysis, did this transformation provide any useful
insights?
- Let us try applying a moving average smoothing transformation to these data using .tssmooth.
Use help to learn the syntax of this command. (You may have noticed the time series commands
start with ts, for time series.)
- Create a 5-year moving average of GDP using .tssmooth; call the new smoothed variable sm1
(smooth one). Compare the smoothed and raw GDP trends.
- Try one other type of smoother described in your textbook, label it sm2, and compare the
results with sm1.
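To make the arithmetic behind a 5-year moving average concrete, here is a minimal sketch in Python (not Stata, and not a substitute for looking up the .tssmooth syntax). The function name `centered_moving_average` and the GDP index values are made up for illustration. It assumes a centered window, meaning each year is averaged with the two years before it and the two years after it, and it leaves the ends of the series empty rather than averaging over a shorter window. Software may treat those endpoints differently, so check the help file.

```python
def centered_moving_average(values, half_window):
    """Average each point with `half_window` neighbours on each side.

    Points too close to either end of the series get None, because a
    full window is not available there (an assumption of this sketch).
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo, hi = i - half_window, i + half_window
        if lo < 0 or hi >= n:
            smoothed.append(None)          # not enough neighbours
        else:
            window = values[lo:hi + 1]     # 2 * half_window + 1 points
            smoothed.append(sum(window) / len(window))
    return smoothed


# Made-up annual GDP index values, for illustration only.
gdp_index = [100, 104, 103, 109, 114, 112, 120, 125]

# half_window=2 gives a 5-point (5-year) centered window.
sm = centered_moving_average(gdp_index, 2)
print(sm)
```

Notice that the smoothed series is shorter at both ends and that year-to-year wiggles (such as the dip from 104 to 103) are averaged away. That is the trade-off to keep in mind when comparing sm1 with the raw series.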
Part 2 Time Series Regression Intro
Please open the lab6 data in Stata. This file contains a reduced form of some data I generated for my
research. The data are both spatial, by state, AND time series. Remember that data that are time series
and have set identifiers, in this case a spatial id, are called panel data. Note that panels need not be
spatial; if you had data by time and any other grouping, it is still a panel. This could be research plots,
population groupings, or cohorts, among many other types of groupings. In this lab we will look at
the special considerations that need to be made when we work with this type of data.
- Let us begin by removing the panel aspect of the data by dropping every state except for MA
from the file using .drop and a logical expression. .drop will remove observations that meet a
specific criterion, so we want to drop if state is NOT equal to MA (the not-equal operator is !=).
Use help to figure out the appropriate syntax.
- We are going to be working with only 3 variables: population (n), personal income (aspipc), and
carbon emissions (stateco2emis). Postulate a testable hypothesis concerning these data by applying
concepts from lecture. Run a multivariate regression to test it with .regress. Do carbon
emissions increase with income and population size? With a minimal regression, what do we
find?
- This is not an acceptable way to test this question, for reasons that will soon be clear. Produce a
scatter plot of these variables with year on the x axis (list year last in the command) and
describe what you see.
- There appears to be at least the potential for a time trend, and therefore the data may not be
stationary and may have a unit root. We are going to test for the presence of a unit root using an
Augmented Dickey-Fuller test. We will cover this test in more detail later this semester when
we return to the concept of stationarity, but for the moment just understand that if our data
have a unit root, then they possess a trait that makes the values we observe depend on the period
in which they occur.
- Declare the data to be time series with the .tsset command and generate line plots of our 3
variables of interest.
- Again, the test we are going to use is called an Augmented Dickey-Fuller test, or ADF test. The
ADF test has the null hypothesis that there is in fact a unit root in the variable. We need to test
each of the variables for unit roots individually. The command is .dfuller followed by the
variable name. We can reject the null if our test statistic is more negative than the reported
critical values; the software will give us both our test statistic and critical values as well as a
p-value for the test. Which variables have unit roots?
- Since there are unit roots in our data, we need to account for them somehow. Remember from
our last lecture that a simple way to do this is by examining the change from one year to the next
rather than the level in one year or another. This is called the first difference. By adding the
prefix d. to a variable name, Stata will use the first difference instead of the level for a given
variable. There is no need to generate a new transformed variable to use differences.
- Repeat the regression using first differences for the 3 variables. How do the results change?
What you are seeing in the first simple regression is a spurious regression: we find significance
when in fact there is none. This will happen the majority of the time when OLS is run
on time series data where a unit root is present, and it is why ALL statistical analysis of time
series data is suspect if the time dimension is ignored.
- With time series modeling we have another tool at our disposal in addition to differencing: lags.
Lags refer to using past observations of our X variables in our model of present values of Y.
For example, we could consider how investments last year impact economic activity this year, or
how consumer behavior last year impacts production decisions this year. It is possible that
this year's carbon emissions are influenced by last year's population growth or income
levels, just as last year's purchases impact this year's energy use. (Note that levels refers to
the actual value of an observation; growth or change refers to a difference.) Let's include lagged
variables in our regression to see if this is the case, using the l. prefix with our variables.
l.gdp would give us the one-period lagged value of GDP, l2.gdp would give us the value from two
periods back, and so on. Run several combinations of lags and see what happens to the model.
- What is the best model you can produce using lags or other transformations of the variables?
Defend your assessment of your “best” model.
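If it helps to see what the d. and l. prefixes compute, here is a small Python sketch of the same arithmetic. The helper names `lag` and `first_difference` and the emissions numbers are invented for illustration, and the sketch assumes one observation per year with no gaps. It is not how Stata implements these operators, only what they mean.

```python
def lag(values, k=1):
    """Return the series shifted back k periods.

    The first k periods have no earlier observation, so they are None
    (the analogue of a missing value).
    """
    return ([None] * k + list(values))[:len(values)]


def first_difference(values):
    """Change from the previous period: value this year minus value last year."""
    previous = lag(values, 1)
    return [None if p is None else v - p for v, p in zip(values, previous)]


# Made-up annual emissions levels, for illustration only.
emissions = [50, 53, 52, 56, 60]

lag1 = lag(emissions, 1)            # last year's level
lag2 = lag(emissions, 2)            # level from two years back
diff = first_difference(emissions)  # year-over-year change

print(lag1)
print(lag2)
print(diff)
```

Each lag or difference costs observations at the start of the series, which is why a regression using l2. of a variable runs on fewer years than one using levels.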
Part 3 Revisiting GDP and Energy use data
- Let us return to the lab6b data. Although there are only a few steps in this part, each step
requires a bit of self-directed effort and exploration. Use the skills from Part 2 to examine
these data: test for a unit root in GDP, total energy consumption, and one other variable of your
choosing, and describe your findings. Don’t forget to tsset your data!
- Using the information from the prior step, build the most appropriate model using your two
energy variables. How do those energy variables relate to GDP?
- Now consider possible lagged relationships. Build three different models with varying lag
structures. Which one is best, and why? Consider the logic of the model you built: does it make
sense given your understanding of energy and GDP?
- Next, obtain quarterly GDP from the BEA. Select real GDP in chained dollars (T10106) for
each quarter since 1947. Pull out line 1, total GDP, copy and transpose these data into a new
sheet in the workbook, then bring these data into Stata. Note: when manipulating data in Excel,
be sure to format the cells as numbers prior to copying into Stata to avoid problems.
https://apps.bea.gov/national/Release/XLS/Survey/Section1All_xls.xlsx
- Save these data as lab6c.dta
- Tsset the data and generate a line plot of the series. Do you notice any evidence of cycles or
seasonality in the data when you visualize it? What is the general trend?
- Calculate a 3-year moving average of GDP (don’t forget these data are quarterly!) and name this
smoothed series sm3ygdp (Stata variable names cannot begin with a digit).
- Plot sm3ygdp and the raw data and compare your results. Compare and contrast these
results with those from Part 1 using the annual data. How valuable is this transformation for
these data?
- Use any other smoothing method from the text or a different moving average window on these
data and compare your results. Which smoothing method was the most useful in elucidating trend
vs noise?
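The reminder that the data are quarterly matters because the window is counted in observations, not years. The Python sketch below illustrates that conversion with invented names (`trailing_moving_average`, `QUARTERS_PER_YEAR`) and a made-up series. It assumes a trailing window (the current quarter plus the 11 before it), which is one reasonable choice because a 12-point window cannot be centered evenly on a single quarter. Other window layouts are equally defensible.

```python
def trailing_moving_average(values, window):
    """Average of the current observation and the (window - 1) before it.

    Returns None until a full window of observations is available.
    """
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


QUARTERS_PER_YEAR = 4
window = 3 * QUARTERS_PER_YEAR   # a 3-year window is 12 quarterly observations

# Made-up quarterly series with a steady upward trend, for illustration only.
quarterly_gdp = [100 + q for q in range(16)]

sm3y = trailing_moving_average(quarterly_gdp, window)
print(window)
print(sm3y)
```

With a trending series, a trailing average sits below the raw data (here the first smoothed value is 105.5 when the raw value is 111). Keep that lag in mind when you compare the smoothed and raw plots.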
Part 4 Your Project!
- To motivate you a bit on your project and practice the more tedious bits of data analysis, obtain
at least one time series that relates to the topic you are planning on investigating. It can be ANY
related time series. Include a full and proper citation of this series.
- Clean it up and get this series into Stata.
- Tsset the data, visualize it, and consider whether filtering or smoothing would be useful. If yes,
do so; if no, explain why you don’t think it would be helpful.
- Test the series to see if it contains a unit root using .dfuller. Does it?
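For intuition about what a unit root test is doing, the Python sketch below runs the simplest Dickey-Fuller regression (change in y on a constant and last period's level, with no extra lagged differences) on two simulated series. Everything here is an illustrative assumption: the function name `df_tstat`, the simulated data, and the seed. It is not Stata's implementation, and the resulting statistic must be compared with Dickey-Fuller critical values (roughly -2.9 at the 5% level for a sample of this size with a constant), not the usual t table.

```python
import random


def df_tstat(y):
    """t-statistic on gamma in: dy_t = alpha + gamma * y_{t-1} + e_t."""
    x = y[:-1]                                    # lagged levels y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]   # first differences
    n = len(x)
    x_bar = sum(x) / n
    dy_bar = sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    gamma = sxy / sxx                             # OLS slope
    alpha = dy_bar - gamma * x_bar                # OLS intercept
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = (rss / (n - 2) / sxx) ** 0.5       # standard error of the slope
    return gamma / se_gamma


rng = random.Random(0)                            # fixed seed, simulated data only
shocks = [rng.gauss(0, 1) for _ in range(200)]

# A random walk accumulates its shocks, so it has a unit root by construction.
random_walk = []
level = 0.0
for e in shocks:
    level += e
    random_walk.append(level)

# The shocks themselves are stationary white noise.
white_noise = shocks

t_rw = df_tstat(random_walk)
t_wn = df_tstat(white_noise)
print(round(t_rw, 2), round(t_wn, 2))
```

The white noise series should produce a strongly negative statistic (reject the unit root), while the random walk should produce one much closer to zero. That is the same contrast to look for in the .dfuller output for your own series.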