ARIMAX/SARIMAX/VAR
ARIMAX
ARIMAX model assumes that future values of a variable linearly depend on its past values, as well as on the values of past (stochastic) shocks. It is an extended version of the ARIMA model, with other independent (predictor) variables. The model is also referred to as the dynamic regression model. The X added to the end stands for “exogenous”. In other words, it suggests adding a separate different outside variable to help measure our endogenous variable.
The ‘exogenous’ variables added here are COVID-19 case numbers and COVID-19 vaccine rates. The number of COVID-19 cases in US can affect healthcare stock prices, e.g. UNH. Higher case numbers may result in increased demand for healthcare services, such as hospitalizations, treatments, and testing, which could positively impact the stock prices of healthcare companies involved in providing those services. Conversely, lower case numbers may lead to reduced demand for healthcare services, potentially resulting in lower stock prices for healthcare companies.
Vaccine rates, specifically the rate at which a population is vaccinated against COVID-19, can also impact healthcare stock prices. Higher vaccine rates are generally seen as positive for healthcare companies, as vaccines are considered a key tool in controlling the spread of the virus and reducing the severity of illness. Higher vaccine rates may lead to decreased demand for COVID-19 treatments and testing, but increased demand for vaccines and other preventive healthcare measures, which could positively impact the stock prices of healthcare companies involved in vaccine production or distribution, as well as other preventive healthcare services.
In this section, we choose the UNH stock price to fit ARIMAX model with COVID-19 case numbers and COVID-19 vaccine rates.
Step 1: Data Prepare
Show the code
getSymbols("UNH", from="2020-12-16", src="yahoo")
Show the code
<- as.data.frame(UNH)
UNH_df $Date <- rownames(UNH_df)
UNH_df<- UNH_df[c('Date','UNH.Adjusted')]
UNH_df <- UNH_df %>%
UNH_df mutate(Date = as.Date(Date)) %>%
complete(Date = seq.Date(min(Date), max(Date), by="day"))
# fill missing values in stock
<- UNH_df %>% fill(UNH.Adjusted)
UNH_df
<- seq(as.Date('2020-12-16'), as.Date('2023-3-21'),'week')
new_dates
<- UNH_df[which((UNH_df$Date) %in% new_dates),]
UNH_df
<- read.csv('data/vaccine_clean.csv')
vaccine_df
<- seq(as.Date('2020-12-16'), as.Date('2023-3-22'),'week')
new_dates
#vaccine_df
$Date <- as.Date(vaccine_df$Date)
vaccine_df<- vaccine_df[which((vaccine_df$Date) %in% new_dates),]
vaccine_df
#covid_df
<- read.csv('data/covid.csv')
covid_df
#covid_ts <- ts(covid_df$Weekly.Cases, start = c(2020,1,29), frequency = 54)
$Date <- as.Date(covid_df$Date)
covid_df<- covid_df[covid_df$Date >= '2020-12-16'&covid_df$Date < '2023-03-22',]
covid_df
# combine all data, create dataframe
<- data.frame(UNH_df, vaccine_df$total_doses, covid_df$Weekly.Cases)
df colnames(df) <- c('Date', 'stock_price', 'vaccine_dose','covid_case')
::kable(head(df)) knitr
Date | stock_price | vaccine_dose | covid_case |
---|---|---|---|
2020-12-16 | 329.2309 | 160010 | 1526464 |
2020-12-23 | 327.5330 | 577895 | 1499703 |
2020-12-30 | 334.7126 | 848447 | 1299023 |
2021-01-06 | 348.5672 | 1029958 | 1584212 |
2021-01-13 | 344.4632 | 1334188 | 1716354 |
2021-01-20 | 340.3883 | 1614134 | 1391546 |
Step 2: Plotting the Data
Show the code
<-ts(df,star=decimal_date(as.Date("2020-12-16",format = "%Y-%m-%d")),frequency = 52)
df.ts
autoplot(df.ts[,c(2:4)], facets=TRUE) +
xlab("Date") + ylab("") +
ggtitle("Variables influencing UNH Stock Price in USA")
UNH stock price, Covid, Vaccine values
Step 3: Fitting the model using ’auto.arima()`
Here I’m using auto.arima() function to fit the ARIMAX model. Here we are trying to predict UNH stock price using COVID vaccine dose and COVID cases. All variables are time series and the exogenous variables in this case are vaccine_dose and covid_case.
Show the code
<- cbind(Vac = df.ts[, "vaccine_dose"],
xreg Imp = df.ts[, "covid_case"])
<- auto.arima(df.ts[, "stock_price"], xreg = xreg)
fit summary(fit)
Series: df.ts[, "stock_price"]
Regression with ARIMA(0,1,0) errors
Coefficients:
Vac Imp
0e+00 0e+00
s.e. 1e-04 1e-04
sigma^2 = 177.9: log likelihood = -468.1
AIC=942.2 AICc=942.41 BIC=950.49
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 1.097616 13.16639 10.30662 0.233993 2.273228 0.1046469 0.04011524
Show the code
checkresiduals(fit)
Ljung-Box test
data: Residuals from Regression with ARIMA(0,1,0) errors
Q* = 21.112, df = 24, p-value = 0.6321
Model df: 0. Total lags used: 24
This is an ARIMA model. This is a Regression model with ARIMA(0,1,0) errors.
Step 4: Fitting the model manually
Here we will first have to fit the linear regression model predicting stock price using Covid cases and vaccine doses.
Then for the residuals, we will fit an ARIMA/SARIMA model.
Show the code
$stock_price <- ts(df$stock_price,star=decimal_date(as.Date("2020-12-16",format = "%Y-%m-%d")),frequency = 52)
df$vaccine_dose <-ts(df$vaccine_dose,star=decimal_date(as.Date("2020-12-16",format = "%Y-%m-%d")),frequency = 52)
df$covid_case<-ts(df$covid_case,star=decimal_date(as.Date("2020-12-16",format = "%Y-%m-%d")),frequency = 52)
df
######### First fit the linear model#######
<- lm(stock_price ~ vaccine_dose+covid_case, data=df)
fit.reg summary(fit.reg)
Call:
lm(formula = stock_price ~ vaccine_dose + covid_case, data = df)
Residuals:
Min 1Q Median 3Q Max
-154.24 -42.59 8.62 42.91 82.36
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.954e+02 7.932e+00 62.461 < 2e-16 ***
vaccine_dose -4.341e-05 5.043e-06 -8.607 4.47e-14 ***
covid_case -3.275e-06 5.345e-06 -0.613 0.541
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 52.02 on 115 degrees of freedom
Multiple R-squared: 0.3944, Adjusted R-squared: 0.3839
F-statistic: 37.45 on 2 and 115 DF, p-value: 2.988e-13
Show the code
<-ts(residuals(fit.reg),star=decimal_date(as.Date("2020-12-16",format = "%Y-%m-%d")),frequency = 52)
res.fit
########## Then look at the residuals ########
acf(res.fit)
Show the code
Pacf(res.fit)
Show the code
%>% diff() %>% ggtsdisplay() res.fit
Show the code
%>% diff() %>% diff(52) %>% ggtsdisplay() res.fit
Finding the model parameters.
Show the code
=1
d=1
i= data.frame()
temp=matrix(rep(NA,6*23),nrow=23) # roughly nrow = 3x4x2
ls
for (p in 1:3)# p=1,2,
{for(q in 1:3)# q=1,2,
{for(d in 0:1)#
{
if(p-1+d+q-1<=8)
{
<- Arima(res.fit,order=c(p-1,d,q-1),include.drift=TRUE)
model= c(p-1,d,q-1,model$aic,model$bic,model$aicc)
ls[i,]=i+1
i#print(i)
}
}
}
}
= as.data.frame(ls)
tempnames(temp)= c("p","d","q","AIC","BIC","AICc")
#temp
::kable(temp) knitr
p | d | q | AIC | BIC | AICc |
---|---|---|---|---|---|
0 | 0 | 0 | 1217.4187 | 1225.7307 | 1217.6292 |
0 | 1 | 0 | 993.5902 | 999.1146 | 993.6955 |
0 | 0 | 1 | 1117.7725 | 1128.8552 | 1118.1264 |
0 | 1 | 1 | 993.1256 | 1001.4121 | 993.3380 |
0 | 0 | 2 | 1058.3969 | 1072.2503 | 1058.9326 |
0 | 1 | 2 | 994.2046 | 1005.2533 | 994.5617 |
1 | 0 | 0 | 1005.4074 | 1016.4901 | 1005.7614 |
1 | 1 | 0 | 992.7214 | 1001.0079 | 992.9338 |
1 | 0 | 1 | 1003.8974 | 1017.7508 | 1004.4331 |
1 | 1 | 1 | 994.2405 | 1005.2892 | 994.5976 |
1 | 0 | 2 | 1003.9563 | 1020.5804 | 1004.7130 |
1 | 1 | 2 | 995.7980 | 1009.6089 | 996.3386 |
2 | 0 | 0 | 1002.8800 | 1016.7334 | 1003.4157 |
2 | 1 | 0 | 994.1790 | 1005.2277 | 994.5362 |
2 | 0 | 1 | 999.6163 | 1016.2404 | 1000.3730 |
2 | 1 | 1 | 995.8971 | 1009.7080 | 996.4376 |
2 | 0 | 2 | 1005.7699 | 1025.1647 | 1006.7881 |
2 | 1 | 2 | 995.6381 | 1012.2111 | 996.4017 |
NA | NA | NA | NA | NA | NA |
NA | NA | NA | NA | NA | NA |
NA | NA | NA | NA | NA | NA |
NA | NA | NA | NA | NA | NA |
NA | NA | NA | NA | NA | NA |
Show the code
print('Minimum AIC')
[1] "Minimum AIC"
Show the code
which.min(temp$AIC),] temp[
p d q AIC BIC AICc
8 1 1 0 992.7214 1001.008 992.9338
Show the code
print('Minimum BIC')
[1] "Minimum BIC"
Show the code
which.min(temp$BIC),] temp[
p d q AIC BIC AICc
2 0 1 0 993.5902 999.1146 993.6955
Show the code
print('Minimum AICc')
[1] "Minimum AICc"
Show the code
which.min(temp$AICc),] temp[
p d q AIC BIC AICc
8 1 1 0 992.7214 1001.008 992.9338
Show the code
set.seed(1234)
<- capture.output(sarima(res.fit, 0,1,0)) model_output12
Show the code
cat(model_output12[9:38], model_output12[length(model_output12)], sep = "\n")
$fit
Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
REPORT = 1, reltol = tol))
Coefficients:
constant
1.0885
s.e. 1.5357
sigma^2 estimated as 275.9: log likelihood = -494.8, aic = 993.59
$degrees_of_freedom
[1] 116
$ttable
Estimate SE t.value p.value
constant 1.0885 1.5357 0.7088 0.4799
$AIC
[1] 8.492224
$AICc
[1] 8.492522
$BIC
[1] 8.539441
Show the code
set.seed(1234)
<- capture.output(sarima(res.fit, 1,1,0)) model_output13
Show the code
cat(model_output13[16:46], model_output13[length(model_output13)], sep = "\n")
$fit
Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
REPORT = 1, reltol = tol))
Coefficients:
ar1 constant
0.1556 1.1047
s.e. 0.0913 1.7935
sigma^2 estimated as 269.2: log likelihood = -493.36, aic = 992.72
$degrees_of_freedom
[1] 115
$ttable
Estimate SE t.value p.value
ar1 0.1556 0.0913 1.7049 0.0909
constant 1.1047 1.7935 0.6160 0.5391
$AIC
[1] 8.484798
$AICc
[1] 8.485698
$BIC
[1] 8.555623
ARIMA(0,1,0) and ARIMA(1,1,0) both look okay.
Step 5: Using Cross Validation
Show the code
<- 36 # minimum data length for fitting a model
k <- length(res.fit)
n -k # rest of the observations n
[1] 82
Show the code
=1
i= c()
err1 = c()
err2
<- c()
rmse1 <- c()
rmse2
for(i in 1:(n-k))
{<- res.fit[1:(k-1)+i] #observations from 1 to 75
xtrain <- res.fit[k+i] #76th observation as the test set
xtest
<- Arima(xtrain, order=c(1,1,0),include.drift=FALSE, method="ML")
fit <- forecast(fit, h=1)
fcast1
<- Arima(xtrain, order=c(0,1,0),include.drift=FALSE, method="ML")
fit2 <- forecast(fit2, h=1)
fcast2
#capture error for each iteration
# This is mean absolute error
= c(err1, abs(fcast1$mean-xtest))
err1 = c(err2, abs(fcast2$mean-xtest))
err2
# This is mean squared error
= c(err1, (fcast1$mean-xtest)^2)
err3 = c(err2, (fcast2$mean-xtest)^2)
err4
<- c(rmse1, sqrt((fcast1$mean-xtest)^2))
rmse1 <- c(rmse2, sqrt((fcast2$mean-xtest)^2))
rmse2
}
MAE1=mean(err1)) # This is mean absolute error (
[1] 13.45554
Show the code
MAE2=mean(err2)) #has slightly higher error (
[1] 13.46804
Show the code
=mean(err1) #fit 1,1,0
MSE1=mean(err2)#fit 0,1,0
MSE2
MSE1
[1] 13.45554
Show the code
MSE2
[1] 13.46804
Show the code
<- data.frame(rmse1,rmse2)
rmse_df $x <- as.numeric(rownames(rmse_df))
rmse_df
plot(rmse_df$x, rmse_df$rmse1, type = 'l', col=2, xlab="horizon", ylab="RMSE")
lines(rmse_df$x, rmse_df$rmse2, type="l",col=3)
legend("topleft",legend=c("fit1","fit2"),col=2:4,lty=1)
Step 6: forcasting
Show the code
<-auto.arima(df$vaccine_dose) #fiting an ARIMA model to the vaccine_dose variable
vac_fitsummary(vac_fit)
Series: df$vaccine_dose
ARIMA(1,1,1)
Coefficients:
ar1 ma1
0.8341 -0.6298
s.e. 0.0916 0.1128
sigma^2 = 4.824e+10: log likelihood = -1604.18
AIC=3214.36 AICc=3214.57 BIC=3222.65
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -1592.486 216829.1 132002.8 0.0009291879 13.08388 0.1149613
ACF1
Training set -0.06829849
Show the code
<-forecast(vac_fit)
fvac
<-auto.arima(df$covid_case) #fiting an ARIMA model to the covid_case variable
covid_fitsummary(covid_fit)
Series: df$covid_case
ARIMA(2,0,2) with non-zero mean
Coefficients:
ar1 ar2 ma1 ma2 mean
1.3306 -0.5128 0.8165 0.4848 733469.2
s.e. 0.1153 0.1117 0.1363 0.1076 189902.4
sigma^2 = 2.904e+10: log likelihood = -1588.94
AIC=3189.87 AICc=3190.63 BIC=3206.5
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -424.0988 166757.6 99191.78 -6.959882 17.02194 0.1095754
ACF1
Training set -0.00122146
Show the code
<-forecast(covid_fit)
fcov
<- cbind(Vac = fvac$mean,
fxreg Cov = fcov$mean)
<- forecast(fit, xreg=fxreg) #fimp$mean gives the forecasted values fcast
Warning in forecast.forecast_ARIMA(fit, xreg = fxreg): xreg not required by this
model, ignoring the provided regressors
Show the code
autoplot(fcast) + xlab("Date") +
ylab("Price")
Discussion
Based on the result above, the number of daily COVID-19 vaccination number is not significant on predicting the UNH stock price, while the number of COVID-19 cases is. Though the number of COVID-19 vaccination number is not significant here, it does not mean that it does not affect healthcare stock price. While UNH is a healthcare company that is involved in health insurance, healthcare services, and technology solutions, it is not directly involved in COVID-19 vaccine development or production. Therefore, the number of COVID-19 vaccine doses administered may not have a direct impact on UNH’s operations or revenue.
VAR
VAR models (vector autoregressive models) are used for multivariate time series. The structure is that each variable is a linear function of past lags of itself and past lags of the other variables. A Vector autoregressive (VAR) model is useful when one is interested in predicting multiple time series variables using a single model.
The variables we are interested here are GDP and Unemployment rate in US. The overall health of the economy, as reflected by the GDP, can influence healthcare stock prices. A strong GDP generally indicates a healthy economy with higher levels of consumer spending, business investment, and economic growth. In such an environment, healthcare companies may experience increased demand for their products and services, which could positively impact their stock prices. Conversely, a weak GDP may signal an economic slowdown or recession, which could result in reduced demand for healthcare products and services, potentially leading to lower stock prices for healthcare companies. The unemployment rate, which reflects the percentage of the labor force that is unemployed, can also affect healthcare stock prices. A low unemployment rate is generally indicative of a strong labor market, with more people employed and potentially having access to employer-sponsored healthcare benefits. This may result in increased demand for healthcare services, positively impacting healthcare stock prices. On the other hand, a high unemployment rate may indicate a weak labor market, with more people losing their jobs and potentially losing access to healthcare benefits, leading to reduced demand for healthcare services and potentially lower stock prices for healthcare companies.
In this section, I’m fitting a VAR model to find multivariate relationship between the series UNH stock price, GDP, and Employment.
Step 1: Data Preparing
Show the code
getSymbols("UNH", from="1989-12-29", src="yahoo")
[1] "UNH"
Show the code
<- as.data.frame(UNH)
UNH_df $Date <- rownames(UNH_df)
UNH_df<- UNH_df[c('Date','UNH.Adjusted')]
UNH_df <- UNH_df %>%
UNH_df mutate(Date = as.Date(Date)) %>%
complete(Date = seq.Date(min(Date), max(Date), by="day"))
# fill missing values in stock
<- UNH_df %>% fill(UNH.Adjusted)
UNH_df #UNH_df
<- seq(as.Date('1990-01-01'), as.Date('2022-10-01'),'quarter')
new_dates #new_dates
<- UNH_df[which((UNH_df$Date) %in% new_dates),]
UNH_df
<- read.csv('data/GDP59.CSV')
gdp $DATE <- as.Date(gdp$DATE)
gdp<- gdp[which((gdp$DATE) %in% new_dates),]
gdp
<- read.csv('data/Employment59.CSV')
emp $DATE <- as.Date(emp$DATE)
emp<- emp[which((emp$DATE) %in% new_dates),] emp
Show the code
<- data.frame(UNH_df,gdp,emp)
dd <- dd[,c(1,2,4,6)]
dd colnames(dd) <- c('DATE', 'stock_price','GDP','Employment')
::kable(head(dd)) knitr
DATE | stock_price | GDP | Employment | |
---|---|---|---|---|
125 | 1990-01-01 | 0.308110 | 5872.701 | 109418.7 |
126 | 1990-04-01 | 0.247759 | 5960.028 | 109786.7 |
127 | 1990-07-01 | 0.438343 | 6015.116 | 109651.0 |
128 | 1990-10-01 | 0.435356 | 6004.733 | 109255.0 |
129 | 1991-01-01 | 0.591069 | 6035.178 | 108783.0 |
130 | 1991-04-01 | 0.985114 | 6126.862 | 108312.3 |
Step 2: Plotting the data
Show the code
<-ts(dd,star=decimal_date(as.Date("1990-01-01",format = "%Y-%m-%d")),frequency = 4)
dd.ts
autoplot(dd.ts[,c(2:4)], facets=TRUE) +
xlab("Year") + ylab("") +
ggtitle("UNH Stock Price, GDP and Employment in USA")
Step 3: Fitting a VAR model
Show the code
VARselect(dd[, c(2:4)], lag.max=10, type="both")
$selection
AIC(n) HQ(n) SC(n) FPE(n)
10 9 9 9
$criteria
1 2 3 4 5
AIC(n) 2.941404e+01 2.865601e+01 2.849637e+01 2.814189e+01 2.812288e+01
HQ(n) 2.955407e+01 2.888005e+01 2.880444e+01 2.853398e+01 2.859898e+01
SC(n) 2.975880e+01 2.920762e+01 2.925484e+01 2.910721e+01 2.929506e+01
FPE(n) 5.948597e+12 2.788633e+12 2.379347e+12 1.671830e+12 1.644336e+12
6 7 8 9 10
AIC(n) 2.758294e+01 2.723307e+01 2.696700e+01 2.669062e+01 2.668670e+01
HQ(n) 2.814305e+01 2.787720e+01 2.769516e+01 2.750279e+01 2.758289e+01
SC(n) 2.896196e+01 2.881895e+01 2.875974e+01 2.869021e+01 2.889314e+01
FPE(n) 9.616264e+11 6.809549e+11 5.251270e+11 4.014819e+11 4.038690e+11
It’s clear that according to selection criteria p=10 and 9 are good.
I’m fitting several models with p=1(for simplicity), 5, and 9.=> VAR(1), VAR(5), VAR(9)
Show the code
summary(VAR(dd[, c(2:4)], p=1, type='both'))
VAR Estimation Results:
=========================
Endogenous variables: stock_price, GDP, Employment
Deterministic variables: both
Sample size: 131
Log Likelihood: -2461.36
Roots of the characteristic polynomial:
1.028 0.9422 0.78
Call:
VAR(y = dd[, c(2:4)], p = 1, type = "both")
Estimation results for equation stock_price:
============================================
stock_price = stock_price.l1 + GDP.l1 + Employment.l1 + const + trend
Estimate Std. Error t value Pr(>|t|)
stock_price.l1 1.0341243 0.0394781 26.195 <2e-16 ***
GDP.l1 -0.0005115 0.0039797 -0.129 0.8979
Employment.l1 -0.0006163 0.0003038 -2.029 0.0446 *
const 69.2145818 34.7475964 1.992 0.0485 *
trend 0.3006699 0.4555344 0.660 0.5104
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14.62 on 126 degrees of freedom
Multiple R-Squared: 0.9849, Adjusted R-squared: 0.9844
F-statistic: 2057 on 4 and 126 DF, p-value: < 2.2e-16
Estimation results for equation GDP:
====================================
GDP = stock_price.l1 + GDP.l1 + Employment.l1 + const + trend
Estimate Std. Error t value Pr(>|t|)
stock_price.l1 2.55503 0.64584 3.956 0.000126 ***
GDP.l1 0.79020 0.06511 12.137 < 2e-16 ***
Employment.l1 -0.01069 0.00497 -2.151 0.033349 *
const 2355.16290 568.45522 4.143 6.24e-05 ***
trend 27.89290 7.45234 3.743 0.000275 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 239.2 on 126 degrees of freedom
Multiple R-Squared: 0.998, Adjusted R-squared: 0.998
F-statistic: 1.607e+04 on 4 and 126 DF, p-value: < 2.2e-16
Estimation results for equation Employment:
===========================================
Employment = stock_price.l1 + GDP.l1 + Employment.l1 + const + trend
Estimate Std. Error t value Pr(>|t|)
stock_price.l1 6.183e+00 4.862e+00 1.272 0.20584
GDP.l1 -5.610e-01 4.901e-01 -1.145 0.25458
Employment.l1 9.253e-01 3.741e-02 24.733 < 2e-16 ***
const 1.161e+04 4.279e+03 2.712 0.00762 **
trend 8.544e+01 5.610e+01 1.523 0.13028
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1801 on 126 degrees of freedom
Multiple R-Squared: 0.9786, Adjusted R-squared: 0.9779
F-statistic: 1441 on 4 and 126 DF, p-value: < 2.2e-16
Covariance matrix of residuals:
stock_price GDP Employment
stock_price 213.8 1386 12148
GDP 1385.8 57224 396720
Employment 12147.6 396720 3242998
Correlation matrix of residuals:
stock_price GDP Employment
stock_price 1.0000 0.3962 0.4613
GDP 0.3962 1.0000 0.9209
Employment 0.4613 0.9209 1.0000
Show the code
summary(VAR(dd[, c(2:4)], p=5, type='both'))
VAR Estimation Results:
=========================
Endogenous variables: stock_price, GDP, Employment
Deterministic variables: both
Sample size: 127
Log Likelihood: -2267.863
Roots of the characteristic polynomial:
1.054 0.9435 0.9435 0.8781 0.8781 0.7952 0.7952 0.7806 0.7806 0.7709 0.7709 0.6959 0.4107 0.4107 0.3355
Call:
VAR(y = dd[, c(2:4)], p = 5, type = "both")
Estimation results for equation stock_price:
============================================
stock_price = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + const + trend
Estimate Std. Error t value Pr(>|t|)
stock_price.l1 5.454e-01 1.048e-01 5.204 9.13e-07 ***
GDP.l1 6.007e-02 1.789e-02 3.358 0.00108 **
Employment.l1 -7.709e-03 2.385e-03 -3.232 0.00162 **
stock_price.l2 2.323e-01 1.468e-01 1.583 0.11639
GDP.l2 -6.188e-02 2.695e-02 -2.297 0.02353 *
Employment.l2 8.708e-03 4.117e-03 2.115 0.03668 *
stock_price.l3 -6.701e-02 1.636e-01 -0.410 0.68293
GDP.l3 1.256e-02 2.614e-02 0.480 0.63198
Employment.l3 -3.778e-03 4.122e-03 -0.917 0.36136
stock_price.l4 2.159e-01 1.796e-01 1.202 0.23192
GDP.l4 -1.114e-02 2.275e-02 -0.490 0.62525
Employment.l4 2.759e-03 3.733e-03 0.739 0.46146
stock_price.l5 2.480e-01 1.956e-01 1.268 0.20751
GDP.l5 -1.060e-02 1.379e-02 -0.769 0.44375
Employment.l5 -6.394e-04 2.197e-03 -0.291 0.77155
const 1.189e+02 3.723e+01 3.194 0.00183 **
trend 1.493e+00 5.610e-01 2.661 0.00897 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.06 on 110 degrees of freedom
Multiple R-Squared: 0.9909, Adjusted R-squared: 0.9896
F-statistic: 750.5 on 16 and 110 DF, p-value: < 2.2e-16
Estimation results for equation GDP:
====================================
GDP = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + const + trend
Estimate Std. Error t value Pr(>|t|)
stock_price.l1 -1.30445 1.54245 -0.846 0.399555
GDP.l1 1.92765 0.26327 7.322 4.28e-11 ***
Employment.l1 -0.18525 0.03510 -5.277 6.65e-07 ***
stock_price.l2 7.85075 2.15990 3.635 0.000425 ***
GDP.l2 -0.49800 0.39656 -1.256 0.211852
Employment.l2 0.14315 0.06059 2.363 0.019907 *
stock_price.l3 1.97665 2.40806 0.821 0.413510
GDP.l3 -0.68441 0.38474 -1.779 0.078023 .
Employment.l3 0.07846 0.06066 1.293 0.198571
stock_price.l4 -3.50097 2.64351 -1.324 0.188127
GDP.l4 -0.53338 0.33481 -1.593 0.114012
Employment.l4 0.03338 0.05493 0.608 0.544643
stock_price.l5 -4.52830 2.87918 -1.573 0.118643
GDP.l5 0.62554 0.20300 3.081 0.002603 **
Employment.l5 -0.07633 0.03233 -2.361 0.019978 *
const 1593.63764 547.91857 2.909 0.004394 **
trend 21.22796 8.25642 2.571 0.011473 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 177.5 on 110 degrees of freedom
Multiple R-Squared: 0.999, Adjusted R-squared: 0.9988
F-statistic: 6815 on 16 and 110 DF, p-value: < 2.2e-16
Estimation results for equation Employment:
===========================================
Employment = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + const + trend
Estimate Std. Error t value Pr(>|t|)
stock_price.l1 -3.786e+01 1.190e+01 -3.182 0.00190 **
GDP.l1 6.787e+00 2.030e+00 3.343 0.00113 **
Employment.l1 -1.385e-02 2.707e-01 -0.051 0.95928
stock_price.l2 7.770e+01 1.666e+01 4.664 8.77e-06 ***
GDP.l2 -2.705e+00 3.058e+00 -0.885 0.37831
Employment.l2 6.207e-01 4.673e-01 1.328 0.18686
stock_price.l3 4.014e+00 1.857e+01 0.216 0.82927
GDP.l3 -8.103e+00 2.967e+00 -2.731 0.00736 **
Employment.l3 1.098e+00 4.678e-01 2.346 0.02075 *
stock_price.l4 -1.641e+01 2.039e+01 -0.805 0.42269
GDP.l4 -1.499e+00 2.582e+00 -0.581 0.56270
Employment.l4 -1.900e-01 4.237e-01 -0.448 0.65470
stock_price.l5 -2.700e+01 2.221e+01 -1.216 0.22657
GDP.l5 4.674e+00 1.566e+00 2.985 0.00349 **
Employment.l5 -5.865e-01 2.493e-01 -2.352 0.02044 *
const 1.222e+04 4.226e+03 2.891 0.00462 **
trend 1.203e+02 6.368e+01 1.890 0.06146 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1369 on 110 degrees of freedom
Multiple R-Squared: 0.9879, Adjusted R-squared: 0.9861
F-statistic: 560.4 on 16 and 110 DF, p-value: < 2.2e-16
Covariance matrix of residuals:
stock_price GDP Employment
stock_price 145.5 455.4 5503
GDP 455.4 31506.6 225145
Employment 5503.3 225144.8 1874014
Correlation matrix of residuals:
stock_price GDP Employment
stock_price 1.0000 0.2127 0.3333
GDP 0.2127 1.0000 0.9266
Employment 0.3333 0.9266 1.0000
Show the code
summary(VAR(dd[, c(2:4)], p=9, type='both'))
VAR Estimation Results:
=========================
Endogenous variables: stock_price, GDP, Employment
Deterministic variables: both
Sample size: 123
Log Likelihood: -2076.371
Roots of the characteristic polynomial:
1.058 1.051 1.018 1.018 1.012 1.012 1.012 1.012 1.005 1.005 1.003 1.003 0.9793 0.9793 0.9726 0.9726 0.8256 0.8256 0.8098 0.8098 0.7926 0.7926 0.6662 0.6662 0.4868 0.4868 0.0237
Call:
VAR(y = dd[, c(2:4)], p = 9, type = "both")
Estimation results for equation stock_price:
============================================
stock_price = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + stock_price.l6 + GDP.l6 + Employment.l6 + stock_price.l7 + GDP.l7 + Employment.l7 + stock_price.l8 + GDP.l8 + Employment.l8 + stock_price.l9 + GDP.l9 + Employment.l9 + const + trend
Estimate Std. Error t value Pr(>|t|)
stock_price.l1 0.6177986 0.1140057 5.419 4.61e-07 ***
GDP.l1 0.0415175 0.0139527 2.976 0.003717 **
Employment.l1 -0.0069645 0.0020733 -3.359 0.001131 **
stock_price.l2 0.3603716 0.1652332 2.181 0.031680 *
GDP.l2 -0.0157162 0.0211664 -0.743 0.459633
Employment.l2 0.0048987 0.0033424 1.466 0.146093
stock_price.l3 0.0177119 0.1607489 0.110 0.912499
GDP.l3 -0.0009781 0.0212811 -0.046 0.963439
Employment.l3 -0.0018913 0.0034992 -0.540 0.590135
stock_price.l4 -0.1568989 0.1711382 -0.917 0.361596
GDP.l4 -0.0352143 0.0218848 -1.609 0.110953
Employment.l4 0.0072039 0.0035652 2.021 0.046168 *
stock_price.l5 0.3383735 0.2044761 1.655 0.101294
GDP.l5 -0.0156536 0.0222246 -0.704 0.482964
Employment.l5 0.0016545 0.0036673 0.451 0.652934
stock_price.l6 0.1416245 0.2112799 0.670 0.504299
GDP.l6 -0.0184066 0.0221573 -0.831 0.408236
Employment.l6 -0.0012687 0.0036506 -0.348 0.728965
stock_price.l7 -0.9691876 0.2444899 -3.964 0.000144 ***
GDP.l7 0.0190303 0.0219980 0.865 0.389190
Employment.l7 -0.0017353 0.0035712 -0.486 0.628163
stock_price.l8 1.0643895 0.2599676 4.094 8.96e-05 ***
GDP.l8 0.0123808 0.0212167 0.584 0.560927
Employment.l8 -0.0058935 0.0033845 -1.741 0.084893 .
stock_price.l9 -0.2681893 0.2116088 -1.267 0.208149
GDP.l9 0.0063637 0.0130395 0.488 0.626663
Employment.l9 0.0035409 0.0021429 1.652 0.101786
const 79.8647702 39.9369777 2.000 0.048411 *
trend 0.9524397 0.5617195 1.696 0.093276 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.687 on 94 degrees of freedom
Multiple R-Squared: 0.9959, Adjusted R-squared: 0.9947
F-statistic: 819 on 28 and 94 DF, p-value: < 2.2e-16
Estimation results for equation GDP:
====================================
GDP = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + stock_price.l6 + GDP.l6 + Employment.l6 + stock_price.l7 + GDP.l7 + Employment.l7 + stock_price.l8 + GDP.l8 + Employment.l8 + stock_price.l9 + GDP.l9 + Employment.l9 + const + trend
Estimate Std. Error t value Pr(>|t|)
stock_price.l1 -5.605e+00 1.777e+00 -3.154 0.00217 **
GDP.l1 1.818e+00 2.175e-01 8.360 5.58e-13 ***
Employment.l1 -1.497e-01 3.232e-02 -4.631 1.17e-05 ***
stock_price.l2 1.352e+01 2.576e+00 5.250 9.43e-07 ***
GDP.l2 -7.098e-01 3.300e-01 -2.151 0.03403 *
Employment.l2 1.262e-01 5.211e-02 2.422 0.01734 *
stock_price.l3 -1.116e+00 2.506e+00 -0.445 0.65722
GDP.l3 -2.534e-01 3.318e-01 -0.764 0.44683
Employment.l3 3.585e-02 5.455e-02 0.657 0.51262
stock_price.l4 1.673e+00 2.668e+00 0.627 0.53208
GDP.l4 -8.506e-02 3.412e-01 -0.249 0.80365
Employment.l4 1.384e-02 5.558e-02 0.249 0.80384
stock_price.l5 -7.495e-01 3.188e+00 -0.235 0.81462
GDP.l5 1.171e-01 3.465e-01 0.338 0.73604
Employment.l5 -2.497e-02 5.717e-02 -0.437 0.66327
stock_price.l6 -1.389e+01 3.294e+00 -4.216 5.71e-05 ***
GDP.l6 2.705e-02 3.454e-01 0.078 0.93775
Employment.l6 -9.452e-03 5.691e-02 -0.166 0.86846
stock_price.l7 -1.613e+00 3.812e+00 -0.423 0.67319
GDP.l7 -2.916e-01 3.429e-01 -0.850 0.39727
Employment.l7 7.971e-02 5.567e-02 1.432 0.15551
stock_price.l8 7.701e+00 4.053e+00 1.900 0.06049 .
GDP.l8 9.947e-02 3.308e-01 0.301 0.76428
Employment.l8 -1.132e-01 5.276e-02 -2.144 0.03457 *
stock_price.l9 1.073e+00 3.299e+00 0.325 0.74571
GDP.l9 1.156e-01 2.033e-01 0.569 0.57100
Employment.l9 3.477e-02 3.341e-02 1.041 0.30064
const 1.603e+03 6.226e+02 2.575 0.01159 *
trend 2.134e+01 8.757e+00 2.437 0.01669 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 135.4 on 94 degrees of freedom
Multiple R-Squared: 0.9995, Adjusted R-squared: 0.9993
F-statistic: 6211 on 28 and 94 DF, p-value: < 2.2e-16
Estimation results for equation Employment:
===========================================
Employment = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + stock_price.l6 + GDP.l6 + Employment.l6 + stock_price.l7 + GDP.l7 + Employment.l7 + stock_price.l8 + GDP.l8 + Employment.l8 + stock_price.l9 + GDP.l9 + Employment.l9 + const + trend
Estimate Std. Error t value Pr(>|t|)
stock_price.l1 -6.801e+01 1.216e+01 -5.595 2.17e-07 ***
GDP.l1 6.439e+00 1.488e+00 4.328 3.75e-05 ***
Employment.l1 1.372e-01 2.211e-01 0.621 0.53643
stock_price.l2 1.137e+02 1.762e+01 6.453 4.74e-09 ***
GDP.l2 -4.116e+00 2.257e+00 -1.824 0.07136 .
Employment.l2 5.335e-01 3.564e-01 1.497 0.13778
stock_price.l3 -4.159e+00 1.714e+01 -0.243 0.80882
GDP.l3 -3.838e+00 2.269e+00 -1.691 0.09408 .
Employment.l3 5.003e-01 3.731e-01 1.341 0.18318
stock_price.l4 1.066e+01 1.825e+01 0.584 0.56038
GDP.l4 -6.501e-02 2.334e+00 -0.028 0.97783
Employment.l4 1.168e-01 3.802e-01 0.307 0.75925
stock_price.l5 2.259e+01 2.180e+01 1.036 0.30277
GDP.l5 8.889e-01 2.370e+00 0.375 0.70845
Employment.l5 -2.821e-01 3.911e-01 -0.721 0.47245
stock_price.l6 -1.130e+02 2.253e+01 -5.017 2.48e-06 ***
GDP.l6 -1.567e+00 2.363e+00 -0.663 0.50891
Employment.l6 2.161e-01 3.893e-01 0.555 0.58019
stock_price.l7 -2.070e+01 2.607e+01 -0.794 0.42930
GDP.l7 -1.096e+00 2.346e+00 -0.467 0.64133
Employment.l7 4.753e-01 3.808e-01 1.248 0.21512
stock_price.l8 8.864e+01 2.772e+01 3.198 0.00189 **
GDP.l8 1.933e+00 2.262e+00 0.854 0.39515
Employment.l8 -1.204e+00 3.609e-01 -3.336 0.00122 **
stock_price.l9 -3.272e+01 2.256e+01 -1.450 0.15042
GDP.l9 8.076e-01 1.390e+00 0.581 0.56273
Employment.l9 4.222e-01 2.285e-01 1.848 0.06777 .
const 1.262e+04 4.259e+03 2.963 0.00386 **
trend 9.703e+01 5.990e+01 1.620 0.10859
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 926.4 on 94 degrees of freedom
Multiple R-Squared: 0.9945, Adjusted R-squared: 0.9928
F-statistic: 603.2 on 28 and 94 DF, p-value: < 2.2e-16
Covariance matrix of residuals:
stock_price GDP Employment
stock_price 75.47 290.5 2750
GDP 290.52 18342.3 112144
Employment 2749.66 112143.8 858125
Correlation matrix of residuals:
stock_price GDP Employment
stock_price 1.0000 0.2469 0.3417
GDP 0.2469 1.0000 0.8939
Employment 0.3417 0.8939 1.0000
Step 4: Using Cross Validation
Show the code
=length(dd$stock_price)
n=39
k
#n-k=92; 92/4=23;
<- matrix(NA, 96,3)
rmse1 <- matrix(NA, 96,3)
rmse2 <- matrix(NA,23,4)
rmse3 <-c()
year
# Convert data frame to time series object
<- ts(dd[, c(2:4)], star=decimal_date(as.Date("1990-01-01",format = "%Y-%m-%d")),frequency = 4)
ts_obj
<- tsp(ts_obj )[1]+(k-1)/4
st
for(i in 1:23)
{
<- window(ts_obj, end=st + i-1)
xtrain <- window(ts_obj, start=st + (i-1) + 1/4, end=st + i)
xtest
<- VAR(ts_obj, p=5, type='both')
fit <- predict(fit, n.ahead = 4)
fcast
<-fcast$fcst$GDP
fgdp<-fcast$fcst$Employment
femp<-fcast$fcst$stock_price
fsp<-data.frame(fsp[,1],fgdp[,1],femp[,1])
ff
<-st + (i-1) + 1/4
year
<-ts(ff,start=c(year,1),frequency = 4)
ff
= 4*i-3
a = 4*i
bc(a:b),] <-sqrt((ff-xtest)^2)
rmse1[
<- VAR(ts_obj, p=9, type='both')
fit2 <- predict(fit2, n.ahead = 4)
fcast2
<-fcast2$fcst$GDP
fgdp<-fcast2$fcst$Employment
femp<-fcast2$fcst$stock_price
fsp<-data.frame(fsp[,1],fgdp[,1],femp[,1])
ff2
<-st + (i-1) + 1/4
year
<-ts(ff2,start=c(year,1),frequency = 4)
ff2
= 4*i-3
a = 4*i
bc(a:b),] <-sqrt((ff2-xtest)^2)
rmse2[
}
= rep(c(1999:2022),each =4)
yr = rep(paste0("Q",1:4),24)
qr
= data.frame(yr,qr,rmse1)
rmse1 names(rmse1) =c("Year", "Quater","Stock_Price","GDP","Employment")
= data.frame(yr,qr,rmse2)
rmse2 names(rmse2) =c("Year", "Quater","Stock_Price","GDP","Employment")
ggplot() +
geom_line(data = rmse1, aes(x = Year, y = Stock_Price),color = "blue") +
geom_line(data = rmse2, aes(x = Year, y = Stock_Price),color = "red") +
labs(
title = "CV RMSE for Stock_Price",
x = "Date",
y = "RMSE",
guides(colour=guide_legend(title="Fit")))
Show the code
ggplot() +
geom_line(data = rmse1, aes(x = Year, y = GDP),color = "blue") +
geom_line(data = rmse2, aes(x = Year, y = GDP),color = "red") +
labs(
title = "CV RMSE for GDP",
x = "Date",
y = "RMSE",
guides(colour=guide_legend(title="Fit")))
Warning: Removed 4 rows containing missing values (`geom_line()`).
Removed 4 rows containing missing values (`geom_line()`).
Show the code
ggplot() +
geom_line(data = rmse1, aes(x = Year, y = Employment),color = "blue") +
geom_line(data = rmse2, aes(x = Year, y = Employment),color = "red") +
labs(
title = "CV RMSE for Employment",
x = "Date",
y = "RMSE",
guides(colour=guide_legend(title="Fit")))
Warning: Removed 4 rows containing missing values (`geom_line()`).
Removed 4 rows containing missing values (`geom_line()`).
fit 1 is better
Step 5: Forecast
Show the code
<- predict(VAR(dd[, c(2:4)], p=5, type='both'))
forecasts
# visualize the iterated forecasts
plot(forecasts)