ARIMAX/SARIMAX/VAR

ARIMAX

ARIMAX model assumes that future values of a variable linearly depend on its past values, as well as on the values of past (stochastic) shocks. It is an extended version of the ARIMA model, with other independent (predictor) variables. The model is also referred to as the dynamic regression model. The X added to the end stands for “exogenous”. In other words, it suggests adding a separate different outside variable to help measure our endogenous variable.

The ‘exogenous’ variables added here are COVID-19 case numbers and COVID-19 vaccine rates. The number of COVID-19 cases in US can affect healthcare stock prices, e.g. UNH. Higher case numbers may result in increased demand for healthcare services, such as hospitalizations, treatments, and testing, which could positively impact the stock prices of healthcare companies involved in providing those services. Conversely, lower case numbers may lead to reduced demand for healthcare services, potentially resulting in lower stock prices for healthcare companies.

Vaccine rates, specifically the rate at which a population is vaccinated against COVID-19, can also impact healthcare stock prices. Higher vaccine rates are generally seen as positive for healthcare companies, as vaccines are considered a key tool in controlling the spread of the virus and reducing the severity of illness. Higher vaccine rates may lead to decreased demand for COVID-19 treatments and testing, but increased demand for vaccines and other preventive healthcare measures, which could positively impact the stock prices of healthcare companies involved in vaccine production or distribution, as well as other preventive healthcare services.

In this section, we choose the UNH stock price to fit ARIMAX model with COVID-19 case numbers and COVID-19 vaccine rates.

Step 1: Data Prepare

Show the code
getSymbols("UNH", from="2020-12-16", src="yahoo")
Show the code
UNH_df <- as.data.frame(UNH)
UNH_df$Date <- rownames(UNH_df)
UNH_df <- UNH_df[c('Date','UNH.Adjusted')]
UNH_df <- UNH_df %>%
  mutate(Date = as.Date(Date)) %>%
  complete(Date = seq.Date(min(Date), max(Date), by="day"))

# fill missing values in stock 
UNH_df <- UNH_df %>% fill(UNH.Adjusted)

new_dates <- seq(as.Date('2020-12-16'), as.Date('2023-3-21'),'week')

UNH_df <- UNH_df[which((UNH_df$Date) %in% new_dates),]

vaccine_df <- read.csv('data/vaccine_clean.csv')

new_dates <- seq(as.Date('2020-12-16'), as.Date('2023-3-22'),'week')

#vaccine_df
vaccine_df$Date <- as.Date(vaccine_df$Date)
vaccine_df <- vaccine_df[which((vaccine_df$Date) %in% new_dates),]

#covid_df
covid_df <- read.csv('data/covid.csv')

#covid_ts <- ts(covid_df$Weekly.Cases, start = c(2020,1,29), frequency = 54)
covid_df$Date <- as.Date(covid_df$Date)
covid_df <- covid_df[covid_df$Date >= '2020-12-16'&covid_df$Date < '2023-03-22',]

# combine all data, create dataframe
df <- data.frame(UNH_df, vaccine_df$total_doses, covid_df$Weekly.Cases)
colnames(df) <- c('Date', 'stock_price', 'vaccine_dose','covid_case')

knitr::kable(head(df))
Date stock_price vaccine_dose covid_case
2020-12-16 329.2309 160010 1526464
2020-12-23 327.5330 577895 1499703
2020-12-30 334.7126 848447 1299023
2021-01-06 348.5672 1029958 1584212
2021-01-13 344.4632 1334188 1716354
2021-01-20 340.3883 1614134 1391546

Step 2: Plotting the Data

Show the code
df.ts<-ts(df,star=decimal_date(as.Date("2020-12-16",format = "%Y-%m-%d")),frequency = 52)

autoplot(df.ts[,c(2:4)], facets=TRUE) +
  xlab("Date") + ylab("") +
  ggtitle("Variables influencing UNH Stock Price in USA")

UNH stock price, Covid, Vaccine values

Step 3: Fitting the model using ’auto.arima()`

Here I’m using auto.arima() function to fit the ARIMAX model. Here we are trying to predict UNH stock price using COVID vaccine dose and COVID cases. All variables are time series and the exogenous variables in this case are vaccine_dose and covid_case.

Show the code
xreg <- cbind(Vac = df.ts[, "vaccine_dose"],
              Imp = df.ts[, "covid_case"])

fit <- auto.arima(df.ts[, "stock_price"], xreg = xreg)
summary(fit)
Series: df.ts[, "stock_price"] 
Regression with ARIMA(0,1,0) errors 

Coefficients:
        Vac    Imp
      0e+00  0e+00
s.e.  1e-04  1e-04

sigma^2 = 177.9:  log likelihood = -468.1
AIC=942.2   AICc=942.41   BIC=950.49

Training set error measures:
                   ME     RMSE      MAE      MPE     MAPE      MASE       ACF1
Training set 1.097616 13.16639 10.30662 0.233993 2.273228 0.1046469 0.04011524
Show the code
checkresiduals(fit)


    Ljung-Box test

data:  Residuals from Regression with ARIMA(0,1,0) errors
Q* = 21.112, df = 24, p-value = 0.6321

Model df: 0.   Total lags used: 24

This is an ARIMA model. This is a Regression model with ARIMA(0,1,0) errors.

Step 4: Fitting the model manually

Here we will first have to fit the linear regression model predicting stock price using Covid cases and vaccine doses.

Then for the residuals, we will fit an ARIMA/SARIMA model.

Show the code
df$stock_price <- ts(df$stock_price,star=decimal_date(as.Date("2020-12-16",format = "%Y-%m-%d")),frequency = 52)
df$vaccine_dose <-ts(df$vaccine_dose,star=decimal_date(as.Date("2020-12-16",format = "%Y-%m-%d")),frequency = 52)
df$covid_case<-ts(df$covid_case,star=decimal_date(as.Date("2020-12-16",format = "%Y-%m-%d")),frequency = 52)

######### First fit the linear model#######
fit.reg <- lm(stock_price ~ vaccine_dose+covid_case, data=df)
summary(fit.reg)

Call:
lm(formula = stock_price ~ vaccine_dose + covid_case, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-154.24  -42.59    8.62   42.91   82.36 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   4.954e+02  7.932e+00  62.461  < 2e-16 ***
vaccine_dose -4.341e-05  5.043e-06  -8.607 4.47e-14 ***
covid_case   -3.275e-06  5.345e-06  -0.613    0.541    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 52.02 on 115 degrees of freedom
Multiple R-squared:  0.3944,    Adjusted R-squared:  0.3839 
F-statistic: 37.45 on 2 and 115 DF,  p-value: 2.988e-13
Show the code
res.fit<-ts(residuals(fit.reg),star=decimal_date(as.Date("2020-12-16",format = "%Y-%m-%d")),frequency = 52)

########## Then look at the residuals ########
acf(res.fit)

Show the code
Pacf(res.fit)

Show the code
res.fit %>% diff() %>% ggtsdisplay()

Show the code
res.fit %>% diff() %>% diff(52) %>% ggtsdisplay()

Finding the model parameters.

Show the code
d=1
i=1
temp= data.frame()
ls=matrix(rep(NA,6*23),nrow=23) # roughly nrow = 3x4x2


for (p in 1:3)# p=1,2,
{
  for(q in 1:3)# q=1,2,
  {
    for(d in 0:1)# 
    {
      
      if(p-1+d+q-1<=8)
      {
        
        model<- Arima(res.fit,order=c(p-1,d,q-1),include.drift=TRUE) 
        ls[i,]= c(p-1,d,q-1,model$aic,model$bic,model$aicc)
        i=i+1
        #print(i)
        
      }
      
    }
  }
}

temp= as.data.frame(ls)
names(temp)= c("p","d","q","AIC","BIC","AICc")

#temp
knitr::kable(temp)
p d q AIC BIC AICc
0 0 0 1217.4187 1225.7307 1217.6292
0 1 0 993.5902 999.1146 993.6955
0 0 1 1117.7725 1128.8552 1118.1264
0 1 1 993.1256 1001.4121 993.3380
0 0 2 1058.3969 1072.2503 1058.9326
0 1 2 994.2046 1005.2533 994.5617
1 0 0 1005.4074 1016.4901 1005.7614
1 1 0 992.7214 1001.0079 992.9338
1 0 1 1003.8974 1017.7508 1004.4331
1 1 1 994.2405 1005.2892 994.5976
1 0 2 1003.9563 1020.5804 1004.7130
1 1 2 995.7980 1009.6089 996.3386
2 0 0 1002.8800 1016.7334 1003.4157
2 1 0 994.1790 1005.2277 994.5362
2 0 1 999.6163 1016.2404 1000.3730
2 1 1 995.8971 1009.7080 996.4376
2 0 2 1005.7699 1025.1647 1006.7881
2 1 2 995.6381 1012.2111 996.4017
NA NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA
Show the code
print('Minimum AIC')
[1] "Minimum AIC"
Show the code
temp[which.min(temp$AIC),] 
  p d q      AIC      BIC     AICc
8 1 1 0 992.7214 1001.008 992.9338
Show the code
print('Minimum BIC')
[1] "Minimum BIC"
Show the code
temp[which.min(temp$BIC),] 
  p d q      AIC      BIC     AICc
2 0 1 0 993.5902 999.1146 993.6955
Show the code
print('Minimum AICc')
[1] "Minimum AICc"
Show the code
temp[which.min(temp$AICc),] 
  p d q      AIC      BIC     AICc
8 1 1 0 992.7214 1001.008 992.9338
Show the code
set.seed(1234)

model_output12 <- capture.output(sarima(res.fit, 0,1,0)) 

Show the code
cat(model_output12[9:38], model_output12[length(model_output12)], sep = "\n")
$fit

Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
        REPORT = 1, reltol = tol))

Coefficients:
      constant
        1.0885
s.e.    1.5357

sigma^2 estimated as 275.9:  log likelihood = -494.8,  aic = 993.59

$degrees_of_freedom
[1] 116

$ttable
         Estimate     SE t.value p.value
constant   1.0885 1.5357  0.7088  0.4799

$AIC
[1] 8.492224

$AICc
[1] 8.492522

$BIC
[1] 8.539441
Show the code
set.seed(1234)

model_output13 <- capture.output(sarima(res.fit, 1,1,0)) 

Show the code
cat(model_output13[16:46], model_output13[length(model_output13)], sep = "\n")
$fit

Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
        REPORT = 1, reltol = tol))

Coefficients:
         ar1  constant
      0.1556    1.1047
s.e.  0.0913    1.7935

sigma^2 estimated as 269.2:  log likelihood = -493.36,  aic = 992.72

$degrees_of_freedom
[1] 115

$ttable
         Estimate     SE t.value p.value
ar1        0.1556 0.0913  1.7049  0.0909
constant   1.1047 1.7935  0.6160  0.5391

$AIC
[1] 8.484798

$AICc
[1] 8.485698

$BIC
[1] 8.555623

ARIMA(0,1,0) and ARIMA(1,1,0) both look okay.

Step 5: Using Cross Validation

Show the code
k <- 36 # minimum data length for fitting a model 
n <- length(res.fit)
n-k # rest of the observations
[1] 82
Show the code
i=1
err1 = c()
err2 = c()

rmse1 <- c()
rmse2 <- c()

for(i in 1:(n-k))
{
  xtrain <- res.fit[1:(k-1)+i] #observations from 1 to 75
  xtest <- res.fit[k+i] #76th observation as the test set
  
  fit <- Arima(xtrain, order=c(1,1,0),include.drift=FALSE, method="ML")
  fcast1 <- forecast(fit, h=1)
  
  fit2 <- Arima(xtrain, order=c(0,1,0),include.drift=FALSE, method="ML")
  fcast2 <- forecast(fit2, h=1)
  
  #capture error for each iteration
  # This is mean absolute error
  err1 = c(err1, abs(fcast1$mean-xtest)) 
  err2 = c(err2, abs(fcast2$mean-xtest))
  
  # This is mean squared error
  err3 = c(err1, (fcast1$mean-xtest)^2)
  err4 = c(err2, (fcast2$mean-xtest)^2)
  
  rmse1 <- c(rmse1, sqrt((fcast1$mean-xtest)^2))
  rmse2 <- c(rmse2, sqrt((fcast2$mean-xtest)^2))
  
}

(MAE1=mean(err1)) # This is mean absolute error
[1] 13.45554
Show the code
(MAE2=mean(err2)) #has slightly higher error
[1] 13.46804
Show the code
MSE1=mean(err1) #fit 1,1,0
MSE2=mean(err2)#fit 0,1,0

MSE1
[1] 13.45554
Show the code
MSE2
[1] 13.46804
Show the code
rmse_df <- data.frame(rmse1,rmse2)
rmse_df$x <- as.numeric(rownames(rmse_df))

plot(rmse_df$x, rmse_df$rmse1, type = 'l', col=2, xlab="horizon", ylab="RMSE")
lines(rmse_df$x, rmse_df$rmse2, type="l",col=3)
legend("topleft",legend=c("fit1","fit2"),col=2:4,lty=1)

Step 6: forcasting

Show the code
vac_fit<-auto.arima(df$vaccine_dose) #fiting an ARIMA model to the vaccine_dose variable
summary(vac_fit)
Series: df$vaccine_dose 
ARIMA(1,1,1) 

Coefficients:
         ar1      ma1
      0.8341  -0.6298
s.e.  0.0916   0.1128

sigma^2 = 4.824e+10:  log likelihood = -1604.18
AIC=3214.36   AICc=3214.57   BIC=3222.65

Training set error measures:
                    ME     RMSE      MAE          MPE     MAPE      MASE
Training set -1592.486 216829.1 132002.8 0.0009291879 13.08388 0.1149613
                    ACF1
Training set -0.06829849
Show the code
fvac<-forecast(vac_fit)

covid_fit<-auto.arima(df$covid_case) #fiting an ARIMA model to the covid_case variable
summary(covid_fit)
Series: df$covid_case 
ARIMA(2,0,2) with non-zero mean 

Coefficients:
         ar1      ar2     ma1     ma2      mean
      1.3306  -0.5128  0.8165  0.4848  733469.2
s.e.  0.1153   0.1117  0.1363  0.1076  189902.4

sigma^2 = 2.904e+10:  log likelihood = -1588.94
AIC=3189.87   AICc=3190.63   BIC=3206.5

Training set error measures:
                    ME     RMSE      MAE       MPE     MAPE      MASE
Training set -424.0988 166757.6 99191.78 -6.959882 17.02194 0.1095754
                    ACF1
Training set -0.00122146
Show the code
fcov<-forecast(covid_fit)

fxreg <- cbind(Vac = fvac$mean,
              Cov = fcov$mean)

fcast <- forecast(fit, xreg=fxreg) #fimp$mean gives the forecasted values
Warning in forecast.forecast_ARIMA(fit, xreg = fxreg): xreg not required by this
model, ignoring the provided regressors
Show the code
autoplot(fcast) + xlab("Date") +
  ylab("Price")

Discussion

Based on the result above, the number of daily COVID-19 vaccination number is not significant on predicting the UNH stock price, while the number of COVID-19 cases is. Though the number of COVID-19 vaccination number is not significant here, it does not mean that it does not affect healthcare stock price. While UNH is a healthcare company that is involved in health insurance, healthcare services, and technology solutions, it is not directly involved in COVID-19 vaccine development or production. Therefore, the number of COVID-19 vaccine doses administered may not have a direct impact on UNH’s operations or revenue.

VAR

VAR models (vector autoregressive models) are used for multivariate time series. The structure is that each variable is a linear function of past lags of itself and past lags of the other variables. A Vector autoregressive (VAR) model is useful when one is interested in predicting multiple time series variables using a single model.

The variables we are interested here are GDP and Unemployment rate in US. The overall health of the economy, as reflected by the GDP, can influence healthcare stock prices. A strong GDP generally indicates a healthy economy with higher levels of consumer spending, business investment, and economic growth. In such an environment, healthcare companies may experience increased demand for their products and services, which could positively impact their stock prices. Conversely, a weak GDP may signal an economic slowdown or recession, which could result in reduced demand for healthcare products and services, potentially leading to lower stock prices for healthcare companies. The unemployment rate, which reflects the percentage of the labor force that is unemployed, can also affect healthcare stock prices. A low unemployment rate is generally indicative of a strong labor market, with more people employed and potentially having access to employer-sponsored healthcare benefits. This may result in increased demand for healthcare services, positively impacting healthcare stock prices. On the other hand, a high unemployment rate may indicate a weak labor market, with more people losing their jobs and potentially losing access to healthcare benefits, leading to reduced demand for healthcare services and potentially lower stock prices for healthcare companies.

In this section, I’m fitting a VAR model to find multivariate relationship between the series UNH stock price, GDP, and Employment.

Step 1: Data Preparing

Show the code
getSymbols("UNH", from="1989-12-29", src="yahoo")
[1] "UNH"
Show the code
UNH_df <- as.data.frame(UNH)
UNH_df$Date <- rownames(UNH_df)
UNH_df <- UNH_df[c('Date','UNH.Adjusted')]
UNH_df <- UNH_df %>%
  mutate(Date = as.Date(Date)) %>%
  complete(Date = seq.Date(min(Date), max(Date), by="day"))

# fill missing values in stock 
UNH_df <- UNH_df %>% fill(UNH.Adjusted)
#UNH_df
new_dates <- seq(as.Date('1990-01-01'), as.Date('2022-10-01'),'quarter')
#new_dates
UNH_df <- UNH_df[which((UNH_df$Date) %in% new_dates),]

gdp <- read.csv('data/GDP59.CSV')
gdp$DATE <- as.Date(gdp$DATE)
gdp <- gdp[which((gdp$DATE) %in% new_dates),]

emp <- read.csv('data/Employment59.CSV')
emp$DATE <- as.Date(emp$DATE)
emp <- emp[which((emp$DATE) %in% new_dates),]
Show the code
dd <- data.frame(UNH_df,gdp,emp)
dd <- dd[,c(1,2,4,6)]
colnames(dd) <- c('DATE', 'stock_price','GDP','Employment')
knitr::kable(head(dd))
DATE stock_price GDP Employment
125 1990-01-01 0.308110 5872.701 109418.7
126 1990-04-01 0.247759 5960.028 109786.7
127 1990-07-01 0.438343 6015.116 109651.0
128 1990-10-01 0.435356 6004.733 109255.0
129 1991-01-01 0.591069 6035.178 108783.0
130 1991-04-01 0.985114 6126.862 108312.3

Step 2: Plotting the data

Show the code
dd.ts<-ts(dd,star=decimal_date(as.Date("1990-01-01",format = "%Y-%m-%d")),frequency = 4)

autoplot(dd.ts[,c(2:4)], facets=TRUE) +
  xlab("Year") + ylab("") +
  ggtitle("UNH Stock Price, GDP and Employment in USA")

Step 3: Fitting a VAR model

Show the code
VARselect(dd[, c(2:4)], lag.max=10, type="both")
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
    10      9      9      9 

$criteria
                  1            2            3            4            5
AIC(n) 2.941404e+01 2.865601e+01 2.849637e+01 2.814189e+01 2.812288e+01
HQ(n)  2.955407e+01 2.888005e+01 2.880444e+01 2.853398e+01 2.859898e+01
SC(n)  2.975880e+01 2.920762e+01 2.925484e+01 2.910721e+01 2.929506e+01
FPE(n) 5.948597e+12 2.788633e+12 2.379347e+12 1.671830e+12 1.644336e+12
                  6            7            8            9           10
AIC(n) 2.758294e+01 2.723307e+01 2.696700e+01 2.669062e+01 2.668670e+01
HQ(n)  2.814305e+01 2.787720e+01 2.769516e+01 2.750279e+01 2.758289e+01
SC(n)  2.896196e+01 2.881895e+01 2.875974e+01 2.869021e+01 2.889314e+01
FPE(n) 9.616264e+11 6.809549e+11 5.251270e+11 4.014819e+11 4.038690e+11

It’s clear that according to selection criteria p=10 and 9 are good.

I’m fitting several models with p=1(for simplicity), 5, and 9.=> VAR(1), VAR(5), VAR(9)

Show the code
summary(VAR(dd[, c(2:4)], p=1, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: stock_price, GDP, Employment 
Deterministic variables: both 
Sample size: 131 
Log Likelihood: -2461.36 
Roots of the characteristic polynomial:
1.028 0.9422  0.78
Call:
VAR(y = dd[, c(2:4)], p = 1, type = "both")


Estimation results for equation stock_price: 
============================================ 
stock_price = stock_price.l1 + GDP.l1 + Employment.l1 + const + trend 

                 Estimate Std. Error t value Pr(>|t|)    
stock_price.l1  1.0341243  0.0394781  26.195   <2e-16 ***
GDP.l1         -0.0005115  0.0039797  -0.129   0.8979    
Employment.l1  -0.0006163  0.0003038  -2.029   0.0446 *  
const          69.2145818 34.7475964   1.992   0.0485 *  
trend           0.3006699  0.4555344   0.660   0.5104    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 14.62 on 126 degrees of freedom
Multiple R-Squared: 0.9849, Adjusted R-squared: 0.9844 
F-statistic:  2057 on 4 and 126 DF,  p-value: < 2.2e-16 


Estimation results for equation GDP: 
==================================== 
GDP = stock_price.l1 + GDP.l1 + Employment.l1 + const + trend 

                 Estimate Std. Error t value Pr(>|t|)    
stock_price.l1    2.55503    0.64584   3.956 0.000126 ***
GDP.l1            0.79020    0.06511  12.137  < 2e-16 ***
Employment.l1    -0.01069    0.00497  -2.151 0.033349 *  
const          2355.16290  568.45522   4.143 6.24e-05 ***
trend            27.89290    7.45234   3.743 0.000275 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 239.2 on 126 degrees of freedom
Multiple R-Squared: 0.998,  Adjusted R-squared: 0.998 
F-statistic: 1.607e+04 on 4 and 126 DF,  p-value: < 2.2e-16 


Estimation results for equation Employment: 
=========================================== 
Employment = stock_price.l1 + GDP.l1 + Employment.l1 + const + trend 

                 Estimate Std. Error t value Pr(>|t|)    
stock_price.l1  6.183e+00  4.862e+00   1.272  0.20584    
GDP.l1         -5.610e-01  4.901e-01  -1.145  0.25458    
Employment.l1   9.253e-01  3.741e-02  24.733  < 2e-16 ***
const           1.161e+04  4.279e+03   2.712  0.00762 ** 
trend           8.544e+01  5.610e+01   1.523  0.13028    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1801 on 126 degrees of freedom
Multiple R-Squared: 0.9786, Adjusted R-squared: 0.9779 
F-statistic:  1441 on 4 and 126 DF,  p-value: < 2.2e-16 



Covariance matrix of residuals:
            stock_price    GDP Employment
stock_price       213.8   1386      12148
GDP              1385.8  57224     396720
Employment      12147.6 396720    3242998

Correlation matrix of residuals:
            stock_price    GDP Employment
stock_price      1.0000 0.3962     0.4613
GDP              0.3962 1.0000     0.9209
Employment       0.4613 0.9209     1.0000
Show the code
summary(VAR(dd[, c(2:4)], p=5, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: stock_price, GDP, Employment 
Deterministic variables: both 
Sample size: 127 
Log Likelihood: -2267.863 
Roots of the characteristic polynomial:
1.054 0.9435 0.9435 0.8781 0.8781 0.7952 0.7952 0.7806 0.7806 0.7709 0.7709 0.6959 0.4107 0.4107 0.3355
Call:
VAR(y = dd[, c(2:4)], p = 5, type = "both")


Estimation results for equation stock_price: 
============================================ 
stock_price = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + const + trend 

                 Estimate Std. Error t value Pr(>|t|)    
stock_price.l1  5.454e-01  1.048e-01   5.204 9.13e-07 ***
GDP.l1          6.007e-02  1.789e-02   3.358  0.00108 ** 
Employment.l1  -7.709e-03  2.385e-03  -3.232  0.00162 ** 
stock_price.l2  2.323e-01  1.468e-01   1.583  0.11639    
GDP.l2         -6.188e-02  2.695e-02  -2.297  0.02353 *  
Employment.l2   8.708e-03  4.117e-03   2.115  0.03668 *  
stock_price.l3 -6.701e-02  1.636e-01  -0.410  0.68293    
GDP.l3          1.256e-02  2.614e-02   0.480  0.63198    
Employment.l3  -3.778e-03  4.122e-03  -0.917  0.36136    
stock_price.l4  2.159e-01  1.796e-01   1.202  0.23192    
GDP.l4         -1.114e-02  2.275e-02  -0.490  0.62525    
Employment.l4   2.759e-03  3.733e-03   0.739  0.46146    
stock_price.l5  2.480e-01  1.956e-01   1.268  0.20751    
GDP.l5         -1.060e-02  1.379e-02  -0.769  0.44375    
Employment.l5  -6.394e-04  2.197e-03  -0.291  0.77155    
const           1.189e+02  3.723e+01   3.194  0.00183 ** 
trend           1.493e+00  5.610e-01   2.661  0.00897 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 12.06 on 110 degrees of freedom
Multiple R-Squared: 0.9909, Adjusted R-squared: 0.9896 
F-statistic: 750.5 on 16 and 110 DF,  p-value: < 2.2e-16 


Estimation results for equation GDP: 
==================================== 
GDP = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + const + trend 

                 Estimate Std. Error t value Pr(>|t|)    
stock_price.l1   -1.30445    1.54245  -0.846 0.399555    
GDP.l1            1.92765    0.26327   7.322 4.28e-11 ***
Employment.l1    -0.18525    0.03510  -5.277 6.65e-07 ***
stock_price.l2    7.85075    2.15990   3.635 0.000425 ***
GDP.l2           -0.49800    0.39656  -1.256 0.211852    
Employment.l2     0.14315    0.06059   2.363 0.019907 *  
stock_price.l3    1.97665    2.40806   0.821 0.413510    
GDP.l3           -0.68441    0.38474  -1.779 0.078023 .  
Employment.l3     0.07846    0.06066   1.293 0.198571    
stock_price.l4   -3.50097    2.64351  -1.324 0.188127    
GDP.l4           -0.53338    0.33481  -1.593 0.114012    
Employment.l4     0.03338    0.05493   0.608 0.544643    
stock_price.l5   -4.52830    2.87918  -1.573 0.118643    
GDP.l5            0.62554    0.20300   3.081 0.002603 ** 
Employment.l5    -0.07633    0.03233  -2.361 0.019978 *  
const          1593.63764  547.91857   2.909 0.004394 ** 
trend            21.22796    8.25642   2.571 0.011473 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 177.5 on 110 degrees of freedom
Multiple R-Squared: 0.999,  Adjusted R-squared: 0.9988 
F-statistic:  6815 on 16 and 110 DF,  p-value: < 2.2e-16 


Estimation results for equation Employment: 
=========================================== 
Employment = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + const + trend 

                 Estimate Std. Error t value Pr(>|t|)    
stock_price.l1 -3.786e+01  1.190e+01  -3.182  0.00190 ** 
GDP.l1          6.787e+00  2.030e+00   3.343  0.00113 ** 
Employment.l1  -1.385e-02  2.707e-01  -0.051  0.95928    
stock_price.l2  7.770e+01  1.666e+01   4.664 8.77e-06 ***
GDP.l2         -2.705e+00  3.058e+00  -0.885  0.37831    
Employment.l2   6.207e-01  4.673e-01   1.328  0.18686    
stock_price.l3  4.014e+00  1.857e+01   0.216  0.82927    
GDP.l3         -8.103e+00  2.967e+00  -2.731  0.00736 ** 
Employment.l3   1.098e+00  4.678e-01   2.346  0.02075 *  
stock_price.l4 -1.641e+01  2.039e+01  -0.805  0.42269    
GDP.l4         -1.499e+00  2.582e+00  -0.581  0.56270    
Employment.l4  -1.900e-01  4.237e-01  -0.448  0.65470    
stock_price.l5 -2.700e+01  2.221e+01  -1.216  0.22657    
GDP.l5          4.674e+00  1.566e+00   2.985  0.00349 ** 
Employment.l5  -5.865e-01  2.493e-01  -2.352  0.02044 *  
const           1.222e+04  4.226e+03   2.891  0.00462 ** 
trend           1.203e+02  6.368e+01   1.890  0.06146 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1369 on 110 degrees of freedom
Multiple R-Squared: 0.9879, Adjusted R-squared: 0.9861 
F-statistic: 560.4 on 16 and 110 DF,  p-value: < 2.2e-16 



Covariance matrix of residuals:
            stock_price      GDP Employment
stock_price       145.5    455.4       5503
GDP               455.4  31506.6     225145
Employment       5503.3 225144.8    1874014

Correlation matrix of residuals:
            stock_price    GDP Employment
stock_price      1.0000 0.2127     0.3333
GDP              0.2127 1.0000     0.9266
Employment       0.3333 0.9266     1.0000
Show the code
summary(VAR(dd[, c(2:4)], p=9, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: stock_price, GDP, Employment 
Deterministic variables: both 
Sample size: 123 
Log Likelihood: -2076.371 
Roots of the characteristic polynomial:
1.058 1.051 1.018 1.018 1.012 1.012 1.012 1.012 1.005 1.005 1.003 1.003 0.9793 0.9793 0.9726 0.9726 0.8256 0.8256 0.8098 0.8098 0.7926 0.7926 0.6662 0.6662 0.4868 0.4868 0.0237
Call:
VAR(y = dd[, c(2:4)], p = 9, type = "both")


Estimation results for equation stock_price: 
============================================ 
stock_price = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + stock_price.l6 + GDP.l6 + Employment.l6 + stock_price.l7 + GDP.l7 + Employment.l7 + stock_price.l8 + GDP.l8 + Employment.l8 + stock_price.l9 + GDP.l9 + Employment.l9 + const + trend 

                 Estimate Std. Error t value Pr(>|t|)    
stock_price.l1  0.6177986  0.1140057   5.419 4.61e-07 ***
GDP.l1          0.0415175  0.0139527   2.976 0.003717 ** 
Employment.l1  -0.0069645  0.0020733  -3.359 0.001131 ** 
stock_price.l2  0.3603716  0.1652332   2.181 0.031680 *  
GDP.l2         -0.0157162  0.0211664  -0.743 0.459633    
Employment.l2   0.0048987  0.0033424   1.466 0.146093    
stock_price.l3  0.0177119  0.1607489   0.110 0.912499    
GDP.l3         -0.0009781  0.0212811  -0.046 0.963439    
Employment.l3  -0.0018913  0.0034992  -0.540 0.590135    
stock_price.l4 -0.1568989  0.1711382  -0.917 0.361596    
GDP.l4         -0.0352143  0.0218848  -1.609 0.110953    
Employment.l4   0.0072039  0.0035652   2.021 0.046168 *  
stock_price.l5  0.3383735  0.2044761   1.655 0.101294    
GDP.l5         -0.0156536  0.0222246  -0.704 0.482964    
Employment.l5   0.0016545  0.0036673   0.451 0.652934    
stock_price.l6  0.1416245  0.2112799   0.670 0.504299    
GDP.l6         -0.0184066  0.0221573  -0.831 0.408236    
Employment.l6  -0.0012687  0.0036506  -0.348 0.728965    
stock_price.l7 -0.9691876  0.2444899  -3.964 0.000144 ***
GDP.l7          0.0190303  0.0219980   0.865 0.389190    
Employment.l7  -0.0017353  0.0035712  -0.486 0.628163    
stock_price.l8  1.0643895  0.2599676   4.094 8.96e-05 ***
GDP.l8          0.0123808  0.0212167   0.584 0.560927    
Employment.l8  -0.0058935  0.0033845  -1.741 0.084893 .  
stock_price.l9 -0.2681893  0.2116088  -1.267 0.208149    
GDP.l9          0.0063637  0.0130395   0.488 0.626663    
Employment.l9   0.0035409  0.0021429   1.652 0.101786    
const          79.8647702 39.9369777   2.000 0.048411 *  
trend           0.9524397  0.5617195   1.696 0.093276 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 8.687 on 94 degrees of freedom
Multiple R-Squared: 0.9959, Adjusted R-squared: 0.9947 
F-statistic:   819 on 28 and 94 DF,  p-value: < 2.2e-16 


Estimation results for equation GDP: 
==================================== 
GDP = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + stock_price.l6 + GDP.l6 + Employment.l6 + stock_price.l7 + GDP.l7 + Employment.l7 + stock_price.l8 + GDP.l8 + Employment.l8 + stock_price.l9 + GDP.l9 + Employment.l9 + const + trend 

                 Estimate Std. Error t value Pr(>|t|)    
stock_price.l1 -5.605e+00  1.777e+00  -3.154  0.00217 ** 
GDP.l1          1.818e+00  2.175e-01   8.360 5.58e-13 ***
Employment.l1  -1.497e-01  3.232e-02  -4.631 1.17e-05 ***
stock_price.l2  1.352e+01  2.576e+00   5.250 9.43e-07 ***
GDP.l2         -7.098e-01  3.300e-01  -2.151  0.03403 *  
Employment.l2   1.262e-01  5.211e-02   2.422  0.01734 *  
stock_price.l3 -1.116e+00  2.506e+00  -0.445  0.65722    
GDP.l3         -2.534e-01  3.318e-01  -0.764  0.44683    
Employment.l3   3.585e-02  5.455e-02   0.657  0.51262    
stock_price.l4  1.673e+00  2.668e+00   0.627  0.53208    
GDP.l4         -8.506e-02  3.412e-01  -0.249  0.80365    
Employment.l4   1.384e-02  5.558e-02   0.249  0.80384    
stock_price.l5 -7.495e-01  3.188e+00  -0.235  0.81462    
GDP.l5          1.171e-01  3.465e-01   0.338  0.73604    
Employment.l5  -2.497e-02  5.717e-02  -0.437  0.66327    
stock_price.l6 -1.389e+01  3.294e+00  -4.216 5.71e-05 ***
GDP.l6          2.705e-02  3.454e-01   0.078  0.93775    
Employment.l6  -9.452e-03  5.691e-02  -0.166  0.86846    
stock_price.l7 -1.613e+00  3.812e+00  -0.423  0.67319    
GDP.l7         -2.916e-01  3.429e-01  -0.850  0.39727    
Employment.l7   7.971e-02  5.567e-02   1.432  0.15551    
stock_price.l8  7.701e+00  4.053e+00   1.900  0.06049 .  
GDP.l8          9.947e-02  3.308e-01   0.301  0.76428    
Employment.l8  -1.132e-01  5.276e-02  -2.144  0.03457 *  
stock_price.l9  1.073e+00  3.299e+00   0.325  0.74571    
GDP.l9          1.156e-01  2.033e-01   0.569  0.57100    
Employment.l9   3.477e-02  3.341e-02   1.041  0.30064    
const           1.603e+03  6.226e+02   2.575  0.01159 *  
trend           2.134e+01  8.757e+00   2.437  0.01669 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 135.4 on 94 degrees of freedom
Multiple R-Squared: 0.9995, Adjusted R-squared: 0.9993 
F-statistic:  6211 on 28 and 94 DF,  p-value: < 2.2e-16 


Estimation results for equation Employment: 
=========================================== 
Employment = stock_price.l1 + GDP.l1 + Employment.l1 + stock_price.l2 + GDP.l2 + Employment.l2 + stock_price.l3 + GDP.l3 + Employment.l3 + stock_price.l4 + GDP.l4 + Employment.l4 + stock_price.l5 + GDP.l5 + Employment.l5 + stock_price.l6 + GDP.l6 + Employment.l6 + stock_price.l7 + GDP.l7 + Employment.l7 + stock_price.l8 + GDP.l8 + Employment.l8 + stock_price.l9 + GDP.l9 + Employment.l9 + const + trend 

                 Estimate Std. Error t value Pr(>|t|)    
stock_price.l1 -6.801e+01  1.216e+01  -5.595 2.17e-07 ***
GDP.l1          6.439e+00  1.488e+00   4.328 3.75e-05 ***
Employment.l1   1.372e-01  2.211e-01   0.621  0.53643    
stock_price.l2  1.137e+02  1.762e+01   6.453 4.74e-09 ***
GDP.l2         -4.116e+00  2.257e+00  -1.824  0.07136 .  
Employment.l2   5.335e-01  3.564e-01   1.497  0.13778    
stock_price.l3 -4.159e+00  1.714e+01  -0.243  0.80882    
GDP.l3         -3.838e+00  2.269e+00  -1.691  0.09408 .  
Employment.l3   5.003e-01  3.731e-01   1.341  0.18318    
stock_price.l4  1.066e+01  1.825e+01   0.584  0.56038    
GDP.l4         -6.501e-02  2.334e+00  -0.028  0.97783    
Employment.l4   1.168e-01  3.802e-01   0.307  0.75925    
stock_price.l5  2.259e+01  2.180e+01   1.036  0.30277    
GDP.l5          8.889e-01  2.370e+00   0.375  0.70845    
Employment.l5  -2.821e-01  3.911e-01  -0.721  0.47245    
stock_price.l6 -1.130e+02  2.253e+01  -5.017 2.48e-06 ***
GDP.l6         -1.567e+00  2.363e+00  -0.663  0.50891    
Employment.l6   2.161e-01  3.893e-01   0.555  0.58019    
stock_price.l7 -2.070e+01  2.607e+01  -0.794  0.42930    
GDP.l7         -1.096e+00  2.346e+00  -0.467  0.64133    
Employment.l7   4.753e-01  3.808e-01   1.248  0.21512    
stock_price.l8  8.864e+01  2.772e+01   3.198  0.00189 ** 
GDP.l8          1.933e+00  2.262e+00   0.854  0.39515    
Employment.l8  -1.204e+00  3.609e-01  -3.336  0.00122 ** 
stock_price.l9 -3.272e+01  2.256e+01  -1.450  0.15042    
GDP.l9          8.076e-01  1.390e+00   0.581  0.56273    
Employment.l9   4.222e-01  2.285e-01   1.848  0.06777 .  
const           1.262e+04  4.259e+03   2.963  0.00386 ** 
trend           9.703e+01  5.990e+01   1.620  0.10859    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 926.4 on 94 degrees of freedom
Multiple R-Squared: 0.9945, Adjusted R-squared: 0.9928 
F-statistic: 603.2 on 28 and 94 DF,  p-value: < 2.2e-16 



Covariance matrix of residuals:
            stock_price      GDP Employment
stock_price       75.47    290.5       2750
GDP              290.52  18342.3     112144
Employment      2749.66 112143.8     858125

Correlation matrix of residuals:
            stock_price    GDP Employment
stock_price      1.0000 0.2469     0.3417
GDP              0.2469 1.0000     0.8939
Employment       0.3417 0.8939     1.0000

Step 4: Using Cross Validation

Show the code
n=length(dd$stock_price)
k=39

#n-k=92; 92/4=23;

rmse1 <- matrix(NA, 96,3)
rmse2 <- matrix(NA, 96,3)
rmse3 <- matrix(NA,23,4)
year<-c()

# Convert data frame to time series object
ts_obj <- ts(dd[, c(2:4)], star=decimal_date(as.Date("1990-01-01",format = "%Y-%m-%d")),frequency = 4)

st <- tsp(ts_obj )[1]+(k-1)/4 


for(i in 1:23)
{
  
  xtrain <- window(ts_obj, end=st + i-1)
  xtest <- window(ts_obj, start=st + (i-1) + 1/4, end=st + i)
  
  
  fit <- VAR(ts_obj, p=5, type='both')
  fcast <- predict(fit, n.ahead = 4)
  
  fgdp<-fcast$fcst$GDP
  femp<-fcast$fcst$Employment
  fsp<-fcast$fcst$stock_price
  ff<-data.frame(fsp[,1],fgdp[,1],femp[,1])
  
  year<-st + (i-1) + 1/4
  
  ff<-ts(ff,start=c(year,1),frequency = 4)
  
  a = 4*i-3
  b= 4*i
  rmse1[c(a:b),]  <-sqrt((ff-xtest)^2)
  
  fit2 <- VAR(ts_obj, p=9, type='both')
  fcast2 <- predict(fit2, n.ahead = 4)
  
  fgdp<-fcast2$fcst$GDP
  femp<-fcast2$fcst$Employment
  fsp<-fcast2$fcst$stock_price
  ff2<-data.frame(fsp[,1],fgdp[,1],femp[,1])
  
  year<-st + (i-1) + 1/4
  
  ff2<-ts(ff2,start=c(year,1),frequency = 4)
  
  a = 4*i-3
  b= 4*i
  rmse2[c(a:b),]  <-sqrt((ff2-xtest)^2)
}

yr = rep(c(1999:2022),each =4)
qr = rep(paste0("Q",1:4),24)

rmse1 = data.frame(yr,qr,rmse1)
names(rmse1) =c("Year", "Quater","Stock_Price","GDP","Employment")
rmse2 = data.frame(yr,qr,rmse2)
names(rmse2) =c("Year", "Quater","Stock_Price","GDP","Employment")

ggplot() + 
  geom_line(data = rmse1, aes(x = Year, y = Stock_Price),color = "blue") +
  geom_line(data = rmse2, aes(x = Year, y = Stock_Price),color = "red") +
  labs(
    title = "CV RMSE for Stock_Price",
    x = "Date",
    y = "RMSE",
    guides(colour=guide_legend(title="Fit")))

Show the code
ggplot() + 
  geom_line(data = rmse1, aes(x = Year, y = GDP),color = "blue") +
  geom_line(data = rmse2, aes(x = Year, y = GDP),color = "red") +
  labs(
    title = "CV RMSE for GDP",
    x = "Date",
    y = "RMSE",
    guides(colour=guide_legend(title="Fit")))
Warning: Removed 4 rows containing missing values (`geom_line()`).
Removed 4 rows containing missing values (`geom_line()`).

Show the code
ggplot() + 
  geom_line(data = rmse1, aes(x = Year, y = Employment),color = "blue") +
  geom_line(data = rmse2, aes(x = Year, y = Employment),color = "red") +
  labs(
    title = "CV RMSE for Employment",
    x = "Date",
    y = "RMSE",
    guides(colour=guide_legend(title="Fit")))
Warning: Removed 4 rows containing missing values (`geom_line()`).
Removed 4 rows containing missing values (`geom_line()`).

fit 1 is better

Step 5: Forecast

Show the code
forecasts <- predict(VAR(dd[, c(2:4)], p=5, type='both'))

# visualize the iterated forecasts
plot(forecasts)