Time Series (1)

Time Series (Introduction)

Characteristics of Time Series

The primary objective of time series analysis is to develop mathematical models that provide plausible descriptions for sample data with time correlations. In order to provide a statistical setting for describing the character of data that seemingly fluctuate in a random fashion over time, we assume a time series can be defined as a collection of random variables indexed according to the order they are obtained in time. In general, a collection of random variables {Xt} indexed by t is referred to as a stochastic process. In this text, t will typically be discrete and vary over the integers.

Examples of Series:

White Noise: A collection of uncorrelated random variables $W_t$ with mean $0$ and finite variance $\sigma_w^2$. A particularly useful white noise is Gaussian white noise, in which the $W_t$ are independent and identically distributed normal variables: $W_t \overset{iid}{\sim} N(0, \sigma_w^2)$.

Moving Average: We might replace the white noise series $W_t$ by a moving average that smooths the series: $V_t = \tfrac{1}{3}(W_{t-1} + W_t + W_{t+1})$. This yields a smoother version of the white noise series, reflecting the fact that the slower oscillations are more apparent and some of the faster oscillations are taken out.

Autoregressions: Suppose we consider the white noise series $W_t$ as input and calculate the output using the second-order equation: $X_t = X_{t-1} - 0.9 X_{t-2} + W_t$ for $t = 1, \dots, 500$. The resulting series exhibits periodic behavior.
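A minimal simulation sketch of the three example series above, assuming NumPy; the seed, sample size, and unit noise variance are arbitrary choices, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Gaussian white noise: W_t iid N(0, 1)
w = rng.normal(size=n)

# Moving average: V_t = (W_{t-1} + W_t + W_{t+1}) / 3 (edge values use zero padding)
v = np.convolve(w, np.ones(3) / 3, mode="same")

# Autoregression: X_t = X_{t-1} - 0.9 X_{t-2} + W_t, with zero initial conditions
x = np.zeros(n)
for t in range(2, n):
    x[t] = x[t - 1] - 0.9 * x[t - 2] + w[t]
```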

Autocorrelation and Cross-Correlation

A complete description of a time series with $N$ random variables at arbitrary integer time points $t_1, \dots, t_N$ is given by the joint distribution function (joint CDF), evaluated as the probability that the values of the series are jointly less than the $N$ constants $c_1, \dots, c_N$:

$F_{X_{t_1}, \dots, X_{t_N}}(c_1, \dots, c_N) = P(X_{t_1} \le c_1, \dots, X_{t_N} \le c_N)$

In practice, the multidimensional distribution function cannot usually be written easily unless the random variables are jointly normal. It is an unwieldy tool for displaying and analyzing time series data. On the other hand, the marginal distribution functions:

$F_{X_t}(x_t) = P(X_t \le x_t)$

or the corresponding marginal density functions:

$f_{X_t}(x_t) = \dfrac{\partial F_{X_t}(x_t)}{\partial x_t}$

and the mean function:

$\mu_t = E[X_t]$

when they exist, are often informative for examining the marginal behavior of the series.


Autocovariance

The autocovariance measures the linear dependence between two points on the same series observed at different times:

$\gamma(s, t) = \mathrm{Cov}(X_s, X_t) = E[(X_s - \mu_s)(X_t - \mu_t)]$

  1. Very smooth series exhibit autocovariance functions that stay large even when $t$ and $s$ are far apart.
  2. Very choppy series tend to have autocovariance functions that are nearly zero for large separations.


Autocorrelation

The ACF measures the linear predictability of the series at time $t$, say $X_t$, using only the value $X_s$:

$\rho(s, t) = \dfrac{\gamma(s, t)}{\sqrt{\gamma(s, s)\,\gamma(t, t)}}$

If we can predict $X_t$ perfectly from $X_s$ through a linear relationship $X_t = \beta_0 + \beta_1 X_s$, then the correlation will be $+1$ or $-1$ depending on the sign of $\beta_1$. Hence, we have a rough measure of the ability to forecast the series at time $t$ from the value at time $s$.


Cross-covariance and Cross-correlation

Often, we want to measure the predictability of another series (a different component) $Y_t$ from the series $X_s$. Assuming both series have finite variances, we have the following definitions for the cross-covariance and cross-correlation:

$\gamma_{XY}(s, t) = \mathrm{Cov}(X_s, Y_t) = E[(X_s - \mu_{Xs})(Y_t - \mu_{Yt})]$

$\rho_{XY}(s, t) = \dfrac{\gamma_{XY}(s, t)}{\sqrt{\gamma_X(s, s)\,\gamma_Y(t, t)}}$

We can easily extend the idea to multivariate time series where each sample contains $r$ attributes:

$X_t = \langle X_{t1}, \dots, X_{tr} \rangle$

The extension of autocovariance is then:

$\gamma_{jk}(s, t) = E[(X_{sj} - \mu_{sj})(X_{tk} - \mu_{tk})]$

Stationary Time Series

There may exist a sort of regularity over time in the behavior of a time series; the notion of stationarity makes this precise.

Strict Stationarity

A strictly stationary time series is one for which the probabilistic behavior of every collection $\{X_{t_1}, \dots, X_{t_k}\}$ is identical to that of the time-shifted collection $\{X_{t_1 + h}, \dots, X_{t_k + h}\}$; that is, for all $k$, all time points, all constants $c_1, \dots, c_k$, and all shifts $h$:

$P(X_{t_1} \le c_1, \dots, X_{t_k} \le c_k) = P(X_{t_1 + h} \le c_1, \dots, X_{t_k + h} \le c_k)$

When $k = 1$, we can conclude that the random variables are identically distributed and the mean is constant regardless of time:

$P(X_s \le c) = P(X_t \le c) = P(X_1 \le c), \quad \forall\, c, s, t \qquad \Rightarrow \qquad E[X_s] = E[X_t] = \mu$

When $k = 2$, we can conclude that for any pair of times $s, t$, the joint distribution is unchanged by a time shift:

$P(X_s \le c_1, X_t \le c_2) = P(X_{s+h} \le c_1, X_{t+h} \le c_2), \quad \forall\, h, c_1, c_2$

For example, $(X_1, X_2)$, $(X_3, X_4)$, and $(X_8, X_9)$ are identically distributed (all lag $1$);

$(X_2, X_4)$, $(X_3, X_5)$, and $(X_8, X_{10})$ are identically distributed (all lag $2$).

Thus, if the variance function of the process exists, the autocovariance function of the series $\{X_t\}$ satisfies:

$\gamma(s, t) = \gamma(s + h, t + h)$

Analogous conclusions hold for all possible values of $k$. This version of stationarity is too strong for most applications, and it is difficult to assess strict stationarity from a single data set.

Weak Stationarity

A weakly stationary time series is a finite-variance process whose mean function is constant (does not depend on $t$) and whose autocovariance function $\gamma(s, t)$ depends on $s$ and $t$ only through their difference. Thus, for all $s, t, h$:

  1. $E[X_t] = E[X_s] = \mu$
  2. $\gamma(s, s + h) = \mathrm{Cov}(X_s, X_{s+h}) = \mathrm{Cov}(X_0, X_h) = \gamma(0, h)$, or equivalently $\gamma(s, t) = \gamma(s + h, t + h)$

From the above, we can clearly see that a strictly stationary, finite-variance time series is also weakly stationary. The converse is not true without further conditions.

Several Properties:

  1. $|\gamma(h)| \le \gamma(0) = \mathrm{Var}[X_t]$
  2. $\gamma(h) = \gamma(-h)$
  3. $\rho_{XY}(h) = \rho_{YX}(-h)$
  4. $\gamma(\cdot)$ is non-negative definite: for all positive integers $N$ and all constant vectors $a = \langle a_1, \dots, a_N \rangle^T \in \mathbb{R}^N$, $\sum_{i=1}^{N} \sum_{j=1}^{N} a_i\, \gamma(i - j)\, a_j \ge 0$

Gaussian Process

A Gaussian process is one for which all finite-dimensional distributions $(X_{t_1}, \dots, X_{t_k})$ are multivariate normal. If a Gaussian process is weakly stationary, then it is also strictly stationary.

Linear Process

A linear process $X_t$ is a stationary process defined as a linear combination of white noise variates $W_t$ and is given by:

$X_t = \mu + \sum_{j=-\infty}^{\infty} \psi_j W_{t-j}, \qquad \sum_{j=-\infty}^{\infty} |\psi_j| < \infty$

The autocovariance function for a linear process is:

$\gamma(h) = \sigma_w^2 \sum_{j=-\infty}^{\infty} \psi_{j+h}\, \psi_j$
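As a quick worked check of this formula, the 3-point moving average $V_t$ from the examples above is a linear process with $\psi_{-1} = \psi_0 = \psi_1 = \tfrac{1}{3}$ and $\psi_j = 0$ otherwise, so:

$\gamma(0) = \tfrac{3}{9}\sigma_w^2, \quad \gamma(\pm 1) = \tfrac{2}{9}\sigma_w^2, \quad \gamma(\pm 2) = \tfrac{1}{9}\sigma_w^2, \quad \gamma(h) = 0 \ \text{for} \ |h| \ge 3$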

Estimation of Correlation

If a time series is stationary, the mean function is constant, so that we can estimate it by the sample mean:

$\bar{X} = \dfrac{1}{N} \sum_{t=1}^{N} X_t$


The sample autocovariance function is defined, for $h = 0, 1, \dots$, as:

$\hat{\gamma}(h) = \dfrac{1}{N} \sum_{t=1}^{N-h} (X_{t+h} - \bar{X})(X_t - \bar{X}), \qquad \hat{\gamma}(-h) = \hat{\gamma}(h)$

The sum runs over a restricted range because $X_{t+h}$ is not available for $t + h > N$. Thus, this estimator is a biased estimator of $\gamma(h)$. The normalizing term $\frac{1}{N}$ guarantees a non-negative definite function (the autocovariance function of a stationary series is non-negative definite), so it is preferred over $\frac{1}{N - h}$.


The sample autocorrelation function is $\hat{\rho}(h) = \hat{\gamma}(h) / \hat{\gamma}(0)$. It has a sampling distribution that allows us to assess whether the data come from a completely random (white noise) series or whether the correlations are statistically significant at some lags. Under general conditions, if $X_t$ is white noise, then for large $N$ the sample ACF $\hat{\rho}(h)$, for fixed $h > 0$, is approximately normally distributed with mean $0$ and standard deviation $\frac{1}{\sqrt{N}}$.

Based on this property, we obtain a rough method of assessing whether peaks in $\hat{\rho}(h)$ are significant by determining whether the observed peak is outside the interval $0 \pm \frac{2}{\sqrt{N}}$ (about 95% of a normal distribution lies within 2 standard errors of the mean).
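A minimal sketch of this check, assuming NumPy; `acf` here is a hypothetical helper implementing the biased ($\frac{1}{N}$) sample estimators defined above, not a library routine.

```python
import numpy as np

def acf(x, max_lag):
    """Sample ACF rho_hat(h) using the biased (1/N) autocovariance estimator."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xm = x - x.mean()
    gamma0 = np.sum(xm * xm) / n
    rho = [np.sum(xm[h:] * xm[: n - h]) / (n * gamma0) for h in range(max_lag + 1)]
    return np.array(rho)

rng = np.random.default_rng(1)
x = rng.normal(size=200)          # white noise, so no lag should look significant
rho = acf(x, max_lag=20)
band = 2 / np.sqrt(len(x))        # approximate 95% bounds under white noise
significant = np.nonzero(np.abs(rho[1:]) > band)[0] + 1
print(significant)                # lags whose sample ACF falls outside the bounds
```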


Vector-Valued and Multidimensional Series

Consider the notion of a vector time series $X_t = \langle X_{t1}, \dots, X_{tp} \rangle^T \in \mathbb{R}^p$ that contains $p$ univariate time series. For the stationary case, the mean vector $\mu$ is:

$\mu = E[X_t] = \langle \mu_1, \dots, \mu_p \rangle^T$

And the p×p autocovariance matrix is denoted as:

$\Gamma(h) = E[(X_{t+h} - \mu)(X_t - \mu)^T]$

$\gamma_{ij}(h) = E[(X_{t+h,i} - \mu_i)(X_{t,j} - \mu_j)], \quad i, j = 1, \dots, p$

Since $\gamma_{ij}(h) = \gamma_{ji}(-h)$:

$\Gamma(-h) = \Gamma^T(h)$

The sample autocovariance matrix is defined as:

$\hat{\Gamma}(h) = \dfrac{1}{N} \sum_{t=1}^{N-h} (X_{t+h} - \bar{X})(X_t - \bar{X})^T, \qquad \hat{\Gamma}(-h) = \hat{\Gamma}^T(h)$

where $\bar{X} = \frac{1}{N} \sum_{t=1}^{N} X_t$ is the sample mean vector.
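A sketch of the sample autocovariance matrix for a vector series, assuming the observations are stored as an $N \times p$ NumPy array; `gamma_hat` is an illustrative name, not a standard routine.

```python
import numpy as np

def gamma_hat(X, h):
    """Sample autocovariance matrix Gamma_hat(h) for an (N, p) array X, h >= 0."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)            # subtract the sample mean vector
    # (1/N) * sum_{t=1}^{N-h} (X_{t+h} - Xbar)(X_t - Xbar)^T
    return Xc[h:].T @ Xc[: n - h] / n

# Gamma_hat(-h) = Gamma_hat(h)^T, so the negative lags follow from gamma_hat(X, h).T
```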

In many applied problems, an observed series may be indexed by more than time alone. For example, the position in space of an experimental unit might be described by two coordinates. We may proceed in these cases by defining a multidimensional process $X_s$ (unlike the multivariate case, there is still a single dependent variable) as a function of the $r \times 1$ index vector $s = \langle s_1, \dots, s_r \rangle^T$, where $s_i$ denotes the coordinate of the $i$th index. The autocovariance function of a stationary multidimensional process $X_s$ can then be defined as a function of the multidimensional lag vector $h = \langle h_1, \dots, h_r \rangle^T$:

$\gamma(h) = E[(X_{s+h} - \mu)(X_s - \mu)]$

where $\mu = E[X_s]$.

The multidimensional sample autocovariance function is defined as:

$\hat{\gamma}(h) = (S_1 S_2 \cdots S_r)^{-1} \sum_{s_1} \cdots \sum_{s_r} (X_{s+h} - \bar{X})(X_s - \bar{X})$

where each summation has range $1 \le s_i \le S_i - h_i$, for $i = 1, \dots, r$, and

$\bar{X} = (S_1 \cdots S_r)^{-1} \sum_{s_1} \cdots \sum_{s_r} X_s$

where each summation has range $1 \le s_i \le S_i$, for $i = 1, \dots, r$.

The multidimensional sample autocorrelation function follows:

$\hat{\rho}(h) = \dfrac{\hat{\gamma}(h)}{\hat{\gamma}(0)}$

EDA

In general, it is necessary for time series data to be stationary so that averaging lagged products over time is a sensible thing to do (the mean is fixed and the covariance depends only on the lag). Hence, to achieve any meaningful statistical analysis of time series data, it is crucial that the mean and the autocovariance functions satisfy the conditions of stationarity.

Trend

Detrend

The easiest form of nonstationarity to work with is the trend stationary model wherein the process has stationary behavior around a trend. We define this as:

$X_t = \mu_t + Y_t$

where $X_t$ are the observations, $\mu_t$ denotes the trend, and $Y_t$ is a stationary process. A strong trend will obscure the behavior of the stationary process $Y_t$. Hence, there is some advantage to removing the trend as a first step in an exploratory analysis of such time series. The steps involved are to obtain a reasonable estimate of the trend component, and then work with the residuals:

$\hat{Y}_t = X_t - \hat{\mu}_t$

For example, if the fitted straight-line trend is $\hat{\mu}_t = -11.2 + 0.006\,t$, the detrended series is $\hat{Y}_t = X_t + 11.2 - 0.006\,t$.
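A sketch of this detrending step, assuming NumPy; the series here is simulated with the same trend coefficients as the worked example above plus arbitrary noise, and the straight-line trend is fit by ordinary least squares with `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(1, 201)
x = -11.2 + 0.006 * t + rng.normal(scale=0.2, size=t.size)  # simulated trend + noise

b1, b0 = np.polyfit(t, x, deg=1)   # slope and intercept of the fitted line
mu_hat = b0 + b1 * t               # estimated trend mu_hat_t
y_hat = x - mu_hat                 # detrended residuals Y_hat_t
```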

Differencing

Differencing can be used to produce a stationary time series. The first difference operator is a linear operator denoted as:

$\nabla X_t = X_t - X_{t-1}$

If $\mu_t = \beta_1 + \beta_2 t$, then:

$\nabla X_t = (\mu_t + Y_t) - (\mu_{t-1} + Y_{t-1}) = \beta_2 + Y_t - Y_{t-1}, \qquad Z_t = \nabla X_t - \beta_2$

where $Z_t = Y_t - Y_{t-1}$ is a stationary time series, given that $Y_t$ is stationary.

One advantage of differencing over detrending for removing trend is that no parameters are estimated in the differencing operation. One disadvantage, however, is that differencing does not yield an estimate of the stationary process $Y_t$, as detrending does. If an estimate of $Y_t$ is essential, then detrending may be more appropriate. If the goal is simply to coerce the data to stationarity, then differencing is more appropriate.
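A sketch of differencing on a simulated trend stationary series, assuming NumPy; `np.diff` computes the first difference $\nabla X_t = X_t - X_{t-1}$, and the coefficients are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(1, 201)
y = rng.normal(size=t.size)      # stationary component Y_t (white noise here)
x = 1.0 + 0.5 * t + y            # trend stationary: X_t = beta1 + beta2 * t + Y_t

dx = np.diff(x)                  # first difference: beta2 + Y_t - Y_{t-1}
z = dx - 0.5                     # Z_t = Y_t - Y_{t-1}, stationary, but not Y_t itself
print(dx.mean())                 # approximately beta2 = 0.5
```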

Backshift Operator

The backshift operator $B$, defined by $B X_t = X_{t-1}$, is linear. We can rewrite the first difference as:

$\nabla X_t = X_t - X_{t-1} = (1 - B) X_t$

And second difference as:

$\nabla^2 X_t = \nabla(\nabla X_t) = (1 - B)^2 X_t$

Transformations

If a time series presents nonstationary as well as nonlinear behavior, transformations may be useful to equalize the variability over the length of a single series. A particularly useful transformation is the log transformation:

$Y_t = \log X_t$

which tends to suppress larger fluctuations that occur over portions of the series where the underlying values are larger.

Smoothing

Smoothing is useful in discovering certain traits in a time series, such as long-term trend and seasonal components.

Moving Average Smoother

If $X_t$ represents the observations, then

$M_t = \sum_{j=-k}^{k} a_j X_{t-j}$

where $a_j = a_{-j} \ge 0$ and $\sum_{j=-k}^{k} a_j = 1$, is a symmetric moving average of the data centered at $X_t$.
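A sketch of a symmetric moving average smoother with equal weights, assuming NumPy; the choice $a_j = 1/(2k+1)$ is one simple set of weights satisfying the symmetry and sum-to-one conditions above.

```python
import numpy as np

def ma_smooth(x, k):
    """Symmetric moving average m_t = sum_{j=-k}^{k} a_j x_{t-j} with a_j = 1/(2k+1)."""
    weights = np.ones(2 * k + 1) / (2 * k + 1)
    # 'valid' keeps only the t for which the full window of 2k+1 points exists
    return np.convolve(x, weights, mode="valid")
```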

Kernel Smoothing

Kernel smoothing is a moving average smoother that uses a weight function or kernel to average the observations:

$\hat{f}_t = \sum_{i=1}^{N} w_i(t)\, X_i$

$w_i(t) = \dfrac{K\!\left(\frac{t - i}{b}\right)}{\sum_{j=1}^{N} K\!\left(\frac{t - j}{b}\right)}$

where $b$ is a bandwidth parameter (the wider the bandwidth, the smoother the result) and $K(\cdot)$ is a kernel function, often the normal kernel:

$K(z) = \dfrac{1}{\sqrt{2\pi}} \exp\!\left(-\dfrac{z^2}{2}\right)$
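A sketch of the Gaussian kernel smoother defined above, assuming NumPy; `b` is the bandwidth and the function name is illustrative.

```python
import numpy as np

def kernel_smooth(x, b):
    """Kernel smoother f_hat_t = sum_i w_i(t) x_i with a normal kernel."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    # K((t - i) / b) for every pair (t, i)
    z = (t[:, None] - t[None, :]) / b
    K = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    w = K / K.sum(axis=1, keepdims=True)   # weights w_i(t) sum to one over i
    return w @ x
```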

Lowess and Nearest Neighbor Regression

Another approach to smoothing a time series is nearest neighbor regression. The technique is based on $k$-nearest neighbor linear regression, wherein one uses the neighboring data $\{X_{t - k/2}, \dots, X_t, \dots, X_{t + k/2}\}$ to predict $X_t$ via linear regression. The result is $\hat{f}_t$.

Lowess is a method of smoothing that is more complex but similar in spirit to nearest neighbor regression:

  1. A certain proportion of nearest neighbors to $X_t$ is included in a weighting scheme; values closer to $X_t$ in time receive more weight.
  2. A robust weighted regression is used to predict $X_t$ and obtain the smoothed estimate of $f_t$.
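A sketch using the lowess implementation in statsmodels, a reasonable off-the-shelf choice rather than anything the text prescribes; `frac` is the proportion of nearest neighbors used at each point and the signal is simulated.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(4)
t = np.arange(200, dtype=float)
x = np.sin(t / 25) + rng.normal(scale=0.3, size=t.size)   # noisy signal

# frac: proportion of nearest neighbors; it: number of robustifying iterations
smoothed = lowess(x, t, frac=0.1, it=3, return_sorted=False)
```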

Visualization

Lagged Scatterplot Matrices (Non-linearity and Lag correlation)

In the definition of the ACF, we are essentially interested in relations between $X_t$ and $X_{t-h}$. The autocorrelation function tells us whether a substantial linear relation exists between the series and its own lagged values. The ACF gives a profile of the linear correlation at all possible lags and shows which values of $h$ lead to the best predictability. The restriction of this idea to linear predictability, however, may mask a possible nonlinear relation between current values and past values. Thus, to check for a nonlinear relationship of this form, it is convenient to display a lagged scatterplot matrix.

The plot displays values $X_t$ on the vertical axis plotted against $X_{t-h}$ on the horizontal axis. The sample autocorrelations are displayed in the upper right-hand corner, and superimposed on the scatterplots are locally weighted scatterplot smoothing (LOWESS) lines that can be used to help discover any nonlinearities.
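A sketch of such a display, assuming NumPy and matplotlib; each panel plots $X_t$ against $X_{t-h}$ with the sample lag-$h$ correlation in the title, and the LOWESS overlay described above is omitted to keep the example short.

```python
import numpy as np
import matplotlib.pyplot as plt

def lag_plot_matrix(x, max_lag=8, cols=4):
    """Scatter x_t (vertical) against x_{t-h} (horizontal) for h = 1..max_lag."""
    x = np.asarray(x, dtype=float)
    rows = int(np.ceil(max_lag / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(3 * cols, 3 * rows))
    for h, ax in zip(range(1, max_lag + 1), axes.ravel()):
        ax.scatter(x[:-h], x[h:], s=5)
        r = np.corrcoef(x[:-h], x[h:])[0, 1]   # sample lag-h correlation
        ax.set_title(f"lag {h}, r = {r:.2f}")
    fig.tight_layout()
    return fig
```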

Ref

Time Series Analysis and Its Applications: With R Examples, by Robert H. Shumway and David S. Stoffer

https://www.math-stat.unibe.ch/e237483/e237655/e243381/e281679/files281692/Chap13_ger.pdf