In a moving average, observations within the window get this weight, while observations outside the window get this weight.
What is equal weight (1/k) inside the window and zero weight outside?
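The equal-weights idea can be sketched in a few lines (a minimal illustration; the trailing-window convention and function name are assumptions, not from the deck):

```python
def moving_average(y, k):
    """Trailing moving average: each of the k observations inside the
    window gets weight 1/k; everything outside the window gets weight 0."""
    return [sum(y[t - k + 1:t + 1]) / k for t in range(k - 1, len(y))]

print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```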
A time series has this property when its mean, variance, and autocorrelation structure remain constant over time.
What is stationarity?
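An informal way to eyeball the constant-mean, constant-variance property is to compare statistics across successive windows (a rough sketch, not a formal stationarity test such as ADF):

```python
def rolling_stats(y, window):
    """Compare mean and variance over successive non-overlapping windows.
    For a stationary series these should stay roughly constant over time."""
    stats = []
    for start in range(0, len(y) - window + 1, window):
        w = y[start:start + window]
        m = sum(w) / window
        v = sum((x - m) ** 2 for x in w) / window
        stats.append((m, v))
    return stats
```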
A state space model (SSM) has two main equations: this one links observed data to hidden states, and this one describes how states evolve over time.
What are the observation equation and transition equation (or state equation)?
This factor determines how much you trust the new observation versus your model's prediction, acting as a weight between 0 and 1.
What is the Kalman Gain (Factor)?
This is the probability of observing the actual data given specific parameter values, answering "How likely is this data if the parameters were θ?"
What is likelihood?
This smoothing method assigns decaying weights to all past observations, with the most recent getting weight α and none ever receiving exactly zero weight.
What is exponential smoothing?
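The decaying-weights claim follows directly from the recursion: expanding s_t = α·y_t + (1−α)·s_{t−1} gives y_{t−j} a weight of α(1−α)^j, which shrinks but never reaches zero. A minimal sketch (initializing with the first observation is an assumption):

```python
def exp_smooth(y, alpha):
    """Simple exponential smoothing: s_t = alpha*y_t + (1-alpha)*s_{t-1}.
    Unrolling the recursion, y_{t-j} carries weight alpha*(1-alpha)**j,
    so no past observation ever receives exactly zero weight."""
    s = y[0]
    out = [s]
    for v in y[1:]:
        s = alpha * v + (1 - alpha) * s
        out.append(s)
    return out
```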
You apply this transformation—taking Y_t minus Y_{t-1}—to remove trends and restore stationarity, and the number of times you do this becomes the "d" in ARIMA(p, d, q).
What is differencing (or first differencing)?
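Differencing a linear trend once makes it constant, and differencing twice makes it zero, which is why d passes can restore stationarity (a minimal sketch):

```python
def difference(y, d=1):
    """Apply first differencing d times: each pass replaces
    the series with its successive changes y_t - y_{t-1}."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y

print(difference([1, 3, 5, 7], d=1))  # linear trend -> constant: [2, 2, 2]
print(difference([1, 3, 5, 7], d=2))  # second pass -> zero: [0, 0]
```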
These are the two main differences between ARIMA and SSM: whether the variables are observed or unobserved, and whether the parameters are constant or time-varying.
What is: ARIMA uses observed variables and constant parameters; SSM uses unobserved variables and time-varying parameters?
The time update does this to the state estimate, while the information update does this using the new observation.
What is: projects it forward using the transition equation (time update), and corrects it using the new observation (information update)?
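Both steps can be sketched for the scalar local level model, where the Kalman gain from the earlier card appears explicitly as the weight in the information update (a minimal sketch; the variable names and the diffuse initialization `p0` are assumptions):

```python
def kalman_local_level(ys, sigma_eps2, sigma_eta2, a0=0.0, p0=1e6):
    """Scalar Kalman filter for the local level model (Z_t = T_t = 1).
    Time update: project the state estimate forward with the transition
    equation. Information update: correct it with the new observation,
    weighted by the Kalman gain k in [0, 1]."""
    a, p = a0, p0
    filtered = []
    for y in ys:
        # time update (level follows a random walk, so the mean carries over)
        a_pred = a
        p_pred = p + sigma_eta2
        # information update (observation equation: y_t = level + noise)
        k = p_pred / (p_pred + sigma_eps2)   # Kalman gain
        a = a_pred + k * (y - a_pred)
        p = (1 - k) * p_pred
        filtered.append(a)
    return filtered
```

With a very diffuse prior (`p0` large), the first gain is close to 1, so the first filtered estimate essentially trusts the first observation.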
We optimize this function instead of likelihood itself because it converts products to sums, preventing numerical underflow.
What is the log-likelihood function?
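The products-to-sums point can be seen numerically: multiplying hundreds of densities below 1 underflows toward zero, while summing their logs stays finite (a minimal sketch; the i.i.d. Gaussian setting is an assumption):

```python
import math

def gaussian_loglik(data, mu, sigma):
    """Log-likelihood of i.i.d. Gaussian data. Summing log densities
    avoids the numerical underflow that multiplying many small
    densities together would cause."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in data)
```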
The Holt-Winters αβ-filter has these two update equations, rather than a single level equation.
What are: (1) Level update: L_t = L_{t-1} + b_{t-1} + α·e_t, and (2) Trend update: b_t = b_{t-1} + β·e_t?
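The two update equations above can be run as a simple loop in error-correction form, with e_t the one-step forecast error (a sketch; initializing level and trend from the first two observations is an assumption):

```python
def holt_filter(ys, alpha, beta):
    """Holt's alpha-beta filter in error-correction form:
       forecast: f_t = L_{t-1} + b_{t-1}
       level:    L_t = L_{t-1} + b_{t-1} + alpha * e_t
       trend:    b_t = b_{t-1} + beta  * e_t
    where e_t = y_t - f_t is the one-step forecast error."""
    level, trend = ys[1], ys[1] - ys[0]   # crude initialization (an assumption)
    forecasts = []
    for y in ys[2:]:
        f = level + trend
        e = y - f
        level = level + trend + alpha * e
        trend = trend + beta * e
        forecasts.append(f)
    return forecasts
```

On a perfectly linear series the forecast errors are zero, so the filter tracks the trend exactly regardless of α and β.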
The model Y_t = φ₁Y_{t-1} + φ₂Y_{t-2} + ε_t + θε_{t-1} is identified as this ARIMA(p, d, q).
What is ARIMA(2, 0, 1)?
In the local level model, both Z_t and T_t equal this value, making the observed value equal to level plus noise.
What is 1 (scalar)?
High Kalman Gain (close to 1) means this about observation and prediction reliability, while low Kalman Gain means the opposite.
What is: observation is reliable and prediction is uncertain (high K), observation is noisy and prediction is reliable (low K)?
MLE algorithms can converge to local maxima instead of global maxima, fail to converge, or converge slowly if you don't have these.
What are good starting values?
The full Holt-Winters method with α, β, and γ parameters produces this type of pattern, following the trend but oscillating around it.
What is a cyclic trend (or seasonal pattern)?
PACF cuts off after lag p to identify this component, while ACF cuts off after lag q to identify this component.
What is AR(p) for PACF and MA(q) for ACF?
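A sample autocorrelation at one lag can be computed directly, which is what an ACF plot stacks up over many lags (a minimal sketch using the standard biased estimator; computing the PACF additionally requires solving for partial coefficients, e.g. via Durbin-Levinson, and is omitted here):

```python
def acf(y, lag):
    """Sample autocorrelation at a given lag (biased estimator:
    both autocovariances are divided by n, so acf(y, 0) == 1)."""
    n = len(y)
    m = sum(y) / n
    c0 = sum((x - m) ** 2 for x in y) / n
    ck = sum((y[t] - m) * (y[t + lag] - m) for t in range(n - lag)) / n
    return ck / c0
```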
For the local level plus trend model, the state vector contains these two elements, and the transition matrix T_t has this specific structure.
What are [L_t, b_t]ᵀ (level and trend), and T_t = [[1, 1], [0, 1]]?
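The structure of T_t makes the mechanics concrete: the first row adds the trend into the level, and the second row carries the trend forward unchanged (a minimal sketch in plain Python):

```python
T = [[1, 1],
     [0, 1]]   # transition matrix: level += trend, trend unchanged
Z = [1, 0]     # observation vector: y_t reads the level component only

def step(state):
    """One transition x_t = T @ x_{t-1} for the state [level, trend]."""
    return [T[0][0] * state[0] + T[0][1] * state[1],
            T[1][0] * state[0] + T[1][1] * state[1]]

print(step([10.0, 2.0]))  # level advances by the trend: [12.0, 2.0]
```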
The notation t|(t-1) means this type of estimate, while t|t means this type, and (t-h)|t where 0<h<t means this type.
What are: predicted/prior estimate, filtered/updated estimate, and smoothed estimate?
Simulated Annealing is this type of method, while BFGS is this type of method.
What is derivative-free (Simulated Annealing) and derivative-based (BFGS)?
This practice combines predictions from multiple models rather than selecting one "best" model, and research shows it often outperforms even the best individual model.
What is forecast combination (or ensemble forecast)?
This is the forecasting equation for ARIMA(1, 1, 0): ŷ_{t+1} equals this expression in terms of Y_t, Y_{t-1}, and φ₁.
What is ŷ_{t+1} = (1 + φ₁)Y_t - φ₁Y_{t-1}?
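The equation follows from modeling the differences: ΔY_{t+1} = φ₁ΔY_t, so ŷ_{t+1} = Y_t + φ₁(Y_t − Y_{t−1}), which rearranges to the answer above (a one-line sketch):

```python
def arima_110_forecast(y_t, y_tm1, phi1):
    """One-step ARIMA(1,1,0) forecast. The AR(1) model acts on the
    differences, so the forecast adds phi1 times the last change:
    y_hat = y_t + phi1*(y_t - y_tm1) = (1 + phi1)*y_t - phi1*y_tm1."""
    return (1 + phi1) * y_t - phi1 * y_tm1

print(arima_110_forecast(10, 8, 0.5))  # 11.0: continues half the last change
```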
For a linear regression model WITHOUT time-varying parameters in SSM, you set this matrix Q_t to this special value to ensure parameters never change.
What is Q_t = 0 (zero matrix)?
In Bayesian terms, the posterior equals this relationship between likelihood and prior.
What is Posterior ∝ Likelihood × Prior?
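The proportionality becomes an equality after normalizing, which a discrete Bayes update shows in two lines (a minimal sketch over a finite set of hypotheses):

```python
def bayes_update(prior, likelihood):
    """Discrete Bayes rule: posterior is proportional to
    likelihood * prior, then normalized to sum to 1."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Equal priors, but the data is 9x more likely under hypothesis 0:
print(bayes_update([0.5, 0.5], [0.9, 0.1]))
```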
In the context of State Space Models, this method requires iterative maximization and may have convergence problems.
What is Maximum Likelihood Estimation (MLE)?