From: Forecasting patient flows with pandemic induced concept drift using explainable machine learning
Method | Description |
---|---|
Benchmark | The clinics' current in-house estimation method, which forecasts patient demand for a given day to be 5% higher than demand on the same day of the previous year. |
Persistence Model | A benchmark model implemented as a random walk [35] method, with the forecast equal to the observed value for the same period of the previous year. |
Enhanced Persistence Model | An optimised benchmark model that forecasts the weighted mean of autoregressive values with respect to time t, using time lags of t-7, t-14, t-364, t-728 and t-1092. The weightings were optimised empirically and set to [5, 4, 3, 2, 1] respectively, with the features representing recency allocated the greatest importance. |
ARIMA | Autoregressive Integrated Moving Average; a traditional statistical technique that predicts future values from past values. |
kNN Regression | A non-parametric algorithm based on the principle of proximity, producing a forecast that is an aggregation of the k observations nearest to the data point in question. |
Ridge Regression | A technique that creates a parsimonious model by shrinking the coefficients towards zero using L2 regularization. The resulting models generally have lower variance, which improves the mean-squared error. |
Support Vector Machines Regression | SVR is an extension of SVMs. It uses a pre-defined kernel function to map the data into a higher-dimensional space in which an approximate fit can be found that satisfies a pre-determined error margin. The objective function of SVR therefore minimises the coefficients while tolerating errors within that margin (epsilon), rather than minimising the error term itself. SVRs are particularly effective on smaller datasets and are more robust to outliers. |
Kernel Ridge Regression | Kernel ridge regression extends Ridge Regression with the kernel trick from SVR. It differs from SVR in that it uses the squared error loss, combined with L2 regularization, as opposed to SVR's epsilon-insensitive loss. |
Prophet | Auto-tunable, additive forecasting model able to handle non-linear trends using yearly, weekly, and daily seasonality, with the capability to integrate holiday effects and robustness to changepoints in the trend. |
Random Forest Regression | Ensemble-based algorithm consisting of decision trees whose outputs are combined. Each decision tree is induced on a random feature subset, resulting in an uncorrelated forest of trees. The combined output of the forest is generally more accurate than that of any individual tree. |
CatBoost | CatBoost is an ensemble-based algorithm that generates gradient-boosted decision trees. During training, each successive tree is induced to reduce the loss of the ensemble built so far. The size of the ensemble is preset by defining the maximum number of trees as a parameter. |
Voting Regressor | Ensemble-based meta-estimator. Combines machine learning and traditional time-series approaches. Initially generates models for the underlying base regressors: Prophet, CatBoost, Random Forest and ARIMA. It then combines the outputs of these algorithms for the final forecast using a weighted combination scheme. |
Averaging Model | A customised version of the Voting Regressor that combined the outputs of five algorithms (Prophet, CatBoost, Random Forest, Voting and Stacking) but discarded the highest and lowest of the five predictions before averaging the remainder to produce each forecast. |
Stacking | An ensemble-based meta-estimator which models the forecast outputs of the underlying base estimators (Prophet, CatBoost and Random Forest) using an overarching regressor whose output constitutes the final forecast. |
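The Enhanced Persistence Model above lends itself to a short illustration. The following is a minimal sketch in Python, assuming daily demand is held in a pandas Series indexed by date; the lags and weights come from the table, while the function and variable names are illustrative, not from the paper's code:

```python
import numpy as np
import pandas as pd

# Lags (in days) and empirically set weights from the method description;
# recency is weighted highest.
LAGS = [7, 14, 364, 728, 1092]
WEIGHTS = [5, 4, 3, 2, 1]

def enhanced_persistence_forecast(series: pd.Series, t: pd.Timestamp) -> float:
    """Forecast for day t as the weighted mean of the lagged observations."""
    lagged = [series.loc[t - pd.Timedelta(days=lag)] for lag in LAGS]
    return float(np.average(lagged, weights=WEIGHTS))
```

With at least 1092 days of history, the forecast for day t is (5·y(t-7) + 4·y(t-14) + 3·y(t-364) + 2·y(t-728) + 1·y(t-1092)) / 15.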
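The trimmed combination step of the Averaging Model can likewise be sketched. This assumes the five base forecasts (Prophet, CatBoost, Random Forest, Voting and Stacking) are already available as floats; the function name is hypothetical:

```python
def averaging_model_forecast(base_forecasts: list) -> float:
    """Discard the single highest and single lowest base forecast,
    then return the mean of the remaining values."""
    if len(base_forecasts) < 3:
        raise ValueError("need at least three base forecasts to trim")
    trimmed = sorted(base_forecasts)[1:-1]
    return sum(trimmed) / len(trimmed)

# e.g. averaging_model_forecast([80.0, 120.0, 100.0, 90.0, 110.0])
# drops 80 and 120 and averages 90, 100 and 110, giving 100.0
```

Discarding the extremes makes the combined forecast robust to any single base model producing an outlying prediction.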