Impact of Random Forest and Ensemble Methods on Infection Trend Forecasting: A Quantitative Evaluation Using Global Post-COVID-19 Data
DOI:
https://doi.org/10.63125/b0yw2q91
Keywords:
Random Forest, Ensemble Methods, Infection Forecasting, Machine Learning, Post-COVID Data
Abstract
This study quantitatively evaluated the impact of Random Forest and ensemble methods on infection trend forecasting using global post-COVID-19 datasets. A longitudinal analytical framework was applied to 18,720 time-series observations across 52 countries, incorporating key variables such as daily infection cases, vaccination rates (mean = 64.2%), mobility indices (mean deviation = −18.5%), and policy stringency scores (mean = 57.3). The findings indicated that both Random Forest and ensemble methods significantly outperformed baseline models in predictive accuracy. Random Forest achieved a root mean square error of 2,145, mean absolute error of 1,620, mean absolute percentage error of 12.8%, and coefficient of determination of 0.87. Ensemble methods demonstrated superior performance with lower error values, including a root mean square error of 1,980, mean absolute error of 1,480, mean absolute percentage error of 11.3%, and a higher coefficient of determination of 0.91. Regional analysis showed that ensemble methods consistently produced lower mean absolute error values, ranging from 1,290 in Oceania to 1,710 in Africa, compared to Random Forest values ranging from 1,430 to 1,890. Temporal analysis revealed that forecasting accuracy improved by approximately 15% during stable transmission phases, with root mean square error decreasing from 2,480 during outbreak periods to 1,720 under stable conditions. Statistical testing confirmed significant differences between models (p < 0.05), with effect sizes ranging from 0.49 to 0.68, indicating moderate to strong practical significance. Overall, the results demonstrated that ensemble methods provided enhanced stability, reduced prediction variance by approximately 11.6%, and improved generalizability across heterogeneous datasets, while Random Forest maintained strong adaptability in capturing complex nonlinear relationships in global infection trends.
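The four accuracy measures reported in the abstract (root mean square error, mean absolute error, mean absolute percentage error, and coefficient of determination) can be illustrated with a minimal Python sketch. It uses the standard textbook definitions, which are assumed here rather than taken from the study's own implementation. The function names and the toy `actual` and `predicted` series are hypothetical. The only figures taken from the abstract are the reported Random Forest and ensemble error values, which are used to express the gap between the two models as a relative reduction.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: penalizes large misses more than small ones."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error, in the same units as the case counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 minus residual over total variation."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical toy series, for illustration only (not study data).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
toy_scores = {
    "rmse": rmse(actual, predicted),
    "mae": mae(actual, predicted),
    "mape": mape(actual, predicted),
    "r2": r_squared(actual, predicted),
}

# Error values as reported in the abstract: Random Forest vs. ensemble.
rmse_gain = relative_reduction(2145, 1980)
mae_gain = relative_reduction(1620, 1480)

print(toy_scores)
print(f"Ensemble RMSE reduction over Random Forest: {rmse_gain:.1f}%")
print(f"Ensemble MAE reduction over Random Forest: {mae_gain:.1f}%")
```

Expressing the reported figures as relative reductions puts the two error measures on a common scale, which makes it easier to judge how large the ensemble's advantage is in proportion to the Random Forest error.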