Machine learning for estimating catastrophic health spending in disaster-affected, data-scarce settings (ESCoE DP 2026-05)

By Rozana Himaz, Dimitra Salmanidou, Saman Ghaffarian

Abstract

Natural hazard events can increase out-of-pocket health costs and push vulnerable households into poverty. Mitigation measures require understanding changes in health spending patterns using pre- and post-event data, but such data are often unavailable in disaster-affected settings. This represents a fundamental measurement challenge: the absence of pre-event baseline data makes it impossible to construct the counterfactual quantities needed for welfare analysis.

To address this measurement problem, we develop a hybrid machine learning approach to estimate unobserved household health spending using longitudinal survey data from Indonesia. We first develop a model around the 2006 Yogyakarta earthquake, for which complete data are available. The model learns spending patterns across income, hazard intensity, and other characteristics, achieving >70% accuracy in a noisy and complex domain.

After testing the model for transportability, we apply it to survey data collected in Indonesia after the 2004 Indian Ocean tsunami to predict plausible baseline health spending. Using these predictions to evaluate the tsunami's impact on health spending, we find that, without targeted aid, catastrophic health spending would have increased from 4.5% to 29.4%, and that moderately damaged households experienced larger cost increases than heavily damaged ones.
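As a rough illustration of the kind of measure involved, the incidence of catastrophic health spending can be computed as the share of households whose health spending exceeds a threshold fraction of total household spending, once under predicted baseline spending and once under observed post-event spending. The sketch below is not the paper's implementation: the 10% budget-share threshold is an assumed convention, and the function name, variable names, and toy figures are all hypothetical.

```python
from typing import Sequence


def catastrophic_incidence(
    health_spending: Sequence[float],
    total_spending: Sequence[float],
    threshold: float = 0.10,  # assumed budget-share cutoff, not taken from the paper
) -> float:
    """Return the share of households whose health budget share exceeds `threshold`."""
    if len(health_spending) != len(total_spending):
        raise ValueError("inputs must have the same length")
    if not health_spending:
        raise ValueError("at least one household is required")
    flagged = 0
    for health, total in zip(health_spending, total_spending):
        if total <= 0:
            raise ValueError("total spending must be positive")
        if health / total > threshold:
            flagged += 1
    return flagged / len(health_spending)


# Toy data for four hypothetical households (illustrative only).
total = [100.0, 200.0, 150.0, 80.0]
predicted_baseline_health = [5.0, 10.0, 20.0, 4.0]  # stand-in for model-predicted pre-event spending
observed_post_health = [15.0, 12.0, 30.0, 4.0]      # stand-in for surveyed post-event spending

baseline_rate = catastrophic_incidence(predicted_baseline_health, total)
post_rate = catastrophic_incidence(observed_post_health, total)
change = post_rate - baseline_rate  # change in incidence, as a fraction of households
```

In this toy example the predicted baseline plays the role of the missing pre-event data, so the comparison can be made even though only post-event spending was actually surveyed.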

By combining artificial intelligence with household survey data, our framework serves as a proof of concept for addressing data gaps in official economic statistics, demonstrating how machine learning can enable counterfactual welfare measurement where conventional data collection is absent or incomplete.