Research Article: On the Identification of Associations between Five World Health Organization Water, Sanitation and Hygiene Phenotypes and Six Predictors in Low and Middle-Income Countries

Date Published: January 26, 2017

Publisher: Public Library of Science

Author(s): Hugh Ellis, Erica Schoenberger, Robert K. Hills.


According to the most recent estimates, 842,000 deaths in low- to middle-income countries were attributable to inadequate water, sanitation and hygiene in 2012. Despite billions of dollars and decades of effort, we still lack a sound understanding of which kinds of WASH interventions are most effective in improving public health outcomes, and an important corollary–whether the right things are being measured. The World Health Organization (WHO) has made a concerted effort to compile comprehensive data on drinking water quality and sanitation in the developing world. A recent 2014 report provides information on three phenotypes (responses): Unsafe Water Deaths, Unsafe Sanitation Deaths, Unsafe Hygiene Deaths; two grouped phenotypes: Unsafe Water and Sanitation Deaths and Unsafe Water, Sanitation and Hygiene Deaths; and six explanatory variables (predictors): Improved Sanitation, Unimproved Water Source, Piped Water To Premises, Other Improved Water Source, Filtered and Bottled Water in the Household and Handwashing.

Regression analyses were performed to identify statistically significant associations between these mortality responses and predictors. Good fitted-model performance required: (1) the use of population-normalized death fractions as opposed to number of deaths; (2) transformed response (logit or power); and (3) square-root predictor transformation. Given the complexity and heterogeneity of the relationships and countries being studied, these models exhibited remarkable performance and explained, for example, about 85% of the observed variance in population-normalized Unsafe Sanitation Death Fraction, with a high F-statistic and highly statistically significant predictor p-values. Similar performance was found for all other responses, which was an unexpected result: the expected associations between responses and predictors (i.e., water-related with water-related, etc.) did not occur. The set of statistically significant predictors remains the same across all responses. That is, Unimproved Water Source (UWS), Improved Sanitation (IS) and Filtered and Bottled Water in the Household (FBH) were the only statistically significant predictors whether the response was Unsafe Sanitation Death Fraction, Unsafe Hygiene Death Fraction or Unsafe Water Death Fraction. Moreover, the fraction of variance explained for all fitted models remained relatively high (adjusted R² ranges from 0.7605 to 0.8533). We find that two of the statistically significant predictors–Improved Sanitation and Unimproved Water Source–are particularly influential. We also find that some predictors (Piped Water to Premises, Other Improved Water Source) have very little explanatory power for predicting mortality, that one (Other Improved Water Source, OIWS) has a counterintuitive effect on response (Unsafe Sanitation Death Fraction increases with increases in OIWS), and that one predictor (Handwashing) has essentially no explanatory usefulness.
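The three modeling requirements listed above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, and the power-transform exponent is left as a parameter because the text here does not state the value used.

```python
import math


def death_fraction(deaths, population):
    """Population-normalized response: deaths divided by country population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population


def logit(p):
    """Logit transform of a fraction; only defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit requires 0 < p < 1")
    return math.log(p / (1.0 - p))


def power_transform(p, exponent):
    """Power transform; the exponent is an assumption supplied by the caller."""
    return p ** exponent


def sqrt_predictor(x):
    """Square-root transform for a non-negative predictor (e.g., a percentage)."""
    if x < 0:
        raise ValueError("predictor must be non-negative")
    return math.sqrt(x)


# Made-up illustrative values, not data from the WHO report.
fraction = death_fraction(deaths=50, population=1000)
transformed_response = logit(fraction)
transformed_predictor = sqrt_predictor(49.0)
```

Note that a death fraction of exactly zero has no finite logit, so the sketch raises an error rather than silently returning a value.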

Our results suggest that a higher priority may need to be given to improved sanitation than has been the case. Nevertheless, while our focus in this paper is mortality, morbidity is a staggering consequence of inadequate water, sanitation and hygiene, and lower impact on mortality may not mean a similarly low impact on morbidity. More specifically, those predictors that we found uninfluential for predicting mortality-related responses may indeed be important when morbidity is the response.

Partial Text

According to the most recent estimates, 842,000 deaths in low- to middle-income countries were attributable to inadequate water, sanitation and hygiene in 2012 [1]. This figure represented 58% of total deaths attributed to diarrheal disease which, in turn, constituted an estimated 1.5% of the total Global Burden of Disease (GBD) [1, 2]. It is a notable reduction from the estimated 88% of total deaths attributed to diarrheal disease related to inadequate WASH in 2000. Diarrheal deaths as a whole fell from an estimated 2.2 million in 2000 to 1.5 million in 2012 [1–4].

Without response and predictor transformation, with number of deaths as response and including the Democratic Republic of the Congo (DRCongo), an OLS regression model explains essentially no observed variance and contains no statistically significant explanatory variables. Its F statistic is approximately one (and is not significant at the 0.05 level), providing further evidence that this model contains no significant predictors. These results occur for all modeled responses. When the response is Unsafe Sanitation Death Fraction (USDF), obtained by dividing death count by country population (scaled response), we see a modest improvement in fit (R² = 0.1165, F = 4.165), with three predictors emerging as significant: Improved Sanitation (IS), Unimproved Water Source (UWS) and Piped Water to Premises (PWTP). DRCongo is an extreme outlier and also possesses high leverage [27], such that its removal increases R² to 0.6404 and F to 43.45, with a different subset of predictors becoming significant (IS and Handwashing, HW). Transforming the response necessitates a modification to the data (zero deaths are computationally problematic) by either: (1) setting zero deaths to a small number (e.g., 1); or (2) removing zero-death countries from the analysis. We prefer approach 2 because we find zero death counts implausible. When a power transformation is applied to the response USDF, R² increases to 0.8344 with a concomitant increase in F to 102.6 (the corresponding results for approach 1 are 0.7022 and 57.58, as shown in §A in S1 File–run012). IS becomes the sole significant predictor. Transforming predictors (square root) yields a model with slightly improved R² and F (0.8533, 117.3) and yet again a different set of significant predictors: IS, UWS and Filtered and Bottled Water in the Household (FBH).
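The effect of a single high-leverage outlier on an OLS fit can be seen in a deliberately simplified sketch. It assumes one predictor rather than the six used in the paper, and the data are made up for illustration; they are not the WHO figures. For a one-predictor model with intercept, the leverage of observation i is 1/n + (x_i - mean(x))^2 / Sxx.

```python
def fit_simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r_squared)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


def leverages(x):
    """Hat-matrix diagonal for a one-predictor model with an intercept."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]


# Made-up data: five well-behaved points plus one extreme point (the last).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 5.0]

_, _, r2_all = fit_simple_ols(x, y)
_, _, r2_trimmed = fit_simple_ols(x[:-1], y[:-1])
h = leverages(x)

print(f"R^2 with outlier:    {r2_all:.4f}")
print(f"R^2 without outlier: {r2_trimmed:.4f}")
print(f"largest leverage at index {h.index(max(h))}")
```

In this toy data set the extreme point both sits far from the mean of x (high leverage) and departs from the trend of the other points, so dropping it raises R² sharply. That is the same qualitative pattern the text reports for DRCongo.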

Piped Water to Premises (PWTP) and handwashing (HW) had little value in predicting any mortality response. Good fitted-model performance required: (1) the use of population-scaled death fractions as opposed to death totals; (2) transformed response (logit or power); and (3) predictor transformation (square root). The best models passed diagnostic tests for normality of residuals, linearity between predictors and response, and constant error variance, and exhibited remarkable performance given the heterogeneity of the countries involved and the complexity of the relationships between response and predictors. In the case of population-normalized Unsafe Sanitation Death Fraction as response, the model explained about 85% of the observed variance, with a high F-statistic and highly statistically significant predictor p-values. Two predictors, Improved Sanitation and Unimproved Water Source, were most responsible for good model performance. PWTP and HW had little value in explaining Unsafe Sanitation Death Fraction variance, a result that was consistent across responses.
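For readers who want to relate the fit statistics quoted above to one another, the standard textbook formulas for adjusted R² and the overall F statistic of an OLS model with an intercept can be sketched as follows. The function names and the example inputs are hypothetical. The number of countries in each fitted model is not given here, so no attempt is made to reproduce the paper's values.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof_resid = n_obs - n_predictors - 1
    if dof_resid <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof_resid


def f_statistic(r_squared, n_obs, n_predictors):
    """Overall F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    dof_resid = n_obs - n_predictors - 1
    if dof_resid <= 0:
        raise ValueError("need more observations than parameters")
    return (r_squared / n_predictors) / ((1.0 - r_squared) / dof_resid)


# Made-up inputs chosen for easy arithmetic, not values from the paper.
example_adj = adjusted_r_squared(0.5, n_obs=11, n_predictors=2)
example_f = f_statistic(0.5, n_obs=11, n_predictors=2)
print(example_adj, example_f)
```

Both quantities penalize additional predictors through the residual degrees of freedom, which is why a model can gain raw R² from an extra predictor while its adjusted R² stays flat or falls.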



