Date Published: October 2, 2018
Publisher: Public Library of Science
Author(s): Won Seob Oh, Sanghyun Yoon, Juhwan Noh, Jungwoo Sohn, Changsoo Kim, Joon Heo, Taulant Muka.
Geographical variations and influential factors of disease prevalence are crucial information for allocating limited medical resources optimally and for prioritizing appropriate treatments in each regional unit. The purpose of this study was to explore the geographical variations and influential factors of cardiometabolic disease prevalence across 230 administrative districts in South Korea. Global Moran’s I was calculated to determine whether the standardized prevalences of cardiometabolic diseases (hypertension, stroke, and diabetes mellitus) were spatially clustered. The CART algorithm was then applied to generate decision tree models that could extract the diseases’ regional influential factors from among 101 demographic, economic, and public health data variables. Finally, the accuracies of the resulting models, for hypertension (67.4%), stroke (62.2%), and diabetes mellitus (56.5%), were assessed by ten-fold cross-validation. Marriage rate was the main determinant of geographic variation in hypertension and stroke prevalence, which raises the possibility that married life helps lower disease risk. Additionally, stress-related variables were extracted as factors positively associated with hypertension and stroke. Conversely, the wealth status of a region was found to influence the prevalences of stroke and diabetes mellitus. This study suggests a framework for gaining novel insights into the regional characteristics of diseases and the corresponding influential factors. The results are expected to provide valuable information for public health practitioners’ cost-effective disease management and to facilitate primary intervention and mitigation efforts in response to regional disease outbreaks.
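As an illustration of the spatial clustering test mentioned above, the following is a minimal sketch of the Global Moran's I statistic in plain Python. The function name `morans_i`, the four-district `chain` layout, and the binary adjacency weights are assumptions made for this example; the text does not state which spatial weights the study used, and the significance test that would normally accompany the statistic is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix.

    weights[i][j] > 0 means districts i and j are treated as neighbours
    (binary adjacency here, which is an assumption of this sketch).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0: sum of all spatial weights
    s0 = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of districts
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    # Total squared deviation from the mean
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


# Hypothetical layout: four districts in a row, each adjacent to its
# immediate neighbours only.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Prevalence rising smoothly along the row: similar values sit together.
smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain)
# Prevalence alternating low/high: neighbours are dissimilar.
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain)
print(smooth, alternating)
```

A positive value indicates that neighbouring districts tend to have similar prevalences (clustering), while a negative value indicates that neighbours tend to differ.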
The geographical variations and influential factors of diseases have been intensively studied in recent years [1–12]. Although recent studies have dealt with various kinds of diseases on different scales (i.e., international, national, regional, and local), their common main purpose has been to investigate the behaviors, conditions, and/or exposures that decisively influence disease incidence or prevalence. By providing reliable and timely information related to disease outbreaks, these studies have the potential to augment existing etiologic hypotheses and to uncover undiscovered causal chains in the pathogenesis of diseases, thereby helping to accomplish primary prevention or mitigation of diseases effectively in the public health field. Epidemiologists, public health practitioners, and medical researchers can refer to this knowledge when initiating regional health promotion programs, prioritizing the treatments specifically required in their communities, and concentrating resources on evidence-based interventions.
In the present study, we attempted to explore the geographical variations and influential factors for hypertension, stroke, and diabetes mellitus in 230 administrative districts in South Korea. In the spatial autocorrelation analysis, all three diseases showed statistically significant spatial autocorrelation. Decision tree models for each disease were then generated using CART and a pruning algorithm. After model accuracy was assessed with ten-fold cross-validation, positive and negative influential factors of the diseases were presented, and some important insights were derived from analysis of these factors. However, statistical analysis of geographical data raises some issues. One classical issue, the modifiable areal unit problem (MAUP), can significantly affect the results and should be considered. The MAUP was first identified by . Its idea is that statistical results obtained from the same basic data in the same study area can differ when the study area is aggregated in different ways. In this study, however, we focused only on determining influential factors based on the 230 administrative districts in South Korea.
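The aggregation effect behind the MAUP can be seen in a small numerical sketch. The data, the two zoning schemes, and the helper `zone_means` below are all hypothetical and chosen only for illustration; they are not taken from the study. The same eight unit-level values are grouped into four zones in two different ways, and the variance of the zone means is compared.

```python
from statistics import mean, pvariance


def zone_means(values, zoning):
    """Average the unit-level values within each zone of a zoning scheme."""
    return [mean(values[i] for i in zone) for zone in zoning]


# Hypothetical prevalence values for eight small areal units.
prevalence = [2.0, 8.0, 3.0, 9.0, 4.0, 10.0, 5.0, 11.0]

# Two hypothetical ways of aggregating the same eight units into four zones.
zoning_a = [(0, 1), (2, 3), (4, 5), (6, 7)]  # pairs a low unit with a high unit
zoning_b = [(0, 2), (1, 3), (4, 6), (5, 7)]  # pairs units with similar values

means_a = zone_means(prevalence, zoning_a)
means_b = zone_means(prevalence, zoning_b)

# Same underlying data and same number of zones, but the spread of the
# zone-level statistic depends on how the units were grouped.
var_a = pvariance(means_a)
var_b = pvariance(means_b)
print(means_a, var_a)
print(means_b, var_b)
```

Both zonings preserve the overall mean, yet the zone-level variance differs substantially, which is why conclusions drawn at one level of aggregation need not carry over to another.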
This study is significant in four respects. First, it provides comparative results on the geographical distributions of three different diseases in 230 administrative districts in South Korea. Second, geographic properties were considered in classifying the tertile prevalence groups of the given diseases and in identifying the corresponding influential regional factors. Third, statistical data were exhaustively collated from the most representative, highly regarded community-based and cross-sectional public health survey in South Korea. Finally, data-mining techniques were used to identify the latent and underlying influential factors of cardiometabolic diseases, avoiding bias from well-documented knowledge about the diseases.