Research Article: Utilization of genetic data can improve the prediction of type 2 diabetes incidence in a Swedish cohort

Date Published: July 12, 2017

Publisher: Public Library of Science

Author(s): Hadi Zarkoob, Sarah Lewinsky, Peter Almgren, Olle Melander, Hossein Fakhrai-Rad, David Meyre.


The aim of this study was to measure the impact of genetic data in improving the prediction of type 2 diabetes (T2D) in the Malmö Diet and Cancer Study cohort. The study was performed in 3,426 Swedish individuals and utilized a set of genetic and environmental risk data. We first validated our environmental risk model by comparing it to both the Finnish Diabetes Risk Score and the T2D risk model derived from the Framingham Offspring Study. The area under the curve (AUC) for our environmental model was 0.72 [95% CI, 0.69–0.74], which was significantly better than both the Finnish (0.64 [95% CI, 0.61–0.66], p-value < 1 × 10^−4) and Framingham (0.69 [95% CI, 0.66–0.71], p-value = 0.0017) risk scores. We then verified that the genetic data has a statistically significant positive correlation with incidence of T2D in the studied population. We also verified that adding genetic data slightly but statistically significantly increased the AUC of a model based only on environmental risk factors (RFs; AUC shift +1.0%, from 0.72 to 0.73, p-value = 0.042). To study the dependence of the results on the environmental RFs, we divided the population into two equally sized risk groups based only on their environmental risk and repeated the same analysis within each subpopulation. While there is a statistically significant positive correlation between the genetic data and incidence of T2D in both environmental risk categories, the positive shift in the AUC remains statistically significant only in the category with the lower environmental risk. These results demonstrate that genetic data can be used to increase the accuracy of T2D prediction. Also, the data suggests that genetic data is more valuable in improving T2D prediction in populations with lower environmental risk.
This suggests that the impact of genetic data depends on the environmental risk of the studied population and thus genetic association studies should be performed in light of the underlying environmental risk of the population.
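
The comparison described above (AUC of an environmental-only score versus a combined score, repeated within two equally sized environmental risk groups) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `auc` and `median_split`, the toy scores, and the toy outcomes are all invented for illustration, and the significance testing reported in the abstract is not reproduced here.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def median_split(env_scores: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Return index lists for the lower and upper halves of environmental risk."""
    order = sorted(range(len(env_scores)), key=lambda i: env_scores[i])
    half = len(order) // 2
    return order[:half], order[half:]


# Toy data (hypothetical, not from the MDC cohort).
env = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]          # environmental-only risk
combined = [0.1, 0.35, 0.3, 0.4, 0.75, 0.7, 0.8, 0.9]   # environmental + genetic risk
incident_t2d = [0, 1, 0, 0, 1, 0, 1, 1]                 # 1 = developed T2D

auc_env = auc(env, incident_t2d)
auc_combined = auc(combined, incident_t2d)

# Repeat the comparison within each environmental risk group.
low, high = median_split(env)
for name, idx in (("lower risk", low), ("higher risk", high)):
    sub_labels = [incident_t2d[i] for i in idx]
    shift = auc([combined[i] for i in idx], sub_labels) - auc([env[i] for i in idx], sub_labels)
    print(f"{name}: AUC shift = {shift:+.3f}")
```

In this sketch the groups are formed from the environmental score alone, mirroring the description above; whether a given AUC shift is statistically significant would require a separate test that the sketch does not attempt.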

Partial Text

Type 2 diabetes, the most common form of diabetes, is a rising healthcare problem worldwide. The number of people affected by type 2 diabetes has risen significantly over the past 30 years. The global prevalence of diabetes among adults over 18 years of age has increased from 4.7% in 1980 to 8.5% in 2014. This resulted in 1.5 million deaths due to diabetes, making it the eighth leading cause of death [1].

The characteristics of the 3,426 individuals from the MDC-CC used in this study are shown in Tables 1 and 2. Among the 21 environmental RFs and 154 genetic RFs that were utilized by the RAE for its T2D assessment, data for only 13 environmental and 139 genetic RFs were available from the MDC-CC (Tables 3 and 4). The 139 genetic RFs (SNPs) span 19 LD blocks in people of European descent. The RAE has a ranking system that scores SNPs in each LD block and selects the highest scoring SNP with available data in each LD block for use in the risk assessment. In 18 of the 19 covered LD blocks, the data for the highest scoring SNP was available for the majority of the individuals. In the remaining LD block, the data for the highest scoring SNP was not available, and thus the data for the second highest scoring SNP was selected by the RAE. For an individual to be included in the analysis, they must have data for SNPs in all 19 LD blocks. The list of SNPs used in the risk assessment is presented in Table 5.
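
The selection rule described above (take the highest scoring SNP with available data in each LD block, and include an individual only if every block is covered) might be sketched as follows. The RAE's actual scoring system and interfaces are not described in this text, so the function names, the block and SNP identifiers, and the data layout below are hypothetical.

```python
from typing import Dict, List, Optional, Set


def select_snp_per_block(
    ranked_snps_by_block: Dict[str, List[str]],
    genotyped_snps: Set[str],
) -> Dict[str, Optional[str]]:
    """For each LD block, pick the best-ranked SNP that has genotype data.

    `ranked_snps_by_block` maps a block name to its SNPs ordered best-first.
    A block with no genotyped SNP maps to None.
    """
    selection: Dict[str, Optional[str]] = {}
    for block, ranked_snps in ranked_snps_by_block.items():
        selection[block] = next(
            (snp for snp in ranked_snps if snp in genotyped_snps), None
        )
    return selection


def is_eligible(selection: Dict[str, Optional[str]]) -> bool:
    """An individual is included only if every LD block has a selected SNP."""
    return all(snp is not None for snp in selection.values())


# Toy ranking with two blocks (the study used 19); identifiers are invented.
ranking = {
    "block_A": ["snp_a1", "snp_a2"],
    "block_B": ["snp_b1", "snp_b2"],
}

# Person 1 lacks the top SNP in block_B, so the second-ranked SNP is used.
person_1 = select_snp_per_block(ranking, {"snp_a1", "snp_b2"})
# Person 2 has no data in block_B and would be excluded from the analysis.
person_2 = select_snp_per_block(ranking, {"snp_a2"})

print(person_1, is_eligible(person_1))
print(person_2, is_eligible(person_2))
```

Falling back to the next-ranked SNP within a block corresponds to the one LD block in which the second highest scoring SNP was used.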

Our objective in this study was to measure the effect of utilizing both genetic and environmental data to predict the incidence of type 2 diabetes in a Swedish cohort. Frequently, studies of human health and common complex diseases have focused on identifying either genetic or environmental RFs that could explain variation in disease susceptibility. Type 2 diabetes is a multifactorial disease caused by both genetic and environmental RFs. Therefore, it is important to be able to accurately measure the impact of each RF individually and in combination with other RFs.



