Research Article: A prediction model for advanced colorectal neoplasia in an asymptomatic screening population

Date Published: August 25, 2017

Publisher: Public Library of Science

Author(s): Sung Noh Hong, Hee Jung Son, Sun Kyu Choi, Dong Kyung Chang, Young-Ho Kim, Sin-Ho Jung, Poong-Lyul Rhee, John Green.


An electronic medical record (EMR) database of a large, unselected population that underwent screening colonoscopy may minimize sampling error and provide real-world estimates of the risk of advanced colorectal neoplasia (CRN), the target lesion of screening. Our aim was to develop and validate a prediction model for estimating the probability of advanced CRN using a clinical data warehouse.

A total of 49,450 screenees underwent their first colonoscopy as part of a health check-up at Samsung Medical Center from 2002 to 2012. The dataset was constructed from the computerized EMR system by means of natural language processing. The screenees were randomly assigned to training and validation sets. The prediction model was developed using logistic regression. Its performance was validated and compared with existing models using analysis of the area under the receiver operating characteristic curve (AUC).
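The workflow described above (random split, logistic regression fitted on the training set, AUC on both sets) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The data are hypothetical synthetic screenees with two standardized predictors rather than the five questionnaire items. The 50/50 split is assumed, because the proportions are not given here. The model is fitted by batch gradient descent, a swapped-in technique, instead of the Newton-type solver that statistical software typically uses. The AUC uses the rank-based Mann-Whitney formula, which equals the probability that a randomly chosen case scores higher than a randomly chosen non-case.

```python
import math
import random


def train_validation_split(records, train_fraction=0.5, seed=0):
    """Randomly assign records to a training set and a validation set.

    ASSUMPTION: the 50/50 default is illustrative, not the study's proportion.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    """Predicted probability of the outcome for each row of X."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0  # 1-based average rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def simulate(n, seed=1):
    """HYPOTHETICAL synthetic screenees: two standardized predictors, binary outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        p = sigmoid(-2.0 + 1.0 * x[0] + 0.5 * x[1])  # assumed true model
        data.append((x, 1 if rng.random() < p else 0))
    return data


if __name__ == "__main__":
    train, valid = train_validation_split(simulate(2000))
    X_tr, y_tr = [x for x, _ in train], [y for _, y in train]
    X_va, y_va = [x for x, _ in valid], [y for _, y in valid]
    w, b = fit_logistic(X_tr, y_tr)
    print("training AUC:", round(auc(predict(w, b, X_tr), y_tr), 3))
    print("validation AUC:", round(auc(predict(w, b, X_va), y_va), 3))
```

Variable selection and the formal statistical comparison of AUCs between models are omitted from this sketch. The rank-based AUC runs in O(n log n), so it also scales to datasets the size of this cohort.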

In the training set, age, gender, smoking duration, drinking frequency, and aspirin use were identified as independent predictors of advanced CRN (adjusted P < .01). The developed model had good discrimination (AUC = 0.726) and was internally validated (AUC = 0.713). The high-risk group had a 3.7-fold increased risk of advanced CRN compared with the low-risk group (4.0% vs. 1.1%, P < .001). The present model discriminated patients with advanced CRN better than the Asia-Pacific Colorectal Screening score (AUC = 0.678, P < .001) and Schroy's CAN index (AUC = 0.672, P < .001). The present 5-item risk model can be calculated readily from a simple questionnaire and can identify low- and high-risk groups for advanced CRN at the first screening colonoscopy. This model may increase awareness of colorectal cancer risk and help healthcare providers encourage the high-risk group to undergo colonoscopy.
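The low- versus high-risk comparison reported above can be illustrated as follows. Screenees are split at a cutoff on the model's predicted probability, and the observed rates of advanced CRN are compared. This is a minimal sketch under stated assumptions. The cutoff is a hypothetical parameter, because the rule used to define the risk groups is not stated here. The test behind P < .001 is also not named here, so a pooled two-proportion z-test is used as an assumed stand-in.

```python
import math


def stratify(probs, outcomes, cutoff):
    """Split screenees at a predicted-probability cutoff (HYPOTHETICAL rule).

    Returns (events_low, n_low, events_high, n_high).
    """
    low = [y for p, y in zip(probs, outcomes) if p < cutoff]
    high = [y for p, y in zip(probs, outcomes) if p >= cutoff]
    if not low or not high:
        raise ValueError("cutoff leaves one risk group empty")
    return sum(low), len(low), sum(high), len(high)


def rate_ratio(events_high, n_high, events_low, n_low):
    """Fold difference in observed event rate, high-risk vs. low-risk group."""
    if events_low == 0:
        raise ValueError("no events in the low-risk group; ratio undefined")
    return (events_high / n_high) / (events_low / n_low)


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided pooled z-test for equal proportions (equivalent to the 2x2
    Pearson chi-square test without continuity correction)."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (events_a / n_a - events_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Toy predicted probabilities and outcomes, for illustration only.
    probs = [0.01] * 500 + [0.06] * 500
    outcomes = [1] * 5 + [0] * 495 + [1] * 20 + [0] * 480
    e_low, n_low, e_high, n_high = stratify(probs, outcomes, cutoff=0.03)
    print("low-risk rate:", e_low / n_low, "high-risk rate:", e_high / n_high)
    print("rate ratio:", rate_ratio(e_high, n_high, e_low, n_low))
    print("P value:", two_proportion_p_value(e_high, n_high, e_low, n_low))
```

The rate ratio is simply the high-risk event rate divided by the low-risk event rate. In this form, the reported fold difference depends directly on where the cutoff is placed.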

Partial Text

Colorectal cancer (CRC) is the third most common cancer in the world [1]. Colonoscopy is considered the preferred CRC screening modality [2]; however, adherence is generally insufficient [3]. One barrier to CRC screening is a lack of perceived risk among patients and primary care providers [4]. Risk stratification offers a rational strategy for facilitating appropriate CRC screening and can improve the allocation of resources. A prerequisite for this approach is the availability of an accurate risk assessment tool.

Big data can improve health by providing public health insights, such as enhanced disease prediction and prevention. Using a big data analytics approach, we explored a large health screening examination database. The refined database, which combined structured and unstructured data, contained first screening colonoscopy and comprehensive health examination data from 49,450 patients. Big data can be used not only to verify alleged associations but also as a hypothesis-generating machine [24]. In this study, we developed a prediction model for advanced CRN, which may be the first application of big data analytics in the field of gastroenterology. The final simplified prediction model showed acceptable discriminative power for advanced CRN. Our simple risk score, based on readily available information from the patient's clinical questionnaire, stratified asymptomatic patients into low- and high-risk groups for advanced CRN before screening colonoscopy was performed. The developed model discriminated patients with advanced CRN better than existing models did.