Research Article: Comparing the validity of different ICD coding abstraction strategies for sepsis case identification in German claims data

Date Published: July 30, 2018

Publisher: Public Library of Science

Author(s): Carolin Fleischmann-Struzek, Daniel O. Thomas-Rüddel, Anna Schettler, Daniel Schwarzkopf, Angelika Stacke, Christopher W. Seymour, Christoph Haas, Ulf Dennler, Konrad Reinhart, Ashham Mansur.


Administrative data are used to generate estimates of sepsis epidemiology and can serve as a source for quality indicators. The aim was to compare estimates of sepsis incidence and mortality based on different ICD-code abstraction strategies and to assess their validity for sepsis case identification in a patient sample not pre-selected for the presence of sepsis codes.

We used the national DRG statistics to assess population-level sepsis incidence and mortality. Cases were identified by three previously published International Statistical Classification of Diseases (ICD) coding strategies for sepsis based on primary and secondary discharge diagnoses: clinical sepsis codes (R-codes), explicit coding (all sepsis codes) and implicit coding (combined infection and organ dysfunction codes). For the validation study, a stratified sample of 1120 adult patients admitted to a German academic medical center between 2007 and 2013 was selected. Administrative diagnoses were compared to a gold standard of clinical sepsis diagnoses based on manual chart review.

In the validation study, 151/937 patients had sepsis. Explicit coding strategies had higher sensitivity than R-codes but a lower positive predictive value (PPV). The implicit approach was the most sensitive for severe sepsis; however, it yielded a considerable number of false positives. R-codes and explicit strategies underestimate sepsis incidence by up to 3.5-fold. Between 2007 and 2013, national sepsis incidence ranged from 231 to 1006 per 100,000 person-years depending on the coding strategy.

In the sample from a large tertiary care hospital, ICD-coding strategies for sepsis differ in their accuracy. Estimates using R-codes are likely to underestimate the true sepsis incidence, whereas implicit coding overestimates sepsis cases. Further multi-center evaluation is needed to gain a better understanding of the validity of sepsis coding in Germany.

Partial Text

Acknowledging that sepsis is the leading cause of death from infection and affects more than 30 million patients globally [1], the World Health Organization declared the prevention, diagnosis and management of sepsis a leading priority in its member states [2, 3]. For most countries, population-level sepsis incidence and mortality rates remain unknown; the resolution therefore urges the implementation of specific epidemiologic surveillance measures and efforts “to apply and improve the use of the International Classification of Diseases system to establish the prevalence and profile of sepsis” [2]. In the US and several European countries, estimates of sepsis incidence are commonly drawn from retrospective studies based on hospital claims data, using different International Classification of Diseases (ICD) codes for case identification [4, 5]. Administrative data are also increasingly used to compare risk-adjusted mortality rates in different conditions between health care providers [6–8]. In Germany, both the Initiative for Quality Medicine [9] and the German Quality Network Sepsis [10] provide their participating hospitals with quality indicators on hospital mortality based on diagnosis-related groups (DRG) data. Various ICD code combinations have emerged that attempt to capture sepsis patients. While explicit coding approaches use ICD codes for septicemia/sepsis [11], implicit coding approaches link infection and organ dysfunction codes to mirror clinical sepsis criteria [5]. Depending on the underlying codes, estimates of sepsis incidence and mortality differ considerably [12]. Efforts have been made to validate ICD case identification strategies against a gold standard of manual patient chart review [13]. These studies have shown good specificity but poor sensitivity of sepsis coding in claims data [14].
Recent population-based studies from Scandinavia suggest that, compared to medical record review, ICD abstraction may result in an up to 6-fold underestimation of sepsis incidence, both for traditional severe sepsis and for the newly proposed sepsis definition designated as “sepsis-3” [15–17]. Existing validation studies mostly rely on small sample sizes and selective populations, and restrict their review to charts selected based on the presence or absence of relevant ICD sepsis codes [13]. This is an important source of bias, since sensitivity cannot be reliably estimated by this approach. We therefore aimed to compare the validity of different ICD-10 code abstraction strategies against a gold standard of manual patient chart review not pre-selected for the presence of sepsis codes. Furthermore, no data exist in Germany on the variations in incidence and mortality due to different identification of sepsis cases in administrative data. In addition, it is not yet known how patients identified using the new “sepsis-3” definitions will be coded in administrative claims, which may affect estimates of sepsis epidemiology. Further aims of this study were therefore to compare the estimates of sepsis incidence and mortality produced by different ICD code abstraction strategies using German hospital claims data, and to assess the concordance of “sepsis-3” definitions with cases identified by retrospective chart review.

Using manual patient chart review as the gold standard in a single-center validation study in a large tertiary care hospital in Germany, we found that current ICD abstraction strategies differ substantially in their accuracy of sepsis case identification in administrative data. There is a trade-off between sensitivity and positive predictive value across different strategies. Explicit coding strategies have a higher positive predictive value than implicit coding strategies, but show limited sensitivity and may miss a relevant number of sepsis cases in administrative data. R-codes and explicit sepsis coding strategies may underestimate sepsis incidence by 3.5-fold and 3-fold, respectively. Severe sepsis incidence rates may also be underestimated by 2.2-fold and 1.4-fold when using R-codes or the explicit strategy, respectively, whereas implicit strategies risk overestimation by 2.7-fold. Our findings are in accordance with the results of studies from Denmark, Sweden and the US, which found an underestimation of severe sepsis cases in administrative data [4, 16, 17]. In a recent population-based study from Sweden [17], only 15.6% of patients with clinically diagnosed sepsis according to “sepsis-3” were coded as sepsis. In the US, only 30.5% of sepsis cases identified by “sepsis-3” criteria in electronic health records had an explicit sepsis code [4].
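The trade-off between sensitivity and positive predictive value described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the class and function names are hypothetical, the counts are invented for illustration only, and the sketch ignores the weighting that a stratified sample such as the one in this study would require. The ratio of chart-review cases to code-flagged cases is shown as one plausible way to express a fold under- or overestimation; the source does not state how its fold factors were computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cross-tabulation of one ICD coding strategy against chart review."""
    tp: int  # flagged by the codes, sepsis on chart review
    fp: int  # flagged by the codes, no sepsis on chart review
    fn: int  # not flagged, sepsis on chart review
    tn: int  # not flagged, no sepsis on chart review


def sensitivity(c: ConfusionCounts) -> float:
    """Share of chart-review sepsis cases that the codes capture."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-sepsis patients that the codes correctly leave unflagged."""
    return c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Share of code-flagged patients who truly have sepsis."""
    return c.tp / (c.tp + c.fp)


def true_to_coded_ratio(c: ConfusionCounts) -> float:
    """Chart-review cases divided by code-flagged cases.

    Values above 1 mean the codes undercount cases; values below 1 mean
    they overcount. This definition is an assumption of the sketch.
    """
    return (c.tp + c.fn) / (c.tp + c.fp)


# Hypothetical counts, NOT taken from the study.
narrow_strategy = ConfusionCounts(tp=40, fp=10, fn=60, tn=890)
broad_strategy = ConfusionCounts(tp=90, fp=110, fn=10, tn=790)

for name, counts in [("narrow", narrow_strategy), ("broad", broad_strategy)]:
    print(
        f"{name}: sensitivity={sensitivity(counts):.2f} "
        f"PPV={ppv(counts):.2f} "
        f"true/coded={true_to_coded_ratio(counts):.2f}"
    )
```

With these invented counts, the narrow strategy misses most true cases but is usually right when it flags one, while the broad strategy captures most true cases at the cost of many false positives, which mirrors the qualitative pattern reported for explicit versus implicit coding.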



