Research Article: Generating and evaluating a propensity model using textual features from electronic medical records

Date Published: March 4, 2019

Publisher: Public Library of Science

Author(s): Zubair Afzal, Gwen M. C. Masclee, Miriam C. J. M. Sturkenboom, Jan A. Kors, Martijn J. Schuemie, Sreeram V. Ramagopalan.


Propensity score (PS) methods are commonly used to control for confounding in comparative effectiveness studies. Electronic health records (EHRs) contain much unstructured data that could be used as proxies for potential confounding factors. The goal of this study was to assess whether this unstructured information can also be used to construct PS models that deal properly with confounding. As an example, we compared coxibs (COX-2 inhibitors) with traditional NSAIDs with respect to the risk of upper gastrointestinal bleeding, since this association is often confounded by channeling of coxibs to patients at higher risk of upper gastrointestinal bleeding.

In a cohort study of new users of nonsteroidal anti-inflammatory drugs (NSAIDs) from the Dutch Integrated Primary Care Information (IPCI) database, we identified all patients who experienced upper gastrointestinal bleeding (UGIB). We used large-scale regularized regression to fit two PS models using all structured and unstructured information in the EHR. We calculated hazard ratios (HRs) to estimate the risk of UGIB among selective cyclo-oxygenase-2 (COX-2) inhibitor users compared to nonselective NSAID (nsNSAID) users.
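As a rough illustration of how a PS model can be fitted on textual features, the sketch below trains a regularized logistic regression on binary word-presence features and returns each patient's estimated probability of receiving a COX-2 inhibitor. It is not the study's implementation: the L1 (lasso) penalty, the proximal gradient solver, the hyperparameter values, and the toy notes are all assumptions made for the example, since the text states only that a large-scale regularized regression was used.

```python
import math

def bag_of_words(notes):
    """Turn free-text notes into binary word-presence feature rows."""
    vocab = sorted({word for note in notes for word in note.lower().split()})
    column = {word: j for j, word in enumerate(vocab)}
    rows = []
    for note in notes:
        row = [0.0] * len(vocab)
        for word in set(note.lower().split()):
            row[column[word]] = 1.0
        rows.append(row)
    return vocab, rows

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(row, weights, intercept):
    """Estimated probability of treatment, i.e. the propensity score."""
    return sigmoid(intercept + sum(w * x for w, x in zip(weights, row)))

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink towards zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def fit_l1_logistic(rows, treated, l1=0.01, step=0.5, n_iter=2000):
    """Proximal gradient descent; the intercept is left unpenalized."""
    n, p = len(rows), len(rows[0])
    weights, intercept = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, t in zip(rows, treated):
            err = predict(row, weights, intercept) - t
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        intercept -= step * grad_b / n
        weights = [soft_threshold(w - step * g / n, step * l1)
                   for w, g in zip(weights, grad_w)]
    return weights, intercept

# Hypothetical notes, invented for illustration (not taken from IPCI).
notes = [
    "history of ulcer and stomach pain",
    "prior ulcer uses anticoagulant",
    "stomach pain ulcer",
    "knee pain after sport",
    "back pain no other complaints",
    "knee pain ulcer",
    "back pain anticoagulant",
    "sport injury ankle",
]
treated = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = COX-2 inhibitor, 0 = nsNSAID

vocab, rows = bag_of_words(notes)
weights, intercept = fit_l1_logistic(rows, treated)
scores = [predict(row, weights, intercept) for row in rows]
```

With an L1 penalty, words that carry little information about the treatment choice end up with a weight of exactly zero, which keeps the model manageable when the vocabulary is large.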

The crude hazard ratio of UGIB for COX-2 inhibitors compared to nsNSAIDs was 0.50 (95% confidence interval 0.18–1.36). Matching only on age resulted in an HR of 0.36 (0.11–1.16), and of 0.35 (0.11–1.11) when further adjusted for sex. Matching on PS only, the first model yielded an HR of 0.42 (0.13–1.38), which reduced to 0.35 (0.96–1.25) when adjusted for age and sex. The second model resulted in an HR of 0.42 (0.13–1.39), which dropped to 0.31 (0.09–1.08) after adjustment for age and sex.
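To make the idea of matching on PS concrete, the sketch below pairs each treated patient with the closest unmatched comparator on the propensity score. The matching algorithm is not specified in the text, so greedy 1:1 nearest-neighbour matching with a caliper is an assumption, as are the function name, the caliper value, and the example scores.

```python
def greedy_match(ps_treated, ps_control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    Returns (treated_index, control_index) pairs. A treated patient is
    left unmatched when no remaining control lies within the caliper.
    """
    available = dict(enumerate(ps_control))
    pairs = []
    # Process treated patients in order of increasing propensity score.
    for i, score in sorted(enumerate(ps_treated), key=lambda item: item[1]):
        if not available:
            break
        j = min(available, key=lambda k: abs(available[k] - score))
        if abs(available[j] - score) <= caliper:
            pairs.append((i, j))
            del available[j]  # each control is used at most once
    return pairs

# Hypothetical propensity scores, invented for illustration.
ps_coxib = [0.80, 0.55, 0.30]
ps_nsnsaid = [0.28, 0.52, 0.60, 0.10]
pairs = greedy_match(ps_coxib, ps_nsnsaid)
print(pairs)  # [(2, 0), (1, 1)]
```

In this example the treated patient with a score of 0.80 stays unmatched because no comparator lies within the caliper. Discarding such patients trades sample size for comparability of the matched groups.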

PS models can be created using unstructured information in EHRs. An incremental benefit was observed by matching on PS over traditional matching and adjustment for covariates.

Partial Text

Electronic health records (EHRs) are primarily used for routine medical care, but secondary use of EHR data for observational research is becoming increasingly popular, especially for studying postmarketing drug effects [1]. Such data are used to generate information on drug safety and effectiveness in a cost-efficient way by exploiting actual care patterns, which differ largely from experimental settings [2–5]. In an experimental setting such as a randomized clinical trial, the choice of treatment is randomized, which takes care of potential confounding by indication [6]. In actual care, the treatment decision is usually influenced by measurable patient characteristics, such as medical history and concomitant drug intake, but also by personal prescriber preferences, which cannot be measured easily. This phenomenon of preferential prescribing is also known as channeling and may lead to confounding by indication [7,8]. A well-known example of channeling is the preference of doctors to prescribe selective cyclo-oxygenase-2 (COX-2) inhibitors over nonselective (ns) non-steroidal anti-inflammatory drugs (NSAIDs) to patients at risk of developing upper gastrointestinal bleeding (UGIB) [9,10], as the COX-2 inhibitors were developed specifically to mitigate the gastrointestinal effects of NSAIDs. Although clinical trials showed that COX-2 inhibitors are ‘safer’ than nsNSAIDs in relation to UGIB [11], observational studies showed no large differences in the rate of UGIB between COX-2 inhibitors and nsNSAIDs, possibly due to residual confounding by indication arising from channeling [12]. To obtain unbiased estimates in observational studies, this confounding must be dealt with adequately. However, it is challenging to capture all relevant channeling factors in EHR databases because the information is not primarily recorded for research purposes. Moreover, relevant information may be recorded in EHRs in an unstructured way [13,14].

In this study, we generated a propensity model using unstructured information from EHRs. We tested different methods of constructing this model and demonstrated both the feasibility of doing so and the model's performance. Since EHRs are now widely available for secondary use, methods need to be developed, and their performance tested, for use in epidemiological evaluations such as studies of drug effects.



