Research Article: Genetic Drift Assessment of Cancer Cell Lines in Pharmacogenomic Studies

By: Benison Zerrudo

Genetic variation can be linked to pharmacological response using cell line model systems, but biomedical research faces a dilemma when using these cell lines because of genomic instability and misidentification. Genomic instability can undermine the reproducibility of experiments that use cancer cell lines. These issues are common, and guidelines have been published to ensure the authentication and integrity of cell lines at every stage of research. Measuring the genomic integrity of cancer cell lines throughout a study, however, can be very expensive. The study reviewed here (Quevedo et al., 2020) validates cancer cell line identity in three pharmacogenomic studies and screens for genetic drift. The researchers also developed a toolkit called Cancer Cell Line Identification, or CCLid, that enables screening of cell line genomic profiles. This topic helps explain why personalized cancer treatment is needed and reinforces the idea that every cancer is unique. Every day, researchers better understand why cancers are not the same and why a one-size-fits-all treatment is increasingly unrealistic (Wanner, 2015). The struggle against cancer exists not only at the patient level but also in research.

Cellosaurus is a knowledge resource that documents the cell lines used in biomedical research (Bairoch, 2018). It can serve as a reference for resolving problems with ambiguously annotated cell lines and their use in biomedical research. Despite such resources, genetic drift and mismatch repair deficiencies remain a problem, mostly affecting short tandem repeats. Resources for monitoring cell line stability during research studies are still lacking.

Cancer is a disease caused by the uncontrolled division of abnormal cells, and its hallmarks include genomic instability. This instability is especially pronounced in cancer cell lines, which affects the stability of their genomic profiles over the course of a study. Structural changes such as an increased rate of base-pair mutation, microsatellite instability, and changes in chromosome number are all forms of genomic instability (Al-Sohaily et al., 2012). How the instability begins remains unclear.

The scale of genetic drift was investigated across 106 cell lines, revealing 19 percent inconsistency in mutations with an observable effect and 26 percent inconsistency in copy-number variants (Ben-David et al., 2018). Copy-number variation occurs when regions of the genome are repeated and the number of repeats differs between individuals (McCarroll and Altshuler, 2007). The MCF-7 (breast cancer), A549 (adenocarcinomic human alveolar basal epithelial), and HeLa (immortal) cell lines were analyzed for genetic stability, showing that subclonal populations with different genetic backgrounds exist at the cell bank level. Further analyses of these cell lines revealed that subculturing and interlaboratory effects amplify genetic drift. This genetic instability can have several functional effects, including inconsistent structural formation, changes in tissue growth, varied patterns of gene expression, irregular responses to reduced toxic stress, and a significant influence on drug response. Cell STRAINER, a strain instability profiler, was developed to let users compare their cell line's copy-number profile against the Cancer Cell Line Encyclopedia and flag discordant regions.

In addition to Cell STRAINER, this study addresses the genetic stability issue by investigating the identity and stability of more than 1,000 cell lines used across a few of the largest pharmacogenomic studies. The technique is a single-nucleotide polymorphism (SNP) allelic-fraction method, which makes it possible to use RNA sequencing data in addition to the SNP arrays within each dataset to evaluate genetic stability.

The study analyzed two arrays, the Affymetrix Genome-Wide Human SNP and HumanOmni2.5-Quad arrays, from the cell lines described in the pharmacogenomic studies. Cellosaurus was used to link cell lines with their IDs using functions from CCLid. Affymetrix SNP array files were preprocessed and analyzed using GenomeWideSNP_6 annotations. Samples and datasets with poorly resolved genotype clusters or restriction enzyme bias were removed according to Affymetrix guidelines. B-allele frequencies and Log2 ratios were exported from the EaCoN package, which was modified to export total and allele-specific Log2 ratios. HumanOmni array raw files were preprocessed using GenomeStudio. The cnvPartition CNV Analysis Plugin was used to convert signal intensities to log-R ratios and B-allele frequencies, which were then passed to EaCoN and fed to ASCAT to create copy-number profiles. Raw FASTQ files were aligned using the GENCODE human genome and transcript annotation.

The consistency between the genotypes of two samples was computed. Pairwise cell line comparisons were performed by summing the Euclidean distances between the B-allele frequencies at each single-nucleotide polymorphism. The limit of detection was measured by selecting subsets of single-nucleotide polymorphisms from the scattered set and retraining a regression model for each test. If the q-value for the predicted similarity was below 0.05, the two cell lines were considered genotypically similar. The q-values were compared to the cell line annotations to build a confusion matrix for precision-recall estimation.
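The pairwise distance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `baf_distance` and the B-allele frequency values are hypothetical, and the regression model and q-value step from the study are not reproduced. Because each SNP contributes a single B-allele frequency, the per-SNP Euclidean distance reduces to an absolute difference.

```python
def baf_distance(baf_a, baf_b):
    """Sum of per-SNP distances between two B-allele frequency vectors.

    Both vectors must list the same SNPs in the same order.
    """
    if len(baf_a) != len(baf_b):
        raise ValueError("BAF vectors must cover the same SNPs")
    # For scalar values, Euclidean distance is the absolute difference.
    return sum(abs(a - b) for a, b in zip(baf_a, baf_b))


# Hypothetical BAF values at five SNPs (0 = AA, 0.5 = AB, 1 = BB).
reference = [0.0, 0.5, 1.0, 0.5, 0.0]
same_line = [0.0, 0.5, 1.0, 0.5, 0.0]
other_line = [1.0, 0.0, 0.5, 1.0, 0.5]

d_same = baf_distance(reference, same_line)    # identical genotypes
d_other = baf_distance(reference, other_line)  # discordant genotypes
print(d_same, d_other)
```

A small distance suggests the two samples share a genotype, while a large one suggests different origins. Where to draw the line between the two is what the study's regression model and q-value are for.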

Karyotypes were compared between samples. Consistency between copy-number profiles was calculated using a B-allele frequency method and a Log2 ratio method. In the B-allele frequency method, the standard deviation for each segment was calculated and used to set a z-score threshold of +4 and -4 for identifying significant drift. In the Log2 ratio method, the standard deviation was calculated from the Euclidean distance between the probe-set Log2 ratios within the region, and a z-score threshold of +2 and -2 was used to identify significant drift. A chromosomal instability score was calculated for comparing phenotypes. Estimates of the total genomic fraction drifted, together with the chromosomal instability score, were used to obtain the Pearson correlation with the ABC metrics for each drug used.
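The thresholding step can be illustrated with a minimal Python sketch. It assumes each segment has already been reduced to a length, a difference between the two samples, and a standard deviation; the function name `call_drift` and all numbers are hypothetical and do not come from the study or from CCLid.

```python
def call_drift(segments, z_threshold):
    """Flag segments whose z-score magnitude exceeds the threshold.

    Each segment is a (length, difference, standard_deviation) tuple.
    Returns the per-segment flags and the fraction of total length drifted.
    """
    flags = []
    drifted_length = 0
    total_length = 0
    for length, diff, sd in segments:
        z = diff / sd
        drifted = abs(z) > z_threshold
        flags.append(drifted)
        total_length += length
        if drifted:
            drifted_length += length
    return flags, drifted_length / total_length


# Hypothetical segments: (length, difference between samples, SD).
segments = [
    (100, 0.025, 0.01),  # z is about 2.5
    (50, -0.10, 0.02),   # z is about -5
    (50, 0.03, 0.02),    # z is about 1.5
]

strict_flags, strict_fraction = call_drift(segments, z_threshold=4)
lenient_flags, lenient_fraction = call_drift(segments, z_threshold=2)
print(strict_flags, strict_fraction)
print(lenient_flags, lenient_fraction)
```

On the same toy segments, the stricter threshold flags a quarter of the total length as drifted and the more lenient one flags three quarters, which shows how strongly the choice of threshold shapes the drift estimate.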

The cancer cell lines were first authenticated: a significant number of consistent matches were observed between cell lines, with only a few misidentifications or unexpected cell line pair relationships. During genetic drift detection, the study found that almost all cell lines exhibited karyotypic drift when measured by log-R ratio (LRR) but few when measured by B-allele frequency (BAF). The LRR drift estimates proved highly sensitive to the presence of subclonal populations and genetic diversity within a cell line population. Intra-institutional genetic drift was analyzed by comparing the BAF drift profiles generated from RNA-seq data with those generated from single-nucleotide polymorphism arrays. The same drifted and non-drifted regions were observed in at least 85% of the genome for 90% of the overlapping samples. Overall, the results indicate that RNA-seq data can be used to infer genetic drift and that genetic variation is a common issue across research studies.
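One way to read the "85% of the genome" figure is as a length-weighted agreement between two sets of drift calls for the same sample. The Python sketch below shows that calculation under this assumption; the function name `genome_concordance`, the segment lengths, and the calls are hypothetical, and the study may have computed concordance differently.

```python
def genome_concordance(lengths, calls_a, calls_b):
    """Fraction of total segment length where two drift calls agree.

    lengths: segment lengths; calls_a / calls_b: True if the segment is
    called drifted by platform A or B. All lists share the same order.
    """
    if not (len(lengths) == len(calls_a) == len(calls_b)):
        raise ValueError("inputs must describe the same segments")
    agree = sum(n for n, a, b in zip(lengths, calls_a, calls_b) if a == b)
    return agree / sum(lengths)


# Hypothetical drift calls for one sample on four segments.
lengths = [40, 30, 20, 10]
array_calls = [False, True, False, False]   # from the SNP array
rnaseq_calls = [False, True, False, True]   # from RNA-seq

concordance = genome_concordance(lengths, array_calls, rnaseq_calls)
print(concordance)
```

Repeating this for every overlapping sample and counting how many reach 0.85 or higher would give the per-sample proportion quoted above.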

An abnormal number of chromosomes can cause chromosomal instability, and the study investigated the correlation between chromosomal instability and genetic drift in genetically similar cell lines. The total size of the drifted regions was calculated across all cell lines, and no relationship was found between the level of genetic drift and the chromosomal instability score. However, a significant relationship was observed between chromosomal instability scores and the variance of total genetic drift. This indicates that chromosomal instability may serve as a marker for predicting which cell lines will drift more than usual.

Pharmacological responses in genetically similar cell lines were analyzed to see whether inconsistencies were correlated with genetic drift or chromosomal instability. Unfortunately, very few samples were available for this part of the study, and the drugs used were not found in one of the datasets. Moreover, one comparison did not indicate a significant relationship between genetic drift or chromosomal instability and drug response inconsistency. As a result, the information obtained on the correlation between drug response and genetic drift or chromosomal instability was limited.

To address the limited resources for genotype-based authentication of cell lines, the study developed the CCLid toolkit. It uses the resulting sample-by-B-allele-frequency matrix for cell identity prediction, cross-contamination resolution, and genetic drift estimation. With transcriptomic profiling becoming widespread, researchers can still verify cell line identity and genetic drift even without genomic data. CCLid may benefit researchers by validating cell line genetic identity and by supporting longitudinal research designs in which genetic integrity is monitored throughout the research process.

Comparing cell lines across the three largest pharmacogenomic studies revealed inconsistencies such as discordant genotypes despite identical cell line annotations, and isogenic cell lines previously described as different. Genetic drift was observed to be prevalent across more than 1,000 unique cancer cell lines, but the drifts were attributed to subclonality and noise. Previous work demonstrated variable drug responses in isogenic cell lines, possibly caused by genetic drift (Ben-David et al., 2018). This study revealed intra- and inter-institutional genetic drift in cell lines, which can affect the results of pharmacogenomic analyses. Unstable chromosomes can shift the genomic landscape with each subculture (Thompson and Compton, 2008), so a treatment that works in research studies may not work in an actual cancer patient.

The study reinforces the need for genetic stability measures in research. Existing guidelines support cell line authentication and handling, but there are no protocols for detecting and monitoring the stability of genetic profiles. The study proposes the additional use of stability screeners such as Cell STRAINER and CCLid to maintain cell line identity and stability. These tools can help ensure the integrity of cancer cell lines in future research.

This research not only contributes to addressing genetic instability but also gives a better understanding of why cancer is different in every case. At the research level, scientists are struggling to fight a nemesis with many faces. Genetic drift is prevalent in cancer cell lines, and the resulting phenotypic traits are mostly not beneficial to research studies. The difficulty of reproducing cancer experiments suggests that genetic drift may have occurred and that the cells used in one experiment may not be the same in future experiments. About 90% of cancer publications are reportedly not reproducible, which reflects the diversity of evolutionary pathways in tumor development (Wen et al., 2018). The results contribute to our understanding of evolution by illuminating a mechanism in cancer cell lines. Cancer development is an evolutionary process that mirrors the evolution of species. Genetic drift can change cellular phenotype or alter cell line identity, which may cause problems for future studies using the same environment or drug treatment.


Al-Sohaily, Sam et al. (2012). Molecular pathways in colorectal cancer. Journal of gastroenterology and hepatology vol. 27,9: 1423-31. doi:10.1111/j.1440-1746.2012.07200.x

Bairoch, A. (2018). The Cellosaurus, a cell-line knowledge resource. Journal of Biomolecular Techniques. 29 (Epub ahead of print): 1–14. doi:10.7171/jbt.18-2902-002

Ben-David, Uri et al. (2018). Genetic and transcriptional evolution alters cancer cell line drug response. Nature vol. 560,7718: 325-330. doi:10.1038/s41586-018-0409-3

Darlington, A., Bates, D. (2020). Architectures for Combined Transcriptional and Translational Resource Allocation Controllers. Cell Systems. Accessed October 23, 2020.

McCarroll, S.A., Altshuler D.M. (2007). Copy-number variation and association studies of human disease. Nature Genetics. 39 (7 Suppl): S37-42. Accessed October 23, 2020.

Quevedo, R., Smirnov, P., Tkachuk, D., et al. (2020). Assessment of Genetic Drift in Large Pharmacogenomic Studies. Cell Systems Vol 11, Issue 4, P393-401.E2.

Thompson, S.L., Compton, D.A. (2008). Examining the link between chromosomal instability and aneuploidy in human cells. The Journal of cell biology, 180(4), 665–672.

Wanner, M. (2015). Why is Cancer so Difficult to Cure? The Jackson Laboratory. Accessed October 23, 2020.

Wen, Haijun et al. (2018). On the low reproducibility of cancer studies. National science review vol. 5,5: 619-624. doi:10.1093/nsr/nwy021

