Research Article: GenomeGraphR: A user-friendly open-source web application for foodborne pathogen whole genome sequencing data integration, analysis, and visualization

Date Published: February 28, 2019

Publisher: Public Library of Science

Author(s): Moez Sanaa, Régis Pouillot, Francisco Garcés Vega, Errol Strain, Jane M. Van Doren, Yung-Fu Chang.


Food safety risk assessments and large-scale epidemiological investigations can provide better and new types of information when whole genome sequence (WGS) data are effectively integrated. The WGS collections in the NCBI Pathogen Detection database have grown significantly through improvements in technology, coordination, and collaboration, such as the GenomeTrakr and PulseNet networks. However, high-quality genomic data are not often coupled with high-quality epidemiological or food chain metadata. We have created a set of tools for cleaning, curation, integration, analysis, and visualization of microbial genome sequencing data. These tools have been tested using Salmonella enterica and Listeria monocytogenes data sets provided by NCBI Pathogen Detection (160,000 sequenced isolates in 2018). GenomeGraphR presents foodborne pathogen WGS data and associated curated metadata in a user-friendly interface that allows a user to explore a variety of research questions, such as transmission sources and dynamics, global reach, and persistence of genotypes associated with contamination in the food supply and foodborne illness across time or space. The application is freely available (

Partial Text

The implementation of whole genome sequencing (WGS)-based techniques as a routine typing method for specific foodborne pathogens has significantly improved surveillance, increased the number of outbreaks detected, and shortened both the time to detect them and the time to find their source [1–6].

To date, efforts in WGS networks have focused primarily on how to obtain the most complete and accurate set of nucleic acid sequences for outbreak identification [1]. Outbreak identification can be accomplished with non-standard source metadata, because a relatively small set of strains is considered for a given outbreak. Indeed, when facing an outbreak, interdisciplinary investigative teams must carefully evaluate the genomic and epidemiologic links between each strain involved or potentially involved in the outbreak. They will need, at a minimum, the metadata provided in the NCBI database, but will surely complement this analysis with additional metadata or samples specifically collected from the field.

GenomeGraphR is a unique, flexible application that presents foodborne pathogen WGS data and their associated metadata. It is designed to address a variety of research questions relating to, for example, transmission sources and dynamics, global reach, and persistence of genotypes associated with contamination in the food supply and foodborne illness across time or space. With the development of specific analytics to address missing values, bias, and non-representative samples or strains, these data may provide novel approaches to foodborne illness attribution at the population level and will provide critical data for the exposure assessment and hazard characterization (dose-response) needed for risk assessment. Future integration of isolate genome characteristics in GenomeGraphR, including antimicrobial resistance, virulence, and persistence factors, will allow further discernment among genetically related strains and will broaden the research insights available from WGS integration for population-level epidemiology research and food safety risk assessment.