Date Published: January 1, 2010
Publisher: Public Library of Science
Author(s): Nicole Creanza, Jason S. Schwarz, Joel E. Cohen, Robert C. Fleischer. http://doi.org/10.1371/journal.pone.0008544
Abstract: Long-term influenza evolution has been well studied, but the patterns of sequence diversity within seasons are less clear. H3N2 influenza genomes sampled from New York State over ten years indicated intraseasonal changes in evolutionary dynamics. Using the mean Hamming distance of a set of amino acid or nucleotide sequences as an indicator of its diversity, we found that influenza sequence diversity was significantly higher during the early epidemic period than later in the influenza season. Diversity was lowest during the peak of the epidemic, most likely due to the high prevalence of a single dominant amino acid sequence or very few dominant sequences during the peak epidemic period, corresponding with rapid expansion of the viral population. The frequency and duration of dominant sequences varied by influenza protein, but all proteins had an abundance of one distinct sequence during the peak epidemic period. In New York State from 1995 to 2005, high sequence diversity during the early epidemic suggested that seasonal antigenic drift could have occurred primarily in this period, followed by a clonal expansion of typically one clade during the peak of the epidemic, possibly indicating a shift to neutral drift or purifying selection.
Partial Text: The three human influenza pandemics of the 20th century originated from avian strains that changed through mutation or reassortment of the RNA segments to produce a virus novel to human immune systems and capable of human-to-human transmission. The recent spread of the avian influenza strain H5N1 and the swine H1N1 strain and the resulting fear that small changes in these viruses could cause a devastating human pandemic have made the goal of understanding influenza evolution timely and important. To date, seasonal influenza has caused cumulatively more deaths than pandemic influenza, killing an estimated 30,000 Americans each year and 250,000 to 500,000 worldwide. Over the past decade, many seasonal influenza genomes have been fully sequenced, providing a database for detailed evolutionary studies of the virus.
The database of H3N2 sequences from New York State was a valuable resource for close scrutiny of influenza dynamics. We utilized these data to test predictions about intraseasonal influenza evolution. Influenza nucleotide and protein sequence diversity was highest at the beginning of the epidemic; during the expansion of the epidemic (and of the influenza population), influenza diversity fell significantly (Tables 1 and 2). Based on phylogenies (Supplemental Figures S3-S4) and the prevalence of the dominant sequences (Figure 2), the rapid expansion of influenza during the epidemic peak consisted largely of the clonal expansion of one clade. The evolutionary dynamics behind this clonal expansion were unclear. Mathematical models of evolution within an influenza season predicted differences in sequence evolution under neutral versus non-neutral antigenic drift. When the assumption of neutrality was removed and the host’s immunity could affect viral fitness, models indicated that “significant amounts of antigenic drift can occur in the early phases of the epidemic when there are still relatively few infected hosts” and much of the antigenic drift in flu sequences would occur before the peak of the epidemic. It is possible that the increased sequence diversity in the early epidemic was somehow related to non-neutral drift during that period, perhaps across larger populations than we have investigated. Influenza could have shifted between two dominant processes, antigenic drift and purifying selection. In New York State from 1995 to 2005, antigenic drift could have occurred primarily in the early part of each season’s epidemic, followed by a clonal expansion in the population of typically one clade. We have no concrete indications that non-neutral drift was occurring early in the epidemic. Analysis of dN/dS ratios between periods was inconclusive, but these small groups of closely related sequences were not particularly amenable to analysis by dN/dS.
The finer details of the early period of the epidemic, such as how many distinct sequences are circulating and how fast amino acid changes accumulate, could not be measured with the available data due to the low number of infections and inadequate sampling.
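The diversity measure named in the abstract, the mean Hamming distance of a set of sequences, can be illustrated with a minimal sketch. This is not the authors' code: it assumes the sequences are already aligned to equal length, it averages over all unordered pairs (the paper's exact averaging convention is not given in this excerpt), and the function names and the toy five-residue sequences are hypothetical, chosen only to show diversity falling when one sequence dominates.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_hamming_distance(seqs):
    """Average Hamming distance over all unordered pairs (assumed convention)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # fewer than two sequences: no pairwise diversity
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def dominant_sequence(seqs):
    """Return the most common sequence and its fraction of the sample."""
    seq, count = Counter(seqs).most_common(1)[0]
    return seq, count / len(seqs)


# Hypothetical toy samples, not data from the study.
early = ["MKTII", "MKTIV", "MRTII", "MKSII"]  # several distinct variants
peak = ["MKTII", "MKTII", "MKTII", "MKTIV"]   # one variant dominates

print(mean_hamming_distance(early))  # 1.5
print(mean_hamming_distance(peak))   # 0.5
print(dominant_sequence(peak))       # ('MKTII', 0.75)
```

In this toy case the peak sample has both a lower mean Hamming distance and a higher dominant-sequence frequency, which is the qualitative pattern the text describes for the peak epidemic period.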