Research Article: The Importance of Species Name Synonyms in Literature Searches

Date Published: September 14, 2016

Publisher: Public Library of Science

Author(s): Gerald F. Guala, Wolfgang Arthofer.


The synonyms of biological species names are shown to be an important component of comprehensive searches of electronic scientific literature databases, but they are not well leveraged within the major literature databases examined. For accepted or valid species names in the Integrated Taxonomic Information System (ITIS) that have synonyms in the system and that are found in citations within PLoS, PMC, PubMed or Scopus, two measures are very often substantial: the percentage of species for which citations will not be found if synonyms are not used, and the percentage increase in the number of citations found when synonyms are included. However, there is no correlation between the number of synonyms per species and the magnitude of the effect. Further, the number of citations found does not generally increase in proportion to the number of synonyms available. Users looking for literature on specific species across all of the resources investigated here are often missing large numbers of citations if they do not manually augment their searches with synonyms. Missing citations can have serious consequences by effectively hiding critical information. Literature searches should therefore include synonym relationships. To aid in this, a new web service in ITIS was developed as a result of this study and is announced here, together with examples of how to apply it to this issue.

Partial Text

Latin binomials, the scientific names of biological species, are typically used as terms in searches for literature across the biological disciplines. However, because synonyms for those binomials often exist, the efficiency of individual binomials in guaranteeing retrieval of all relevant items can be greatly affected by the inclusion (or exclusion) of synonyms in the search. In biological nomenclature, synonyms are scientific names, other than the currently accepted one, that apply to an organism. The purpose of this study is to assess the importance of this effect in online searches of a popular cross-section of relevant scientific literature databases.

For each resource (PLoS, Scopus, PubMed and PMC), a set of paired query URLs by kingdom [8] was generated from the new ITIS Solr web service [9] [S1 File]. Each pair consisted of one query URL for an accepted or valid name in ITIS at the species rank with at least one synonym in ITIS, and a second URL for that accepted or valid name plus all of its linked synonyms in ITIS. Each name string was wrapped in exact-phrase delimiters and the names were joined with an inclusive Boolean “OR”, formatted according to the specific requirements of each resource. The synonyms could include names at other ranks (e.g. subspecies and varieties) but not valid or accepted children. For each resource, one set was generated to query all available fields. For PubMed an additional set was generated to query only the Title and Abstract, and for Scopus an additional set was generated to query only the Abstract. Result reports for each resource were constrained to the smallest JSON response that reported the number of citations obtained for the individual search being conducted. Text fields from the articles themselves were not downloaded, except for a test set of records for each resource used to confirm that the search requests returned the expected results. All URLs were batch-submitted using GNU Wget version 1.16.3 [10], and the required information was extracted from the JSON responses using simple JavaScript functions. Sample URL strings (minus API keys) are provided in the associated data for this paper.
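The paired-query design can be sketched as follows. This is a minimal illustration, not the authors' scripts: it assumes PubMed's E-utilities `esearch` endpoint with `retmode=json`, whose response carries the hit count under `esearchresult.count` (PMC can be queried through the same endpoint with `db=pmc`). The helper names `build_query`, `pubmed_count_url` and `parse_pubmed_count`, and the placeholder binomials `Aus bus` and `Cus bus`, are hypothetical. PLoS and Scopus would need their own query syntax and, for Scopus, an API key.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(accepted, synonyms=(), field=None):
    """Join the accepted name and its synonyms into one exact-phrase OR query.

    `field` is an optional PubMed field tag such as "tiab" (Title/Abstract).
    """
    tag = f"[{field}]" if field else ""
    names = [accepted, *synonyms]
    # Double quotes make each binomial match as a phrase, not as two loose words.
    return " OR ".join(f'"{name}"{tag}' for name in names)


def pubmed_count_url(query, db="pubmed"):
    """Build an esearch URL that asks only for the hit count (retmax=0)."""
    params = {"db": db, "term": query, "retmax": "0", "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def parse_pubmed_count(body):
    """Extract the citation count from an esearch JSON response body."""
    return int(json.loads(body)["esearchresult"]["count"])


# One pair of queries for a hypothetical species with one synonym.
accepted_only = build_query("Aus bus")
with_synonyms = build_query("Aus bus", ["Cus bus"])
print(pubmed_count_url(accepted_only))
print(pubmed_count_url(with_synonyms))
```

Fetching the two URLs (with Wget or any HTTP client) and passing each body to `parse_pubmed_count` yields the two counts of one pair. Passing `field="tiab"` to `build_query` restricts every name to Title/Abstract, mirroring the additional PubMed set described above.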

Among all valid or accepted species names in ITIS, only those with synonyms (~17%) were used in this study, and among those, the number yielding any citations at all varied widely across kingdoms and sources (see supporting documentation). However, for those species that had both synonyms and citations in the literature, the number of species affected by adding synonyms to the search string (Table 1) and the number of citations returned (Table 2) were generally substantial.
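Once both counts of each pair are available, the two measures named in the abstract can be computed per resource. The sketch below encodes one possible reading of those measures, which is an assumption: a species counts as "affected" when the synonym query returns more hits than the accepted name alone, as "hidden" when the accepted name alone returns nothing, and the percentage increase pools citation counts across species. The `PairedCount` type, the `summarize` function and the toy counts are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class PairedCount:
    name: str
    accepted_only: int  # hits for the accepted name alone
    with_synonyms: int  # hits for the accepted name OR any synonym


def summarize(pairs):
    """Summarize paired counts for one resource (and, e.g., one kingdom)."""
    # Only species that return citations at all are considered.
    cited = [p for p in pairs if p.with_synonyms > 0]
    if not cited:
        return {"species_affected_pct": 0.0, "species_hidden_pct": 0.0,
                "citation_increase_pct": 0.0}
    n = len(cited)
    affected = sum(1 for p in cited if p.with_synonyms > p.accepted_only)
    hidden = sum(1 for p in cited if p.accepted_only == 0)
    base = sum(p.accepted_only for p in cited)
    total = sum(p.with_synonyms for p in cited)
    # Pooled increase; reported as infinite if the accepted names found nothing.
    increase = 100.0 * (total - base) / base if base else float("inf")
    return {
        "species_affected_pct": 100.0 * affected / n,
        "species_hidden_pct": 100.0 * hidden / n,
        "citation_increase_pct": increase,
    }


# Toy counts for illustration only.
example = [
    PairedCount("Aus bus", 10, 15),
    PairedCount("Aus cus", 0, 4),
    PairedCount("Dus eus", 5, 5),
    PairedCount("Fus gus", 0, 0),
]
print(summarize(example))
```

Pooling counts before taking the ratio avoids dividing by zero for species whose accepted name alone finds nothing. A per-species mean of percentage increases would instead have to drop or cap those species.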

All of the major resources examined in this study are heavily used and receive millions of visits a month [11]. Only some fraction of those visits are searches in which the user wants literature about a given species or set of species. However, it seems reasonable to assume that interest is high, if only because ITIS species names appear in at least 8 million citation records in Scopus alone, counting only the searches of species together with their synonyms. Further, ITIS is not taxonomically complete, nor does it include all possible synonyms even where it is complete. There can also be other taxonomic views. However, under any complete taxonomic view, the relationships among names may change but the totality of names will not. The exact search goals of each user, and the optimization of search metrics, are a much broader question. But because a scientist cannot evaluate what is not known to exist, we assume for the purposes of this paper that more results are better in this case, and it is clear that including synonyms increases results. Given both the minimal effect of large numbers of synonyms [S3 File] and the low correlation between the number of synonyms and the number of citations returned across all taxa and sources [S2 File], the relative value of the large investments needed to achieve exhaustive synonymies requires further justification if the goal is efficient literature search.
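The relationship between synonym count and citation gain can be checked with a correlation coefficient. This section does not state which statistic was used, so this sketch assumes a Pearson coefficient computed on the absolute gain (combined hits minus accepted-name hits). The function names and the record layout `(number_of_synonyms, accepted_only, with_synonyms)` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def synonym_gain_correlation(records):
    """Correlate synonym count with extra citations found via synonyms.

    Each record is (number_of_synonyms, accepted_only, with_synonyms).
    """
    xs = [n_syn for n_syn, _, _ in records]
    ys = [with_syn - acc for _, acc, with_syn in records]
    return pearson(xs, ys)


# Toy records for illustration only.
print(synonym_gain_correlation([(1, 10, 12), (2, 10, 14), (3, 10, 16)]))
```

A rank-based coefficient such as Spearman's would be less sensitive to the few species with very large synonymies. It is the Pearson coefficient applied to the ranks of `xs` and `ys`, with tied values given their average rank.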