Date Published: July 29, 2019
Publisher: Public Library of Science
Author(s): Jonathan Iyandemye, Marshall P. Thomas, Wolfgang Glanzel.
Open access publication rates have been steadily increasing over time. In spite of this growth, academics in low income settings struggle to gain access to the full canon of research literature. While the vast majority of open access repositories and funding organizations with open access policies are based in high income countries, the geographic patterns of open access publication itself are not well characterized. In this study, we developed a computational approach to better understand the topical and geographical landscape of open access publications in the biomedical research literature. Surprisingly, we found a strong negative correlation between country per capita income and the percentage of open access publication. Open access publication rates were particularly high in sub-Saharan Africa, but vastly lower in the Middle East and North Africa, South Asia, and East Asia and the Pacific. These effects persisted when we considered only papers whose authors were all based within a single region and income group. However, papers resulting from international collaborations did have a higher percentage of OA than single-country papers, and inter-regional collaboration increased OA publication for all world regions. There was no clear relationship between the number of open access policies in a region and the percentage of open access publications in that region. To understand the distribution of open access across topics of biomedical research, we examined keywords that were most enriched and depleted in open access papers. Keywords related to genomics, computational biology, animal models, and infectious disease were enriched in open access publications, while keywords related to the environment, nursing, and surgery were depleted in open access publications. This work identifies geographic regions and fields of research that could be priority areas for open access advocacy.
The finding that open access publication rates are highest in sub-Saharan Africa and low income countries suggests that factors other than open access policy strongly influence authors’ decisions to make their work openly accessible. The high proportion of OA resulting from international collaborations indicates yet another benefit of collaborative research. Certain applied fields of medical research, notably nursing, surgery, and environmental fields, appear to have a greater proportion of fee-for-access publications, which presumably creates barriers that prevent researchers and practitioners in low income settings from accessing the literature in those fields.
Open access (OA) describes materials that are free to access and read online, either through publisher websites or through publication repositories. It seems self-evident that OA publication maximizes the benefits of scientific findings for researchers, funders, and the public. Some OA advocates now argue that all research publications should be openly accessible by default and that access to knowledge stemming from research should be considered a fundamental human right. In keeping with this, many government agencies, private foundations, and universities have concluded that the results of research they support should be openly accessible and have adopted mandates and policies to support OA publication. This has been accompanied by steady growth in OA repositories. The most common routes to OA publication are either “gold” open access, which refers to papers that are made immediately available from the publisher under a Creative Commons license, or “green” OA papers, which are deposited by authors or publishers in a public repository. In addition, a large fraction of the literature is also made available on publishers’ websites without an explicit OA license. Most funders that have OA policies mandate that authors deposit papers in repositories, thus promoting the green OA publication route, but some have more recently established policies intended to promote gold OA. Although there is evidence that OA policies and compliance efforts have increased OA publication, OA policies that promote green OA can place the burden of compliance upon authors, who may misunderstand OA or the policies.
We used PubMed to identify a set of papers that matched specified search criteria. MEDLINE indexes journals that cover a broad array of topics, so we limited our search to papers in the biological sciences and medicine using MeSH terms and MeSH headings. The exact search term was: (“2015/01/01”[Date - Publication]: “2015/12/31”[Date - Publication]) AND (((Health Care category [mh]) OR (Psychiatry and Psychology category [mh])) OR (“Education”[MeSH Term]) OR (“Biological Science Disciplines”[MeSH Term])). These search criteria were designed to return a large body of literature, but restrict results to work in the biomedical sciences or medical education and exclude work in related fields, such as physics, mathematics, and the humanities (all of which also have MeSH terms). MeSH headings are hierarchical, and PubMed returns all papers that are below a given term in the hierarchy by default, so this search returned a very large volume of papers, comprising approximately 63% of all MEDLINE-indexed papers for 2015. We downloaded all of the PubMed IDs returned, then used the Entrez e-utilities to extract MeSH terms, digital object identifier (DOI), and affiliation metadata for each paper. We used Unpaywall to identify the OA status of each paper, using the DOI to identify the paper. Unpaywall comprehensively tracks OA publications by compiling open access status from a wide variety of resources, institutional repositories, and databases. The affiliation strings were split into substrings using regular expressions, and we used a two-tiered approach to identify the country named in the substrings. We first identified countries of affiliation by their names, abbreviations, or major cities named in the affiliations. If this failed to yield a result, we submitted the affiliation substring to the Google Maps Geocoding API. For analysis of world economies and world regions, we used World Bank data from 2015.
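The two-tiered country identification described above can be illustrated with a minimal Python sketch. It is not the authors' code: the lookup tables, the function names (`split_affiliations`, `match_local`, `identify_countries`), the choice of `;` as the separator, and the `geocode` callback standing in for the geocoding service are all assumptions made for illustration.

```python
import re

# Hypothetical lookup tables; a real pipeline would need far larger ones.
COUNTRY_NAMES = {
    "rwanda": "Rwanda",
    "united states": "United States",
    "usa": "United States",
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
}
CITY_TO_COUNTRY = {
    "kigali": "Rwanda",
    "boston": "United States",
    "london": "United Kingdom",
}


def split_affiliations(affiliation_text):
    """Split a raw affiliation string into one substring per institution.

    Assumes institutions are separated by semicolons.
    """
    return [part.strip() for part in re.split(r";", affiliation_text) if part.strip()]


def match_local(substring):
    """Tier 1: look for a known country name, abbreviation, or major city."""
    without_email = re.sub(r"\S+@\S+", "", substring)
    tokens = [tok.strip(" .").lower() for tok in without_email.split(",")]
    # Scan from the end because the country is usually listed last.
    for token in reversed(tokens):
        if token in COUNTRY_NAMES:
            return COUNTRY_NAMES[token]
        if token in CITY_TO_COUNTRY:
            return CITY_TO_COUNTRY[token]
    return None


def identify_countries(affiliation_text, geocode=None):
    """Return the set of countries found in an affiliation string.

    `geocode` is an optional callable (tier 2) that takes a substring and
    returns a country name or None; it stands in for an external geocoder.
    """
    countries = set()
    for substring in split_affiliations(affiliation_text):
        country = match_local(substring)
        if country is None and geocode is not None:
            country = geocode(substring)
        if country is not None:
            countries.add(country)
    return countries
```

Keeping the geocoder behind a callback means the fast local lookup handles most strings, and the external service is consulted only when the lookup fails, which matches the order of the two tiers described in the text.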
For analysis of enriched and depleted MeSH terms, we split terms into individual words and tabulated all instances of each word. Word frequencies were normalized to total word counts, and words that were more than 33.33% enriched or depleted in OA papers relative to non-OA papers were retained. Only words that appeared more than 4,000 times in the full set of words extracted from MeSH terms were included in this portion of the analysis. All of the data extraction and processing was done with Python, and the code is openly accessible on GitHub at https://github.com/iyandemye/oa_project.
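As a rough illustration of this enrichment calculation, the sketch below counts words in MeSH terms for OA and non-OA papers, normalizes by total word count, and keeps words whose relative difference exceeds a threshold. The function names are hypothetical, and reading "33.33% enriched or depleted" as a relative frequency difference of more than one third in either direction is an assumption, not a definition confirmed by the source.

```python
import re
from collections import Counter


def word_counts(mesh_terms_per_paper):
    """Count every word across the MeSH terms of a collection of papers."""
    counts = Counter()
    for terms in mesh_terms_per_paper:
        for term in terms:
            counts.update(re.findall(r"[a-z0-9]+", term.lower()))
    return counts


def enrichment(oa_papers, non_oa_papers, min_count=4000, threshold=1 / 3):
    """Return {word: relative difference} for enriched or depleted words.

    Relative difference is (oa_freq - non_oa_freq) / non_oa_freq, which is an
    assumed reading of the enrichment measure. Words seen only in OA papers
    are reported as infinitely enriched.
    """
    oa = word_counts(oa_papers)
    non_oa = word_counts(non_oa_papers)
    oa_total = sum(oa.values())
    non_oa_total = sum(non_oa.values())
    if oa_total == 0 or non_oa_total == 0:
        return {}

    results = {}
    for word in set(oa) | set(non_oa):
        # Keep only words that appear more than `min_count` times overall.
        if oa[word] + non_oa[word] <= min_count:
            continue
        oa_freq = oa[word] / oa_total
        non_oa_freq = non_oa[word] / non_oa_total
        if non_oa_freq == 0:
            relative = float("inf")
        else:
            relative = (oa_freq - non_oa_freq) / non_oa_freq
        if abs(relative) > threshold:
            results[word] = relative
    return results
```

On a toy input, `min_count` would need to be lowered (for example to 0), since the default of 4,000 only makes sense for a corpus of the size described in the text.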
To better understand the geographic and topical trends of OA publishing in biomedical research, we developed a computational pipeline that identifies authors’ country of affiliation from the text of affiliations provided by PubMed. We used PubMed to identify MEDLINE-indexed papers focused on biomedical sciences or medicine published in 2015. To validate the method, we ran the analysis on a small random sample (250 papers) of the full dataset and manually checked the affiliations. In this sample, the method had a sensitivity of 99.7% and a false discovery rate of 0.5%. In the full dataset, of 643,138 papers matching the search criteria, 579,853 (90.2%) listed a text affiliation, and we identified at least one country of affiliation in 571,033 of these papers (98.5%). We identified OA status for 583,937 of the papers (90.8%). In total, we were able to establish OA status and at least one country of affiliation for 547,404 papers (85.1% of the full set of papers).
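For readers who want to retrace the arithmetic, the short Python sketch below gives the standard definitions of sensitivity and false discovery rate and recomputes the coverage percentages from the paper counts quoted above. The helper names are hypothetical, and the source does not report the underlying true positive, false negative, and false positive counts from the 250-paper validation sample, so the sketch does not attempt to reproduce them.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real affiliations that the method correctly identified."""
    return true_pos / (true_pos + false_neg)


def false_discovery_rate(true_pos, false_pos):
    """Fraction of reported identifications that were wrong."""
    return false_pos / (true_pos + false_pos)


def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


# Paper counts quoted in the text.
total_papers = 643_138
with_affiliation = 579_853
with_country = 571_033
with_oa_status = 583_937
with_both = 547_404

coverage = {
    "listed a text affiliation": pct(with_affiliation, total_papers),
    "country found, given an affiliation": pct(with_country, with_affiliation),
    "OA status found": pct(with_oa_status, total_papers),
    "OA status and country found": pct(with_both, total_papers),
}
```

Note that the 98.5% figure uses the papers with a text affiliation as its denominator, while the other three percentages are relative to the full set of 643,138 papers.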
This work highlights unappreciated complexities in the geography of OA publication. The percentage of OA publication was highest in low income countries and particularly in sub-Saharan Africa, which has few OA policies and repositories, suggesting that factors in addition to OA policies play a major role in authors’ decisions to publish OA papers. We also observed a consistent effect of international and inter-regional collaboration: papers with authors based in multiple countries or regions had a substantially higher percentage of OA publication than their single-country and single-region counterparts.