Research Article: Practices of research data curation in institutional repositories: A qualitative view from repository staff

Date Published: March 16, 2017

Publisher: Public Library of Science

Author(s): Dong Joon Lee, Besiki Stvilia, Hussein Suleman.


The importance of managing research data has been emphasized by the government, funding agencies, and scholarly communities. Increased access to research data enhances the impact and efficiency of scientific activities and funding. Thus, many research institutions have established or plan to establish research data curation services as part of their Institutional Repositories (IRs). However, to design effective research data curation services in IRs, and to build active communities of research data providers and users around those IRs, it is essential to study current data curation practices and provide rich descriptions of the sociotechnical factors and relationships shaping those practices. Based on 13 interviews with 15 IR staff members from 13 large research universities in the United States, this paper provides a rich, qualitative description of research data curation and use practices in IRs. In particular, the paper identifies data curation and use activities in IRs, as well as their structures, the roles played, the skills needed, the contradictions and problems present, the solutions sought, and the workarounds applied. The paper can inform the development of best practice guides, infrastructure and service templates, and education in research data curation in Library and Information Science (LIS) schools.

Partial Text

Access to and sharing of research data have been emphasized by the government [1], funding agencies [2–4], and scholarly communities [5,6]. Increased access to research data elevates the impact, efficiency, and effectiveness of scientific activities and funding opportunities. Such access, however, is facilitated not only by appropriate policies but also by effective infrastructure mechanisms, including enhancing data with effective metadata [7]. An increasing number of academic institutions plan to provide research data services through their Institutional Repositories (IRs; [8,9]). Although many research universities already have operational IRs that provide open access to the digital content produced by their communities, only a small number of institutions provide research data services through their IRs [10].

The purpose of this descriptive study is to examine research data curation practices in IRs. Although there have been many previous studies of IRs [11,12,13,16], to the best of our knowledge, no in-depth, qualitative study has provided rich, detailed descriptions of data curation practices in IRs. This study addresses that need by examining data management and use activities in IRs, including the roles played, tools used, conventions and rules followed, contradictions among different components of the activities, and solutions sought to address those contradictions. The findings can inform the design and planning of data curation services in IRs. They can also guide the training of IR curators in LIS schools and professional organizations. The study examined the following research questions:

Three types of repositories (i.e., domain, disciplinary, and institutional) exist [11]. The main difference among them is the granularity of the organizations that operate them. For example, a chemistry community may develop a domain repository; a crystallography community may operate a disciplinary repository; and a university may run an institutional repository (IR). An IR may enhance an institution's reputation by providing access to the intellectual work produced by its communities [11,12,14]. This benefit, along with the emphasis that major funding agencies place on research data archiving and sharing [2,3,15], motivates many institutions to invest in developing IRs and services for research data [8,16]. According to many researchers, IRs that store and curate research data can facilitate the reuse and repurposing of the data [17] and increase the data's value and credibility [11].

Studying research data curation practices requires multifaceted contextual analysis [29,38,43]. Hence, this study required a research design that could examine the sociotechnical and cultural factors that may affect data curation. The study was guided by an analysis of the literature on data curation and by Activity Theory [44,45]. In particular, Activity Theory was used to conceptualize the general context of data curation work in IRs. This context comprises a system of work activities and their structures, including roles (e.g., providers, users, curators), types of data, tools and skills needed, rules and policies used, and the mediating relationships among those structures.

This study examined research data curation practices in IRs through the lens of Activity Theory [44,45]. The study identified the activities performed by IR staff, the context of those activities, and role-specific sets of activities and skills in IRs. Based on the findings and discussion, we offer the following recommendations and curation knowledge that can benefit institutions that currently manage, or plan to implement, institutional data repositories.




