Research Article: How many to sample? Statistical guidelines for monitoring animal welfare outcomes

Date Published: January 30, 2019

Publisher: Public Library of Science

Author(s): Jordan O. Hampton, Darryl I. MacKenzie, David M. Forsyth, Emmanuel Serrano Ferron.


There is increasing scrutiny of the animal welfare impacts of all animal use activities, including agriculture, the keeping of companion animals, racing and entertainment, research and laboratory use, and wildlife management programs. A common objective of animal welfare monitoring is to quantify the frequency of adverse animal events (e.g., injuries or mortalities). The frequency of such events can be used to provide pass/fail grades for animal use activities relative to a defined threshold and to identify areas for improvement through research. A critical question in these situations is: how many animals should be sampled? There are, however, few guidelines available for data collection or analysis, and consequently sample sizes can be highly variable. To address this question, we first evaluated the effect of sample size on precision and statistical power in reporting the frequency of adverse animal welfare outcomes. We next used these findings to assess the precision of published animal welfare investigations for a range of contentious animal use activities, including livestock transport, horse racing, and wildlife harvesting and capture. Finally, we evaluated the sample sizes required for comparing observed outcomes with specified standards through hypothesis testing. Our simulations revealed that the sample sizes required for reasonable levels of precision (i.e., proportional distance to the upper confidence interval limit (δ) of ≤ 0.50) are large (i.e., >300) and exceed those that have been commonly used for animal welfare assessments. Larger sample sizes are required for adverse events with low frequency (i.e., <5%). For comparison with a required threshold standard, even larger sample sizes are required. We present guidelines, and an online calculator, for minimum sample sizes for use in future animal welfare assessments of animal management and research programs.

Partial Text

There is increasing scrutiny of the animal welfare outcomes of animal use activities, including agriculture, the keeping of companion animals, racing and entertainment, research and laboratory use, and wildlife management programs [1]. Animal welfare is a young science [2] and, while it contains philosophical elements that necessitate qualitative and discursive studies, it does not have a strong statistical underpinning relative to other life sciences [3]. This weakness can hinder the robustness of efforts to provide regulatory oversight for aspects of animal welfare that are of societal concern. Research activities have oversight from institutional committees (e.g., Animal Ethics Committees; AECs), but there is often little monitoring of outcomes for operational activities [4]. The absence of statistical guidelines for collecting and analysing animal welfare data has led to intractable contention surrounding efforts to monitor industries such as the sea transport ('live export') of livestock [5–8].

Tables 6 and 7 show the sample sizes required for a range of Type I error rates when the true probability of an adverse outcome is 0.05 and 0.01, respectively.
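
To illustrate how sample sizes of this kind can be derived, the following is a minimal sketch rather than the calculation behind Tables 6 and 7. It assumes an exact one-sided binomial test of H0: p ≥ p0 against H1: p < p0, where p0 is a hypothetical threshold standard of 0.10 and the target power is 0.80. Neither value is stated in this excerpt, and the function names are illustrative only.

```python
from math import comb


def binom_cdf(c, n, p):
    """P(X <= c) for X ~ Binomial(n, p); returns 0 when c < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def critical_value(n, p0, alpha):
    """Largest count c with P(X <= c | p0) <= alpha, or -1 if no such c exists."""
    c = -1
    while c + 1 <= n and binom_cdf(c + 1, n, p0) <= alpha:
        c += 1
    return c


def power(n, p0, p_true, alpha):
    """Probability of rejecting H0: p >= p0 when the true rate is p_true."""
    c = critical_value(n, p0, alpha)
    return binom_cdf(c, n, p_true) if c >= 0 else 0.0


def required_n(p0, p_true, alpha=0.05, target_power=0.80, n_max=2000):
    """Smallest n (up to n_max) at which the exact test reaches the target power."""
    for n in range(1, n_max + 1):
        if power(n, p0, p_true, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # Assumed threshold p0 = 0.10 (hypothetical); true rate 0.05 as in the text.
    for alpha in (0.10, 0.05, 0.01):
        print(alpha, required_n(p0=0.10, p_true=0.05, alpha=alpha))
```

Because the binomial distribution is discrete, power does not rise smoothly with n, so the function returns the first sample size that reaches the target rather than a value above which power is guaranteed. A stricter Type I error rate can only increase the sample size found this way.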

For previously untested research or animal management techniques, research trials (pilot studies or validation studies) are typically advised or required before approval is given for operational use [9]. The three Rs approach to minimising animal welfare impacts in research indicates that the number of animals affected should be minimised wherever possible [56, 57]. This logically leads to a compromise between statistical rigour (maximising the sample sizes in order to create greater confidence) and animal impacts (minimising the numbers of animals affected) for trials of novel animal manipulation techniques [58]. Hence, small sample sizes must be used according to the precautionary principle, and tolerance levels must be high in order for outcome data to exceed specified thresholds [59]. For this reason, many pilot studies use small sample sizes (e.g., ~10 animals) and often do not report confidence intervals for frequency data [29, 30, 50]. For example, in New Zealand, lethal traps for carnivorous and omnivorous mammals are required to pass animal welfare threshold tests designed from guidelines produced by ISO (the International Organization for Standardization) [58]. These specify that 90% confidence is required that insensibility occurs within 3 minutes in >70% of test animals [59]. Sample sizes used are typically 10–19 animals (Table 3). We suggest that in this context, smaller sample sizes could be used by allowing high tolerance.
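
The trap-testing criterion above can be sketched numerically. The example below assumes that "90% confidence that insensibility occurs within 3 minutes in >70% of test animals" is read as a one-sided exact (Clopper-Pearson) lower confidence bound exceeding 0.70. That reading, and the function names, are assumptions made for illustration and are not taken from the ISO guidelines or the cited studies.

```python
from math import comb


def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def lower_bound(x, n, conf=0.90):
    """One-sided exact lower confidence bound for a proportion (x passes out of n)."""
    if x == 0:
        return 0.0
    alpha = 1 - conf
    lo, hi = 0.0, 1.0
    # Bisection: the upper tail increases with p, so find p where it equals alpha.
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(x, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


def min_passes(n, threshold=0.70, conf=0.90):
    """Fewest animals (out of n) that must pass for the lower bound to exceed threshold."""
    for x in range(n + 1):
        if lower_bound(x, n, conf) > threshold:
            return x
    return None


if __name__ == "__main__":
    for n in (10, 19):
        print(n, min_passes(n))
```

Under these assumptions, a trial of 10 animals passes only if all 10 are rendered insensible in time, while a trial of 19 animals tolerates two failures (17 of 19). This shows how little margin for adverse outcomes a small trial leaves.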

The sample sizes used to estimate the frequency of adverse events in animal welfare studies have been highly variable. The desired level of precision for the outcome(s) of interest should be reported in all publications, along with the required sample size(s). The guidelines presented here should be used to determine the number of animals to be sampled in order to estimate the proportion of adverse animal events with a desired level of precision.
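
As a rough illustration of how precision depends on sample size, the sketch below searches for the smallest n that achieves a target δ. It rests on two assumptions that this excerpt does not confirm: that δ is defined as (upper confidence limit − p̂) / p̂, and that a 95% Wilson score interval is used. It also treats the observed proportion as equal to the expected rate, which is an idealisation. It is not the article's online calculator.

```python
from math import sqrt
from statistics import NormalDist


def wilson_upper(p_hat, n, conf=0.95):
    """Upper limit of the two-sided Wilson score interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    centre = p_hat + z * z / (2 * n)
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre + half) / (1 + z * z / n)


def delta(p_hat, n, conf=0.95):
    """Assumed definition: proportional distance from p_hat to the upper limit."""
    return (wilson_upper(p_hat, n, conf) - p_hat) / p_hat


def min_sample_size(p_hat, target=0.50, conf=0.95, n_max=1_000_000):
    """Smallest n for which delta(p_hat, n) is at or below the target."""
    for n in range(1, n_max + 1):
        if delta(p_hat, n, conf) <= target:
            return n
    raise ValueError("target precision not reached within n_max")


if __name__ == "__main__":
    for p in (0.10, 0.05, 0.01):
        print(p, min_sample_size(p))
```

Running the loop for several expected rates shows the pattern described in the text: the rarer the adverse event, the larger the sample needed to reach the same proportional precision.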



