Research Article: Using the sample maximum to estimate the parameters of the underlying distribution

Date Published: April 25, 2019

Publisher: Public Library of Science

Author(s): Alex Capaldi, Tiffany N. Kolba, Eugene Demidenko.


We propose novel estimators for the parameters of an exponential distribution and a normal distribution when the only known information is a sample of sample maxima; i.e., the known information consists of a sample of m values, each of which is the maximum of a sample of n independent random variables drawn from the underlying exponential or normal distribution. We analyze the accuracy and precision of the estimators using extreme value theory, as well as through simulations of the sampling distributions. For the exponential distribution, the estimator of the mean is unbiased and its variance decreases as either m or n increases. Likewise, for the normal distribution, we show that the estimator of the mean has negligible bias and the estimator of the variance is unbiased. While the variance of the estimators for the normal distribution decreases as m, the number of sample maxima, increases, the variance increases as n, the sample size over which the maximum is computed, increases. We apply our method to estimate the mean length of pollen tubes in the flowering plant Arabidopsis thaliana, where the known biological information fits our context of a sample of sample maxima.

Partial Text

Consider the scenario where each observation is the maximum value of n independent, identically distributed random variables drawn from either an exponential distribution or a normal distribution with unknown parameters. That is, $X_{ij} \overset{\text{iid}}{\sim} \mathrm{Exp}(\beta)$ or $X_{ij} \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$ for $i = 1, \dots, n$, and the observed data are $Y_j = \max\{X_{ij}\}_{i=1}^{n}$ for $j = 1, \dots, m$. Here we present a process to estimate the mean β of the underlying exponential distribution, or the mean μ and variance σ² of the underlying normal distribution, from only the set of values $Y_j$.
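To make the data structure concrete, the following minimal sketch simulates a sample of sample maxima for both underlying distributions. The helper name `sample_of_maxima` and the parameter values are illustrative assumptions and do not come from the article.

```python
import random

def sample_of_maxima(draw, n, m, rng):
    """Return m values, each the maximum of n independent draws from draw(rng)."""
    return [max(draw(rng) for _ in range(n)) for _ in range(m)]

rng = random.Random(0)            # fixed seed for reproducibility
beta, mu, sigma = 2.0, 5.0, 1.5   # illustrative (assumed) true parameters
n, m = 10, 200                    # n draws per maximum, m maxima observed

# random.expovariate takes the rate, i.e. 1 / mean.
y_exp = sample_of_maxima(lambda r: r.expovariate(1.0 / beta), n, m, rng)
y_norm = sample_of_maxima(lambda r: r.gauss(mu, sigma), n, m, rng)

print(len(y_exp), len(y_norm))
```

Only `y_exp` or `y_norm` would be available to the analyst; the individual draws behind each maximum are discarded, which is what distinguishes this setting from ordinary parameter estimation.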

We begin by considering the case where the underlying distribution is exponential with unknown mean β. In Theorem 1 below, we propose an estimator for β and compute its expected value and variance.
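As an illustration, one natural moment-based estimator uses the standard fact that the maximum of n independent exponential variables with mean β has expected value β·H_n and variance β²·Σ 1/k² (k = 1, …, n), where H_n is the n-th harmonic number. Dividing the mean of the maxima by H_n therefore gives an unbiased estimate of β. The sketch below assumes this form; the exact statement of Theorem 1 is not reproduced in this excerpt, and the function names are hypothetical.

```python
import random

def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def estimate_exp_mean(maxima, n):
    """Moment-based estimate of beta: mean of the maxima divided by H_n."""
    return sum(maxima) / len(maxima) / harmonic(n)

def estimator_variance(beta, n, m):
    """Variance of the estimate: beta^2 * sum(1/k^2) / (m * H_n^2)."""
    s2 = sum(1.0 / k ** 2 for k in range(1, n + 1))
    return beta ** 2 * s2 / (m * harmonic(n) ** 2)

rng = random.Random(1)
beta_true, n, m = 2.0, 10, 5000   # illustrative (assumed) values
maxima = [max(rng.expovariate(1.0 / beta_true) for _ in range(n))
          for _ in range(m)]
beta_hat = estimate_exp_mean(maxima, n)
print(beta_hat, estimator_variance(beta_true, n, m))
```

Under this form the variance shrinks as m grows (the 1/m factor) and as n grows (H_n² grows without bound while Σ 1/k² stays below π²/6), which is consistent with the behavior described in the abstract.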

We now consider the case where the underlying distribution is normal with unknown mean μ and unknown variance σ². In Theorem 2 below, we propose estimators for μ and σ² and analyze their expected values, while in Theorem 3 we analyze the variances of the estimators.
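A method-of-moments sketch for the normal case follows. It relies on writing each maximum as Y = μ + σW, where W is the maximum of n standard normal variables, so that E[Y] = μ + σ·a_n and Var(Y) = σ²·b_n with a_n = E[W] and b_n = Var(W). The constants a_n and b_n are computed here by numerical integration of the density n·φ(z)·Φ(z)^(n−1). This is an assumed construction that may differ from the exact estimators in Theorems 2 and 3, which are not reproduced in this excerpt; all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, variance

STD = NormalDist()  # standard normal, provides pdf and cdf

def max_std_normal_moments(n, lo=-10.0, hi=10.0, steps=4000):
    """Mean and variance of the maximum of n iid N(0,1), by the trapezoidal rule."""
    h = (hi - lo) / steps
    m1 = m2 = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0           # trapezoid end weights
        d = n * STD.pdf(z) * STD.cdf(z) ** (n - 1)    # density of the maximum
        m1 += w * z * d
        m2 += w * z * z * d
    m1 *= h
    m2 *= h
    return m1, m2 - m1 * m1

def estimate_normal_params(maxima, n):
    """Return (mu_hat, var_hat) from a sample of maxima of n normal draws."""
    a_n, b_n = max_std_normal_moments(n)
    var_hat = variance(maxima) / b_n                  # unbiased for sigma^2
    mu_hat = fmean(maxima) - a_n * math.sqrt(var_hat)
    return mu_hat, var_hat

rng = random.Random(2)
mu_true, sigma_true, n, m = 5.0, 1.5, 10, 5000        # illustrative (assumed) values
maxima = [max(rng.gauss(mu_true, sigma_true) for _ in range(n)) for _ in range(m)]
mu_hat, var_hat = estimate_normal_params(maxima, n)
print(mu_hat, var_hat)
```

In this construction the variance estimate is unbiased because the sample variance of the maxima is unbiased for σ²·b_n, while the mean estimate carries a small bias because the square root of an unbiased variance estimate is not itself unbiased for σ.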

During fertilization in flowering plants, once pollen grains land on the stigma, they grow tubes that travel down through a transmitting tract from the stigma toward an ovule. The pollen grains compete against each other in a race toward the limited number of ovules to determine which will father the seeds. The mean length of the population of pollen tubes at various time points is of interest to plant biologists, yet, to date, there are only measurements of the lengths of the longest pollen tubes in such competitions [7]. Since pollen tube lengths must be positive, it is reasonable to assume that the lengths follow an exponential distribution. Hence, our method described in Section 2 allows the mean pollen tube length to be estimated given the structure of the experimental data.