Research Article: Optimization by Adaptive Stochastic Descent

Date Published: March 16, 2018

Publisher: Public Library of Science

Author(s): Cliff C. Kerr, Salvador Dura-Bernal, Tomasz G. Smolinski, George L. Chadderdon, David P. Wilson (Editor: Lars Kaderali)

http://doi.org/10.1371/journal.pone.0192944

Abstract

When standard optimization methods fail to find a satisfactory solution for a parameter fitting problem, a tempting recourse is to adjust parameters manually. While tedious, this approach can be surprisingly powerful in terms of achieving optimal or near-optimal solutions. This paper outlines an optimization algorithm, Adaptive Stochastic Descent (ASD), that has been designed to replicate the essential aspects of manual parameter fitting in an automated way. Specifically, ASD uses simple principles to form probabilistic assumptions about (a) which parameters have the greatest effect on the objective function, and (b) optimal step sizes for each parameter. We show that for a certain class of optimization problems (namely, those with a moderate to large number of scalar parameter dimensions, especially if some dimensions are more important than others), ASD is capable of minimizing the objective function with far fewer function evaluations than classic optimization methods, such as the Nelder-Mead nonlinear simplex, Levenberg-Marquardt gradient descent, simulated annealing, and genetic algorithms. As a case study, we show that ASD outperforms standard algorithms when used to determine how resources should be allocated in order to minimize new HIV infections in Swaziland.

Partial Text

Consider a human H who is attempting to minimize a nonlinear objective function, E = f(x), by manually adjusting parameters in the vector x. H typically begins with a uniform prior regarding which parameters to vary, and chooses step sizes that are a fixed fraction (e.g., 10%) of the initial parameter values. H will then pseudorandomly choose one or more parameters to adjust. Every time a parameter xi is found to reduce E, the probability that H will select xi in the future increases; conversely, if changes in xi are not found to improve E, the probability that H will select xi decreases (informally, H forms “hunches” about which parameters are “good”). H also adaptively adjusts the step size based on the information H obtains about the curvature of parameter space with respect to each dimension (e.g., if ΔE/Δxi ≈ const. over multiple iterations, H will increase the step size). Despite its drawbacks, the adaptive nature of manual parameter fitting makes it a remarkably powerful method.

Consider an objective function E = f(x), where E is the scalar error (or other quantity) to be minimized (or maximized) and x = [x1, x2, …, xn] is an n-element vector of parameters. There are 2n possible directions j to step in: an increase or decrease in the value of each parameter. Associated with each parameter xi are (a) two initial step sizes, si+ and si−, which define the step size in the directions of increasing and decreasing xi, respectively (i.e., si+ > 0 and si− < 0); and (b) two initial probabilities, pi+ and pi−, which define the likelihood of selecting the corresponding direction j (for a uniform prior, pj = 1/2n, satisfying the requirement that the sum of pj over all 2n directions equals 1). Thus, the vectors s and p have length 2n.
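To make the roles of s and p concrete, the following Python sketch implements one possible version of this adaptive loop: a direction is drawn according to p, a candidate step of size s is taken, and both the probability and the step size for that direction are increased after an improvement and decreased otherwise. The update factors, initial step scaling, and fixed iteration count are illustrative assumptions, not the exact rules used in the paper.

```python
import numpy as np

def asd_sketch(f, x0, sinit=0.1, pinc=2.0, sinc=2.0, maxiters=1000, seed=0):
    """Minimal sketch of an adaptive stochastic descent loop.

    f      : objective function, f(x) -> scalar error E
    x0     : initial parameter vector (length n)
    sinit  : initial step size as a fraction of |x0| (illustrative default)
    pinc   : factor by which a direction's probability grows on success
    sinc   : factor by which a direction's step size grows on success
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = len(x)
    # 2n candidate directions, ordered as +x1, -x1, +x2, -x2, ...
    # (the max with 1.0 is an illustrative safeguard against zero initial values)
    s = np.repeat(sinit * np.maximum(np.abs(x), 1.0), 2)
    p = np.full(2 * n, 1.0 / (2 * n))   # uniform prior over directions
    E = f(x)
    for _ in range(maxiters):
        j = rng.choice(2 * n, p=p)      # pick a direction according to current probabilities
        i, sign = j // 2, (1 if j % 2 == 0 else -1)
        xnew = x.copy()
        xnew[i] += sign * s[j]
        Enew = f(xnew)
        if Enew < E:                    # improvement: accept, and favor this direction
            x, E = xnew, Enew
            p[j] *= pinc
            s[j] *= sinc
        else:                           # no improvement: disfavor this direction
            p[j] /= pinc
            s[j] /= sinc
        p /= p.sum()                    # renormalize so the probabilities sum to 1
    return x, E
```

For example, asd_sketch(lambda x: float(np.sum(x**2)), [1.0, -2.0, 3.0]) should drive the parameters toward the origin, with the probabilities and step sizes adapting to whichever coordinates yield the largest improvements.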

This section describes several modifications to the basic algorithm that may make it more suitable for a broader range of optimization problems.

Here we compare ASD to four standard optimization methods: the Nelder-Mead nonlinear simplex algorithm [30], Levenberg-Marquardt gradient descent [31], simulated annealing [32], and a genetic algorithm [33]. All methods were implemented in MATLAB 2012b (The MathWorks, Natick, MA), via the Optimization Toolbox functions “fminsearch”, “lsqnonlin”, “simulannealbnd”, and “ga”, respectively. These algorithms are also available in the “optimize” module of the Python package SciPy via “minimize(method = ’Nelder-Mead’)”, “leastsq()”, and “anneal()”, respectively (genetic algorithms are not available in SciPy, but are available via other modules). We chose these methods to compare against since, like ASD, they have relatively simple implementations and relatively few metaparameters that need to be specified.
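As a rough guide to reproducing part of this comparison in Python, the sketch below calls two of the SciPy interfaces mentioned above (the Nelder-Mead simplex and a Levenberg-Marquardt least-squares fit) on the Rosenbrock function, which stands in here for the paper's test functions. The starting point and tolerances are arbitrary illustrative choices, and note that the “anneal()” routine has been removed from recent SciPy releases.

```python
import numpy as np
from scipy.optimize import minimize, least_squares

def rosenbrock(x):
    """Standard Rosenbrock test function (a stand-in for the paper's test suite)."""
    return float(np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2))

x0 = np.array([-1.2, 1.0, -1.2, 1.0, -1.2])

# Nelder-Mead simplex (counterpart of MATLAB's fminsearch)
res_nm = minimize(rosenbrock, x0, method='Nelder-Mead',
                  options={'maxfev': 10000, 'xatol': 1e-8, 'fatol': 1e-8})

# Levenberg-Marquardt via a residual formulation (counterpart of lsqnonlin)
def residuals(x):
    return np.concatenate([10.0 * (x[1:] - x[:-1]**2), 1.0 - x[:-1]])

res_lm = least_squares(residuals, x0, method='lm')

print(res_nm.nfev, res_nm.fun)    # function evaluations and final error (simplex)
print(res_lm.nfev, res_lm.cost)   # function evaluations and final cost (Levenberg-Marquardt)
```

Counting function evaluations (nfev) rather than wall-clock time matches the comparison metric used in the paper, since for expensive models the evaluations dominate the cost.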

In contrast to the foregoing theoretical discussion of error minimization for analytical functions, here we describe the practical application that ASD was designed for: finding the allocation of resources across different HIV prevention and treatment programs that minimizes new infections [36]. To do this, we used the Optima HIV model (formerly known as Prevtool [15]) to perform the analyses. An overview of this version of the model is presented in S1 Appendix, with further details provided in [14]. Subsequent modifications to the model have been described in [37], and the most recent version of the software can be accessed via hiv.optimamodel.com.
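In general terms, this problem is an optimization over program budgets under a fixed total budget, with the epidemic model supplying the objective value (predicted new infections). The code below is a purely hypothetical illustration of that structure: new_infections_model is a toy stand-in for the Optima HIV model, not its API, and the renormalization used to enforce the budget constraint is one simple choice among many.

```python
import numpy as np

TOTAL_BUDGET = 10e6  # hypothetical fixed total budget

def new_infections_model(budgets):
    """Toy stand-in for an epidemic model mapping program budgets to new infections.
    In the paper this role is played by the Optima HIV model; a diminishing-returns
    form is used here purely so the sketch runs."""
    effectiveness = np.array([3.0, 1.0, 0.3, 0.1])   # hypothetical per-program impact
    baseline = 10000.0
    averted = np.sum(effectiveness * np.log1p(budgets / 1e6))
    return max(baseline - 100.0 * averted, 0.0)

def objective(raw_allocation):
    """Objective for an unconstrained optimizer: rescale the (nonnegative) raw
    allocation so it always sums to the total budget, then evaluate the model."""
    alloc = np.abs(np.asarray(raw_allocation, dtype=float))
    alloc = TOTAL_BUDGET * alloc / alloc.sum()
    return new_infections_model(alloc)

# e.g., objective([1.0, 1.0, 1.0, 1.0]) evaluates an equal four-way split
```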

This paper presents a simple optimization method inspired by the process of manual parameter fitting that is capable of outperforming traditional algorithms for certain classes of problems. The algorithm is most effective for problems with moderate to large dimensionality (≥5 dimensions), which corresponds to the case in which there are enough parameters that different parameters are likely to have substantially different overall contributions to the objective function. Indeed, the relative uniformity of parameters in the simple test functions used here (in terms of both scale and effectiveness) does not necessarily reflect certain real-world situations in which some—or even most—of the objective function’s parameters may have little influence on its value. In such situations, as with the real-world example of HIV budget allocations, ASD is especially effective, as it is able to adapt to those parameters (and those scales) that produce the greatest improvements in the objective function. An example of this is provided in Fig 4, where ASD finds what appears to be the globally optimal solution more than 10 times faster than any other algorithm. In contrast, ASD is less effective for optimization problems where the objective function has large discontinuities or numerous local minima; for such problems, evolutionary algorithms typically provide superior performance [39].

