qEI with QMCPy

Quasi-Monte Carlo methods (QMC) are a valuable tool for sampling random variables in a structured fashion; this allows for computing key statistics of random variables more efficiently than with i.i.d. sampling.  Such quantities can play fundamental roles in larger algorithms, making their efficient computation fundamental to practical implementations of numerous applications.  This was the motivation for creating the QMCPy library.  In this post, we demonstrate the use of QMC methods in computing a key quantity in Bayesian optimization.

Bayesian optimization

Bayesian optimization (also called many other names, including sequential model based optimization or Gaussian process optimization) is a broad class of algorithms which involves alternately building statistical models of scattered data and optimally sampling for further data to try to optimize a given function.  To conduct this optimal sampling process, a strategy must be applied to take the statistical model and appropriately balance a desire to explore the optimization domain against a desire to exploit the high performing values found thus far and search in their proximity.  We refer to this strategy as the acquisition function.

Probably the most popular acquisition function is Expected Improvement (EI), which is widely used both because of its closed form (for a Gaussian process surrogate model) and because of its effectiveness.  When trying to push beyond the sequential model of Bayesian optimization to allow for parallelism, the EI acquisition function can be redefined for optimally choosing the next q points at which to sample.  This qEI acquisition function (first defined here, with more recent discussion here), unfortunately, does not have a simple closed form, and is generally computed through Monte Carlo estimation.

qEI estimation

We consider a simple problem to demonstrate the value of QMC in estimating this qEI quantity.  The left panel of the figure below depicts 5 noisy observations of a 1D function.  We fit a Gaussian process (GP) model to this data, and provide examples of posterior draws in the center panel.

Figure 1: left: 5 observed points, drawn from a function, with error bars displayed.  center: 128 quasi-random posterior draws from a Gaussian process fit to the observed points.  right: The qEI associated with $q=2$ points given this Gaussian process and the observed points; note the symmetry across the line $y=x$, which occurs because the points will be sampled at the same time.

The right panel of the figure above depicts the qEI quantity for $q=2$ future points to be sampled; qEI is defined (for a maximization problem) with a $q$ dimensional integral as

\[ \qEI(x_1,\ldots,x_q) = \int_{\RR^q} \max_{1\leq i\leq q}(y_i – y^*)_+ p_{Y_{x_1,\ldots, x_q}}(y_1, \ldots, y_q)\; \text{d}y_1\cdots\text{d}y_q, \]

where $Y_{x_1,\ldots, x_q}$ is the joint posterior GP distribution at $x_1,\ldots, x_q$ and $y^*$ is the best value observed thus far.  Note the symmetric nature of the qEI — since both of the $q=2$ points will be evaluated simultaneously, the designation of “first” and “second” is arbitrary.  The highest qEI value occurs when sampling near $x_1=0$ and $x_2=.6$, points which are near the highest valued draws of the posterior.

This integral is generally estimated using Monte Carlo (discussed here and here, among other places) with i.i.d. draws.  Using Quasi-Monte Carlo, we can reach the same level of accuracy with fewer GP posterior draws.  The figure below shows the results of using different quasi-random sequences, available in QMCPy, to estimate qEI for $q=5$ next points, located at $0.158,  0.416,  0.465,  0.718,  0.935$.

Contrast of the convergence of iid Monte Carlo methods versus QMC methods
Figure 2: Monte Carlo estimation, computed both using QMCPy’s i.i.d. sampler as well as numpy’s normal sampler, are compared to QMC methods available in QMCPy.  The Sobol’ and Lattice methods perform comparably, as expected, and converge at roughly $O(N^{-1})$, in contrast to the $O(N^{-1/2})$ convergence of i.i.d. sampling.  Integrals were estimated 50 times with different random seeds: the medians (solid lines) and interquartile ranges (shaded regions) are plotted.

The integrals present in BO are another example of a computation which benefits from QMC.  More efficient acquisition function computation gives us the ability to conduct more complicated strategies in a feasible amount of time.  Check out our docs, or contact us, to learn more about the QMCPy project and how QMC can help your work.

Website | + posts

I work at SigOpt, a SF startup, developing and deploying Bayesian optimization tools for customers in finance, AI, consumer goods and general research and development.

Published by Michael McCourt

I work at SigOpt, a SF startup, developing and deploying Bayesian optimization tools for customers in finance, AI, consumer goods and general research and development.

Leave a Reply

%d bloggers like this: