An Exact Beta Generator
Peter Occil - 13/Jul/2020

A sampler in Python for beta-distributed random numbers without floating-point arithmetic.
This article introduces a new sampler for beta-distributed random numbers. Here we discuss the beta distribution and look at the Python code for my beta sampler, which relies on a class I wrote called "bernoulli.py", which collects a number of Bernoulli factories, and on kthsmallest, a method implemented in "randomgen.py" that generates the 'k'th smallest 'bitcount'-bit uniform random number out of 'n' of them. We also discuss the exact simulation of continuous distributions and look at an example: the continuous Bernoulli distribution.
Introduction
This page introduces a new sampler for beta-distributed random numbers. Unlike any other specially-designed beta sampler I am aware of, this sampler—
  • avoids floating-point arithmetic, and
  • samples from the beta distribution (with both parameters 1 or greater) to an arbitrary precision and with user-specified error bounds (and thus is "exact" in the sense defined in (Karney 2014)(1)).
It takes advantage of a construct called the Bernoulli factory (Keane and O'Brien 1994)(2) (Flajolet et al., 2010)(3), which can simulate an arbitrary probability by transforming one biased coin into another, as well as the "geometric bag" technique by Flajolet and others (2010)(3), which generates heads or tails with a probability that is built up bit by bit. One important feature of Bernoulli factories is that they can simulate a given probability exactly, without having to calculate that probability manually, which is important if the probability can be an irrational number that no computer can compute exactly (such as pow(p, 1/2) or exp(-2)).
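As a minimal stand-alone illustration of the idea (not part of the sampler's code): a coin that shows heads with probability p can be transformed into a coin that shows heads with probability p*p simply by flipping it twice, without ever computing p itself.

```python
import random

def make_coin(p):
    # A biased coin, used here only to demonstrate the factory;
    # the factory below never inspects p, only the coin's flips.
    return lambda: 1 if random.random() < p else 0

def squared_coin(flip):
    # Bernoulli factory: simulates a coin with heads probability p*p
    # from a p-coin, by returning heads only when two independent
    # flips both come up heads.
    return flip() & flip()
```

For example, flipping squared_coin many times with a 0.6-coin should produce heads roughly 36% of the time, even though 0.36 is never calculated anywhere.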
This page shows Python code for my new sampler.

About the Beta Distribution
The beta distribution is a bounded-domain probability distribution; its two parameters, alpha and beta, are both greater than 0 and describe the distribution's shape. Depending on alpha and beta, the shape can be a smooth peak or a smooth valley. The beta distribution can take on values in the interval [0, 1]. Any value in this interval (x) can occur with a probability proportional to—

pow(x, alpha - 1) * pow(1 - x, beta - 1). (1)
Although alpha and beta can each be any number greater than 0, this sampler works only if both parameters are 1 or greater.
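As a quick illustration of formula (1) (using floating point only to show the shape, not for sampling), evaluating the unnormalized density shows the peak and valley behavior described above:

```python
def unnormalized_beta_pdf(x, alpha, beta):
    # Formula (1): proportional to the beta density at x in [0, 1].
    return x ** (alpha - 1) * (1 - x) ** (beta - 1)

# alpha = beta = 2 gives a smooth peak centered at 0.5;
# alpha = beta = 0.5 gives a smooth valley (highest near 0 and 1);
# alpha = beta = 1 gives a flat density (the uniform distribution).
```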


Sampler Code
The following Python code relies on a class I wrote called "bernoulli.py", which collects a number of Bernoulli factories, some of which are relied on by the code below. This includes the "geometric bag" mentioned earlier, as well as a Bernoulli factory that transforms a coin that produces heads with probability p into a coin that produces heads with probability pow(p, y). The case where y is in (0, 1) is due to recent work by Mendo (2019)(4).
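The actual implementation lives in "bernoulli.py", but the "geometric bag" technique itself can be sketched as a stand-alone function (a simplified version; the module's code may differ). Since U = b0/2 + b1/4 + b2/8 + ..., picking bit position k with probability 2^-(k+1) and returning U's k-th bit yields heads with probability exactly U, while sampling bits of U only as they are needed:

```python
import random

def geometric_bag_sketch(bag):
    # Return 1 with probability U, where U is the uniform number
    # whose bits are lazily stored in `bag` (None = not yet sampled).
    # Pick bit position k with probability 2^-(k+1)...
    k = 0
    while random.getrandbits(1) == 0:
        k += 1
    # ...grow the bag and sample bit k if it hasn't been sampled yet...
    while len(bag) <= k:
        bag.append(None)
    if bag[k] is None:
        bag[k] = random.getrandbits(1)
    # ...and return bit k; summing 2^-(k+1) * bit_k over all k gives U.
    return bag[k]
```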

The Python code also relies on a method I wrote called kthsmallest, which generates the kth smallest of several uniform random numbers, bit by bit, to an arbitrary precision. This method will be described later on this page.

This code is far from fast, though, at least in Python.
Code:
import math
import random
import bernoulli
from randomgen import RandomGen
from fractions import Fraction

def _toreal(ret, precision):
        # NOTE: Although we convert to a floating-point
        # number here, this is not strictly necessary and
        # is merely for convenience.
        return ret*1.0/(1<<precision)

def betadist(b, ax, ay, bx, by, precision=53):
        # Beta distribution for alpha>=1 and beta>=1
        bag=[]
        bpower=Fraction(bx, by)-1
        apower=Fraction(ax, ay)-1
        # Special case for a=b=1
        if bpower==0 and apower==0:
           return random.randint(0, (1<<precision)-1)*1.0/(1<<precision)
        # Special case if a and b are integers
        if int(bpower) == bpower and int(apower) == apower:
           a=int(Fraction(ax, ay))
           b=int(Fraction(bx, by))
           return _toreal(RandomGen().kthsmallest(a+b-1,a, \
                  precision), precision)
        # Create a "geometric bag" to hold a uniform random
        # number (U), described by Flajolet et al. 2010
        gb=lambda: b.geometric_bag(bag)
        # Complement of "geometric bag"
        gbcomp=lambda: b.geometric_bag(bag)^1
        bPowerBigger=(bpower > apower)
        while True:
           # Create a uniform random number (U) bit-by-bit, and
           # accept it with probability U^(a-1)*(1-U)^(b-1), which
           # is the unnormalized PDF of the beta distribution
           bag.clear()
           r=1
           if bPowerBigger:
             # Produce 1 with probability (1-U)^(b-1)
             r=b.power(gbcomp, bpower)
             # Produce 1 with probability U^(a-1)
             if r==1: r=b.power(gb, apower)
           else:
             # Produce 1 with probability U^(a-1)
             r=b.power(gb, apower)
             # Produce 1 with probability (1-U)^(b-1)
             if r==1: r=b.power(gbcomp, bpower)
           if r == 1:
                 # Accepted, so fill up the "bag" and return the
                 # uniform number
                 ret=_fill_geometric_bag(b, bag, precision)
                 return ret

def _fill_geometric_bag(b, bag, precision):
        ret=0
        lb=min(len(bag), precision)
        for i in range(lb):
           if i>=len(bag) or bag[i] is None:
              ret=(ret<<1)|b.randbit()
           else:
              ret=(ret<<1)|bag[i]
        if len(bag) < precision:
           diff=precision-len(bag)
           ret=(ret << diff)|random.randint(0,(1 << diff)-1)
        # Now we have a number that is a multiple of
        # 2^-precision.
        return _toreal(ret, precision)

The kthsmallest Method
kthsmallest, which generates the 'k'th smallest 'bitcount'-bit uniform random number out of 'n' of them, is implemented in "randomgen.py" and relied on by this beta sampler. It is used when both a and b are integers, based on the known property that a beta random variable in this case is the ath smallest uniform (0, 1) random number out of a + b - 1 of them (Devroye 1986, p. 431)(5).


kthsmallest, however, doesn't simply generate 'n' 'bitcount'-bit numbers and then sort them. Rather, it builds up their binary expansions bit by bit, via the concept of "u-rands" (Karney 2014)(1). It uses the observation that each uniform (0, 1) random number is equally likely to be less than half or greater than half; thus, the number of uniform numbers that are less than half vs. greater than half follows a binomial(n, 1/2) distribution (and among the numbers less than half, say, those less than one quarter vs. those greater than one quarter follow the same kind of distribution, and so on). Thanks to this observation, the algorithm can generate a sorted sample "on the fly".

The algorithm is as follows:
  1. Create n empty u-rands.
  2. Set index to 1.
  3. If index <= k:
    1. Generate LC, a binomial(n, 0.5) random number.
    2. Append a 0 bit to the first LC u-rands (starting at index) and a 1 bit to the next n - LC u-rands.
    3. If LC > 1, repeat step 3 and these substeps with the same index and n = LC.
    4. If n - LC > 1, repeat step 3 and these substeps with index = index+LC, and n = n - LC.
  4. Take the kth u-rand (starting at 1) and fill it with uniform random bits as necessary to make a bitcount-bit number. Return that u-rand.
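The steps above can be sketched in plain Python (a simplified stand-alone version for illustration; the full implementation in "randomgen.py" may differ):

```python
import random

def kthsmallest_sketch(n, k, bitcount):
    # Generate the k-th smallest of n uniform bitcount-bit numbers
    # by building sorted binary expansions ("u-rands") bit by bit.
    urands = [[] for _ in range(n)]  # step 1: n empty u-rands

    def binomial_half(m):
        # Number of heads in m fair coin flips: binomial(m, 1/2).
        return sum(random.getrandbits(1) for _ in range(m))

    def build(start, m):
        # Refine the group of u-rands at 0-based positions
        # start .. start+m-1; skip groups that lie entirely
        # after the k-th u-rand (step 3's "index <= k" check).
        if start >= k or m <= 0:
            return
        lc = binomial_half(m)
        # The lc smallest numbers in this group get a 0 bit,
        # the remaining m - lc get a 1 bit (steps 3.1-3.2).
        for i in range(start, start + lc):
            urands[i].append(0)
        for i in range(start + lc, start + m):
            urands[i].append(1)
        if lc > 1:                      # step 3.3
            build(start, lc)
        if m - lc > 1:                  # step 3.4
            build(start + lc, m - lc)

    build(0, n)
    # Step 4: fill the k-th u-rand with uniform bits as necessary.
    bits = urands[k - 1]
    while len(bits) < bitcount:
        bits.append(random.getrandbits(1))
    return sum(b << (bitcount - 1 - i) for i, b in enumerate(bits[:bitcount]))
```

For instance, kthsmallest_sketch(5, 3, 16) returns the median of five 16-bit uniform numbers, which follows a beta(3, 3) distribution scaled to 16-bit integers.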

Known Issues
The bigger alpha or beta is, the smaller the area of acceptance becomes (and the more likely random numbers get rejected by this method, raising its run-time). This is because max(u^(alpha-1)*(1-u)^(beta-1)), the peak of the density, approaches 0 as the parameters get bigger. One idea to solve this issue is to expand the density so that the acceptance rate increases. The following was tried:

  • Estimate an upper bound for the peak of the density, given alpha and beta.
  • Calculate the largest factor c such that peak * c = m < 0.5.
  • Use Huber's linear_lowprob Bernoulli factory (implemented in bernoulli.py) (Huber 2016)(6), taking the values found for c and m. Testing shows that the choice of m is crucial for performance.
But doing so apparently worsened the performance (in terms of random bits used) compared to the simple rejection approach.
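The shrinking acceptance rate can be quantified: since U is uniform and is accepted with probability U^(alpha-1) * (1-U)^(beta-1), the overall acceptance probability is the integral of that expression over [0, 1], namely the beta function B(alpha, beta). A quick check (floating point, for illustration only):

```python
from math import gamma

def acceptance_rate(alpha, beta):
    # B(alpha, beta) = gamma(alpha)*gamma(beta)/gamma(alpha+beta):
    # the probability that a single candidate U is accepted.
    return gamma(alpha) * gamma(beta) / gamma(alpha + beta)

# The rate falls off quickly as the parameters grow: it is 1 when
# alpha = beta = 1, 1/6 when alpha = beta = 2, and on the order of
# 10^-6 when alpha = beta = 10.
```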

Correctness Testing
To test the correctness of this sampler, the Kolmogorov–Smirnov test was applied with various values of alpha and beta and the default precision of 53, using SciPy's kstest method. The code for the test is very simple: kst = scipy.stats.kstest(ksample, lambda x: scipy.stats.beta.cdf(x, alpha, beta)), where ksample is a sample of random numbers generated using the sampler above. Note that SciPy uses a two-sided Kolmogorov–Smirnov test by default.

See the results of the correctness testing. For each pair of parameters, five samples with 50,000 numbers per sample were taken, and results show the lowest and highest Kolmogorov–Smirnov statistics and p-values achieved for the five samples. Note that a p-value extremely close to 0 or 1 strongly indicates that the samples do not come from the corresponding beta distribution.


Exact Simulation of Continuous Distributions on [0, 1]
The beta distribution is one case of a general approach to simulating, to arbitrary precision, continuous distributions with support on the interval [0, 1], thanks to Bernoulli factories. This general approach can sample an n-bit binary expansion of a number following that continuous distribution, and is described as follows:
  1. Create a "geometric bag", that is, an "empty" uniform random number, also known as a "u-rand".
  2. As the geometric bag builds up a uniform random number, accept the number with a probability that can be represented by Bernoulli factories, or reject it otherwise. As shown by Keane and O'Brien (2), this is possible if and only if the probability function, in the interval [0, 1]—
    • is continuous everywhere, and
    • either returns a constant value in [0, 1] everywhere, or returns a value in [0, 1] at each of the points 0 and 1 and a value in (0, 1) at each other point.
    They give the example of 2*p as a probability function that cannot be represented by a Bernoulli factory.
  3. If the geometric bag is accepted, fill the unsampled bits of the bag with uniform random bits as necessary to make an n-bit number (for an example, see _fill_geometric_bag above).
The beta distribution's probability function at (1) fits these requirements (for alpha and beta both greater than 1), since it's continuous and never returns 0 or 1 outside of the points 0 and 1, thus it can be simulated by Bernoulli factories and is covered by this general approach.


An Example: The Continuous Bernoulli Distribution
The continuous Bernoulli distribution (Loaiza-Ganem and Cunningham 2019)(7) was designed to considerably improve performance of variational autoencoders (a machine learning model) in modeling continuous data that takes values in the interval [0, 1], including "almost-binary" image data.

The continuous Bernoulli distribution takes one parameter lamda (a number in [0, 1]), and takes on values in the interval [0, 1] with a probability proportional to—

Code:
pow(lamda, x) * pow(1 - lamda, 1 - x).
Again, this function meets the requirements stated by Keane and O'Brien, so it can be simulated via Bernoulli factories. Thus, this distribution can be simulated in Python using a geometric bag (which represents x in the formula above) and a two-coin exponentiating Bernoulli factory, as follows:
Code:
def _twofacpower(b, fbase, fexponent):
    """ Bernoulli factory B(p, q) => B(p^q).
           Based on algorithm from (Mendo 2019),
           but changed to accept a Bernoulli factory
           rather than a fixed value for the exponent.
           To the best of my knowledge, I am not aware
           of any article or paper that presents this exact
           Bernoulli factory.
           - fbase, fexponent: Functions that return 1 if heads and 0 if tails.
             The first is the base, the second is the exponent.
             """
    i = 1
    while True:
        if fbase() == 1:
            return 1
        if fexponent() == 1 and \
            b.zero_or_one(1, i) == 1:
            return 0
        i = i + 1

def contbernoullidist(b, lamda, precision=53):
    # Continuous Bernoulli distribution
    bag=[]
    lamda=Fraction(lamda)
    gb=lambda: b.geometric_bag(bag)
    # Complement of "geometric bag"
    gbcomp=lambda: b.geometric_bag(bag)^1
    fcoin=b.coin(lamda)
    lamdab=lambda: fcoin()
    # Complement of the lamda coin
    lamdabcomp=lambda: fcoin()^1
    acc=0
    while True:
       # Create a uniform random number (U) bit-by-bit, and
       # accept it with probability lamda^U*(1-lamda)^(1-U), which
       # is the unnormalized PDF of the continuous Bernoulli distribution
       bag.clear()
       # Produce 1 with probability lamda^U
       r=_twofacpower(b, lamdab, gb)
       # Produce 1 with probability (1-lamda)^(1-U)
       if r==1: r=_twofacpower(b, lamdabcomp, gbcomp)
       if r == 1:
             # Accepted, so fill up the "bag" and return the
             # uniform number
             ret=_fill_geometric_bag(b, bag, precision)
             return ret
       acc+=1
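For intuition (and to sanity-check the acceptance step), the same rejection scheme can be written with ordinary floating point, trading exactness for simplicity. This sketch is not part of the exact sampler above:

```python
import random

def contbernoulli_float(lamda):
    # Floating-point rejection sampler for the continuous Bernoulli
    # distribution: accept a uniform x with probability proportional
    # to lamda^x * (1-lamda)^(1-x). That density's maximum over
    # [0, 1] is max(lamda, 1-lamda), used to scale the acceptance test.
    peak = max(lamda, 1 - lamda)
    while True:
        x = random.random()
        density = lamda ** x * (1 - lamda) ** (1 - x)
        if random.random() * peak < density:
            return x
```

With lamda = 0.5 the density is flat and every candidate is accepted, so the result is uniform; with lamda near 1 the density rises toward x = 1 and the sample mean shifts accordingly.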

Notes

(1) Karney, C.F.F., "Sampling exactly from the normal distribution", arXiv:1303.6257v2 [physics.comp-ph], 2014.
(2) Keane, M. S., and O'Brien, G. L., "A Bernoulli factory", ACM Transactions on Modeling and Computer Simulation 4(2), 1994.
(3) Flajolet, P., Pelletier, M., Soria, M., "On Buffon machines and numbers", arXiv:0906.5560v2 [math.PR], 2010.
(4) Mendo, Luis. "An asymptotically optimal Bernoulli factory for certain functions that can be expressed as power series." Stochastic Processes and their Applications 129, no. 11 (2019): 4366-4384.
(5) Devroye, L., Non-Uniform Random Variate Generation, 1986.
(6) Huber, M., "Optimal linear Bernoulli factories for small mean problems", arXiv:1507.00843v2 [math.PR], 2016.
(7) Loaiza-Ganem, G., Cunningham, J.P., "The continuous Bernoulli: fixing a pervasive error in variational autoencoders", arXiv:1907.06845v5 [stat.ML], 2019.
