# Weighted Random Selection

Posted by JH

## Random Selection Based on Weights

Q1: How to uniformly random-select a number from an array A?
Follow-up: If an additional array F is given, where F[i] is the frequency of occurrence for A[i], how to randomly select a number from A based on the frequency? The probability of A[i] being chosen should be greater if F[i] is larger.

e.g.
A = [A1, A2, A3]
F = [0, 5, 15]

P(A1 chosen) = 0%
P(A2 chosen) = 25%
P(A3 chosen) = 75%

## Solution

Two-pass solution:
The probability of each A[i] gets selected is
F[i] / sum(F , F, ..., F[n - 1])

Let SUM be sum(F , F, ..., F[n - 1]), randomly generate a number X between 0 to SUM - 1.
For the given example,
if X falls between [0, 0), choose A1;
if X falls between [0, 5), choose A2;
if X falls between [5, 20), choose A3;

The solution needs to know SUM beforehand to generate the random number, so the first pass is needed to get the SUM.

Greedy single-pass solution:
An optimal way is to build up the selection incrementally.

Iterate i through 0 to N - 1.
At the ith step, we only worry about whether A[i] is chosen or not.
If A[i] is chosen, set chosen number C = A[i].
Otherwise, keep the number chosen before.

To decide whether or not to choose A[i], up to the current step i, the total frequency of all the numbers before A[i], is SUM(F[0: i -1]), which represents the weight of A[i] not being chosen, while the weight of A[i] being chosen is F[i].

Therefore,
P(A[i] not chosen) = SUM(F[0: i - 1]) / SUM(F[0: i])
P(A[i] chosen) = F[i] / SUM(F[0: i])

Generate a random number between 0 and SUM(F[0: i]), if it falls in 0 to SUM(F[0: i - 1]), keep whatever has been chosen from before. Otherwise it falls in SUM(F[0: i - 1]) to SUM(F[0: i]), choose A[i].

No worries about whatever is after i at the moment, since the greedy solution only solves the subproblem up to the current i.

# Variants

There are a few variants of this problem that the same model will apply.
Randomly select a point in a bunch of rectangles. (Here the frequency is like the areas of rectangles)
Solution: Randomly select a rectangle based on the areas. Then it will be easy to get a random point from the one selected.

Random select a node in a linked list based on the weight of the nodes.

## Implementation

``````public Integer random(int[] array, int[] freq) {
int totalFreq = 0;
Integer selected = null;
Random rand = new Random();

for(int i = 0; i < array.length; i++) {
int r = rand.nextInt() * (totalFreq + freq[i]);
if(r >= totalFreq) {
selected = array[i];
totalFreq += freq[i];
}
}
return selected;
}
``````