Building Better Teams

Articles and Posts

Blog

 

The Luck of the Draw, but less annoying.

I’ve seen this same annoying script recreated in many different software teams. Some manual task needs to be done regularly. It could be chores (check the server logs for serious errors), process tasks (check the newly submitted candidates) to the physical (water the office plants). In this post I won’t argue with anyone about what the job of a developer is or how they should agree to organise work, but instead I just want to fix one annoying problem everyone living under the tyranny of this script experiences: repeatedly selecting the same person multiple times in a row.

Your code probably looks like this (or some language-specific alternative).

const peopleWhoCanDoTheTask = [
 "Xiao Yu'er", "Hua Wu Que", "Tie Xin Lan", "Su Ying", 
]; // These are characters from a TV series called "Handsom Siblings"

let luckyPerson = items[Math.floor(
  Math.random() * peopleWhoCanDoTheTask.length
)];

const { WebClient } = require('@slack/web-api');
(async () => {
  const result = await web.chat.postMessage({
    text: "Today's lucky person is " + luckyPerson,
    channel: "some-magic-hash-code",
  });
})();

Now the program runs and everything will be fair, right? Well, let’s run it a few times.

"Tie Xin Lan"
"Hua Wu Que"
"Xiao Yu'er"
"Xiao Yu'er"
"Xiao Yu'er"
"Tie Xin Lan"
"Xiao Yu'er"

Poor Xiao Yu’er, he was selected 4 of 7 times. Meanwhile, Su Ying wasn’t called even once! So what’s going on? Isn’t random supposed to be fair? Well let’s look:

(Array.apply(null, {length: 1000}))
.map(()=>{ return Math.floor(Math.random() * 10)} )
.reduce((acc, cur)=>{
  acc[cur] = 1 + acc[cur];
  return acc;
}, {0:0,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0});

/*
{
0: 101 // **********
1: 96  // *********
2: 104 // **********
3: 107 // **********
4: 108 // **********
5: 102 // **********
6: 104 // **********
7: 63  // ******
8: 114 // ***********
9: 101 // **********
}
*/

The distribution tends to be even (with some flaws), but what about those “winning streaks”, where the same value will be called many times sequentially?

(Array.apply(null, {length: 1000}))
.map(()=>{ return Math.floor(Math.random() * 10)} )
.reduce((acc, cur)=>{
  if (cur === acc.prev) {
    acc.streaks += 1;
  }
  acc.prev = cur;
  return acc;
}, {prev: -1, streaks: 0});

// around 100 streaks


It’s not so bad, but it’s clear there’s some values that aren’t always fair. And about 10% of all calls will be a repeated name. That makes sense, because random is supposed to be random! Lotteries are random, and using Math.random only makes this into a lottery, not a draw or a card shuffle.

It’s true that a random system will eventually “even out” and Su Ying will get close to an equal count as Xiao Yu’er, and one day Xiao Yu’er will never get called on. However, it doesn’t change how annoying it is and nobody perceives living in this kind of lottery as fair, especially when 10% of all calls are “winning streaks”.

I’ll say that again: 10% of all results were part of a streak.

So what’s the solution? We need a way to check if the next item in a list was recently used. One intuitive/obvious option is to store every results in a database (or a textfile) and keep calling Math.random() until it returns something fair.

But I don’t like that solution. I don’t like it because it’s inefficient and non-deterministic code. Math.random() isn’t actually random, it’s using a clever but predictable algorithm to take a starting point (called a “random seed”) and each call to Math.random() will increment the series by one. In reality, Math.random() is hiding two variables, and the method call could actually be: Math.random(seed, increment). By the way, it’s not just JavaScript, most major languages do this. Also, most major languages offer a way to get “secure random numbers” which are not generated based on some formula, but another other complex system your operating system provides (in Node.js that would be crypto.randomBytes()).

Each pixel in this image was generated “randomly”. The top half used Math.random(), the lower half used a true-random-number-generator.

Each pixel in this image was generated “randomly”. The top half used Math.random(), the lower half used a true-random-number-generator.

So how does this seed and increment relationship work? It turns out there’s actually many algorithms used to generate “random numbers” (these are all called “pseudo random numbers”, pseudo meaning they’re not “really random”, just faking it).

Here’s a simple implementation to help explain the idea, it’s an implementation of the “Lehmer random number generator” designed nearly 70 years ago (a lot of progress to make better PRNGs has been made since then, but they don’t fit into the scope of something easy to understand in 1 minute of reading code).

// These 3 constants are specific to how LCG itself
// works, you don't need to learn this to understand
// how random seeds work in general.
const multiplier = 22695477;
const increment = 1;
const modulus = Math.pow(2, 32);

// I picked this number for no real reason, you could
// pick anything. A secure browser will "seed"
// Math.random with one true random value before 
// generating psudo-randoms.
let seed = 1000;

function randomLCG()
{
  // seed gets reassigned
  seed = (multiplier * seed + increment) % modulus;
  
  // the returned value is "psudo-random", and used as the next seed.
  return seed;
}

randomLCG() // 1220640521
randomLCG() // 3069053660
randomLCG() // 334448528
seed = 1000 // reset seed
randomLCG() // 1220640521
randomLCG() // 3069053660
randomLCG() // 334448528

The main point here isn’t to focus on the globals and non-idempotent state. Just to make this concept easy to understand, consider these two details:

  1. The initial seed for a pseudo-random sequence

  2. The number of times randomLCG was called (which only changes the seed, so actually you only need to care about the first point).

So now we discovered a very interesting property: all pseudo-random number generators are deterministic. So what is the implication here? Well, earlier it was mentioned that a database would need to store the count of every name selected to keep “re-generating” random numbers to prevent “winning streaks” and unfair distributions. But actually, we can replay our random numbers at any time. So we might be able to store just one number: the random seed. Just one integer needs to be persisted, which could be anything like a redis key, textfile, a value elsewhere in memory, a small message in a distributed CRDT system, or something you manually write on paper and type in.

So let’s re-imagine a script that stores this “state” in a simple text file, and can send a message to slack. It needs to do these things:

  • Use a PRNG library that can generate random numbers but also save/load the state of it from the last time it was used.

  • Check for “streaks”, ensuring the next item is not immediately repeated.

const seedrandom = require('seedrandom'); // Because JS doesn't let you set the seed on Math.random.
const { WebClient } = require('@slack/web-api');

const util = require('util');
const fs = require('fs');
const readFile = util.promisify(fs.readFile);
const writeFile = util.promisify(fs.writeFile);

let candidates = [
 "Xiao Yu'er",
 "Hua Wu Que",
 "Tie Xin Lan",
 "Su Ying",
];

const prngFile = '/tmp/prng.tmp';

async function loadPrngStateFile() {
  return await readFile(prngFile, 'utf8')
    .then(JSON.parse)
    .catch(()=>{return createRngWithSeed("something")});
}

function createRngWithSeed(seed) {
  const prng = seedrandom(seed, {state: true});
  const p = seedrandom("", {state: prng.state});
  return prng.state();
}

function selectCandidate(random) {
  return candidates[Math.floor(random * candidates.length)];
}

function selectNextCandidate(prng, lastCandidate) {
  let nextCandidateState = prng.state(); // "next", because prng() was called.
  let nextCandidate = selectCandidate(prng());

  const next = {
    candidate: nextCandidate,
    state: nextCandidateState
  };

  // Prevents "streaks".
  while (lastCandidate === next.candidate) {
    next.state = prng.state();
    next.candidate = selectCandidate(prng());
  }

  return next;
}

// Fyi: This function is async, so be sure to call it 
// sequentially if you put this into an array.
async function main() {
  const initialPrngState = await loadPrngStateFile()
  const prng = seedrandom("", {state: initialPrngState});

  let lastCandidate = selectCandidate(prng());
  let next = selectNextCandidate(prng, lastCandidate)

  await writeFile(prngFile, JSON.stringify(next.state), 'utf8');

  return next.candidate;
}
main();

/*
No streaks.
However, it has a slightly uneven distribution after 1000
runs on 10 candidates
{
  '0': 103, // **********
  '1': 103, // **********
  '2': 89,  // ********
  '3': 108, // **********
  '4': 107, // **********
  '5': 98,  // *********
  '6': 103, // **********
  '7': 87,  // ********
  '8': 97,  // *********
  '9': 105  // **********
}
*/

By the way, this code is also available as a gist.

By all means this is not the only way to do this either. We can go a bit further to fully remove the temp file if your task runs on a cron. You can compute which iteration in the generator is next based on the datetime, removing the need for any state file. This means avoiding duplicates in a “random assignment” over time could be made entirely stateless.

const parser = require('cron-parser');
const dateScriptWasLaunched = '20 Jan 2021 00:00:00 UTC'

const options = {
  currentDate: new Date(dateScriptWasLaunched),
  endDate: new Date(), // "Now"
  iterator: true
};

const interval = parser.parseExpression('30 8 * * *', options);
let iteration = 0;

while (interval.hasNext()) {
  interval.next();
  iteration += 1;
}

console.log(iteration)

This kind of statelessness would be helpful if you just want to push a script somewhere without worrying about maintaining persistence infrastructure like a database or a filesystem. Easy to push it to heroku, lambda, a random server, etc.

Of course, you can avoid all of this if you find better ways to assign work within your team and ensure people are able to get it done, eliminating the problem entirely. Try having an honest conversation about it at your next retro.

Brian Graham