Modelling Your Queues

Apr 9, 2020 · 3 minute read

Note: this is adapted from an original article that goes into much greater detail; please check it for the full R code.

Using simmer, it's possible to model the performance of a real-world software platform. In the package's own words, simmer is

a process-oriented and trajectory-based Discrete-Event Simulation (DES) package for R

It allows us to model a process as a "trajectory" that each item (an "arrival", in simmer parlance) follows through a series of stages, seizing and releasing shared "resources" along the way.
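To make the vocabulary concrete, here is a minimal sketch (the names `hello`, `server` and `customer` are our own, chosen purely for illustration): a trajectory with a single step, one resource, and a generator that creates a new arrival every 2 time units.

```r
library(simmer)

# A trajectory is the sequence of activities every arrival goes through.
hello <- trajectory("hello") %>%
  seize("server", 1) %>%    # take one unit of "server", queueing if it is busy
  timeout(3) %>%            # hold it for 3 time units
  release("server", 1)      # hand it back for the next arrival

# The environment owns the resources and the generator of arrivals.
env <- simmer("hello-world") %>%
  add_resource("server", capacity = 1) %>%
  add_generator("customer", hello, function() 2) %>%  # a new arrival every 2 time units
  run(until = 20)

# One row per completed arrival: start/end times and time spent in timeout()
arrivals <- get_mon_arrivals(env)
print(arrivals)
```

Because each arrival holds the server for 3 time units but a new one turns up every 2, later arrivals spend longer in the system than their 3 units of activity: the difference is time spent queueing.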

A Simple Platform

Let's imagine a very simple system that receives a measurement via a REST API, performs some analysis on it, and writes the results to a database.

We can set this up with a simmer trajectory:

an “api” that receives the measurement and passes it on to
an “analyser” that performs some kind of analysis, before finally
handing the result to a "database" that stores it

In simmer terms it looks like this:

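library(simmer)

# Each stage seizes a resource, holds it for a random time, then releases it
measurement <- trajectory("measurements") %>%
  # the API receives the measurement
  seize("api") %>%
  timeout(function() rnorm(1, 15)) %>%
  release("api") %>%
  # the analyser processes it
  seize("analyser") %>%
  timeout(function() rnorm(1, 20)) %>%
  release("analyser") %>%
  # the result is written to the database
  seize("database") %>%
  timeout(function() rnorm(1, 5)) %>%
  release("database")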

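Each seize() call acquires a resource, queueing if it is already busy, and the matching release() call frees it for the next measurement. In between, timeout() holds the resource for a random delay: rnorm(1, 15), for example, draws a single value from a normal distribution with mean 15 and the default standard deviation of 1.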
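Since every stage follows the same seize, timeout, release pattern, the trajectory can also be built with a small helper. `use_resource()` below is a hypothetical convenience function, not part of simmer; it also clamps each delay at zero, a guard we've added because `rnorm()` can in principle return a negative value, which makes no sense as a duration.

```r
library(simmer)

# Hypothetical helper (not part of simmer): seize `resource`, hold it for a
# normally distributed time, then release it.
use_resource <- function(trj, resource, mean_time, sd_time = 1) {
  force(mean_time); force(sd_time)  # evaluate now, not lazily at simulation time
  trj %>%
    seize(resource) %>%
    # draw one delay per arrival; max(0, ...) rules out negative durations
    timeout(function() max(0, rnorm(1, mean_time, sd_time))) %>%
    release(resource)
}

measurement <- trajectory("measurements") %>%
  use_resource("api", 15) %>%
  use_resource("analyser", 20) %>%
  use_resource("database", 5)

print(measurement)  # lists the seize/timeout/release activities in order
```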

We can then set up a simulated environment with one of each resource, in which new measurements arrive at normally distributed intervals (mean 10, standard deviation 2 time units):

simplePlatform <- simmer("SimplePlatform") %>%
  add_resource("api", 1) %>%
  add_resource("analyser", 1) %>%
  add_resource("database", 1) %>%
  # a new measurement arrives every rnorm(1, 10, 2) time units
  add_generator("measurement", measurement, function() rnorm(1, 10, 2))

If we then run the simulated environment for 80 time units, we can see how each resource was used:

resource usage

Here the server line shows how much of each resource is in use, the queue line shows how many measurements are waiting for it, and the system line is the sum of the two.
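A sketch of how such a run might be reproduced and inspected, assuming the trajectory and environment described above (the `set.seed()` call is our addition, for reproducibility). `get_mon_resources()` returns the monitored `server`, `queue` and `system` values behind a chart like this one; the commented-out plotting call assumes the separate simmer.plot package is installed.

```r
library(simmer)

measurement <- trajectory("measurements") %>%
  seize("api") %>% timeout(function() rnorm(1, 15)) %>% release("api") %>%
  seize("analyser") %>% timeout(function() rnorm(1, 20)) %>% release("analyser") %>%
  seize("database") %>% timeout(function() rnorm(1, 5)) %>% release("database")

set.seed(42)  # make the random run reproducible
simplePlatform <- simmer("SimplePlatform") %>%
  add_resource("api", 1) %>%
  add_resource("analyser", 1) %>%
  add_resource("database", 1) %>%
  add_generator("measurement", measurement, function() rnorm(1, 10, 2)) %>%
  run(until = 80)

# One row per change of state: `server` = units in use, `queue` = arrivals
# waiting, `system` = server + queue.
resources <- get_mon_resources(simplePlatform)
head(resources[, c("resource", "time", "server", "queue", "system")])

# With simmer.plot installed, a usage chart can be drawn with:
# library(simmer.plot); plot(resources, metric = "usage")
```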

Now let’s look at a more complicated example.

A Real World(ish) Example

Let’s try again with a subset of a real analytics platform. It receives measurements as before, but:

they are not always in the right format, so they are first pre-processed into a standard format
the processed data is then distributed to two different analysers which perform different types of analysis independently (i.e. in parallel)
finally, the results of all of these are written to the same database

There is one further complication: each of the steps needs to use the database. We will model this by seizing and releasing the “database” resource as part of each step.

See the original article for full details of how to set this up.
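As a rough sketch only, here is one way it could look. Every timing, the `preprocessor`, `analyser_a` and `analyser_b` resource names, and the `step_with_db()` helper are our own assumptions, not the original article's code; we've also assumed each step finishes its own processing and then makes one database call. The parallel analysers use simmer's `clone()` and `synchronize()`.

```r
library(simmer)

# Hypothetical helper: do the step's own work on `resource`, then make one
# call to the shared "database" resource.
step_with_db <- function(trj, resource, work_time, db_time) {
  force(work_time); force(db_time)
  trj %>%
    seize(resource) %>%
    timeout(function() max(0, rnorm(1, work_time))) %>%  # the step's own processing
    release(resource) %>%
    seize("database") %>%
    timeout(function() max(0, rnorm(1, db_time))) %>%    # the step's database call
    release("database")
}

platform <- trajectory("measurements") %>%
  step_with_db("api", 2, 1) %>%
  step_with_db("preprocessor", 4, 2) %>%
  # run two independent analyses in parallel on copies of the arrival...
  clone(
    n = 2,
    trajectory("analysis A") %>% step_with_db("analyser_a", 6, 2),
    trajectory("analysis B") %>% step_with_db("analyser_b", 7, 2)
  ) %>%
  # ...and continue with a single arrival once both copies are done
  synchronize(wait = TRUE) %>%
  # finally, write the combined results to the database
  seize("database") %>%
  timeout(function() max(0, rnorm(1, 4))) %>%
  release("database")

set.seed(1)
env <- simmer("RealWorldish") %>%
  add_resource("api", 1) %>%
  add_resource("preprocessor", 1) %>%
  add_resource("analyser_a", 1) %>%
  add_resource("analyser_b", 1) %>%
  add_resource("database", 1) %>%
  add_generator("measurement", platform, function() rnorm(1, 10, 2)) %>%
  run(until = 80)

resources <- get_mon_resources(env)
# Longest queue seen at each resource during the run
aggregate(queue ~ resource, data = resources, FUN = max)
```

With these made-up numbers, each measurement needs about 11 time units of database work but a new one arrives roughly every 10, so the database is oversubscribed while no individual step is.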

If we run this simulation for 80 time units, we can see how each resource is used:

resource usage

It’s quite easy to see how the single database quickly reaches capacity while the intermediate steps don’t. You probably didn’t need a process model to guess that this would be the case, but with this model you can now plug in real numbers and hypothetical architectural changes to see what would happen.
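For example, a what-if comparison of database capacity might look like the following. The model is simplified and hypothetical: the timings are placeholders, and `run_platform()` and `waiting_time()` are helpers made up for this sketch. Waiting time is taken as time in the system minus time spent in `timeout()` activities.

```r
library(simmer)

# Hypothetical what-if model: two processing steps that each make a database
# call, plus a final database write. All timings are placeholders.
run_platform <- function(db_capacity, until = 500) {
  delay <- function(m) function() max(0, rnorm(1, m))  # normal delay, clamped at 0
  trj <- trajectory("measurements") %>%
    seize("api") %>% timeout(delay(3)) %>% release("api") %>%
    seize("database") %>% timeout(delay(3)) %>% release("database") %>%
    seize("analyser") %>% timeout(delay(6)) %>% release("analyser") %>%
    seize("database") %>% timeout(delay(4)) %>% release("database") %>%
    seize("database") %>% timeout(delay(5)) %>% release("database")

  simmer("what-if") %>%
    add_resource("api", 1) %>%
    add_resource("analyser", 1) %>%
    add_resource("database", db_capacity) %>%
    add_generator("measurement", trj, function() rnorm(1, 10, 2)) %>%
    run(until = until)
}

# Mean time spent queueing, over the measurements that finished in the run
waiting_time <- function(env) {
  arr <- get_mon_arrivals(env)
  mean(arr$end_time - arr$start_time - arr$activity_time)
}

set.seed(123)
results <- sapply(c(1, 2), function(cap) waiting_time(run_platform(cap)))
names(results) <- c("capacity_1", "capacity_2")
print(results)
```

Because only completed measurements are counted, an overloaded configuration will tend to understate its waiting time; comparing how many measurements finished in each run gives a fuller picture.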

We can go further with simmer by adding rejections, more complex behaviour at each step and so forth. Check out the simmer examples for more ideas.
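Rejections, for instance, can be modelled by giving a resource a finite queue. In this minimal sketch (the timings are arbitrary), a database with one server and room for only two waiting requests drops any further writes; simmer reports those arrivals with `finished` set to `FALSE`.

```r
library(simmer)

# Each write holds the database for about 10 time units
writes <- trajectory("writes") %>%
  seize("database") %>%
  timeout(function() max(0, rnorm(1, 10))) %>%
  release("database")

set.seed(7)
env <- simmer("rejections") %>%
  # one server, at most 2 waiting requests; further arrivals are rejected
  add_resource("database", capacity = 1, queue_size = 2) %>%
  # writes arrive about every 5 time units: faster than they can be served
  add_generator("write", writes, function() rnorm(1, 5, 1)) %>%
  run(until = 200)

arrivals <- get_mon_arrivals(env)
# FALSE = rejected because the queue was full, TRUE = written successfully
table(arrivals$finished)
```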

Originally published at https://cosmo-opticon.com on April 9, 2020.
