Wrong Side of Memphis

Notes about ShipItCon 2017

Disclaimer: I personally know and have worked with a good portion of the conference organizers and speakers. I label them with an asterisk*.

The ShipItCon finally took place last Friday. I think it's quite impressive, given the short time since it was announced and this being the first edition, that it was so well organized. The venue was very good (and fairly unusual for a tech conference), and all the usual things that are easy to take for granted (food, space, projector, sound, etc) worked like clockwork. Kudos to the organizers.

The conference was oriented towards releasing online services, with special emphasis on Continuous Integration/Delivery. I think that focusing a conference on this kind of topic is challenging, as talks need to be generic enough in terms of tools, but narrow enough to be useful. Conferences about a specific technology (like PyCon, RubyConf or LinuxCon) are more narrowly focused by definition.

The following are some notes, ideas and follow-up articles that I took. Obviously, they are biased towards the kind of things I find more interesting. I'll try to link the presentation slides if/once they're available.

  • The keynote by the Romero family was a great story and addressed a lot of points specific to the game industry (like the design challenges). It was also the exception in shipping something other than a service: a game (on Steam and iOS). I played a little Gunman Taco Truck over the weekend!
    • “Ship a game while on a ship”. They released part of the game while on the Queen Elizabeth cruise ship, crossing the Atlantic.
  • Release often and use feature toggles, detaching the code release from the feature release. This point was made in Frederick Meyer's talk, and I've heard it recently in other places too.
    • Friday night releases make me cringe, but they can make sense if the weekend is the lowest activity point for your customers.
    • Dependency trees grow more and more complex, to the point that no one understands them anymore and only automated tools can plot them.
    • Challenges in treating data in CI. Use production data? A subset? Fake data? Redacted data? Performance analysis can be tricky.
    • Automate what you care about
  • The need for early testing, including integration/system/performance, was the theme of Chloe Condon's talk. Typically, a lot of testing is performed at the “main branch” level (after a feature is merged back) that could be prepared in advance, giving better and faster feedback to developers. Test early, test often.
    • She presented Codefresh, which seems an interesting cloud CI tool aimed at working with containers.
  • Lauri Apple talked about communication and how important READMEs and documentation are for projects, both internal and external. The WHAT to build is a key aspect that shouldn’t be overlooked.
    • READMEs should include a roadmap, as well as info about how to install, run and configure the code.
    • This project offers help, review and advice for READMEs. I'll definitely submit ffind for a review (after I review it and polish it a little bit myself).
    • She talked about the Open Organization Maturity Model, a framework for assessing how open organizations are.
    • A couple of projects at Zalando caught my eye:
      • Patroni, an HA template for PostgreSQL
      • Zalenium, which distributes a Selenium Grid over Docker to speed up Selenium tests.
      • External DNS, to help configure external DNS access (like AWS Route 53 or CloudFlare) to a Kubernetes cluster.
  • If it hurts, do it more frequently. A great quote for Continuous Delivery and automated pipelines. Darin Egan talked about mindfulness principles, how the status quo gets challenged, and how driving change opposes inertia.
  • The main point in Ingrid Epure‘s talk was the integration of security practices during the development process and the differences between academia and engineering practices.
    • Linters can play a part in enforcing security practices, as well as automating formatting to keep format differences out of the review process.
    • Standardizing the logs is also a great idea, using Canonical Log Lines for Online Visibility. I've talked before about the need to increase logging and to generate logs during the development process.
  • Eric Maxwell talked about the need to standardise the “upper levels” of the apps, mainly related to logging and metrics, and about making applications (Modern Applications) more aware of their environment (choreography vs orchestration) and abstracted from the underlying infrastructure.
    • He presented habitat.sh, a tool aimed at working with these principles.
    • Packaging the application code and letting the tool do the heavy lifting on the “plumbing”.
  • The pipeline in Intercom was discussed by Eugene Kenny, and the differences between “the ideal pipeline” and “the reality” of making dozens of deployments every day.
    • For example, fully testing and deploying only the latest change in the pipeline, speeding up deployments at the expense of less separation between changes.
    • Or allow locking the pipeline when things are broken.
    • Follow up article: Continuous Deployment at Instagram
  • Observability is an indispensable property for online services: the ability to check what’s going on in production systems. Damien Marshall* had this concept of graphulsion that I can only share.

He gave some nice ideas on observability through the whole life cycle:

Development:

  • Make reporting logs and metrics simple
  • Account for the effort to do observability work
  • Standardize what to report. The three most useful metrics are Request Rate, Error Rate and Duration per Request (a small reporting sketch follows these lists).

Deployment:

  • Do capacity planning. Know approximately the limits of your system and calculate the utilization of the system (% of that limit)
  • Ship the observability

Production:

  • Make metrics easy to use
  • Centralise dashboard views across different systems
  • Good alerting is hard. Start and keep it simple.
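
Tying the three standard metrics together, here is a minimal sketch of what reporting them could look like in Python with the prometheus_client library. The metric names, labels and wrapper function are my own assumptions, not something shown in the talk:

# Sketch only: metric names, labels and the wrapper are illustrative assumptions.
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter('http_requests_total', 'Request rate', ['endpoint'])
ERRORS = Counter('http_errors_total', 'Error rate', ['endpoint'])
DURATION = Histogram('http_request_duration_seconds',
                     'Duration per request', ['endpoint'])


def handle(endpoint, func):
    """Wrap any request handler so it reports the three standard metrics."""
    REQUESTS.labels(endpoint=endpoint).inc()
    with DURATION.labels(endpoint=endpoint).time():
        try:
            return func()
        except Exception:
            ERRORS.labels(endpoint=endpoint).inc()
            raise


if __name__ == '__main__':
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape

The point is that once the three metrics are standardised like this, every service can be compared on the same dashboard.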

 

  • Riot Games uses custom generation of services to create skeletons, standardise good practices and reduce development time. Adam Comeford talked about those practices and how they implemented them.
    • Thinking inside the container.
    • docker-gc is a tool to reduce the size of image repos, as they tend to grow very large very quickly.
  • Jacopo Scrinzi talked about defining Infrastructure as Code, making infrastructure changes through the same process as code (review, subject to source control, etc). In particular, using Terraform and Atlas (now Terraform Enterprise) to make automatic deployments, following CI practices for infrastructure.
    • Using modules in Terraform simplifies and standardises common systems.
  • The last keynote was about Skypilot, an initiative inside Demonware to deploy a game fully using Docker containers over Marathon/Mesos, in the cloud. It was given by Tom Shaw* and the game was last year's release of Skylanders. As I've worked in Demonware, I know how big an undertaking it is to prepare the launch of a game previously hosted on dedicated hardware (and how much that hardware is underused to avoid risks), so this is a huge improvement.

 

 

As suggested by the number of notes I took, I found the conference very interesting and full of ideas that are worth following up on. I really expect a ShipItCon 2018 full of great content.

 

ffind v1.2.0 released!

The new version of ffind v1.2.0 is available in GitHub and PyPi. This version includes the ability to configure defaults by environment variables and to force case insensitivity in searches.

You can upgrade with

    pip install ffind --upgrade

This will be the latest version to support Python 2.6.


Happy searching!

A Django project template for a RESTful Application using Docker

I used what I learned, and some decisions I made along the way, to create a template for new projects. Part of software development is mainly plumbing: laying bricks together and connecting parts so the important bits of software can be accessed. That's a pretty important part of the work, but it can be quite tedious and frustrating.

This is, in some ways, a very personal work. I am using my own opinionated ideas for it, but I'll explain the thought process behind them. Part of the idea is to add to the discussion on how a containerised Django application should work and what basic functionality is expected of a fully production-ready service.

The code is available here. This blog post covers more the why’s, while the README in the code covers more the how’s.

GO CHECK THE CODE IN GITHUB!


Summary

  • a Django RESTful project template
  • sets up a cluster of Docker containers using docker-compose
  • includes niceties like logging, system tests and metrics
  • extensively commented code
  • a related blog post explaining the why and opening the discussion on how to work with Docker

Docker containers

Docker has been getting a lot of attention in the last few years, and it promises to revolutionise everything. Part of it is just hype, but nevertheless it is a very interesting tool. And as with every new tool, we're still figuring out how it should be used.

The first impression when dealing with Docker is to treat it as a new kind of virtual machine. That was certainly my first approach. But I think it's better to think of it as a process (a service) wrapped in a filesystem.

All the Dockerfile and image building process is there to ensure that the filesystem contains the proper code and configuration files to then start the process. This goes against daemonising and other common practices of previous environments, where the same server handled lots of processes and services working in unison. In the Docker world, we replace this with several containers working together, without sharing the same filesystem.

The familiar Linux way of working includes a lot of external services and conveniences that are no longer required. For example, starting multiple services automatically on boot makes no sense if we only care to run one process.

This rule has some exceptions, as we'll see later, but I think it's the best approach. In some cases, a single service requires multiple processes, but simplicity is key.

Filesystem

The Docker filesystem works by adding layers on top of one another. That makes the build a series of steps, each of which executes a command that changes the filesystem and then exits. The next step builds on top of the previous one.

This has very interesting properties, like the inherent caching system, which is very useful for the build process. The way to create a Dockerfile is to put the parts that are least likely to change (e.g. dependencies) at the top, and the ones that are most likely to need updating at the end. That way, build times are shorter, as only the latest steps need to be repeated.
Another interesting trick is to change the order of the steps in the Dockerfile while actively developing, and move them to their final place once the initial setup (packages installed, requirements, etc) is stable.

Another property is the fact that each of these layers can only add to the filesystem, never subtract. Once a build operation has added something, removing it won't free space in the container. As we want to keep our containers as minimal as possible, care should be taken with each of the steps. In some cases, this means adding something, using it for an operation, and then removing it in a single step. A good example is compilation libraries: add the libraries, compile, and then remove them, as they won't be used any more (only the generated binaries will).

Alpine Linux

Given this minimalistic approach, it's better to start as small as possible. Alpine is a Linux distribution focused on minimal containers and security. Their base image is just around 4MB. This is a little misleading, as installing Python will bring it to around 70MB, but it's still much better than something like an Ubuntu base image, which starts at around 120MB and very easily gets to 1GB if the image is built in a traditional way, installing different services and calling apt-get with abandon.

This template creates an image around 120MB size.


Running a cluster

A single container is not that interesting. After all, it's not much more than a single service, probably a single process. A critical tool for working with containers is being able to set several of them to work in unison.

The chosen tool in this case is docker-compose, which is great for setting up a development cluster. The base of it is the docker-compose.yaml file, which defines several “services” (containers) and links them all together. The docker-compose.yaml file contains the names, build instructions and dependencies, and describes the cluster.

Note there are two kinds of services. One is a container that runs and ends, producing some result: an operation. For example, running the tests. It starts, runs the tests, and then ends.
The other is a long-running service. For example, a web server. The server starts and doesn't stop on its own.
The docker-compose file contains both kinds: server and db are long-running services, while test and system-test are operations, but most of the entries are long-running services.

It would be possible to differentiate them by grouping them into different files, but dealing with multiple docker-compose.yaml files is cumbersome.

The different services defined, and their relationships, are described in this diagram. All the services are described individually later.

docker-template-diagram.png

As is obvious from the diagram, the main one is server. The ones in yellow are operations, while the ones in blue are services.
Note that each service exposes its functionality on a different port.

Codebase structure

All the files that relate to building the containers of the cluster are in the ./docker subdirectory, with the exception of the main Dockerfile and docker-compose.yaml, which are in the root directory.

Inside the ./docker directory, there's a subdir for each service. Note that, because the image is the same, some services like dev-server or test inherit the files under ./docker/server.

The template Django app

The main app is in the ./src directory and it's a simple RESTful application that exposes a collection of tweets: elements that have a text, a timestamp and an id. Individual tweets can be retrieved and new ones can be created. A basic CRUD interface.

It makes use of the Django REST framework and it connects to a PostgreSQL database to store the data.

On top of that, there are unit tests stored in the common Django way (inside ./src/tweet/tests.py). To run the tests, it makes use of pytest and pytest-django. pytest is a very powerful framework for running tests, and it's worth spending some time learning how to use it for maximum efficiency.
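
As a rough idea, a unit test with pytest-django and the Django REST framework test client could look like the sketch below. The URL and the field names are my assumptions; the real tests live in ./src/tweet/tests.py:

# Illustrative sketch; the URL and field names are assumptions.
import pytest
from rest_framework.test import APIClient


@pytest.mark.django_db
def test_create_and_retrieve_tweet():
    client = APIClient()
    # Create a new tweet through the API
    response = client.post('/api/tweets/', {'text': 'hello'}, format='json')
    assert response.status_code == 201
    tweet_id = response.data['id']
    # Retrieve it back and check the text survived the round trip
    response = client.get('/api/tweets/{}/'.format(tweet_id))
    assert response.status_code == 200
    assert response.data['text'] == 'hello'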

All of this is the core of the application and the part that should be replaced to do interesting stuff. The rest is plumbing to make this code run and to have it properly monitored. There are also system tests, but I'll talk about them later.

The application is labelled as templatesite. Feel free to change the name to whatever makes sense for you.

The services

The main server

The server service is the core of the system. Though the same image is used for multiple purposes, the main idea is to set up the Django code and make it run in a web server.

The way this is achieved is through uwsgi and nginx. And yes, this means this container is an exception to running a single process.

django-docker-server.png

As shown, the nginx process will serve the static files, as generated by the Django collectstatic command, and redirect everything else towards a uWSGI process that runs the Django code. They are connected by a UNIX socket.

Another decision has been to create a single worker per container. This follows the minimalistic approach. Also, Prometheus (see below) doesn't like being behind a round-robin load balancer on the same server, as the metrics reported become inconsistent.

It is also entirely possible to run just uWSGI and create another container that runs nginx and handles the static files. I chose not to, to keep a single, self-contained HTTP node: exposing HTTP with uWSGI is not as good as with nginx, you'd need to handle the static files externally, and exposing the uWSGI protocol externally is complicated and would require some weird configuration in the nginx frontend. This way there is a totally self-contained, stateless web container that has the whole functionality.

The Database

The database container is mainly a PostgreSQL database, but a couple of details have been added to its Dockerfile.


After installing the database, we add our Django code, install it, and then run the migrations and load the configured fixtures. All of this at build time. This makes the base image contain an empty test database and a pre-generated general database, helping to set up tests quickly. To get a fresh database, just bring down the db container and restart it. No rebuild is needed unless there are new migrations or new fixtures.

In my experience with Django, as a project grows and migrations are added, it slowly takes more and more time to run the tests if the database needs to be regenerated and the fixtures loaded again. Even if the --keepdb option is used for tests, sometimes a fresh start is required.

Another important detail is that this database doesn't store data in a persistent volume, but just inside the container. It is not meant to work as a persistent database, but to start quickly and to be easy to regenerate into a known state. If you need to change the starting data, change the fixtures that get loaded. Only put inside things you are ok with losing.

As part of the setup, notice that the following happens: the database is started, and then another process, the Django manage.py, loads the fixtures. Then the database is shut down and the container exits. This is one of the cases where multiple processes need to run in a container. The clean shutdown is important, as ending the PostgreSQL process abruptly can lead to data corruption. Normally it would be corrected on the next startup of the database, but that takes a little time. It's better to end the step cleanly.

Logs

Logging events is critical for successful operation in a production environment, and it's typically done very late in the development process. I try to introduce logging as I run my unit tests and add different information while developing, which helps quite a lot in figuring out what's going on in production servers.

A big part of the pain of logging is setting up the routing of the logs properly. In this case, there's a dedicated log container running syslog, where all the INFO and higher logs from the server are directed. The collected logs are stored in a file on the container that can be easily checked.

All the requests are also labelled with a unique id, using the Django log request id middleware. The id can also be forwarded through the X-REQUEST-ID HTTP header, and it will be returned in the responses. All the logs from a request will include this ID, making it easy to follow what the request has done.

When running the unit tests, the DEBUG logs are also directed to the standard output, so they'll show up as part of the pytest run. Instead of using print in your unit tests while debugging a problem, try to use logging and keep what makes sense. This will keep a lot of useful information around for when a problem arises in production.
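
The underlying idea is simple: inject an id into every log record. The template relies on the django-log-request-id middleware for this; the sketch below only illustrates the mechanism with the standard logging module, and every name in it is made up:

# Bare-bones illustration of request-id logging; not the actual middleware setup.
import logging
import threading
import uuid

_local = threading.local()


class RequestIDFilter(logging.Filter):
    def filter(self, record):
        # Attach whatever id the current request set (or a placeholder).
        record.request_id = getattr(_local, 'request_id', 'no-request')
        return True


handler = logging.StreamHandler()
handler.addFilter(RequestIDFilter())
handler.setFormatter(
    logging.Formatter('%(asctime)s [%(request_id)s] %(levelname)s %(message)s'))
logger = logging.getLogger('templatesite')
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

# A middleware would set this at the start of each request.
_local.request_id = str(uuid.uuid4())
logger.info('tweet created')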

Metrics

Metrics are another important part of a successful production service. In this case, they are exposed to a Prometheus container.
It uses the prometheus-django module and exposes a dashboard.


There's also a Grafana container included, called metrics-graph. Note that these two containers are pulled from their official images, instead of including a tailored Dockerfile. The metrics container has some minimal configuration. This is because the only requirement is to expose the metrics in a Prometheus format; creating dashboards or doing more detailed work on metrics is out of scope here.

The good thing about Prometheus is that you can cascade it. It works by fetching the data from our web service itself (through the /metrics URL), and at the same time it exposes the same URL with all the data it pulls. This makes it possible to very easily create a hierarchical structure, where a container picks up information from a few servers and then exposes it to another one that groups the data from a few Prometheus containers.

The Prometheus query language and metrics aggregation are very powerful, but at the same time quite confusing initially. The included console has a few queries for interesting Django data.

Handling dependencies

The codebase includes two subdirectories, ./deps and ./vendor. The first one is for your direct dependencies, meaning your own code that lives in a different repo. This allows you to set up a git submodule and use it as an imported module. There's a README file with some tips on using git submodules, as they are a little tricky.

The idea behind this is to avoid pulling from a private git repo inside a requirements file, which requires setting up authentication inside the container (adding ssh keys, git support, etc). I think it's better to handle that at the same level as your general repo, and then import all the source code directly.

./vendor is a subdirectory that contains a cache of Python modules in wheel format. The build-deps service builds a container with all the dependencies stated in the requirements.txt file and precompiles them (along with all sub-dependencies) into convenient wheel files. Then the wheel files can be used to speed up the setup of other containers. This is optional, as the dependencies will be installed in any case, but it greatly speeds up rebuilding containers or tweaking requirements.

Testing and interactive development

The container test runs the unit tests. It doesn’t depend on any external service and can be run in isolation.

The dev-server service starts the Django development web server. This reloads the code as it changes, as it mounts the local directory. It also sends the runserver logs to standard output.

The system-test container independently runs tests that generate external HTTP requests against the server. They are under the ./system-test subdir and are run using pytest.

Healthcheck

Docker containers can define a healthcheck to determine whether a container is behaving properly, and take action if necessary. The application includes a URL route for the healthcheck that currently checks that access to the database is correct. More calls to external services can be included in this view.


The healthcheck pings the local nginx URL using curl, so it also tests that the routing of the request is correct.
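
For reference, a healthcheck view that verifies database access can be as small as the sketch below. The actual view in the template may differ; the query and the response format here are assumptions:

# Sketch of a healthcheck view; details are assumptions.
from django.db import connection
from django.http import JsonResponse


def healthcheck(request):
    try:
        with connection.cursor() as cursor:
            cursor.execute('SELECT 1')  # cheap query to verify DB access
    except Exception:
        return JsonResponse({'status': 'database unreachable'}, status=503)
    return JsonResponse({'status': 'ok'})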

Other ideas

  • Though this project aims to be something production-ready, the discussion on how to do it is not trivial. Expect details to need changing depending on the system used to bring the container to a production environment.
  • Docker has some subtleties that are worth paying attention to, like the difference between up and run. It pays off to read the documentation with care and understand the different options and commands.
  • Secrets management has been done through environment variables. The only real secrets in the project are the database password and the Django secret key. Secret management may depend on what system you use in production. Using Docker's native secret support for Docker Swarm creates files inside the container that can be poured into the environment with the start scripts. Something like adding this to docker/server/start_server.sh:

export SECRET=`cat /run/secret/mysecret`

The Django secret key is injected as part of the build as an ARG. Check that it's consistent in all builds in docker-compose.yaml. The database password is stored in an environment variable.
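
On the Django side, reading those values from the environment in settings.py could look roughly like this. The environment variable names are assumptions; check the template for the real ones:

# Sketch for settings.py; environment variable names are assumptions.
import os

SECRET_KEY = os.environ['DJANGO_SECRET_KEY']

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.environ.get('POSTGRES_DB', 'templatesite'),
        'USER': os.environ.get('POSTGRES_USER', 'postgres'),
        # The real secret: never hardcode it, pull it from the environment
        'PASSWORD': os.environ['POSTGRES_PASSWORD'],
        'HOST': os.environ.get('POSTGRES_HOST', 'db'),
        'PORT': os.environ.get('POSTGRES_PORT', '5432'),
    }
}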

  • The ideal usage of containers in CI is to work with them locally and, when pushing to the repo, trigger a chain that builds the container, runs the tests and promotes the build until deployment, probably in several stages. One of the best things about containers is being able to set them up and not change them along the whole process, something that was not as easy with traditional build agents and production servers.
  • All ideas welcome. Feel free to comment here or in the GitHub repo.

UPDATE: I presented this template in the PyCon Ireland 2017. The slides are available in this blog post.

Compendium of Wondrous Links vol XI


It has been a while since the last time. More food for thought!

Python

 

The craft of development and tools

 

The environment of development

  • Computers are fast. Even if you see a lot of delays from time to time, they do a lot of stuff each second. Try to guess with this questionnaire.

 

Miscellanea

  • The perfect man. I find the lives of people so driven by competitiveness quite fascinating.
  • Dealing with incentives is always more complicated than it looks. The Cobra effect.

$7.11 in four prices and the Decimal type, revisited

I happened to take a look at this old post on this blog. The post is 7 years old, but it still presents an interesting problem.

“A mathematician purchased four items in a grocery store. He noticed that when he added the prices of the four items, the sum came to $7.11, and when he multiplied the prices of the four items, the product came to $7.11.”


I wanted to check my old solutions again, with the things that I've learned in the last few years. At that time, I was still a newbie in Python, and there are certainly interesting points to check again.

Python 2 vs Python 3

The first interesting difference is the fact that Python 3 is now quite mature, and there's really no excuse for not using it as the default for small exercises or new projects.

The code can actually be adapted to run both in Python 2 and 3 to make comparisons:

import operator
from itertools import combinations
from functools import reduce
from decimal import Decimal

def solve(number, step, number_of_numbers):
  integer_range = int(number / step)
  all_numbers = (Decimal(i) * step
                 for i in range(1, integer_range + 1))
  multiples = 1 / (step ** number_of_numbers)
  correct_numbers = (i for i in all_numbers
                     if (number * multiples % i) == 0)

  def check(numbers):
    """Check the numbers added and multiplied are the same"""
    return (sum(numbers) == number
            == reduce(operator.mul, numbers))
  # only combinations are required, as order is irrelevant
  comb = combinations(correct_numbers, number_of_numbers)
  results = [p for p in comb if check(p)]
  return results

print(solve(number=Decimal('7.11'), step=Decimal('0.01'),
            number_of_numbers=4))

I changed a couple of things, like the usage of filter and reduce and a better use of comprehensions, but the code is very similar. Obtain the numbers that are multiples of the total (as they are the only possible ones to fulfil the criteria), and then try all combinations.

I removed the internal time check, using the UNIX one instead, calling it from bash:

$ time python2.7 711.py
[(Decimal('1.20'), Decimal('1.25'), Decimal('1.50'), Decimal('3.16'))]

real 1m26.020s
user 1m22.199s
sys 0m1.124s

$ time python3.6 711.py
[(Decimal('1.20'), Decimal('1.25'), Decimal('1.50'), Decimal('3.16'))]

real 0m1.148s
user 0m0.955s
sys 0m0.030s

The time difference is quite staggering: 1 second vs 90. They have clearly spent time optimising Decimal in Python 3.

Crunching numbers

For the second version, the code looks like this:

def calc():
  for w in range(711):
    for x in range(711 - w):
      for y in range(711 - w - x):
        z = 711 - x - y - w
        if (w * x * y * z == 711000000):
          print("w: %d x: %d y: %d z: %d" % (w, x, y, z))
          return

which, this time, is actually slower in Python 3, and slower than the first solution.

time python2.7 711_v2.py
w: 120 x: 125 y: 150 z: 316

real	0m7.963s
user	0m7.362s
sys	0m0.291s
$ time python3.6 711_v2.py
w: 120 x: 125 y: 150 z: 316

real	0m13.193s
user	0m11.955s
sys	0m0.461s

But, there are two tricks that can be used in this case.

The first one is simply swapping the interpreter for PyPy, which indeed speeds things up.

$ time pypy 711_v2.py
w: 120 x: 125 y: 150 z: 316

real	0m2.413s
user	0m0.177s
sys	0m0.094s

Not sure why, but the results ranged from ~200ms up to 4 seconds, which is quite a lot of variability.

And the second one is to use Cython to compile the code into a C extension. This requires creating two files:

# This compiles the extension automatically
import pyximport; pyximport.install()
from c711v3 import calc
calc()

and a c711v3.pyx file, with the Cython code.

def calc():
  # define the variables as C types
  cdef int w, x, y, z
  for w in range(711):
    for x in range(711 - w):
      for y in range(711 - w - x):
        z = 711 - x - y - w
        if (w * x * y * z == 711000000):
          print("w: %d x: %d y: %d z: %d" % (w, x, y, z))
          return

This speeds up the process noticeably, to a quarter of a second:

$ time python 711_v3.py
w: 120 x: 125 y: 150 z: 316

real	0m0.223s
user	0m0.138s
sys	0m0.070s

Not bad from 90 seconds at the start!

Conclusions

  • Python 3 has a lot of optimisations, sometimes in non-obvious places. It should be the version to go with unless there are legacy requirements.
  • It's also quite simple to keep code compatible with Python 2 and Python 3 for easy stuff.
  • I can look at code I wrote 7 years ago and find ways of improving it… Good to feel like I know more.
  • PyPy is another useful tool that should be considered. It's also getting better and better, and now it has Python 3 compatibility as well. Also, my blog has been around for a while (though I don't update it as often as I should).
  • Creating C extensions with Cython is very easy when dealing with certain bottlenecks.
  • And I still find joy in coding small exercises!

ffind v1.0.2 released!

The new version of ffind (1.0.2) is available in GitHub and PyPi. This version includes the ability to execute python modules and scripts directly and some other minor improvements.

Happy developing!

Happy 25th Anniversary, Python

25 years ago, on 20th February 1991, Python 0.9.0 was released publicly… I absolutely love it and use it every day, and it seems to be as successful as ever…


For another great 25 years! Cheers!

 

Do not spawn processes on users requests

I've recently been playing an online game, launched not long ago, that uses the following idea.

When a user starts a match, the game spawns a process on the server that acts as the opponent, generating the actions against the user.

The game had a rough launch, with a lot of problems due to it being played by a lot of people. And, IMHO, a lot of the problems can be traced back to that idea.

I can see why it's a seductive idea. If a user generates an interaction with the service that takes time (for example, a match in this game), spawn a process/thread on the server that generates the responses in “real time”. The user will then be notified, through polling or pushing the information, and can react to it. The process will receive the new information from the user and adjust the responses.

I know it is seductive because I had the same idea once, and I was very lucky to have someone around with more experience who showed me how it would break under pressure. It's not a sane architecture to scale.

Some bad ideas:

  • No limit on processes, meaning the servers can be overwhelmed by context switching. Once you have several thousand processes running on a server, you are in a bad place.


  • The very definition of state on the server. You need to keep track of processes started on different servers (so no two servers perform the same job). High Availability is impossible, as losing one server will mean destroying the state of all those processes. For scalability, always aim for stateless servers: read all the data, store the resulting data.
  • Start-up times. Each time a process starts, there's some time to boot. This can be a problem if processes are always being started and stopped, adding overhead to the system. Even starting a thread is not free (and will probably require internal start-up work like connecting to the DB, reading from cache, etc).
  • Connection explosion. If each process needs to connect to other parts of the infrastructure (DB, logging, cache, etc) you can have a problem with the number of connections.
  • Process monitoring. What if a process gets stuck? A request can be cancelled easily by a web server (if a request takes more than X, kill it), but an individual process or thread can be more complicated and require specific tooling.

Alternative: Pool of workers

Generate a defined number of processes that can perform the individual actions that make up a match. Each process will get an action from a queue, execute it, and store the resulting state. Any process can produce an action for any user.


For example, if a match is a set of 20 actions, each one happening every minute, the start-match request will introduce 20 actions into a queue, to be extracted at the proper time, introducing the proper delay on each action. Note that the queue needs to have a way to deliver delayed messages; not every messaging queue can (in particular, RabbitMQ doesn't have good support for it). Beanstalkd or Amazon SQS support it.

Or, alternatively, a single action that ends by inserting the next step into the queue with the adequate delay. The action can be as simple as checking whether it should change something and, if not, ending.

The processes will be extracting the next action from the queue and executing it. Note that this minimises the time a worker waits for a new task to do. Each worker is active as much as possible while any user has a pending task ready to be executed.

The number of processes is limited, so you won't have an explosion. You can test the system and get a good idea of its limit, the point where your throughput is not good enough to execute the actions within a reasonable delay, so you can stop users from starting a new match. This is a better fallback option than allowing everyone to start one and then not giving a good experience.

A priority queue can be put in place, in that case, to inform the user: “You will be able to start your match in ~3 minutes”.

Or you can add more processes/servers to increase the throughput in a predictable manner.
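
To make the idea concrete, here is a single-machine toy sketch of the pattern using only the standard library. A real system would use several processes or servers and a proper delayed queue (Beanstalkd, SQS) instead of an in-memory heap, and all the names below are made up:

# Toy sketch of the "pool of workers" idea; all names are made up.
import heapq
import threading
import time

queue_lock = threading.Lock()
delayed_queue = []  # heap of (execute_at, match_id, action_number)


def schedule_match(match_id, actions=20, delay=60):
    """Insert all the actions of a match, one every `delay` seconds."""
    now = time.time()
    with queue_lock:
        for step in range(actions):
            heapq.heappush(delayed_queue, (now + step * delay, match_id, step))


def execute_action(match_id, step):
    # Read the match state, generate the response, store the new state.
    print('match {} action {}'.format(match_id, step))


def worker():
    while True:
        task = None
        with queue_lock:
            if delayed_queue and delayed_queue[0][0] <= time.time():
                task = heapq.heappop(delayed_queue)
        if task is None:
            time.sleep(0.1)  # nothing due yet
            continue
        _, match_id, step = task
        execute_action(match_id, step)


if __name__ == '__main__':
    for _ in range(4):  # a fixed, limited number of workers for every match
        threading.Thread(target=worker, daemon=True).start()
    schedule_match('match-1', actions=5, delay=1)
    time.sleep(6)  # let the workers drain the queue

With a real queue the scheduling and the workers would live on different machines, but the shape stays the same: a bounded set of workers and all the state kept outside of them.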

Alternative: Whole match pregeneration

Another alternative is to actually generate a whole set of actions and return them in the first go, displaying them at the proper times on the client side. If any adjustment is required due to the actions of the user, redo all the results from that time on.


For example, a match starts and returns the 20 server actions to the client, which shows them to the user, one each minute. In the 3rd minute, the user performs an action, which makes the server recalculate the remainder of the match and return another 17 actions. This is a good strategy if generating actions in advance is possible and few interactions from the user are expected.

The bottom line

The main word here is stateless. It is a basic component of a scalable system, and it's always worth keeping in mind when designing a system that will be used by more than a couple of users.

All you need is cache

Cache is all you need


What is cache

More than a formal definition, I think the best way of thinking about a cache is as a result from an operation (data) that gets saved (cached) for future use.

The cached value should be identifiable by a key that is reasonably small. This is normally the call name and its parameters, in some sort of hashed form.

A proper cache has the following three properties:

  1. The result is always replicable. The value can be scrapped without remorse.
  2. Obtaining the result from cache is faster than generate it.
  3. The same result will be used more than once.

The first property implies that the cache is never the True Source of Data. A cache that is the True Source of Data is not a cache, it's a database, and it needs to be treated as such.

The second one implies that retrieving from the cache is useful. If getting the result from the cache is slower (or only marginally faster) than getting it from the True Source of Data, the cache can (and should) be removed. A good candidate for caching is a slow I/O operation or a computationally expensive call. When in doubt, measure and compare.

The third property simply warns against storing values that will only be used once, where the cached value will never be read again. For example, big parts of online games are uncacheable because they change so often that they are read fewer times than they are written.
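
Regarding the second property, a quick way of measuring is to compare the call with and without a cache using something like timeit. The slow_square function below is just a stand-in for a slow call:

# Sketch: measuring whether a cache pays off; slow_square is a stand-in.
import timeit

setup = '''
import time

cache = {}

def slow_square(n):
    time.sleep(0.001)  # pretend this is a slow I/O call
    return n * n

def cached_square(n):
    if n not in cache:
        cache[n] = slow_square(n)
    return cache[n]
'''

print(timeit.timeit('slow_square(10)', setup=setup, number=100))
print(timeit.timeit('cached_square(10)', setup=setup, number=100))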

The simplest cache

The humblest cache is a well known technique called memoization, which is simply to store in process memory the results of a call, to serve it from there on the next calls with the same parameters. For example,

NUMBER = 100
def leonardo(number):

    if number in (0, 1):
        return 1

    return leonardo(number - 1) + leonardo(number - 2) + 1

for i in range(NUMBER):
    print('leonardo[{}] = {}'.format(i, leonardo(i)))  

This terribly performing code will return the first 100 Leonardo numbers. But each number is calculated recursively, so by storing the results we can greatly speed up the calculation. The key to store the results under is simply the number.

cache = {}

def leonardo(number):

    if number in (0, 1):
        return 1

    if number not in cache:
        result = leonardo(number - 1) + leonardo(number - 2) + 1
        cache[number] = result

    return cache[number]

for i in range(NUMBER):
    print('leonardo[{}] = {}'.format(i, leonardo(i)))

Normally, though, we'd like to limit the total size of the cache, to avoid our program running wild in memory. This restricts the size of the cache to only 10 elements, so we'll need to delete values from the cache to allow new values to be cached:

def leonardo(number):

    if number in (0, 1):
        return 1

    if number not in cache:
        result = leonardo(number - 1) + leonardo(number - 2) + 1
        cache[number] = result

    ret_value = cache[number]

    while len(cache) > 10:
        # Maximum size allowed, 10 elements
        # this is extremely naive, but it's just an example
        key = next(iter(cache))
        del cache[key]

    return ret_value

Of course, in this example every cached value never changes, which may not be the case. There’s further discussion about this issue below.

Cache keys

Cache keys deserve a small note. They are not usually complicated, but the key point is that they need to be unique. A non-unique key, which may be produced by improper hashing, will produce cache collisions, returning the wrong data. Be sure that this doesn't happen.
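
A common way to get such a key is to serialise the call name and its parameters deterministically and hash the result. A small sketch:

# Sketch: building a unique cache key from the call name and its parameters.
import hashlib
import json


def cache_key(func_name, *args, **kwargs):
    # Deterministic serialisation (sort_keys) avoids two identical calls
    # producing different keys; hashing keeps the key reasonably small.
    payload = json.dumps([func_name, args, kwargs], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()


print(cache_key('get_timeline', 42, page=2))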

Python support

Just for the sake of being useful: in Python 3 there is support for a decorator to cache calls, so the previous code can look like this.

from functools import lru_cache

@lru_cache(maxsize=10)
def leonardo(number):

    if number in (0, 1):
        return 1

    return leonardo(number - 1) + leonardo(number - 2) + 1

so you can use it instead of implementing your own.

The stereotypical web app cache

In the context of web apps, everyone normally thinks of memcached when they think of cache.

Memcached will, in this stereotypical usage, use some allocated memory to cache database results or full HTML pages, identified by an appropriate unique key, speeding up the whole operation. There are a lot of tools integrating it with web frameworks, and it can be clustered, increasing the total amount of memory and the reliability of the system.

In a production environment, with more than one server, the cache can be shared among different servers, so the generation of content only happens once in the whole cluster and the result can then be read by every consumer. Just be sure to ensure the first property, making it possible to obtain the value from the True Source of Data at any point, from any server.

This is a fantastic setup, and worth using in services. Memcached can also be replaced by other tools like Redis, but the general operation is similar.

But there are more ways to cache!

Assuming a typical distributed deployment of a production web service, there are a lot of places where a cache can be introduced to speed things up.

The described service will have one DB (or a cluster) that contains the True Source of Data, several servers with a web server channelling requests to several backend workers, and a load balancer on top of that as the entry point of the service.


Typically, the farther away from the True Source of Data we introduce a cache, the less work we create for the system and the more efficient the cache is.

Let's describe possible caches, from closest to the True Source of Data to farthest away.

Cache inside the DataBase

(other than the internal cache of the database itself)

Some values can be stored directly in the database, derived from the True Source of Data, in a more manageable way.

A good example of that is periodic reports. If some data is produced during the day, and a report is generated every hour, that report can be stored in the database as well. Subsequent accesses will read the already-compiled report, which should be less expensive than crunching the numbers again.


Another useful way of caching values is to use replication. This can be supported by databases, making it possible to read from different nodes at the same time, increasing throughput.

For example, using Master-Slave replication in MySQL, the True Source of Data is on the Master, but that information gets replicated to the slaves, which can be used to increase the read throughput.

Here the third property of caching shows up, as this is only useful if we read the data more often than we write it. Write throughput is not increased.

Cache in the Application Level

The juiciest part of a service is normally at this level, and this is where most alternatives are available.

From the raw results of the database queries, to the completed HTML (or JSON, or any other format) resulting from the request, or any other meaningful intermediate result, this is where the application of caches can be most creative.

Memory caches can be set either internally per worker, per server, or externally for intermediate values.

  • Cache per worker. This is the fastest option, as the overhead will be minimal, being internal memory of the process serving the requests. But it will be multiplied by the number of workers per box, and will need to be generated individually. No extra maintenance needs to be done, though.
  • External cache. An external service, like memcached. This will share all the cache among the whole service, but the delays in accessing the cache will be bound by the network. There are extra maintenance costs in setting up the external service.
  • Cache per server. Intermediate. Normally, setting on each server a cache service like memcached. Local faster access shared among all workers on the same box, with the small overhead of using a protocol.

Another possibility worth noting in some cases is to cache on the hard drive, instead of in RAM. Reading from the local hard drive can be faster than accessing external services, in particular if the external service is very slow (like a connection to an external network) or if the data needs to be highly processed before being used. Hard drive caches can also be helpful for high volumes of data that won't fit in memory, or for reducing start-up time, if starting a worker requires complex operations that produce a cacheable outcome.
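
Whatever the placement, the read path usually follows the same cache-aside shape. A self-contained sketch, where FakeCache stands in for a memcached or Redis client and the report function is made up:

# Sketch of the cache-aside pattern; FakeCache stands in for memcached/Redis.
import time


class FakeCache:
    def __init__(self):
        self._data = {}

    def get(self, key):
        value, expires = self._data.get(key, (None, 0))
        return value if expires > time.time() else None

    def set(self, key, value, ttl):
        self._data[key] = (value, time.time() + ttl)


cache = FakeCache()


def expensive_report(day):
    time.sleep(0.5)  # pretend this hits the True Source of Data
    return 'report for {}'.format(day)


def get_report(day, ttl=900):
    key = 'report:{}'.format(day)
    value = cache.get(key)
    if value is None:  # cache miss: generate and store for the next caller
        value = expensive_report(day)
        cache.set(key, value, ttl)
    return value


print(get_report('2017-08-28'))  # slow, goes to the True Source of Data
print(get_report('2017-08-28'))  # fast, served from the cache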

Cache in the Web Server

Widely available web servers like Apache or Nginx have integrated caches. This is typically less flexible than application-layer caching, and needs to fit into common patterns, but it's simple to set up and operate.

There’s also the possibility to return an empty response with status code 304 Not Modified, indicating that the data hasn’t changed since the last time the client requested the data. This can also be triggered from the application layer.
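
From the application layer, in Django this could be done along the lines of the sketch below; compute_etag and render_content are hypothetical helpers, not part of any real API:

# Sketch of answering 304 Not Modified; helper functions are hypothetical.
from django.http import HttpResponse, HttpResponseNotModified


def cached_view(request):
    etag = compute_etag()  # e.g. a hash of the underlying data version
    if request.META.get('HTTP_IF_NONE_MATCH') == etag:
        return HttpResponseNotModified()
    response = HttpResponse(render_content())
    response['ETag'] = etag
    return response


def compute_etag():
    return '"v1"'  # placeholder


def render_content():
    return '<html><body>cached content</body></html>'  # placeholder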

Static data should be, as much as possible, stored as files and returned directly from the web server, as web servers are optimised for that use case. This allows the strategy of storing responses as static files and serving them through the web server. This, in an offline fashion, is the strategy behind static website generators like Nikola or Jekyll.

For sites that deal with huge numbers of requests that should return the same data, like online newspapers or Wikipedia, a cache server like Varnish can be set up to cache them, and it may be able to act as a load balancer as well. This level of cache may store the data already compressed in gzip, for maximum performance.

Cache in the Client

Of course, the fastest request is the one that doesn't happen, so any information that can be stored in the client, avoiding making a call at all, will greatly speed up an application. To achieve real responsiveness, this needs to be taken very much into account. This is a different issue from caching, but I translated an article a while ago about tips and tricks for improving user experience in web applications here.

The dreaded cache invalidation

The elephant in the room when talking about cache is “cache invalidation”. This can be an extremely difficult problem to solve in distributed environments, depending on the nature of the data.

The basic problem is very easy to describe: “What happens when the cache contains different data than the True Source of Data?”

Sometimes this won't be a problem. In the first example, the cached Leonardo numbers just can't be different from the True Source of Data. If the value is cached, it will be the correct value. The same would happen with prime numbers, a calendar for 2016, or last month's report. If the cached data is static, happy days.

But most of the data that we'd like to cache is not really static. Good candidates for caching are values that rarely change. For example, your Facebook friends, or your schedule for today. This data will be relatively static, but it can change (a friend can be added, a meeting cancelled). What would happen then?

The most basic approach is to refresh the cache periodically, for example deleting the cached value after a predetermined time. This is very straightforward and normally supported natively by the cache tools, which allow storing a value with an expiration time. For example, assuming the user has a locally cached copy of their friends' avatars, only ask again every 15 minutes. Sure, for up to 15 minutes a friend's new avatar won't be available and the old one will be displayed, but that's probably not a big deal.

On the other hand, the position on a leaderboard in a competitive video game, or the result of a live match in the World Cup, is probably much more sensitive to such a delay.

Even worse, we've seen that some options involve having more than one cache (cache per server, or per worker; or redundant copies for reliability purposes). If two caches contain different data, the user may be alternating between old and new data, which will be confusing at best and produce inconsistent results at worst.

This is a very real problem in applications working with eventually consistent databases (like the mentioned Master-Slave configuration). If a single operation involves writing a value and then reading the same value, the returned value could be different (the old one), potentially creating inconsistent results or corrupting the data. Two very close operations modifying the same data from two users could also produce this effect.

Periodically refreshing the cache can also produce bad effects in a production environment, like all the refreshes synchronising and happening at the same time. This is typical in systems that refresh the data for the day at exactly 00:00. At exactly that time all workers will try to refresh all the data at once, orchestrating a perfectly coordinated distributed attack against the True Source of Data. It is better to avoid perfectly round numbers and use some randomness instead, or to set times relative to when the data was last requested from the True Source of Data, avoiding synchronised access.
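
A cheap way of adding that randomness is to put some jitter on the expiration time, along these lines:

# Sketch: adding jitter to a cache TTL to avoid synchronised refreshes.
import random

BASE_TTL = 15 * 60  # 15 minutes


def ttl_with_jitter(base=BASE_TTL, spread=0.2):
    # Expire somewhere between 80% and 120% of the base TTL,
    # so not every worker refreshes at exactly the same moment.
    return int(base * random.uniform(1 - spread, 1 + spread))


print(ttl_with_jitter())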

This avalanche effect can also happen when the cache cluster changes (like adding or removing nodes, for example when one node fails). These operations can invalidate or make unavailable a high number of cached values, producing an avalanche of requests to the True Source of Data. There are techniques to mitigate this, like consistent hash rings, but they can be a nightmare if faced in production.

Manually invalidating the cache when the data changes in the True Source of Data is a valid strategy, but it needs to invalidate the value in all the caches, which is normally only feasible for external cache services. You simply can't access the internal memory of a worker on a different server. Also, depending on the rate of invalidations per cached value read, it can be counterproductive, as it will produce an overhead of calls to the cache services. It also normally requires more development work, as it needs a better knowledge of the data flow and of when the value in the cache is no longer valid. Sometimes that's very subtle and not evident at all.

Conclusion

Caching is an incredibly powerful tool to improve performance in software systems. But it can also be a huge pain due to all those subtle issues.

So, some tips for dealing with caches:

  • Understand the data and how it's consumed by the user. A value that changes more often than it gets read is not a good cache candidate.
  • Ensure the system has a proper cache cycle. At the very least, understand how the cache flows and what the implications of a cache failure are.
  • There are a lot of ways and levels to cache. Use the most adequate to make caching efficient.
  • Cache invalidation can be very difficult. Sorry about that.

Gorgon: A simple task multiplier analysis tool (e.g. loadtesting)

Load testing is something very important in my job. I spend a decent amount of time checking how performant some systems are.

There are some good tools out there (I’ve used Tsung extensively, and ab is brilliant for small checks), but I found that it’s difficult to create flows, where you produce several requests in succession and the input depends on the returned values of previous calls.

Also, load test tools are normally focused on HTTP requests, which is fine most of the time, but sometimes it's limiting.

So, I got the idea of creating a small framework to take a Python function, replicate it N times and measure the outcome, without the hassle of manually dealing with processes and threads, or spreading it out over different machines.

The source code can be found on GitHub and it can be installed through PyPi. It is Python 3.4 and Python 2.7 compatible.

pip install gorgon

Gorgons were mythological monsters whose hair was made of snakes.

Gorgon

To use Gorgon, just define the function to be repeated. It should be a function with a single parameter that will receive a unique number. For example:

    
    def operation_http(number):
        # Imports inside your function 
        # is required for cluster mode
        import requests  
        result = requests.get(get_transaction_id_url)
        unique_id = get_id_from(result)
        result = requests.get(make_transaction(unique_id))
        if process_result(result) == OK:
            return 'SUCCESS'
        return 'FAIL'

There’s no need to limit the operation to HTTP requests or other I/O operations

    def operation_hash(number):
        import hashlib
        # This is just an example of a 
        # computationally expensive task
        m = hashlib.sha512()
        for _ in range(4000):
            m.update('TEXT {}'.format(number).encode())
        digest = m.hexdigest()
        result = 'SUCCESS'
        if number % 5 == 0:
            result = 'FAIL'
        return result

Then, create a Gorgon with that operation and generate one or more runs. Each run will run the function num_operations times.

        from gorgon import Gorgon
        NUM_OPS = 4000
        test = Gorgon(operation_http)
        test.go(num_operations=NUM_OPS, num_processes=1, 
                num_threads=1)
        test.go(num_operations=NUM_OPS, num_processes=2, 
                num_threads=1)
        test.go(num_operations=NUM_OPS, num_processes=2, 
                num_threads=4)
        test.go(num_operations=NUM_OPS, num_processes=4, 
                num_threads=10)

You can get the results of the whole suite with small_report (simple aggregated results) or with html_report (graphs).

    Printing small_report result
    Total time:  31s  226ms
    Result      16000      512 ops/sec. Avg time:  725ms Max:  3s  621ms Min:   2ms
       200      16000      512 ops/sec. Avg time:  725ms Max:  3s  621ms Min:   2ms

Example of graphs: just dump the result of html_report as HTML to a file and take a look with a browser (it uses the Google Chart API).

Gorgon HTML report example


Cluster

By default, Gorgon uses the local computer to create all the tasks. To distribute the load even more, and use several nodes, add machines to the cluster.

        NUM_OPS = 4000
        test = Gorgon(operation_http)
        test.add_to_cluster('node1', 'ssh_user', SSH_KEY)
        test.add_to_cluster('node2', 'ssh_user', SSH_KEY, 
                             python_interpreter='python3.3')
        ...
        # Run the test now as usual, over the cluster
        test.go(num_operations=NUM_OPS, num_processes=1, 
                num_threads=1)
        test.go(num_operations=NUM_OPS, num_processes=2, 
                num_threads=1)
        test.go(num_operations=NUM_OPS, num_processes=2, 
                num_threads=4)
        print(test.small_report())

Each of the nodes of the cluster should have Gorgon installed for the default Python interpreter, unless the python_interpreter parameter is set. Using the same Python interpreter on all the nodes and the controller is recommended.
The paramiko module is a dependency for the controller in cluster mode, but not for the nodes.

As a limitation, all the code to be tested needs to be contained in the operation function, including any imports for external modules. Remember to install all the dependencies for the code on the nodes.

Available in GitHub

The source code and more info can be found on GitHub, and it can be installed through PyPi. So, if any of this sounds interesting, go there and feel free to use it! Or change it! Or make suggestions!

Happy loadtesting!
