Wrong Side of Memphis

Python Automation Cookbook on sale for Black Friday

Black Friday offer!

For this week, you can get the book on the Packt web page from $10 in ebook format.

You can get more information about the book on that page or in this post.

Posted on October 28, 2018 by Jaime Buelta

Package and deploy a Python module in PyPI with Poetry, tox and Travis

I’ve been working for the last couple of days on a small command line tool in Python, and I took the opportunity to try out Poetry, which helps to package and distribute Python modules.

Enter pyproject.toml

A very promising development in the Python ecosystem is the new pyproject.toml file, introduced in PEP 518. This file aims to replace the old setup.py with a config file, which avoids executing arbitrary code and clarifies its usage.

Poetry in no motion. Photo by Pixabay on Pexels.com

Poetry generates a new project, and includes the corresponding pyproject.toml.


[tool.poetry]
name = "std_encode"
version = "0.2.1"
description = "Encode and decode files through the standard input/output"
homepage = "https://github.com/jaimebuelta/std_encode"
repository = "https://github.com/jaimebuelta/std_encode"
authors = ["Jaime Buelta "]
license = "MIT"
readme = "README.md"


[tool.poetry.dependencies]
python = ">=2.7, !=3.0, !=3.1, !=3.2, !=3.3, !=3.4, <4"

# ...

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"

Most of it is generated automatically by Poetry, but there are a couple of interesting bits:

Python compatibility

[tool.poetry.dependencies]
python = ">=2.7, !=3.0, !=3.1, !=3.2, !=3.3, !=3.4, <4"

This makes it compatible with Python 2.7 and Python 3.5 and later.
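To make the constraint more concrete, here is a minimal Python sketch of the check it is meant to express. It is a simplification that only looks at (major, minor) pairs, and the function name is_supported is made up for this example; it is not how Poetry parses or evaluates constraints.

```python
def is_supported(version):
    """Return True if a (major, minor) tuple satisfies the intent of
    ">=2.7, !=3.0, !=3.1, !=3.2, !=3.3, !=3.4, <4"."""
    # Hypothetical helper: the excluded 3.x releases are listed explicitly
    excluded = {(3, 0), (3, 1), (3, 2), (3, 3), (3, 4)}
    return (2, 7) <= version < (4, 0) and version not in excluded


for candidate in [(2, 6), (2, 7), (3, 4), (3, 5), (3, 7), (4, 0)]:
    print(candidate, is_supported(candidate))
```

Only (2, 7), (3, 5) and (3, 7) pass the check in this list, which matches the "2.7 and 3.5 and later" description.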

Including documentation automatically

The added README.md will be automatically included in the package.

Easy dependencies management

The dependencies are clearly stated, in particular the difference between dev dependencies and regular dependencies. Poetry also creates a poetry.lock file that records the exact versions that were resolved.

Scripts and entry points

This package creates command line tools. This is easy to do by describing them as scripts.


[tool.poetry.scripts]
std_decode = 'std_encode:console.run_sd'
std_encode = 'std_encode:console.run_se'

They’ll call the functions run_sd and run_se in the console.py file.
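As a rough sketch of how a reference like this can be resolved, the part before the colon is imported as a module and the rest is followed attribute by attribute. The helper name resolve_entry_point is hypothetical, and the real lookup is done by the packaging tooling, not by this code. The standard library reference os:path.join stands in for std_encode:console.run_sd, so the example runs without the package installed.

```python
import importlib
import os


def resolve_entry_point(reference):
    """Resolve a 'module:attr.attr' reference to the object it names."""
    module_name, _, attr_path = reference.partition(":")
    target = importlib.import_module(module_name)
    # Walk the dotted path after the colon, one attribute at a time
    for attr in attr_path.split("."):
        target = getattr(target, attr)
    return target


join = resolve_entry_point("os:path.join")
print(join is os.path.join)
```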

Testing the code with cram and tox

Cramtastic!

As the module is intended as a command line tool, the best way of testing it is through command line actions. A great tool for that is cram. It allows you to describe a test file as a series of command line actions and their expected standard output. For example:


Setup

  $ . $TESTDIR/setup.sh

Run test

  $ echo 'test line' | std_decode
  test line

Any line starting with $ is a command, and any line following it is the expected result, as it will appear in the console. There’s a plugin for pytest, so it can be integrated into a bigger test suite with other Python tests.
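To illustrate the format, here is a simplified Python sketch that splits a cram-style file into commands and expected output. It is not cram's own parser: the function name parse_cram is made up, and features like multi-line commands are ignored.

```python
def parse_cram(text):
    """Split cram-style text into (command, expected_output_lines) pairs."""
    tests = []
    for line in text.splitlines():
        if line.startswith("  $ "):
            # An indented line starting with "$ " is a command
            tests.append((line[4:], []))
        elif line.startswith("  ") and tests:
            # Other indented lines are the expected output of the last command
            tests[-1][1].append(line[2:])
        # Unindented lines are treated as comments and skipped
    return tests


sample = "Run test\n\n  $ echo 'test line' | std_decode\n  test line\n"
print(parse_cram(sample))
```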

Ensuring installation and tests with tox

To run the tests, the process should be:

Generate a package with your changes.
Install it in a virtual environment.
Run all the cram tests, which will call the installed command line scripts.

The best way of doing this is to use tox, which also adds the possibility of running the tests over different Python versions.

All and all they’re just another cog in the tox. Photo by Pixabay on Pexels.com

To do so, we create a tox.ini file


[tox]
isolated_build = true
envlist = py37,py27

[testenv]
whitelist_externals = poetry
commands =
  poetry install -v
  poetry run pytest {posargs} tests/

This defines two environments to run the tests, Python 2.7 and 3.7. For each of them, Poetry installs the package and then runs the tests using pytest.

Running

$ tox

runs the whole suite, but while developing, to speed things up, you can instead run

$ tox -e py37 -- -k my_test

The parameter -e runs only one environment, and anything after the -- is passed to pytest, for example to select only a subset of tests.

Locally, this allows us to run and iterate on the package. But we also want to run the tests remotely, in CI fashion.

CI and deployment with Travis

Travis CI is a great tool to set up for your open source repo. Enabling it for your GitHub repo can be done very quickly. After enabling it, we need to define the .travis.yml file that describes the build.

language: python
python:
- '2.7'
- '3.5'
- '3.6'
matrix:
  include:
  - python: 3.7
    dist: xenial
before_install:
- pip install poetry
install:
- poetry install -v
- pip install tox-travis
script:
- tox
before_deploy:
- poetry config http-basic.pypi $PYPI_USER $PYPI_PASSWORD
- poetry build
deploy:
  provider: script
  script: poetry publish
  on:
    tags: true
    condition: "$TRAVIS_PYTHON_VERSION == 3.7"
env:
  global:
  - secure: [REDACTED]
  - secure: [REDACTED]

The first part defines the different builds to run, for Python versions 2.7, 3.5, 3.6 and 3.7.

Version 3.7 needs to be executed in Ubuntu Xenial (16.04), as by default Travis uses Trusty (14.04), which doesn’t support Python 3.7.

language: python
python:
- '2.7'
- '3.5'
- '3.6'
matrix:
  include:
    - python: 3.7
      dist: xenial

The next part describes how to run the tests. The package tox-travis is installed to seamlessly integrate both. This makes Travis run versions of Python that are not included in tox.ini.

before_install:
- pip install poetry
install:
- poetry install -v
- pip install tox-travis
script:
- tox

Finally, a deployment part is added.

before_deploy:
- poetry config http-basic.pypi $PYPI_USER $PYPI_PASSWORD
- poetry build
deploy:
  provider: script
  script: poetry publish
  on:
    tags: true
    condition: "$TRAVIS_PYTHON_VERSION == 3.7"

The deploy is configured to happen only if a git tag is set up, and if the build is using Python 3.7. The last condition can be removed, but then the package will be uploaded several times. Poetry ignores it if that’s the case, but it’s just wasteful.

The package is built before deploying it with poetry publish.

To properly configure the access to PyPI, we need to store our login and password in secure variables. To do so, install the travis command line tool and encrypt the secrets, including the variable name.
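The deploy conditions can be read as a small predicate. The following Python sketch only mirrors the logic of the on: section for illustration; the function name should_deploy is made up, and the decision is really taken by Travis itself. TRAVIS_TAG and TRAVIS_PYTHON_VERSION are environment variables that Travis sets in each build.

```python
def should_deploy(env):
    """Deploy only for tagged builds that run on Python 3.7."""
    # TRAVIS_TAG is empty unless the build was triggered by a git tag
    is_tagged = bool(env.get("TRAVIS_TAG"))
    is_py37 = env.get("TRAVIS_PYTHON_VERSION") == "3.7"
    return is_tagged and is_py37


print(should_deploy({"TRAVIS_TAG": "v0.2.1", "TRAVIS_PYTHON_VERSION": "3.7"}))
print(should_deploy({"TRAVIS_TAG": "v0.2.1", "TRAVIS_PYTHON_VERSION": "2.7"}))
print(should_deploy({"TRAVIS_TAG": "", "TRAVIS_PYTHON_VERSION": "3.7"}))
```

Only the first call returns True, so a tagged commit is uploaded once instead of once per Python version.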

$ travis encrypt PYPI_PASSWORD=<PASSWORD> --add env.global
$ travis encrypt PYPI_USER=<USER> --add env.global

The line


poetry config http-basic.pypi $PYPI_USER $PYPI_PASSWORD

will configure poetry to use these credentials and upload the packages correctly.

Release flow


I imagine the builds entering an assembly line while Raymond Scott’s Powerhouse plays. Photo by Pixabay on Pexels.com

With all this in place, to prepare a new release of the package, the flow will be like this:

Set up the new functionality and commits. Travis will run the tests to ensure that the build works as expected. This may include bumping the dependencies with poetry update.
Once everything is ready, create a new commit with the new version information. This normally includes:
1. Run poetry version {patch|minor|major|...} to bump the version.
2. Set up any manual changes, like release notes, documentation updates or internal version references.
Commit and verify that the build is green in travis.
Create a new tag (or GitHub release) with the version. Remember to push the tag to GitHub.
Travis will upload the new version automatically to PyPI.
Spread the word! Your package deserves to be known!

The future, suggestions and things to keep an eye on

There are a couple of elements that could be a little bit easier in the process. As pyproject.toml and Poetry are quite new, there are some rough edges that could be improved.

Tags, versions and releases

Poetry has a version command to bump the version, but its only effect is to change the pyproject.toml file. I’d love to see an integration to update more elements, including internal versions like the one in __init__.py that gets generated automatically, or ask for release notes and append them to a standard document.

There’s also no integration with generating a git tag or GitHub release in the same command. You need to perform all these commands manually, while it seems like they should be part of the same action.

Something like:

$ poetry version
Generating version 3.4
Append release notes? [y/n]:
Opening editor to add release notes
Saved
A new git commit and tag (v3.4) will be generated with the following changes:
pyproject.toml
- version: 3.3
+ version: 3.4
src/package/__init__.py
- __version__ = "3.3"
+ __version__ = "3.4"
RELEASE_NOTES.md
+ Version 3.4 [2018-10-28]
+ ===
+ New features and bugfixes
Continue? [y/n/c(change tag name)]
Creating and pushing...
Waiting for CI confirmation
CI build is green
Creating new tag. Done.
Create a new release in GitHub [y/n]

This is a wishlist, obviously, but I think it would fit the flow of a lot of GitHub releases to PyPI.
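Part of that wishlist, bumping the version and keeping __init__.py in sync, is easy to sketch in Python. This is only an illustration under simple assumptions (a plain MAJOR.MINOR.PATCH version and a __version__ assignment on its own line); both function names are hypothetical and are not part of Poetry.

```python
import re


def bump_version(version, part):
    """Return the next version after bumping 'major', 'minor' or 'patch'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return "{}.0.0".format(major + 1)
    if part == "minor":
        return "{}.{}.0".format(major, minor + 1)
    if part == "patch":
        return "{}.{}.{}".format(major, minor, patch + 1)
    raise ValueError("Unknown part: {}".format(part))


def update_init_version(source, new_version):
    """Replace the __version__ assignment in the text of an __init__.py."""
    return re.sub(
        r'^__version__\s*=\s*["\'].*["\']',
        '__version__ = "{}"'.format(new_version),
        source,
        flags=re.MULTILINE,
    )


new_version = bump_version("0.2.1", "minor")
print(new_version)
print(update_init_version('__version__ = "0.2.1"\n', new_version))
```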

Ready for release! Photo by rawpixel.com on Pexels.com

Travis support for Python 3.7 and Poetry

I’m pretty sure that Travis will update the support for Python 3.7 quite soon. Having to define a different environment feels awkward, though I understand the underlying technical issues with it. It’s not a big deal, but I imagine that they’ll fix it so the definition is the same whether you work on 3.6 or 3.7. Python 3.7 was released four months ago at the time of writing.

The other possible improvement is to add pyproject.toml support. At the moment, uploading to PyPI with setup.py is natively supported, so adding support for pyproject.toml would be amazing. I imagine it will be added as more projects adopt this way of packaging.

Final words

Having a CI running properly, and a deployment flow, is actually a lot of work. Even with great tools like the ones discussed here, there are a lot of details to take into account and bits that need polishing. It took me around a full day of experimentation to get this setup working, even though I had worked with Travis previously (I configured it for ffind some time ago).

Poetry is also a very promising tool, and I’ll keep checking it. The packaging world in Python is complicated, but there has been a lot of work recently to improve it.


Posted on October 8, 2018 by Jaime Buelta

Python Automation Cookbook

So, great news, I wrote a book and it’s available!

Receiving your own physical book is very exciting!

It’s called Python Automation Cookbook, and it’s aimed at people who already know a bit of Python (not necessarily only developers), but would like to use it to automate common tasks like searching files, creating different kinds of documents, adding graphs, sending emails, text messages, etc. It’s written in the cookbook format, so it’s a collection of recipes that can be read independently, though there are always references to show how to combine them to create more complex flows.

The book is available both on the Packt website and on Amazon. There’s more information about the book there, like previews and samples, in case anyone is interested…

This is my first written book, so it is all very exciting. The process itself has been a lot of work, but not without its fun parts. I’m also quite proud of having written it in English, which is not my first language.


Posted on October 22, 2017 by Jaime Buelta

A Django project template for a RESTful Application using Docker – PyCon IE slides

Just putting here the slides for my presentation at PyCon Ireland, which is a follow-up to this blog post. I’ll try to include the video if/when available.

I hand-drew all the slides myself, so it was a lot of fun work!

Enjoy!



Posted on August 27, 2017 by Jaime Buelta

Notes about ShipItCon 2017

Disclaimer: I personally know and have worked with a good portion of the conference organizers and speakers. I label them with an asterisk*.

ShipItCon finally took place last Friday. I think it’s quite impressive, given the short amount of time since it was announced and this being the first edition, that it was so well organized. The venue was very good (and fairly unusual for a tech conference), and all the usual things that are easy to take for granted (food, space, projector, sound, etc) worked like clockwork. Kudos to the organizers.

The conference was oriented towards releasing online services, with special emphasis on Continuous Integration/Delivery. I think that focusing a conference on this kind of topic is challenging, as talks need to be generic enough in terms of tools, but narrow enough to be useful. Conferences about a specific technology (like PyCon, RubyConf or Linux Con) are more focused by concept.

Interesting venue for a tech conference #shipItCon pic.twitter.com/6egaxPduM4

— Jaime Buelta 🌻🌻🌻 (@jaimebuelta) August 25, 2017

The following are some notes, ideas and follow-up articles that I collected. Obviously, they are biased towards the kind of things I find more interesting. I’ll try to link the presentation slides if/once they’re available.

- The keynote by the Romero family was a great story and addressed a lot of points specific to the game industry (like the design challenges). It was also the exception in talking about shipping something other than a service: a game (on Steam and iOS). I played a little Gunman Taco Truck over the weekend!
  - “Ship a game while on a ship”. They released part of the game while on the Queen Elizabeth cruise, crossing the Atlantic.
- Release often and use feature toggles, detaching the code release from the feature release. This is a point made in the Frederick Meyer talk that I have heard recently in other places.
  - Friday night releases make me cringe, but they can make sense if the weekend is the lowest activity point for your customers.
  - Dependency trees grow to be more and more complex, to the point that no one understands them anymore and only automated tools can plot them.
  - Challenges in treating data in CI. Use production data? A subset? Fake data? Redacted data? Performance analysis can be tricky.
  - Automate what you care about.
- The need for early testing, including integration/system/performance, was the theme of Chloe Condon’s talk. Typically, a lot of the testing performed at the “main branch” level (after a feature is merged back) can be prepared in advance, giving better and faster feedback to developers. Test early, test often. She presented Codefresh, which seems an interesting cloud CI tool aimed at working with containers.
- Lauri Apple talked about communication and how important READMEs and documentation are for projects, both internal and external. The WHAT to build is a key aspect that shouldn’t be overlooked.
  - READMEs should include a roadmap, as well as info about how to install, run and configure the code.
  - This project offers help, review and advice for READMEs. I’ll definitely submit ffind for a review (after I review it and polish it a little bit myself).
  - She talked about the Open Organization Maturity Model, a framework about how open organizations are.
  - A couple of projects in Zalando that caught my eye:
    - Patroni, an HA template for PostgreSQL.
    - Zalenium, which distributes a Selenium Grid over Docker to speed up Selenium tests.
    - External DNS, to help configure external DNS access (like AWS Route 53 or Cloudflare) to a Kubernetes cluster.
- If it hurts, do it more frequently. A great quote for Continuous Delivery and automated pipelines. Darin Egan talked about mindfulness principles, how the status quo gets challenged and how driving change opposes inertia.
- The main point in Ingrid Epure’s talk was the integration of security practices during the development process and the differences between academia and engineering practices.
  - Linters can play a part in enforcing security practices, as well as automating formatting to leave format differences out of the review process.
  - Standardizing the logs is also a great idea. Using Canonical Log Lines for Online Visibility. I talked before about the need to increase logs and generate them during the development process.
- Eric Maxwell talked about the need to standardise the “upper levels” of the apps, mainly related to logging and metrics, and making applications (Modern Applications) more aware of their environment (choreography vs orchestration) and abstracted from the underlying infrastructure.
  - He presented habitat.sh, a tool aimed at working with these principles.
  - Packaging the application code and letting the tool do the heavy lifting on the “plumbing”.
- The pipeline in Intercom was discussed by Eugene Kenny, and the differences between “the ideal pipeline” and “the reality” of making dozens of deployments every day.
  - For example, fully test and deploy only the latest change in the pipeline, speeding up deployments at the expense of less separation between changes.
  - Or allow locking the pipeline when things are broken.
  - Follow up article: Continuous Deployment at Instagram
- Observability is an indispensable property for online services: the ability to check what’s going on in production systems. Damien Marshall* had this concept of graphulsion that I can only share.

https://twitter.com/damo_marshall/status/614165915987480576

He gave some nice ideas on observability through the whole life cycle:

Development:

Make reporting logs and metrics simple
Account for the effort to do observability work
Standardize what to report. The three most useful metrics are Request Rate, Error Rate and Duration per Request.

Deployment:

Do capacity planning. Know approximately the limits of your system and calculate the utilization of the system (% of that limit)
Ship the observability

Production:

Make metrics easy to use
Centralise dashboard views across different systems
Good alerting is hard. Start and keep it simple.
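As a small illustration of the three metrics mentioned under Development, here is a Python 3 sketch that computes them from a list of observed requests. The function name, the input format and the choice of counting 5xx responses as errors are assumptions made for this example, not something presented in the talk.

```python
def red_metrics(requests, window_seconds):
    """Compute request rate, error rate and mean duration.

    requests is a list of (status_code, duration_seconds) tuples
    observed during a window of window_seconds.
    """
    total = len(requests)
    # Assumption: only server errors (5xx) count as errors
    errors = sum(1 for status, _ in requests if status >= 500)
    return {
        "request_rate": total / window_seconds,
        "error_rate": errors / total if total else 0.0,
        "mean_duration": sum(d for _, d in requests) / total if total else 0.0,
    }


observed = [(200, 0.5), (200, 0.25), (500, 0.25), (404, 1.0)]
print(red_metrics(observed, window_seconds=2))
```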

- Riot Games uses custom generation of services to create skeletons, standardise good practices and reduce development time. Adam Comeford talked about those practices and how they implemented them.
  - Thinking inside the container.
  - Docker-gc is a tool to reduce the size of image repos, as they tend to grow very quickly.
- Jacopo Scrinzi talked about defining Infrastructure as Code, making infrastructure changes go through the same process as code (review, source control, etc). In particular, using Terraform and Atlas (now Terraform Enterprise) to make automatic deployments, following CI practices for infrastructure.
  - Using modules in Terraform simplifies and standardises common systems.
- The last keynote was about Skypilot, an initiative inside Demonware to deploy a game fully using Docker containers over Marathon/Mesos, in the Cloud. It was given by Tom Shaw* and the game was last year’s release of Skylanders. As I’ve worked in Demonware, I know how big an undertaking it is to prepare the launch of a game on dedicated hardware, as was done previously (and how much of that hardware is underused to avoid risks), so this is a huge improvement.

Wrapping up #shipItCon2017 Great conference, congratulations @Shipitcon! pic.twitter.com/igzBmKp6o5

— Jaime Buelta 🌻🌻🌻 (@jaimebuelta) August 25, 2017

As shown by the amount of notes I took, I found the conference very interesting and full of ideas that are worth following up. I’m really looking forward to a ShipItCon 2018 full of great content.

ffind v1.2.0 released!

Version 1.2.0 of ffind is available on GitHub and PyPI. This version includes the ability to configure defaults through environment variables and to force case insensitivity in searches.

You can upgrade with

    pip install ffind --upgrade

This will be the last version to support Python 2.6.

woman programming on a notebook — Photo by Christina Morillo on Pexels.com

Happy searching!


Posted on July 30, 2017 by Jaime Buelta

A Django project template for a RESTful Application using Docker

I used what I’ve learned, and some decisions I’ve made, to create a template for new projects. Part of software development is mainly plumbing: laying bricks together and connecting parts so the important bits of software can be accessed. That’s a pretty important part of the work, but it can be quite tedious and frustrating.

This is, in some ways, a very personal piece of work. I am using my own opinionated ideas for it, but I’ll explain the thought process behind them. Part of the idea is to add to the discussion on how a containerised Django application should work and what basic functionality is expected of a fully production-ready service.

The code is available here. This blog post covers more the why’s, while the README in the code covers more the how’s.

GO CHECK THE CODE IN GITHUB!

Summary

a Django RESTful project template
sets up a cluster of Docker containers using docker-compose
include niceties like logging, system tests and metrics
code extensively commented
related blog post explaining the why and opening the discussion to how to work with Docker

Docker containers

Docker has received a lot of attention in the last few years, and it promises to revolutionise everything. Part of it is just hype, but it is nevertheless a very interesting tool. And as with every new tool, we’re still figuring out how it should be used.

The first impression when dealing with Docker is to treat it as a new kind of virtual machine. Certainly, it was my first approach. But I think it is better to think of it as a process (a service) wrapped in a filesystem.

The whole Dockerfile and image building process is there to ensure that the filesystem contains the proper code and configuration files to then start the process. This goes against daemonising and other common practices in previous environments, where the same server handled lots of processes and services working in unison. In the Docker world, we replace this with several containers working together, without sharing the same filesystem.

The familiar Linux way of working includes a lot of external services and conveniences that are no longer required. For example, starting multiple services automatically on boot makes no sense if we only care to run one process.

This rule has some exceptions, as we’ll see later, but I think it is the best approach. In some cases, a single service requires multiple processes, but simplicity is key.

Filesystem

The Docker filesystem works by adding one layer on top of another. That turns the build into a series of steps, each one executing a command that changes the filesystem and then exits. The next step builds on top of the previous one.

This has very interesting properties, like the inherent caching system, which is very useful for the build process. The way to create a Dockerfile is to put the parts that are least likely to change (e.g. dependencies) at the top, and put at the end the ones that are most likely to need updating. That way, build times are shorter, as only the latest steps need to be repeated.
Another interesting trick is to change the order of the steps in the Dockerfile while actively developing, and move them to their final place once the initial setup (packages installed, requirements, etc) is stable.
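The effect of the ordering on the cache can be sketched in a few lines of Python. This is a simplified model and not Docker's actual cache logic: the function name and the step descriptions are made up, and the only rule modelled is that a cached layer is reused when neither it nor any step before it has changed.

```python
def steps_to_rebuild(steps, changed):
    """Return the build steps that must run again.

    Everything from the first changed step onwards is rebuilt;
    the steps before it are taken from the cache.
    """
    for index, step in enumerate(steps):
        if step in changed:
            return steps[index:]
    return []


steps = ["install system packages", "install requirements", "copy source code"]

# Changing only the source code repeats just the last step
print(steps_to_rebuild(steps, {"copy source code"}))
# Changing the first step invalidates every step after it
print(steps_to_rebuild(steps, {"install system packages"}))
```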

Another property is the fact that each of these layers can only add to a filesystem, but never subtract. Once a build step has added something, removing it in a later step won’t reduce the size of the image. As we want to keep our containers as minimal as possible, care should be taken on each of the steps. In some cases, this means adding something, using it for an operation, and then removing it, all in a single step. A good example is compilation libraries: add the libraries, compile, and then remove them, as they won’t be used (only the generated binaries).
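A toy Python model shows why the single step matters. It is not how Docker computes sizes internally: the function name, the file names and the sizes are invented for illustration, and the only rule modelled is that a removal affects just the files created in the same step.

```python
def image_size(steps):
    """Return the total size in MB of an image built from the given steps.

    Each step is a list of ("add", name, size_mb) or ("remove", name)
    operations, and produces one layer.
    """
    total = 0
    for step in steps:
        layer = {}
        for operation in step:
            if operation[0] == "add":
                layer[operation[1]] = operation[2]
            else:
                # Only files created in this same step disappear from the layer;
                # files from earlier layers remain stored in those layers
                layer.pop(operation[1], None)
        total += sum(layer.values())
    return total


separate_steps = [
    [("add", "build-libs", 100)],
    [("add", "binary", 20)],
    [("remove", "build-libs")],
]
single_step = [
    [("add", "build-libs", 100), ("add", "binary", 20), ("remove", "build-libs")],
]

print(image_size(separate_steps))
print(image_size(single_step))
```

With these invented numbers, the three separate steps give 120 MB, while the single step gives 20 MB.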

Alpine Linux

Given this minimalistic approach, it’s better to start as small as possible. Alpine is a Linux distribution focused on minimalistic containers and security. Their base image is just around 4MB. This is a little misleading, as installing Python will bring it to around 70MB, but it’s still much better than something like an Ubuntu base image, at around 120MB start and very easy to get to 1GB if the image build is done in a traditional way, installing different services and calling apt-get with abandon.

This template creates an image of around 120MB.

Running a cluster

A single container is not that interesting. After all, it is not much more than a single service, probably a single process. A critical part of working with containers is being able to set several of them to work in unison.

The chosen tool in this case is docker-compose which is great to set up a development cluster. The base of it is the docker-compose.yaml file, that defines several “services” (containers) and links them all together. The docker-compose.yaml file contains the names, build instructions, dependencies and describes the cluster.

Note there are two kinds of services. One is the container that runs and ends, producing some result as an operation. For example, run the tests. It starts, runs the tests, and then ends.
The other one is a long running service. For example, a web server. The server starts and it doesn’t stop on its own.
In the docker-compose file there are both kinds. server and db are long running services, while test and system-test are operations, but most of them are services.

It is possible to differentiate them by grouping them in different files, but dealing with multiple docker-compose.yaml files is cumbersome.

The different services defined, and their relationships, are described in this diagram. All the services are described individually later.

As is obvious from the diagram, the main one is server. The ones in yellow are operations, while the ones in blue are services.
Note that all services are exposed on different ports.

Codebase structure

All the files that relate to the building of the containers of the cluster are in the ./docker subdirectory, with the exception of the main Dockerfile and docker-compose.yaml, which are in the root directory.

Inside the ./docker directory, there’s a subdir for each service. Note that, because the image is the same, some services like dev-server or test inherit the files under ./docker/server.

The template Django app

The main app is in the directory ./src and it’s a simple RESTful application that exposes a collection of tweets, elements that have a text, a timestamp and an id. Individual tweets can be retrieved and new ones can be created. A basic CRUD interface.

It makes use of the Django REST framework and it connects to a PostgreSQL database to store the data.

On top of that, there are unit tests stored in the common Django way (inside ./src/tweet/tests.py). To run the tests, it makes use of pytest and pytest-django. Pytest is a very powerful framework for running tests, and it’s worth spending some time learning how to use it for maximum efficiency.

All of this is the core of the application and the part that should be replaced for doing interesting stuff. The rest of it is plumbing to make this code run and to have it properly monitored. There are also system tests, but I’ll talk about them later.

The application is labelled as templatesite. Feel free to change the name to whatever makes sense for you.

The services

The main server

The server service is the core of the system. Though the same image is used for multiple purposes, the main idea is to set up the Django code and make it run in a web server.

The way this is achieved is through uWSGI and nginx. And yes, this means this container is an exception to the rule of running a single process.

As shown, the nginx process will serve the static files, as generated by Django’s collectstatic command, and redirect everything else towards a uWSGI process that runs the Django code. They are connected by a UNIX socket.

Another decision has been to create a single worker on the container. This follows the minimalistic approach. Also, Prometheus (see below) doesn’t like being load balanced in round-robin fashion between several workers in the same server, as the metrics reported are inconsistent.

It is also entirely possible to run just uWSGI and create another container that runs nginx and handles the static files. I chose not to, because keeping both together creates a single HTTP server node. Exposing HTTP with uWSGI is not as good as with nginx, and you’ll need to handle the static files externally. Exposing the uWSGI protocol externally is complicated and will require some weird configuration in the nginx frontend. The chosen setup makes a totally self-contained, stateless web container that has the whole functionality.

The Database

The database container is mainly a PostgreSQL database, but a couple of details have been added to its Dockerfile.

After installing the database, we add our Django code, install it, and then run the migrations and load the configured fixtures. All of this happens at build time. This makes the base image contain an empty test database and a pre-generated general database, which helps with the quick setup of tests. To get a fresh database, just bring down the db container and restart it. No rebuild is needed unless there are new migrations or new fixtures.

In my experience with Django, as a project grows and migrations are added, it slowly takes more and more time to run the tests if the database needs to be regenerated and the fixtures loaded again. Even if the --keepdb option is used for the tests, sometimes a fresh start is required.

Another important detail is that this database doesn’t store data in a persistent volume, but just inside the container. It is not aimed at working as a persistent database, but at running quickly and being easy to regenerate into a known state. If you need to change the start data, change the fixtures loaded. Only put inside things you are ok losing.

As part of the setup, notice that the following happens. The database needs to be started and then another process, the Django manage.py, loads the fixtures. Then the database is shut down and the container exits. This is one of the cases where multiple processes need to run in a container. The shutdown is important, as ending the PostgreSQL process abruptly can lead to data corruption. Normally it will be corrected on the next startup of the database, but it will take a little time. It’s better to end the step cleanly.

Logs

Logging events is critical for a successful operation in a production environment, and typically it is done very late in the development process. I try to introduce logging as I run my unit tests and add different information while developing, which helps quite a lot in figuring out what’s going on in production servers.

A big part of the pain of logging is properly setting up the routing of the logs. In this case, there’s a dedicated log container running syslog, where all the logs from the server at INFO level and up are directed. The collected logs are stored in a file on the container that can be easily checked.

All the requests are also labelled with a unique id, using the Django log request id middleware. The id can also be forwarded through the X-REQUEST-ID HTTP header, and it will be returned in the responses. All the logs from a request will include this ID, making it easy to follow what the request has done.
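The idea of tagging every log line with the request id can be sketched with the standard logging module alone. This is not the implementation of the middleware used in the template: the class, the function and the logger name are made up for the example, and the sketch ignores concurrency, which a real middleware needs to handle.

```python
import io
import logging
import uuid


class RequestIDFilter(logging.Filter):
    """Attach a request id to every record that passes through the logger."""

    def __init__(self, request_id):
        super().__init__()
        self.request_id = request_id

    def filter(self, record):
        record.request_id = self.request_id
        return True


def handle_request(logger, headers):
    # Reuse the id forwarded in the header, or generate a new one
    request_id = headers.get("X-REQUEST-ID") or uuid.uuid4().hex
    request_filter = RequestIDFilter(request_id)
    logger.addFilter(request_filter)
    try:
        logger.info("Request started")
        logger.info("Request finished")
    finally:
        logger.removeFilter(request_filter)
    # The id would be returned to the caller in the response headers
    return request_id


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("[%(request_id)s] %(levelname)s %(message)s"))
logger = logging.getLogger("templatesite.example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

returned_id = handle_request(logger, {"X-REQUEST-ID": "abc123"})
print(stream.getvalue())
```

Both log lines carry the [abc123] prefix, so searching the logs for that id shows everything the request did.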

When running the unit tests, the DEBUG logs are also directed to the standard output, so they’ll show as part of the pytest run. Instead of using print in your unit tests while debugging a problem, try to use logging and keep what makes sense. This will keep a lot of useful information around when a problem arises in production.

Metrics

Metrics are another important part of a successful production service. In this case, the server exposes metrics to a Prometheus container.
It uses the prometheus-django module, and Prometheus exposes a dashboard.

There’s also included a Grafana container, called metrics-graph. Note that these two containers are being pulled from their official images, instead of including a tailored Dockerfile. The metrics container has some minimal configuration. This is because the only requirement is to expose the metrics in a Prometheus format, but creating dashboards or making more detailed work on metrics is out of the scope for this.

The good thing about Prometheus is that you can cascade it. It works by fetching the data from our web service itself (through the /metrics URL), and at the same time it exposes the same URL with all the data it collects. This makes it very easy to create a hierarchical structure where a container picks up information about a few servers and then exposes it to another one, which groups the data from a few Prometheus containers.

The Prometheus query language and metrics aggregation are very powerful, but at the same time they are very confusing initially. The included console has a few queries for interesting data in Django.

Handling dependencies

The codebase includes two subdirectories, ./deps and ./vendor. The first one is for your direct dependencies. That means your own code that lives in a different repo. This allows you to set up a git submodule and use it as an imported module. There’s a README file with some tips on using git submodules, as they are a little tricky.

The idea behind this is to avoid pulling from a private git repo inside a requirements file, which requires setting up authentication inside the container (adding ssh keys, git support, etc). I think it’s better to handle that at the same level as your general repo, and then import all the source code directly.

./vendor is a subdirectory that contains a cache of Python modules in wheel format. The service build-deps builds a container with all the dependencies stated in the requirements.txt file and precompiles them (along with all their sub-dependencies) into convenient wheel files. Then the wheel files can be used to speed up the setup of other containers. This is optional, as the dependencies will be installed in any case, but it greatly speeds up rebuilding containers or tweaking requirements.

Testing and interactive development

The container test runs the unit tests. It doesn’t depend on any external service and can be run in isolation.

The service dev-server starts the Django development web server. This reloads the code as it changes, as it mounts the local directory. It also logs the runserver logs into standard output.

The container system-test independently runs tests that generate external HTTP requests against the server. They are under the ./system-test subdir and are run using pytest.

Healthcheck

A Docker container can define a healthcheck to determine whether it is behaving properly, and take action if necessary. The application includes a URL root for the healthcheck that currently checks whether access to the database is correct. More calls to external services can be included in this view.
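The logic of such a view can be sketched as a function that runs a set of named checks and reports a status code. This is a simplified illustration, not the view in the template: the function names are hypothetical and an in-memory SQLite database stands in for PostgreSQL.

```python
import sqlite3


def database_check(connection):
    """Fail with an exception if a trivial query can't be executed."""
    connection.execute("SELECT 1").fetchone()


def healthcheck(checks):
    """Run every check and return (HTTP status code, failures)."""
    failures = {}
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:
            failures[name] = str(exc)
    status = 200 if not failures else 500
    return status, failures


if __name__ == "__main__":
    # Stand-in database (assumption): SQLite in memory
    connection = sqlite3.connect(":memory:")
    checks = {"database": lambda: database_check(connection)}
    print(healthcheck(checks))
```

Checks for other external services can be added as extra entries in the dictionary.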


The healthcheck uses curl to ping the local nginx URL, so it also tests that the routing of the request is correct.

Other ideas

Though this project aims to be production-ready, the discussion on how to do it is not trivial. Expect details to need changing depending on the system used to bring the containers to a production environment.
Docker has some subtleties that are worth paying attention to, like the difference between up and run. It pays off to read the documentation with care and understand the different options and commands.
Secrets management has been done through environment variables. The only real secrets in the project are the database password and the Django secret key. Secret management may depend on what system you use in production. Using Docker’s native secret support for Docker Swarm creates files inside the container that can be poured into the environment by the start scripts, with something like adding this to docker/server/start_server.sh:

export SECRET=`cat /run/secrets/mysecret`

The Django secret key is injected as part of the build as an ARG. Check that’s consistent in all builds at docker-compose.yaml. The database password is stored in an environment variable.
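The same file-based approach could also be handled from Python, for example in the settings. This is a minimal sketch assuming the secrets are mounted as files in the directory Docker uses by default (/run/secrets), with a fallback to environment variables. The function read_secret and the secret name are hypothetical.

```python
import os
from pathlib import Path


def read_secret(name, secrets_dir="/run/secrets", default=None):
    """Return a secret from a mounted file, or from the environment.

    The file secrets_dir/name takes priority. If it doesn't exist, the
    environment variable with the upper-case name is used instead.
    """
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        return secret_file.read_text().strip()
    return os.environ.get(name.upper(), default)


if __name__ == "__main__":
    # Hypothetical secret name, only for illustration
    print(read_secret("mysecret", default="not-set"))
```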

The ideal usage of containers in CI should be to work with them locally and, when pushed to the repo, trigger a chain that builds the container, runs the tests and promotes the build until deployment, probably in several stages. One of the best usages of containers is to be able to set them up and not change them all along the process, which was not as easy as it sounds with traditional build agents and production servers.
All ideas welcome. Feel free to comment here or in the GitHub repo.

UPDATE: I presented this template at PyCon Ireland 2017. The slides are available in this blog post.


Posted on February 25, 2017 by Jaime Buelta

Compendium of Wondrous Links vol XI

It has been a while since the last time. More food for thought!

Python

Python 3 upgrade strategy. The time has come to take migrating to python 3 seriously.
Another addition to Python-to-C++ compilers, in a similar way to Cython: Pythran. I tested it with code from my recent post $7.11 in four prices and the Decimal type, revisited and it worked quite well, though it’s arguably a very simple test. Speed was quite good, better than Cython, actually.
Powering the Python PyPI. These people deserve a lot of credit. PyPI is really important for the Python environment.
Checking the memory usage of a Python program is not very easy, but there are some tools to help with that.

The craft of development and tools

A good flow to choose the proper HTTP status code to return.
A good overview of available tools to check performance in the first 60 seconds after logging into a Linux server.
You’ll be glad when you have to check your logs. How to write a great error message.
I use ag (the silver searcher), but this is a great article about a similar tool built for speed: ripgrep. (And a follow-up from Geoff Greer, the maker of ag, making them more compatible, which is fantastic.)
SAWS: A Supercharged AWS CLI, to help dealing with all those AWS commands. There are a lot of them.

The environment of development

A take on Employee retention, even if they start by saying not to call it that.
How to destroy Programmer Productivity.
The website obesity crisis.
Inseparable from magic. Manufacturing modern computer chips. Really impressive.

Randomness is actually very difficult.

Computers are fast. Even if you see a lot of delays from time to time, they do a lot of stuff each second. Try to guess with this questionnaire.

Miscellanea

The perfect man. I find the lives of people so driven by competitiveness quite fascinating.
Dealing with incentives is always more complicated than it looks. The Cobra effect.

The resolution of the Bitcoin experiment. An interesting thought about Bitcoin.
A 1980 documentary about the music of Star Wars (mainly The Empire Strikes Back)
Magic the Gathering: Twenty years, twenty lessons. Great for insight into game design.


Posted on February 11, 2017 by Jaime Buelta

$7.11 in four prices and the Decimal type, revisited

I happened to take a look at this old post in this blog. The post is 7 years old, but it still presents an interesting problem.

“A mathematician purchased four items in a grocery store. He noticed that when he added the prices of the four items, the sum came to $7.11, and when he multiplied the prices of the four items, the product came to $7.11.”

I wanted to check my old solutions again, with the things that I have learned in the last few years. At that time, I was still a newbie in Python, and there are certainly interesting points to check again.

Python 2 vs Python 3

The first interesting difference is the fact that Python3 is now quite mature, and there’s really no excuse for not using it as a default for small exercises or new projects.

The code can actually be adapted to run in both Python 2 and 3 to make comparisons:

import operator
from itertools import combinations
from functools import reduce
from decimal import Decimal

def solve(number, step, number_of_numbers):
  integer_range = int(number / step)
  all_numbers = (Decimal(i) * step
                 for i in range(1, integer_range + 1))
  multiples = 1 / (step ** number_of_numbers)
  correct_numbers = (i for i in all_numbers
                     if (number * multiples % i) == 0)

  def check(numbers):
    """Check the numbers added and multiplied are the same"""
    return (sum(numbers) == number
            == reduce(operator.mul, numbers))
  # only combinations are required, as order is irrelevant
  comb = combinations(correct_numbers, number_of_numbers)
  results = [p for p in comb if check(p)]
  return results

print(solve(number=Decimal('7.11'), step=Decimal('0.01'),
            number_of_numbers=4))

I changed a couple of things, like the usage of filter and reduce and better usage of list comprehensions, but the code is very similar. Obtain the numbers that evenly divide the scaled total (as they are the only possible ones to fulfil the criteria), and then try all combinations.

I removed the internal time check, using instead the UNIX one, calling it from bash:

$ time python2.7 711.py
[(Decimal('1.20'), Decimal('1.25'), Decimal('1.50'), Decimal('3.16'))]

real 1m26.020s
user 1m22.199s
sys 0m1.124s

$ time python3.6 711.py
[(Decimal('1.20'), Decimal('1.25'), Decimal('1.50'), Decimal('3.16'))]

real 0m1.148s
user 0m0.955s
sys 0m0.030s

The time difference is quite staggering: around 1 second versus almost 90. They have clearly spent time optimising Decimal in Python 3.
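The solution printed above can be double-checked independently with a few lines, reusing the same Decimal comparison as the check function. The helper name satisfies is just for this example.

```python
import operator
from decimal import Decimal
from functools import reduce


def satisfies(prices, total):
    """Check that both the sum and the product equal the total."""
    return sum(prices) == total == reduce(operator.mul, prices)


solution = [Decimal('1.20'), Decimal('1.25'),
            Decimal('1.50'), Decimal('3.16')]
print(satisfies(solution, Decimal('7.11')))  # prints True
```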

Crunching numbers

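For the second version, the prices are expressed as integer cents. The four values must add up to 711 and, as each price is scaled by 100, their product must be 7.11 × 100^4 = 711000000. The code is like this: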

def calc():
  for w in range(711):
    for x in range(711 - w):
      for y in range(711 - w - x):
        z = 711 - x - y - w
        if (w * x * y * z == 711000000):
          print("w: %d x: %d y: %d z: %d" % (w, x, y, z))
          return

calc()

which, this time, is actually slower in Python 3, and slower than the first solution.

$ time python2.7 711_v2.py
w: 120 x: 125 y: 150 z: 316

real	0m7.963s
user	0m7.362s
sys	0m0.291s

$ time python3.6 711_v2.py
w: 120 x: 125 y: 150 z: 316

real	0m13.193s
user	0m11.955s
sys	0m0.461s

But, there are two tricks that can be used in this case.

The first one is simply replacing the interpreter with PyPy, which indeed speeds things up.

$ time pypy 711_v2.py
w: 120 x: 125 y: 150 z: 316

real	0m2.413s
user	0m0.177s
sys	0m0.094s

Not sure why, but the results ranged from ~200ms up to 4 seconds, which is quite a lot of variability.

And the second one is to use Cython to compile the code into a C extension. This requires making two files:

# This compiles the extension automatically
import pyximport; pyximport.install()
from c711v3 import calc
calc()

and a c711v3.pyx file, with the Cython code.

def calc():
  # define the variables as C types
  cdef int w, x, y, z
  for w in range(711):
    for x in range(711 - w):
      for y in range(711 - w - x):
        z = 711 - x - y - w
        if (w * x * y * z == 711000000):
          print("w: %d x: %d y: %d z: %d" % (w, x, y, z))
          return

This speeds up the process considerably, to a quarter of a second:

$ time python 711_v3.py
w: 120 x: 125 y: 150 z: 316

real	0m0.223s
user	0m0.138s
sys	0m0.070s

Not bad from 90 seconds at the start!

Conclusions

Python 3 has a lot of optimisations, sometimes in non-obvious places. It should be the default version unless there are legacy requirements.
It’s also quite simple to keep code compatible with both Python 2 and Python 3 for easy stuff.
I can look at code I wrote 7 years ago and find ways of improving it… Good to feel like I know more.
PyPy is another useful tool that should be considered. It’s also getting better and better, and now it has Python 3 compatibility as well. Also, my blog has been around for a while (though I don’t update it as often as I should).
Creating C extensions with Cython is very easy when dealing with certain bottlenecks.
And I still find joy in coding small exercises!

ffind v1.0.2 released!

The new version of ffind (1.0.2) is available on GitHub and PyPI. This version includes the ability to execute Python modules and scripts directly and some other minor improvements.

Happy developing!
