Notes about ShipItCon 2023

It’s that time of the year again, and another ShipItCon review is due!

I’ve talked about previous editions of this conference, which happens in Dublin roughly once a year (though it didn’t happen during the pandemic years), and I think it’s quite an interesting conference, especially as it revolves around the concept of Software Delivery in a broad sense. That makes for very different talks, from quite technical ones to others focused more on teamwork or high-level concepts.

Perhaps because I’m getting older, I appreciate that kind of “high-level” thinking, instead of the usual tech talks diving deep into some core nerdy tech. It’s difficult to do properly, but when it resonates, it resonates strongly.

The conference

The conference itself is held in The Round Room, a historic venue in Dublin which actually served as the meeting place of the first Dáil Éireann (Assembly of Ireland).

The whole red colour gives interesting danger undertones

The round space and the fact that you sit at round tables give it “awards ceremony” or “political fundraiser” vibes, rather than the classic impersonal hotel of most tech conferences. The place is full of character, and it allows the sponsors to be located in the same room instead of out in the hall.

There are some breaks to allow people to mingle and chat, but the agenda is quite packed. There was also an after-party, which I didn’t attend because I had stuff to do.

As in last year’s edition, CK (Ntsoaki Phakoe) acted as MC, introducing the different guests and keeping things flowing. It’s unconventional, but it helps keep momentum and makes everything more consistent.

The topic for this year was the unknown, and, in particular, the unknown unknowns.

Talks and notes

Some ideas I took from all the different talks

Keynote by Norah Patten

Things started quite high with Dr Patten, who is set to be the first Irish astronaut. She described her path, from her childhood in the West of Ireland, where she was fascinated with space, especially after she visited NASA. She studied aeronautical engineering in Limerick and later at the International Space University in Strasbourg, where she was exposed to international cooperation, which is key for space exploration.

She got involved in microgravity experiments and participated in Project Possum, a program to train scientists in microgravity and spaceflight. One of the highlights of the talk was seeing videos of her absolutely ecstatic when reaching microgravity in some of the flights; you can see that she really belongs out of Earth’s gravity.

Some interesting details she went over were the spacesuit training, where the suit is pressurised and there’s a lot of noise inside, as well as all her work inspiring other people, both helping experiments run in microgravity environments and acting as a general role model.

What’s slowing your team by Laura Tacho

Laura talked about the different ways of measuring the performance of teams, and how there are now some metrics frameworks like DORA or SPACE. But, she argued, those relate more to known unknowns than to unknown unknowns, following the topic of the conference.

All data is data

That includes things like opinions or 1:1 conversations.

She has recently studied a lot of teams, and she found common problems in most cases, in three areas:

  • Projects:
    • Too much work in progress
    • Lack of prioritisation
  • Process:
    • Slow feedback, both from people (like feedback on PRs) and from machines (like time for the CI pipeline to run)
    • Not enough focus time, like time free of interruptions
  • People:
    • Unclear expectations, which can lead to a micromanagement loop: tasks are defined too broadly, the expected work doesn’t get completed, the correction is more micromanaging, and the loop starts again.

None of the elements of the list will be surprising to anyone with some experience. The problem is not so much recognising these problems as acting against them effectively.

To continue with the work, she recommended DevEx. Interestingly, she also recommended the book Accelerate, which came up on a couple more occasions during the day. It’s a great book and I also recommend it.
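For reference, DORA boils delivery performance down to four metrics: deployment frequency, lead time for changes, change failure rate and time to restore service. Here is a minimal sketch of how the first three could be computed from deployment records (all the fields and numbers below are made up for illustration):

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: when the change was committed, when it was
# deployed, and whether the deployment caused a failure in production.
deployments = [
    {"committed": datetime(2023, 9, 1, 10), "deployed": datetime(2023, 9, 1, 16), "failed": False},
    {"committed": datetime(2023, 9, 4, 9),  "deployed": datetime(2023, 9, 5, 11), "failed": True},
    {"committed": datetime(2023, 9, 6, 14), "deployed": datetime(2023, 9, 6, 18), "failed": False},
]

period_days = 7

# Deployment frequency: deployments per day over the period.
frequency = len(deployments) / period_days

# Lead time for changes: average time from commit to running in production.
lead_times = [d["deployed"] - d["committed"] for d in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Change failure rate: share of deployments that caused a failure.
failure_rate = sum(d["failed"] for d in deployments) / len(deployments)

print(f"deploys/day: {frequency:.2f}, avg lead time: {avg_lead_time}, failure rate: {failure_rate:.0%}")
```

The point of Laura’s talk, of course, is that these numbers only cover the “known” side; the opinions and 1:1 conversations above don’t fit in a table like this.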

Large Language Models limitations by Mihai Criveti

This fell on the more “technical side” of the talks and, unfortunately, ran into some technical problems during the presentation. It discussed some of the problems with LLMs and some ways of mitigating them.

The interesting bits I took away were that LLMs are very, very slow (in Mihai’s words, hundreds of times slower than a 56K baud modem from 1995) and expensive. Expensive to operate, as being so slow requires a lot of hardware (around 20K in a cloud provider) for “someone” writing at a very slow pace, as we’ve seen. And prohibitively expensive to train with new data.

An LLM first requires a training phase, where it learns from a corpus of data. This is insanely expensive. Then there’s an operation phase, where it uses that training to answer a prompt. But LLMs don’t have memory; they can only generate information from their training. There are some tricks to “fake” having memory, like adding the previous questions and answers to the prompt. But the prompt has a small limit, and the total amount it can generate is around 7 pages of text, and it gets weirder the more it has to generate. This makes it difficult to create applications where either it has to understand specific information (as it needs to be trained for that) or it requires a relatively big input or output.
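As a rough sketch of that “fake memory” trick (the generate() function and the character limit below are hypothetical stand-ins, not any specific API): every question re-sends the conversation so far, trimming the oldest turns once the prompt gets too long.

```python
MAX_PROMPT_CHARS = 12_000  # stand-in for the model's context limit


def generate(prompt: str) -> str:
    """Placeholder for whatever LLM completion call you actually use."""
    return "(model output would go here)"


history: list[str] = []


def ask(question: str) -> str:
    history.append(f"User: {question}")
    # Rebuild the prompt from the whole conversation so far...
    prompt = "\n".join(history)
    # ...dropping the oldest turns once we hit the context limit,
    # which is exactly where the "memory" illusion starts to break down.
    while len(prompt) > MAX_PROMPT_CHARS and len(history) > 1:
        history.pop(0)
        prompt = "\n".join(history)
    answer = generate(prompt + "\nAssistant:")
    history.append(f"Assistant: {answer}")
    return answer
```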

Another interesting note is that LLMs work only with text, so anything that needs to interact with them has to be transformed into pure text, which makes it difficult to work with things like tables in a PDF, for example.

How to make your automation a better team player by Laura Nolan

Laura has given talks in previous years and they have always been interesting. She discussed some concepts going back to the 80s about operators (people operating systems, particularly in production environments) and automated systems, and how the theory of JCS (Joint Cognitive Systems) emerged.

This theory is based on the idea that automated systems and operators cooperate in complex ways and influence each other. For example, having an automated alarm can actually make a human operator relax and not perform checks in the same way, even if the alarm is low quality.

Looking through that lens, she commented on two antipatterns:

  • Automation surprise, the idea that automation can produce unexpected results or run amok. Some ideas to avoid these problems are:
    • Avoid scattered autonomous processing, like distributed cronjobs (one great piece of advice: avoid independent automatic OS updates). Instead, treat them as a proper maintained service.
    • Be as predictable as possible
    • Clearly display intended actions, including through status pages or logs
    • Allow operators to stop, resume or cancel actions to avoid further problems (there’s a small sketch of this after the list).
  • “Clippy for prod” or very specific recommendations and “auto-pilot” for operations.
    • “It looks like you want to reconfigure the production cluster, may I suggest you how?”.
    • Very specific recommendations can misdirect if they are wrong, so be careful.
    • Providing ways of understanding system behaviour is often more useful. For example, “Hard drive full, here are the biggest files” can lead to more insight than “Hard drive full, clean logs? Yes|No”
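As a small sketch of those first ideas (the hosts and actions below are invented): an automation loop that announces each intended action and lets the operator confirm, skip or abort before anything runs.

```python
# A sketch of "display intended actions and keep the operator in control".
# Hosts and actions are made up for illustration.

planned_actions = [
    ("db1.internal", "apply security patches and reboot"),
    ("db2.internal", "apply security patches and reboot"),
]


def run(host: str, action: str) -> None:
    # Placeholder for the real work; log it so intent and outcome stay visible.
    print(f"[automation] {host}: {action}")


for host, action in planned_actions:
    answer = input(f"About to '{action}' on {host}. [y]es / [s]kip / [a]bort: ").lower()
    if answer.startswith("a"):
        print("Aborted by operator; no further actions will run.")
        break
    if answer.startswith("s"):
        print(f"Skipped {host}.")
        continue
    run(host, action)
```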

Finally, she talked about two types of tools under JCS: amplifiers and prosthetics. Amplifiers are tools that let the operator do more, normally in a repeated way. For example, a tool like Fabric, which can connect to multiple hosts and execute commands on them. It doesn’t change the nature of the commands, which remain quite understandable. If it fails on one host but not on others, I can probably understand why.
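To make the amplifier idea concrete, here’s a minimal sketch using Fabric’s ThreadingGroup (the hostnames and deploy script are made up): the command stays plain and readable, and the per-host results are easy to inspect when one host misbehaves.

```python
from fabric import ThreadingGroup

# Hypothetical host list and deploy script, for illustration only.
hosts = ["web1.example.com", "web2.example.com", "web3.example.com"]
group = ThreadingGroup(*hosts)

# The command is plain shell that an operator could run by hand;
# Fabric only amplifies it across hosts, it doesn't hide what is being done.
results = group.run("bash /srv/myapp/deploy.sh", warn=True)

# Per-host results: if one host fails and the others don't, the exit code
# and output for that host are right there to reason about.
for connection, result in results.items():
    status = "ok" if result.ok else f"failed (exit code {result.exited})"
    print(f"{connection.host}: {status}")
```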

On the other hand, prosthetics are tools that totally replace the abstractions. On one hand, they allow you to perform things that would be impossible otherwise. But, at the same time, they make you totally dependent on the underlying automation. Here you have things like the AWS console. What’s going on inside? Who knows. If I try to create 10 EC2 servers and one fails, I can only guess why.

Laura suggested building amplifiers when possible and not hiding the complexities. I think it’s a bit analogous to the “leaky abstractions” problem. At the same time, in my mind I relate it to my way of looking at the cloud, where, in general, I prefer to use “understandable” blocks as much as possible (instances, databases, other storage, etc.) and build on top of that, rather than using complex “high-level rich services”. I may write a longer post about this at some point.

Granting developer autonomy while establishing secure foundations by Ciaran Carragher

Ciaran gave a talk focused on security, and how it should be looked at through the lens of allowing developers their autonomy while establishing secure paths and trying to simplify security. Or, better put: make the easy path the secure one.

Looking at different perspectives, there are some opposing views. Developers want immediate access when needed, to build and test without interrupting their flow, and to be able to perform new tasks as their work evolves. At the same time, from the security point of view, there are requirements to minimise risks, follow regulation, and ensure security stays a priority. So there’s a tension between letting teams remain performant and innovative, while keeping the system secure and, especially, customer data protected.

He went on to describe several examples from his experience and some actions taken. Most of that was quite AWS-specific, which seemed to be his area of expertise. His main recommendations were to find a balance, trying not to bother developers too much while providing them the means to be secure; and to engage with experts (in his case he talked about AWS, but I’m generalising a bit) as early as possible to make sure you don’t make silly mistakes. Security is hard!

Unpeeling the onion of uncertainty by Rob Meany

Rob described how our developer careers are filled with failures. I think that everyone can relate. Even the most successful software companies release way too many things (products, features) that don’t really get any traction. We’re talking about most software releases being a disappointment, even in the best of circumstances.

So we need to, first, embrace this fallibility and be prepared not to get too attached. And, second, try to fail faster, so we can confront our releases with reality as soon and as often as possible.

We spend too much time building the “wrong” solution in the “right” way

He has this idea of the “onion of uncertainty”, whose outside is more uncertain (more related to the real world and business) and which becomes more technical and manageable as you get inside. So a new feature will need to start outside, go inwards, and then come back out again to be confronted with reality. He created a list of principles to try to navigate it.

I think it follows the principles of Agile quite closely. But it was a very well constructed and thoughtful talk and he gave quite concrete examples. It was one of my favourite talks of the conference.

Just because you didn’t think of it doesn’t make it an edge case by Jamie Danielson

Following the unknown topic, Jamie talked about how there are a lot of events in a production system that you cannot really foresee, and that you need a proper observability setup to be able to detect and address them. She walked through a real example and gave a few recommendations:

  • Instrument your code
  • Use feature flags to allow quickly enabling and disabling new code (see the sketch after this list)
  • Iterate and improve in fast loops: reproduce, fix, rinse and repeat
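As a minimal sketch of the feature flag recommendation (the flag source and all the names below are invented): the new code is wrapped in a runtime check so it can be switched off without a redeploy, and the log line records which path actually ran.

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("checkout")


def flag_enabled(name: str) -> bool:
    # Hypothetical flag source: an environment variable here, but in practice
    # this would usually be a flag service or config that can change at runtime.
    return os.environ.get(f"FEATURE_{name.upper()}", "off") == "on"


def checkout(order_id: str) -> None:
    if flag_enabled("new_pricing"):
        logger.info("order=%s path=new_pricing", order_id)  # instrumented
        new_pricing_flow(order_id)
    else:
        logger.info("order=%s path=legacy_pricing", order_id)
        legacy_pricing_flow(order_id)


def new_pricing_flow(order_id: str) -> None:
    ...  # the new code being rolled out


def legacy_pricing_flow(order_id: str) -> None:
    ...  # the known-good path to fall back to
```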

Panel discussion moderated by Damien Marshall, with Laura Nolan, João Rosa, Diren Akko and Siún Bodley

The classic panel discussion about the topic of the conference: the unknown. Some of my notes:

  • Learning from failures
  • Importance of documentation
  • People as single points of success
  • Team’s autonomy is key to reduce risks
  • Think outside of your own area
  • Imagine, before the release, that a very bad thing had happened (“pre-mortem” meeting)
  • Importance of instrumentation
  • Prepare in advance for stressful days (releases, etc.): pre-scale
  • Monitor what your users are experiencing, to properly prioritise fixes. If users don’t access it, is it worth fixing?
  • Bring observability as part of the dev process

For some strange reason, after all the talk about the unknown, I was a bit disappointed that Damien Marshall didn’t sing “Into the unknown“. But that’s just me, I guess.

The hidden world of Kubernetes by David Gonzalez

David is a Google Developer Expert for GCP, and he went into the metaphor that Kubernetes is the equivalent of an airport. There can be many kinds of planes, but airports need to be standardised to accept all of them, and they are incredibly complex structures that streamline the operation of the planes.

Kubernetes is an API-as-DSL modelling of the concept of software deployment

If you are embarking on Kubernetes (and he argued that many companies don’t need to, and can work with a simpler orchestration approach), you need to transform your processes to fully embrace it, as some traditional software practices run counter to it.

He laid out a list of ideas to make it work:

  • Fully embrace the platform. That means both Kubernetes and the underlying cloud platform. If you are in AWS (or Azure, or GCP), go “all-in”
  • Deliver often and be ready to roll back
  • Don’t run Kubernetes if you don’t have the scale or resources
  • “Build it and they’ll come” doesn’t work. You’ll need an SRE team that deals with the platform, where 90% of the work should be automated, and the remaining 10% will be plenty to keep everyone occupied

He described the SRE team as key: they need to build the platform, but also evangelise about it and help teams with their manifests (again, fully embracing all the features that Kubernetes offers). The real value of an SRE team is that it can scale logarithmically while the rest of the company grows linearly. In non-technical terms, this means that a small team can support a pretty big organisation.

Innersource, open collaboration in a complex world by Clare Dillon

Clare talked about the concept of Innersource, which, as she described it, is “Open Source within a firewall“: using the concepts and practices of Open Source inside a company.

The main practices are that the code is visible to everyone in the company; there’s a big focus on documentation (like READMEs, roadmaps, etc.); contributions from outside the owning team are possible and even welcome (including details like enabling mentoring or code review); and that there’s community engagement.

Right now it’s gaining quite a lot of traction, thanks to different factors, like the increase in remote and hybrid work; the war for talent and the ability to attract people already used to working in Open Source; and the gains in code reusability and developer productivity.

Automating observability on AWS by Luciano Mammino

Luciano described an observability tool that he has been working on, aimed at serverless applications in AWS: SLIC Watch.

In essence, it’s a tool that enhances existing serverless application definitions (through CloudFormation) to add observability tooling like CloudWatch, X-Ray and other available options. It aims to solve 80% of the definition work by giving a good “starting point” that then needs to be tweaked.

It sounded like a great tool for that specific case, and they are working on expanding and improving it.

Final thoughts

As in previous years, and as you can see from the amount of notes I take (I can assure you that I have many more in my notebook), I think it’s a fantastic conference, full of thought-provoking ideas. I may not agree with all of them, but they make me think, which I appreciate.

If you are interested in the difficult art of releasing software and can be around Dublin in September, I definitely recommend attending. It’s an intense day, but full of ideas that will keep your head busy for a while afterwards!
