Featured posts


A robust distributed locking algorithm based on Google Cloud Storage

Many workloads nowadays involve multiple systems that operate concurrently. These range from microservice fleets to workflow orchestration to CI/CD pipelines. Sometimes it's important to coordinate these systems so that concurrent operations don't step on each other. One way to do that is by using distributed locks that work across multiple systems.

Distributed locks used to require complex algorithms or complex-to-operate infrastructure, making them expensive in terms of both cost and upkeep. With the emergence of fully managed and serverless cloud systems, this reality has changed.

In this post I'll look into a distributed locking algorithm based on Google Cloud Storage. I'll discuss several existing implementations and suggest algorithmic improvements in terms of performance and robustness.

Read more »

A better way to reason about software testing terms

I was recently in a discussion between developers about improving the test coverage of a major software project. They needed guidance about what kind of tests to write, and how to write them.

The discussion quickly became confusing: the topic was too large and too general, and there wasn't even a well-defined shared vocabulary for testing concepts! The latter turns out to be a wider problem: the software development community as a whole simply doesn't have well-defined testing terms.

In this post, I'd like to provide some guidance on this matter. I'll discuss:

  • An overview of the most common testing terms.
  • A new way of reasoning about testing concepts: reiterating what actually matters, and categorizing tests based on "size" and "approach".
  • How the existing testing terminology fits in this new model.

Read more »

Debugging Docker builds

One of the projects I'm working on has a CI/CD pipeline that builds Docker images. The Dockerfile runs yarn install, then yarn build. The latter runs the TypeScript compiler tsc. Everything was working fine, but one day the build failed with the following error:

tsc: command not found

But TypeScript was still part of package.json. Nobody had touched package.json or yarn.lock recently. Nobody could reproduce the problem locally with Docker: it only happened in the CI/CD pipeline. What was going on? We needed to debug the Docker build on the CI/CD server.

Read more »

What causes Ruby memory bloat?

Ruby apps can use a lot of memory. But why? Various people in the community attribute it to memory fragmentation, and provide two “hacky” solutions. Dissatisfied with the current explanations and the proposed solutions, I set out on a journey to discover the deeper truth and to find better solutions.

Read more »

Full-system dynamic tracing on Linux using eBPF and bpftrace

Linux has two well-known tracing tools:

  • strace allows you to see what system calls are being made.
  • ltrace allows you to see what dynamic library calls are being made.

Though useful, these tools are limited. What if you want to trace what happens inside a system call or library call? What if you want to do more than just logging calls, e.g. you want to compile statistics on certain behavior? What if you want to trace multiple processes and correlate data from multiple sources?

In 2019, there's finally a decent answer to that on Linux: bpftrace, based on eBPF technology. Bpftrace allows you to write small programs that execute whenever an event occurs.

This article shows you how to set up bpftrace and teaches you its basic usage. I'll also give an overview of what the tracing ecosystem looks like (e.g. "what's eBPF?") and how it came to be what it is today.

Read more »