Duck! Another blag incoming!

libchirp or Software is infinite

2018-12-13

I want to thank Adfinis-SyGroup who have supported me and allowed me to develop libchirp.

Writing a message-passing library and some of the things I learned. In multiple installments.

It always takes longer

Four years ago I decided we needed something like libchirp. I wanted a core that is safe, lightweight, high-performance and as portable as possible. I had this polyglot cloud-software toolkit in mind that bridges all language barriers in software. I was inspired by what I learned about Erlang. I thought that by now we would have many daemons, bindings and upper-layer protocols, but in fact I have an awesome C99 implementation and good bindings for Python. I have created the foundation.

You can't have the penny and the bun

I wanted to have two conflicting properties on more than one occasion. For example:

  • Have a bounded message-queue
  • All peers are synchronous

For performance, memory safety and protection against a peer being spammed, I designed libchirp around a fixed-size message queue. It allocates the queue with a fixed size when a connection opens and then doesn't have to allocate anything during operation. This means existing connections stay operational in low-memory situations. Generally, libchirp handles a failing malloc() gracefully.
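To make the idea concrete, here is a minimal Python sketch of a fixed-size slot pool. It is not libchirp's code: the class SlotPool and its methods are hypothetical names, and the sketch assumes that a message which does not fit is simply refused. All buffers are created once, and afterwards acquiring or releasing a slot never grows the pool.

```python
class SlotPool:
    """Hypothetical fixed-size pool: every slot is allocated once, up front."""

    def __init__(self, slots, slot_size):
        # The only buffer allocations happen here, when the "connection" opens.
        self._buffers = [bytearray(slot_size) for _ in range(slots)]
        self._lengths = [0] * slots
        self._free = list(range(slots))  # indices of unused slots
        self.slot_size = slot_size

    def acquire(self, payload):
        """Copy payload into a free slot; return its index, or None if refused."""
        if len(payload) > self.slot_size or not self._free:
            return None  # assumption: push back instead of growing the queue
        index = self._free.pop()
        self._buffers[index][:len(payload)] = payload  # same-length slice copy
        self._lengths[index] = len(payload)
        return index

    def read(self, index):
        """Return the message stored in the given slot."""
        return bytes(self._buffers[index][:self._lengths[index]])

    def release(self, index):
        """Hand the slot back so it can be reused."""
        self._free.append(index)


pool = SlotPool(slots=2, slot_size=16)
first = pool.acquire(b"ping")
second = pool.acquire(b"pong")
third = pool.acquire(b"overflow")  # the pool is full, so this is refused
```

The point of the design is that a full pool pushes back on the sender instead of asking the allocator for more memory.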

Now if all peers are synchronous you can end up with a lock-up situation:

  • Two peers want to send an ACK message to each other
  • But both peers have a full queue, so they can't accept the ACK message
  • When they can't accept the ACK message they can't free a slot in the queue [1]
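The lock-up above can be reproduced with a toy model. This is a simplified sketch, not how libchirp works internally: Peer, try_ack and simulate are hypothetical names, and the model assumes that an ACK only needs one free slot at the receiver for a moment.

```python
class Peer:
    """Toy peer with a bounded queue; each held slot still waits for its ACK to go out."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.slots = ["msg"] * load  # messages currently occupying slots

    def can_accept(self):
        return len(self.slots) < self.capacity


def try_ack(sender, receiver):
    """Deliver one ACK if the receiver has room; only then free the sender's slot."""
    if sender.slots and receiver.can_accept():
        sender.slots.pop()
        return True
    return False


def simulate(load_a, load_b, capacity, max_rounds=100):
    """Run until nothing moves; return the slots still occupied on each side."""
    a, b = Peer(capacity, load_a), Peer(capacity, load_b)
    for _ in range(max_rounds):
        moved_a = try_ack(a, b)
        moved_b = try_ack(b, a)
        if not (moved_a or moved_b):
            break
    return len(a.slots), len(b.slots)


stuck = simulate(2, 2, capacity=2)    # both queues full: nothing ever moves
drained = simulate(2, 1, capacity=2)  # one free slot is enough to drain both
```

In this model a single free slot on either side is enough to untangle everything, which is why the problem only shows up when both queues are full at the same moment.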

I had about 5 plans to remedy the situation, but everything I tried only moved the problem farther away. I do randomized testing using hypothesis, and it was always able to find new lock-up situations. Yes, hypothesis rules! Other plans would violate my fixed-memory property and open the door for DoS-attacks somehow. You just can't have both.

Since I value the performance, memory-safety and DoS-prevention properties of libchirp, I pondered whether it is indispensable for all peers to be synchronous. Synchronous in this context means you can't lose a message without getting an error (an exception in Python).

It turns out that most of the time an asymmetric approach is absolutely sufficient, and if it isn't, you can always use timeout-based bookkeeping of messages/requests/responses.
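Timeout-based bookkeeping could look roughly like the following sketch. Bookkeeper and its methods are hypothetical and not part of libchirp or its Python bindings. Time is passed in explicitly to keep the example deterministic; real code would probably use time.monotonic().

```python
class Bookkeeper:
    """Hypothetical tracker for requests that were sent without waiting for an ACK."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._pending = {}  # request id -> deadline

    def sent(self, request_id, now):
        """Remember a request and the time by which a response is due."""
        self._pending[request_id] = now + self.timeout

    def answered(self, request_id):
        """Forget a request once its response arrived; False if it was unknown."""
        return self._pending.pop(request_id, None) is not None

    def expired(self, now):
        """Return and forget all requests whose deadline has passed."""
        late = sorted(rid for rid, deadline in self._pending.items() if deadline <= now)
        for rid in late:
            del self._pending[rid]
        return late


book = Bookkeeper(timeout=5.0)
book.sent("job-1", now=0.0)
book.sent("job-2", now=1.0)
book.answered("job-1")
lost = book.expired(now=10.0)  # job-2 never got a response
```

With this kind of bookkeeping the producer notices a lost message by itself, so it does not need the peer to acknowledge every message.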

Rule of thumb:

  • Consumers (workers) are not synchronous
  • Producers are synchronous if they don't do bookkeeping
  • If you route messages from a synchronous producer, you want to be synchronous too: Timeouts get propagated to the producer.

If the producer requests an acknowledgement (it is synchronous), then the consumer signals that it has finished the job after sending the response. So the response is put on the wire before the acknowledgement. Therefore, if the producer reads the acknowledgement and there is no response yet, it is clearly an error.
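The ordering argument can be sketched in a few lines. This is an illustration under assumptions, not libchirp code: the wire is modelled as an ordered deque, and consumer_reply and producer_read are made-up names.

```python
from collections import deque


def consumer_reply(wire, request_id, result):
    """Put the response on the wire first, then the acknowledgement."""
    wire.append(("response", request_id, result))
    wire.append(("ack", request_id, None))


def producer_read(wire):
    """Read the wire in order; an ack without an earlier response is an error."""
    responses, errors = {}, []
    while wire:
        kind, request_id, payload = wire.popleft()
        if kind == "response":
            responses[request_id] = payload
        elif kind == "ack" and request_id not in responses:
            errors.append(request_id)
    return responses, errors


good_wire = deque()
consumer_reply(good_wire, 1, "done")
ok_responses, ok_errors = producer_read(good_wire)

# Simulate a misbehaving consumer that acknowledges without responding.
bad_wire = deque([("ack", 2, None)])
bad_responses, bad_errors = producer_read(bad_wire)
```

Because the transport is assumed to preserve order, the producer never has to wait and guess: once the acknowledgement is there, the response either arrived before it or is missing.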

If no response is needed, an acknowledgement means that the consumer has done its work, for example committed the data to a database. So for the producer, no error means the transaction was successful.

[1] You might ask: why can't we release the slot of the message that triggered the acknowledgement? 1. The information about where to send the acknowledgement is stored in that slot. 2. The user gets a callback when the acknowledgement has been sent; to identify the callback they need an identity, which is stored in the slot. Yeah, the memory-safety property really makes things complicated, but it's so worth it, believe me. Chirp is more or less as fast as only calling the needed syscalls, with almost no overhead. Also, it keeps on sending messages when you are out of memory. Did I mention that we wanted to use chirp for monitoring? If your server is out of memory, it will be able to tell you about it. By default libchirp will allocate more memory if the message is larger than the allocated slot, but you can disable this.

This entry was tagged as libchirp message-passing lessons python c99