Introducing Rump: hot-sync two Redis databases using dumps
Posted by Marco Lisci
We're thrilled to announce our first open source project: Rump!
Rump is a tiny tool focused on one simple thing: getting live data out of an AWS ElastiCache Redis cluster.
We faced this problem when trying to get our staging Redis containers in sync with our production cluster. At Sticker Mule we heavily use Docker and CoreOS, relying on an ElastiCache cluster for our Redis needs in production.
Lately we wanted to make our staging environment as close as possible to our production environment, and Redis is part of it. Here's the journey that ultimately led to Rump.
Don't block
We had one simple requirement: do not block production while getting the data. Redis is single-threaded, so any long-running command blocks every other client, which makes this an important aspect to take into account.
Surprisingly, we discovered that ElastiCache ships with some commands disabled: basically all the commands you could use to safely transfer data.
BGSAVE
The standard way of manually triggering a backup of a Redis database is issuing a BGSAVE and waiting for it to finish in the background, a non-blocking operation. Unfortunately BGSAVE is disabled on ElastiCache, unless you go with the AWS internal implementation of the snapshot feature.
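For context, a rough sketch of that flow against a self-managed Redis, using the redigo Go client and a placeholder host name (ElastiCache rejects the BGSAVE call because the command is disabled): issue BGSAVE, then poll LASTSAVE until the snapshot timestamp advances.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/gomodule/redigo/redis"
)

func main() {
	// Placeholder host: a self-managed Redis accepts this flow,
	// while ElastiCache rejects BGSAVE because the command is disabled.
	conn, err := redis.Dial("tcp", "redis.example.com:6379")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Timestamp of the last completed snapshot, so we can tell when a new one lands.
	before, err := redis.Int64(conn.Do("LASTSAVE"))
	if err != nil {
		log.Fatal(err)
	}

	// Fork a background save; the call returns immediately without blocking clients.
	if _, err := conn.Do("BGSAVE"); err != nil {
		log.Fatal(err)
	}

	// Poll until LASTSAVE advances, meaning the background save has finished.
	for {
		last, err := redis.Int64(conn.Do("LASTSAVE"))
		if err != nil {
			log.Fatal(err)
		}
		if last > before {
			break
		}
		time.Sleep(time.Second)
	}
	fmt.Println("background save completed")
}
```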
SLAVEOF
Setting up slaves is another interesting option Redis offers, and it would have been the perfect choice for us.
The plan was to temporarily set the staging Redis containers as slaves of our production cluster, so they would receive live data. Unluckily, SLAVEOF is disabled too: there's no way to add slaves to an ElastiCache instance.
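As a hypothetical sketch (same redigo client, placeholder host names), the plan would have looked roughly like this:

```go
package main

import (
	"log"

	"github.com/gomodule/redigo/redis"
)

func main() {
	// Placeholder host for the staging instance we control.
	staging, err := redis.Dial("tcp", "staging-redis:6379")
	if err != nil {
		log.Fatal(err)
	}
	defer staging.Close()

	// Point staging at the production endpoint (placeholder host) so it
	// receives a live copy of the data set. ElastiCache refuses this,
	// since SLAVEOF is disabled.
	if _, err := staging.Do("SLAVEOF", "prod-redis.example.com", "6379"); err != nil {
		log.Fatal(err)
	}

	// Once replication has caught up (in practice you'd check INFO replication
	// first), detach and keep the copied data.
	if _, err := staging.Do("SLAVEOF", "NO", "ONE"); err != nil {
		log.Fatal(err)
	}
}
```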
Existing tools
There are many awesome Redis tools around that try to simplify the administration of Redis servers, dumping to JSON, etc.
The problem is that most of the stable, maintained tools use the KEYS command to get the keys and then operate on them. KEYS has O(N) complexity and blocks Redis until all the keys are returned, which hurts when N is high. Our staging containers get created and destroyed frequently, and we have a good number of keys; we don't want to DoS our own server.
It was clear we needed a simple tool to just do the sync. We started playing with SCAN to get the keys, and DUMP/RESTORE to get/set values.
Each SCAN call is O(1), safe to run on a production server to get all the keys, and that's exactly why its implementation has to be different from KEYS: SCAN returns a group of keys currently present in the DB, plus a cursor pointing to the next group. DUMP/RESTORE make the job of reading/writing values independent from the key type.
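To make the idea concrete, here's a minimal, sequential sketch of that loop, using the redigo Go client and placeholder host names; it illustrates the approach, not Rump's actual code.

```go
package main

import (
	"log"

	"github.com/gomodule/redigo/redis"
)

func main() {
	// Placeholder hosts for the source (production) and destination (staging) servers.
	src, err := redis.Dial("tcp", "source.example.com:6379")
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	dst, err := redis.Dial("tcp", "destination.example.com:6379")
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	cursor := 0
	for {
		// Each SCAN call is cheap: it returns one batch of keys and the next cursor.
		values, err := redis.Values(src.Do("SCAN", cursor, "COUNT", 100))
		if err != nil {
			log.Fatal(err)
		}
		cursor, _ = redis.Int(values[0], nil)
		keys, _ := redis.Strings(values[1], nil)

		for _, key := range keys {
			// DUMP serializes the value regardless of its type...
			payload, err := redis.Bytes(src.Do("DUMP", key))
			if err != nil {
				log.Fatal(err)
			}
			// ...and RESTORE recreates it on the destination (TTL 0 = no expiry;
			// fails if the key already exists there).
			if _, err := dst.Do("RESTORE", key, 0, payload); err != nil {
				log.Fatal(err)
			}
		}

		// A cursor of 0 means the iteration is complete.
		if cursor == 0 {
			break
		}
	}
}
```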
With this in mind, here's what Rump brings to the table:
- Non-blocking, progressive key reading via SCAN.
- TYPE-independent value operations via DUMP/RESTORE.
- Pipelined SCAN and DUMP/RESTORE operations.
- Concurrent reading from the source server and writing to the destination server: Rump doesn't store all the keys before writing to the destination server (see the sketch after this list).
- Single cross-platform binary, no dependencies.
- Minimal footprint, UNIX philosophy: it does just one thing, with two flags.
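To illustrate the pipelining and concurrency points, here's a rough sketch, not Rump's actual source, of how such a pipeline can be wired up in Go with the redigo client and placeholder host names: one goroutine SCANs and DUMPs keys from the source and streams them over a channel, while the main goroutine pipelines RESTORE commands to the destination as they arrive.

```go
package main

import (
	"log"

	"github.com/gomodule/redigo/redis"
)

// entry carries one serialized key/value pair from the reader to the writer.
type entry struct {
	key     string
	payload []byte
}

// read SCANs the source in batches, DUMPs each key and pushes it on the channel.
func read(src redis.Conn, out chan<- entry) {
	defer close(out)
	cursor := 0
	for {
		values, err := redis.Values(src.Do("SCAN", cursor, "COUNT", 100))
		if err != nil {
			log.Fatal(err)
		}
		cursor, _ = redis.Int(values[0], nil)
		keys, _ := redis.Strings(values[1], nil)

		for _, key := range keys {
			payload, err := redis.Bytes(src.Do("DUMP", key))
			if err != nil {
				log.Fatal(err)
			}
			out <- entry{key, payload}
		}
		if cursor == 0 {
			return
		}
	}
}

func main() {
	// Placeholder hosts for source and destination.
	src, err := redis.Dial("tcp", "source.example.com:6379")
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	dst, err := redis.Dial("tcp", "destination.example.com:6379")
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	// A small buffered channel means only a window of keys is ever held in memory.
	entries := make(chan entry, 100)
	go read(src, entries)

	// Pipeline the writes: Send buffers RESTORE commands locally, Flush ships a
	// batch to the server, then we drain the replies to catch errors.
	pending := 0
	flush := func() {
		if err := dst.Flush(); err != nil {
			log.Fatal(err)
		}
		for ; pending > 0; pending-- {
			if _, err := dst.Receive(); err != nil {
				log.Fatal(err)
			}
		}
	}

	for e := range entries {
		if err := dst.Send("RESTORE", e.key, 0, e.payload); err != nil {
			log.Fatal(err)
		}
		pending++
		if pending == 100 {
			flush()
		}
	}
	flush()
}
```

The buffered channel is what keeps the footprint small: at most one channel's worth of dumped values is in flight at any time, instead of the whole key space.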
We hope the tool will be useful to those experiencing the same troubles we had, and many, many thanks to Redis for supporting such a wide array of commands!
P.S. If this is of interest to you, and you're seeking an organization where talented people enjoy working, we are hiring.