Practical use of recursion when scanning Redis keys

The best-known way to get all keys matching a pattern from a Redis database is KEYS <pattern>. At the same time, developers should not use this command on production systems:

Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. 

The proper way is to use the SCAN command, which is the Redis way of doing cursor-based scans.
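To make the cursor mechanics concrete before touching Redis, here is a minimal, Redis-free sketch. `ScanDemo` and `scan_page/2` are names I made up; the paging logic only simulates what the server does: each call returns `{next_cursor, batch}`, and a returned cursor of 0 means the iteration is complete.

```elixir
defmodule ScanDemo do
  # A fake keyspace to page through.
  @keys Enum.map(1..10, fn i -> "my-key-#{i}" end)

  # Stand-in for one SCAN round-trip: returns {next_cursor, batch}.
  # A next_cursor of 0 signals that the scan has finished.
  def scan_page(cursor, count) do
    batch = Enum.slice(@keys, cursor, count)
    next = cursor + count
    {if(next >= length(@keys), do: 0, else: next), batch}
  end

  # Follow the cursor until the "server" hands back 0.
  def all_keys(cursor \\ 0, acc \\ []) do
    {next_cursor, batch} = scan_page(cursor, 4)

    case next_cursor do
      0 -> acc ++ batch
      _ -> all_keys(next_cursor, acc ++ batch)
    end
  end
end

ScanDemo.all_keys()
# => ["my-key-1", "my-key-2", ..., "my-key-10"]
```

With a real Redis server, `scan_page/2` would be replaced by an actual SCAN call through a client; the loop shape stays the same.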

The cursor-based scan is also an excellent opportunity to use recursion, especially in Elixir. It should be easy enough, given that Erlang (and thus Elixir) implements tail-call optimization.

The solution I came up with uses Redix to interact with Redis.

  @max_scan_count 1000

  def process_scan(key, filters, cursor, func) do
    command = ["SCAN", to_string(cursor), "MATCH", key, "COUNT", to_string(@max_scan_count)]

    {:ok, [new_cursor, keys]} = Redix.command(:cache, command)

    # filter_by/2 is a private helper (not shown) that applies the given filters.
    keys =
      keys
      |> filter_by(filters)

    case new_cursor do
      "0" ->
        func.(keys)

      _ ->
        func.(keys) ++ process_scan(key, filters, new_cursor, func)
    end
  end

The interesting argument is func, which can be any function to be applied to the found keys.

Let's say we want to count all keys in our DB matching the pattern my-key-123*. We can call our recursive function as follows:

process_scan("my-key-123*", [], 0, fn found_keys ->
  [length(found_keys)]
end)
|> Enum.reduce(0, fn res, acc -> acc + res end)

This snippet will go through all keys in the DB that match the given pattern and run our callback on each batch, so the result is a list with the length of each batch. We then reduce that list to get the total number of matching elements.
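For instance, suppose the scan produced three batches (the sizes below are invented for illustration). Each callback returned a one-element list, process_scan concatenated them with ++, and the final reduce collapses the list into a single total:

```elixir
# Hypothetical per-batch lengths, as concatenated by process_scan.
batch_lengths = [412, 398, 190]

# The final reduce sums them into the total number of matching keys.
total = Enum.reduce(batch_lengths, 0, fn res, acc -> acc + res end)
# total == 1000
```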

The callback function can be more sophisticated than that. For example, we can invalidate the found keys like so:

  defp invalidate_by_keys(keys) do
    case Redix.command(:cache, ["DEL"] ++ keys) do
      {:ok, result} -> {:ok, result}
      _ -> {:error, :no_key}
    end
  end
and use it in our recursive function:

process_scan(key, filters, 0, fn found_keys ->
  case invalidate_by_keys(found_keys) do
    {:ok, res} -> [res]
    {:error, _} -> [0]
  end
end)
So, as a result, we receive a list whose elements indicate how many records were invalidated in each batch returned by the SCAN command.

So far, this approach has been working fine on a DB with a few million items. Note that performance is not a critical requirement for my use case, but it performs well enough. I assume this is thanks to how efficiently the BEAM handles recursion.
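One caveat worth noting: because func.(keys) ++ process_scan(...) still has work to do after the recursive call returns, process_scan as written is body-recursive rather than a true tail call. The BEAM handles body recursion well at these depths, but if you want a genuinely tail-recursive version, an accumulator-passing rewrite is straightforward. The sketch below is Redis-free: next_batch/1 is a made-up stand-in for the Redix SCAN round-trip, returning {new_cursor, keys} with "0" marking the end of the scan.

```elixir
defmodule TailScan do
  # Made-up stand-in for the SCAN call: maps a cursor to {new_cursor, keys}.
  # A returned cursor of "0" means the iteration is complete.
  defp next_batch(cursor) do
    pages = %{
      "0" => {"17", ["a", "b"]},
      "17" => {"42", ["c"]},
      "42" => {"0", ["d", "e"]}
    }

    Map.fetch!(pages, cursor)
  end

  # Tail-recursive: all pending work lives in the accumulator, so the
  # recursive call is the very last thing the function does.
  def process_scan(cursor, func, acc \\ []) do
    {new_cursor, keys} = next_batch(cursor)
    acc = acc ++ func.(keys)

    case new_cursor do
      "0" -> acc
      _ -> process_scan(new_cursor, func, acc)
    end
  end
end

TailScan.process_scan("0", fn keys -> [length(keys)] end)
# => [2, 1, 2]
```

The trade-off is purely stylistic here: both shapes visit every batch exactly once, but the accumulator version keeps the stack flat regardless of how many pages the scan returns.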