Notes regarding the proposed binary protocol from Facebook's hosted
memcached hackathon on 2007-07-09:

REQUEST STRUCTURE:

  * Magic byte / version
  * Cmd byte
  * Key len byte  (if no key, 0)
  * Reserved byte (should be 0)

  * 4 byte opaque id.  (will be copied back in response; means nothing to server)

  * 4 byte body length (network order; not including 12 byte header)

  [ cmd-specific fixed-width fields ]

  * key, if key length above is non-zero.

  [ cmd-specific variable-width field ]


RESPONSE STRUCTURE:

  * Magic byte / version (different from req's magic byte/version, to distinguish
    that it's a response for, say, protocol analyzers)
  * cmd byte (same as response it goes to)
  * err code byte (0 on success, else errcode.  hit bit set if fatal/non-normal error)
  * Reserved byte (should be 0)

  * 4 byte opaque id copied back from response

  * 4 byte body length (network order; not including 12 byte header)

  [cmd-specific body]


COMMANDS:  (for cmd byte)

  get    - single key get (no more multi-get; clients should pipeline)
  getq   - like get, but quiet.  that is, no cache miss, return nothing.

      Note: you're not guaranteed a response to a getq cache hit until
            you send a non-getq command later, which uncorks the
            server which bundles up IOs to send to the client in one go.

      Note: clients should implement multi-get (still important for
            reducing network roundtrips!) as n pipelined requests, the
            first n-1 being getq, the last being a regular
            get.  that way you're guaranteed to get a response, and
            you know when the server's done.  you can also do the naive
            thing and send n pipelined gets, but then you could potentially
            get back a lot of "NOT_FOUND!" error code packets.
            alternatively, you can send 'n' getqs, followed by an 'echo'
            or 'noop' command.

  delete
  set/add/replace

       cmd-specific fixed-width fields for set/add/replace:

           * 4 byte expiration time
           * 4 byte flags
           (the 4 byte length is inferred from the total body length,
            subtracting (keylen + body length))


