An Overview of the Hasty Pudding Cipher


Rich Schroeppel & Hilarie Orman
rcs@cs.arizona.edu
July 1998

While we are all waiting for NIST to uncloak the new Advanced
Encryption Standard, we've cooked  up a tasty morsel: the Hasty Pudding
Cipher (HPC) is fast and flexible, and provides strong security.  A
bottle of Dom Perignon to anyone who cracks the cipher!

This is a public domain method with no known patents.  The code is not
copyrighted.

Hasty Pudding is a variable length block cipher.  The block size may
be *any* number of bits,  even fractional bit values are permitted.
The arbitrary block size means that anything can  be encrypted without
expansion.  It also means that large data units (files, email
messages,  etc.), if encrypted as single blocks, can have encryptions
that interrelate all of the constituent bits, not merely those that
are in contiguous 64 bit blocks, for example.

The key size may be any whole number of bits; the key space is larger
than any measure of the physical universe.  It is possible to use a
long-term key, with key lifetime not limited by the "amount of
material protected by one key" restriction.  Of course, other lifetime
limitations will still apply -- keys can be stolen, or leak, etc.

Hasty Pudding is fast, achieving 100 megabits per second on 1000-bit
blocks on a 233 MHz DEC Alpha.  The cipher works best on 64-bit
architectures, but runs acceptably fast on 32 bit machines.  For RISC
architecture machines, such as the Alpha, the cipher is remarkably
conservative of instruction usage: on the Alpha it uses about 20
instructions and 2-3 memory references per byte of input plaintext.

The Hasty Pudding Cipher has a security feature applied to each
datablock, the *spice*.  It may be regarded as a secondary key that
need not be concealed.  Use of the spice offers one of the
advantages of cipher block chaining mode (CBC), because the spice can
be changed very cheaply for each block encrypted,  yielding different
ciphertext even when the plaintext remains constant.  Spice offers an
advantage of over CBC, though, because the spice can be generated
quickly and used for parallel encryption of several blocks at once.

The cipher is fast enough to be used for many distributed system
application, such as video multicast, virtual memory for diskless
client machines, and encrypted file systems.  Interestingly, random
access is well supported, because each record (or file block) can be
encrypted with a different spice, perhaps based on block number or
customer account number.  For structureless text files, the block
number could be used as the spice.  Because there is no expansion
(ciphertext size is always identical to plaintext size), and no CBC
problem, a new block can be written over an old one without the
block-to-block chaining problem that CBC has.

The architectural dependencies of the cipher speed are minimal.  A
RISC machine with 64 bit registers and 64 bit memory operations, a
first-level data cache of at least 10K bytes, and an instruction cache
of at least 8K bytes will have the best performance, but similar 32
bit architectures will have acceptable performance as well.  The
cipher is quite miserly in its occasional use of multiplication and
other expensive instructions.  Pipeline busting conditional branches
are rare.

The disadvantages of the cipher include its complexity and resulting
use of a large amount of instruction memory.  Another disadvantage for
power-limited processing environments is its use of RAM: for each key
it requires 2K per blocksize range, and there are 5 blocksize ranges.

Performance Considerations

The highest speeds are for blocks of length 64 bytes and up. The
shorter blocksizes are slightly slower.  The speed for single bits is
much slower but still compares favorably with other methods.  In
actual practice, cipher speed is important for bulk encryption, but
less important for small data units, because the cost of processing
small blocks is generally dominated by application service
requirements (parsing, computation, disk access, etc.).  Thus, HPC is
engineered to be fastest where speed counts most.

The extra effort to safely include the spice impacts noticeably on the
time for short blocksizes; even if the spice is removed, the cipher
remains sluggish.  A user merely needing to encrypt single bits might
instead choose to encrypt 64 bits of zero at a time, thereby obtaining
64 bits to XOR with the target.

Because the cipher accesses the datablocks at random, very large
datablocks (e.g. one million 64 bit words) may cause translation
lookaside buffer misses that cause the performance to degrade
slightly.

The code size might exceed the size of the on-chip cache for some
architectures; this will have an adverse effect on performance.

When implemented on a 32-bit computer architecture, the speed will be
greatly affected by the operations that add, shift, and rotate,
because the testing for carry bits will cause the instruction pipeline
to stall.

The running time of the algorithm depends on several usage factors,
one of them being the length of the spice.  If the spice is shorter,
the algorithm will run faster.  This offers opportunities for
fine-tuning the speed of the algorithm to the security requirements of
the application.

When the cipher is used for encrypting data values that have a domain
size that is not a power of two, the running time of the algorithm is
probabilistic.  This is due to the method that Hasty Pudding uses for
mapping from the domain to a cipher range that is nearly equal in
size: encipherments are generated in a larger domain until a value
within the domain appears.  This assures that the cipher values have a
uniform range, the same as the plaintext domain values.  However, the
variance in the running time may not be acceptable for hard deadline
applications.

The five key expansion tables needed for each key imply that a data
cache of at least 10K bytes is normally required.  For specialized
applications, utilizing only one block size, only a single 2K table
need be generated; this reduces instruction space usage as well.  The
fractional bit techniques might also be dropped in some cases, further
reducing the instruction memory requirements. 

Design Considerations

Though this is a "kitchen sink" cipher, using many disparate tricks,
the irregularity has its advantages.  It prevents differential
cryptanalysis and makes linear analysis hard.  The analysis is harder
for both the good guys and the bad guys.  Even in the event of
practical quantum computing methods emerging in the future, Hasty
Pudding is likely to fare well because of its ad hoc design
complexity; it may well have too much state to represent in any
realizable quantum machine.  On the other hand, this complexity means
that the cipher cannot be represented compactly for today's computer
architectures, either, and the cipher will have a large instruction
footprint.

A design goal for the cipher is to have triple diffusion: one bit of
change in data or spice propagates three times through the cipher
state, changing every bit.  Some of the subciphers have diffusion
near five.  The extended-length subcipher has diffusion two, but
is compensated by the mixing rounds.

The Cipher uses the non-linearity of addition and exclusive OR to
achieve mixing of the bits.  The key expansion memory serves as an
additional non-linear function, mapping 8 bits to 64 bits.

The key expansion table is referenced at least 24 times during any
block encryption, utilizing 64 bits from the table on each access.
This ensures a liberal use of keying material in each block of output
ciphertext.

A single bit change in spice affects all ciphertext bits with nearly
equal probability.  In the event that Hasty Pudding is used with a
requirement for secrecy of the spice, this feature minimizes the
likelihood that cryptanalysis of the ciphertext could reveal the spice
values.

The cipher is designed to take advantage of instruction-level
parallelism, and the spice allows block level parallelism.


Overview of the Operation of the Hasty Pudding Block Cipher

The inputs to the main HPC routine are the following:

        Plaintext
        Output buffer (this is the output area)
        Length of the plaintext (in bits)
        Key
        Length of the key (in bits)
        Array of 5 pointers to key expansion tables
        Pointer to spice
        Multiple encryption count
        
Hasty Pudding is five different ciphers for five different block size
ranges: 

   tiny                   <=35 bits 
   short                  36-64 bits
   medium                 65-128 bits
   long                   129-512 bits
   extended               >=513 bits

Each cipher has the underlying Feistel structure (although it's
changed so much that Feistel probably wouldn't recognize it).  

The cipher key controls the key expansion table at the start of the
cipher.

The internal state of the cipher contains up to 8 variables of 64 bits
(512 bits), depending on the block size. The internal state of the
cipher is initially derived from the plaintext.

The cipher consists of a series of steps (a "step" is similar to a DES
"round") that alter the values of the internal state variables.  A
step alters each state variable at least once.  Each of the
assignments to a state variable is called a microstep, and each
microstep is a simple, reversible function of its inputs.
Many steps use eight bits of the state to select a word from the key
expansion table.  The word is mixed with some the state via the
microsteps, and the state is then "stirred" by mixing it with itself.

Every few steps, the Cipher uses the spice to alter the state.  This
pattern (cipher, spice) is run for a few steps; the spice is used a
minimum of five times for each block.

A microstep takes as inputs the state variable that will be altered,
and one or two other selected words from the state, and possibly one
or two words from the key expansion table.  The operations for
combining the inputs are selected from the following set: addition,
subtraction, exclusive or.  Often one of the inputs is shifted.

Decryption consists of reversing the microsteps.

The extended cipher works by interleaving a step with an operation
that exchanges a plaintext word with a state word.  This combination
of step and exchange is executed once for each word in the plaintext
array.  This is called a pass, and there are three passes in extended
mode.  Each pass uses the plaintext array in a different order.
Between passes, a few steps are executed to make sure that small
changes in the plaintext propagate throughout the ciphertext.  Some
steps use the spice.  Each step is invertible.

The inputs to the HPC function for fractional bit encryption are the
following:

        Plaintext (this is the integer that is to be encrypted)
        Size of the output range (an integer)
        Key
        Length of the key (in bits)
        Array of 5 pointers to key expansion tables
        Pointer to spice
        Multiple encryption count

The output of the fractional bit encryption is an integer between zero
and the size of the output range minus 1.


Key Expansion

There are five key expansion tables, one for each subcipher.  Each
table is an array of 256 64-bit words. The code for creating the
tables is the same for each subcipher.  The table initialization
depends on the subcipher number, which produces completely different
expansion arrays.

There are three inputs to the key expansion process: the key, the
subcipher number (one to five), and the low 64 bits of the length of
the key.  The inputs control the initialization of the array.  The
stirring process amplifies minor differences.  In the version of Hasty
Pudding coded below, the key expansions are done only when needed.

An application may use one key expansion table for all five
ciphers. In this case, the table for medium length is used.  This
carries some additional risk, since it allows the possibility that one
of the five ciphers could be broken, and the key expansion table
recovered, which would then expose traffic in the other four ciphers.

Modes

Because it is a block cipher, Hasty Pudding can be used in all the
usual block cipher modes.

The spice itself offers all the advantages of CBC mode as well some
unique advantages, such as parallel encryption.

Hasty Pudding can be used to simulate a stream cipher by initializing
the spice to zero, and then incrementing it for each item encrypted.
The efficiency is of this mode is poor for bits, but is tolerable for
bytes, and reasonable for 64-bit words.

If a "random" bit stream is all that is required, then encryption of
any convenient size of  integers will do.

Implementations of Hasty Pudding must include the option for multiple
encryption modes.  In the event that machine speeds or cryptanalysis
renders the single encryption mode less secure than is desirable, it
should be possible to easily change fielded versions to use double or
triple encryption mode.

The Spice

Each block can be encrypted with a unique contribution, the spice.
This is similar to an initialization vector (IV) for DES, in that it
need not be concealed and it assures that two identical plaintext
blocks will not have the same ciphertext.  The spice need not be used,
but if it is, it protects against block splicing attacks that are
common to the cipher block chaining (CBC) modes.

The spice also allows parallel encryption of several blocks, a
capability  that CBC prevents.  Both CBC and Hasty Pudding with spice
can be parallel decrypted.

The spice is long enough (512 bits) that it can be used to contain a
variety of "uniquifying" data.  These might include things like date
and time, inode number, user account number, filename, etc.  One word
can be reserved for the block number, so any file or data connection
will have unique encryption of every block sent.  Even if the
plaintext is a constant stream of zeroes, a different spice for each
block will cause the ciphertext to look random. 

Block splicing is not a problem, because moving a ciphertext block to
another place will cause it to decrypt to randomness.  This is better
than CBC, which can be spliced at the cost of two random decrypted
blocks.  

The spice can be shortened to fewer words, or deleted entirely, for
extra speed.  Unused portions of the spice are treated as 0.

Spice Caveat

Avoid using spice that is controlled by an opponent because the cipher
is not warranteed against a chosen-spice attack.  Though no such
attack is known at this time, it seems prudent to avoid this mode
absent more analysis.  The spice is not a substitute for a cipher key,
and the cipher should never be used in a mode where the key is known.


Security Analysis

I'm claiming a security level of 400 bits.  This means that an
attack will require 2^400 trials to succeed.  The claim is based on
the amount of intermediate state in the cipher, and the amount
of key expansion table used for each encryption.

Five separate key expansion (KX) tables are used.  Each is 256 64-bit
words.  One security goal is that a break in one algorithm which
reveals the KX table doesn't spoil the other sub-algorithms.  The KX
algorithm has been made deliberately lossy, so that an attacker who
learns a KX table cannot work backward to find the original key.

For single bits, knowing the encryption of zero implies the encryption
of one and vice-versa.  For somewhat larger blocks of size B, if the
attacker learns the encryptions of 2^B-1 values, then the last value
is determined, as is the parity of the defined permutation.

The step used in Hasty Pudding churns most of the bits.  A one-bit
change will be amplified to 2 or more bits changed.  After 9 steps, a
one-bit change can affect all 2^9 state bits.  These include the KX
lookup, and the variable shifts.

The cipher includes both linear and non-linear combining operations:
xor and add/subtract.  Xor is bitwise linear, but arithmetically
non-linear.  Viewed from an arithmetic perspective, add/subtract is
linear, but xor is non-linear.  (In both cases, shifts are nearly
linear.)

The KX lookup is highly non-linear.

Two other mixes are relevant:  Any single bit position within a word
(or the entire state) is mostly mixed with other words in the same
position.  The variable shifts, and the fixed shifts, guarantee 
diffusion along the bit-position dimension.

Finally, adjacent collections of bits must be broken up.  This is done
by allowing parts of words to fall of the ends during shifts.

State information flows from each word into each other word; three
steps gives a complete graph of words flowing into every other word
and themselves.

The spice is mixed in at a time when the state from the initial
plaintext (or final ciphertext) is so thoroughly mixed that nothing
can be learned.  If the attacker can somehow peer into the internal
state (or guess it), and vary the spice to cancel it, the other spice
uses will scramble the effort, foiling attacker's attempts to detect
the match.

If the attacker can guess most of the KX array, then a sufficient
number of known plaintext-ciphertext pairs will fill in the array.
Varying the spice will somewhat blunt this attack.

If 16384 bits of plaintext-ciphertext are known, along with the spice
values used, then the KX array is in theory determined.  If the spice
is fixed but unknown, 16896 bits (in theory) define both the KX and
the spice.

Although indefinite size keys are allowed, there are in effect only
2^16384 distinct keys.  (If all five tables are considered, then there
are 81920 bits of key.)

It is important that the key expansion table be generated by the
pseudorandom process in the cipher specification and not by direct
user input.  This is because a small change in the expanded key
might not have any effect on the ciphertext.  In such a case, an
attacker might guess which key words were used, and vary the spice to
try to solve for them.


Challenge

In keeping with the tradition of offering prizes for cryptographic
success, while reflecting my modest means, I (RCS) am offering a bottle
of Dom Perignon champagne for progress attacking Hasty Pudding.  I will
attempt to award a prize each year for the best work that comes to my
attention.  You don't have to break Hasty Pudding to win -- the best
paper may be analytical, or statistical, or attack a simplified
cipher, or the key expansion, or suggest an improvement to the cipher.
The prize work must be publicly available.  I'll make awards until the
cipher is broken or I have given ten prizes.