=========================
Tips for parallel fuzzing
=========================

  (See README for the general instruction manual)

1) Introduction
---------------

Every copy of afl-fuzz will take up one full CPU core. This means that on an
n-core system, you can safely run up to n different fuzzing jobs with virtually
no performance hit.

In fact, depending on the design of the fuzzed program, you may be able to run
one fuzzer per virtual core on an HT-enabled system at very little cost.

(Conversely, if you stick to a single job on a multi-core machine, you will be
underutilizing the hardware, as indicated by the CPU load widget in the bottom
right corner of the afl-fuzz status screen.)
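
If you are not sure how many jobs a box can take, a rough starting point is the
number of online logical CPUs. The sketch below is only that: job_count is a
made-up helper name, and the figure includes HT siblings, so treat it as an
upper bound and watch the CPU load before committing to it.

```shell
#!/bin/sh
# Print a suggested number of afl-fuzz jobs: one per online logical CPU.
# job_count is a hypothetical helper, not part of afl-fuzz.
job_count() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  # Fall back to a single job if the value is empty, zero, or not a number.
  case "$n" in
    ''|0|*[!0-9]*) n=1 ;;
  esac
  echo "$n"
}

echo "Suggested number of afl-fuzz jobs: $(job_count)"
```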

When targeting multiple unrelated binaries or using the tool in "dumb" (-n)
mode, it is perfectly fine to run fully separate instances of afl-fuzz - be it
on a single machine or across a large fleet. The picture gets more complicated
when you want to have multiple fuzzers hammering a common target: if a
hard-to-hit but interesting test case is synthesized by one fuzzer, the
remaining instances will not be able to use that input to guide their work.

To help with this problem, afl-fuzz offers a simple way to synchronize test
cases on the fly.

2) Single-system parallelization
--------------------------------

If you wish to parallelize a single job across multiple cores on a local
system, simply create a new, empty output directory ("sync dir") that will be
shared by all the instances of afl-fuzz; and then come up with a naming scheme
for every instance - say, "fuzzer01", "fuzzer02", etc. 

Run the first one ("master", -M) like this:

$ ./afl-fuzz -i testcase_dir -o sync_dir -M fuzzer01 [...other stuff...]

...and then, start up secondary (-S) instances like this:

$ ./afl-fuzz -i testcase_dir -o sync_dir -S fuzzer02 [...other stuff...]
$ ./afl-fuzz -i testcase_dir -o sync_dir -S fuzzer03 [...other stuff...]

Each fuzzer will keep its state in a separate subdirectory, like so:

  /path/to/sync_dir/fuzzer01/

Each instance will also periodically rescan the top-level sync directory
for any test cases found by other fuzzers - and will incorporate them into
its own fuzzing when they are deemed interesting enough.

The only difference between the -M and -S modes is that the master instance
will still perform deterministic checks, while the secondary instances will
proceed straight to random tweaks. If you don't want to do deterministic
fuzzing at all, it's OK to run all instances with -S. (Running multiple
instances with the -M flag would be a waste of CPU time, however.)
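
The naming scheme and the one-master rule described above are easy to script.
Here is a minimal sketch that only prints the command lines; gen_cmds is a
made-up helper, and the directory and binary names are the placeholders used
earlier in this section.

```shell
#!/bin/sh
# Print one afl-fuzz command line per instance: the first gets -M, the rest -S.
# Arguments: instance count, input dir, sync dir, then the target command.
# gen_cmds is a hypothetical helper; it starts nothing by itself.
gen_cmds() {
  count=$1 in_dir=$2 sync_dir=$3
  shift 3
  i=1
  while [ "$i" -le "$count" ]; do
    if [ "$i" -eq 1 ]; then role=-M; else role=-S; fi
    printf './afl-fuzz -i %s -o %s %s fuzzer%02d %s\n' \
      "$in_dir" "$sync_dir" "$role" "$i" "$*"
    i=$((i + 1))
  done
}

gen_cmds 3 testcase_dir sync_dir ./fuzzed/binary @@
```

To actually start the jobs, you would run each printed line in its own
terminal or screen window.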

WARNING: Exercise caution when explicitly specifying the -f option. Each fuzzer
must use a separate output file; otherwise, things will go south. One safe
example may be:

$ ./afl-fuzz [...] -S fuzzer10 -f file10.txt ./fuzzed/binary @@
$ ./afl-fuzz [...] -S fuzzer11 -f file11.txt ./fuzzed/binary @@
$ ./afl-fuzz [...] -S fuzzer12 -f file12.txt ./fuzzed/binary @@
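
If the command lines are generated by a script, a quick pre-flight check can
catch a reused -f path before anything is started. A small sketch, assuming you
can feed it the planned paths one per line (dup_files is a made-up name):

```shell
#!/bin/sh
# Read planned -f paths on stdin, one per line, and print every path that
# appears more than once. Empty output means each instance has its own file.
# dup_files is a hypothetical helper, not part of afl-fuzz.
dup_files() {
  sort | uniq -d
}

# The three paths from the example above do not collide, so this prints nothing.
printf '%s\n' file10.txt file11.txt file12.txt | dup_files
```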

3) Multi-system parallelization
-------------------------------

The basic operating principle for multi-system parallelization is similar to
the mechanism explained in section 2. The key difference is that you need to
write a simple script that performs two actions:

  - Uses SSH with authorized_keys to connect to every machine and retrieve
    a tar archive of the /path/to/sync_dir/<fuzzer_id>/queue/ directories for
    every <fuzzer_id> local to the machine. It's best to use a naming scheme
    that includes the host name in the fuzzer ID, so that you can do:

    for s in {1..10}; do
      ssh user@host${s} "tar -czf - sync/host${s}_fuzzid*/queue" >host${s}.tgz
    done

  - Distributes and unpacks these files on all the remaining machines, e.g.:

    for s in {1..10}; do
      for d in {1..10}; do
        test "$s" = "$d" && continue
        ssh user@host${d} 'tar -kxzf -' <host${s}.tgz
      done
    done

There is an example of such a script in experimental/distributed_fuzzing/.

When developing custom test case sync code, there are several optimizations
to keep in mind:

  - The synchronization does not have to happen very often; running the
    task every 15-30 minutes or so may be perfectly fine.

  - There is no need to synchronize crashes/ or hangs/; you only need to
    copy over queue/*.

  - It is not necessary (and not advisable!) to overwrite existing files;
    the -k option in tar is a good way to avoid that.

  - There is no need to fetch directories for fuzzers that are not running
    locally on a particular machine, and were simply copied over onto that
    system during earlier runs.

  - For large fleets, you will want to consolidate tarballs for each host,
    as this will let you use n SSH connections for sync, rather than n*(n-1).

    You may also want to implement staged synchronization. For example, you
    could have 10 groups of systems, with group 1 pushing test cases only
    to group 2; group 2 pushing them only to group 3; and so on, with group
    10 eventually feeding back to group 1.

    This arrangement would allow interesting test cases to propagate across
    the fleet without having to copy every fuzzer queue to every single host.

  - You do not need a "master" instance of afl-fuzz on every system; you can
    run them all with -S, and just designate a single process somewhere within
    the fleet to run with -M.
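
The staged arrangement mentioned in the list above boils down to a ring: every
group pushes to the next one, and the last group wraps around to the first. A
minimal sketch of that mapping, with next_group as a made-up helper and the
group count of 10 taken from the example:

```shell
#!/bin/sh
# Print the group that group $1 pushes to in a ring of $2 groups (1-based).
# next_group is a hypothetical helper used only for illustration.
next_group() {
  echo $(( $1 % $2 + 1 ))
}

total=10
g=1
while [ "$g" -le "$total" ]; do
  echo "group $g -> group $(next_group "$g" "$total")"
  g=$((g + 1))
done
```

A sync script could use the result to pick which hosts receive a given
group's tarballs, instead of looping over every other host.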

It is *not* advisable to skip the synchronization script and run the fuzzers
directly on a network filesystem; unexpected latency and unkillable I/O
wait processes can mess things up.

4) Remote monitoring and data collection
----------------------------------------

You can use screen, nohup, or something equivalent to run remote instances of
afl-fuzz. If you redirect the program's output to a file, it will automatically
switch from a fancy UI to more limited status reports. There is also basic
machine-readable information always written to the fuzzer_stats file in the
output directory.
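
For scripted monitoring, fuzzer_stats can be read with standard tools. The
sketch below assumes the file consists of "key : value" lines padded with
spaces; stat_value is a made-up helper, and the exact set of keys may differ
between versions, so check a real file before relying on a particular name.

```shell
#!/bin/sh
# Print the value stored under one key in a fuzzer_stats file.
# Usage: stat_value <file> <key>
# stat_value is a hypothetical helper; it assumes "key : value" lines.
stat_value() {
  awk -v key="$2" '{
    k = $0
    sub(/ *:.*/, "", k)            # keep only the text before the first colon
    if (k == key) {
      v = $0
      sub(/^[^:]*: */, "", v)      # keep only the text after the first colon
      print v
      exit
    }
  }' "$1"
}
```

For instance, "stat_value sync_dir/fuzzer01/fuzzer_stats execs_per_sec" would
print the execution speed recorded by that instance, if the key is present.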

Use the status screen of the master (-M) instance to monitor the overall
fuzzing progress and decide when to stop. You can also rely on that
instance's output directory to collect the synthesized corpus that covers
all the noteworthy paths discovered anywhere within the fleet. The secondary
(-S) instances do not require any special monitoring, other than just making
sure that they are up.

Keep in mind that crashing inputs are *not* automatically propagated to the
master instance, so you may still want to monitor for crashes fleet-wide
from within your synchronization or health checking scripts.
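
One way to do that is to count the files in every crashes/ subdirectory on each
host. A minimal sketch, assuming saved crash files have names starting with
"id" (which also skips the README.txt that afl-fuzz drops into crashes/);
count_crashes is a made-up helper name:

```shell
#!/bin/sh
# Count crash files under every fuzzer subdirectory of a sync dir.
# Usage: count_crashes <sync_dir>
# count_crashes is a hypothetical helper; it assumes crash file names
# start with "id".
count_crashes() {
  find "$1" -path '*/crashes/id*' -type f 2>/dev/null | wc -l | tr -d ' '
}
```

A sync or health checking script could run this on each machine (locally or
over SSH) and raise an alert whenever the number goes up.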

5) Closing remarks
------------------

It is perhaps worth noting that all of the following are permitted:

  - Running some of the synchronized fuzzers with different (but related)
    target binaries. For example, simultaneously stress-testing several
    different JPEG parsers (say, IJG jpeg and libjpeg-turbo) can have
    synergistic effects and improve the overall coverage.

    (In this case, running one -M instance per binary can be worthwhile.)

  - Having some of the fuzzers invoke the binary in different ways.
    For example, 'djpeg' supports several DCT modes, configurable with
    a command-line flag. In some scenarios, this can slightly improve
    coverage.

  - Much less convincingly, running the synchronized fuzzers with different
    starting test cases (e.g., progressive and standard JPEG). The
    synchronization mechanism ensures that the test sets will get fairly
    homogeneous over time, but it introduces some initial variability.
