








                         Testing?  What testing?

                               Peter Miller
                           Platypus Technology


                                 ABSTRACT


            This  paper  presents  a  simplistic  yet powerful
            model of what a test is.  When you intend to  test
            your software, you have to design your software to
            be test[4mable[24m.  This paper will  examine  attributes
            of  software implied by this model.  Some examples
            of automated testing will be given.





       1.  What is a test?



       The core thesis of this paper is the idea[1] that a test
       consists of three things: a system in a defined state, a
       defined transaction, and a confirmation that the system
       arrives in a defined state.
       [40m[0m
                          initial[40m-------destination[0m
                           state[40mtransactionstate[0m



       This is an overly simplistic statement, but remains
       remarkably useful.  The "system" under test could be a
       simple object, a collection of interrelated objects, a whole
       application, or a distributed multi-layer client-server
       system.  Equally, the transaction could be a single byte of
       input, a single edge of a state transition diagram, or a
       series of transactions lumped together as a single event
       being considered.

       Confirming that the system under test has arrived in a
       particular state can be done in many ways.  Some states are
       clearly visible, sometimes they are available but not
       useful, and some internal states are not for user
       consumption and are much harder to access and therefore
       harder to confirm.

       ____________________

       1. There is a growing body of knowledge called "Transaction
          Based testing" or sometimes "Transaction Based
          Verification".
       
       [Figure: an initial state, the intended destination state,
       and an unintended "oops" state.]



       Please note that this is a _simplistic_ definition of a test.
       It does not cover all forms of testing  (such  as  tests  of
       usability,  maintainability,  portability, robustness and so
       on  which  make  up  the   other   zillion   software   sub-
       characteristics  listed in ISO 9126) and it is no substitute
       for a well thought out test plan.  It does, however, provide
       some language for talking about functional testing.


       22..  MMaannuuaall tteessttiinngg iiss nnoo tteessttiinngg



       Humans  are really bad at boring, repetitive tasks.  If your
       test plan  is  based  on  the  idea  that  your  staff  will
       faithfully  execute  a long list of printed instructions, at
       least once per release, then your testing  is  probably  not
       effective.

       For  example,  many manual test plans contain long sequences
       of things  the  operator  is  required  to  do,  often  with
       information  on the screen to be confirmed as correct.  This
       is all very well for successful tests, but what happens when
       one  fails?  Usually, these test scripts cover large numbers
       of behaviors.  There is thus a motivation  to  complete  the
       rest  of  the  script,  rather than stop, and have to do the
       start of the script again when the software has been  fixed.
       



       There  are  two  themes  here:  (a)  testers  have  to  look
       "productive" or they might not get paid, and (b) redoing the
       first bit again and again is boring.

       Let's look at that definition again, rephrasing what our
       manual test scripts are doing.  "Usually, these test scripts
       start from a defined state, and define a transaction and a
       confirmation of the destination state, then the next
       transaction and confirmation, _ad nauseam_."  Now, what
       happens when one of those confirmations fails?  Well, we
       know it's in _the wrong state_, so going on to execute the
       rest of the script, we are no longer fulfilling the initial
       portion of our three-part definition: we aren't in the
       defined state that the transaction is to be applied to.
       After the first failure, the rest of the results are _no
       information_.
       
                                     _o_o_p_s










       For effective testing, then, you need something that is very
       good  at  accurately repeating the same script over and over
       again, and  reporting  very  promptly  when  something  goes
       wrong.   Computers  are  very  good  at  boring, repetitious
       tasks.  They don't complain when you ask  them  to  run  the
       same stupid scripts tens or even thousands of times.  And if
       the script breaks, they stop.  For effective testing,  then,
       you need automated testing.  Let the humans _write_ the tests,
       and let the computers _run_ the tests.
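
       As a minimal sketch of "if the script breaks, they stop",
       the following shell fragment runs a list of steps and gives
       up at the first failure.  The function name run_steps and
       the three steps are hypothetical; the exit status of each
       step stands in for a confirmation.

```shell
#!/bin/sh
# Hypothetical sketch: run confirmations in order and stop at the first
# failure, because later results would come from an undefined state.

run_steps() {
    # Each argument is a command line; a non-zero exit status counts
    # as a failed confirmation.
    count=0
    for step in "$@"
    do
        count=$((count + 1))
        if ! sh -c "$step"
        then
            echo "step $count failed: $step"
            return 1
        fi
    done
    echo "all $count steps passed"
    return 0
}

# The third step is never run, because the second one fails.
if output=$(run_steps "true" "false" "echo should-not-run")
then
    status=0
else
    status=$?
fi
echo "$output"
```

       Everything after the failing step is skipped, so nothing is
       reported from a state the script never defined.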


       33..  SSooffttwwaarree AAttttrriibbuutteess



       Automated testing requires the ability to automatically  get
       the  system  under test into a defined state, the ability to
       automatically apply one or more transactions, and the ability
       to automatically confirm the current state (either read-and-
       compare, or write-and-diff, usually).

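       Some things are easy to test, e.g. a clone of sed (the file
       contents shown here are only illustrative):
        cat > test.in << 'EOF'
        hello world
        EOF
        cat > test.sed << 'EOF'
        s/hello/goodbye/
        EOF
        cat > expected-output << 'EOF'
        goodbye world
        EOF
        sed-clone -f test.sed test.in \
            > test.out
        diff expected-output test.out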

       But some things require some specific changes to get the
       three properties.  E.g. a virtual machine simulator needs
       the ability to set registers and stack, etc., and later to
       dump them so they can be confirmed.  This may be observable
       e.g. as some interesting opcodes only present in the
       simulator, and not the real machine, maybe to get the
       simulator to exit with a success/fail indicator.

       33..11  IInniittiiaall SSttaattee

       The system under test needs a way to be  placed  in  a  well
       defined initial state.  This is something that most programs
       are reasonably good at.  Word processors can  load  a  file,
       image processing systems can load an image, databases can be
       created and populated with test sets, etc.

       It was mentioned above that transactions can actually  be  a
       series of transactions.  Sometimes, getting the system under
       test into a defined state requires starting from the default
       state  and  applying a series of known-to-work transactions.
       Provided that you can _get_ the system under test into a
       defined state automatically, it can be tested automatically.
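
       Getting into a defined state automatically can be as simple
       as building a scratch directory with known contents.  A
       sketch, assuming mktemp -d is available (it is common, but
       not required by POSIX); the function name setup_state and
       the file name input.txt are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: build a defined initial state in a scratch
# directory, and refuse to go on if that cannot be done.

setup_state() {
    # Create a private work directory and fill it with known content.
    work=$(mktemp -d) || return 1
    printf 'alpha\nbeta\n' > "$work/input.txt" || return 1
    echo "$work"
}

work=$(setup_state) || { echo "could not set up initial state"; exit 2; }
trap 'rm -rf "$work"' EXIT

# The state is now defined: one file containing exactly two lines.
lines=$(( $(wc -l < "$work/input.txt") ))
echo "initial state has $lines lines"
```

       A fresh directory per test means that no test depends on
       what an earlier test left lying around.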

       33..22  TTrraannssaaccttiioonnss

       Automating transactions can often be  the  hardest  part  of
       automated  testing.   Usually,  this  means  automating  the
       simulation of input.  This could be user input, or a network
       connection,   or  a  hardware  simulation  for  an  embedded
       application.

       _3_._2_._1  _C_o_m_m_a_n_d _L_i_n_e

       The design  of  UNIX  makes  the  testing  of  command  line
       programs  relatively  simple, because you can redirect input
       from a file.  This means that you  don't  actually  need  to
       change your software (or not much, anyway).
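
       Redirection in practice: a sketch in which tr stands in for
       the command-line program under test (any filter that reads
       standard input would do).  The file names are illustrative,
       and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Hypothetical sketch: drive a command-line filter from a file.
# "tr" stands in for the program under test.

work=$(mktemp -d) || exit 2
trap 'rm -rf "$work"' EXIT

printf 'hello\n' > "$work/test.in"      # the defined input
printf 'HELLO\n' > "$work/expected"     # the expected destination state

# The transaction: input comes from a file, output goes to a file.
tr '[:lower:]' '[:upper:]' < "$work/test.in" > "$work/test.out"

# The confirmation: cmp -s is silent and reports through its exit status.
if cmp -s "$work/expected" "$work/test.out"
then
    result=pass
else
    result=fail
fi
echo "$result"
```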

       _3_._2_._2  _F_u_l_l _S_c_r_e_e_n

       Full-screen  programs  are  often  similar, with input again
       directed from a file, although  you  may  need  to  make  it
       tolerant  of  non-tty  input possibly under the control of a
       command line option.  The trickier cases can be handled with
       _e_x_p_e_c_t.

       _3_._2_._3  _G_U_I

       On the other hand, GUI interfaces are harder.  There are some
       utilities, such as TkReplay, which help.  But they lead us to
       looking at the problem differently: where can we inject the
       input?
       We can inject it into the X server (or have a fake X server
       which exists solely to provide test input).
       We can proxy the X server, and inject the input via the
       proxy.
       We can inject it into the event loop of our application.
       This, of course, requires changing the system under test.
       We can have alternate input classes, a "real" one and an
       "automated" one.  This, of course, means that the "real"
       input class doesn't get tested, but the rest of the system
       does, and that may be enough.

       _3_._2_._4  _C_l_i_e_n_t _S_e_r_v_e_r

       Most of the techniques useful for X programs work for client
       server systems as well: fake clients, fake servers,
       proxies, alternative input classes, etc.

       _3_._2_._5  _O_b_s_e_r_v_a_t_i_o_n

       In order to test the system, some aspect of it was changed:
       auxiliary test support, more tolerant input, multiple input
       sources.

       33..33  VVeerriiffyy SSttaattee

       Some programs, such as the sed example given above, are
       relatively easy to test.  Many programs store a significant
       amount of state when you save to a file, and this may be
       compared with diff(1) or cmp(1).  Other systems, however,
       are more challenging.

       _3_._3_._1  _F_u_l_l _S_c_r_e_e_n

       Many _c_u_r_s_e_s(3) programs need a special command to  dump  the
       screen into a text file for comparison using _d_i_f_f(1).  It is
       also possible to use _e_x_p_e_c_t in many cases.

       _3_._3_._2  _G_U_I

       Many of the input solutions also work for  output,  but  you
       will probably need special commands or options to get screen
       dumps at strategic moments, for comparison.

       Wholesale capture and comparison of  the  output  stream  is
       problematic,  usually  because of gratuitous differences not
       relevant to the test.
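
       One way to tame gratuitous differences, at least for textual
       output, is to mask the volatile fields before comparing.
       The sketch below is only a guess at what such a filter might
       look like: the log line format, and the idea that dates,
       times and process IDs are the only irrelevant fields, are
       invented for illustration.  A GUI protocol stream would need
       an analogous filter of its own.

```shell
#!/bin/sh
# Hypothetical sketch: mask volatile fields before comparing output.

normalize() {
    # Replace ISO-style dates, hh:mm:ss times and "pid NNN" fields
    # with fixed tokens, so that only meaningful text is compared.
    sed -e 's/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/DATE/g' \
        -e 's/[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/TIME/g' \
        -e 's/pid [0-9][0-9]*/pid N/g'
}

expected='DATE TIME pid N: saved 3 records'
actual=$(echo '2024-05-17 09:30:01 pid 4242: saved 3 records' | normalize)

if [ "$actual" = "$expected" ]
then
    result=pass
else
    result=fail
fi
echo "$result"
```

       The expected output is stored already normalized, so the
       same filter is applied to every run and the comparison stays
       a plain string or diff(1) comparison.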

       _3_._3_._3  _C_l_i_e_n_t _S_e_r_v_e_r

       You can use bogus clients, bogus servers, or clever proxies.

       _3_._3_._4  _O_b_s_e_r_v_a_t_i_o_n

       In order to test the system, some aspect of it was changed:
       auxiliary test support, captured output, multiple output
       destinations.


       44..  DDiissccuussssiioonn


       





       There  are  some  things  which  arise from consideration of
       these ideas.

       44..11  NNoo RReessuulltt

       In coming up with a  testing  regime,  it  is  necessary  to
       remember that tests do not simply _pass_ or _fail_.

       This  is  further  complicated by the inverted sense of some
       tests.  For example, your development  process  may  require
       that a bug fix be accompanied by a test which _fails_ on the
       unfixed system, and _passes_ on the fixed system.

       Consider the issues in achieving a necessary initial state
       by applying transactions to a default state.  What happens
       when one of these transactions, which are not the
       transaction under test, _fails_?  In such a case the test
       can't _fail_, because the bug fix case will give a false
       _positive_, but equally it can't succeed because this
       renders the test meaningless.

       The solution is to have a third result, often called _no
       result_, which when negated still means _no result_.

       Similar   problems   can  occur  with  the  transaction  and
       verification stages of the test.
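
       A sketch of the third result, using an assumed exit status
       convention of 0 for pass, 1 for fail and 2 for no result.
       The function names are hypothetical.  The point is that
       negate leaves 2 alone, so a bug-fix test can be run
       "inverted" against the unfixed system without a broken setup
       being mistaken for a success.

```shell
#!/bin/sh
# Hypothetical sketch of a three-valued test result.
# Assumed convention: exit status 0 = pass, 1 = fail, 2 = no result.

run_test() {
    # $1 = command that sets up the initial state
    # $2 = command that applies the transaction and confirms the result
    sh -c "$1" || return 2      # setup broke: neither pass nor fail
    sh -c "$2" || return 1
    return 0
}

status_of() {
    # Print the exit status of run_test instead of returning it.
    if run_test "$1" "$2"; then echo 0; else echo $?; fi
}

negate() {
    # Swap pass and fail; "no result" stays "no result".
    case "$1" in
    0) echo 1 ;;
    1) echo 0 ;;
    *) echo 2 ;;
    esac
}

failing_test=$(status_of "true" "false")
broken_setup=$(status_of "false" "true")
echo "failing test: $failing_test, negated: $(negate "$failing_test")"
echo "broken setup: $broken_setup, negated: $(negate "$broken_setup")"
```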

       44..22  NNeeggaattiivvee TTeessttiinngg

       Some other examples of negative testing will be given  (i.e.
       _d_i_d_n_'_t  arrive  in  the right state, or invalid transactions
       resulting in an invalid state change).

       44..33  WWaattcchh MMee

       A useful facility for creating tests is a "watch me" mode.
       This is a mode or tool or whatnot that allows the system to
       record inputs and outputs for replay and confirmation
       (respectively) at a later time.  While this is _not_ one of
       the necessary attributes, it is often a useful side effect.
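
       For programs that talk through standard input and output, a
       crude "watch me" can be improvised with tee(1).  A sketch,
       in which tr stands in for the system under test and the file
       names are illustrative; mktemp -d is assumed to be
       available.

```shell
#!/bin/sh
# Hypothetical sketch of a "watch me" recording followed by a replay.
# "tr" stands in for the system under test.

work=$(mktemp -d) || exit 2
trap 'rm -rf "$work"' EXIT

system_under_test() {
    tr '[:lower:]' '[:upper:]'
}

# Record: tee keeps a copy of the input and a copy of the output.
printf 'open\nedit\nsave\n' |
    tee "$work/recorded.in" |
    system_under_test |
    tee "$work/recorded.out" > /dev/null

# Replay: feed the recorded input back in and confirm that the
# output matches what was recorded.
if system_under_test < "$work/recorded.in" | cmp -s - "$work/recorded.out"
then
    result=pass
else
    result=fail
fi
echo "$result"
```

       The recording is only as good as the session it captured: a
       human still has to decide that the recorded output was the
       right output.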

       44..44  AAsssseerrtt

       This  simple  model of testing gives a different spin on the
       humble assert statement.  The use of assert can  be  thought
       of as verifying that the system is in a particular state, or
       that the transaction (input) is valid.  This is not the kind
       of artifact you _want_ to see in production code; it is
       usually compiled out of production code.
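
       The same idea can be sketched for shell scripts.  This is
       only an analogy: the assert function below is hypothetical,
       and the NDEBUG variable is borrowed from C, where defining
       NDEBUG compiles assert out.

```shell
#!/bin/sh
# Hypothetical sketch: an assert for shell scripts that can be switched
# off, in the spirit of C's assert() and NDEBUG.

assert() {
    # With NDEBUG set, the check is skipped entirely.
    if [ -n "${NDEBUG:-}" ]; then return 0; fi
    # Otherwise run the arguments as a command and complain if it fails.
    if "$@"; then return 0; fi
    echo "assertion failed: $*" >&2
    return 1
}

# Confirm that the system is in a sane state before going on.
balance=5
if assert [ "$balance" -ge 0 ]
then
    state_ok=yes
else
    state_ok=no
fi
echo "state ok: $state_ok"
```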




       



       44..55  TTrraaccee oonn RReeqquueesstt

       Another thing which is often compiled out of production code
       is  a  variety of tracing macros, which allow you to see the
       state  of  various  portions  of  the  system  as  they  are
       executed.  You sometimes see this in production systems,
       where there is little performance impact; it is an extremely
       useful feature for tech support, as well as testing.
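
       A minimal sketch of tracing on request.  The trace function
       and the APP_TRACE variable are hypothetical; the only point
       is that the trace calls stay in the code and cost almost
       nothing when switched off.

```shell
#!/bin/sh
# Hypothetical sketch: tracing that is silent unless asked for.

trace() {
    # Write to standard error only when APP_TRACE is set and non-empty.
    if [ -n "${APP_TRACE:-}" ]
    then
        echo "trace: $*" >&2
    fi
}

save_record() {
    trace "save_record: id=$1"
    echo "saved $1"
}

# Capture both output streams, once without and once with tracing.
quiet=$(save_record 7 2>&1)
loud=$(APP_TRACE=1 save_record 7 2>&1)
echo "$quiet"
echo "$loud"
```

       Because the trace goes to standard error, it can be captured
       for a test, or for tech support, without disturbing the
       normal output.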


       55..  TTeessttiinngg??  WWhhaatt tteessttiinngg??



       I once worked on an image processing system for which the
       company had partial source, and the inner workings were
       supplied as a library from the vendor.  One of the
       transforms had some trouble, and I fixed it, but then I
       wondered how I should test it.  How many of us can confirm
       visually that a 2D Walsh-Hadamard transform has worked
       correctly?  While the destination state was visible on the
       screen, if you give humans two side-by-side pictures (a
       "does it look like this" manual test) you will almost
       certainly get a false positive.  Consider those "find 10
       differences" cartoon pictures in the funnies section of the
       newspaper.  If humans are so bad at spotting _gross_
       differences, how can we expect them to find one pixel
       different in a million?  So, I looked for the tool to
       compare two images and tell me how many pixels were
       different.  _There wasn't one._  How did the vendor test
       their product?
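
       A crude version of the missing tool is nearly a one-liner
       when the images are uncompressed.  The sketch below assumes
       raw image data with one byte per pixel and files of equal
       size (assumptions made for the example, not details of the
       system described above), and counts differing bytes with
       cmp -l.

```shell
#!/bin/sh
# Hypothetical sketch: count differing bytes between two raw images.
# Assumes uncompressed data, one byte per pixel, and equal file sizes.

count_differences() {
    # cmp -l prints one line per differing byte, so counting the lines
    # counts the differences.  The arithmetic expansion strips any
    # padding that wc puts around the number.
    echo $(( $(cmp -l "$1" "$2" | wc -l) ))
}

work=$(mktemp -d) || exit 2
trap 'rm -rf "$work"' EXIT

# Two tiny stand-in "images" that differ in two places.
printf 'AAAAAAAAAA' > "$work/expected.raw"
printf 'AAAABAAAAC' > "$work/actual.raw"

differences=$(count_differences "$work/expected.raw" "$work/actual.raw")
echo "$differences pixels differ"
```

       A count of zero is something a script can confirm; "looks
       about right" is not.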

       If you have testability as a requirement of  your  software,
       you  will  write  different software than if testability was
       not a requirement.

       Do  all  the  tools  we  use  every  day  have  these  three
       properties: Can their initial state be loaded automatically?
       Can their transactions be applied automatically?  Can  their
       destination state be confirmed automatically?  If any one of
       these is missing (but usually the last one), what  gives  us
       any confidence that they were tested at all?














       


