Short FAQ:
----------

   @version $Id: FAQ.txt,v 1.25 2007/05/21 07:45:07 hauk Exp $


1. Q: Monit watches processes by a pid file, so if a program crashes
      without removing its pid file, then monit won't recognize it,
      right?

   A: Monit always checks that the pid in a pid file belongs to a
      *running* process. If a program crashes and dies in a "normal"
      manner, the process ID (pid) will no longer exist and monit will
      know that the program is not running (and restart it), even if
      the pid file still exists. Some servers *can* crash and leave a
      zombie process behind, and so appear to be running. Monit also
      tests for zombie processes and will raise an alert if a process
      has become a zombie.
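
      The pid file check can be approximated by hand. What follows is
      a minimal bash sketch, not monit's actual implementation: the
      function name pid_is_running is made up for this example, and
      unlike monit it does not detect zombies. It also assumes that it
      runs as root or as the owner of the process, since kill -0 fails
      for processes owned by other users.

```shell
#!/bin/bash
# pid_is_running PIDFILE
# Return 0 if the pid stored in PIDFILE belongs to a live process.
# (Hypothetical helper for illustration; not part of monit.)
pid_is_running() {
    local pidfile="$1" pid
    # No readable pid file means we cannot tell; treat as not running
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # Reject empty or non-numeric content
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 sends nothing; it only checks that the process exists
    kill -0 "$pid" 2>/dev/null
}
```

      For example: pid_is_running /tmp/xyz.pid && echo "still running"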


2. Q: I want to watch the FOO server, but monit does not support the
      FOO protocol, and a FOO server won't send a welcome message
      that can easily be checked.

   A: Just use the default port connection check. In most cases this
      check is more than good enough. Simply put:

       if failed port FOO's-portnumber

      in the monitrc file. Monit will then open a connection to this
      port (TCP or UDP) and check that it is possible to send and
      receive data on the port (via the select system call, or by
      testing for an ICMP error in the case of UDP). If the connection
      fails or the port does not respond, monit will restart the
      program. As of monit version 4.1 you can also write your own
      protocol tests using send and expect strings (see the manual),
      although this requires the server to use a text-based protocol.
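
      Before adding the port check to monitrc, it can help to confirm
      by hand that the port accepts connections at all. The following
      bash sketch is only a rough approximation of the TCP part of the
      check (no UDP, no send or receive). The function name
      port_is_open is made up for this example, and it assumes a bash
      built with /dev/tcp support and the coreutils timeout command.

```shell
#!/bin/bash
# port_is_open HOST PORT [TIMEOUT_SECONDS]
# Return 0 if a TCP connection to HOST:PORT can be opened in time.
# (Hypothetical helper for illustration; not part of monit.)
port_is_open() {
    local host="$1" port="$2" wait="${3:-5}"
    # bash treats /dev/tcp/HOST/PORT as a TCP connection request;
    # the child shell exits non-zero if the connection fails
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

      For example: port_is_open localhost 8080 || echo "no answer"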


3. Q: I have a program that does not create its own pid file. Since
      monit requires all programs to have a pid file, what do I do?

   A: Create a wrapper script and have the script create a pid file
      before it starts the program. Below you will find an example
      script for starting an imaginary program (a Java program in this
      case).  Assuming that the script is saved in a file called
      /bin/xyz, you can call this script from monit by using the
      following in monitrc:

      check process xyz with pidfile /tmp/xyz.pid
        start = "/bin/xyz start"
        stop = "/bin/xyz stop"


          --8<--- (cut here)

          #!/bin/bash
          export JAVA_HOME=/usr/local/java/
          export DISPLAY=localhost:0.0
          CLASSPATH=ajarfile.jar:.

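          case $1 in
           start)
            # Record this shell's pid in the pid file
            echo $$ > /tmp/xyz.pid
            # exec replaces this shell with java, so the pid stays valid;
            # both stdout and stderr go to the log file
            exec java -cp "${CLASSPATH}" org.something.with.main \
              >/tmp/xyz.out 2>&1
            ;;
           stop)
            kill $(cat /tmp/xyz.pid)
            ;;
           *)
            echo "usage: xyz {start|stop}"
            ;;
          esac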
   
          --8<---- (cut here)


4. Q: Tomcat (the Jakarta servlet container) does not create a pid
      file and puts the server in the background.

   A: Edit the catalina.sh script and remove the '&' character that
      puts the Tomcat server in the background. Then call Tomcat's
      startup.sh and shutdown.sh scripts from a wrapper script like
      the one shown above.

      Alternatively, if your catalina.sh contains lines that refer to
      $CATALINA_PID, you can simply set the CATALINA_PID environment
      variable to the path of the pid file, for example
      CATALINA_PID=/path/file.pid.


5. Q: I have started monit with HTTP support but when I telnet into
      the monit http port the connection closes.

   A: If you use the host allow statement, monit will promptly close
      all connections from hosts it does not find in the host allow
      list. Make sure that you use the official name of your host or
      its IP address. If you have a firewall running, also make sure
      that it does not block connections on the monit port.


6. Q: I'm having trouble getting monit to execute any "start" or
      "stop" program commands.  The log file says that they're being
      executed, and I can't find anything wrong when I run monit in
      verbose mode.

   A: Monit did start the program, but for some reason the service
      died later. Before we go on and introduce you to the fine art of
      system debugging, it's worth noting that:

      For security reasons monit purges the environment and only sets
      a spartan PATH variable that contains /bin, /usr/bin, /sbin and
      /usr/sbin. If your program or script dies, the reason could be
      that it expects certain environment variables to be set, or
      expects to find certain programs via the PATH. If this is the
      case, set the environment variables you need directly in the
      start or stop script called by monit.
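
      One way to find out whether the purged environment is the
      culprit is to run the start script by hand under similar
      conditions. The following bash sketch only approximates what
      monit does, based on the description above. The function name
      run_like_monit is made up for this example.

```shell
#!/bin/bash
# run_like_monit COMMAND [ARGS...]
# Run COMMAND with an empty environment except for the spartan PATH
# described in this answer. (Hypothetical helper; an approximation
# of monit's behaviour, not its actual code.)
run_like_monit() {
    env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin "$@"
}
```

      For example, run "run_like_monit /etc/init.d/myprocess start".
      If the script works from your login shell but fails here, a
      missing environment variable or PATH entry is a likely cause.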


7. Q: How can I run monit from init so it can be respawned in case monit
      dies unexpectedly?

   A: Use either the 'set init' statement in monit's configuration
      file or the -I option on the command line. Here's a sample
      /etc/inittab entry for monit:

       # Run monit in standard runlevels
       mo:2345:respawn:/usr/local/sbin/monit -Ic /etc/monitrc

      After you have modified init's configuration file, you can run
      the following command to make init re-examine /etc/inittab and
      start monit:

       telinit q

      If monit is used to monitor services that are also started at
      boot time (e.g. services started via SYSV init rc scripts or via
      inittab), a race condition can occur in some situations. If a
      service is slow to start, monit can assume that the service is
      not running, try to start it and raise an alert, while in fact
      the service is about to start or is already in its startup
      sequence. If you experience this problem, here are a few
      strategies you can use to prevent this type of race condition:


      A) Start critical services directly from monit:

        This is the recommended solution: let monit take over the
        responsibility for starting services. To use this strategy you
        must turn off the system's automatic start and stop for all
        services handled by monit.

        On RedHat, you can for example use:

          chkconfig myprocess off

        On Debian:

          update-rc.d -f myprocess remove

        A general example:

          mv /etc/rc2.d/S99myprocess /etc/rc2.d/s99myprocess

        If monit is started from an rc script, then to stop the
        service at system shutdown, add the following line to monit's
        rc script:

          /usr/local/bin/monit -c /etc/monitrc stop myprocess

        or if monit handles more than one service, simply stop all
        services by using:

          /usr/local/bin/monit -c /etc/monitrc stop all

        If monit is instead started from init, add a second line to
        inittab to stop the service:

          mo:2345:respawn:/usr/local/bin/monit -Ic /etc/monitrc
          m0:06:wait:/usr/local/bin/monit -Ic /etc/monitrc stop myprocess

        or to stop all services handled by monit:

          mo:2345:respawn:/usr/local/bin/monit -Ic /etc/monitrc
          m0:06:wait:/usr/local/bin/monit -Ic /etc/monitrc stop all

        Services handled by monit *must* have start and stop methods
        defined so monit can start and stop a service. For instance:

          check process myprocess with pidfile /var/run/myprocess.pid
            start program = "/etc/init.d/myprocess start"
            stop program = "/etc/init.d/myprocess stop"
            alert foo@bar.baz


      B) Make init wait for a service to start:

        This solution makes the init process wait for the service to
        start before it continues to start other services. If you are
        running monit from init, you must put monit's line at the end
        of /etc/inittab. A short example:

          si::sysinit:/etc/init.d/rcS
          ...
          l2:2:wait:/etc/init.d/rc 2
          ...
          mo:2345:respawn:/usr/local/bin/monit -Ic /etc/monitrc

        The rc script for the monitored service must be *modified* so
        that it does not return until the service has started or the
        start of the service has timed out. Creative use of sleep(1)
        may be sufficient.

        As in the above example, services handled by monit must have
        start and stop methods defined.
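
        The "creative use of sleep(1)" could look roughly like the
        following bash sketch, which an rc script might call right
        after launching the service. It is only an illustration: the
        function name wait_for_service and the 30 second default are
        assumptions made for this example, and it treats "pid file
        exists and its pid is alive" as "service started".

```shell
#!/bin/bash
# wait_for_service PIDFILE [TIMEOUT_SECONDS]
# Poll once per second until PIDFILE holds the pid of a live process.
# Return 0 on success, 1 if the timeout expires first.
# (Hypothetical helper for illustration; not part of monit.)
wait_for_service() {
    local pidfile="$1" timeout="${2:-30}" waited=0 pid
    while [ "$waited" -lt "$timeout" ]; do
        if [ -s "$pidfile" ]; then
            pid=$(cat "$pidfile" 2>/dev/null)
            case $pid in
                ''|*[!0-9]*) ;;   # ignore empty or garbled content
                *) kill -0 "$pid" 2>/dev/null && return 0 ;;
            esac
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

        For example, at the end of the start branch of the rc script:
        wait_for_service /var/run/myprocess.pid 60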


      C) Enable the service monitoring manually from monit:

	  check file myprocess.pid with path /var/run/myprocess.pid
		if timestamp > 5 minutes then
		   exec "/bin/bash -c '
			/usr/bin/monit -c /etc/monitrc monitor myprocess;
			/usr/bin/monit -c /etc/monitrc unmonitor myprocess.pid
			'"
	  check process myprocess with pidfile /var/run/myprocess.pid
		start program = "/etc/init.d/myprocess start"
		stop program = "/etc/init.d/myprocess stop"
		alert foo@bar.baz
		mode manual

        This will cause monit to wait for 5 minutes before it will
        enable monitoring of the service myprocess.



8. Q: Why is monit not able to gather process data from 64-bit
      applications on Solaris?

   A: Most probably monit was compiled as a 32-bit application, and
      32-bit applications cannot read /proc data for 64-bit
      applications. Furthermore, access to procfs is not supported in
      large file environments. Thus, you must compile monit with
      64-bit support. You will need gcc version 3.0 or later; we have
      successfully tested monit with gcc version 3.1. Do the
      following:
      
       * "configure" monit with 64-bit support (examples):

           gcc [sparc]:
           ./configure \
               --with-ssl-incl-dir=/usr/sfw/include \
               --with-ssl-lib-dir=/usr/sfw/lib/64 \
               CFLAGS='-m64 -mtune=v9' \
               LDFLAGS='-m64 -mtune=v9'

           gcc [amd64]:
           ./configure \
               --with-ssl-incl-dir=/usr/sfw/include \
               --with-ssl-lib-dir=/usr/sfw/lib/64 \
               CFLAGS='-m64 -mtune=opteron' \
               LDFLAGS='-m64 -mtune=opteron'

           Sun Studio [sparc]:
           ./configure \
               --with-ssl-incl-dir=/usr/sfw/include \
               --with-ssl-lib-dir=/usr/sfw/lib/64 \
               CFLAGS='-xarch=v9' \
               LDFLAGS='-xarch=v9'

           Sun Studio [amd64]:
           ./configure \
               --with-ssl-incl-dir=/usr/sfw/include \
               --with-ssl-lib-dir=/usr/sfw/lib/64 \
               CFLAGS='-xarch=amd64' \
               LDFLAGS='-xarch=amd64'

       * "make" as usual:
           make

      Note that in order to successfully link a 64-bit application you
      will also need 64-bit versions of all libraries (e.g. libflex,
      libssl and libcrypto). Thus, it might be necessary to point the
      library path at your 64-bit libraries by passing their location
      to make, e.g. "LDFLAGS='-L/usr/local/lib/sparcv9'".  This might
      apply to other unices, too.



9. Q: How do I set up Monit to run from daemontools?

   A: Use the following script:

       --8<--
       #!/bin/bash

       mkdir -pm 755 /service/monit/log

       cat << EOF > /service/monit/run
       #!/bin/bash
       echo Starting Monit
       exec /usr/bin/monit -Ic /etc/monitrc 2>&1
       EOF

       # optional if you want monit to log to multilog
       cat << EOF > /service/monit/log/run
       #!/bin/bash
       exec multilog t ./main
       EOF

       chmod 755 /service/monit/run /service/monit/log/run
       --8<--

      Monit will be started automatically by supervise.

      The above script makes Monit log via multilog. This is an
      additional option alongside traditional syslog and Monit's own
      file logging. You can combine these methods or use just one of
      them, depending on your needs.

      Note: If you only want to log the "Starting Monit" message via
      multilog, to know that supervise tried to start Monit, and
      Monit's own output is of no interest to you, you can change the
      Monit run script so that it does not redirect standard error to
      standard output:

       #!/bin/bash
       echo Starting Monit
       exec /usr/bin/monit -Ic /etc/monitrc


10. Q: Is it possible to generate static monit binaries?

    A: Yes, you can call configure like this:

        env LDFLAGS="-static" ./configure


11. Q: The monit binary seems quite bloated.  Is it possible to 
       slim it down?

    A: There are several ways to slim down monit:

       a) If possible link monit against dynamic libraries.  This is 
          the default behavior on most architectures.

       b) Compile without debug information.  GCC must be started
          without the "-g" option, e.g.:
             
             env CFLAGS="" ./configure

       c) Use size optimization switches for the compiler, e.g. use
          GCC with "-Os" for compiling.  It might look like this,

             env CFLAGS="-Os" ./configure

       d) Strip the binary.  A "strip monit" after compiling does the
          work.

       e) Compile without SSL support.  Start the configuration process
          with the "--without-ssl" option.

       Points a)-d) do not reduce the functionality of monit.  By
       applying point e) you of course lose all SSL functionality.

       On Linux it is possible to get some additional reduction by using
       dietlibc[A] or uClibc[B] instead of GNU libc.  They might not
       directly reduce the size of monit, but with them it is not
       necessary to have GNU libc installed in order to use monit.  If
       statically linked binaries are required, these libraries reduce
       the binary size dramatically.

       It is possible to reduce the monit binary to a size of <320kB 
       dynamically linked with SSL, <850kB statically linked with
       SSL and <300kB statically linked without SSL using the plan above.

       [A] http://www.fefe.de/dietlibc/
       [B] http://www.uclibc.org/


12. Q: When running in debug mode (using the -v option), error
       messages like the following appear:
       --8<--
       system statistic error -- orphaned process id 5812
       system statistic error -- orphaned process id 5819
       ...
       system statistic error -- orphaned process id 5854
       system statistic error -- cannot find process id 1 
       --8<--

    A: You are most probably running monit on a system with security
       restrictions that hide the process with PID 1. LIDS and
       Linux-VServer, for example, can optionally do so. Monit can run
       and work in this environment, but some system or process
       resource statistics may not be available (they will show zero
       as the value). To give monit access to these statistics, you
       can instead unhide PID 1 for monit's context and use, for
       example, unkillable protection for PID 1.


13. Q: Is there any support for external testing scripts?

    A: We plan to add support for external scripts in the future (see our
       TODO list - http://www.tildeslash.com/monit/doc/next.php#33). Until
       native support is available, here are some workarounds:

       1.) A nice workaround contributed by Pavel Urban is based on
       timestamp monitoring of a file that is updated by an external
       script run from cron. When everything is OK, the script updates
       (touches) the file. When the test fails, the script does not
       update the timestamp and monit performs the related action.

       For example, a script for monitoring the number of files inside
       the /tmp directory:
       --8<--
       #!/bin/bash
       if [ `ls -1 /tmp |wc -l` -lt 100 ]
       then
         touch /var/tmp/monit_flag_tmp
       fi
       --8<--

       Run this script via cron (for example, every 20 minutes):
       --8<--
        0,20,40 * * * * /root/test_tmp_files > /dev/null 2>&1
       --8<--

       and add a timestamp check on /var/tmp/monit_flag_tmp (or any file
       you decide on) to the monit control file:
       --8<--
        check file monit_flag_tmp with path /var/tmp/monit_flag_tmp
          if timestamp > 25 minutes then alert
       --8<--

       Done :)

       Another example script, for monitoring Solaris Volume Manager
       metadevices:
       --8<--
       #!/usr/bin/bash
       /usr/sbin/metastat | /usr/xpg4/bin/grep -q maintenance
       if [ $? -ne 0 ]; then
         touch /var/tmp/monit_flag_svm
       fi
       --8<--

       2.) Alternatively, you can use monit's file content testing to
       watch logfiles or status files created in a similar way to that
       described above.

       Example script:
       --8<--
       #!/usr/bin/bash
       /usr/sbin/metastat > /var/tmp/monit_svm
       --8<--

       and example monit syntax:
       --8<--
       check file svm with path /var/tmp/monit_svm
         if match "maintenance" then alert
       --8<--
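
       The flag file pattern from workaround 1.) can be wrapped in a
       small reusable helper, so that each cron job only has to name
       the flag file and the test command. This is just a sketch: the
       function name touch_if_ok is made up for this example and is
       not something monit provides.

```shell
#!/bin/bash
# touch_if_ok FLAGFILE COMMAND [ARGS...]
# Run COMMAND; update FLAGFILE's timestamp only if COMMAND succeeds.
# (Hypothetical helper for illustration; not part of monit.)
touch_if_ok() {
    local flag="$1"
    shift
    # Discard the test's output; only its exit status matters here
    if "$@" >/dev/null 2>&1; then
        touch "$flag"
    fi
}
```

       For example: touch_if_ok /var/tmp/monit_flag_disk test -d /data
       Monit then watches the timestamp of the flag file as shown in
       workaround 1.).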


-------------------------------------------------------------------------------

HOW-TO debug monit: 

   a) Start monit

   b) Stop the service you want monit to monitor. Let's say sshd.

   c) Run: strace -f -p $(cat ~/.monit.pid) 2>&1|tee trace.out

   d) Wait for monit to wake up and try to start sshd. (Or wake up
      the monit daemon in another console by calling monit again)

   e) If you see a line like the following in the trace console, with
      the significant `= 0' at the end, it means that monit did in
      fact start sshd:

   execve("/etc/init.d/sshd", ["/etc/init.d/sshd", "start"], [/* 1 var */]) = 0

   After this statement, monit is (probably) guiltless and you must
   search for the fault further down in the trace output.  Search for
   system calls that return -1, error codes like ENOENT, or the
   segmentation fault signal (SIGSEGV).

   Here's a `grep' trick you can use to search the output file from
   the trace:

     grep -E "= -[0-9]+| E[A-Z]+|SIGSEGV" trace.out

   If you have problems reading and understanding the trace file, join
   the monit mailing list and send us the output from the trace with a
   description of the bug, the Unix system you are using, the monit
   version and your monitrc control file.

If you know C, another option is to use the GNU debugger to debug
monit. Use the -I switch when monit runs inside gdb; this way monit
does not fork off a daemon process, which makes it easier to debug.




