This is the INSTALL file for nqs-3.36.  This version has been built and
tested on the following configurations:

	SGI IRIX 5.2 using both cc and gcc
	IBM RS6000 running AIX 3.2.5
	HP 9000/730 running HPUX 9.0

This version has been built but not tested on the following configurations:

	SGI IRIX 4.0.5 
	DECstation running Ultrix 4.1
	SUN SparcStations running SUNOS
	SUN SparcStations running Solaris

This version has not been built or tested on the following architectures,
although there is a Makefile and fixes have been submitted by others:

	DEC Alpha running DEC OSF

The first part of this document is the instructions for installing this
implementation of NQS for the first time.  Following that are the
instructions for upgrading to this version from the previous version of
NQS from Monsanto.  At the end is some information on NQS load
balancing and version staging.

------------------------------------------------------------------------
Steps required to install NQS for the first time:

1.  Unload the save set.  It creates a directory tree called nqs-3.36.

2.  As superuser cd to the proto directory and select the proper makefile
    for your architecture:

	Makefile.sgi		for Silicon Graphics IRIX 5.2
	Makefile.sgi4		for Silicon Graphics IRIX 4.0.5
	Makefile.ibm		for IBM RS6000s
	Makefile.hpux		for HP 9000s
	Makefile.ultrix		for DECstations
	Makefile.solaris	for Suns running Solaris
	Makefile.sun		for Suns running SUNOS
	Makefile.decosf		for DEC Alpha

    Review the makefile to ensure that you understand what it is doing. 
    In particular, if you wish to install in "non-standard" locations, 
    you must make some modifications.

    Several environment variables are used throughout the installation.
    These are defined as follows:

    NQS_HOME
	This directory contains the NQS configuration file (nqs.config)
	that defines the rest of these environment variables to NQS. 
	By default, this directory is /usr/lib/nqs.  In a local network
	this directory can be shared by multiple heterogeneous nodes.
	This variable is not used in the Makefile, but see the note below.
    NQS_LIBEXE
	This directory contains the NQS daemon programs and administrative
	shell scripts.  By default, this directory is /usr/lib/nqs. In
	a local network, this directory can be shared by multiple 
	homogeneous nodes.
    NQS_NMAP
	This directory contains the NQS network mapping database.  This
	directory is only required by NQS.  No one needs to directly access
	it.  By default, this directory is /etc/nmap.  In a local network,
	this directory can be shared by multiple heterogeneous nodes.
    NQS_SPOOL
	This directory contains the NQS spool files and queue database.  This
	directory will contain around 1,000 i-nodes associated with NQS.  By
	default, this directory will be /usr/spool/nqs.  This directory
	must be private to each node that executes NQS.
    NQS_STAGE
	This directory contains the version of NQS to be installed the next
	time NQS is quiescent.  There is no default location.  In a local
	network, this directory can be shared by multiple homogeneous nodes.
    NQS_USREXE
	This directory contains the NQS user interface and utility programs.
	This directory will be used by NQS users and must be in their
	search path.  By default, this directory is /usr/bin.  In a local
	network this directory can be shared by multiple homogeneous nodes.

    These symbols are defined near the top of each makefile.  Change
    them as needed.

    If NQS is installed in locations other than the standard ones, then
    each NQS user must define the environment variable NQS_HOME so that
    it points to the directory which contains the file nqs.config.  This
    file contains pointers to the various NQS directories.
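
    For a non-standard installation, a user's login script might set
    NQS_HOME along the following lines.  This is only a minimal Bourne
    shell sketch: the helper name check_nqs_home is hypothetical, and
    /usr/lib/nqs is simply the default location described above.

```shell
# Hypothetical helper (not part of NQS): succeed if the given directory
# holds a readable nqs.config file, as NQS_HOME is expected to.
check_nqs_home() {
    [ -r "$1/nqs.config" ]
}

# Fall back to the default location when NQS_HOME is not already set.
NQS_HOME=${NQS_HOME:-/usr/lib/nqs}
export NQS_HOME

if check_nqs_home "$NQS_HOME"; then
    echo "nqs.config found in $NQS_HOME"
else
    echo "warning: no readable nqs.config in $NQS_HOME" >&2
fi
```

    C shell users would use setenv in their .cshrc or .login instead.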

    The commands to make and install NQS are:

	make -f Makefile.xxx
		to compile and load the NQS binaries
	make -f Makefile.xxx directories
		to create the NQS directories
	make -f Makefile.xxx install
		to install NQS

    where xxx is the suffix used when building the software.
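
    The choice of makefile can be sketched as a small shell function.
    The function name nqs_makefile_suffix is hypothetical, and the
    system names it matches are assumptions about what "uname -s" and
    "uname -r" print on each platform; check them on your own machine.
    The loop at the end only prints the make commands and does not run
    them.

```shell
# Hypothetical helper: map an OS name and release (as printed by
# "uname -s" and "uname -r") to a makefile suffix from the list above.
nqs_makefile_suffix() {
    os=$1
    rel=$2
    case "$os" in
        IRIX*)
            case "$rel" in
                4.*) echo sgi4 ;;
                *)   echo sgi ;;
            esac ;;
        AIX)    echo ibm ;;
        HP-UX)  echo hpux ;;
        ULTRIX) echo ultrix ;;
        SunOS)
            # Assumption: Solaris reports itself as SunOS 5.x
            case "$rel" in
                5.*) echo solaris ;;
                *)   echo sun ;;
            esac ;;
        OSF1)   echo decosf ;;
        *)      echo "unknown system: $os $rel" >&2; return 1 ;;
    esac
}

# Dry run: print the three commands for this machine, falling back to
# the placeholder suffix "xxx" when the system is not recognized.
suffix=$(nqs_makefile_suffix "$(uname -s)" "$(uname -r)") || suffix=xxx
for target in "" directories install; do
    echo "make -f Makefile.$suffix${target:+ $target}"
done
```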

3.  [SGI only] Install the man pages by doing a

	make -f Makefile.sgi maninst

    Rebuild the whatis database:

	/usr/lib/makewhatis

    On other systems, you will need to install the man pages from the
    man subdirectory by hand.  There is a man page called nqsconfig
    which is provided for local information on the NQS configuration.

4.  Edit /etc/services (or modify your YP database) to add nqs as
    port 607/tcp:

	nqs	607/tcp				# Network Queueing System
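
    A quick way to confirm that the entry is present is sketched below.
    The function name has_nqs_service is hypothetical; it only inspects
    a services file and does not consult a YP database.

```shell
# Hypothetical helper: succeed if the given services file has an
# uncommented entry naming nqs on port 607/tcp.
has_nqs_service() {
    [ -r "$1" ] || return 1
    awk '$1 == "nqs" && $2 == "607/tcp" { found = 1 }
         END { exit !found }' "$1"
}

if has_nqs_service /etc/services; then
    echo "nqs entry present in /etc/services"
else
    echo "nqs entry missing from /etc/services"
fi
```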

5.  Set up the Machine ID database using nmapmgr.  Each machine you wish
    to have in your NQS network must have a unique MID.   There are
    two ways to set this up.  One way is to explicitly assign MIDs.
    The usual way is to start numbering them at 1, in this manner:

	# # nmapmgr will be installed in the location pointed to by
	# # the symbol NQS_USREXE in the makefile
	#
	# nmapmgr
	NMAPMGR>: add mid 1 node
	NMAPMGR>: add name fqdn 1
			where node is the name of the node (beaker)
			and fqdn is the fully qualified domain name
			of the node (beaker.monsanto.com).  Repeat
			for each node.
	NMAPMGR>: list
			This will list all the mids and names.
	NMAPMGR>: exit

    The other way is to implicitly assign MIDs based on the IP
    address of the various nodes.  Follow this pattern:

	# nmapmgr
	NMAPMGR>: add host node
	NMAPMGR>: add alias fqdn node
			where node is the name of the node (beaker)
			and fqdn is the fully qualified domain name
			of the node (beaker.monsanto.com).  Repeat
			for each node.
	NMAPMGR>: list
			This will list all the mids and names.
	NMAPMGR>: exit

    Consult the nmapmgr man pages for more information.  Help is 
    available at the NMAPMGR prompt by typing help.
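
    For a network with many nodes, the explicit numbering scheme can be
    generated rather than typed.  The sketch below only prints the
    nmapmgr commands for review.  The function name nmap_commands and
    the second host (bunsen) are hypothetical, and whether nmapmgr will
    accept these commands on standard input is an assumption to check
    against its man page.

```shell
# Hypothetical helper: read "node fqdn" pairs on standard input and
# print the nmapmgr commands that assign MIDs starting at 1.
nmap_commands() {
    mid=1
    while read -r node fqdn; do
        [ -n "$node" ] || continue      # skip blank lines
        echo "add mid $mid $node"
        echo "add name $fqdn $mid"
        mid=$((mid + 1))
    done
    echo "list"
    echo "exit"
}

nmap_commands <<EOF
beaker beaker.monsanto.com
bunsen bunsen.monsanto.com
EOF
```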

6.  Start up NQS by typing:

	# # The nqsdaemon will be installed in the location pointed
	# # to by the symbol NQS_LIBEXE in the makefile
	#
	# /usr/lib/nqs/nqsdaemon

    If there is an error in the startup it will be written to the
    terminal.  This command will cause three daemons to run:  the main
    NQSdaemon, the logdaemon, and the netdaemon.

    Once NQS is configured to your satisfaction, you can shut it down
    using the "qmgr shutdown" command, and start it up again in the
    background using the command:

	# /usr/lib/nqs/nqsdaemon > /dev/null &

	(assuming that NQS_LIBEXE is in the standard place).

    When you are satisfied with the system, you will want to put this
    line in your system startup script.
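
    A startup script fragment might guard the command as sketched below.
    The function name start_nqs is hypothetical, and the location of
    the startup script itself varies from system to system.

```shell
# Directory holding nqsdaemon; this is the default NQS_LIBEXE location.
NQS_LIBEXE=${NQS_LIBEXE:-/usr/lib/nqs}

# Hypothetical helper: start nqsdaemon from the given directory in the
# background, or complain if it is not installed there.
start_nqs() {
    daemon=$1/nqsdaemon
    if [ -x "$daemon" ]; then
        "$daemon" > /dev/null &
        echo "NQS daemon started"
    else
        echo "nqsdaemon not found in $1" >&2
        return 1
    fi
}

# In a startup script, the call would be:
#   start_nqs "$NQS_LIBEXE"
```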

7.  Now you should use the qmgr program to configure your system and add
    queues.  Invoke qmgr as root:

	# qmgr
	Mgr: # Direct the log information to a file
	Mgr: set log_file /tmp/nqs-logfile
	Mgr: # Indicate the level of information
	Mgr: set debug 2
	Mgr: # Add a manager other than root
	Mgr: add managers yourself:m
	Mgr: # Create and enable a batch queue
	Mgr: create batch batch-queue
	Mgr: set default batch_request queue batch-queue
	Mgr: enable queue batch-queue
	Mgr: start queue batch-queue
	Mgr: show all
	Mgr: exit

    See the Qmgr man pages (or type help at the Mgr prompt) for more
    information on these commands.

    Exit root and test the system by typing "qstat -x".  (C shell users
    will probably have to run "rehash" first.)  You should see information
    on the queue you just set up, and it should indicate that the queue is
    "[ENABLED, INACTIVE]".  Now submit a job by typing qsub <cr>, then
    date <cr>, then a control-d.  Qsub will report that a batch request
    has been submitted.  The stdout and stderr files will appear in your
    directory as STDOUT.o0 and STDERR.e0.  Stderr should be empty
    unless your .profile (Bourne shell) or .cshrc + .login (C shell)
    execute commands which are not appropriate for a batch
    environment.  Stdout will contain the output of the date command.
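
    The same test can be run without typing at the terminal, since qsub
    reads the job from standard input as described above.  This is a
    sketch only; the function name submit_test_job is hypothetical.

```shell
# Hypothetical helper: submit a one-line job that runs "date",
# provided qsub can be found in the search path.
submit_test_job() {
    if command -v qsub > /dev/null 2>&1; then
        echo date | qsub
    else
        echo "qsub not found; is NQS_USREXE in your search path?" >&2
        return 1
    fi
}

# Usage, as an ordinary user:
#   submit_test_job
```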

    If you want to create a pipe queue, the commands would be:

	Mgr:  create pipe_queue pipe-queue destination=rqueue@there
	Mgr:  enable queue pipe-queue
	Mgr:  start queue pipe-queue

    These commands create a pipe queue called pipe-queue which will route
    jobs to the queue called rqueue on the machine called there.  Note
    that the machine "there" will have to be defined using nmapmgr, and
    the local machine will have to be defined using nmapmgr on the "there"
    machine.

    NQS uses the rhosts mechanism for determining if access is permitted.
    When a remote request (such as a qstat or qsub from a remote system)
    is received, first the /etc/hosts.equiv file is checked for machine
    equivalency.  If none is found, the .rhosts file in the user's home 
    directory is checked.  In this file, both the hostname and the username
    are expected.  It may be necessary to include lines with both the 
    hostname and the fully qualified hostname.  Finally, if access is still
    not granted, NQS checks for a file called /etc/hosts.nqs.  In its
    simplest form, this file is similar to the .rhosts file, but it can
    also map remote usernames to local usernames.  See the source file
    lib/mapuser.c for more information.
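
    When remote requests are refused, it can help to check by hand that
    a .rhosts file names both forms of the hostname.  The sketch below
    looks only for simple "hostname username" lines; it is not the
    logic NQS itself uses (see lib/mapuser.c for that), and the function
    name rhosts_allows is hypothetical.

```shell
# Hypothetical helper: succeed if FILE has a line whose first field is
# HOST and whose second field is USER.
rhosts_allows() {
    file=$1
    host=$2
    user=$3
    [ -r "$file" ] || return 1
    awk -v h="$host" -v u="$user" \
        '$1 == h && $2 == u { ok = 1 } END { exit !ok }' "$file"
}

# Check both the short and the fully qualified hostname, for example:
#   rhosts_allows "$HOME/.rhosts" beaker "$LOGNAME"
#   rhosts_allows "$HOME/.rhosts" beaker.monsanto.com "$LOGNAME"
```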

    Eventually, you can reduce the debug level to 0, but you should not direct
    the log_file to /dev/null, as failures report useful information to the 
    log files.

8.  Create the file called NQS_LIBEXE/nqs-domain giving the names of
    all the machines in your "nqs domain".  The format is a single
    machine name on each line.  Lines beginning with a "#" are
    considered comments and are ignored.  This file serves two
    purposes.  It is the default list of machines that will be
    checked when a user does a "qstat -d" command.  It is also the
    default list of machines that will receive broadcast messages
    if requested.  This list is usually a subset of all the
    machines given Machine ids.  A sample is in misc/nqs-domain.dist.
    Note that this can be overridden on a per-user basis by creating
    a file called .qstat in the user's home directory.  The format
    of the .qstat file is the same as the nqs-domain file.
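
    The file format and the per-user override can be illustrated with
    the sketch below.  Both function names are hypothetical, and
    skipping blank lines is an assumption, since only "#" comments are
    described above.

```shell
# Default NQS_LIBEXE location, where nqs-domain is kept.
NQS_LIBEXE=${NQS_LIBEXE:-/usr/lib/nqs}

# Hypothetical helper: print the machine names from a file in
# nqs-domain format, dropping "#" comment lines and blank lines.
nqs_domain_hosts() {
    sed -e '/^#/d' -e '/^[[:space:]]*$/d' "$1"
}

# Hypothetical helper: print the file that applies to the current
# user; a .qstat file in the home directory overrides nqs-domain.
nqs_domain_file() {
    if [ -r "$HOME/.qstat" ]; then
        echo "$HOME/.qstat"
    else
        echo "$NQS_LIBEXE/nqs-domain"
    fi
}

file=$(nqs_domain_file)
if [ -r "$file" ]; then
    nqs_domain_hosts "$file"
fi
```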

9.  Build the msg programs required for broadcasts.  cd to the msgd
    directory. Read the README there and check the Makefiles in the msg
    and msgd directories.  The makefiles may need modifications for your
    environment.  Then do a "make" and then as root "make install".
    This will install three new programs in /usr/local/bin.  They
    are mesg, msg, and msgd.  msg allows you to write to terminals
    anywhere on the network, msgd is the daemon which listens for
    requests to write to users remotely, and mesg is a replacement
    for the standard mesg program.  See their man pages for more
    information.  After the programs are installed, edit
    /etc/services and /usr/etc/inetd.conf to add the lines for
    msg.  Then cause inetd to re-read its configuration file by
    sending it a hangup signal with "kill -HUP <inetd pid>".

10. If you want to use the NQS scheduler features and are running on
    RS6000s then you need the monitor package.  A copy of that is
    on wuarchive in the nqs/unix directory.  Build and install according
    to the instructions.

That should complete the installation.




------------------------------------------------------------------------
UPGRADE:

Steps required to upgrade to this release from the previous release 
from Monsanto:

1.  Unload the save set.  It creates a directory tree called nqs-3.36.

2.  If you have changed h/nqs.h or the Makefile for your architecture,
    then you must reconcile your changes with the new versions.  Since the
    format of the Makefiles has changed, you will have to check them
    manually.

3.  If you want to install NQS in a different location from the previous, 
    then you should consider this to be a new installation.  There must
    be no queued requests if you want to move the location of the
    NQS database and spool files whose standard location is /usr/spool/nqs.

    The best way is to save all information about the current system
    by getting the qmgr snap file (indicated below) and getting a list
    of the currently defined Machine IDs using the nmapmgr program.  Then
    follow the instructions above for the new install.

4.  Build the release in the proto subdirectory by doing a

	make -f Makefile.xxx

    where Makefile.xxx is the appropriate Makefile.?? file for your
    architecture/OS (sgi, ibm, hpux, etc.).

5.  Get a copy of the current NQS parameters by doing the following:

	$ qmgr
	Mgr: snap file=(file-name)
	Mgr: exit

    The qmgr commands are written to the specified file.

6.  Make sure there are no running NQS jobs, and then shut down NQS by
    doing a shutdown at the qmgr prompt.

7.  Install the new version by doing a

	make -f Makefile.xxx install

8.  [SGI only] Install the man pages by doing a

	make -f Makefile.sgi maninst

    Rebuild the whatis database:

	/usr/lib/makewhatis

    On other systems, install the man pages by hand from the man
    subdirectory.

9.  [If desired] Rename the NQS accounting file to keep a copy and have
    all subsequent records written to a clean file.  The command
    would be:

	# mv /usr/adm/nqs /usr/adm/nqs.old 

    The old file will continue to be valid, but small changes in the 
    accounting file make it convenient to start with a new file after 
    the upgrade.  There are no changes between 3.34 and 3.36, but there
    are from 3.3[123] to 3.34.
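
    A slightly more careful version of the rename is sketched below.  It
    does nothing when there is no accounting file to save.  The function
    name save_nqs_acct is hypothetical; /usr/adm/nqs is the location
    used in the command above.

```shell
# Hypothetical helper: move the accounting file aside as FILE.old if
# it exists.  The default path is the one shown above.
save_nqs_acct() {
    acct=${1:-/usr/adm/nqs}
    if [ -f "$acct" ]; then
        mv "$acct" "$acct.old"
        echo "saved $acct as $acct.old"
    else
        echo "no accounting file at $acct; nothing to do"
    fi
}

# Usage, as root, while NQS is shut down:
#   save_nqs_acct
```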

10. Start NQS up again:

	# qmgr start nqs

That will complete the upgrade.


SETTING UP FOR LOAD BALANCING:

NQS supports several levels of load balancing.  In the simplest case,
there is no load balancing.  If a pipe queue has several destinations,
and there is no load balancing, all requests are sent to one of the
destinations, ignoring the others, until the favored destination is
disabled.

A second level is that a pipe queue can be set up to be load balanced
outbound.  This means that the destinations will be selected in roughly
round-robin order, so that the jobs will be distributed more
evenly.

The third level is that the destination pipe queues for a local pipe
queue can be set to be load balanced inbound.  This means that each
destination queue will refuse a request unless it can be run
immediately.  If none of the destination queues can run the request
immediately, it waits on the source machine until one of the
destination queues can run the request.

The final level is implemented using the concept of a NQS scheduler.  A
scheduler is a machine designated to distribute jobs to a set of queues
on several machines.  The scheduler has a generic batch queue, which is
set to be load balanced outbound and is directed to several pipe queues
on the other machines (and perhaps on itself).  These queues are all
load balanced inbound.  The scheduler machine will direct the available
jobs to the various compute resources based on available information on
the power and current load on the various machines.  The power of the
various machines is specified by using the qmgr "set server
performance" command on the scheduler.  The current load on the various
machines is provided by their load daemons.  The load daemons provide
the number of NQS jobs running and the 1, 5 and 15 minute load
averages.  The scheduler uses this information to order the possible
destinations for a request by their "ability" to run the job.  The
machines also report completion of jobs to the scheduler.  Then the
scheduler can act right away to attempt to run the next available job
on that machine, thereby minimizing idle cycles.
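
The idea of ordering destinations by "ability" can be illustrated with
the toy ranking below.  The formula is purely an assumption made for
this example (performance divided by one plus the running jobs plus the
1 minute load average); it is not the calculation the NQS scheduler
performs, and the function name rank_destinations and the host names
are hypothetical.

```shell
# Hypothetical helper: read lines of the form
#   host performance running_jobs load1
# and print the host names, most able first.
rank_destinations() {
    awk '{ printf "%.2f %s\n", $2 / (1 + $3 + $4), $1 }' |
        sort -rn |
        awk '{ print $2 }'
}

rank_destinations <<EOF
fast 100 0 0.0
slow 20 0 0.0
busy 100 4 5.0
EOF
```

With this input, the idle powerful machine ranks first, and the
powerful but heavily loaded machine ranks below the idle slow one.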

To set up this level of load balancing, do the following:

- Designate a machine to be the scheduler.  It will have a slightly
  greater compute load due to the scheduling processing, but it is
  more important that it be generally available than that it be the
  most powerful machine.
- Create a pipe queue on the scheduler whose destinations are pipe queues
  on each of the remote machines which will be the compute engines.
- The pipe queue on the scheduler machine is set to be load balanced 
  outbound using the "qmgr set lb_out queuename" command.
- The pipe queues on the remote machines are set to be load balanced inbound
  using the "qmgr set lb_in queuename" command.  These queues must have a
  single destination which is the actual execution batch queue on that
  machine.
- These execution queues can be set to be pipeonly, so that requests have
  to be submitted through the scheduler.  
- All of the machines within the "cluster" are configured to know the scheduler
  using the "qmgr set scheduler nodename" command.  This means that the
  scheduler will be notified when jobs complete on the remote systems and
  will have some indication about the relative load on the various machines.
- The default retry wait time can be increased to reduce network traffic,
  using the "set default destination_retry wait" command.  This command
  controls the interval at which the scheduler will retry delivering a
  request to a remote node.  Since the scheduler is informed of job
  completions and is aware of the number of running jobs on remote
  machines, frequent retries are superfluous.  A value of 30 minutes may
  be appropriate.
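
The qmgr commands named in the list above can be collected per machine
as sketched below.  The sketch only prints the commands so that they
can be reviewed and then typed at the Mgr: prompt.  The function name
lb_commands and the queue names are hypothetical, and the exact
argument syntax should be checked against the qmgr man page.

```shell
# Hypothetical helper: print the load balancing commands for a machine.
#   $1 = role ("scheduler" or "compute")
#   $2 = name of the pipe queue on that machine
#   $3 = node name of the scheduler
lb_commands() {
    role=$1
    queue=$2
    sched=$3
    case "$role" in
        scheduler) echo "set lb_out $queue" ;;
        compute)   echo "set lb_in $queue" ;;
        *)         echo "unknown role: $role" >&2; return 1 ;;
    esac
    # Every machine in the cluster is told which node is the scheduler.
    echo "set scheduler $sched"
}

lb_commands scheduler sched-pipe beaker
lb_commands compute work-pipe beaker
```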


NQS STAGING:

With Release 3.35 of Monsanto NQS, the concept of a "staged release"
has been introduced.  A common problem with managing NQS is that
requests may run for several days, and if there are several on a
machine, it may be difficult to find an opportunity to install a new
release.  It is possible to disable the queues, but that might be
inefficient under some circumstances.  A "staged release" is a new
release of NQS that is placed in a particular location.  When NQS
completes a request and would go into a quiescent state, it checks for a
compatible new release in the staging area.  If one is found, NQS shuts
itself down, installs the new version, and starts itself up again.
This facility is set up by doing a "make -f Makefile.xxx stage", which
places the next release into the staging area determined by the
NQS_STAGE definition in the makefiles.  A compatible release is defined
as one that has the same major number and a greater minor number or
patch level.
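
The compatibility rule can be written out as a small check.  This is a
sketch under stated assumptions, not the test NQS itself applies:
versions are assumed to be written as major.minor or
major.minor.patch, a missing patch level is treated as 0, and the
function name is_compatible_release is hypothetical.

```shell
# Hypothetical helper: succeed if release $2 is a compatible upgrade
# for release $1, that is, it has the same major number and either a
# greater minor number or the same minor number and a greater patch
# level.
is_compatible_release() {
    cur=$1
    new=$2
    old_ifs=$IFS
    IFS=.
    set -- $cur
    cmaj=$1 cmin=${2:-0} cpat=${3:-0}
    set -- $new
    nmaj=$1 nmin=${2:-0} npat=${3:-0}
    IFS=$old_ifs
    [ "$cmaj" -eq "$nmaj" ] || return 1
    [ "$nmin" -gt "$cmin" ] && return 0
    [ "$nmin" -eq "$cmin" ] && [ "$npat" -gt "$cpat" ]
}

if is_compatible_release 3.35 3.36; then
    echo "3.36 can be staged over 3.35"
fi
```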


Due to my taking on other projects, it is unlikely that the Monsanto NQS
will be further enhanced.  My ability to answer questions will be 
severely limited, so I am glad to hear from you, but do not expect
timely responses to your questions.  So be warned:  If you want a 
supported version of NQS, look elsewhere!  Also note, the only 
"official" ftp site for this distribution is wuarchive.wustl.edu.  If
you did not get it there then I cannot be sure you got an official
version.  If this distribution is no longer on wuarchive, then you 
can be sure that it is no longer supported.

------
John Roman                                          Monsanto Company
jrroma@beaker.monsanto.com                          Chesterfield, MO  63198
(314) 537-7044

