$<H1>$Help for 'connect' program.$</H1>$
$<PRE>$
Date help created:  31 Oct 1996
Date last updated:  22 Jun 2001
$</PRE>$
'connect' takes a shift file and a crosspeak file and
matches the crosspeaks to one or more pairs of shifts.

To run the program type

	connect <connect script file>

The program is intended to be used in conjunction with XPLOR,
Per Kraulis' Ansig, and rdb scripts written by Andy Raine.

There must be no more than one key word per line in the
script file.

Below <...> represents an argument for a key word
and [...] represents a key word or argument that is optional.

The syntax for the key words is

	$IREF$input_par <par file of spectrum>
	$IREF$input_shift <input shift file>
	$IREF$input_crosspeak <input crosspeak file>
	[ $IREF$output_crosspeak <output crosspeak file> ]
	[ $IREF$output_match <output match file> ]
	[ $IREF$output_xplor <output XPLOR file> ]
	[ $IREF$output_nilges <output Nilges-style XPLOR file> ]
	[ $IREF$output_null <output null matches file> ]
	$IREF$columns <first column> [ <second column> ]
	$IREF$intensity_dist <intensity> <distance>
	[ $IREF$intensity_dist2 <intensity> <distance> <distance_minus> ]
	[ $IREF$intensity_dist3 <intensity> <distance> <distance_minus> <distance_plus> ]
	[ $IREF$exclude <column> <spectral width> <tolerance> ]
	[ $IREF$residues <columns> <residue1> <residue2> ]
	[ $IREF$spectral_width <column> <spectral width> ]
	[ $IREF$split_output ]

At least one of $IREF$output_match, $IREF$output_xplor or $IREF$output_nilges
must occur.  The $IREF$output_crosspeak file contains a list of
crosspeaks that have not been matched.
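
The script rules described in this help (one key word per line,
at least one of the three match outputs, input_par first, columns
appearing twice) can be checked before running the program.  The
Python sketch below is not part of 'connect'.  The function name
check_script, the file names in the example, and the assumption
that the key word is the first whitespace-separated token on a
line are all illustrative.

```python
# Hypothetical pre-flight check for a 'connect' script; not part of the program.
REQUIRED = ("input_par", "input_shift", "input_crosspeak")
MATCH_OUTPUTS = ("output_match", "output_xplor", "output_nilges")


def check_script(text):
    """Return a list of problems found in the script text (empty if none)."""
    # Assumption: the key word is the first whitespace-separated token
    # on each non-blank line.
    keys = [line.split()[0] for line in text.splitlines() if line.strip()]
    problems = []
    for key in REQUIRED:
        if key not in keys:
            problems.append("missing " + key)
    if not any(key in keys for key in MATCH_OUTPUTS):
        problems.append("need one of " + ", ".join(MATCH_OUTPUTS))
    if keys.count("columns") != 2:
        problems.append("columns must appear twice")
    if keys and keys[0] != "input_par":
        problems.append("input_par should be the first key word")
    return problems


# Example script with made-up file names and column numbers.
example = """input_par spectrum.par
input_shift shifts.rdb
input_crosspeak peaks.rdb
output_match matches.rdb
columns 1
columns 3 2
"""
print(check_script(example))
```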

All shifts are aliased according to the specified spectral width.

A description of the key words may be obtained by typing

	connect help <key word>

A description of the format of the input shift file may be
obtained by typing

	connect help $IREF$shift_format

A description of the format of the input and output crosspeak
file may be obtained by typing

	connect help $IREF$crosspeak_format

A description of the format of the output match file may be
obtained by typing

	connect help $IREF$match_format

A description of the format of the output XPLOR file may be
obtained by typing

	connect help $IREF$xplor_format

A description of the format of the output Nilges-style XPLOR
file may be obtained by typing

	connect help $IREF$nilges_format

***shift_format

The input shift file for the program has an ascii tab-separated
format, with two header lines followed by one line (record; row)
per shift data.  The first header line contains the column titles.
The second header line contains an 'N' or an 'S' in each column,
consistent with rdb format.

Each record has data for the light atom (hydrogen) and the
corresponding bonded heavy atom (anything other than hydrogen,
e.g. carbon, nitrogen, oxygen or sulfur).

The first column contains the residue name of the amino acid,
the second column contains the residue number of the amino acid,
the third column contains the light atom name, the fourth column
contains the light atom shift (in ppm), the fifth column contains
the heavy atom name, the sixth column contains the heavy
atom shift (in ppm), the seventh column contains the light atom
tolerance (in ppm), and the eighth column contains the heavy atom
tolerance (in ppm).

A shift of <= -99 is considered to be unknown.
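
As an illustration of the eight-column layout above, the following
Python sketch reads the data rows of a shift file.  It is a reading
aid, not the parser used by 'connect'.  The names ShiftRecord,
parse_shift_line and read_shift_rows are hypothetical, and the
residue number is assumed to be an integer.

```python
from typing import NamedTuple, Optional

UNKNOWN_LIMIT = -99.0  # a shift <= -99 is considered to be unknown


class ShiftRecord(NamedTuple):
    res_name: str
    res_num: int
    light_name: str
    light_shift: Optional[float]  # None when unknown
    heavy_name: str
    heavy_shift: Optional[float]  # None when unknown
    light_tol: float
    heavy_tol: float


def _shift(field):
    """Convert a shift field to a float, or None if it is unknown."""
    value = float(field)
    return None if value <= UNKNOWN_LIMIT else value


def parse_shift_line(line):
    """Parse one tab-separated data row of the shift file."""
    f = line.rstrip("\n").split("\t")
    return ShiftRecord(f[0], int(f[1]), f[2], _shift(f[3]),
                       f[4], _shift(f[5]), float(f[6]), float(f[7]))


def read_shift_rows(lines):
    """Skip the two header lines and parse the remaining non-blank rows."""
    return [parse_shift_line(line) for line in lines[2:] if line.strip()]
```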

The tolerances specify how close a crosspeak shift value must be
to the specified atom shift in order for there to be a match.
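
The tolerance test can be written as a one-line comparison.  This
Python sketch assumes the comparison is inclusive and that an
unknown shift never matches; neither detail is stated in this help,
and aliasing is not handled here.  The function name shift_matches
is hypothetical.

```python
def shift_matches(peak_shift, atom_shift, tolerance):
    """True if the crosspeak shift lies within tolerance of the atom shift."""
    # Assumption: an unknown atom shift (<= -99) never matches.
    if atom_shift <= -99:
        return False
    # Assumption: the boundary itself counts as a match.
    return abs(peak_shift - atom_shift) <= tolerance
```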

A given atom is allowed to have more than one entry in the file.
If so, the entries must be on consecutive rows.  If more than one
of these entries matches a given peak, the atom is output only
once, but the match counts reported include all entries matched.

***crosspeak_format

The input and output crosspeak files for the program have an
ascii tab-separated format, with two header lines followed by one
line (record; row) per crosspeak.  The first header line contains
the column titles.  The second header line contains an 'N' or an
'S' in each column, consistent with rdb format.

The records first have a set of data for each dimension, and then
a dimension-independent set.

For each dimension (of the spectrum) there are five columns.  The
first column contains the residue name, the second column contains
the residue number, the third column contains the atom name, the
fourth column contains the atom type, and the fifth column
contains the shift (in ppm).  The first four of these columns can
be null.  If they are not null (the residue and atom names are
checked), this is considered to be a valid assignment.  The dimensions
are ordered with the Ansig convention, which is opposite the Azara
convention.

The dimension-independent set has four columns.  The first column
contains the unnormalized crosspeak intensity, the second column
contains the spectrum name, the third column contains the
crosspeak number, and the fourth column contains the normalized
crosspeak intensity.

The output crosspeak file has two additional columns, giving the
number of matches for the two sets of matched shifts.
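
The column layout above (five columns per dimension followed by
four dimension-independent columns) can be unpacked as in the
Python sketch below.  The function name parse_crosspeak_line is
hypothetical, the number of dimensions is passed in by the caller,
and null fields are kept as the strings found in the file because
their representation is not described here.  Any trailing columns,
such as the two match counts in the output file, are ignored.

```python
def parse_crosspeak_line(line, ndim):
    """Split one crosspeak row into per-dimension data and the trailing set."""
    f = line.rstrip("\n").split("\t")
    dims = []
    for d in range(ndim):
        # Five columns per dimension, in the order given in the help text.
        res_name, res_num, atom_name, atom_type, shift = f[5 * d:5 * d + 5]
        dims.append({
            "res_name": res_name,
            "res_num": res_num,      # kept as text: may be null
            "atom_name": atom_name,
            "atom_type": atom_type,
            "shift": float(shift),
        })
    rest = f[5 * ndim:5 * ndim + 4]
    return {
        "dims": dims,
        "intensity": float(rest[0]),       # unnormalized intensity
        "spectrum": rest[1],
        "number": int(rest[2]),
        "norm_intensity": float(rest[3]),  # normalized intensity
    }
```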

***match_format

The output match file for the program has an ascii tab-separated
format, with two header lines followed by one line (record; row)
per match.  The first header line contains the column titles.
The second header line contains an 'N' or an 'S' in each column,
consistent with rdb format.

Each record has data for the two matched light atoms.

The first column contains the residue number of the first atom,
the second column contains the residue name of the first atom,
the third column contains the atom name of the first atom,
the fourth column contains the residue number of the second atom,
the fifth column contains the residue name of the second atom,
the sixth column contains the atom name of the second atom,
the seventh column contains the normalised intensity of the
matched crosspeak, the eighth column contains the crosspeak
number of the matched crosspeak, and the ninth column contains
an estimate of the implied distance between the light atoms.

***xplor_format

The output XPLOR file for the program has a proprietary ascii
format.  See an XPLOR manual for more explanation.

***nilges_format

The output Nilges-style XPLOR file for the program is a slight
modification of the $IREF$xplor_format.

***input_par

input_par <par file of spectrum>
	This specifies the par file name of the spectrum from
	which the crosspeaks were derived.  The data file of
	the spectrum is not used.  This should be the first
	key word in the script file.

***input_shift

input_shift <input shift file>
	This specifies the input shift file.  A description of
	the format may be obtained by typing

		connect help $IREF$shift_format

***input_crosspeak

input_crosspeak <input crosspeak file>
	This specifies the input crosspeak file.  A description
	of the format may be obtained by typing

		connect help $IREF$crosspeak_format

***output_crosspeak

output_crosspeak <output crosspeak file>
	This specifies the output crosspeak file.  This file
	contains those crosspeaks that have not been matched.
	A description of the format may be obtained by typing

		connect help $IREF$crosspeak_format

***output_match

[ output_match <output match file> ]
	This specifies the output match file.  In content this
	file is equivalent to the $IREF$output_xplor file and
	$IREF$output_nilges file, and at least one of these three
	key words must appear.  A description of the format may
	be obtained by typing

		connect help $IREF$match_format

***output_xplor

[ output_xplor <output XPLOR file> ]
	This specifies the output XPLOR file.  In content this
	file is equivalent to the $IREF$output_match file and
	$IREF$output_nilges file, and at least one of these three
	key words must appear.  A description of the format may
	be obtained by typing

		connect help $IREF$xplor_format

***output_nilges

[ output_nilges <output Nilges-style XPLOR file> ]
	This specifies the output Nilges-style XPLOR file.  In
	content this file is equivalent to the $IREF$output_match
	file and $IREF$output_xplor file, and at least one of
	these three key words must appear.  A description of the
	format may be obtained by typing

		connect help $IREF$nilges_format

***output_null

[ output_null <output null matches file> ]
	This specifies the output file for crosspeaks without
	any matches.  The format is tab-separated, with one
	header line followed by one line per crosspeak that
	has no matches.  Each such line contains the crosspeak
	number and the spectrum.

***columns

columns <first column> [ <second column> ]
	This specifies one or two columns, and the data in the
	corresponding column(s) in the $IREF$input_crosspeak file are
	matched to the shifts in the $IREF$input_shift file.  The
	first column must be a light atom (hydrogen) and the second
	column, if it exists, must be the heavy atom to which the
	light atom is bonded.
	If the second column is negative the shift is not matched
	but the atom type is (for the column which is the negative
	of the specified value).  The first column must be positive.
	This key word must appear twice.
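
The sign convention for the second column can be summarised in a
few lines of Python.  This is only a reading of the text above, not
code from 'connect'; the function name interpret_columns and the
dictionary keys are hypothetical, and a second column of zero is
not covered by the description and is not treated specially.

```python
def interpret_columns(first, second=None):
    """Describe what one 'columns' line asks to be matched."""
    if first <= 0:
        raise ValueError("the first column must be positive")
    spec = {"light_column": first, "heavy_column": None, "heavy_mode": None}
    if second is not None:
        # The heavy atom column is the absolute value of the argument.
        spec["heavy_column"] = abs(second)
        # Negative value: match the atom type of that column, not its shift.
        spec["heavy_mode"] = "atom_type" if second < 0 else "shift"
    return spec
```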

***intensity_dist

intensity_dist <intensity> <distance>
	This is used to specify how to convert the normalised
	intensity in the $IREF$input_crosspeak file into a distance.
	This key word can appear more than once, and the lines must be
	listed in order of decreasing <intensity> (increasing
	<distance>).  For a given crosspeak normalised intensity
	the first smaller <intensity> determines the <distance>
	to be used.
	If this key word and the other intensity_dist* key words do
	not appear then it is assumed that distance = intensity
	(this is useful for working with simulated data).
	In xplor terminology, this assumes distance_minus = distance
	and distance_plus = 0.  To set these explicitly use either
	$IREF$intensity_dist2 or $IREF$intensity_dist3.
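
The lookup described above can be sketched in Python as follows.
The function name intensity_to_distance and the table values in
the usage lines are hypothetical.  Two points are assumptions:
"first smaller" is read as strictly smaller, and the result for a
peak weaker than every listed <intensity> is not documented, so the
sketch returns None in that case.

```python
def intensity_to_distance(intensity, table):
    """Look up a distance for a normalised crosspeak intensity.

    table holds (intensity, distance) pairs in script order, that is,
    in order of decreasing intensity.
    """
    if not table:
        # No intensity_dist* key words: distance = intensity.
        return intensity
    for threshold, distance in table:
        # Assumption: "first smaller" means strictly smaller.
        if threshold < intensity:
            return distance
    # Assumption: behaviour for weaker peaks is undocumented.
    return None


# Made-up table: strong, medium and weak classes.
table = [(0.8, 2.5), (0.4, 3.5), (0.0, 5.0)]
print(intensity_to_distance(0.5, table))
```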

***intensity_dist2

intensity_dist2 <intensity> <distance> <distance_minus>
	This is used to specify how to convert the normalised
	intensity in the $IREF$input_crosspeak file into a distance.
	This key word can appear more than once, and the lines must be
	listed in order of decreasing <intensity> (increasing
	<distance>).  For a given crosspeak normalised intensity
	the first smaller <intensity> determines the <distance>
	to be used.
	If this key word and the other intensity_dist* key words do
	not appear then it is assumed that distance = intensity
	(this is useful for working with simulated data).
	In xplor terminology, this assumes distance_plus = 0.
	To set this explicitly use $IREF$intensity_dist3.

***intensity_dist3

intensity_dist3 <intensity> <distance> <distance_minus> <distance_plus>
	This is used to specify how to convert the normalised
	intensity in the $IREF$input_crosspeak file into a distance.
	This key word can appear more than once, and the lines must be
	listed in order of decreasing <intensity> (increasing
	<distance>).  For a given crosspeak normalised intensity
	the first smaller <intensity> determines the <distance>
	to be used.
	If this key word and the other intensity_dist* key words do
	not appear then it is assumed that distance = intensity
	(this is useful for working with simulated data).
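
The three intensity_dist* forms differ only in which of
distance_minus and distance_plus are given explicitly.  The Python
sketch below fills in the defaults stated for intensity_dist and
intensity_dist2 and derives the bounds using the usual XPLOR
reading, lower = distance - distance_minus and upper = distance +
distance_plus.  The function name restraint_bounds is hypothetical,
and whether 'connect' computes the bounds itself is not stated in
this help.

```python
def restraint_bounds(distance, distance_minus=None, distance_plus=None):
    """Return (distance_minus, distance_plus, lower, upper) with defaults."""
    if distance_minus is None:
        distance_minus = distance  # default for intensity_dist
    if distance_plus is None:
        distance_plus = 0.0        # default for intensity_dist and intensity_dist2
    lower = distance - distance_minus
    upper = distance + distance_plus
    return distance_minus, distance_plus, lower, upper
```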

***exclude

[ exclude <column> <spectral width> <tolerance> ]
	This specifies that crosspeaks within <tolerance> of the
	<spectral width> for the given <column> are ignored.
	The <spectral width> is specified in ppm (not Hz).

***residues

[ residues <columns> <residue1> <residue2> ]
	This specifies that only those shift matches for residues
	between <residue1> and <residue2> for the given <columns>
	(1 or 2) are output.
	The default is that all matches are output.
	This can have multiple occurrences for a given choice of
	<columns>, and if so the shift matches that are output are
	those for residues which lie in one of the specified
	residue ranges.
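
The range test can be sketched in Python as below.  The function
name residue_allowed is hypothetical, and the sketch assumes that
the ranges are inclusive and that <residue1> is not greater than
<residue2>; neither is stated above.

```python
def residue_allowed(res_num, ranges):
    """True if res_num lies in any (residue1, residue2) range."""
    if not ranges:
        # No 'residues' key word for these columns: all matches are output.
        return True
    # Assumption: both ends of each range are included.
    return any(lo <= res_num <= hi for lo, hi in ranges)
```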

***spectral_width

[ spectral_width <column> <spectral width> ]
	This specifies that this is the <spectral width> for the
	given <column>.  This key word should be used if the
	spectral width given in the par file is not correct for
	the aliasing.
	The <spectral width> is specified in ppm (not Hz).
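
One common way to compare shifts in the presence of aliasing is to
remove whole multiples of the spectral width from their difference.
The Python sketch below shows that idea only.  How 'connect' itself
folds shifts (reference point, window limits) is not described in
this help, so this should not be taken as its algorithm; the
function name aliased_difference is hypothetical.

```python
def aliased_difference(peak_shift, atom_shift, spectral_width):
    """Difference between two shifts (ppm) modulo the spectral width."""
    diff = peak_shift - atom_shift
    # Remove the nearest whole number of spectral widths.
    return diff - spectral_width * round(diff / spectral_width)
```

With a spectral width of 8 ppm, shifts of 9.0 and 1.0 ppm differ by
exactly one spectral width, so the function returns zero for them.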

***split_output

[ split_output ]
	This specifies that for $IREF$output_xplor and $IREF$output_nilges
	there should be two output files, one (suffix '0') for
	unassigned output and one (suffix '1') for assigned
	output.
