PDP-1 COMPUTER
ELECTRICAL ENGINEERING DEPARTMENT
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
CAMBRIDGE, MASSACHUSETTS 02139

PDP-45-2
CERTAINLY
(abridged)
Sept. 24, 1971

Introduction

Certainly assembles source programs written in PDP-1 assembly language into object programs. The source language provides a convenient way of coding algorithms while giving the programmer complete control over the content of the object program. The source program may be read from the drum or the on-line typewriter. The object program may be written onto drum field 1 or punched on paper tape.

Certainly processes the source program twice. During pass 1 address tags and other symbols are defined, and constants and variables areas are allocated. During pass 2 the object program is produced. Macros, repeats, and conditionals are expanded during both passes.

A sample program written in Certainly assembler language is shown below.

    sum
    n=100
    100/
    a,      law tab
            dap b
            dzm s
    b,      lac .
            adm s
            idx b
            sas c
            jmp b
            hlt
    tab,    tab+n/
    s,      0
    c,      lac tab+n
    start a

The first non-blank line is the title, which is printed on the typewriter at the beginning of each pass. The program ends with the start pseudo-instruction. A program may be divided into several consecutive sections, each with a title and start pseudo-instruction. This is useful when the input or output medium is changed between sections.

The Source Language

For clarity the following symbols are assigned to the invisible characters when needed in examples of parts of source programs.

    carriage return    (cr)
    tabulation         (tab)

The source program is considered to be a series of syllables and separators. A separator is one of the following characters - space, tab, cr, +, -, x, ^, ∨, <, >, ~, =, comma, (, ), [, ], and slash. A syllable is a string of alphanumeric characters (digits, letters, and period) preceded and followed by separators.
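As a rough illustration of the syllable and separator rule, the following Python sketch splits one source line into its syllables. It is not part of Certainly. The name syllables is invented here, and the sketch simply treats every character other than a digit, a lower case letter, or a period as a separator. That is an assumption: it ignores the multiplication sign (printed as x in this memo) and characters such as the overbar that are neither alphanumeric nor separators.

```python
import re

# Alphanumeric characters per the memo: digits, letters, and period.
# Assumption: anything else is treated as a separator in this sketch.
SYLLABLE = re.compile(r"[0-9a-z.]+")


def syllables(line):
    """Return the syllables of one source line, in order of appearance."""
    return SYLLABLE.findall(line)


print(syllables("c,\tlac tab+n"))   # ['c', 'lac', 'tab', 'n']
```

A lone period comes out as a syllable of its own, which agrees with the memo's use of the single point as a term.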
The most important object in the source language is the expression, which has a numerical value to be used as a storage word of the object program, location assignment, argument, etc. An expression is one or more terms separated by suitable combining operators. The following are some of the forms terms can take -

A symbol is a syllable containing at least one letter. Symbols may be of arbitrary length, but are recognized by their first six characters. If a symbol is undefined, the expression in which it appears is undefined. If it is defined as a macro-instruction or pseudo-instruction, special action is taken. The mnemonics for the PDP-1 machine instructions are initially defined as shown in the appendix.

A number is a syllable which is a string of digits with an optional decimal point at the end. The value of a number is computed using ones complement arithmetic (modulo 777777). 777777 is not changed to +0. If a number is immediately followed by a decimal point, then it is taken as decimal regardless of the current radix.

The syllable consisting of a single point evaluates to the current location, which is the address at which the current instruction is to be assembled.

A term consisting of upper case characters is a micro-program instruction (see memo PDP-35). The syllable must not contain case shifts.

Certain pseudo-instructions generate terms. See the descriptions of the pseudo-instructions for details. (flexo abc is a term with value 616263)

Terms may be combined by use of the following operators -

    + or space   means ones complement addition. A sum of zero is always plus zero unless one of the addends was minus zero.
    -            means subtraction. Minus signs count out properly, thus ---3 = -3.
    ∨            means bitwise inclusive or
    ^            means bitwise and
    ~            means bitwise exclusive or
    x            means integer multiply. Multiplication is mod 777777, e.g. 400000x6 = 3
    >            means integer quotient. The argument on the left is divided by the argument on the right. Division by zero returns the original dividend.
    <            means remainder of integer division. Division by zero returns zero.

Operator priority

Operations of the same priority are performed from left to right. Operations of different priorities are performed in the order given in the table below.

    unary + and -      (executed first)
    < > ^ x ~ ∨
    binary + and -     (executed last)

Two consecutive operators are assumed to have zero between them.

The following are some examples of symbolic expressions, giving the values (in octal) on the right.

    expression    value
    2             2
    2+3           5
    2-3           777776
    2x3           6
    2∨3           3
    2^3           2
    2~3           1
    -5∨1          777773   (note that -5 was computed first)
    5<3           2
    13>5          2
    7-2∨3         4
    add 40        400040

text

The pseudo-instruction text is used to assemble an arbitrarily long string of characters. The character immediately following the separator after the pseudo-instruction name is used as the break character. Following characters, up to but not including the next appearance of the break character, are packed three to a word and assembled into the object program.

If the break character which ends the string is followed by octal digits instead of a separator, the assembler goes into "octal" mode, in which pairs of digits are taken as 6 bit numbers and packed as if they were characters. When the break character is next encountered the assembler reverts to normal "text" mode. The assembler alternates between text and octal modes until the break character, followed by a separator, is found while in text mode. Note that the string begins and ends in text mode, and there are always an even number of appearances of the break character.

examples -

    text .abc.7652.de.      assembles into    616263 765264 650000
    text //14/abc/13//      assembles into    146162 631300

Because text may generate more than one word of data, it should only be used to generate storage words. It should not be used in constants, arguments, etc.
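The packing rule for text can be traced with a short Python sketch. This is an illustration under stated assumptions, not the assembler's own code. The function name pack_text is invented here, and the character table holds only the five codes that can be read off this memo's examples (a through e). The argument is everything after the separator that follows the pseudo-instruction name.

```python
# Partial character table: only the codes visible in the memo's examples.
# Any other character raises KeyError in this sketch.
CODES = {"a": 0o61, "b": 0o62, "c": 0o63, "d": 0o64, "e": 0o65}


def pack_text(arg):
    """Pack a text argument into 18-bit words, three 6-bit codes per word."""
    brk = arg[0]                  # first character is the break character
    codes = []
    pos = 1
    octal_mode = False
    while True:
        end = arg.index(brk, pos)             # next break character
        segment = arg[pos:end]
        if octal_mode:
            # Pairs of octal digits are taken as 6-bit numbers.
            codes += [int(segment[i:i + 2], 8) for i in range(0, len(segment), 2)]
        else:
            codes += [CODES[ch] for ch in segment]
        pos = end + 1
        if octal_mode:
            octal_mode = False                # a break in octal mode returns to text mode
        elif pos < len(arg) and arg[pos] in "01234567":
            octal_mode = True                 # break followed by octal digits
        else:
            break                             # break followed by a separator ends the string
    codes += [0] * (-len(codes) % 3)          # pad the last word with zeros
    return [(codes[i] << 12) | (codes[i + 1] << 6) | codes[i + 2]
            for i in range(0, len(codes), 3)]


print([oct(w) for w in pack_text(".abc.7652.de.")])
```

Both examples given for text come out as listed: .abc.7652.de. yields 616263 765264 650000, and //14/abc/13// yields 146162 631300.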
text7

The pseudo-instruction text7 assembles characters in 7 bit form. The pseudo-instruction name is followed by a string in the same format as for text. Bit 11 is turned on for each character that is in upper case, and the resultant characters are packed five per two words, left justified. Bit 0 of the first word in each pair is zero.

Radix Control

All numbers not followed by a decimal point are interpreted according to the current radix. The radix is set to octal at the beginning of each pass.

decimal

Decimal sets the radix to decimal.

octal

Octal sets the radix to octal.

The decimal and octal pseudo-instructions may be used anywhere within an expression, hence an expression may be interpreted partly in decimal and partly in octal.

radix

Radix is followed by an expression and sets the radix to the value of that expression. The expression must be defined on both passes. The usx error is given if this is not the case.

Automatic Constant Allocation

It is frequently necessary to assemble an instruction whose address part is the address of a register in which a constant is stored. The assembler facilitates this operation by automatically assembling a register containing a constant whenever the constant appears enclosed in parentheses in an expression. The constant with its parentheses then evaluates to the address in which the constant is assembled. The right parenthesis after the constant may be (and almost always is) omitted. A constant does not need to be defined on pass 1. If it is undefined on pass 2 the usc error will be given.

example -

    sas (13

assembles into an instruction which skips if the accumulator contains 13

constants

The actual constants are saved in a table in the assembler and then assembled in a block at the next appearance of the constants pseudo-instruction. Duplicated constants are combined and stored in the same register. The amount of space allocated for the constants area during pass 1 may exceed the amount actually used on pass 2, since, if constants are undefined on pass 1, the assembler is sometimes unable to determine whether they are duplicated and must assume that they are not.

The pseudo-instruction constants may be used up to 8 times in a program. Each constant is placed in the next constants area regardless of whether the same constant appeared in an earlier constants area. The programmer should not make any assumptions about the order of constants within a constants area.

Automatic Variable and Array Allocation

Certainly will automatically allocate one register of memory for a variable or temporary if the name of the variable appears with an overbar. The overbar may be anywhere within the name. Only one appearance of the name needs an overbar. The symbol will be defined to have a value of the address of the register which is allocated. A variable must have been previously undefined on pass 1. The mdv error will occur if this is not the case.

dimension

The dimension pseudo-instruction declares a symbol as an array or table to be automatically allocated. Dimension is followed by a series of array declarations separated by commas and terminated by a carriage return. Each declaration consists of the array name followed by its length enclosed in parentheses. The length may be any expression, which must be defined on pass 1. The usd error will occur if the array size is not defined. Each array name will be defined to have a value of the address of the first word of the array. An array name must have been previously undefined on pass 1. The mdd error will occur if this is not the case.

example -

    dimension a(10),b(20),c(1)

declares a, b, and c as arrays of 10, 20, and 1 word respectively. The declaration for c could have been accomplished by its appearance with an overbar in any expression.
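To make the effect of a dimension declaration concrete, here is a small Python sketch of one possible allocation. It is only an illustration. The name allocate_arrays and the starting address 1000 (octal) are invented here, lengths are assumed to be plain octal numbers rather than general expressions, and the arrays are laid out in declaration order, which is an assumption since the memo does not specify the order within a variables area.

```python
def allocate_arrays(declaration, symbols, next_free):
    """Define each array name as the address of its first word.

    declaration: text such as "a(10),b(20),c(1)", lengths in octal
    symbols:     dict of already defined symbols, updated in place
    next_free:   first free address of the (hypothetical) variables area
    Returns the first address after the allocated arrays.
    """
    for item in declaration.split(","):
        name, _, length = item.strip().rstrip(")").partition("(")
        if name in symbols:
            # Corresponds to the mdd error: the name was already defined.
            raise ValueError("mdd " + name)
        symbols[name] = next_free
        next_free += int(length, 8)
    return next_free


symbols = {}
end = allocate_arrays("a(10),b(20),c(1)", symbols, 0o1000)
print({name: oct(addr) for name, addr in symbols.items()}, oct(end))
```

With the radix at its default of octal, the lengths 10, 20, and 1 reserve 8, 16, and 1 words, so in this sketch a, b, and c begin at 1000, 1010, and 1030.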
variables

All variables and arrays are placed in a variables area, which the assembler constructs when it encounters the variables pseudo-instruction. This pseudo-instruction may be used up to 8 times in a program. Each variable or array is placed in the next variables area after the overbar or dimension pseudo-instruction that declares it. The programmer should not make any assumptions about the order of variables and arrays within an area. The initial contents of variables and arrays are not assigned by the assembler.

The use of dimension, constants, and variables is shown in the program below.

    sum
    n=100
    dimension tab(n)
    100/
    a,      law tab
            dap b
            dzm s̅
    b,      lac .
            adm s
            idx b
            sas (lac tab+n
            jmp b
            hlt
    variables
    constants
    start a

This will produce the same object program as the example given in the introduction, except that s is not initialized, and the relative order of s and tab in the variables area is unknown. The array tab is not initialized in either example.

repeat

The pseudo-instruction repeat is used to make the assembler process part of the source program a specified number of times. The pseudo-instruction is followed by the count, which may be any expression and is terminated by a comma. The characters following the comma up to and including the next carriage return are the range. The assembler behaves exactly as if the range had been typed a number of times equal to the count.

example -

    repeat 3,ril 6s    tyo

is treated as if it were

    ril 6s    tyo
    ril 6s    tyo
    ril 6s    tyo

    z=0
    repeat 3,z=z+10    y=0    repeat 3,y=y+1    y+z

is treated as if it were

    z=0
    z=z+10    y=0    repeat 3,y=y+1    y+z
    z=z+10    y=0    repeat 3,y=y+1    y+z
    z=z+10    y=0    repeat 3,y=y+1    y+z

which is treated as if it were

    z=0
    z=z+10    y=0    y=y+1    y+z    y=y+1    y+z    y=y+1    y+z
    z=z+10    y=0    y=y+1    y+z    y=y+1    y+z    y=y+1    y+z
    z=z+10    y=0    y=y+1    y+z    y=y+1    y+z    y=y+1    y+z

which assembles into the sequence of words 11,12,13,21,22,23,31,32,33

The count must be definite on both passes, or the usr error will occur. Sense switch 6 will prevent the error message from being printed on pass 1. A negative count is taken as zero.

Macro-instructions

A macro-instruction is a user-defined "abbreviation" for a given string of characters. Macro-instructions are created by use of the define and terminate pseudo-instructions. Subsequent appearances of the macro-instruction name cause the macro to be "called". The assembler behaves exactly as if the characters that form the definition had been typed in place of the call. A macro-instruction call may supply arguments that are inserted into the definition at specified points. The characters that are substituted for the call are the "expansion" of the macro. Macro-instructions must be defined before they are called.

example with no arguments

    (definition)
    define abs
            spa
            cma
    terminate

    (call)
            lac x
            abs
            dac y

is treated as if it were

            lac x
            spa
            cma
            dac y

example with two arguments

    (definition)
    define move a,b
            lio a
            dio b
    terminate

    (call)
            move j,k+3

is treated as if it were

            lio j
            dio k+3

another

    (definition)
    define clear a,b
            law a
            dap .+1
            dzm .
            idx .-1
            sas (dzm a+b
            jmp .-3
    terminate

    (call)
            clear tab,100

is treated as if it were

            law tab
            dap .+1
            dzm .
            idx .-1
            sas (dzm tab+100
            jmp .-3

define and terminate

The pseudo-instruction define is followed by the name of the macro to be defined and then the list of "dummy symbols", separated by commas and terminated by a carriage return. The following text, up to the appearance of the pseudo-instruction terminate, becomes the definition. All appearances of dummy symbols within the definition are removed and marked as places where arguments are to be substituted when the macro is called. The actual definition begins with the character after the tab or carriage return that ends the dummy symbol list. It ends on and includes the separator before the terminate pseudo-instruction.
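The substitution of arguments for dummy symbols can be modeled in a few lines of Python. This is a sketch under assumptions, not a description of how Certainly stores definitions. The function name expand is invented here, a dummy symbol is recognized only when it is not adjacent to a digit, lower case letter, or period, and features such as the single quote, brackets, and generated symbols are left out. Extra arguments are dropped and missing ones become empty strings, following the rules the memo gives under macro calls.

```python
import re


def expand(dummies, body, args):
    """Return the expansion of a macro body for one call.

    dummies: list of dummy symbol names from the define line
    body:    text of the definition
    args:    list of argument strings supplied by the call
    """
    # Missing arguments become empty strings; zip() drops any extras.
    args = list(args) + [""] * (len(dummies) - len(args))
    table = dict(zip(dummies, args))
    if not table:
        return body
    # Match a dummy symbol only when it stands alone as a syllable.
    names = "|".join(re.escape(name) for name in dummies)
    pattern = r"(?<![0-9a-z.])(" + names + r")(?![0-9a-z.])"
    # One pass over the body, so substituted text is never rescanned.
    return re.sub(pattern, lambda m: table[m.group(1)], body)


print(expand(["a", "b"], "lio a\ndio b\n", ["j", "k+3"]))
```

Applied to the clear macro with arguments tab and 100, the same function turns law a into law tab and sas (dzm a+b into sas (dzm tab+100, while leaving the letter a inside law, dap, and sas untouched.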
In order to permit macro or function definitions within a macro, appearances of define, function (see below), and terminate are counted. The macro ends on the first terminate not paired with a define or function.

If terminate is followed by a separator other than tab or carriage return, a symbol must follow. It is compared with the name of the macro being defined. A disagreement causes the mnd error. This is sometimes helpful in debugging complicated macros.

In order for the assembler to recognize a dummy symbol in the definition the symbol must be preceded and followed by separators or non-alphanumeric characters such as overbar, underbar, centerdot, or illegal characters. In some cases it is desirable to substitute an argument adjacent to an alphanumeric character, such as a symbol. This would require adjoining a dummy symbol with another symbol, which makes it impossible for the assembler to determine where one symbol ends and the other begins. To prevent this difficulty, the separator single quote is provided. A single quote separates the symbols, permitting recognition of the dummy symbol. The single quote is then removed and does not appear in the expansion. If it is immediately surrounded by case shifts, they are removed also.

example -

    define type x
            lio (char r'x
            tyo
    terminate

            type q

then becomes

            lio (char rq
            tyo

The use of rx without the single quote would have prevented recognition of x.

Where the count of defines is nonzero, i.e. in a definition within a macro, single quotes are not removed, since they will presumably be needed again.

macro calls

A macro is called whenever its name appears followed by a separator other than equals sign. If the separator is tab or carriage return, there are no arguments. Otherwise the following characters, up to the next tab or carriage return, form the argument list. The arguments are separated from each other by commas. They do not include the commas, the separator after the macro name, or the tab or carriage return after the last argument. In order to permit comma, tab, and carriage return in an argument, these characters may be hidden inside brackets in the same way that carriage returns are hidden in a repeat range. The outermost pair of brackets is removed from each argument. The arguments are then substituted as character strings for the dummy symbols in the definition, and the resulting expansion is substituted for the macro call. After the expansion has been processed, assembly resumes with the character after the tab or carriage return that ended the argument list.

If more arguments are supplied than the number of dummy symbols in the definition, the extra arguments are ignored. If too few arguments are supplied, the empty character string is used for the missing arguments, unless a symbol is generated.

generated symbols

It is sometimes helpful to have a macro generate one or more symbols to be used as address tags, etc. within the macro. For this purpose dummy symbols may be declared to be candidates for generated symbols. If a slash appears in the dummy symbol list, all the following symbols are candidates for symbol generation. If, at the time the macro is called, the argument corresponding to such a symbol is missing, the assembler will generate a symbol and use it for the argument. A new symbol is generated for each call. Generated symbols are of the form .g0001, .g0002, etc. If the argument is supplied, it overrides the generated symbol.

example -

    define ifzero x/y
            sza
            jmp y
            x
    y,
    terminate

The generated symbol provides an address for the instruction to jump over x without knowing how many words x will become.

            ifzero [lac a
            dac b
            lio c]

becomes

            sza
            jmp .g0001
            lac a
            dac b
            lio c
    .g0001,

stop

The pseudo-instruction stop causes an immediate exit from the most recently entered macro. The assembler behaves as if it had reached the last character of the definition, and continues from the character after the call.

Miscellaneous Pseudo-instructions

start

The start pseudo-instruction indicates the end of the source program or program section. It is optionally followed by an expression to be used as the starting address for the program. The starting address is used to punch the jump block when the object program is being punched on paper tape. After the tape finishes reading in, execution of the program begins at the specified address. The argument to start is not used if the object program is written on the drum.

Program Format

While Certainly has few requirements on format, many programmers have found that adherence to a fairly rigid format is helpful in writing and correcting programs. The following suggestions have been found useful in this respect.

Place address tags at the left margin, and run instructions vertically down the page indented one tab stop from the left margin.

Use only a single carriage return between instructions, except where there is a logical break in the flow of the program. Then put in an extra carriage return.

Forget that you ever learned to count higher than five. Let Certainly count for you. Do not write "dac .+16", use an address tag. This will save grief when corrections are required.

Have a listing handy when assembling or debugging a program, and note corrections thereon as soon as they are found.

As macro-instructions must be defined before they are used, put these definitions at the beginning of the program.

If the pseudo-instructions variables and constants are used, place them at the end of the program, just before start.

Assembly Procedure

Certainly normally reads the source program from Expensive Typewriter's text buffer and places the object program on drum field 1. However, many variations in procedure are possible by typing control characters on the typewriter.
input medium
    e         Expensive Typewriter buffer
    y         online typewriter

output medium
    d         drum field 1
    t         paper tape
    w         without output (just check for errors)

special format
    g         get (turn on)
    x         exchange (turn off)
    [g,x]i    input routine (loader)
    [g,x]j    jump block
    [g,x]l    label (title)

assembly control
    s         begin next pass, or, after pass 2, punch jump block (also, suppress output and proceed after error)
    c         continue same pass on next program section (also, proceed after error)
    1         begin pass 1
    2         begin pass 2
    f         forget (initialize everything)
    z         assign and zero drum field 1

symbol printout
    a         print/punch symbols in alphabetic order
    n         print/punch symbols in numeric order
    k         print constants areas

exit
    b         back to ID, leaving symbol table in core where "2T" command can read it
    m         meliorate source program (back to Expensive Typewriter)

Sense Switches

During an assembly Certainly uses sense switches for the following functions.

    1    Type out every character of the source program, including expansions of repeats and macros. This is useful when debugging macros.
    2    Punch all output that would normally be typed except for error messages. This includes output produced by printx, printo, printc, and sense switch 1.
    4    Proceed after error messages as if "c" had been typed.
    5    Forbid indefinite address on pass 1. Give usl error instead.
    6    Permit undefined arguments to printo, printc, and repeat on pass 1.

During some control functions Certainly uses sense switches as follows.

    1    Stop output from a, n, or h.
    2    Punch output from a or n instead of typing it.
    3    Print symbol listing with tab instead of equals sign.

When Certainly is started at location 104 (as it is when the "M" command is given in Expensive Typewriter), it listens for control characters from the typewriter. After each pass on a program section, it listens for another control character.
When Certainly is started at location 102 (as it is when the "N" command is given in Expensive Typewriter), it automatically goes through both passes of the assembly and returns to ID as if the sequence z, s, s, and b had been typed. It directs ID to place the starting address of the program in the program counter, read the symbol table, and unsave drum field 1 into core.

Certainly assigns and dismisses the punch as needed.

When the object program is punched on tape, the first program section is normally preceded by the title, punched in block letters, followed by the input routine. The program itself is punched in checksummed data blocks of up to 100 words each. If the title contains a centerdot, the centerdot and all following characters will not appear on the tape. The tape format may be changed by control characters and the pseudo-instructions readin and noinput.

Error Messages

Upon detecting an error, Certainly will print a line in the following format.

    aaa    p,l    ccc    dddd    eee

where aaa is a three letter code indicating the error, p,l is the page and line number at which the error occurred, ccc is the symbolic address (relative to the last tag), and dddd is the name of the last pseudo-instruction, macro, or function. In the case of an error caused by a symbol, eee is the symbol.

Following is a list of error messages and the action taken if assembly is continued.

    sce    Symbol table capacity exceeded. No recovery.
    pce    Pushdown capacity exceeded (nesting of repeats and macros is too deep). The pushdown list is cleared and assembly starts over at the top level.
    cce    Constants capacity exceeded (more than about 400 constants). The current constant will evaluate to zero.
    mce    Macro capacity exceeded and the garbage collector could recover no space. No recovery.
    ich    Illegal character. It is ignored.
    rpm    Wrap around memory. The location counter has overflowed. It will be reset to zero.
    ilf    Illegal format. Characters are ignored to the next tab or carriage return.
    ipi    Illegal pseudo-instruction. A pseudo-instruction is used in an illegal context. Same recovery as ilf.
    mdv    Multiple definition of a variable (a symbol with an overbar was previously defined). The old definition remains.
    mdd    Multiple definition in dimension (a symbol in a dimension declaration was previously defined). The old definition remains.
    mdt    Multiple definition of a tag. A defined tag does not match the location counter. The tag is not redefined.
    usw    Undefined symbol in a storage word. The symbol is taken as zero.

All error messages beginning with "us" refer to undefined symbols and are identified by the third letter as follows.

    usl    In a location assignment.
    usc    In a constant.
    usj    In a jump block (argument for word).
    uss    In argument for start.
    usa    In argument for a function.
    usv    In argument for return.
    ust    In an address tag that is not a single symbol.
    usr    In a repeat count.
    usd    In an array size for dimension.
    use    In a formal symbol definition (with equals sign).
    usx    In an argument for radix.

    nca    No constants area. The constant is assembled as zero.
    ipa    Illegal formal symbol assignment. It is ignored.
    mnd    Macro or function name disagrees with name after terminate. The original name is used.
    uer    Micro-program error (upper case letters do not form a micro-program instruction). Same recovery as ilf.
    vld    Variables location disagrees between passes 1 and 2. The location is forced to agree.
    tmv    Too many variables areas. The pseudo-instruction variables is ignored.
    cld    Constants location disagrees between passes 1 and 2. The location is forced to agree.
    tmc    Too many constants areas. The pseudo-instruction constants is ignored.
    ctl    Constants area too long (longer on pass 2 than on pass 1). The constants area is truncated.