PRS: Lexical scanner, macro processor, and parser; first pass of the SETL compiler.
PRS: Lexical scanner, macro processor, and parser; first pass of the SETL compiler. stlprs.opl
1 .=member intro 2$ ssssssss eeeeeeeeee tttttttttt ll 3$ ssssssssss eeeeeeeeee tttttttttt ll 4$ ss ss ee tt ll 5$ ss ee tt ll 6$ sssssssss eeeeee tt ll 7$ sssssssss eeeeee tt ll 8$ ss ee tt ll 9$ ss ss ee tt ll 10$ ssssssssss eeeeeeeee tt llllllllll 11$ ssssssss eeeeeeeee tt llllllllll 12$ 13$ 14$ ppppppppp rrrrrrrrr ssssssss 15$ pppppppppp rrrrrrrrrr ssssssssss 16$ pp pp rr rr ss ss 17$ pp pp rr rr ss 18$ pppppppppp rrrrrrrrrr sssssssss 19$ ppppppppp rrrrrrrrr sssssssss 20$ pp rr rr ss 21$ pp rr rr ss ss 22$ pp rr rr ssssssssss 23$ pp rr rr ssssssss 24$ 25$ 26$ t h e s e t l p a r s e r 27$ 28$ this software is part of the setl programming system 29$ address queries and comments to 30$ 31$ setl project 32$ department of computer science 33$ new york university 34$ courant institute of mathematical sciences 35$ 251 mercer street 36$ new york, ny 10012 37$ 38 39 40$ this file contains the source for the first phase of the setl 41$ compiler. this phase consists of a parser, macro processor, 42$ and lexical scanner. its output is a symbol table and a reversed 43$ polish string. 44$ 45$ the parser is by far the simplest of the three sections. 46$ the setl grammar is written in an extended bnf which can 47$ be viewed as an assembly language for some abstract 48$ parsing machine. this assembly language is turned into 49$ interpreteble code by the utility program 'syn', and 50$ interpreted by the parser. 51$ 52$ the parser consists of an interpreter called 'parse' and 53$ several small auxilliary routines. 54$ 55$ the scanner is also quite simple. it consists of a single 56$ routine called 'trulex' which returns the next token from the 57$ input stream. 58$ 59$ the macro processor is relatively complex; the rest of this 60$ section will concentrate on its design. 61 62$ the macro processor consists of three hierarchical routines. 63$ each of these routines can be viewed from the outside as 64$ a black box which obtains a token from some lower level routine 65$ and returns it. from above, these routines seem like a garden 66$ variety lexical scanner. in actuality they intercept certain 67$ tokens passed to them in order to expand macros and absorb 68$ macro definitions. 69$ 70$ the three routines are: 71$ 72$ 1. lexscan 73$ 74$ this is the top level routine of the macro processor. 75$ each time it is called, it returns a token from a lower 76$ level routine. before returning a token, it checks 77$ whether the token is a macro. if so, it initializes the 78$ expansion of the macro and requests another token. it 79$ returns the first token which is not a macro. 80$ 81$ 2. absorb 82$ 83$ this is the second level routine of the macro processor. 84$ it passes along tokens from the routine below it, looking 85$ for occurrences of 'macro' and 'drop'. each time it sees 86$ one of these it makes a detour to process a macro drop 87$ or definition. 88$ 89$ 3. expand 90$ 91$ this is the bottom level routine of the macro processor. 92$ it checks whether any macros are currently being expanded. 93$ if so, it returns the next word of macro text; otherwise 94$ it obtains a token from trulex. 95 96$ the macro processor uses three global data structures: 97$ 98$ 1. mtab: 99$ 100$ this is an array used to store macro text. each word of 101$ mtab represents a single token of macro text. there are 102$ three types of macro text entries: 103$ 104$ a. simple text 105$ 106$ a simple text entry is represented by a symbol table 107$ pointer. 108$ 109$ b. user supplied arguments 110$ 111$ the i-th user supplied argument is represented by 112$ i+bias_user. bias_user is a bias used to distinguish 113$ user supplied arguments from plain text. 114$ 115$ c. generated arguments 116$ 117$ generated arguments are names which are supplied by the 118$ compiler during each macro expansion. the first time 119$ we see each generated argument in a new expansion, we 120$ generate a new name for it; the next time we see it 121$ we supply the old name. 122$ 123$ generated arguments are represented in two forms: 124$ 125$ i+bias_gen1: first occurrence of i-th generated argument 126$ i+bias_gen2: any other occurrence of i-th generated arg. 127$ 128$ the various biases are arranged as follows: 129$ 130$ bias_user = dimension of symbol table + 1. 131$ bias_gen1 = bias_user + 100 132$ bias_gen2 = bias_gen1 + 100 133 134$ the text for each macro consists of a series of mtab entries 135$ followed by a zero entry. if 'm' is a macro then 'morg(m)' 136$ gives its first entry in mtab. 137$ 138$ 2. astack 139$ 140$ astack is an array used to hold actual arguments of macros 141$ currently being expanded. each argument can consist of a 142$ series of tokens. its corresponding astack entry consists 143$ of a series of symbol table pointers followed by a zero word. 144$ 145$ 3. mstack 146$ 147$ mstack is a stack used to control recursion during macro 148$ expansion. we associate a series of mstack entries, known 149$ as a stack frame, with each macro we are in the process of 150$ expanding. 151 152$ each stack frame is organized as follows: 153$ 154$ first word: points to next word of macro text 155$ second word: gives number of arguments 156$ i+2 nd word: gives pointer to start of i-th argument in astack 157$ 158$ stack frames are built whenever lexscan encounters a macro. 159$ expand returns the next word of macro text if we are processing 160$ macros and calls trulex otherwise. 161 162$ when expand discovers a user-argument in the text of a macro 163$ it begins substituting the actual argument. this is done by 164$ building an extra stack frame which contains: 165$ 166$ first word: points to next word of argument text in astack 167$ second word: indicates zero arguments. 168$ 169$ the pointer to astack is biased by the dimension of mtab to 170$ distinguish an argument stack frame from a macro stack frame. 171 172$ the parser does not communicate with the scanner directly, but 173$ instead calls an interface routine -gettok-. this routine 174$ manipulates a token buffer which is used to implement backtracking 175$ in the parser. 176 177 178$ outputs of the parser 179$ --------------------- 180 181$ the parser produces two tables: 182 183$ 1. the polish string 184 185$ the polish string is represented as an array whose entries 186$ have two fields: 187 188$ pol_typ: type code pol_xxx 189$ pol_val: value of entry 190 191$ entries in the polish string are refered to as 'nodes'. there 192$ are three types of nodes: 193 194$ a. names 195 196$ these nodes indicate names appearing in the source program. 197$ they have: 198 199$ pol_typ: pol_name 200$ pol_val: pointer to 'names' array 201 202$ the names array is used to store the actual name of 203$ the node. 204 205$ b. counters 206 207$ counters are integers which indicate the number of clauses 208$ found by each operation of the parser. they have: 209 210$ pol_typ: pol_count 211$ pol_val: integer indicating count 212 213$ c. markers 214 215$ markers are nodes indicating points where semantic routines 216$ are to be invoked. they have: 217 218$ pol_typ: pol_mark 219$ pol_val: code p_xxx 220 221$ the polish string is written out whenever it is full. rather 222$ than write out both the polish string and the names array, 223$ out, we write the names entry for each node directly after 224$ the node itself. 225 226$ the variable 'polp' points to the last entry in the string. 227 228$ the polish string is actually written onto two files known 229$ as the main file and the auxiliary file. the two files 230$ are merged by the semaintic pass. this allows the semanitc 231$ pass to act as if there were no forward references to 232$ procedures. 233 234 235$ 2. names 236 237$ names is a word-sized array used to store the names of tokens. 238$ we pack each token into names by converting it to a 'standard' 239$ self defining string, then putting each word of the string into 240$ a successive word of names, starting with the low order word. 241 242$ pointers to names always point to the low order word of a string, 243$ and thus allow us to access its slen and sorg fields. 244 245$ names entries are 'standardized' in the sense that an 246$ n-character token will always have an sorg of .sds. n +1. 247 248 249 250 251 252$ the symbol table 253$ ---------------- 254 255$ the parser uses a symbol called symtab. it has the following 256$ fields: 257 258 259$ name: pointer to names array 260$ lex_typ: extended lexical type 261$ key_val: lexical value, see below 262$ lit_val: literal value, see below. 263$ link: clash list link 264$ morg: points to first word of macro text. 265$ margs: number of macro arguments 266$ mcode: indicates representation in macro text 267 268$ in the code which follows we distinguish between 'keywords' and 269$ 'literals'. 270 271$ a 'keyword' is any token whose lex_typ is not some standard 272$ type such as name, integer, or real. there are several groups 273$ of keywords, such as binary operators, statement keywords, etc. 274$ each set of keywords forms a separate lexical class. 275 276$ each keyword has a value associated with it. this value is 277$ given by the symbols key_val field. its meaning is a function 278$ of its lex_typ: 279 280$ lex_typ key_val 281$ -------- ------- 282 283$ l_dkey index for branch on keyword 284$ l_tkey1 as above 285$ l_tkey2 as above 286$ l_stkey as above 287$ l_lparen as above 288$ l_bin precedence 289$ l_un precedence 290$ l_minus precedence 291$ l_debug debugging code dbg_xxx 292 293$ key_val is zero for all other types. 294 295$ a 'literal' is any token which appears explicitly in the 296$ grammar. the lit_val field of each literal contains a code 297$ lit_xxx which identifies which literal it is. tokens which 298$ are not literals have a lit_val of 0. 299 300$ note that key_val and lit_val must be disjoint, since certain 301$ tokens are both literals and keywords. 302 303$ symtab is arranged as a hash table, with the link field used 304$ to chain entries with the same hash code. a separate table 305$ called 'heads' is used to hold the heads of clash chains, and 306$ the variable 'symtabp' points to the last used symtab entry. 307 308$ note that tokens are hashed on their names only, not on their types. 309$ the types of keywords are set by the initialization routine. if 310$ trulex discovers a token whose lex_typ field is not already set, 311$ it fills in a default. 312 313 1 .=member mods 2 3 4$ program revision history 5$ ------------------------ 6 7$ this section contains a description of each revision to the program. 8$ these descriptions have the following format: 9$ 10$ mm-dd-yy jdate author(s) 11$ 12$ 1.............15........25........................................ 13$ 14$ where mm-dd-yy are the month, day, and year, and jdate is the julian 15$ date. 16$ 17$ each time a revision is installed, the author should insert a 18$ description after line 'mods.21', and change the macro 'prog_level' 19$ to the current julian date. 20$ 21$ ...................................................................... sunb 1 sunb 2 sunb 3$ 07/24/84 84206 s. freudenberger sunb 4$ sunb 5$ 1. introduce program parameters -lcp- and -lcs- to control default sunb 6$ output: -lcp- controls the listing of program parameters, i.e. sunb 7$ the initial phase heading; -lcs- controls the listing of the sunb 8$ final statistics. if both are set, the old listing is generated; sunb 9$ if neither is set, no output is generated unless an error occurs. sunb 10$ modules affected: start, prsini, and prstrm. asca 1 asca 2 asca 3$ 03/05/84 84065 d. shields asca 4$ asca 5$ 1. for s37, add option 'cset=asc' to permit translation of programs asca 6$ developed in ascii without need to translated to portable asca 7$ character set. asca 8$ modules affected: start, prsini, and initlex. suna 1 suna 2 suna 3$ 02/05/84 84065 s. freudenberger suna 4$ suna 5$ 1. support motorola mc68000 microprocessor on sun workstation. suna 6$ modules affected: start, prsini, ptinit, trulex, initlex, suna 7$ stackexp, and svtab. smfc 1 smfc 2 smfc 3$ 09/01/83 83244 s. freudenberger smfc 4$ smfc 5$ 1. document machine-dependency of integer representation in setl smfc 6$ binary i/o. smfd 1$ module affected: newstat. smfc 7$ 2. correct a comditional assembly directive for s10/s20. smfc 8$ module affected: trulex. smfb 1 smfb 2 smfb 3$ 08/08/83 83220 s. freudenberger smfb 4$ smfb 5$ 1. increase the dimension of astack and mstack to 1023. smfb 6$ module affected: start. smfb 7$ 2. check while collecting open tokens for the end of a loop header, smfb 8$ to avoid scanning beyond it. this required to add the keyword end smfb 9$ to the list of keywords defined by default. smfb 10$ modules affected: start, initstd, opentoks, endtoks, and chktoks. smfb 11$ 3. code anyc string search function inline. smfb 12$ modules affected: trulex and initlex. smfb 13$ 4. change some cases of s32, s37, s47 conditional assembly to r32. smfb 14$ modules affected: inparse and svtab. smfa 1 smfa 2 smfa 3$ 12/16/82 82350 s. freudenberger smfa 4$ smfa 5$ 1. modify trulex to call an internal rather than an external smfa 6$ subroutine to read the next character. smfa 7$ module affected: trulex. smfa 8$ module deleted: getc. smfa 9$ 2. check that there really exists an open loop when we encounter a smfa 10$ 'continue' or 'quit' statement with no tokens. smfa 11$ module affected: chktoks and ermsg. 22 23 24$ 08/12/82 82224 s. freudenberger 25$ 26$ 1. we postpone the check for constant expressions to the semantic 27$ pass, and are thus able to constant fold additional operators. 28$ module affected: remote text inclusion. 29$ 2. the token buffer for the current token is set to zero before we 30$ attempt to find the token. 31$ module affected: gettok. 32$ 3. we check the symbol table index for zero before we retrieve the 33$ token name. 34$ mocudule affected: symsds. 35$ 4. the enders for error recovery in expressions have been expanded 36$ to include 'elseif' and 'else', and have been corrected to use 37$ portable character set representations in the initialisation code. 38$ module affected: ermet. 39$ 5. we check that the token buffer indeces are defined (non-zero), 40$ and replace the names of invalid tokens between tok_out and 41$ tok_in by question marks. 42$ module affected: trcbak. 43 44 45$ 06/15/82 82166 s. freudenberger 46$ 47$ 1. the setl source map file has been renamed to ssm from src. it is 48$ now written out whenever the ssm program parameter is supplied, 49$ not as a funtion of the opt program parameter. the opt program 50$ parameter now merely checks the availability of the setl q1 inter- 51$ face and that an ssm file has been specified. 52$ modules affected: start, prsini, putcard, stat1, newstat, and 53$ prstrm. 54$ 2. trailink blanks are removed before the record is written to the 55$ ssm file. 56$ modules affected: trulex and putcard. 57$ 3. we count the 'case of' of a case statement as a separate 58$ statement. 59$ module affected: (remote text inclusion) 60 61 62$ 06/01/82 82152 s. freudenberger 63$ 64$ 1. we added conditional code for the s37 mts implementation. 65$ module affected: prsini. 66$ 2. we re-introduced the keyword 'map' for ambiguous maps, i.e. maps 67$ which can have both single- and multi-valued image points. 68$ module affected: initkey. 69$ 3. the globals 'stat_no' and 'cstat_no' have been deleted. they were 70$ dead since 82032.2. 71$ modules affected: start, stat1, and newstat. 72$ 4. a new global flag has been introduced: 'eor_flag' is set whenever 73$ a new input record is read. 74$ modules affected: start and getc. 75$ 5. the implicit string concatenation now actually requires an end-of- 76$ record between the two string operands. 77$ module affected: trulex. 78 79 80$ 03/16/82 82075 s. freudenberger 81$ 82$ 1. the 'rtrs0' and 'rtrs1' debugging options have been deleted from 83$ the initialisation code of the symbol table. 84$ module affected: initkey. 85 86 87$ 02/01/82 82032 s. freudenberger 88$ 89$ 1. the listing output has been moved to start each line in column 1 90$ rather than column 7. dump outputs have not been modified. 91$ modules affected: prsini, usratp, overfl, and prstrm. 92$ 2. the statement counters have been modified to correspond to the 93$ variables used in the remaining compile phases, i.e. the use of 94$ cstmt_count, etc. 95$ modules affected: start, parse, putcard, stat1, and newstat. 96$ 3. the setl optimiser needs the setl binary interface to communicate 97$ with other compiler phases. thus we can check already here 98$ whether we will run into problems later. furthermore, the 99$ optimiser likes to produce an annotated source listing: we 100$ generate here a map from cummulative statement numbers to tuples 101$ of source lines, in setl binary format, and write it to the file 102$ src. 103$ new program parameters: 104$ opt=0/1 global optimisation flag: if set, write 105$ setl binary map from cummulative statement 106$ numbers to tuples of source lines 107$ src=/ file to which the setl binary map is 108$ written 109$ new inclusion from compl: module binio. 110$ modules affected: start, prsini, getcard, putcard, stat1, newstat, 111$ and prstrm. 112$ module deleted: lstcrd. 113$ 4. several tokens have been predefined, which before were defined 114$ when needed: ':=', 'then', and 'end'. 115$ modules affected: start, initstd, chkrepr, and badasn. 116$ 5. we only check tokens up to the 'then' in an 'if' statement. this 117$ prevents look-ahead beyond the 'then', and thus keeps statement 118$ numbers correct for the source listing. 119$ modules affected: opentoks, endtoks, and checktoks. 120 121 122$ 02/01/82 82032 d. shields 123$ 124$ use r32 conditional symbol for standard 32-bit fields. 125$ this replaces the field definitions for s32, s37 and s47. 126 127 128$ 01/15/82 82015 s. freudenberger & d. shields 129$ 130$ 1. prsini has been modified to print the phase header to the terminal 131$ whenever the new control card parameter 'termh=0/1' is set. 132$ new control card parameter: 133$ termh=0/1 print phase header on the terminal file 134$ module affected: prsini. 135$ 2. the nameset nsptab has been put in start to simplify compilation 136$ for s32u and s47 (unix). 137 138 139$ 11/29/81 81333 d.shields 140$ 141$ 1. support s47: amdahl uts (universal timesharing system). 142$ this implementation runs on s37 architecture using an operating 143$ system very close to unix (v7), and uses the ascii character set. 144 145 146$ 08/13/81 81232 s. tihor 147$ 148$ 1. expand the size of the s32 name table for the ada compiler. 149$ 2. expand the heads, symtab, and polish file sizes accordingly. 150$ 3. add the /upd switch and the /seq switch. the /seq switch 151$ initially supports the four options: l(line numbers), 152$ s(stmt_num), ls(both) and nothing. 153$ 4. add support for s20 (s10 with extended addressing). 154 155 156$ 12/05/80 80340 s. tihor & d. shields 157$ 158$ 1. shields mods for unix checkout 159$ 1a. support lower case for unix 160$ 1b. use binary parse tables for unix checkout 161$ 2. use name$sio to capture the names of the stdin, stdout, and stderr 162$ (term) files. 163$ 3. correct two stropped key words in ermet to unstropped form. 164$ 4. do not convert percent to colon inside of quotes even on s66. 165 166 167$ 11/05/80 80310 s. freudenberger 168$ 169$ the puterr routine has been split, so that the unexpected-end-of-file 170$ error message is printed more reasonably. 171 172 173$ 08/01/80 80214 s. freudenberger 174$ 175$ 1. for implementations selecting the .+lc conditional assembly, the 176$ test in the getcard routine for blank after .title has been 177$ changed to a search for blank-equivalent separators. 178 179 180$ 06/20/80 80172 e. deak 181$ 182$ modifications have been made to improve the error recovery. most 183$ changes have been made in the ermet routine to prevent scanning 184$ over reserved words. 185 186 187$ 07/08/80 80190 s. freudenberger 188$ 189$ 1. a special polish node is written to mark the end of both the 190$ polish and the auxiliary polish files. 191$ 2. the line "no errors..." is not echoed to the terminal anymore. 192$ 3. the layout of the title line has been changed. 193$ 4. lines containing listing directives are counted as input lines. 194 195 196$ 05/27/80 80148 s. freudenberger 197$ 198$ 1. adjust various macros for s37 in preparation to compile and 199$ run the optimizer on that system. 200 201 202$ 04/11/80 80102 d. shields 203$ 204$ 1. delete is_primitive. 205$ 2. 'rgcd1' was wrongly spelled 'tgcd1'. 206 207 208$ 02/04/80 80035 s. freudenberger and d. shields 209$ 210$ 1. implement unary operators acos, asin, atan, char, cos, exp, 211$ log, sin, sqrt, tan and tanh. 212$ 2. implement binary operators atan2 and interrogation (?). 213$ 3. implement type predicates is_atom, is_boolean, is_integer, 214$ is_map, is_real, is_set, is_string and is_tuple. 215$ change prim to is_primitive. 216$ 4. add procedure host() to provide means for adding 217$ implementation- or site-dependent features. 218$ 5. drop cset_dis option. 219$ 6. correct the call to 'opninc' so that the '.copy' option must 220$ start in column 2. 221$ 7. the meta-macros and system parameter macros have been replaced 222$ by an inclusion of cmnpl.sysmac. 223$ 8. the recursion stack of the parser has been changed to account 224$ for the possibility of more than 255 <*cl>'s. this change was 225$ a consequence of the increase in vlen_lim (cmnpl.q1symtab). 226 227 228$ 01/16/80 80016 s. freudenberger 229$ 230$ change 79248.5 has been undone, since the code generator symbol table 231$ allocator does not handle this extension properly. 232 233 234$ 01/15/80 80015 s. freudenberger 235$ 236$ 1. correct 'puttab' and 'putxtab' for backtracked environment. 237$ 2. correct 'retab'/'puttabs' so that subr-statement follows '*deck'. 238$ 3. correct error message in correspondence to grammar change. 239 240 241$ 12/17/79 79351 s. freudenberger 242$ 243$ 1. 'puttabs' has been shortened to six characters: 'puttbs'. 244 245 246$ 11/30/79 79334 s. freudenberger 247$ 248$ 1. the names table has been increased to the same size as in cmnpl. 249$ 2. on polish atbel overflow, we only write out hte non-backtracked 250$ part of the polish. 251 252 253$ 11/12/79 79316 s. freudenberger 254$ 255$ 1. the mode keyword 'map' has been dropped. the initialization 256$ for the repr keywords has been updated accordingly. 257$ 2. the error message for slash after compound operator has 258$ been included. 259$ 3. new error messages have been added according to the grammar 260$ extensions. 261 262 263$ 09/17/79 79259 s. freudenberger 264$ 265$ 1. the current input line is printed unconditionally when an error 266$ is found. 267$ 2. the grammar action to force the printing of the current line does 268$ not call 'put_card' directly anymore, but rather the newly added 269$ routine 'lstcrd', which checks whether the 'list' option has been 270$ selected. 271 272 273$ 09/13/79 79256 s. freudenberger 274$ 275$ 1. logical file names are sized using 'filenamlen' (define in 276$ cmnpl.sysmac, duplicated in prspl.start since sysmac is not 277$ included here. 278 279 280$ 09/05/79 79248 s. freudenberger 281$ 282$ 283$ this correction set installs setl 2.1 284$ 285$ 286$ 1. a conditional assembly 'lc' has been added to control upper/lower 287$ case equivalency for identifiers. 288$ 2. a character set control card parameter ('cset') has been added to 289$ control selection of either the portable or the extended character 290$ sets. (for further details, see comment in member start) 291$ 3. the lexical classes have been changed and ammended. 292$ 4. the parser debugging options have been renamed and made part of 293$ the general 'debug' statement. 294$ 5. we now accept certain lexical classes as names. these are the 295$ base type and the mode keywords, as well as the options for 296$ the 'debug' and 'trace' statements. 297$ 6. 'trulex' and 'initlex' have been completely rewritten, using the 298$ little string primitives. the modification also reflects the 299$ upper/lower case folding for identifiers. 300$ 7. 'trulex' now accepts '.5' as a real denotation. 301$ 8. 'initlex' contains the code to define one-character alternatives 302$ for predefined tokens. 303$ 9. the error handler has been updated. 304$ 10. the routines 'getc' and 'altchar' have been added. 'getc' 305$ returns the next character from the input buffer, and 'altchar' 306$ defines its second argument as an alternative for its first 307$ argument. 308 309 310$ 07/20/79 79201 s. freudenberger 311$ 312$ 1. error messages on the s10 are echoed to the device tty:, and not 313$ the file 'tty'. 314$ 2. the default for the s32 has been changed so that, in the absence 315$ of a 'term' parameter, errors are echoed to 'sys$error'. 316$ 3. error messages on the s10 are preceded by '?', the standard error 317$ marker for the dec-10. 318 319 320$ 05/18/79 79138 s. freudenberger 321 322$ 1. the quick parser has been renamed from stlqprs to stlqrs. 323$ 2. the term option has been cleaned up so that the skip's and 324$ the call's to contlpr interleave correctly. 325$ 3. a number of extreneous return statements have been deleted. 326$ 4. the source has been resequenced. 327 328 329$ 04/27/79 79117 s. freudenberger 330 331$ the environment has been modified to provide an assembly-level 332$ module which contains the parse table output from syn. this 333$ module is used, and it is not necessary anymore to read 'synbin' 334$ for each run. 335 336 337$ 04/12/79 79102 s. freudenberger and d. shields 338 339$ 1. three character set changes have been implemented: 340$ 1. the ascii number sign is not a synonym for the keyword exist. 341$ it is currently unused. 342$ 2. the ascii circumflex is not used as the break character. 343$ it is currently unused. 344$ 3. the ascii underline is the new break character. 345$ it is not acceptable as a synonym for the keyword in. 346$ 2. initial modifications have been included to eliminatethe need 347$ to read in the grammar file for every run. 348 349 350$ 04/10/79 79100 s. freudenberger 351 352$ 1. the parse listing commands -.eject- and -.title- only eject a 353$ page when the source is listed. 354$ 2. since there was no precedenc 7, all precedence 8 operators have 355$ been given precedence 7. the operators +:= and -:= have been 356$ given precedence 8. 357 358 359$ 04/03/79 79093 s. freudenberger and d. shields 360 361$ 1. as a first step to remove prefix stropping, the pre-control- 362$ card parameter has been removed, and the pre_flag initialized 363$ to false, i.e. reserved word stropping. 364$ 2. a conditional symbol -qp- has been added, allowing the creation 365$ of a smaller version of the parser. the main differences between 366$ the regular parser and the -quick- parser are the following: 367$ a. reduced macro- and symbol tables 368$ b. no polish strings are written onto the files -pol-/-xpol- 369$ c. there are no dump routines in this version. 370$ 3. a terminal option has been added, echo-printing all error 371$ messages onto the file specified by the term control card 372$ parameter. 373 374 375$ 03/27/79 79086 s. freudenberger 376 377$ 1. two tests were corrected in -endtoks- and -checktoks-, 378$ so that the correct number of tokens will be checked. 379$ 2. if dumps are requested, the dump routines will be invoked 380$ from -puttabs- rather than from -puttab- and -putxtab-. 381$ 3. two namesets have been renamed to avoid external name 382$ conflicts on systems which do not distinct between data 383$ and program segments: 384$ initkeyns ---> nsinitkey 385$ trulexns ---> nstrulex 386 387 388$ 03/05/79 79065 s. freudenberger 389 390$ 1. the size of the macro table (mtab) has been increased. 391$ 2. subroutine -init- has been renamed to -prsini-. this is both 392$ more consistent with the remaining phases of the compiler, and 393$ avoids a name conflict in the s10 implementation. 394$ 3. ltlsyn has been changed to emit a member synmax. this member 395$ contains a macro -parselitmax- giving the total number of 396$ literals in the grammar. -initlit- has been modified accordingly. 397$ 4. the error handling and recovery has been changed considerably. 398$ 4.1 a new control card parameter has been added: etoks=5/5 399$ in the new version, each error message prints the last 400$ etoks tokens that have been accepted by the parser, plus 401$ the tokens examined but not accepted. the latter are 402$ underlined, using a minus sign for unreserved and an 403$ equal sign for reserved tokens. 404$ 4.2 if the parser finishes to parse a program but is not at 405$ the end of the input file, it assumes that some error must 406$ have derailed it, and resumes parsing with the production 407$ 408$ 4.3 if the tokens after -end- do not match, a check is performed 409$ to see whether one ore more -end-s are missing. if so, an 410$ error is flaged, but the parse proceeds, inserting the missing 411$ -end-(s). 412$ 5. the statement numbering has been corrected (finally !). the 413$ cause for the neurotic behaviour was the routine -opentoks-, 414$ which save a fixed number of tokens. it has been modified so 415$ that it will only save the tokens to the minimum of the pre- 416$ defined limit and the next semicolon. this change is justified 417$ since both -endtoks- and -checktoks- only compare to the next 418$ semicolon. 419 420 421$ 12-27-78 78361 a. grand and d. shields 422 423$ this correction set installs machine-dependent code for the 424$ ibm 370, dec-10, and vax. 425 426 427$ 12-8-78 78342 a. grand 428 429$ 1. the scanner did not allow a signed exponent in real constants. 430$ 2. there was a grammar bug which required two new error messages. 431 432 433$ 11-15-78 78319 a. grand and s. freudenberger 434 435$ this correction does nothing but install the deck'mods' 436 1 .=member start 2 .=include cndasm suna 8 3 .-qp. 4 .+s66 subr start; 5 .+r32 prog stlprs; suna 9 .+r36 prog stlprs; suna 10 .+qp. suna 11 prog stlprs; 8 ..qp 9 11 12$ this is the main routine of the scanner-parser. it contains 13$ all the global variable declarations, plus calls to invoke 14$ the rest of the program. 15 16 17 +* prog_level = $ program level(julian date of last fix) sunb 11 'prs(84206) ' 19 ** 20 21 $ conditional symbols 22 23 $ enable qp to compile quick parse which has reduced 24 $ table sizes and runs as stand-along program to assist 25 $ in finding syntactic errors quickly. 26 27$ enable defenv_ptinit if procedure ptinit defined by environment 28 29 .+set defenv_ptinit 30 31 .+s47. 32 .-set defenv_ptinit $ read in parse tables during checkout. 33 ..s47 34 35 36 .=include sysmac 37 38$ character constants 39 40 +* cc_tab = $ tab character (if available) or blank 41 .+s10 09 42 .+s20 09 43 .+s32 09 44 .+s47 09 45 .+s37 1r 46 .+s66 1r suna 13 .+s68 09 47 ** 48 49 50 51 52 nameset symtab; $ symbol table and related variables 53 54 +* symtab_sz = $ size of symtab 55 .+s66 120 56 .+r32 168 57 .+s10 108 58 .+s20 108 59 ** 60 +* symtab_lim = 2000 ** $ dimension of symtab 61 .+qp +* symtab_lim = 1000 ** $ (small) dimension of symtab 62 63 size symtab(symtab_sz); 64 dims symtab(symtab_lim); 65 66 size symtabp(ps); $ pointer to last symtab entry 67 data symtabp = 0; 68 69 .+s66. 70 +* name(i) = .f. 001, 16, symtab(i) ** 71 +* lex_typ(i) = .f. 017, 08, symtab(i) ** 72 +* key_val(i) = .f. 025, 08, symtab(i) ** 73 +* lit_val(i) = .f. 033, 08, symtab(i) ** 74 +* link(i) = .f. 065, 16, symtab(i) ** 75 +* morg(i) = .f. 081, 16, symtab(i) ** 76 +* mcode(i) = .f. 097, 16, symtab(i) ** 77 +* margs(i) = .f. 113, 08, symtab(i) ** 78 ..s66 79 80 .+r32. 81 +* name(i) = .f. 001, 32, symtab(i) ** 82 +* lex_typ(i) = .f. 033, 08, symtab(i) ** 83 +* key_val(i) = .f. 041, 08, symtab(i) ** 84 +* lit_val(i) = .f. 049, 08, symtab(i) ** 85 +* link(i) = .f. 065, 32, symtab(i) ** 86 +* morg(i) = .f. 097, 32, symtab(i) ** 87 +* mcode(i) = .f. 129, 32, symtab(i) ** 88 +* margs(i) = .f. 161, 08, symtab(i) ** 89 ..r32 90 91 .+s10. 92 +* name(i) = .f. 001, 18, symtab(i) ** 93 +* lex_typ(i) = .f. 019, 08, symtab(i) ** 94 +* key_val(i) = .f. 027, 08, symtab(i) ** 95 +* lit_val(i) = .f. 037, 08, symtab(i) ** 96 +* link(i) = .f. 055, 18, symtab(i) ** 97 +* morg(i) = .f. 073, 18, symtab(i) ** 98 +* mcode(i) = .f. 091, 18, symtab(i) ** 99 +* margs(i) = .f. 045, 08, symtab(i) ** 100 ..s10 101 102 .+s20. 103 +* name(i) = .f. 001, 18, symtab(i) ** 104 +* lex_typ(i) = .f. 019, 08, symtab(i) ** 105 +* key_val(i) = .f. 027, 08, symtab(i) ** 106 +* lit_val(i) = .f. 037, 08, symtab(i) ** 107 +* link(i) = .f. 055, 18, symtab(i) ** 108 +* morg(i) = .f. 073, 18, symtab(i) ** 109 +* mcode(i) = .f. 091, 18, symtab(i) ** 110 +* margs(i) = .f. 045, 08, symtab(i) ** 111 ..s20 112 113 114$ the array 'heads' is used to hold the head of each clash list. 115$ its dimension must be a prime. 116 117 +* heads_lim = 509 ** 118 119 size heads(ps); 120 dims heads(heads_lim); 121 data heads = 0(heads_lim); 122 123$ constants for lexical types 124$ --------------------------- 125 126 .=zzyorg z 127 128 defc(l_name) $ names 129 defc(l_bold) $ bold (operator) name 130 defc(l_int) $ integer denotation 131 defc(l_real) $ real denotation 132 defc(l_string) $ string denotation 133 defc(l_delim) $ delimiters 134 defc(l_dkey) $ declaration statement keyword 135 defc(l_modekey) $ system defined modes 136 defc(l_btkey) $ base types (local, remote, ...) 137 defc(l_stkey) $ statement keyword 138 defc(l_rwkey) $ read/write keywords 139 defc(l_rkey1) $ libraries, reads, writes 140 defc(l_rkey2) $ imports, exports 141 defc(l_bin) $ system defined binary operators 142 defc(l_from) $ from, etc. 143 defc(l_un) $ system defined unary operators 144 defc(l_unbin) $ system defined unary/binary operators 145 defc(l_dots) $ two or more dots 146 defc(l_debug) $ compiler debugging options 147 defc(l_trace) $ user debugging options 148 149 +* l_min = l_name ** 150 +* l_max = l_trace ** 151 152$ standard symbols 153$ ---------------- 154 155$ the first few symbol table entries are used for standard literals 156$ such as '.macro', '.endm', etc. these locations are identified by 157$ a series of macros sym_xxx. 158 159 .=zzyorg z 160 161 defc(sym_macro) $ keyword macro 162 defc(sym_endm) $ keyword endm 163 defc(sym_drop) $ keyword drop 164 defc(sym_semi) $ semicolon 165 defc(sym_comma) $ comma 166 defc(sym_lp) $ left parenthesis 167 defc(sym_rp) $ right parenthesis 168 defc(sym_asn) $ assignment operator 169 defc(sym_then) $ keyword then 170 defc(sym_end) $ keyword end smfb 16 defc(sym_do) $ keyword do 171 172 end nameset; 173 174 175 nameset names; $ names 176 $ ----- 177 178 +* names_lim = $ dimension of names 179 .+s10 1500 180 .+s20 1500 181 .+r32 65536 182 .+s66 1024 183 ** 184 185 size names(ws); 186 dims names(names_lim); 187 188 size namesp(ps); $ pointer to last names entry 189 data namesp = 0; 190 191$ the following fields access the slen and sorg fields of names 192$ entries: 193 194 +* n_slen(p) = slen names(p) ** 195 +* n_sorg(p) = sorg names(p + (.sl.+1) / ws) ** 196 197 end nameset; 198 199 200 201 nameset polish; $ declarations for polish string 202 $ ------------------------------ 203 204 +* polsz = 16 ** $ size of polish 205 .+r32 +* polsz = 32 ** $ size of pol must be 2+log2(name_ 206 +* pol_lim = 3000 ** $ dimension of polish string 207 .+qp +* pol_lim = 10 ** $ dimension of (small) polish array 208 +* xpol_lim = 1000 ** $ dimension of auxiliary string 209 .+qp +* xpol_lim = 10 ** $ dimension of (small) xpolish arra 210 211 size polish(polsz); 212 dims polish(pol_lim); 213 214 size polp(ps); $ pointer to last entry 215 data polp = 0; 216 217 +* pol_typ(i) = .f. 01, 02, polish(i) ** 218 +* pol_val(i) = .f. 03, 14, polish(i) ** 219 .+r32 +* pol_val(i) = .f. 03, 30, polish(i) ** 220 221 222$ auxiliary polish string 223 size xpolish(polsz); 224 dims xpolish(xpol_lim); 225 226 size xpolp(ps); $ pointer to last entry 227 data xpolp = 0; 228 229 +* xpol_typ(i) = .f. 01, 02, xpolish(i) ** 230 +* xpol_val(i) = .f. 03, 14, xpolish(i) ** 231 .+r32 +* xpol_val(i) = .f. 03, 30, xpolish(i) ** 232$ polish string types 233 234 .=zzyorg z 235 236 defc0(pol_name) $ name 237 defc0(pol_count) $ counter 238 defc0(pol_mark) $ marker 239 defc0(pol_end) $ end-of-file marker 240 241 +* pol_min = pol_name ** $ minimum type 242 +* pol_max = pol_end ** $ maximum type 243 244$ the macro definitions for the marker codes p_xxx are produced 245$ by syn. 246 247 +* synimpmap(a, b) = macdef(a = b) ** 248 249 .=include synmac 250 .=include synimp 251 252 253$ utilities to read and write polish 254 255 +* putp(val, tp) = $ write node onto polish 256 .-qp if (polp = pol_lim) call puttab; 257 polp = polp + 1; 258 .-qp pol_typ(polp) = tp; 259 .-qp pol_val(polp) = val; 260 ** 261 262 263 +* putxp(val, tp) = $ write node onto xpolish 264 .-qp if (xpolp = xpol_lim) call putxtab; 265 xpolp = xpolp + 1; 266 .-qp xpol_typ(xpolp) = tp; 267 .-qp xpol_val(xpolp) = val; 268 ** 269 270 271 end nameset polish; 272 273 274 nameset macro; $ declarations for macro processor 275 $ -------------------------------- 276 277 +* mtab_lim = 3000 ** $ dimension of mtab 278 .+qp +* mtab_lim = 800 ** $ (small) dimension of mtab 279 280 size mtab(ps); 281 dims mtab(mtab_lim); 282 283 size mtabp(ps); $ pointer to last entry in mtab 284 data mtabp = 0; 285 smfb 17 +* mstack_lim = 1023 ** $ dimension of mstack 287 288 size mstack(ps); 289 dims mstack(mstack_lim); 290 291 size mstackp(ps); $ pointer to last mstack entry 292 data mstackp = 0; 293 smfb 18 +* astack_lim = 1023 ** $ dimension of astack 295 296 size astack(ps); 297 dims astack(astack_lim); 298 299 size astackp(ps); $ pointer to last entry 300 data astackp = 0; 301 302 +* param_lim = 26 ** $ maximum no. of formal parameters 303 304 size params(ps); $ array of formal parameters 305 dims params(param_lim); 306 307 size paramsp(ps); $ pointer to last parameter 308 data paramsp = 1; 309 310 size expand_buff(ps); $ 1 word buffer used by 'expand' 311 data expand_buff = 0; 312 313 314 $ biases for macro text encoding 315 316 +* bias_user = (symtab_lim + 1) ** $ user arguments 317 +* bias_gen1 = (bias_user + 100) ** $ first use of gen arg 318 +* bias_gen2 = (bias_gen1 + 100) ** $ next use of gen arg 319 320 end nameset; 321 322 323 nameset params; $ control card parameters 324 325 size upd_flag(1); $ update format source file sunb 12 size lcp_flag(1); $ listing control: program parameters sunb 13 size lcs_flag(1); $ listing control: program statistics 326 size list_flag(1); $ print source listing 327 size at_flag(1); $ produce titles automatically 328 size mlen_lim(ps); $ macro length limit 329 size pel(ps); $ parse error limit 330 size etok_lim(ps); $ no. of tokens printed after error msg 331 size m_flag(1); $ measurement flag 332 size opt_flag(1); $ global optimisation flag 333 size tp_flag(1); $ terminate after parsing 334 size cset_str(.sds. 10); $ source character set 335 size seqf_str(.sds. 10); $ sequence information format 336 size seqf_flag(2); $ sequence information code 337 338 $ options for parser debugging 339 size et_flag(1); $ dump tables after error 340 size pd_flag(1); $ dump polish 341 size sd_flag(1); $ dump symbol table 342 size mt_flag(1); $ trace macro processor 343 size pt_flag(1); $ trace parser 344 size tt_flag(1); $ trace tokens 345 346 size in_title(.sds. filenamlen); $ input file 347 size out_title(.sds. filenamlen); $ output file 348 size pol_title(.sds. filenamlen); $ polish file 349 size xpol_title(.sds. filenamlen); $ auxiliary polish file 350 size ssm_title(.sds. filenamlen); $ setl source map 351 size term_title(.sds. filenamlen); $ terminal file/device 352 353 end nameset; 354 355 nameset misc; $ miscelaneous 356 357 size error_count(ps); $ number of errors 358 data error_count = 0; 359 360 size line_no(ps); $ current line number 361 data line_no = 0; 362 363$ we keep three statement counters: 364$ 365$ cstmt_count: the cummulative statement count 366$ ustmt_count: the cummulative statement count at the start of the 367$ current unit 368$ estmt_count: the cummulative statement count for the first code 369$ producing instruction (currently not used in stlprs) 370$ 371$ the current statement number equals cstmt_count - ustmt_count + 1. 372 373 +* stmt_count = (cstmt_count - ustmt_count + 1) ** 374 375 size cstmt_count(ps); data cstmt_count = 0; 376 size ustmt_count(ps); data ustmt_count = 0; 377 size estmt_count(ps); data estmt_count = 0; 378 379 size eof_flag(1); $ set when end of file seen 380 data eof_flag = no; 381 382 size eor_flag(1); $ set when new record is read 383 data eor_flag = no; 384 385 size is_listed(1); $ on if current line has been listed 386 size is_written(1); $ current line has been written to src 387 data is_listed = yes; $ zero-th line has been listed 388 data is_written = yes; $ file - zero'th line has been written 389 390 end nameset; 391 392 393$ the setl parser supports the following charcter sets, as selected 394$ by the 'cset' control card parameter: 395$ 396$ cset=por: the portable character set. this set uses '<<' and '>>' 397$ for set braces, '(/' and '/)' for tuple brackets, and 398$ provides no single character alternates for any keyword. 399$ 400$ cset=ext: the extended character set. this set contains the por- 401$ table character set as a proper subset, and specifies 402$ an installation-dependend set of abbreviations for setl 403$ characters and symbols, such as e.g. set braces. 404$ 405 406 size char_set(ps); $ source character set 407 408$ character set options: 409 410 .=zzyorg z 411 412 defc(cset_por) $ portable character set 413 defc(cset_ext) $ extended character set asca 9 .+ascebc. asca 10 defc(cset_asc) $ for ascii text without conversion asca 11 ..ascebc 414 415$ the string 'altchars' is used to store the one character 416$ alternatives for setl symbols. the table 'altchar_tab' maps 417$ alternate symbols to the proper symbol table entry. 418 419 +* altchar_max = 10 ** $ maximum number of alternates 420 421 size altchars(.sds. altchar_max); 422 data altchars = '' .pad. altchar_max; 423 424 size altchar_tab(ps); 425 dims altchar_tab(altchar_max); 426 427 428$ debugging codes 429$ --------------- 430 431 .=zzyorg z 432 433 defc(dbg_ptrm0) $ disable/enable macro processor trace 434 defc(dbg_ptrm1) 435 defc(dbg_ptrp0) $ disable/enable parser trace 436 defc(dbg_ptrp1) 437 defc(dbg_ptrt0) $ disable/enable token trace 438 defc(dbg_ptrt1) 439 defc(dbg_prsod) $ open-token dump 440 defc(dbg_prspd) $ polish and xpolish string dumps 441 defc(dbg_prssd) $ symbol table dump 442 443 +* dbg_min = dbg_ptrm0 ** 444 +* dbg_max = dbg_prssd ** 445 446 447 +* prec_un = 12 ** $ precedence for unary operators 448 +* prec_ubin = 07 ** $ precedence for user defined binaries 449 450 451$ file numbers 452$ ------------ 453 454 .=zzyorg z 455 456 defc(in_file) $ input file 457 defc(out_file) $ output file 458 defc(pol_file) $ polish string file 459 defc(xpol_file) $ auxiliary polish file 460 defc(syn_file) $ tables from syn 461 .+sq1 462$ the setl optimiser needs the setl binary interface to communicate with 463$ other compiler phases. thus we can check already here whether we will 464$ run into problems later. furthermore, the optimiser likes to produce 465$ an annotated source listing: we generate here a map from cummulative 466$ statement numbers to tuples of source lines, in setl binary format, 467$ and write it to the file src. 468 469 defc(ssm_file) $ setl source map 470 471 .=include binio 472 473 size putbhdrblk(ws); $ binary header block 474 475 +* putbhdr(t, v) = $ write binary header block 476 putbhdrblk = 0; 477 bh_typ_ putbhdrblk = t; 478 bh_val_ putbhdrblk = v; 479 write ssm_file, putbhdrblk; 480 ** 481 482 +* putbdat(v) = $ write one word binary data block 483 write ssm_file, v; 484 ** 485 ..sq1 486 487 488 nameset nsptab; $ nameset with parse table. 489 size pt(ws); $ interpretable code 490 dims pt(parsearamax); 491 end nameset nsptab; 492 493 494 call prsini; 495 496 call parse; 497 498 call prstrm(no); 499 500 .-qp. 501 .+s66 end subr start; 502 .+r32 end prog stlprs; suna 14 .+r36 end prog stlprs; suna 15 .+qp. suna 16 end prog stlprs; 505 ..qp 1 .=member prsini 2 subr prsini; 3 4$ this is the main initialization routine. we begin by reading 5$ the control card parameters and opening various files, then 6$ call various separate routines to initialize symtab. 7 8 9 size termh_flag(1); $ print phase header on the terminal 10 11 size rc(ws); $ return code from namesio 12 13 14 15 size hash(ps); $ hashing function 16 size timestr(.sds. 30); $ current time 17 18 19 in_title = ''; 20 call namesio(in_file, rc, in_title, filenamlen); $ input file 21 if (rc > 1) in_title = ''; 22 23 out_title = ''; 24 call namesio(out_file, rc, out_title, filenamlen); $ output 25 if (rc > 1) out_title = ''; 26 27 term_title = ''; 28 call namesio(max_no_files, rc, term_title, filenamlen); $ terminal 29 if (rc > 1) term_title = ''; $ has max file number 30 31 32 .+s10. 33 call getspp(pol_title, 'pol=pol/pol'); $ polish file 34 call getspp(xpol_title, 'xpol=xpol/xpol'); $ aux. string 35 ..s10 36 .+s20. 37 call getspp(pol_title, 'pol=pol/pol'); $ polish file 38 call getspp(xpol_title, 'xpol=xpol/xpol'); $ aux. string 39 ..s20 40 41 .+s32. 42 call getspp(pol_title, 'pol=pol.tmp/'); $ polish file 43 call getspp(xpol_title, 'xpol=xpol.tmp/'); $ aux. string 44 call getspp(ssm_title, 'ssm=/ssm.tmp'); $ setl source map 45 ..s32 46 47 .+s37cms. 48 call getspp(pol_title, 'pol=pol/pol'); $ polish file 49 call getspp(xpol_title, 'xpol=xpol/xpol'); $ aux. string 50 call getspp(ssm_title, 'ssm=/ssm'); $ setl source map 51 ..s37cms 52 .+s37mts. 53 call getspp(pol_title, 'pol=-setlpol/'); $ polish file 54 call getspp(xpol_title, 'xpol=-setlxpol/'); $ aux. polish file 55 call getspp(ssm_title, 'ssm=/-setlssm'); $ setl source map 56 ..s37mts 57 .+s47. 58 call getspp(pol_title, 'pol=pol/pol'); $ polish file 59 call getspp(xpol_title, 'xpol=xpol/xpol'); $ aux. string 60 call getspp(ssm_title, 'ssm=/ssm'); $ setl source map 61 ..s47 62 63 .+s66. 64 call getspp(pol_title, 'pol=pol/pol'); $ polish file 65 call getspp(xpol_title, 'xpol=xpol/xpol'); $ aux. string 66 ..s66 suna 17 suna 18 .+s68. suna 19 call getspp(pol_title, 'pol=setl.pol/'); $ polish file suna 20 call getspp(xpol_title, 'xpol=setl.xpol/'); $ aux. polish file suna 21 call getspp(ssm_title, 'ssm=/setl.ssm'); $ setl source map suna 22 ..s68 67 68 69 call getipp(list_flag, 'list=0/1'); $ listing control 70 call getipp(etok_lim, 'etoks=5/5'); $ #toks printed after error 71 call getipp(pel, 'pel=1000/1000'); 72 call getipp(mlen_lim, 'mlen=1000/1000'); $ macro length 73 call getipp(tp_flag, 'tp=0/0'); $ terminate after parsing 74 call getipp(pt_flag, 'pt=0/1'); $ parser trace 75 call getipp(mt_flag, 'mt=0/1'); $ macro processor trace 76 call getipp(et_flag, 'et=0/1'); $ error trace 77 call getipp(tt_flag, 'tt=0/1'); $ token trace 78 call getipp(sd_flag, 'sd=0/1'); $ symtab dump 79 call getipp(pd_flag, 'pd=0/1'); $ polish dump 80 call getipp(at_flag, 'at=0/1'); $ auto title 81 call getipp(m_flag, 'meas=0/1'); $ measurements 82 call getipp(opt_flag, 'opt=0/1'); $ global optimisation 83 call getipp(upd_flag, 'upd=0/1'); $ upd format input 84 call getspp(cset_str, 'cset=ext/por'); $ source character set 85 call getspp(seqf_str, 'seq=ls/'); $ sequencing 86 call getipp(termh_flag, 'termh=0/1'); $ print phase header sunb 14 sunb 15 sunb 16 .-s68. sunb 17 .-s47. sunb 18 .-s32u. sunb 19 call getipp(lcp_flag, 'lcp=1/1'); $ list program parameters sunb 20 call getipp(lcs_flag, 'lcs=1/1'); $ list program statistics sunb 21 .+s32u. sunb 22 call getipp(lcp_flag, 'lcp=0/1'); $ list program parameters sunb 23 call getipp(lcs_flag, 'lcs=0/1'); $ list program statistics sunb 24 ..s32u sunb 25 .+s47. sunb 26 call getipp(lcp_flag, 'lcp=0/1'); $ list program parameters sunb 27 call getipp(lcs_flag, 'lcs=0/1'); $ list program statistics sunb 28 ..s47 sunb 29 .+s68. sunb 30 call getipp(lcp_flag, 'lcp=0/1'); $ list program parameters sunb 31 call getipp(lcs_flag, 'lcs=0/1'); $ list program statistics sunb 32 ..s68 sunb 33 87 88 if mt_flag then $ trace macro processor 89 monitor entry, limit = 10000; 90 else 91 monitor noentry; 92 end if; 93 94 seqf_flag = 0; 95 if ( 'l' .in. seqf_str) seqf_flag = 1; 96 if ( 's' .in. seqf_str) seqf_flag = seqf_flag + 2; 97 98 call opninc('', 0, ' .copy ', upd_flag); $ ltl inclusion facility 99 100 .+qp pol_title = '0'; xpol_title = '0'; $ set to null for quick parse 101 102 file pol_file access = write, title = pol_title; 103 file xpol_file access = write, title = xpol_title; 104 .+sq1. 105 if slen ssm_title then 106 file ssm_file access = write, title = ssm_title; 107 end if; 108 .-sq1. 109 if opt_flag then 110 put ,skip; $ emit blank line 111 call contlpr(27, yes);$ echo to the terminal 112 put ,'*** setl optimiser not available ***' ,skip; 113 call contlpr(27, no); $ stop to echo to the terminal 114 call ltlfin(1, 0); $ abnormally terminate 115 end if; 116 ..sq1 117 118 .+s66. 119 rewind pol_file; rewind xpol_file; 120 ..s66 121 122 if (.len. term_title) call opnterm(term_title); 123 124 if cset_str .seq. 'por' then char_set = cset_por; 125 elseif cset_str .seq. 'ext' then char_set = cset_ext; asca 12 .+ascebc. asca 13 elseif cset_str .seq. 'asc' then char_set = cset_asc; asca 14 ..ascebc 126 else char_set = cset_ext; 127 end if; 128 129 $ initialize listing control 130 call contlpr( 6, yes); $ start paging 131 call contlpr( 7, yes); $ enable titling 132 call lstime(timestr); $ get current time 133 call etitlr(0, 'cims.setl.' .cc. prog_level, 1, 0); 134 call etitlr(0, timestr, 41, 0); 135 call etitlr(0, 'page', 71, 0); 136 call contlpr( 8, 76); $ set page number in column 76 137 call contlpr(13, 0); $ set number of current page sunb 34 call contlpr(10, rc); $ get lines per page sunb 35 call contlpr(15, rc); $ set line number within page 138 139 sunb 36 if lcp_flag then $ print initial phase heading sunb 37 put ,'parameters for this compilation: ' ,skip 142 ,skip ,'source file: i = ' :in_title ,a ,'. ' 143 ,skip ,'listing file: l = ' :out_title ,a ,'. ' 144 ,skip ,'polish string file: pol = ' :pol_title ,a ,'. ' 145 ,skip ,'auxiliary string file: xpol = ' :xpol_title ,a ,'. ' 146 ,skip ,'setl source map: ssm = ' :ssm_title ,a ,'. ' 147 ,skip ,'list directives: list = ' :list_flag ,i ,', ' 148 ,'at = ' :at_flag ,i ,'. ' 149 ,'seq = ' :seqf_str ,a ,'. ' 150 ,skip ,'character set: cset = ' :cset_str ,a ,'. ' 151 ,skip ,'parse error limit: pel = ' :pel ,i ,'. ' 152 ,'parse error file: term = ' :term_title ,a ,'. ' 153 ,skip ,'global optimisation: opt= ' :opt_flag ,i ,'. ' 154 ,'measurements: meas = ' :m_flag ,i ,'. ' 155 ,skip 156 ,skip; sunb 38 end if; 157 158 159 if termh_flag then 160 $ the following line is printed on the terminal file only 161 call contlpr(26, no); call contlpr(27, yes); 162 put ,' start cims.setl.' ,prog_level :timestr ,a ,skip; 163 call contlpr(26, yes); call contlpr(27, no); 164 end if; 165 166 167$ next we call three routines to initialize reserved names, keywords, 168$ and literals. the order of these calls is important, since we assume 169$ that some symbols will always have the same position in symtab. 170 171 call initstd; 172 call initkey; 173 call initlit; 174 175$ initialize trulex 176 call initlex; 177 178 call initparse; $ initialize parse 179 180 181 end subr prsini; 1 .=member initstd 2 subr initstd; 3 4$ this routine initializes standard symbols used by the macro smfb 19$ processor, such as 'macro'. these must be the first entries 6$ in symtab. 7 8 9 size j(ps); $ loop index 10 size sym(ps); $ symbol table pointer 11 12 size syms(.sds. 5); $ symbols to be hashed into symtab smfb 20 dims syms(11); 14 data syms = 'macro', 'endm', 'drop', 15 ';', ',', '(', ')', ':=', smfb 21 'then', 'end', 'do'; 17 18 size hashlit(ps); $ hashes symbol into symbol table 19 20 smfb 22 do j = 1 to 11; 22 sym = hashlit(syms(j)); 23 end do; 24 25 26 end subr initstd; 1 .=member initkey 2 subr initkey; 3 4$ this routine initializes all the special lexical classes used 5$ by the parser. it does this through a series of calls to the 6$ lower level routine 'setkey'. each call to setkey processes 7$ a dozen or so literals. 8 9$ setkey has three arguments, which are passed globally: 10 11$ the actual initialization is done by 'setkey', which has 12$ two global inputs: 13 14$ key_lext: gives the lex_typ of the tokens being initialized. 15$ key_string: a character string which drives the initiliation. 16 17$ key_string is a character string consisting of alternate integers 18$ and keywords, separated by blanks. each integer indicates the 19$ key_val of the keyword which follows it. a plus sign where an 20$ integer is expected indicates that we should use the previous 21$ integer, incremented by 1. 22 23 24 nameset nsinitkey; 25 26 size key_lext(ps), 27 key_string(.sds. 80); 28 29 end nameset nsinitkey; 30 31$ we use the following macro to call setkey 32 +* set(s) = 33 key_string = s; 34 call setkey; 35 ** 36 37 38 $ declaration keywords 39 key_lext = l_dkey; 40 set('1 const + var + init ') 41 42 $ keywords for system defined modes 43 key_lext = l_modekey; 44 set('1 general + integer + real + string + boolean + atom ') 45 set('7 error + elmt + tuple + set + smap + map + mmap ') 46 47 $ keywords for base types 48 key_lext = l_btkey; 49 set('1 local + remote + sparse + packed + untyped ') 50 51 $ keywords for statements 52 key_lext = l_stkey; 53 set('1 assert + case + continue + debug + exit ') 54 set('6 fail + goto + if + loop + notrace + quit ') 55 set('12 return + stop + succeed + trace + yield ') 56 57 $ keywords for read/write options 58 key_lext = l_rwkey; 59 set('1 rd + wr + rw ') 60 61 $ keywords for 'rights' list 62 key_lext = l_rkey1; 63 set('1 libraries + reads + writes ') 64 65 key_lext = l_rkey2; 66 set('1 imports + exports ') 67 68 $ system defined binary operators 69 key_lext = l_bin; 70 set('1 impl 2 or 3 and 5 in 5 notin 5 incs 5 subset ') 71 set('5 < 5 <= 5 > 5 >= 5 = 5 /= ') 72 set('7 with 7 less 7 lessf 7 npow 7 min 7 max ') 73 set('10 * 10 / 10 div 10 mod 10 ? 10 atan2 11 ** ') 74 $ 75 key_lext = l_from; 76 set('7 from 7 fromb 7 frome '); 77 78 $ system defined unary operators 79 key_lext = l_un; 80 set('4 not 6 even 6 odd 6 is_integer 6 is_real 6 is_string ') 81 set('6 is_boolean 6 is_atom 6 is_tuple ') 82 set('6 is_set 6 is_map 12 arb 12 domain 12 range 12 pow 12 # ') 83 set('12 abs 12 char 12 ceil 12 floor 12 fix 12 float ') 84 set('12 sin 12 cos 12 tan 12 asin 12 acos 12 atan 12 tanh ') 85 set('12 exp 12 log 12 sqrt 12 random 12 sign 12 type ') 86 set('12 str 12 val ') 87 88 $ system defined operators which are both unary and binary. 89 $ their precedence is the precedence of the binary operator. 90 key_lext = l_unbin; 91 set('9 + 9 - ') 92 93 $ compiler debugging options 94 key_lext = l_debug; 95 set('1 ptrm0 + ptrm1 + ptrp0 + ptrp1 + ptrt0 + ptrt1 ') 96 set('7 prsod + prspd + prssd ') 97 set('10 stre0 + stre1 + strs0 + strs1 + sq1cd + sq1sd + scstd ') 98 set('17 cq1cd + cq1sd + cq2cd ') 99 set('20 rtre0 + rtre1 + rtrc0 + rtrc1 ') 100 set('24 rtrg0 + rtrg1 + rgcd0 + rgcd1 + rdump + rgarb ') 101 102 $ user trace options 103 key_lext = l_trace; 104 set('1 calls + statements ') 105 106 macdrop(set); 107 108 109 end subr initkey; 1 .=member setkey 2 subr setkey; 3 4$ this routine initializes a single group of keywords. its 5$ arguments are passed globally as described in 'initkey'. 6 7$ the principle argument is a character string 'key_string'. 8$ key_string consists of a series of tokens separated by a 9$ single blank. numbering tokens from right to left, each odd 10$ token is either: 11 12$ 1. an integer giving the key_val of the keyword which 13$ immeditely follows it. 14 15$ 2. a plus sign, indicating that the new key_val is one more 16$ than that of the previous keyword. 17 18$ the even numbered tokens are the keywords themselves. 19 20$ we loop along the string, finding each pair of tokens, 21$ and setting the key_val of each keyword. 22 23 24 size len(ps), $ length of key_string 25 val(ws), $ value of integer 26 first(ps), $ points to first character of token 27 last(ps), $ points to blank after token 28 ch(cs), $ current character 29 keyw(.sds. 20), $ current keyword 30 sym(ps); $ symbol table pointer to keyword 31 32 size hashlit(ps); $ hashing function 33 34 access nsinitkey; 35 36 37 first = 1; $ start of first token 38 val = 0; $ initial keyword value 39 40 len = .len. key_string; 41 42 while first < len; 43 44 ch = .ch. first, key_string; 45 46 if ch = 1r+ then 47 val = val + 1; 48 first = first + 1; $ point to blank 49 50 else $ convert integer 51 val = 0; 52 53 while ch ^= 1r ; 54 val = 10 * val + digofchar(ch); 55 first = first + 1; 56 ch = .ch. first, key_string; 57 end while; 58 end if; 59 60$ first now points to the blank. find the keyword. 61 62 first = first + 1; $ point to start of keyword 63 last = first + 1; $ point to next character 64 65$ look for next blank 66 while .ch. last, key_string ^= 1r ; 67 last = last + 1; 68 end while; 69 70$ 'last' now points to the blank after the keyword 71 72 keyw = .s. first, last-first, key_string; 73 sym = hashlit(keyw); 74 75 lex_typ(sym) = key_lext; 76 key_val(sym) = val; 77 78 first = last + 1; $ start of next token. 79 end while; 80 81 82 end subr setkey; 1 .=member initlit 2 subr initlit; 3 4$ this routine initiailizes the literals contained in the grammar. 5$ we use the results of 'syn' to create two arrays, which in turn 6$ drive the initialization routine 7 8$ 1. lit_string this is an array containing the actual literals 9 10$ 2. lit_code this is an array containing the corresponding 11$ literal code lit_xxx. 12 13$ we iterate over the two arrays, hashing in each string and setting the 14$ lit_val of the corresponding entry. 15 16$ note that there are more entries in lit_string than literals 17$ in the grammar. this is because certain terminal symbols of the 18$ grammar can be represented by one of several tokens. 19 20 21 .=include synmax $ this includes the macro parselitmax 22 23 size lit_string(.sds. 30); 24 dims lit_string(parselitmax); 25 26 size lit_code(ps); 27 dims lit_code(parselitmax); 28 29 size hashlit(ps); $ hashing function 30 31 .=zzyorg a $ counter for initilizing lit_string and lit_code 32 33 +* synlitmap(string, code) = 34 data lit_string(zzya) = string; 35 data lit_code(zzya) = code; 36 ** 37 38 .=include 'synlit' $ include synlitmap 39 40 41 size j(ps), $ loop index 42 lit(.sds. 30), $ current literal 43 p(ps); $ symbol table pointer to literal 44 45 46 do j = 1 to parselitmax; 47 lit = lit_string(j); 48 49 .+s66 if (lit = 2q::) lit = 1q: .cc. 1q:; 50 51 p = hashlit(lit); 52 53 lit_val(p) = lit_code(j); 54 end do; 55 56 57 end subr initlit; 1 .=member inparse 2 subr initparse; 3 4$ this routine defines the data structures for the parser. the 5$ parser can be thought of as an interpreter for an abstract 6$ parsing machine. the interpertable code is contained in the 7$ table 'pt', and has the following fields: 8 9$ parse_op an opcode po_xxx 10$ parse_param a single integer parameter 11$ parse_parm1 a subfield of parse_param 12$ parse_parm2 another subfield of parse_param 13 14$ the parse machine has two registers, represented by the array 15$ parsereg. the first two registers have special purposes: 16 17$ reg. 1 indicates success of last parser operation 18$ reg. 2 contains key_val of last keyword recognized. 19 20$ the variable 'parsep' points to the current instruction in 'pt'. 21 22$ the parse machine contains several instructions which are 23$ essentially procedure calls. we handle procedure linkage 24$ through a stack called 'rstack' (r-eturn stack). its fields are 25 26$ r_is_sev $ indicates call from po_sev instruction 27$ r_clause $ starting address of desired clause 28$ r_return $ return address 29$ r_number $ number of clauses seen during operation 30 31 32$ 'parse' consists of an infinite loop with a case statement 33$ which branches on the current opcode. 34 35 36 nameset parsens; 37 38 +* pt_op(i) = .f. 01, 04, pt(i) ** 39 +* pt_parm(i) = .f. 05, 12, pt(i) ** 40 +* pt_parm1(i) = .f. 05, 03, pt(i) ** 41 +* pt_parm2(i) = .f. 08, 09, pt(i) ** 42 43 size parsep(ps); $ location counter 44 data parsep = 1; 45 46 size parsereg(ps); $ registers 47 dims parsereg(8); 48 data parsereg = 0(8); 49 50 +* parse_ok = parsereg(1) ** $ condition code 51 +* parse_tok = parsereg(2) ** $ last token 52 53 +* rstack_sz = $ size of recursion stack 54 .+s10 36 55 .+s20 36 smfb 23 .+r32 64 59 .+s66 60 60 ** 61 62 +* rstack_lim = 100 ** $ dimension of rstack 63 64 size rstack(rstack_sz); $ recursion stack 65 dims rstack(rstack_lim); 66 67 size rstackp(ps); $ pointer to top of rstack 68 data rstackp = 0; 69 70 .+s10. 71 +* r_is_sev(i) = .f. 01, 01, rstack(i) ** 72 +* r_clause(i) = .f. 02, 11, rstack(i) ** 73 +* r_return(i) = .f. 13, 11, rstack(i) ** 74 +* r_number(i) = .f. 25, 08, rstack(i) ** 75 ..s10 76 77 .+s20. 78 +* r_is_sev(i) = .f. 01, 01, rstack(i) ** 79 +* r_clause(i) = .f. 02, 11, rstack(i) ** 80 +* r_return(i) = .f. 13, 11, rstack(i) ** 81 +* r_number(i) = .f. 25, 08, rstack(i) ** 82 ..s20 83 84 smfb 24 .+r32. 86 +* r_is_sev(i) = .f. 01, 01, rstack(i) ** 87 +* r_clause(i) = .f. 17, 16, rstack(i) ** 88 +* r_return(i) = .f. 33, 16, rstack(i) ** 89 +* r_number(i) = .f. 49, 16, rstack(i) ** smfb 25 ..r32 104 105 .+s66. 106 +* r_is_sev(i) = .f. 01, 01, rstack(i) ** 107 +* r_clause(i) = .f. 02, 11, rstack(i) ** 108 +* r_return(i) = .f. 13, 11, rstack(i) ** 109 +* r_number(i) = .f. 25, 08, rstack(i) ** 110 ..s66 111 112 113 +* r_num_lim = $ limit for r_number 114 .+s10 255 115 .+s20 255 smfb 26 .+r32 65535 119 .+s66 255 120 ** 121 122 123$ the parser has the following opcodes: 124 125 .=zzyorg z 126 127 defc(po_act) $ call separate routine 128 defc(po_bak) $ return from interpreter call 129 defc(po_err) $ emit error message 130 defc(po_jif) $ branch on parse_ok = no. 131 defc(po_jmp) $ jump 132 defc(po_lex) $ find lexical type 133 defc(po_lit) $ find literal 134 defc(po_set) $ set register 135 defc(po_sev) $ call to find several clauses 136 defc(po_sub) $ call to find 1 clause 137 defc(po_skip) $ branch on keyword 138 defc(po_save) $ save parser status 139 defc(po_nosave) $ pop saved status 140 defc(po_restore) $ restore saved status 141 defc(po_m) $ emit marker 142 143 +* po_min = po_act ** 144 +* po_max = po_m ** 145 146 end nameset; 147 148 149 nameset backup; smfb 27 150$ this nameset contains the variables which implement backtracking 151$ in the parser. these include: 152 153$ 1. sstack: 154 155$ this is a stack for saving parser states. its fields are 156 157$ s_polp: saves pointer to polish string 158$ s_xpolp: saves pointer to auxiliary string 159$ s_tout: saves pointer to token buffer 160$ s_stat: saves statement counter 161 162$ 2. tok_buff: 163 164$ this is a circular buffer used to save tokens. its related 165$ variables are: 166 167$ tok_in: points to last token added to buffer 168$ tok_out: points to last token returned to parser 169 170$ tok_buff is manipulated by the scanner-interface routine 'gettok'. 171 172 +* sstack_sz = $ size of string 177 .+s10 72 178 .+s20 72 smfb 28 .+r32 128 smfb 29 .+s66 60 179 ** 180 181 +* sstack_lim = 50 ** $ number of saved states 182 183 size sstack(sstack_sz); 184 dims sstack(sstack_lim); 185 186 size sstackp(ps); 187 data sstackp = 0; 188 189 .+s66. 191 +* s_polp(i) = .f. 01, 15, sstack(i) ** 192 +* s_xpolp(i) = .f. 16, 15, sstack(i) ** 193 +* s_tout(i) = .f. 31, 15, sstack(i) ** 194 +* s_stat(i) = .f. 46, 15, sstack(i) ** 195 ..s66 204 smfb 30 .+r32. 207 +* s_polp(i) = .f. 01, 32, sstack(i) ** 208 +* s_xpolp(i) = .f. 33, 32, sstack(i) ** 209 +* s_tout(i) = .f. 65, 32, sstack(i) ** 210 +* s_stat(i) = .f. 97, 32, sstack(i) ** smfb 31 ..r32 221 222 .+s10. 224 +* s_polp(i) = .f. 01, 18, sstack(i) ** 225 +* s_xpolp(i) = .f. 19, 18, sstack(i) ** 226 +* s_tout(i) = .f. 37, 18, sstack(i) ** 227 +* s_stat(i) = .f. 55, 18, sstack(i) ** 228 ..s10 229 230 .+s20. 232 +* s_polp(i) = .f. 01, 18, sstack(i) ** 233 +* s_xpolp(i) = .f. 19, 18, sstack(i) ** 234 +* s_tout(i) = .f. 37, 18, sstack(i) ** 235 +* s_stat(i) = .f. 55, 18, sstack(i) ** 236 ..s20 237 238 239 240 +* tok_lim = 500 ** $ maximum saved tokens 241 242 size tok_buff(ps); 243 dims tok_buff(tok_lim); 244 245 size tok_in(ps); 246 data tok_in = 0; 247 248 size tok_out(ps); 249 data tok_out = 0; 250 end nameset; 251 252 size j(ps); $ loop index for initialization 253 254 call ptinit; $ initialize parse table. 255 256$ initialize parsep and parse_ok 257 parsep = 1; 258 parse_ok = yes; 259 260 261 end subr initparse; 1 .=member ptinit 2 3 4 .-defenv_ptinit. 5 6 7 subr ptinit; 8 9$ initialize parse table by reading it in. 10 11$ the default version of ptinit reads in the parse table. 12$ the parse table for setl is too large to permit compiling 13$ the data statements that would be required to initialize 14$ the table due to limitations of the little compiler. 15$ however, some systems may be able to use the option of the 16$ syn program by which the table is represented in form that 17$ permits initialization by assembly language data statements, 18$ and such systems can use defenv_ptinit option to avoid need 19$ to read in parse table. 20 21 22 size syn_title(.sds.30); $ title for 'syn' file. 23 24 .+s66 call getspp(syn_title, 'syn=synbin/synout'); 25 .+s37 call getspp(syn_title, 'syn=synbin/synout'); 26 .+s47 call getspp(syn_title, 'syn=synbin/synout'); 27 .+s32 call getspp(syn_title, 'syn=synbin.dat/synout.dat'); 28 .+s10 call getspp(syn_title, 'syn=synbin/synout'); 29 .+s20 call getspp(syn_title, 'syn=synbin/synout'); suna 23 .+s68 call getspp(syn_title, 'syn=synbin/'); 30 31 file syn_file 32 access = read, 33 title = syn_title; 34 35 rewind syn_file; 36 37 read syn_file, pt(1) to pt(parsearamax); 38 39 file syn_file access=release; 40 41 42 end subr ptinit; 43 44 45 ..defenv_ptinit 46 47 1 .=member parse 2 subr parse; 3 4$ this is the actual parser. its operation is described in 5$ initparse, above. 6 7 access parsens, backup; 8 9 size op(ps), $ current opcode 10 parm(ps), $ current parameter 11 i(ps), $ register number 12 j(ps), $ register value 13 tok(ps), $ token obtained from scanner 14 junk(ps); $ disgarded type code from polish 15 size tp(ps); $ type of current token 16 17 size gettok(ps); $ lexical scanner 18 19 20 while 1; $ main loop 21 22 op = pt_op(parsep); 23 parm = pt_parm(parsep); 24 25 if pt_flag then $ tracing parser 26 put, skip, column(7), 'parsep: ': parsep, i, 27 column(20), 'op: ': op, i, 28 column(30), 'parm: ': parm, i; 29 end if; 30 31 go to case(op) in po_min to po_max; 32 33 /case(po_lit)/ $ test for matching literal 34 35$ literals are identified by 36 37$ lit_val: should be equal to 'parm'. 38 39$ a 'find literal' instruction is always the first of three 40$ instructions 41 42$ 1. find literal 43$ 2. branch on failure 44$ 3. next action on success 45 46$ if we are successful, we increment parsep to point to (3). 47$ otherwise we increment it to point to (2). 48 49 parse_tok = gettok(0); 50 51 if lit_val(parse_tok) = parm & ^ eof_flag then 52 parse_ok = yes; 53 parsep = parsep + 2; 54 else 55 parse_ok = no; 56 57 tok_out = tok_out - 1; $ return token to buffer 58 if (tok_out = 0) tok_out = tok_lim; 59 60 parsep = parsep + 1; 61 end if; 62 63 cont; 64 65 66 /case(po_lex)/ $ find desired lexical type 67 68$ a tokens lexical type is given by its 'lex_typ' field. 69$ if the token has the proper type, we write it onto the polish 70$ string and advance parsep by 2. otherwise we return it and 71$ advance parsep by 1. 72 73 parse_tok = gettok(0); tp = lex_typ(parse_tok); 74 75 if ^ eof_flag & parm = tp then 76 parse_ok = yes; 77 78 putp(name(parse_tok), pol_name); 79 parsep = parsep + 2; 80 81 else 82 parse_ok = no; 83 84 tok_out = tok_out-1; $ return token 85 if (tok_out = 0) tok_out = tok_lim; 86 87 parsep = parsep + 1; 88 end if; 89 90 cont; 91 92 /case(po_sev)/ $ 'procedure calls' to find clauses 93 94 /case(po_sub)/ 95 96$ po_sev corresponds to the construct in the grammar, 97$ and seeks several instances of a clause. po_sub corresponds 98$ to and seeks a single instance. 99 100 countup(rstackp, rstack_lim, 'rstack'); 101 102 r_is_sev(rstackp) = (op = po_sev); 103 r_clause(rstackp) = parm; 104 r_return(rstackp) = parsep + 1; 105 r_number(rstackp) = 0; 106 107 parsep = parm; 108 cont; 109 110 111 /case(po_bak)/ $ return from or 112 113$ the action we take here depends on whether we are looking 114$ for one clause or several. 115 116$ if we are looking for one clause, we return unconditionally. if 117$ we are looking for several clauses, we take one of two actions: 118 119$ 1. if we have been successful so far, look for another clause. 120$ 2. otherwise write the number of clauses found onto the 121$ polish string and return. 122 123$ if rstack is empty, we have parsed a sentence symbol. if the 124$ input file is exhausted, we return; otherwise -chkprs- has done 125$ the necessary recovery, and we continue to parse. 126 127 if rstackp = 0 then 128 call chkprs; 129 if (eof_flag) return; 130 131 cont; 132 133 134 elseif r_is_sev(rstackp) then $ return from 135 136 if parse_ok then $ look for another 137 countup(r_number(rstackp), r_num_lim, ''); 138 parsep = r_clause(rstackp); 139 140 else $ write marker and return success 141 putp(r_number(rstackp), pol_count); 142 parse_ok = yes; 143 parsep = r_return(rstackp); 144 rstackp = rstackp-1; 145 end if; 146 147 else $ return from 148 parsep = r_return(rstackp); 149 if (parse_ok) parsep = parsep + 1; 150 rstackp = rstackp-1; 151 end if; 152 153 cont; 154 155 156 /case(po_err)/ $ issue error message 157 158$ if the condition code is true we merely advance. otherwise we 159$ call 'ermet', which prints the error message and resets the 160$ parser. 161 162 if parse_ok then 163 parsep = parsep + 1; 164 else 165 call ermet(parm); 166 end if; 167 168 cont; 169 170 171 /case(po_jif)/ $ conditional jump -lab 172 173 if parse_ok then 174 parsep = parsep + 1; 175 else 176 parsep = parm; 177 end if; 178 179 cont; 180 181 182 /case(po_jmp)/ $ go to +lab 183 184 parsep = parm; 185 cont; 186 187 188 /case(po_set)/ $ set register i to j 189 190 i = pt_parm1(parsep) + 1; 191 j = pt_parm2(parsep); 192 193 parsereg(i) = j; 194 195 parsep = parsep + 1; 196 cont; 197 198 199 /case(po_skip)/ $ branch on keyword 200 201 parsep = parsep + key_val(parse_tok); 202 cont; 203 204 205 /case(po_save)/ $ save parser status 206 207 countup(sstackp, sstack_lim, 'save'); 208 209 s_polp(sstackp) = polp; 210 s_xpolp(sstackp) = xpolp; 211 s_tout(sstackp) = tok_out; 212 s_stat(sstackp) = cstmt_count; 213 214 parsep = parsep + 1; 215 cont; 216 217 218 /case(po_nosave)/ $ disgard saved status 219 220 sstackp = sstackp-1; 221 parsep = parsep + 1; 222 cont; 223 224 225 /case(po_restore)/ $ restore parser to saved status 226 227 polp = s_polp(sstackp); 228 xpolp = s_xpolp(sstackp); 229 tok_out = s_tout(sstackp); 230 cstmt_count = s_stat(sstackp); 231 232 sstackp = sstackp-1; 233 234 parsep = parsep + 1; 235 cont; 236 237 238 /case(po_m)/ $ write marker 239 240 putp(parm, pol_mark); 241 parsep = parsep + 1; 242 cont; 243 244 245 /case(po_act)/ $ call generator routine 246 247 call actgen(parm); 248 parsep = parsep + 1; 249 cont; 250 251 252 end while; 253 254 end subr parse; 1 .=member actgen 2 subr actgen(parm); 3 4$ this routine performs generated actions specified in the 5$ grammar. the actual generator calls are included from the 6$ output of 'syn'. 7 8 size parm(ps); 9 10 +* pac = go to esac; ** $ utility macro 11 12 13 go to pa(parm) in 1 to parseactmax; 14 15 .=include 'synact' $ include generator calls 16 17/esac/ $ end of case 18 19 return; 20 21 end subr actgen; 1 .=member gettok 2 fnct gettok(dummy); 3 4$ this routine acts as an interface bewteen the scanner and the parser. 5$ it returns the next token in the input stream. 6 7$ gettok maintains a buffer of tokens obtained from the scanner. 8$ this is used to implement backtracking in the parser. 9 10 11$ the token buffer is circular. when we enter gettok we begin by 12$ checking whether the buffer is empty(in pointer = out pointer). 13$ if so, we get a token from the scanner and insert it in the buffer. 14 15$ once we know the buffer is non-empty, we merely return the first 16$ token in the buffer. 17 18$ at certain points in the parse, we must save the status of the parser. 19$ one of the variables saved is a pointer to the token buffer indicating 20$ the last token returned. every time we add a token to the buffer, we 21$ must make sure that we do not clobber the oldest token which is part 22$ of a saved state. 23 24 25 size tok(ps); $ current token 26 27 size gettok(ps), $ token returned 28 lexscan(ps), $ lower level scanner routine 29 symsds(sds_sz); $ gets token name 30 31 access backup; $ nameset containing token buffer 32 33 34 if tok_in = tok_out then $ buffer empty 35 tok_in = tok_in + 1; 36 if (tok_in > tok_lim) tok_in = 1; 37 38 if sstackp ^= 0 then $ saving parser states 39 if (tok_in = s_tout(1)) call overfl('tok_buff'); 40 end if; 41 42 tok_buff(tok_in) = 0; $ not yet found 43 tok = lexscan(0); $ get token from scanner 44 tok_buff(tok_in) = tok; 45 46 end if; 47 48 tok_out = tok_out + 1; $ remove token 49 if (tok_out > tok_lim) tok_out = 1; 50 51 gettok = tok_buff(tok_out); 52 53 if tt_flag then $ tracing tokens output by gettok 54 put, skip, column(7), 'gettok = ': symsds(gettok), a; 55 end if; 56 57 58 end fnct gettok; 1 .=member lexscan 2 fnct lexscan(dummy); 3 4$ this is the top level routine of the macro processor. it 5$ returns a token from the next routine down, intercepting 6$ macro occurrences. 7 8$ when we see a macro occurrence, we call 'initexp' which 9$ initiates expansion of the macro. 10 11 12 size lexscan(ps), $ value returned 13 absorb(ps); $ lower level scanner routine 14 15 16 while 1; 17 lexscan = absorb(0); 18 19 if eof_flag then $ end of file 20 return; 21 22 elseif morg(lexscan) = 0 then $ not macro 23 return; 24 25 else 26 call initexp(lexscan); $ collect arguments and try again 27 end if; 28 29 end while; 30 31 32 end fnct lexscan; 1 .=member initexp 2 subr initexp(nam); 3 4$ this routine is called to initiate expansion of the macro 'nam'. 5$ we begin by checking whther the macro has arguments. if not, 6$ we simply build a new stack frame. otherwise we collect the 7$ macros arguments by calling absorb, then build the stack 8$ frame. 9 10 11 size nam(ps); $ name of macro 12 13 size n(ps), $ number of arguments 14 nparen(ps), $ no of parens 15 tok(ps), $ current token 16 delim(ps), $ desired delimiter 17 j(ps); $ loop index 18 19 size args(ps); $ array of astack pointers 20 dims args(param_lim); 21 22 size absorb(ps); $ lower level scanner routine 23 24 25 n = margs(nam); $ get number of arguments 26 27 if n = 0 then $ no arguments 28 mstackp = mstackp+2; 29 if (mstackp > mstack_lim) call overfl('mstack'); 30 31 mstack(mstackp) = morg(nam); 32 mstack(mstackp-1) = 0; 33 34 return; 35 36 end if; 37 38 39$ otherwise collect the macros arguments. we do this by calling 40$ 'absorb' for tokens. as we get each token, we add it to the argument 41$ buffer 'astack'. we save a pointer to the first astack entry for 42$ each argument in an auxilliary table called 'args'. these pointers 43$ are later used to build the stack frame. 44 45 if (absorb(0) ^= sym_lp) call ermsg(1, 0); $ get leading '(' 46 47 do j = 1 to n; $ collect arguments 48 49 args(j) = astackp+1; $ save pointer to argument 50 nparen = 0; $ depth of parenthesis 51 52 while 1; 53 tok = absorb(0); 54 55 if eof_flag then $ end of file 56 call eofr; 57 58 elseif tok = sym_lp then $ left paren 59 nparen = nparen + 1; 60 61 elseif tok = sym_rp then $ right paren 62 if (nparen = 0) quit; $ end of argument list 63 nparen = nparen - 1; 64 65 elseif nparen = 0 & tok = sym_comma then $ end of argumen 66 quit; 67 end if; 68 69$ add token to astack. 70 countup(astackp, astack_lim, 'astack'); 71 astack(astackp) = tok; 72 end while; 73 74$ push a zero onto astack to indicate end of argument 75 countup(astackp, astack_lim, 'astack'); 76 astack(astackp) = 0; 77 78$ check that we found the right delimiter 79 if j = n then 80 delim = sym_rp; 81 else 82 delim = sym_comma; 83 end if; 84 85 if (delim ^= tok) call ermsg(2, nam); 86 end do; 87 88$ build stack frame 89 mstackp = mstackp + n + 2; 90 if (mstackp > mstack_lim) call overfl('mstack'); 91 92 mstack(mstackp) = morg(nam); 93 mstack(mstackp-1) = n; 94 95 do j = 1 to n; 96 mstack(mstackp-1-j) = args(j); 97 end do; 98 99 100 end subr initexp; 1 .=member absorb 2 fnct absorb(dummy); 3 4$ this is the second level of the macro processor. it passes along 5$ tokens, intercepting macro drops and definitions. 6 7 8 size absorb(ps), $ value returned 9 expand(ps); $ lower level scanner routine 10 11 12 while 1; 13 absorb = expand(0); $ obtain token 14 15 if eof_flag then $ end of file 16 return; 17 18 elseif absorb = sym_macro then 19 call getdef; 20 21 elseif absorb = sym_drop then 22 call getdrop; 23 24 else 25 quit; 26 end if; 27 28 end while; 29 30 31 end fnct absorb; 1 .=member getdef 2 subr getdef; 3 4$ this routine processes macro definitions. this is handled in three 5$ steps: 6 7$ 1. get the macro name and store it in the global 'new_mac'. 8 9$ 2. get the formal parameters and set the margs field of the 10$ macro. 11 12$ 3. get the macro text and set the morg field of the macro. 13 14 15 size mac(ps); $ macro name 16 17 size expand(ps); $ lower level scanner routine 18 19 20 mac = expand(0); 21 22 if eof_flag then $ end of file 23 call eofr; 24 25 elseif lex_typ(mac) > l_bold then $ illegal token type 26 call ermsg(3, mac); 27 return; 28 29 elseif morg(mac) ^= 0 then $ redefinition 30 call ermsg(4, mac); 31 end if; 32 33 call getparms(mac); $ get parameter list 34 call gettext(mac); $ get text 35 36 37 end subr getdef; 1 .=member getparms 2 subr getparms(mac); 3 4$ this routine collects the formal parameters for the macro 'mac'. 5$ we do this by calling 'expand' to get tokens and parsing the 6$ parameter list. as we recognize each parameter, we do three 7$ things: 8 9$ 1. check for duplicate parameters. 10$ 2. enter the parameter name in the array 'params'. 11$ 3. set the 'mcode' field of the parameter to indicate 12$ how the parameter will be represented in the macros text. 13 14$ once we have processed all the parameters, we set the 15$ margs field of the macro to indicate the number of 16$ arguments. 17 18$ the mcode field is used to indicate the representation of arguments 19$ in the macro text. the mcode field is initially zero, indicating 20$ straight text, and is reset while we are processing each macro 21$ definition. after we absorb the text we must reset the mcode 22$ field of each of the parameters to 0. 23 24 25 size mac(ps); $ macro name 26 27 size delim(ps), $ delimiter 28 nuser(ps), $ no. of user args 29 ngen(ps), $ no. of generated arguments 30 semi_seen(1), $ on if ';' has been seen 31 tok(ps), $ current token 32 param(ps); $ current parameter 33 34 size expand(ps); $ lower level scanner function 35 36 37$ we begin by checking whether the macro name is followed by a 38$ semi-colon. if so, there are no arguments. 39 40 delim = expand(0); 41 42 if eof_flag then $ end of file 43 call eofr; 44 45 elseif delim = sym_semi then $ macro with no arguments 46 margs(mac) = 0; 47 paramsp = 0; $ indicate params is empty. 48 return; 49 50 elseif delim ^= sym_lp then 51 call ermsg(5, 0); 52 end if; 53 54 nuser = 0; $ number of user supplied arguments 55 ngen = 0; $ number of generated arguments 56 57 paramsp = 0; 58 59$ see if the left paren is followed by a semicolon. if so, there are 60$ no user supplied arguments. 61 62 tok = expand(0); 63 64 if tok = sym_semi then 65 semi_seen = yes; 66 else 67 semi_seen = no; 68 expand_buff = tok; $ return token 69 end if; 70 71 while 1; 72 param = expand(0); 73 74 if eof_flag then $ end of file 75 call eofr; 76 77 elseif mcode(param) ^= 0 then $ duplicate argument 78 call ermsg(6, param); 79 80 elseif semi_seen then $ generated argument 81 ngen = ngen + 1; 82 mcode(param) = bias_gen1 + ngen; 83 84 else $ user supplied argument 85 nuser = nuser + 1; 86 mcode(param) = bias_user + nuser; 87 end if; 88 89$ store in set of parameters 90 countup(paramsp, param_lim, 'params'); 91 params(paramsp) = param; 92 93 delim = expand(0); 94 95 if eof_flag then $ end of file 96 call eofr; 97 98 elseif delim = sym_rp then $ end of list 99 quit; 100 101 elseif delim = sym_semi then 102 if (semi_seen) call ermsg(7, 0); 103 semi_seen = yes; 104 105 elseif delim ^= sym_comma then 106 call ermsg(8, 0); 107 end if; 108 109 end while; 110 111 margs(mac) = nuser; $ set no. of args 112 113 if (expand(0) ^= sym_semi) call ermsg(9, 0); $ trailing ';' 114 115 116 end subr getparms; 1 .=member gettext 2 subr gettext(mac); 3 4$ this routine gets the text for the macro 'mac'. this is done 5$ by getting tokens from the macro expander and adding them to 6$ mttab until the end of the macro is encountered. 7 8$ as we get each token, we examine it to see if we have reached 9$ the end of the macro. there are two ways to indicate the 10$ end of a macro: 11 12$ endm mac; or endm; 13 14$ the word 'endm' alone is only recognized as the end of the 15$ macro if the macro text contains an equal number of imbedded 16$ 'macro'-s and 'endm'-s. 17 18$ we keep track of the length of the macro. if it exceeds a 19$ certain length, we assume that there is a missing 'endm'. 20 21$ note that parameters and generated arguments recieve a 22$ special representation in the macro text. this representation 23$ is given by the mcode field of their symbol table entries. 24$ a mcode field of zero indicates simple text; this is 25$ represented directly by a symbol table pointer. 26 27$ once we have gathered all the text, we must clear the mcode 28$ fields of the parameters. we use the set 'params' to do this. 29 30 31 size mac(ps); $ macro name 32 33 size len(ps), $ length of macro 34 lev(ps), $ nesting depth for nested macros 35 code(ps), $ encoding of word of macro text 36 j(ps), $ loop index 37 tok(ps), $ current token 38 tok1(ps), $ extra token 39 param(ps); $ parameter name 40 41 size expand(ps); $ lower level scanner routine 42 43 44 morg(mac) = mtabp + 1; $ set origin field 45 46 len = 0; $ length of macro 47 lev = 0; $ depth of nexting of macros 48 49 while 1; 50 len = len+1; 51 52 if len > mlen_lim then 53 call ermsg(10, 0); 54 quit; 55 end if; 56 57 tok = expand(0); 58 59 if eof_flag then $ end of file 60 call eofr; 61 62 elseif tok = sym_macro then $ nested macro definitions 63 lev = lev + 1; 64 65 elseif tok = sym_endm then $ end of some macro 66 if (lev = 0) quit; 67 lev = lev-1; $ decrement nesting depth 68 69 tok1 = expand(0); $ look for 'endm mac' 70 if (tok1 = mac) quit; 71 expand_buff = tok1; $ return token 72 73 elseif tok = mac then $ macro name occurs within itself 74 call ermsg(11, mac); 75 end if; 76 77 78$ find out how we will encode the token in mtab. if the tokens 79$ mcode is zero, then the token is straight text, and we simply 80$ use its symbol table pointer. otherwise the token is an argument, 81$ and mcode gives its encoding. 82 83$ each generated argument has one encoding on its first occurrence and 84$ another on all later occurrences. after getting the code of the token, 85$ see if it is a first occurrence of a generated argument. if so, 86$ change its mcode field to indicate its encoding on all later 87$ occurrences. 88 89 code = mcode(tok); 90 91 if code = 0 then $ simple text 92 code = tok; 93 94 elseif code > bias_gen1 & code < bias_gen2 then 95 mcode(tok) = code + (bias_gen2 - bias_gen1); 96 end if; 97 98 countup(mtabp, mtab_lim, 'mtab'); 99 mtab(mtabp) = code; 100 101$ adjust code for generated arguments 102 end while; 103 104$ add zero entry to macro text 105 countup(mtabp, mtab_lim, 'mtab'); 106 mtab(mtabp) = 0; 107 108$ get trailing tokens after 'endm' 109 until eof_flag ! tok = sym_semi; 110 tok = expand(0); 111 end until; 112 113$ reset encoding for parameters 114 do j = 1 to paramsp; 115 param = params(j); 116 mcode(param) = 0; 117 end do; 118 119 120 end subr gettext; 1 .=member getdrop 2 subr getdrop; 3 4$ this routine processes the 'macro drop' statement. this statement 5$ has the form: 6 7$ drop ; 8 9$ we iterate until we see a semicolon, dropping macros. 10 11 12 size mac(ps), $ macro name 13 delim(ps); $ delimeter 14 15 size expand(ps); $ lower level scanner routine 16 17 18 while 1; 19 mac = expand(0); 20 21 if eof_flag then $ end of file 22 call eofr; 23 elseif lex_typ(mac) > l_bold then $ not valid macro name 24 call ermsg(12, mac); 25 else 26 morg(mac) = 0; 27 margs(mac) = 0; 28 end if; 29 30 delim = expand(0); 31 32 if eof_flag then $ end of file 33 call eofr; 34 elseif delim = sym_semi then 35 return; 36 elseif delim ^= sym_comma then 37 call ermsg(13, 0); 38 end if; 39 40 end while; 41 42 43 end subr getdrop; 1 .=member expand 2 fnct expand(dummy); 3 4$ this is the bottom level routine of the macro expander. it 5$ checks mstack to see whether we are currently expanding macros. 6$ if so, it returns the next token of macro text; otherwise 7$ it gets a token from the scanner. 8 9$ in order to parse macro definitions, we keep a one token 10$ buffer. if this buffer contains a token, we return it. 11$ otherwise loop until we can return a token, expanding macro 12$ arguments where necessary. 13 14 15 size p(ps), $ pointer to mtab or astack 16 n(ps), $ number of arguments 17 arg(ps); $ pointer to astack 18 19 size expand(ps), $ token returned 20 trulex(ps), $ lower level scanner routine 21 bldgen(ps); $ builds generated arguments 22 23 24$ begin by checking 1 word buffer. 25 26 if expand_buff ^= 0 then $ return saved token 27 expand = expand_buff; 28 expand_buff = 0; 29 return; 30 end if; 31 32$ otherwise loop until we can return a token of macro text 33 34 while 1; 35 36 if mstackp = 0 then $ not expanding any macros 37 expand = trulex(0); 38 return; 39 end if; 40 41$ get pointer to next word of macro text, then advance mstack. 42 p = mstack(mstackp); 43 mstack(mstackp) = p+1; 44 45$ p points to the next word of text in either astack or mtab. 46$ if p is greater than 'mtab_lim' then it is a pointer into 47$ astack biased by mtab_lim. 48 49 if p > mtab_lim then 50 expand = astack(p-mtab_lim); 51 else 52 expand = mtab(p); 53 end if; 54 55 if expand = 0 then $ end of macro or argument. pop mstack. 56 n = mstack(mstackp-1); $ no. of args 57 58 if n ^= 0 then $ pop astack 59 astackp = mstack(mstackp-2)-1; 60 end if; 61 62 mstackp = mstackp-n-2; $ pop mstack 63 64 elseif expand <= symtab_lim then $ simple text 65 return; 66 67 elseif expand <= bias_gen1 then $ user argument 68 69$ get a pointer to the start of the argument in astack, then 70$ build a stack frame. 71 72 n = expand-bias_user; $ argument number 73 arg = mstack(mstackp-1-n); $ pointer to astack 74 75 mstackp = mstackp + 2; $ build stack frame 76 if (mstackp > mstack_lim) call overfl('mstack'); 77 78 mstack(mstackp) = arg + mtab_lim; $ store biased pointer 79 mstack(mstackp-1) = 0; 80 81 elseif expand <= bias_gen2 then $ first use of gen. arg. 82 expand = bldgen(expand-bias_gen1, yes); 83 return; 84 85 else $ second use of generated argument 86 expand = bldgen(expand-bias_gen2, no); 87 return; 88 end if; 89 90 end while; 91 92 93 end fnct expand; 1 .=member trulex 2 fnct trulex(dummy); 3 4$ this routine is the actual lexical scanner. it returns the next 5$ token from the input file. 6$ 7$ trulex is essentially a giant case statement which brances on the 8$ class of the first character in the token. each branch of the case 9$ statement contains a separate loop for gathering the remainder 10$ of the token. 11$ 12$ the global variable 'char' always contains the next character of 13$ the input file. there are several primitives for manipulating 14$ 'char': 15$ 16$ 1. getc: advances 'char' to the next character 17$ 2. backc: removes last character from token. 18$ 3. keepc: adds 'char' to the token being built up. 19$ 20$ the current card image is stored in a self defining string called 21$ 'card' with a standard origin. the number of characters per line 22$ is given by the macro 'cpc'. only the first 'scpc' characters 23$ are significant; the remaining characters are used for sequence 24$ numbers, etc. 25$ 26$ the current token is stored as an array of characters called 27$ 'tok'. it is converted to an sds before it is hashed. 28$ 29$ the flag 'eof_flag' is set when the end of the input file is 30$ encountered. it is checked every time we start processing a 31$ new token. 32$ 33$ once the end of file has been encountered, 'getc' always returns 34$ a blank as the next character. the blank will act as a delimiter 35$ for names, etc. without requiring a separate end of file check 36$ within most of the loops. 37 38 +* card_org = $ sorg for 'card' 39 (.sds. cpc + 1) 40 ** 41 42 43 +* card_char(i) = $ i-th character of current line 44 .f. card_org - (i*cs), cs, card 45 ** 46 47 48 +* keepc = $ store character in 'tok' 49 tok_pos = tok_pos + 1; 50 if (tok_pos > toklen_lim) call tok_over; 51 tok(tok_pos) = char; 52 ** 53 54 55$ the following macro is used to back up char, tok_pos, and 56$ card_pos. we will only back up as far as the start of 57$ the current token, so there is never any need to retreat to 58$ the previous line. 59 60 +* backc = 61 char = tok(tok_pos); 62 tok_pos = tok_pos-1; 63 card_pos = card_pos-1; 64 ** smfa 13 smfa 14 smfa 15$ we define a standard code block at the end of this routine to get the smfa 16$ next character from the input stream. the following macro is used to smfa 17$ simulate the internal procedure call. smfa 18 smfa 19 +* l_call(lab) = $ local call to code block at label -lab- smfa 20 retpt = zzya; $ save return label smfa 21 go to lab; $ branch to internal procedure smfa 22 /rlab(zzya)/ $ define label for return point smfa 23 ** smfa 24 smfa 25 size retpt(ps); $ return label smfa 26 smfa 27 .=zzyorg a $ reset counter for return labels smfa 28 smfa 29 .+mc size ctpc(ws); $ little 'convert-to-primary-case' smfa 30 smfa 31 size fold(1); $ indicates whether case-folding should be smfa 32 data fold = yes; $ performed 65 66 67 nameset nstrulex; 68 69 size card(.sds. cpc); $ current card image 70 size card_pos(ps); $ current position in card 71 size card_len(ps); $ index of last non-blank on card 72 73 size eor_seen(1); $ end-of-record has occurred 74 data eor_seen = no; 75 76 size tok(cs); $ token being built 77 dims tok(toklen_lim); 78 79 size tok_pos(ps), $ current position in tok = token le 80 tok_typ(ps); $ lex_typ of token 81 82 size char(ps), $ current character 83 char1(ps); $ extra character 84 85 size p(ps); $ symtab pointer 86 87$ little string search set definitions: 88 89$ little uses the following pre-defined string search sets: 90 91 +* ss_blank = 001 ** $ blank 92 +* ss_separ = 002 ** $ blank-equivalent separators 93 +* ss_digit = 004 ** $ digits 0...9 94 +* ss_ucchar = 008 ** $ upper case alphabetics a...z 95 +* ss_lcchar = 016 ** $ lower case alphabetics a...z 96 +* ss_break = 032 ** $ break character ('_') 97 98$ setl uses the following additional search classes: 99 100 +* ss_delim = 064 ** $ delimiter 101 +* ss_ddelim = 128 ** $ double delimiter 102 +* ss_alter = 512 ** $ 'altchar_tab'-defined characters 103 104$ groups of search classes: 105 106 +* ss_alpha = 024 ** $ alphabetic 107 +* ss_alpham = 060 ** $ alphameric 108 smfb 32$ for efficiency, we duplicate some of the little string functions as smfb 33$ internal subroutines, to avoid repeated external subroutine calls. smfb 34$ this implies that we need to dublicate the definition of the string smfb 35$ search tables. smfb 36 smfb 37 size sstab(ws); $ string search table smfb 38 dims sstab(cssz); smfb 39 smfc 11 .+r36. $ initialise sstab for s10/s20 (9 bit ascii) smfb 41 data sstab = smfb 42 0(9), ss_separ /* tab */, 0(2), ss_separ /* form feed */, smfb 43 0(19), ss_blank ! ss_separ, 0(15), ss_digit(10), 0(7), smfb 44 ss_ucchar(26), 0(4), ss_break, 0, ss_lcchar(26), 0(5), smfb 45 0(384); smfc 12 ..r36 smfb 47 .+s32. $ initialise sstab for s32 (8 bit ascii) smfb 48 data sstab = smfb 49 0(9), ss_separ /* tab */, 0(2), ss_separ /* form feed */, smfb 50 0(19), ss_blank ! ss_separ, 0(15), ss_digit(10), 0(7), smfb 51 ss_ucchar(26), 0(4), ss_break, 0, ss_lcchar(26), 0(5), smfb 52 0(128); smfb 53 ..s32 smfb 54 .+s37. $ initialise sstab for s37 (8 bit ebcdic) smfb 55 data sstab = smfb 56 0(5), ss_separ /* tab */, 0(6), ss_separ /* form feed */, smfb 57 0(51), ss_blank ! ss_separ, 0(44), ss_break, 0(18), 0(1), smfb 58 ss_lcchar(9), 0(7), ss_lcchar(9), 0(8), ss_lcchar(8), smfb 59 0(22), 0, ss_ucchar(9), 0(7), ss_ucchar(9), 0(8), smfb 60 ss_ucchar(8), 0(6), ss_digit(10), 0(6); smfb 61 ..s37 smfb 62 .+s47. $ initialise sstab for s47 (8 bit ascii) smfb 63 data sstab = smfb 64 0(9), ss_separ /* tab */, 0(2), ss_separ /* form feed */, smfb 65 0(19), ss_blank ! ss_separ, 0(15), ss_digit(10), 0(7), smfb 66 ss_ucchar(26), 0(4), ss_break, 0, ss_lcchar(26), 0(5), smfb 67 0(128); smfb 68 ..s47 smfb 69 .+s66. $ initialise sstab for s66 smfb 70 data sstab = smfb 71 0, smfb 72 ss_ucchar(26), $ alphabetics smfb 73 ss_digit(10), $ numerics smfb 74 0(8), smfb 75 ss_blank ! ss_separ, $ blank (the only separator) smfb 76 0(7), smfb 77 ss_break, $ break (underline) smfb 78 0(10); $ remaining characters smfb 79 ..s66 suna 24 .+s68. $ initialise sstab for mc68000 (8 bit ascii) suna 25 data sstab = suna 26 0(9), ss_separ /* tab */, 0(2), ss_separ /* form feed */, suna 27 0(19), ss_blank ! ss_separ, 0(15), ss_digit(10), 0(7), suna 28 ss_ucchar(26), 0(4), ss_break, 0, ss_lcchar(26), 0(5), suna 29 0(128); suna 30 ..s68 smfb 80 smfb 81 +* anyc(c, sm) = $ look for character in string set smfb 82 ((sm & sstab(1+c)) ^= 0) smfb 83 ** smfb 84 109$ macros for character types: 110 111 +* alphabetic(c) = anyc(c, ss_alpha) ** 112 +* alphameric(c) = anyc(c, ss_alpham) ** 113 +* numeric(c) = anyc(c, ss_digit) ** 114 115 +* is_blank(c) = 116 .+mc anyc(c, ss_separ) 117 .-mc (c = 1r ) 118 ** smfb 85 smfb 86$ define the table to map each character to the proper lexical case. smfb 87 smfb 88 size lttab(ws); $ lexical table smfb 89 dims lttab(cssz); smfb 90 smfb 91 .=zzyorg z smfb 92 smfb 93 defc(lt_separ) $ token separators smfb 94 defc(lt_alpha) $ alphabetic character smfb 95 defc(lt_ddelim) $ double-character delimiter smfb 96 defc(lt_delim) $ single-character delimiter smfb 97 defc(lt_alter) $ one-character alternate character smfb 98 defc(lt_period) $ the character '.' smfb 99 defc(lt_quote) $ the character -'- smfb 100 defc(lt_digit) $ numeric character smfb 101 defc(lt_dollar) $ the character '$' smfb 102 defc(lt_error) $ all remaining characters 119 120 121$ delimitiers 122$ ----------- 123 124$ the table 'dsym_tab' maps single delimiters to their symbol table 125$ entries. 126 127 +* delims = '(),;:=+-*/#?' ** 128 129 size dsym_tab(ps); 130 dims dsym_tab(.len. delims); 131 132$ 'ddelims(i,j)' is true for characters ij which form two character 133$ delimiters. 134 135 +* double_delims_1 = '(/:/*:' ** 136 +* double_delims_2 = '/)====*:' ** 137 138 +* ddelim(i, j) = .f. i+1, 1, a_ddelim(j+1) ** 139 smfb 103 size a_ddelim(cssz); 141 dims a_ddelim(cssz); 142 143$ codes for ddelim 144 145$ the code for ddelim are zero origined. this allows us to 146$ initialize the entries for all chracacters which are not 147$ double delimiters in a single data statement. 148 149 .=zzyorg z 150 151 defc0(dd_single) $ single delimiter 152 defc0(dd_double) $ double delimiter 153 154 +* dd_min = dd_single ** $ minumum value 155 +* dd_max = dd_double ** $ maximum value 156 157 data a_ddelim = dd_single(cssz); $ set to default case 158 159 end nameset; 160 161 162 size string(.sds. toklen_lim); $ token in sds form 163 164 size j(ps); $ loop index 165 size org(ps); $ origin for self-defining string 166 167 size trulex(ps); $ pointer returned 168 size hash(ps); $ hashing function called 169 smfb 104 size brkc(ws); $ little string primitives 172 size spns(ws); 173 174 175/begin/ $ main entry point 176 177 if (eof_flag) return; 178 179 tok_pos = 0; $ initialize length of token 180 smfb 105 go to case(lttab(1+char)) in lt_separ to lt_error; smfb 106 smfb 107/case(lt_separ)/ $ token separators 182 $ 183 $ advance to the next non-blank character 184 $ 185 if card_pos = 0 then smfa 33 l_call(getc); 187 if (is_blank(char) = no) go to begin; 188 end if; 189 190 card_pos = card_pos + spns(card, card_pos, ss_separ) - 1; smfa 34 l_call(getc); 192 193 go to begin; 194 195 smfb 108/case(lt_alpha)/ $ alphabetic character 197 $ 198 $ <*name> 199 $ 200 until alphameric(char) = no; smfa 35 keepc; l_call(getc); 202 end until; 203 204 tok_typ = l_name; 205 go to esac; 206 207 smfb 109/case(lt_ddelim)/ $ double-character delimiter 209 $ 210 $ this class includes characters such as '*' which may 211 $ either stand alone or be part of a two character 212 $ delimiter such as '**'. 213 $ 214 $ we begin by saving char and getting the next character. 215 $ next we lookup the two characters in 'ddelim' which 216 $ gives us one of two cases: 217 $ 218 $ 1. the first character is a delimiter and the second is 219 $ part of the next token. 220 $ 221 $ 2. the two characters form a double delimiter. 222 $ smfa 36 char1 = char; l_call(getc); $ save char, then get next char 224 225 go to c(ddelim(char1, char)) in dd_min to dd_max; 226 227 /c(dd_single)/ $ char1 is a single delimiter 228 229 trulex = dsym_tab(brkc(delims, 1, char1)+1); 230 return; 231 232 /c(dd_double)/ $ double delimiter 233 234 tok(1) = char1; tok(2) = char; tok_pos = 2; 235 tok_typ = l_delim; 236 smfa 37 l_call(getc); $ get character after delimiter 238 go to esac; 239 240 smfb 110/case(lt_delim)/ $ single-character delimiter 242 $ 243 $ look up the symbol table pointer and return 244 $ 245 trulex = dsym_tab(brkc(delims, 1, char)+1); smfa 38 l_call(getc); 247 return; 248 249 smfb 111/case(lt_alter)/ $ one-character alternate character 251 $ 252 $ one character alternative for setl symbol or literal 253 $ 254 trulex = altchar_tab(brkc(altchars, 1, char)+1); smfa 39 l_call(getc); 256 return; 257 258 smfb 112/case(lt_period)/ $ the character '.' 260 $ 261 $ <*bold>, <*real>, or <*dots> 262 $ 263 $ if the period is followed by an alphabetic, it is part 264 $ of a <*bold>; if it is followed by a numeric, it is 265 $ part of a <*real>; otherwise it must be the start of 266 $ a <*dots>. 267 $ smfa 40 keepc; l_call(getc); $ read character after dot 269 270 if alphabetic(char) then $ start of <*bold> 271 until alphameric(char) = no; smfa 41 keepc; l_call(getc); 273 end until; 274 275 tok_typ = l_bold; 276 277 elseif numeric(char) then $ start of <*real> 278 go to get_real; 279 280 elseif char = 1r. then $ start of <*dots> 281 until char ^= 1r.; smfa 42 keepc; l_call(getc); 283 end until; 284 285 tok_typ = l_dots; 286 287 else $ invalid starting character 288 call ermsg(14, 0); 289 go to begin; 290 end if; 291 292 go to esac; 293 294 smfb 113/case(lt_quote)/ $ the character -'- 296 $ 297 $ <*string> 298 $ 299 $ imbedded quotes are represented by a pair of quotes 300 $ with no intervening blanks. two string denotations 301 $ separated by blanks are treated as a single denotation. 302 $ 303 $ n.b. both the opening and closing quotes are considered 304 $ part of the token. this way the string -'x'- 305 $ looks different from the identifier -x-. 306 $ 307 $ strings are particularly likely to overflow the token 308 $ buffer, for example if the closing quote is missing. 309 $ for this reason we make an explicit test and jump out 310 $ of the loop if necessary. 311 312 eor_seen = no; $ initialise end-of-record flag 313 keepc; $ save quote smfa 43 fold = no; $ do not fold case while reading string 314 315 while 1; smfa 44 l_call(getc); 317 318 if char = 1r' then $ found quote smfa 45 fold = yes; $ could be end of string smfa 46 l_call(getc); $ read next character 320 321 if char = 1r' then $ double quote 322 keepc; 323 324 elseif char = 1r then $ look for adjacent string 325 until is_blank(char) = no; smfa 47 l_call(getc); if (eof_flag) quit; 327 if (eor_flag) eor_seen = yes; 328 end until; 329 330 if (char ^= 1r') quit; 331 if ( ^ eor_seen) quit; 332 else 333 quit; 334 end if; smfa 48 smfa 49 fold = no; $ continue to read string 335 336 elseif eof_flag then $ missing close quote 337 call eofr; 338 339 else $ simple text 340 keepc; 341 end if; 342 343 if tok_pos = toklen_lim - 1 then $ too long 344 call ermsg(18, 0); 345 quit; 346 end if; 347 348 end while; smfa 50 smfa 51 fold = yes; $ fold case while not reading string 349 350 $ insert final quote in token 351 tok_pos = tok_pos + 1; 352 tok(tok_pos) = 1r'; 353 354 tok_typ = l_string; 355 go to esac; 356 357 smfb 114/case(lt_digit)/ $ numeric character 359 $ 360 $ <*int>, or <*real> 361 $ 362 $ begin by collecting the integer part of the token. 363 $ then see if the digits are followed by a decimal point. 364 $ 365 until numeric(char) = no; smfa 52 keepc; l_call(getc); 367 end until; 368 $ 369 $ if the digits are followed by a period, we may have 370 $ either one token, namely a real denotation such as 1.0, 371 $ or two tokens, such as in '1 .op'. we can determine 372 $ the proper alternative by inspecting the character 373 $ after the period. 374 $ 375 if char = 1r. then smfa 53 keepc; l_call(getc); 377 378 if numeric(char) then $ <*real> - get fraction 379 go to get_real; 380 else $ period is part of the next token 381 backc; 382 end if; 383 end if; 384 385 tok_typ = l_int; 386 go to esac; 387 388 389/get_real/ 390 391 until numeric(char) = no; smfa 54 keepc; l_call(getc); 393 end until; 394 395 if char = 1re then $ probably its an exponent smfa 55 keepc; l_call(getc); 397 398 if char = 1r+ ! char = 1r- then $ sign for exponent smfa 56 keepc; l_call(getc); 400 end if; 401 402 if numeric(char) then $ valid exponent 403 until numeric(char) = no; smfa 57 keepc; l_call(getc); 405 end until; 406 407 else $ we had '1.0 else' 408 backc; 409 end if; 410 411 end if; 412 413 tok_typ = l_real; 414 go to esac; 415 416 smfb 115/case(lt_dollar)/ $ the character '$' 418 $ 419 $ rest of line is comment 420 $ smfa 58 call getcard; 422 go to begin; 423 424 smfb 116/case(lt_error)/ $ all remaining characters 426 $ 427 $ current character can not start a token 428 $ 429 call ermsg(14, 0); 430 smfa 59 l_call(getc); 432 go to begin; 433 435 436/esac/ $ hash in token 437 438$ we begin by converting the token to an sds. the sds must 439$ be standard. i.e. each token xxx must have the same sorg 440$ as a string xxx generated by the little compiler. 441 442 org = .sds. tok_pos + 1; $ origin for string 443 444 string = 0; $ initialize string 445 slen string = tok_pos; 446 sorg string = org; 447 448 do j = 1 to tok_pos; 449 .f. org-j*cs, cs, string = tok(j); 450 end do; 451 452 p = hash(string); 453 454$ if the token is a literal or keyword then its lex_type has 455$ already been set; otherwise we set it to tok_typ. 456 457 if (key_val(p) = 0 & lit_val(p) = 0) lex_typ(p) = tok_typ; 458 459 trulex = p; smfa 60 smfa 61 return; smfa 62 smfa 63 smfa 64/getc/ $ local subroutine to read next character smfa 65 smfa 66$ this routine places the next character from the input stream into the smfa 67$ global variable 'char'. the global 'fold' indicates whether the smfa 68$ character is to be translated to its primary case equivalent. smfa 69 smfa 70 if card_pos >= scpc then $ read next card smfa 71 eor_flag = yes; $ indicate end-of-record smfa 72 call getcard; $ read next card smfa 73 else $ advance in buffer smfa 74 eor_flag = no; smfa 75 card_pos = card_pos + 1; smfa 76 char = card_char(card_pos); smfa 77 smfa 78 .+s66. smfa 79$ if using 64 set and outside of quotes, convert percent to colon smfb 117 if (fold .and. (char=3b'63')) char = 1r:; smfa 81 ..s66 smfa 82 smfa 83 .+mc if (fold) char = ctpc(char); smfa 84 end if; smfa 85 smfa 86 go to rlab(retpt) in 1 to zzya; $ return from local subroutine 460 461 462 end fnct trulex; 1 .=member initlex 2 subr initlex; 3 4$ this routine initializes various tables used by trulex. 5 6 7 size j(ps); $ loop index 8 size p(ps); $ symbol table pointer 9 size str(.sds. 1); $ used to initialize delimiters 10 size c(cs); $ character code 11 size c1(cs), c2(cs); $ pair of characters 12 13 access nstrulex; 14 15 size hashlit(ps); $ hashing function 16 size brkc(ws); $ little break string primitive 17 18 19 $ initialise the input line to a blank line 20 card = ' ' .pad. cpc; sorg card = card_org; slen card = wpc * cpw; 21 card_pos = 0; 22 char = 1r ; 23 24 $ initialize delimiters, finding their hash, and setting 25 $ their class. we will revise the class for double 26 $ delimiters later. 27 28 call blds(delims, ss_delim); 29 30 str = ' '; 31 32 do j = 1 to .len. delims; 33 c = .ch. j, delims; .ch. 1, str = c; smfb 118 sstab(1+c) = sstab(1+c) ! ss_delim; $ define pattern set 34 35 p = hashlit(str); 36 dsym_tab(brkc(delims, 1, c)+1) = p; 37 end do; 38 39 $ process double delimiters 40 $ 41 $ n.b. 'ddelim' has already been initialized by a data state- 42 $ ment so that all invalid pairs indicate dd_single. 43 44 call blds(double_delims_1, ss_ddelim); 45 46 do j = 1 to .len. double_delims_1; 47 c1 = .ch. j, double_delims_1; 48 c2 = .ch. j, double_delims_2; 49 ddelim(c1, c2) = dd_double; smfb 119 smfb 120 sstab(1+c1) = sstab(1+c1) ! ss_ddelim; $ define pattern set 50 end do; 51 52 $ initialize alternate characters 53 54 .len. altchars = 0; 55 56 .+s10. 57 if char_set = cset_ext then 58 call altchar('<<', 3b'173'); $ left set brace 59 call altchar('>>', 3b'175'); $ right set brace 60 call altchar('(/', 3b'133'); $ left square bracket 61 call altchar('/)', 3b'135'); $ right square bracket 62 call altchar('st', 3b'174'); $ vertical bar 63 call altchar('st', 3b'041'); $ exclamation mark 64 end if; 65 ..s10 66 67 .+s20. 68 if char_set = cset_ext then 69 call altchar('<<', 3b'173'); $ left set brace 70 call altchar('>>', 3b'175'); $ right set brace 71 call altchar('(/', 3b'133'); $ left square bracket 72 call altchar('/)', 3b'135'); $ right square bracket 73 call altchar('st', 3b'174'); $ vertical bar 74 call altchar('st', 3b'041'); $ exclamation mark 75 end if; 76 ..s20 77 78 79 .+s32. 80 if char_set = cset_ext then 81 call altchar('<<', 4b'7b'); $ left set brace 82 call altchar('>>', 4b'7d'); $ right set brace 83 call altchar('(/', 4b'5b'); $ left square bracket 84 call altchar('/)', 4b'5d'); $ right square bracket 85 call altchar('st', 4b'7c'); $ vertical bar 86 call altchar('st', 4b'21'); $ exclamation mark 87 end if; 88 ..s32 89 90 .+s37. 91 if char_set = cset_ext then 92 call altchar('<<', 4b'8b'); $ left set brace 93 call altchar('>>', 4b'9b'); $ right set brace 94 call altchar('(/', 4b'ad'); $ left square bracket 95 call altchar('/)', 4b'bd'); $ right square bracket 96 call altchar('st', 4b'4f'); $ vertical bar 97 call altchar('st', 4b'5a'); $ exclamation mark asca 15 asca 16 .+ascebc. asca 17$ cset_por for s37 reflects tn chain conventions; cset_asc reflects asca 18$ the (cdc-based) ascii to ebcdic translation tables used at nyu. asca 19$ use cset_asc for files originally written in ascii which are to asca 20$ be compiled under ebcdic using the translation used by nyu tape asca 21$ driver. asca 22 elseif char_set = cset_asc then asca 23 call altchar('<<', 4b'c0'); $ left set brace asca 24 call altchar('>>', 4b'd0'); $ right set brace asca 25 call altchar('(/', 4b'4a'); $ left square bracket asca 26 call altchar('/)', 4b'5a'); $ right square bracket asca 27 call altchar('st', 4b'6a'); $ vertical bar asca 28 call altchar('st', 4b'4f'); $ exclamation mark asca 29 ..ascebc 98 end if; 99 ..s37 100 101 .+s47. 102 if char_set = cset_ext then 103 call altchar('<<', 4b'7b'); $ left set brace 104 call altchar('>>', 4b'7d'); $ right set brace 105 call altchar('(/', 4b'5b'); $ left square bracket 106 call altchar('/)', 4b'5d'); $ right square bracket 107 call altchar('st', 4b'7c'); $ vertical bar 108 call altchar('st', 4b'21'); $ exclamation mark 109 end if; 110 ..s47 111 112 .+s66. 113 if char_set = cset_ext then 114 call altchar('<<', 3b'74'); $ at sign 115 call altchar('>>', 3b'75'); $ reverse slant 116 call altchar('(/', 3b'61'); $ left square bracket 117 call altchar('/)', 3b'62'); $ right square bracket 118 call altchar('st', 3b'67'); $ ampersand 119 end if; 120 ..s66 suna 31 suna 32 .+s68. suna 33 if char_set = cset_ext then suna 34 call altchar('<<', 4b'7b'); $ left set brace suna 35 call altchar('>>', 4b'7d'); $ right set brace suna 36 call altchar('(/', 4b'5b'); $ left tuple bracket suna 37 call altchar('/)', 4b'5d'); $ right tuple bracket suna 38 call altchar('st', 4b'7c'); $ vertical bar suna 39 call altchar('st', 4b'21'); $ exclamation mark suna 40 end if; suna 41 ..s68 121 122 call blds(altchars, ss_alter); smfb 121 smfb 122 do j = 1 to .len. altchars; smfb 123 c = .ch. j, altchars; smfb 124 sstab(1+c) = sstab(1+c) ! ss_alter; $ define pattern set smfb 125 end do; smfb 126 smfb 127 do c = 0 to cssz-1; smfb 128 if is_blank(c) then lttab(1+c) = lt_separ; cont; end; smfb 129 if alphabetic(c) then lttab(1+c) = lt_alpha; cont; end; smfb 130 if anyc(c, ss_ddelim) then lttab(1+c) = lt_ddelim; cont; end; smfb 131 if anyc(c, ss_delim) then lttab(1+c) = lt_delim; cont; end; smfb 132 if anyc(c, ss_alter) then lttab(1+c) = lt_alter; cont; end; smfb 133 if c = 1r. then lttab(1+c) = lt_period; cont; end; smfb 134 if c = 1r' then lttab(1+c) = lt_quote; cont; end; smfb 135 if numeric(c) then lttab(1+c) = lt_digit; cont; end; smfb 136 if c = 1r$ then lttab(1+c) = lt_dollar; cont; end; smfb 137 lttab(1+c) = lt_error; smfb 138 end do; smfb 139 smfb 140 smfb 141 macdrop(anyc); smfb 142 smfb 143 macdrop4(lt_separ, lt_alpha, lt_ddelim, lt_delim) smfb 144 macdrop4(lt_alter, lt_period, lt_quote, lt_digit) smfb 145 macdrop2(lt_dollar, lt_error) 123 124 125 end subr initlex; 1 .=member altch 2 subr altchar(s, c); 3 4$ this routine initializes the character 'c' as an alternate to the 5$ string 's'. 6 7 8 size s(.sds. 72); $ string representing token 9 size c(cs); $ one character alternate for 's' 10 11 size hashlit(ps); $ hashing function called 12 13 14 .len. altchars = (.len. altchars) + 1; 15 .ch. (.len. altchars), altchars = c; 16 17 altchar_tab(.len. altchars) = hashlit(s); 18 19 20 end subr altchar; 1 .=member getcard 2 subr getcard; 3 4$ this routine reads the next card image and sets 'char' to the first 5$ character. it also looks for title, eject, and include cards. 6 7$ the card image is read by the little library routine 'getinc' which 8$ returns an array in r(cpw) format. we begin by converting it to 9$ a self defining string 10 11$ card images are added to the listing file after they have been 12$ parsed. this way the card image for 'proc p' can be used as a 13$ title before it appears in the text. 14 15 16 size ara(ws); $ array read by getinc 17 dims ara(wpc); 18 19 size j(ps), $ loop index 20 tab_str(.sds. 1), $ tab character 21 init_col(ps), $ initial column for title 22 p1(ps), $ pointer to start of title 23 p2(ps); $ pointer to end of title 24 25 size str(.sds. cpc), $ string containing 'list', etc. 26 title(.sds. cpc); $ new listing title 27 size anyc(ws); $ little string primitive 28 29 access nstrulex; 30 31 32 call put_card; $ print previous card 33 34/loop/ $ process listing directives 35 36 call getinc(ara, 1, wpc, eof_flag); 37 line_no = line_no + 1; 38 39 if eof_flag then $ set 'card' to blanks and return 40 card = ' ' .pad. cpc; 41 return; 42 end if; 43 44 do j = 1 to wpc; 45 .f. card_org - j*ws, ws, card = ara(j); 46 end do; 47 48 sorg card = card_org; 49 slen card = wpc * cpw; 50 51$ look for '.' in column 2. if we find one look for the next blank 52$ and find the string between the dot and the blank. 53 54 if card_char(2) = 1r. then 55 56 do j = 3 to cpc; 57 .+mc if (anyc(card_char(j), 2)) quit do; 58 .+mc $ anyc(..., 2) matches blank-equivalent separators 59 .-mc if (card_char(j) = 1r ) quit do; 60 end do; 61 62 str = .s. 3, j-3, card; 63 .+mc call stpc(str); $ convert to primary case 64 65 if str .seq. 'list' then $ turn on listing 66 list_flag = yes; 67 go to loop; 68 69 elseif str .seq. 'nolist' then $ turn off listing 70 list_flag = no; 71 go to loop; 72 73 elseif str .seq. 'eject' then $ page eject 74 if (list_flag) put, page; 75 go to loop; 76 77 elseif str .seq. 'title' then $ change title 78 79$ the title is surrounded by quotes. find it. 80 p1 = '''' .in. card; 81 if (p1 = 0) p1 = 1; 82 card_char(p1) = 1r ; 83 84 p2 = '''' .in. card; 85 if (p2 = 0) p2 = scpc; 86 title = .s. p1+1, p2-p1-1, card; 87 88$ call little library to set title 89 if cc_tab = 1r then 90 call etitlr(yes, '', 9, cpc+1); 91 call etitlr(yes, ' ', 5, 1); 92 else 93 tab_str = ' '; 94 .ch. 1, tab_str = cc_tab; 95 call etitlr(yes, tab_str, 9, cpc); 96 call etitlr(yes, ' ', 5, 1); 97 end if; 98 99 if seqf_flag then 100 init_col = 10; 101 else 102 init_col = 2; 103 end if; 104 105 call etitlr(yes, title, init_col, 0); 106 if (list_flag) put, page; 107 108 go to loop; 109 end if; 110 end if; 111 112 is_listed = no; $ line has not been listed 113 is_written = no; $ line has not been written for optimiser 114 115 card_pos = 0; 116 char = 1r ; $ nominal blank at end of line 117 118 119 end subr getcard; 1 .=member putcard 2 subr put_card; 3 4$ this routine prints the current card image. 5 6 size j(ps); $ loop index 7 size temp1(ws); $ temporaries for line/statement numbers 8 size temp2(ws); 9 10 size anyc(ws); $ little seek-character string primitive 11 size rsps(ws); $ little right-span string-set string prim 12 13 access nstrulex; 14 15 .+sq1. 16 if ((slen ssm_title) ^= 0) & ^ is_written then 17 $ (logically) 'delete' trailink blanks 18 card_len = slen card; 19 if is_blank((.ch. card_len, card)) then 20 card_len = card_len - rsps(card, card_len, ss_separ); 21 end if; 22 23 $ copy the line for the optimiser 24 putbhdr(bt_string, card_len); 25 26 do j = 1 to (card_len + cpw - 1) / cpw; 27 putbdat((.f. (sorg card)-j*ws, ws, card)) 28 end do; 29 30 is_written = yes; $ line has been copied 31 end if; 32 ..sq1 33 if (is_listed ! ^ list_flag) return; 34 35 is_listed = yes; 36 37 go to prcase(seqf_flag) in 0 to 3; 38 39/prcase(1)/ $ line number only 40 41 put :line_no ,i(7); 42 go to print_tab; 43 44 45/prcase(2)/ $ statement number only 46 47 put :stmt_count ,i(7); 48 go to print_tab; 49 50 51/prcase(3)/ $ 3 digits of each 52 53 temp1 = mod(line_no,1000); 54 if (temp1 = 0) call setdig(1, line_no/1000, 2, yes); 55 56 temp2 = mod(stmt_count, 1000); 57 if (temp1 = 0) call setdig(1, stmt_count/1000, 6, yes); 58 if (stmt_count < 10) call etitlr(1, ' ', 6, 3); 59 60 put :temp1 ,i(3) ,x(1) :temp2 ,i(3); 61 62 63/print_tab/ 64 65 66$ if a tab character is available, print it so that the following 67$ will be aligned properly. 68 69 if cc_tab ^= 1r then 70 put: cc_tab, r(1); 71 else 72 put, column(9); 73 end if; 74 75/prcase(0)/ 76 77 put: (.s. 1, scpc, card), a; 78 79 if ( (.s. scpc+1, cpc-scpc, card) 80 .sne. (' ' .pad. cpc-scpc)) then 81 put, ' . ': (.s. scpc+1, cpc-scpc, card), a; 82 end if; 83 84 if m_flag then 85 if cc_tab ^= 1r then 86 put, column(16+cpc); 87 else 88 put, column(11+cpc); 89 end if; 90 91 put :cstmt_count ,i; 92 end if; 93 94 put, skip; 95 96 97 end subr put_card; 1 .=member setdig 2 subr setdig(p1, numb, from, blank_zer); 3 4 size p1(ps), 5 numb(ws), 6 from(ps), 7 blank_zer(1); 8 9 size temp(.sds. 3), 10 i(ps), 11 num(ws), 12 leng(ps); 13 14 temp = ' '; leng = 3; 15 16 num = numb; 17 18 do i = 3 to 1 by -1; 19 if ( (num > 0) .or. ^ blank_zer) .ch. i, temp = mod(num,10) 20 + 1r0; 21 num = num / 10; 22 end do; 23 24 call etitlr(p1, temp, from, leng); 25 26 end subr setdig; 1 .=member atitle 2 subr atitle; 3 4$ this routine performs automatic titling. 5 6 access nstrulex; 7 8 size j(ps), $ loop index 9 tab_str(.sds. 1), $ tab character 10 init_col(ps), $ starting column for heading 11 len(ps); $ length of title 12 13 if at_flag then 14 15$ find first non blank then character 16 j = 1; 17 18 while 1; 19 if (j > cpc) quit; 20 if (.ch. j, card ^= 1r ) quit; 21 22 j = j + 1; 23 end while; 24 25 len = scpc-j+1; 26 27 if cc_tab = 1r then 28 call etitlr(yes, '', 9, cpc+1); 29 call etitlr(yes, ' ', 5, 1); 30 else 31 tab_str = ' '; 32 .ch. 1, tab_str = cc_tab; 33 call etitlr(yes, tab_str, 9, cpc); 34 call etitlr(yes, ' ', 5, 1); 35 end if; 36 37 if (seqf_flag) then 38 init_col = 10; 39 else 40 init_col = 2; 41 end if; 42 43 call etitlr(yes, (.s. j, len, card), init_col, len); 44 put, page; 45 end if; 46 47 48 end subr atitle; 1 .=member tokover 2 subr tok_over; 3 4$ this routine is called when a token exceeds the maximum token length. 5$ we print an error message and then reset tok_pos so that only the 6$ second half of the token will be saved. 7 8 access nstrulex; 9 10 call ermsg(18, 0); 11 tok_pos = 0; 12 13 14 end subr tok_over; 1 .=member bldgen 2 fnct bldgen(n, first); 3 4$ this routine builds a token for a generated macro argument. 5$ we allow each macro to have up to 26 generated arguments. 6 7$ we generate a token of the form 8 9$ gax.yyyyy 10 11$ where x is the n-th letter of the alphabet and yyyyy is the number of 12$ macro expansions which have contained a token starting with 'gax'. 13 14$ first is true if this is the first token 'gax' we are generating 15$ for this macro expansion. if first is true, we increment the 16$ counter yyyyy. 17 18 19 size n(ps), $ number of generated argument 20 first(1); $ on for first occurrence of argument 21 22 size nn(ps), $ local copy of n. 23 j(ps), $ loop index 24 str(.sds. 9); $ string for token 25 26 size count(ps); $ array of counters yyyyy 27 dims count(param_lim); 28 data count = 0(param_lim); 29 30 +* count_lim = 99999 ** $ maximum value for yyyyy 31 32 33 size bldgen(ps), $ value returned 34 hash(ps); $ hashing function 35 36 37 if first then 38 countup(count(n), count_lim, 'generated arguments'); 39 end if; 40 41 str = 'ga . '; $ initialize string 42 .ch. 3, str = .ch. n, 'abcdefghijklmnopqrstuvwxyz'; 43 44 nn = count(n); $ copy counter for conversion to string 45 46 do j = 9 to 5 by -1; 47 .ch. j, str = charofdig(mod(nn, 10)); 48 nn = nn/10; 49 end do; 50 51 bldgen = hash(str); 52 lex_typ(bldgen) = l_name; 53 54 55 end fnct bldgen; 1 .=member bugact 2 subr bugact; 3 4$ this routine processes compiler debugging options. 5 6 7 access parsens; 8 9 10 if dbg_min<=key_val(parse_tok) & key_val(parse_tok)<=dbg_max then 11 12 go to case(key_val(parse_tok)) in dbg_min to dbg_max; 13 14 15/case(dbg_ptrm0)/ $ turn macro processor trace off 16 mt_flag = no; go to esac; 17 18/case(dbg_ptrm1)/ $ turn macro processor trace on 19 mt_flag = yes; go to esac; 20 21/case(dbg_ptrp0)/ $ turn parse trace off 22 pt_flag = no; go to esac; 23 24/case(dbg_ptrp1)/ $ turn parse trace on 25 pt_flag = yes; go to esac; 26 27/case(dbg_ptrt0)/ $ turn token trace off 28 tt_flag = no; go to esac; 29 30/case(dbg_ptrt1)/ $ turn token trace on 31 tt_flag = yes; go to esac; 32 33/case(dbg_prsod)/ $ dump open-tokens 34 call odump; go to esac; 35 36/case(dbg_prspd)/ $ dump polish and xpolish strings 37 call pdump; go to esac; 38 39/case(dbg_prssd)/ $ dump parser dbgbol table 40 call sdump; go to esac; 41 42 43/esac/ 44 45 46 end if; 47 48 49 end subr bugact; 1 .=member hash 2 fnct hash(string); 3 4$ this routine hashes a token into symtab and returns a pointer to 5$ it. 6 7$ self defining strings generally contain a few unused bits 8$ whose value is undefined. 'hash' assumes that these bits 9$ are all zero. this is true for strings produced by 'trulex', 10$ but not for strings produced by '.s.', etc. 11 12$ strings whose extra bits are not guarenteed zero should 13$ be hashed by calling 'hashlit'. 14 15 16 size string(sds_sz); $ token to be hashed 17 18 size hash(ps); $ symtab pointer returned 19 20 size indx(ps), $ index into hash headers 21 len(ps), $ slen of string 22 org(ps), $ sorg of string 23 words(ps), $ no. of words in new names entry 24 nam(ps), $ pointer to names 25 hashc(ws), $ hash code of token 26 word(ws), $ current word of token 27 head(ps), $ pointer to hash header 28 j(ps); $ loop index 29 30 31$ we begin by making a names entry for the new token. we do this 32$ now since it is easier to compare two names entries than a names 33$ entry and a string. we do not adjust 'namesp' until we build 34$ a new symtab entry. 35 36$ as we make the names entry, we compute the hash code as the 37$ exclusive-or of all its words. 38 39 len = slen string; $ length of token 40 org = sorg string; $ origin of token 41 42 words = org/ws; $ number of words needed 43 if (namesp + words > names_lim) call overfl('names'); 44 45 hashc = 0; $ hash code 46 47 do j = 1 to words; 48 word = .f. 1 + (j-1)*ws, ws, string; 49 50 hashc = hashc .ex. word; 51 names(namesp+j) = word; 52 end do; 53 54 hashc = (.f. 1, ws/2, hashc) .ex. (.f. ws/2+1, ws/2, hashc); 55 56$ get a pointer into heads, and search the clash list. 57 indx = mod(hashc, heads_lim) + 1; 58 hash = heads(indx); 59 60 while hash ^= 0; 61 62 nam = name(hash); 63 64 do j = 1 to words; 65 if (names(nam-1+j) ^= names(namesp+j)) go to nxt; 66 end do; 67 68 return; 69 70 /nxt/ 71 hash = link(hash); 72 end while; 73 74$ no match. get free symtab entry 75 countup(symtabp, symtab_lim, 'symtab'); 76 hash = symtabp; 77 78$ fill in symtab entry 79 symtab(hash) = 0; 80 name(hash) = namesp + 1; 81 namesp = namesp + words; 82 83$ add to front of clash list 84 link(hash) = heads(indx); 85 heads(indx) = hash; 86 87 88 end fnct hash; 1 .=member hashlit 2 fnct hashlit(str1); 3 4$ this is a special hashing routine for processing literals. 5$ it hashes strings which are not necessarily in the format 6$ required by 'hash'. it also converts bold names to names 7$ if we are not in the prefix stropping mode. 8 9 10 size str1(.sds. 30); $ original string 11 12 size hashlit(ps); $ pointer returned 13 14 size str2(.sds. 30), $ new string built 15 p1(ps), $ position in str1 16 p2(ps), $ position in str2 17 j(ps), $ loop index 18 len(ps); $ length of strings 19 20 size hash(ps); $ hashing function 21 22 23 p1 = 1; $ initialize 24 p2 = 1; 25 26 len = slen str1; 27 28 str2 = 0; 29 30 sorg str2 = .sds. len+1; 31 slen str2 = len; 32 33 do j = 0 to len-1; 34 .ch. p2+j, str2 = .ch. p1+j, str1; 35 end do; 36 37 hashlit = hash(str2); 38 39 40 end fnct hashlit; 1 .=member stat1 2 subr stat1; 3 4$ this routine resets the statement counter to 1. 5 6 cstmt_count = cstmt_count + 1; 7 ustmt_count = cstmt_count; 8 estmt_count = 0; 9 .+sq1. 10 if (slen ssm_title) then 11 if cstmt_count = 1 then 12 putbhdr(bt_map, 0) 13 else 14 putbhdr(bt_tuple, 1) 15 putbhdr(bt_tuple, 1) 16 end if; 17 18 putbhdr(bt_tuple, 0) 19 putbhdr(bt_int, 1) putbdat(cstmt_count) 20 putbhdr(bt_tuple, 0) 21 end if; 22 ..sq1 23 24 25 end subr stat1; 1 .=member newstat 2 subr new_stat; 3 4$ this routine is called at the start of each statement. 5 6 cstmt_count = cstmt_count + 1; 7 .+sq1. smfc 13$ assert 0 <= cstmt_count < li_dbas as defined in compl.lipkg. 8 if (slen ssm_title) then 9 if cstmt_count = 1 then 10 put ,skip; $ emit blank line 11 call contlpr(27, yes);$ echo to the terminal 12 put ,'*** setl compiler error (code po1) ***' ,skip; 13 call contlpr(27, no); $ stop to echo to the terminal 14 call ltlfin(1, 0); $ abnormally terminate 15 end if; 16 17 putbhdr(bt_tuple, 1) 18 putbhdr(bt_tuple, 1) 19 20 putbhdr(bt_tuple, 0) 21 putbhdr(bt_int, 1) putbdat(cstmt_count) 22 putbhdr(bt_tuple, 0) 23 end if; 24 ..sq1 25 26 27 end subr new_stat; 1 .=member back 2 subr back(i, j); 3 4$ this routine backs up the scanner by i tokens and the polish string 5$ by j nodes. 6 7$ i and j are both in the range 0-1. 8 9 10 size i(ps), 11 j(ps); 12 13 access backup; 14 15 16 tok_out = tok_out - i; 17 if (tok_out = 0) tok_out = tok_lim; 18 19 polp = polp - j; 20 21 22 end subr back; 1 .=member backx 2 subr backx; 3 4$ this routine backs up the auxiliary string one node. 5 6 xpolp = xpolp - 1; 7 8 9 end subr backx; 1 .=member copyx 2 subr copyx; 3 4$ this routine copies the last node on the polish string file onto 5$ the auxiliary file. 6 7 if (xpolp = xpol_lim) call putxtab; 8 xpolp = xpolp + 1; 9 10 xpolish(xpolp) = polish(polp); 11 12 13 end subr copyx; 1 .=member mx 2 subr mx(node); 3 4$ this routine writes a marker onto the auxiliary string. 5 6 size node(ps); $ marker p_xxx 7 8 putxp(node, pol_mark); 9 10 11 end subr mx; 1 .=member stackexp 2 subr stackexp; 3 4$ this is the first of three routine which make up an operator precedenc 5$ parser for expressions. 6 7$ the precedence parser does not handle all expressions, but rather thos 8$ expressions made up of binary and unary operators. 9 10$ when the top down parser recognizes operators and operands it immed- 11$ iately writes them onto the polish string. each time it recognizes an 12$ operator it calls the precedence parser which removes the operator 13$ from the string, places it on a stack, and replaces it with any 14$ previous operators which have higher precedence. 15 16$ since expressions can be nested, we must have some way of delimiting 17$ the stack entries associated with each expression. we do this by 18$ calling 'stackexp' at the start of each expression. we push a zero 19$ word onto the stack. this has the effect both of separating the 20$ entries for different expressions and of pushing an entry of 21$ minimal precedence on the stack. 22 23$ the precedence stack is called 'ostack' and has three fields: 24 25$ oname: names pointer for operator 26$ omark: a marker, one of p_bin, p_ubin, p_un, p_uun. 27$ oprec: the precedence. 28 29$ the marker node is emitted after the operator itself. 30 31$ the variable 'ostackp' points to the last entry in ostack. 32 33 34 nameset ostack; 35 36 +* ostack_lim = 20 ** 37 suna 42 +* ostack_sz = suna 43 .+r32 64 suna 44 .+r36 32 suna 45 .+s66 32 suna 46 ** 41 42 size ostack(ostack_sz); 43 dims ostack(ostack_lim); 44 45 size ostackp(ps); 46 data ostackp = 0; 47 suna 47 .+r32. suna 48 +* oname(i) = .f. 01, 32, ostack(i) ** suna 49 +* omark(i) = .f. 33, 08, ostack(i) ** suna 50 +* oprec(i) = .f. 41, 08, ostack(i) ** suna 51 ..r32 suna 52 .+r36. suna 53 +* oname(i) = .f. 01, 16, ostack(i) ** suna 54 +* omark(i) = .f. 17, 08, ostack(i) ** suna 55 +* oprec(i) = .f. 25, 08, ostack(i) ** suna 56 ..r36 suna 57 .+s66. suna 58 +* oname(i) = .f. 01, 16, ostack(i) ** suna 59 +* omark(i) = .f. 17, 08, ostack(i) ** suna 60 +* oprec(i) = .f. 25, 08, ostack(i) ** suna 61 ..s66 62 63 end nameset; 64 65 66 countup(ostackp, ostack_lim, 'ostack'); 67 ostack(ostackp) = 0; 68 69 70 end subr stackexp; 1 .=member opprec 2 subr opprec(m); 3 4$ this routine processes binary and unary operators used in expressions. 5$ 6$ when it is called, the parser has examined the next token to see 7$ whether the operator is the prefix of an on-the-fly assignment or 8$ a compound operator. therefore the binary operator we are interested 9$ is the current last token in the token buffer, which may or may not be 10$ identical to parse_tok. rather than backing up the scanner and polish 11$ to re-parse the operator (once we found it not to be part of a more 12$ complex form) in order to reset parse_tok, we access tok_buff directly 13$ 14$ 'm' is a marker node associated with the operator. by examining 15$ m, we can tell whether the operator is binary, unary, etc. 16 17$ if the operator is unary, we simply make a new ostack entry. 18 19$ if the operator is binary, we must first emit all previous 20$ operators whose precedence with greater or equal precedence. 21$ after this we push the new operator. 22 23$ note that as we pop the stack, we never check whether it is 24$ empty. this is unnecessary since stackexp always makes an 25$ ostack entry with zero precedence. when we reach this 26$ entry, we will always quit the loop. 27 28 29 size m(ps); $ marker node 30 31 size nam(ps), $ name of operator 32 prec(ps), $ its precedence 33 bin(1); $ flags binary operator 34 35 access ostack; 36 access parsens; 37 access backup; 38 39 40 parse_tok = tok_buff(tok_out); 41 polp = polp - 1; 42 nam = name(parse_tok); 43 44$ find precedence and set 'bin' to flag binary operators. 45 46 if m = p_bin then $ system binary 47 prec = key_val(parse_tok); 48 bin = yes; 49 50 elseif m = p_un then $ system unary 51 prec = key_val(parse_tok); 52 if (lex_typ(parse_tok) = l_unbin) prec = prec_un; 53 54 bin = no; 55 56 elseif m = p_ubin then $ user-defined binary 57 prec = prec_ubin; 58 bin = yes; 59 60 else $ user unary 61 prec = prec_un; 62 bin = no; 63 end if; 64 65$ if this is a binary operator, pop all previous operators with 66$ greater or equal precedence. 67 if bin then 68 while oprec(ostackp) >= prec; 69 putp(oname(ostackp), pol_name); $ write name 70 putp(omark(ostackp), pol_mark); $ write marker 71 72 ostackp = ostackp-1; $ pop stack 73 end while; 74 end if; 75 76$ push new operator. 77 countup(ostackp, ostack_lim, 'ostack'); 78 oname(ostackp) = nam; 79 omark(ostackp) = m; 80 oprec(ostackp) = prec; 81 82 83 end subr opprec; 1 .=member popop 2 subr popop; 3 4$ this routine is called after we have seen a user defined niladic 5$ operator. when we first parsed the operator, we assumed it was 6$ unary, and pushed it onto ostack. we now pop it, and write it 7$ onto the polish string. 8 9 10 access ostack; 11 12 13 putp(oname(ostackp), pol_name); 14 15 ostackp = ostackp - 1; 16 17 18 end subr popop; 1 .=member popop1 2 subr popop1; 3 4$ this routine is like popop, but simply throws away the top 5$ ostack entry. 6 7 8 access ostack; 9 10 11 ostackp = ostackp - 1; 12 13 14 end subr popop1; 1 .=member popexp 2 subr popexp; 3 4$ this routine is called at the end of each expression. we pop 5$ and emit all the remaining operators in the expression, then 6$ pop the zero word left by 'stackexp'. 7 8 9 access ostack; 10 11 12 while ostack(ostackp) ^= 0; 13 putp(oname(ostackp), pol_name); $ write name 14 putp(omark(ostackp), pol_mark); $ write marker 15 16 ostackp = ostackp - 1; $ pop stack 17 end while; 18 19 ostackp = ostackp - 1; $ pop zero word 20 21 22 end subr popexp; 1 .=member opentoks 2 subr opentoks(iter, skip); 3 4$ this routine is called at the start of each control statement. 5$ it collects information which is necessary to process 'end', 6$ 'quit', and 'cont' statements. 7 8$ 1. a counter 'open_count' giving the number of control 9$ statements which are currently open. 10 11$ 2. a bit vector 'is_iter' which indicates which control 12$ statements are iterators. this bit vector is indexed 13$ by values of open_count. 14 15$ 3. an array 'opntoks' containing the first few tokens 16$ of each control statement. 17 18$ at this point we increment open_count, set is_iter, and 19$ collect tokens in opntoks. we collect tokens by calling 20$ gettok. when we are done we restore the token buffer to 21$ its original status so that the tokens we have saved will 22$ be passed onto the parser. 23 24$ the parameters to the routine are: 25 26$ iter: indicates that we are opening an iterator 27$ skip: indicates that we are to skip the current token 28 29 30 nameset opntoks; $ nameset used to save tokens for openers 31 32 +* save_lim = 05 ** $ tokens saved/opener 33 +* open_lim = 20 ** $ maximum openers 34 +* opntoks_lim = (save_lim * open_lim) ** $ opntoks limit 35 36 size open_count(ps); $ number of openers 37 data open_count = 0; 38 39 +* is_iter(p) = .f. p, 1, itervect ** 40 size itervect(open_lim); 41 42 size opntoks(ps); $ array of tokens 43 dims opntoks(opntoks_lim); 44 45 size opntoksp(ps); $ pointer to last saved token 46 data opntoksp = 0; 47 48 end nameset; 49 50 size iter(1), $ indicates opening iterator 51 skip(1); $ indicates skipping current token 52 smfb 146 size tok(ps); $ last token read smfb 147 size temp(ps); $ saved value of 'tok_out' smfb 148 size j(ps); $ loop index 55 56 size gettok(ps); $ scanner function 57 58 access backup; 59 60 61$ increment open_count and set is_iter 62 63 countup(open_count, opntoks_lim, 'openers'); 64 is_iter(open_count) = iter; 65 66 temp = tok_out; $ save place in token buffer 67 68 if ^ skip then $ save current token 69 countup(opntoksp, opntoks_lim, 'opntoks'); 70 opntoks(opntoksp) = tok_buff(tok_out); 71 end if; 72 73$ save remaining tokens 74 do j = 1 to save_lim - 1 + skip; 75 countup(opntoksp, opntoks_lim, 'opntoks'); 76 smfb 149 tok = tok_buff(tok_out); smfb 150 smfb 151 if tok = sym_semi ! tok = sym_then ! smfb 152 tok = sym_rp ! tok = sym_do then smfb 153 $ note that since we do not consume a new token from the smfb 154 $ input, 'tok_buff(tok_out)' and 'tok' will not change, smfb 155 $ and this loop indeed will set all remaining entries of smfb 156 $ 'opntoks' to zero. smfb 157 opntoks(opntoksp) = 0; 81 else 82 opntoks(opntoksp) = gettok(0); 83 end if; 84 end do; 85 86 tok_out = temp; $ reset token buffer 87 88 89 end subr opentoks; 1 .=member endtoks 2 subr endtoks; 3 4$ this routine is called after seeing the keyword 'end'. we compare 5$ the tokens which follow it with those of the most recent opener. 6 7$ on exit the token buffer is positioned so that the next token 8$ will be the semicolon. 9 10 11 size toks(ps); $ array of tokens after '.end' 12 dims toks(save_lim); 13 14 size toksp(ps), $ pointer to last entry in toks 15 tok(ps), $ current token 16 org(ps), $ origin in opntoks 17 lev(ps), $ level of enders 18 temp1(ps), $ entry value of tok_out 19 temp2(ps), $ temporary used for swap 20 i(ps), $ loop index 21 j(ps); $ loop index 22 23 size gettok(ps); $ scanner routine 24 25 access backup, opntoks, parsens; 26 27 28 temp1 = tok_out; $ save place in token buffer 29 30 toksp = 0; $ collect tokens to next semicolon 31 while 1; smfb 158 tok = gettok(0); if (eof_flag) call eofr; smfb 159 if (tok = sym_semi) quit; smfb 160 if tok = sym_then ! tok = sym_rp ! tok = sym_do then smfb 161 until tok = sym_semi; smfb 162 tok = gettok(0); if (eof_flag) call eofr; smfb 163 end until; smfb 164 quit while; smfb 165 end if; 36 37 if toksp < save_lim then $ save token 38 toksp = toksp + 1; 39 toks(toksp) = tok; 40 end if; 41 end while; 42 43 tok_out = tok_out - 1; $ return semicolon 44 if (tok_out = 0) tok_out = tok_lim; 45 46 if toksp = 0 then $ no tokens, 47 call poptoks; $ reset opntoksp, open_count 48 return; $ and return. 49 end if; 50 51 $ iterate over the openers, looking for a match. 52 53 lev = 0; $ nesting level 54 do i = open_count to 1 by -1; 55 lev = lev + 1; 56 org = (i-1) * save_lim; $ origin in opntoks 57 58 do j = 1 to toksp; $ compare tokens 59 if (toks(j) ^= opntoks(org+j)) cont do i; 60 end do; 61 62 go to found; 63 end do; 64 65 temp2 = tok_out; 66 tok_out = temp1; 67 68 call ermsg(15, 0); 69 70 tok_out = temp2; 71 call poptoks; 72 73 return; 74 75 76/found/ 77 78 call poptoks; $ reset opntoksp, open_count 79 80 if (lev = 1) return; $ found correct match 81 82$ at this point we found a match, but are missing one or more 83$ -.end-s. we recover by pretending we have not yet seen the 84$ current -.end-. 85 86 tok_out = temp1; 87 88 call ermsg(16, 0); 89 90 tok_out = tok_out - 1; 91 if (tok_out = 0) tok_out = tok_lim; 92 93 parsep = parseerrloc; 94 95 return; 96 97 98 end subr endtoks; 1 .=member poptoks 2 subr poptoks; 3 4$ this routine is called at the end of a control statement, after we 5$ have made any necessary checks for matching tokens. it pops 6$ opntoks and resets open_count. 7 8 9 access opntoks; 10 11 12 opntoksp = opntoksp - save_lim; 13 open_count = open_count - 1; 14 15 16 end subr poptoks; 1 .=member chktoks 2 subr checktoks(dft); 3 4$ this routine compares the tokens after the keywords 'cont' and 5$ 'quit' to find out which loop they refer to. 6 7$ we write out a counter to indicate which loop to cont(quit). 8$ this is done as follows: 9 10$ 1. if the keyword cont(quit) is followed by a semicolon, we 11$ write out a default, given by the parameter 'dft'. 12 13$ 2. otherwise we search opntoks starting with the most recent 14$ entry until we find one which matches, keeping a counter 15$ of the number of loop entries we look at, then use this 16$ counter. 17 18$ on exit the token buffer is positioned so that the next token 19$ will be the semicolon. 20 21 22 size dft(ps); $ default counter value 23 24 size toks(ps); $ array of tokens after 'cont' 25 dims toks(save_lim); 26 27 size toksp(ps), $ pointer to last entry in toks 28 tok(ps), $ current token 29 org(ps), $ origin in opntoks 30 lev(ps), $ level of loop nesting 31 i(ps), $ loop index 32 j(ps); $ loop index 33 34 size gettok(ps); $ scanner routine 35 36 access opntoks; 37 38 39$ we begin by collecting tokens up to a semicolon, and putting them 40$ in the array 'toks'. 41 42 toksp = 0; 43 44 while 1; smfb 166 tok = gettok(0); if (eof_flag) call eofr; smfb 167 if (tok = sym_semi) quit; smfb 168 if tok = sym_then ! tok = sym_rp ! tok = sym_do then smfb 169 until tok = sym_semi; smfb 170 tok = gettok(0); if (eof_flag) call eofr; smfb 171 end until; smfb 172 quit while; smfb 173 end if; 49 50 if toksp < save_lim then $ save token 51 toksp = toksp + 1; 52 toks(toksp) = tok; 53 end if; 54 end while; 55 56$ return semicolon 57 call back(1, 0); 58 59$ if there were no tokens, write out a zero onto the polish string and 60$ return. 61 62 if toksp = 0 then smfa 87 until 1; smfa 88 do i = open_count to 1 by -1; smfa 89 if (is_iter(i)) quit until 1; smfa 90 end do; smfa 91 call ermsg(21, 0); smfa 92 end until; 63 putp(dft, pol_count); 64 return; 65 end if; 66 67$ otherwise iterate over the openers, looking for a match. 68 69 lev = 0; $ nesting level of loops 70 71 do i = open_count to 1 by -1; 72 if (^ is_iter(i)) cont; 73 74 lev = lev + 1; 75 org = (i-1) * save_lim; $ origin in opntoks 76 77 do j = 1 to toksp; $ compare tokens 78 if (toks(j) ^= opntoks(org+j)) cont do i; 79 end do; 80 81$ found match. write out count and return. 82 83 putp(lev, pol_count); 84 85 return; 86 end do; 87 88$ no match found - issue message and write out a zero. 89 90 call ermsg(17, 0); 91 92 putp(dft, pol_count); 93 94 95 end subr checktoks; 1 .=member svtab 2 trace entry; 3 4 subr svtab; 5 6$ this routine saves the values of mttab, namesp, and symtabp 7$ so that we can later free the space used by local names. 8$ of the scope. 9 10 nameset svtabns; 11 +* sv_lim = 10 ** $ dimension of sv. 12 smfb 176 .+r32 size sv(96); suna 62 .+r36 size sv(72); smfb 177 .+s66 size sv(60); 19 dims sv(sv_lim); 20 21 .+s66. 22 +* sv_symtabp(i) = .f. 01, 16, sv(i) ** 23 +* sv_namesp(i) = .f. 17, 16, sv(i) ** 24 +* sv_mtabp(i) = .f. 33, 16, sv(i) ** 25 ..s66 26 smfb 178 .+r32. 34 +* sv_symtabp(i) = .f. 01, 32, sv(i) ** 35 +* sv_namesp(i) = .f. 33, 32, sv(i) ** 36 +* sv_mtabp(i) = .f. 65, 32, sv(i) ** smfb 179 ..r32 43 suna 63 .+r36. 45 +* sv_symtabp(i) = .f. 01, 18, sv(i) ** 46 +* sv_namesp(i) = .f. 19, 18, sv(i) ** 47 +* sv_mtabp(i) = .f. 37, 18, sv(i) ** suna 64 ..r36 55 56 57 size svp(ps); 58 data svp = 0; 59 end nameset; 60 61 countup(svp, sv_lim, 'sv'); 62 63 sv_symtabp(svp) = symtabp; 64 sv_namesp(svp) = namesp; 65 sv_mtabp(svp) = mtabp; 66 68 69 end subr svtab; 1 .=member retab 2 subr retab; 3 4$ this routine resets mttabp, namesp, and symtabp to their saved 5$ values, thus freeing a block of table space. 6 7 access svtabns; 8 9 size j(ps), $ loop index 10 p(ps); $ symtab pointer 11 12 symtabp = sv_symtabp(svp); 13 namesp = sv_namesp(svp); 14 mtabp = sv_mtabp(svp); 15 16 svp = svp - 1; 17 18$ delete all the freed symtab entries from the clash lists. 19 do j = 1 to heads_lim; 20 p = heads(j); 21 22 while p > symtabp; 23 p = link(p); 24 end while; 25 26 heads(j) = p; 27 end do; 28 29 return; 30 31 end subr retab; 1 .=member puttabs 2 subr puttbs; 3 4$ this routine writes out both polish strings. 5 6 if (sd_flag) call sdump; $ dump symbol table 7 8 call puttab; $ write polish string 9 call putxtab; $ write xpolish string 10 11 12 end subr puttbs; 1 .=member puttab 2 subr puttab; 3 4$ this routine writes out the polish string. the polish string has 5$ a somewhat different external format from its internal one. 6$ markers and counters are written out exactly as they are stored 7$ in core howwever names have: 8 9$ pol_typ: pol_name 10$ pol_val: number of words in 'names' entry 11 12$ followed by the actual names entry. 13 14$ puttab may be called while we are saving parser states. in this 15$ case we write out the part of the string we are not saving, move 16$ the remaining nodes towards the front of the string and adjust 17$ the pointers from sstack to the polish string. if we cannot write 18$ anything out then the string is full, and we abort. 19 20 size last(ps), $ last node we can write 21 j(ps), $ loop index 22 nam(ps), $ names pointer 23 words(ps); $ number of words 24 25 access backup; 26 27 28 if (pd_flag) call pdump; 29 30$ find last node to write 31 32 if sstackp = 0 then 33 last = polp; 34 35 else 36 last = s_polp(1)-1; 37 if (last = 0) call overfl('polish'); 38 end if; 39 .-qp. 40 41$ write string 42 do j = 1 to last; 43 if pol_typ(j) = pol_name then 44 nam = pol_val(j); 45 words = n_sorg(nam)/ws; 46 47 pol_val(j) = words; 48 49 write pol_file, 50 polish(j), names(nam) to names(nam+words-1); 51 52 else 53 write pol_file, polish(j); 54 end if; 55 end do; 56 57$ move all remaining nodes to front of string then adjust sstack 58 do j = last+1 to polp; 59 polish(j-last) = polish(j); 60 end do; 61 62 do j = 1 to sstackp; 63 s_polp(j) = s_polp(j) - last; 64 end do; 65 66 ..qp 67 polp = polp - last; $ reset polish pointer 68 69 70 end subr puttab; 71 72 notrace entry; 73 1 .=member putxtab 2 subr putxtab; 3 4$ this routine writes out the xpolish string. the xpolish string has 5$ a somewhat different external format from its internal one. 6$ markers and counters are written out exactly as they are stored 7$ in core howwever names have: 8 9$ xpol_typ: pol_name 10$ xpol_val: number of words in 'names' entry 11 12$ followed by the actual names entry. 13 14$ putxtab may be called while we are saving parser states. in this 15$ case we write out the part of the string we are not saving, move 16$ the remaining nodes towards the front of the string and adjust 17$ the pointers from sstack to the xpolish string. if we cannot write 18$ anything out then the string is full, and we abort. 19 20 size last(ps), $ last node we can write 21 j(ps), $ loop index 22 nam(ps), $ names pointer 23 words(ps); $ number of words 24 25 access backup; 26 27 28 if (pd_flag) call xpdump; 29 30$ find last node to write 31 32 if sstackp = 0 then 33 last = xpolp; 34 35 else 36 last = s_xpolp(1)-1; 37 if (last = 0) call overfl('xpolish'); 38 end if; 39 40 .-qp. 41$ write string 42 do j = 1 to last; 43 if xpol_typ(j) = pol_name then 44 nam = xpol_val(j); 45 words = n_sorg(nam)/ws; 46 47 xpol_val(j) = words; 48 49 write xpol_file, 50 xpolish(j), names(nam) to names(nam+words-1); 51 52 else 53 write xpol_file, xpolish(j); 54 end if; 55 end do; 56 57$ move all remaining nodes to front of string then adjust sstack 58 do j = last+1 to xpolp; 59 xpolish(j-last) = xpolish(j); 60 end do; 61 62 ..qp 63 do j = 1 to sstackp; 64 s_xpolp(j) = s_xpolp(j) - last; 65 end do; 66 67 xpolp = xpolp - last; $ reset aux. polish pointer 68 69 70 end subr putxtab; 71 72 notrace entry; 73 1 .=member symsds 2 fnct symsds(p); 3 4$ this routine returns the name of a symbol table entry as a self 5$ defining string. 6 7 8 size p(ps); $ symbol table pointer 9 10 size symsds(sds_sz), 11 namesds(sds_sz); $ gets string from names ptr 12 13 14 if p = 0 then 15 symsds = ''; 16 else 17 symsds = namesds(name(p)); 18 end if; 19 20 21 end fnct symsds; 1 .=member namesds 2 fnct namesds(nam); 3 4$ this routine converts a names entry to an sds string. 5 6 size nam(ps); $ pointer to names entry 7 8 size namesds(sds_sz); 9 10 size j(ps), $ loop index 11 words(ps); $ number of words in names entry 12 13 if nam = 0 then $ missing names entry 14 namesds = ''; 15 return; 16 end if; 17 18 words = n_sorg(nam)/ws; 19 20 do j = 0 to words-1; 21 .f. 1+j*ws, ws, namesds = names(nam+j); 22 end do; 23 24 25 end fnct namesds; 1 .=member ermsg 2 subr ermsg(n, nam); 3 4$ this routine prints lexical error messages. 'n' is the message 5$ number and 'nam' is an optional symbol table pointer to the 6$ name which caused the error. 7 8 9 size n(ps), $ error number 10 nam(ps); $ symtab pointer 11 12 size string(.sds. 80); $ message text 13 14 smfa 93 go to case(n) in 1 to 21; 16 17 +* et(n, msg) = 18 /case(n)/ string = msg; go to esac; 19 ** 20 21 et(01, 'macro argument list to begin with -(-'); 22 et(02, 'to have proper number of arguments'); 23 et(03, 'to be valid macro name'); 24 et(04, 'to be defined only once'); 25 et(05, '-(- in macro parameter list'); 26 et(06, 'to appear only once in macro parameter list'); 27 et(07, 'only one -;- in macro parameter list'); 28 et(08, '-,- in macro parameter list'); 29 et(09, '-;- after -endm-'); 30 et(10, '-endm-. macro exceeds maximum length'); 31 et(11, 'macro not to occur within itself'); 32 et(12, 'to be a valid macro name'); 33 et(13, '-,- in macro list'); 34 et(14, 'valid starting character for token'); 35 et(15, 'matching tokens after -end-.') 36 et(16, 'matching tokens after -end-; attempt to recover ' .cc. 37 'by inserting missing -end-(s).') 38 et(17, 'matching tokens after -quit- or -cont-'); 39 et(18, 'token to be less than 128 characters'); 40 et(19, 'end of file after last program member'); 41 et(20, '= after +: or -:.'); smfa 94 et(21, 'open loop to -continue- or -quit-') 42 43 44/esac/ 45 46 call put_err(n, nam, string); 47 48 49 macdrop(et) 50 51 end subr ermsg; 1 .=member eofr 2 subr eofr; 3 4$ this routine is called when the end of the input file is 5$ encounterd during a macro definition, etc. 6 7$ we print an error message and abort 8 9 call put_card; $ print current line 10 put, skip; $ skip to next line 11 call contlpr(27, yes); $ start to echo to the terminal 12 .+s10 put, '?'; $ emit s10 error message marker 13 .+s20 put, '?'; $ emit s10 error message marker 14 put, $ print message 15 column(5), 16 '*** compilation terminated by unexpected end-of-file ***', 17 skip; 18 call contlpr(27, no); $ stop to echo to the terminal 19 20 call trcbak; $ print current token stream 21 22 call prstrm(yes); 23 24 end subr eofr; 1 .=member ermet 2 subr ermet(n); 3 4$ this rouine processes syntactic errors. it prints an error message 5$ then adjusts the parser in an attempt to recover from the error. 6 7$ the routine begins with a 'case' statement which maps the 8$ error number into a message and a recovery mode. at the end of 9$ the 'case' statement we print the message and perform the 10$ appropriate recovery actions. 11 12$ there are 4 error recovery modes: 13 14$ 1. standard errors 15 16$ here we simply skip to the next semicolon and branch to 17$ the grammar's production. 18 19$ 2. errors in expressions 20 21$ these are treated like standard errors, except that we stop 22$ when we see one of ';', ')', ']', '\', 'then', or 'end'. 23 24$ 3. errors in declarations. 25 26$ these generally occor in the context 27 28$ (1) = 29$ (2) = 30 31$ when we fail to recognize a declaration we think we have 32$ found an entire , return to line (1), and look 33$ for a . if we don't find a we advance the token 34$ file to the next declaration keyword and backup the 35$ parser to search for a . 36 37$ 4. special errors 38 39$ these are errors which are handled by separate error 40$ recovery routines, so we simply return. 41 42 .=zzyorg z 43 44 defc(rec_std) 45 defc(rec_nst) 46 defc(rec_end) 47 defc(rec_exp) 48 defc(rec_dcl) 49 defc(rec_spc) 50 51 52 size n(ps); $ error number 53 54 size string(.sds. 80), $ message text 55 mode(ps), $ recovery mode 56 tok(ps); $ next token 57 58$ the array 'enders' gives all tokens which can end an expression. 59 60 +* enders_lim = 09 ** $ number of enders 61 62 size init(1); $ initilization flag 63 data init = no; 64 65 66 size enders(.sds. 5); 67 dims enders(enders_lim); 68 69 size j(ps), $ loop index 70 str(sds_sz); $ next token as string 71 72 data enders = 73 ')', '>>', '/)', ',', ';', 'then', 'elseif', 'else', 'end'; 74 75 +* okey_lim = 11 ** 76 size okeywds(.sds. 9); 77 dims okeywds(okey_lim); 78 data okeywds = 'procedure', 'proc', 'operator', 'op', 79 'case', 'else', 'elseif', 80 'end', '(', 'loop', 'if'; 81 82 83 size symsds(sds_sz), $ returns name of token 84 hashlit(ps), $ hashes literals 85 gettok(ps); $ scanner interface 86 87 access parsens, 88 opntoks, 89 backup; 90 91 92 if ^ init then $ initialize tables 93 init = yes; 94 95 do j = 1 to enders_lim; 96 enders(j) = hashlit(enders(j)); 97 end do; 98 do j = 1 to okey_lim; 99 okeywds(j) = hashlit(okeywds(j)); 100 end do; 101 end if; 102 103 go to case(n) in 0 to 98; 104 105 +* et(n, m, str) = 106 /case(n)/ 107 mode = m; 108 string = str; 109 go to esac; 110 ** 111 112 et(00, rec_std, 'valid error message') 113 et(01, rec_std, 'semicolon') 114 et(02, rec_std, 'left parenthesis') 115 et(03, rec_std, 'right parenthesis') 116 et(04, rec_std, 'left set brace') 117 et(05, rec_std, 'right set brace') 118 et(06, rec_std, 'left tuple bracket') 119 et(07, rec_std, 'right tuple bracket') 120 et(08, rec_std, 'colon') 121 et(09, rec_std, 'assignment operator') 122 et(10, rec_std, 'dash after directory name') 123 et(11, rec_std, 'directory name in directory statement') 124 et(12, rec_std, 'directory or program name in program statement') 125 et(13, rec_std, 'program name in program statement') 126 et(14, rec_std, 'library name in library statement') 127 et(15, rec_std, 'directory name in module statement') 128 et(16, rec_std, 'module name in module statement') 129 et(17, rec_std, 'main program description in directory') 130 et(18, rec_std, 'routine definition') 131 et(19, rec_std, 'namelist after access right') 132 et(20, rec_std, 'valid variable number of arguments option') 133 et(21, rec_std, 'procedure description') 134 et(22, rec_std, 'directory name in main program description') 135 et(23, rec_std, 'program name in program description') 136 et(24, rec_std, 'directory name in module description') 137 et(25, rec_std, 'module name in module description') 138 et(26, rec_nst, 'main program.') 139 et(27, rec_std, 'name in declaration') 140 et(28, rec_std, 'constant right-hand side in declaration') 141 et(29, rec_std, 'keyword -back-') 142 et(30, rec_std, 'mode descriptor') 143 et(31, rec_std, 'mode name') 144 et(32, rec_std, '-base- to follow -plex-') 145 et(33, rec_std, 'list of base names') 146 et(34, rec_std, 'mode descriptor after base type') 147 et(35, rec_std, 'valid range definition for integer mode') 148 et(36, rec_std, 'base name after -elmt-') 149 et(37, rec_std, 'mode descriptor for tuple range') 150 et(38, rec_std, 'constant length for homogeneous tuple') 151 et(39, rec_std, 'mode descriptor for set elements') 152 et(40, rec_std, 'mode descriptor for multi-valued map range') 153 et(41, rec_std, 'mode descriptor for map domain elements') 154 et(42, rec_std, 'procedure name') 155 et(43, rec_std, 'operator name') 156 et(44, rec_std, 'formal parameter name') 157 et(45, rec_std, 'statement') 158 et(46, rec_std, 'boolean expression in assertion') 159 et(47, rec_std, 'right-hand side') 160 et(48, rec_std, 'restricted expression after -from-, ...') 161 et(49, rec_std, 'expression after -case-') 162 et(50, rec_nst, 'keyword -of-') 163 et(51, rec_std, 'case tag') 164 et(52, rec_std, 'constant expression after -arb- in case tag') 165 et(53, rec_std, 'debug option') 166 et(54, rec_std, 'label name after -goto-') 167 et(55, rec_std, 'boolean expression after -if-') 168 et(56, rec_nst, 'keyword - then') 169 et(57, rec_std, 'boolean expression after -elseif-') 170 et(58, rec_nst, 'loop iterator') 171 et(59, rec_nst, 'keyword -do- or -)-') smfb 180 et(60, rec_std, 'expression or semicolon after -return-') 173 et(61, rec_std, 'trace option') 174 et(62, rec_std, 'expression after -yield-') 175 et(63, rec_std, 'keyword -end-') 176 et(64, rec_exp, 'expression after assignment operator') 177 et(65, rec_exp, 'valid expression after -from-, ...') 178 et(66, rec_exp, 'factor after compound operator') 179 et(67, rec_exp, 'term after assigning binary operator') 180 et(68, rec_exp, 'term after binary operator') 181 et(69, rec_exp, 'iterator list') 182 et(70, rec_exp, 'constraint after iterator list') 183 et(71, rec_exp, 'term after unary operator') 184 et(72, rec_exp, 'boolean expression after keyword -case-') 185 et(73, rec_exp, 'keyword -of-') 186 et(74, rec_exp, 'case tag') 187 et(75, rec_exp, 'keyword -else-') 188 et(76, rec_exp, 'expression after -else-') 189 et(77, rec_exp, 'expression after case tag') 190 et(78, rec_exp, 'boolean expression after keyword -if-') 191 et(79, rec_exp, 'keyword -then-') 192 et(80, rec_exp, 'expression after keyword -then-') 193 et(81, rec_exp, 'boolean expression after keyword -elseif-') 194 et(82, rec_exp, 'expression') 195 et(83, rec_exp, 'left-hand side or dash') 196 et(84, rec_exp, 'expression as index or routine parameter') 197 et(85, rec_std, 'iterator') 198 et(86, rec_std, 'boolean expression after keyword -while-') 199 et(87, rec_std, 'boolean expression after keyword -where-') 200 et(88, rec_std, 'boolean expression after keyword -until-') 201 et(89, rec_std, 'name') 202 et(90, rec_std, 'valid block structure') 203 et(91, rec_std, 'valid statement') 204 et(92, rec_nst, 'valid procedure definition.') 205 et(93, rec_std, 'valid data structure representation statement') 206 et(94, rec_spc, 'colon in assignment') 207 et(95, rec_exp, '-/- after compound operator') 208 et(96, rec_std, 'operator name') 209 et(97, rec_std, 'integer or real denotation') 210 et(98, rec_end, 'keyword -end-' ) 211 212 213/esac/ 214 215 call put_err(n, 0, string); $ print message 216 217 go to rcase(mode) in rec_std to rec_spc; 218 219/rcase(rec_std)/ $ standard error 220 221$ advance the scanner to the next semicolon and set the parser to 222$ . 223 224 until tok = sym_semi; 225 tok = gettok(0); 226 if (eof_flag) call eofr;; 227 end until; 228 229 parsep = parseerrloc; 230 231 return; 232 233/rcase(rec_nst)/ 234 235 $ scan to next semicolon, not past -okey- statement. 236 until tok = sym_semi; 237 tok = gettok(0); if (eof_flag) call eofr; 238 239 do j = 1 to okey_lim; 240 if tok = okeywds(j) then 241 call back(1,0); quit until; 242 end if; 243 end do; 244 end until; 245 246 $ parser next looks for a statement block 247 parse_ok = yes; parsep = parsep + 1; return; 248 249/rcase(rec_end)/ 250 251 $ error recovery for missing end statement. 252 $ since the current token might begin a new procedure, must 253 $ not scan past. 254 call contlpr(27, yes); $ start to echo to the terminal 255 call put_card; 256 put,column(14), 'closing statment '; 257 do j = 1 to save_lim; 258 put: symsds(opntoks(opntoksp-save_lim + j)), a, x(1); 259 end do; 260 put, skip; 261 call contlpr(27, no); $ stop to echo to the terminal 262 263 until tok = sym_semi; 264 tok = gettok(0); if (eof_flag) call eofr; 265 do j = 1 to 4; 266 if tok = okeywds(j) then 267 call back(1, 0); quit until; 268 end if; 269 end do; 270 end until; 271 272 call poptoks; $ since we are closing an opener 273 parsep = parseerrloc; 274 return; 275 276/rcase(rec_exp)/ $ error in expression 277 278 while 1; 279 tok = gettok(0); 280 if (eof_flag) call eofr; 281 282 do j = 1 to enders_lim; 283 if (tok = enders(j)) quit while; 284 end do; 285 end while; 286 287 tok_out = tok_out-1; $ return last token 288 if (tok_out = 0) tok_out = tok_lim; 289 290 parsep = parseerrloc; 291 292 return; 293 294/rcase(rec_dcl)/ $ error in declaration 295 296 while 1; $ advance token file 297 tok = gettok(0); 298 299 if (lex_typ(tok) = l_dkey) quit; 300 if (eof_flag) call eofr; 301 end while; 302 303 tok_out = tok_out-1; $ return last token 304 if (tok_out = 0) tok_out = tok_lim; 305 306$ back up parser two operations. 307 308 j = 0; 309 310 until j = 2; 311 parsep = parsep-1; 312 if (pt_op(parsep) = po_sub) j = j+1; 313 end until; 314 315 putp(p_error, pol_mark); 316 317 return; 318 319/rcase(rec_spc)/ $ simply return 320 321 322 323 macdrop(et) 324 325 end subr ermet; 1 .=member puterr 2 subr put_err(n, nam, str); 3 4$ this routine prints error messages. error messages have the 5$ form: 6 7$ *** error n: expect nam str *** 8 9$ nam is either zero or a symbol table pointer, while str is a 10$ string. 11 12$ we print a maximum of 1 error message per line of input. 13 14 size n(ps), $ error number 15 nam(ps), $ symbol table pointer 16 str(sds_sz); $ error message text 17 18 size j(ps), $ loop index 19 p(ps); $ pointer into token buffer 20 21 size cur_tok(ps), $ current token 22 tok_sds(sds_sz), $ current token as sds 23 tok_len(ps), $ length of cur_tok 24 line_pos(ps), $ position in current line 25 mrk_line(sds_sz), $ underline string 26 mrk_flag(1); $ on if tok_out <= p <= tok_in 27 28 size err_line(ps); $ line number of last error 29 data err_line = 0; 30 31 size symsds(sds_sz); $ returns symbol as sds 32 33 access backup, 34 nstrulex; 35 36 37 if (err_line = line_no) return; 38 err_line = line_no; 39 40 call contlpr(27, yes); $ start to echo to terminal 41 42 call put_card; $ print last line 43 44 .+s10 put, '?'; $ emit standard s10 error character 45 .+s20 put, '?'; $ emit standard s10 error character 46 47 put, column(5), '******** error ': n, i, ': expect '; 48 if (nam ^= 0) put: symsds(nam), a, x(1); 49 put: str, a; 50 if (m_flag) put, column(19+cpc), '*****'; 51 put, skip; 52 53 call trcbak; $ print current token stream 54 55 error_count = error_count + 1; 56 57 if error_count > pel then 58 put, skip, '**** parse error limit exceeded ***', skip; 59 call prstrm(yes); 60 end if; 61 62 call contlpr(27, no); $ stop to echo to terminal 63 64 65 end subr put_err; 1 .=member trcbak 2 subr trcbak; 3$ 4$ this routine prints the input token trace back. 5$ 6$ we print the token stream from -tok_out- minus -etok_lim- through 7$ -tok_in-, if there are -etok_lim- tokens before -tok_out- in the 8$ buffer. we underline each token from -tom_out- through -tok_in, 9$ using an equal sign for reserved, and a minus sign for unre- 10$ served tokens. 11$ 12 size cur_tok(ps); $ current token 13 size tok_sds(sds_sz); $ current token as sds 14 size tok_len(ps); $ length of cur_tok 15 size line_pos(ps); $ position in current line 16 size mrk_line(sds_sz); $ underline string 17 size mrk_flag(1); $ on if tok_out <= p <= tok_in 18 19 size j(ps); $ loop index 20 size p(ps); $ pointer into token buffer 21 22 size symsds(sds_sz); $ returns symbol as sds 23 24 access backup; 25 access nstrulex; 26 27 28 if (tok_out = 0) return; $ no tokens in buffer 29 30 call contlpr(27, yes); $ start to echo to terminal 31 32 p = tok_out; 33 do j = 1 to etok_lim + 1; 34 p = p - 1; 35 if (p = 0) p = tok_lim; 36 37 if p = tok_in then $ less than -etok_lim- tokens in buffer 38 p = p + 1; if (p > tok_lim) p = 1; 39 end if; 40 end do; 41 42 put, column(14), 'parsing: '; line_pos = 23; 43 mrk_line = '' .pad. 9; $ nb. .len. tok_line = 9 44 mrk_flag = no; 45 46 until p = tok_in; 47 p = p + 1; 48 if (p > tok_lim) p = 1; 49 50 cur_tok = tok_buff(p); 51 tok_sds = symsds(cur_tok); 52 tok_len = .len. tok_sds; 53 54 if (mrk_flag = yes) & (tok_len = 0) then 55 tok_sds = '?????'; tok_len = 5; 56 end if; 57 58 if line_pos + tok_len + 1 > 132 then 59 put, skip, column(14): mrk_line, a, skip, column(14); 60 line_pos = 14; 61 62 mrk_line = ''; 63 end if; 64 65 put, ' ': tok_sds, a; line_pos = line_pos + 1 + tok_len; 66 67 if mrk_flag then $ underline blank 68 mrk_line = mrk_line .cc. '-'; 69 else 70 mrk_line = mrk_line .cc. ' '; 71 end if; 72 73 if (p = tok_out) mrk_flag = yes; 74 75 do j = 1 to tok_len; $ underline token 76 if mrk_flag then 77 if lex_typ(cur_tok) > l_string ! 78 lit_val(cur_tok) ^= 0 then 79 mrk_line = mrk_line .cc. '='; $ reserved token 80 else 81 mrk_line = mrk_line .cc. '-'; $ unreserved token 82 end if; 83 else 84 mrk_line = mrk_line .cc. ' '; 85 end if mrk_flag; 86 end do j; 87 end until; 88 89 put, skip, column(14): mrk_line, a, skip; 90 91 call contlpr(27, no); $ stop to echo to terminal 92 93 94 end subr trcbak; 1 .=member chkprs 2 subr chkprs; 3 4$ this routine is called when we have parsed the sentence symbol. 5 6$ we check whether we are at the end of the input file. if so, we 7$ finished parsing, and merely return. otherwise, some error must 8$ have caused us to assume an incorrect block structure; we recover 9$ by restarting the parser with the production 10 11 12 size tok(ps), $ next token 13 j(ps); $ loop index 14 15 size first(1); $ on if aleady initialized 16 data first = yes; 17 18 size endnames(.sds. 10); $ 'end' tokens as strings 19 dims endnames(17); 20 21 data endnames = 22 'procedure', 'proc', 'operator', 'op', 23 'end', '(', ')', 'do', ',', 'else', 'elseif', 24 'doing', 'while', 'where', 'step', 'until', 'term'; 25 26 size endtoks(ps); 27 dims endtoks(17); 28 29 size hashlit(ps), $ hashes a literal 30 gettok(ps); $ returns the next token 31 32 access parsens, backup; 33 34 35 if first then $ initialize endtoks 36 first = no; 37 38 do j = 1 to 17; 39 endtoks(j) = hashlit(endnames(j)); 40 end do; 41 end if; 42 43 tok = gettok(0); $ check for end-of-file 44 if (eof_flag) return; 45 46 do j = 1 to 17; 47 if tok = endtoks(j) then $ found unexpected end, ... 48 call ermet(91); 49 tp_flag = yes; $ terminate after parse 50 $ skip to next semicolon 51 tok = gettok(0); $ check for end-of-file 52 if (eof_flag) return; 53 54 quit do j; 55 end if tok; 56 end do j; 57 58 tok_out = tok_out - 1; $ return token 59 if (tok_out = 0) tok_out = tok_lim; 60 61 parsep = parseerrmploc; 62 parse_ok = yes; 63 64 65 end subr chkprs; 1 .=member chkst 2 subr chkst; 3 4$ this routine is called when the parser looks for a 5$ and fails to find one. there are two possibilities: 6 7$ 1. the next token is 'end', or some other token which can 8$ normally follow a series of statements. 9 10$ 2. otherwise we assume that the next token begins an invalid 11$ statement, in which case we issue an error message. 12 13$ a token can follow a series of statements if either: 14 15$ 1. it is a bold name. 16$ 2. it is in the set 'endtoks', below. 17 18 size tok(ps), $ next token 19 j(ps); $ loop index 20 21 size init(1); $ on if aleady initialized 22 data init = no; 23 24 size endnames(.sds. 10); $ 'end' tokens as strings 25 dims endnames(17); 26 27 data endnames = 28 '(', 29 'procedure', 'proc', 'operator', 'op', 30 'end', ')', 'do', ',', 'else', 'elseif', 31 'doing', 'while', 'where', 'step', 'until', 'term'; 32 33 size endtoks(ps); 34 dims endtoks(16); 35 36 size hashlit(ps), $ hashes a literal 37 gettok(ps); $ gets a token 38 39 access parsens, backup; 40 41 if init = no then $ initialize endtoks 42 init = yes; 43 44 do j = 1 to 17; 45 endtoks(j) = hashlit(endnames(j)); 46 end do; 47 end if; 48 49 50$ see if next token can follow a series of statements. 51 tok = gettok(0); 52 53 do j = 1 to 17; 54 if (tok = endtoks(j)) go to pass; 55 end do; 56 57/fail/ $ tok can't follow a block 58 59 call ermet(91); 60 parse_ok = yes; 61 62 return; 63 64/pass/ $ valid token 65 66 tok_out = tok_out - 1; $ return tok 67 if (tok_out = 0) tok_out = tok_lim; 68 69 return; 70 71 end subr chkst; 1 .=member errend 2 subr errend; 3 4$ this routine is called generally when an extra end 5$ is found in the program. 6 7 size savepp(ps); $ save parsep location 8 access parsens; 9 10 11 savepp = parsep; 12 call ermet(92); 13 parsep = savepp; 14 15 16 end subr errend; 1 .=member scanend 2 subr scanend; 3 4$ this routine simply scans ahead to the next semicolon after 5$ an end statement is found. it is called for error recovery. 6 7 size tok(ps); $ token 8 size gettok(ps); $ returns the next token 9 10 11 until tok = sym_semi; 12 tok = gettok(0); if (eof_flag) call eofr; 13 end until; 14 15 call back(1, 0); 16 17 18 end subr scanend; 1 .=member chkbody 1 .=member chkrepr 2 subr chkrepr; 3 4$ this routine is similar to chkbody, but is called after we 5$ fail to find a . 6 7$ 1. the next token is 'end' 8$ 2. the next token is the start of an invalid repr. 9 10 11 size tok(ps); $ next token 12 13 size gettok(ps); $ returns next token 14 15 access parsens, backup; 16 17 18 tok = gettok(0); 19 20 if tok = sym_end then 21 tok_out = tok_out - 1; 22 if (tok_out = 0) tok_out = tok_lim; 23 else 24 call ermet(93); 25 parse_ok = yes; 26 end if; 27 28 29 end subr chkrepr; 1 .=member badasn 2 subr badasn; 3 4$ this routine is called after seeing a statement of the form 5$ "x = ..." where the colon has been left out of the assignment 6$ symbol. 7 8$ we issue an error message then set up the token buffer so that the 9$ next token seen will be ':='. 10 11 12 access parsens, backup; 13 14 call ermet(75); 15 16 tok_out = tok_out - 1; 17 if (tok_out = 0) tok_out = tok_lim; 18 19 tok_buff(tok_out) = sym_asn; 20 21 22 end subr badasn; 1 .=member sdump 2 subr sdump; 3 .-qp. 4 5$ this routine dumps the symbol table. the dump is formatted 6$ in columns; every 'lines_lim' lines we print a series of 7$ column headings. 8 9$ we print the lexical types of tokens using the 'synlexmap' 10$ created by 'syn'. 11 12 13 +* lines_max = 20 ** 14 15 size j(ps), $ loop indx 16 str(sds_sz), $ token name as sds 17 tp(.sds. 20), $ lexical type as ads 18 lines(ps); $ number of lines since last dump 19 20 size init(1); $ flags first dump 21 data init = yes; 22 23 size ltyps(.sds. 20); $ array of lexical types 24 dims ltyps(l_max); 25 26 +* synlexmap(a, b) = data ltyps(b) = a; ** 27 28 .=include 'synlex' $ include map from lexical codes to names 29 30 size symsds(sds_sz); $ returns name of symbol 31 32 33$ if this is the first call, we must go over ltyps, trimming 34$ each name to five characters. 35 36 if init then 37 init = no; 38 39 do j = l_min to l_max; 40 if (.len. ltyps(j) > 5) ltyps(j) = .s. 1, 5, ltyps(j); 41 end do; 42 end if; 43 44 put, skip(2), column(7), 's y m t a b d u m p', skip(2); 45 lines = lines_max; $ to force new header 46 47 do j = 1 to symtabp; 48 lines = lines + 1; 49 50 if lines > lines_max then $ print header 51 lines = 1; 52 53 put, skip(2), column(7), 54 'index name lxtyp kyval ltval', 55 ' link morg mcode margs', 56 skip, column(7), 57 '--------------------------------------', 58 '--------------------------', 59 skip; 60 end if; 61 62 str = symsds(j); $ get token name 63 if (.len. str > 10) .len. str = 10; 64 65 if lex_typ(j) = 0 then 66 tp = '0'; 67 else 68 tp = ltyps(lex_typ(j)); 69 end if; 70 71 put, column(07): j, i, 72 column(14): str, a, 73 column(26): tp, a, 74 column(33): key_val(j), i, 75 column(40): lit_val(j), i, 76 column(47): link(j), i, 77 column(54): morg(j), i, 78 column(61): mcode(j), i, 79 column(68): margs(j), i, 80 skip; 81 82 end do; 83 84 put, skip(2), column(7), 'end symtab dump', skip(2); 85 86 87 ..qp 88 end subr sdump; 1 .=member pdump 2 subr pdump; 3 .-qp. 4 5$ this routine dumps the polish string. we print nodes one after 6$ another in standard columns. the handling of tabs is driven 7$ by an array 'postab' which contains the various tabs. 8 9 10 +* pos_lim = 4 ** $ maximum nodes/line 11 12 size postab(ps); 13 dims postab(pos_lim); 14 15 data postab = 7, 22, 37, 52; 16 17 size pos(ps), $ index into postab 18 j(ps), $ loop index 19 typ(ps), $ type of current node 20 val(ps), $ value of current node 21 str(sds_sz); $ token in sds form 22 23$ the names of the polish string markers are obtained from the 24$ output of 'syn'. they are kept in a nameset so that they can 25$ be accessed by both polish string dump routines. 26 27 nameset mark_names; 28 size mark_name(.sds. 20); 29 dims mark_name(parseimpmax); 30 31 +* synmarkmap(a, b) = data mark_name(a) = b; ** 32 33 .=include synmark 34 35 end nameset; 36 37 size namesds(sds_sz); $ gives name of token 38 39 40 put, skip(2), column(7), 'p o l i s h s t r i n g d u m p', 41 skip(2); 42 43 pos = 0; $ column position 44 45 do j = 1 to polp; 46 pos = pos + 1; 47 48 if pos > pos_lim then $ end of line 49 put, skip; 50 pos = 1; 51 end if; 52 53 put, column(postab(pos)); $ position in proper column 54 55 typ = pol_typ(j); 56 val = pol_val(j); 57 58 go to case(typ) in pol_min to pol_max; 59 60 /case(pol_name)/ 61 62 str = namesds(val); $ get name as sds 63 if (.len. str > 10) .len. str = 10; 64 65 put: str, a; 66 cont; 67 68 /case(pol_count)/ 69 70 put: val, i; 71 cont; 72 73 /case(pol_mark)/ 74 75 put: mark_name(val), a; 76 cont do j; 77 78 /case(pol_end)/ 79 80 put, skip(2), column(7), 'end of polish file', skip; 81 cont do j; 82 83 end do; 84 85 put, skip(2), column(7), 'end polish dump', skip; 86 87 88 macdrop(pos_lim) 89 90 91 ..qp 92 end subr pdump; 1 .=member xpdump 2 subr xpdump; 3 .-qp. 4 5$ this routine is similar to pdump, but dumps the auxiliary string. 6 7 8 +* pos_lim = 4 ** $ maximum nodes/line 9 10 size postab(ps); 11 dims postab(pos_lim); 12 13 data postab = 7, 22, 37, 52; 14 15 size pos(ps), $ index into postab 16 j(ps), $ loop index 17 typ(ps), $ type of current node 18 val(ps), $ value of current node 19 str(sds_sz); $ token in sds form 20 21 access mark_names; 22 23 size namesds(sds_sz); $ gives name of token 24 25 26 put, skip(2), column(7), 'x p o l s t r i n g d u m p', 27 skip(2); 28 29 pos = 0; $ column position 30 31 do j = 1 to xpolp; 32 pos = pos + 1; 33 34 if pos > pos_lim then $ end of line 35 put, skip; 36 pos = 1; 37 end if; 38 39 put, column(postab(pos)); $ position in proper column 40 41 typ = xpol_typ(j); 42 val = xpol_val(j); 43 44 go to case(typ) in pol_min to pol_max; 45 46 /case(pol_name)/ 47 48 str = namesds(val); $ get name as sds 49 if (.len. str > 10) .len. str = 10; 50 51 put: str, a; 52 cont; 53 54 /case(pol_count)/ 55 56 put: val, i; 57 cont; 58 59 /case(pol_mark)/ 60 61 put: mark_name(val), a; 62 cont do j; 63 64 /case(pol_end)/ 65 66 put, skip(2), column(7), 'end of xpolish file', skip; 67 cont do j; 68 69 end do; 70 71 put, skip(2), column(7), 'end xpolish dump', skip; 72 73 74 macdrop(pos_lim) 75 76 77 ..qp 78 end subr xpdump; 1 .=member odump 2 subr odump; 3 .-qp. 4 5$ this routine dumps opntoks. we simply print a series of names. 6 7 8 size j(ps); $ loop index 9 10 size symsds(sds_sz); $ returns token name 11 12 access opntoks; 13 14 15 put, skip(2), column(7), 'o p n t o k s d u m p', skip(2); 16 17 do j = 1 to opntoksp; 18 put, column(7): symsds(opntoks(j)), a, skip; 19 end do; 20 21 22 ..qp 23 end subr odump; 1 .=member overfl 2 subr overfl(msg); 3 4$ this routine is called when a compiler array overflows. we print 5$ a message and abort. 6 7 8 size msg(.sds. 100); 9 10 11 put ,skip; $ emit blank line 12 13 call contlpr(27, yes); $ start to echo to terminal 14 put, '*** compiler table overflow - ': msg, a, ' ***', skip; 15 call contlpr(27, no); $ stop to echo to terminal 16 17 put, skip; 18 19 call prstrm(yes); 20 21 22 end subr overfl; 1 .=member usratp 2 subr usratp; 3 4$ this routine is called if the system detects a fatal error. 5 6 put ,skip(2) ,'*** fatal error detected by system ***' ,skip(2); 7 8 if (^ et_flag) return; 9 10 call sdump; 11 call pdump; 12 call xpdump; 13 call odump; 14 15 16 end subr usratp; 1 .=member prstrm 2 subr prstrm(abnormal); 3 4$ this is the main termination routine. 'abnormal' is true for 5$ abnormal termination. 6 7 8 size abnormal(1); 9 10 size tok(ps); $ next token on input file 11 12$ the following nameset is used to pass the number of lines in the 13$ source to the overlay executive so that it can compute statistics. 14 15 nameset lexcard; 16 size line_tot(ps); 17 end nameset; 18 19 sunb 39 if lcs_flag then $ print statistics sunb 40 put ,skip; $ emit blank line 21 sunb 41 if error_count = 0 then sunb 42 put, 'no errors were detected.', skip; sunb 43 else sunb 44 put, 'number of errors detected = ': error_count, i, skip; sunb 45 end if; sunb 46 end if; 27 28 if abnormal then $ abort sunb 47 if (lcs_flag) put ,skip ,'abnormal termination.' ,skip; 30 31 call ltlfin(1, 0); 32 33 elseif tp_flag then $ terminate sunb 48 if (lcs_flag) put ,skip ,'compilation terminated.' ,skip; 35 36 call ltlfin(0, 0); 37 38 else sunb 49 if (lcs_flag) put ,skip ,'normal termination.' ,skip; 40 41 $ write end-of-file markers 42 putp(0, pol_end); putxp(0, pol_end); 43 .-qp call puttbs; $ write remainder of tables 44 smfa 95 .+sq1. 46 if (slen ssm_title) then 47 putbhdr(bt_tuple, 1) 48 putbhdr(bt_tuple, 1) 49 putbhdr(bt_map, 1) 50 file ssm_file access = release; 51 end if; 52 ..sq1 53 54 file pol_file access = release; $ close all scratch files 55 file xpol_file access = release; 56 file syn_file access = release; 57 58 line_tot = line_no; $ pass to overlay exec 59 60 .-qp call ltlterm(1, 0); 61 .+qp call ltlfin (0, 0); 62 end if; 63 64 65 end subr prstrm;