parsing - What kind of lexer/parser was used in the very first C compiler? -
in 1970s, dennis ritchie wrote first c compiler.
in year 2017, wanted write c compiler. books deep c secrets (peter van der linden) c was, above else, designed easy compile. i've been having inordinate amount of trouble it.
for starters, it's relatively difficult come lex/yacc specifications c language, , these tools didn't exist yet when ritchie made his compiler!
plus, there great many examples of surprisingly small c compilers not use lex & yacc. (check out this tiny obfuscated c compiler fabrice bellard. note "production" tinycc source quite bit longer, in effort accommodate more architectures, , more readable)
so what missing here? kind of lexer/parser did ritchie use in compiler? there easier way of writing compilers haven't stumbled onto?
yacc's name abbreviation "yet compiler compiler", suggests neither first nor second such tool.
indeed, wikipedia article on history of compiler construction notes that
in 1960s, robert mcclure @ texas instruments invented compiler-compiler called tmg, name taken "transmogrification". in following years tmg ported several univac , ibm mainframe computers.
…
not long after ken thompson wrote first version of unix pdp-7 in 1969, doug mcilroy created new system's first higher-level language: implementation of mcclure's tmg. tmg compiler definition tool used ken thompson write compiler b language on pdp-7 in 1970. b immediate ancestor of c.
that's not quite answer question, provides possibilities.
original answer:
i wouldn't @ surprised if ritchie banged hand-built top-down or operator precedence parser. techniques well-known, , original c language presented few challenges. parser generating tools existed.
postscript:
a comment on op alexey frunze points early version of c compiler. it's recursive-descent top-down parser, point expressions need parsed @ point uses shunting-yard-like operator precedence grammar. (see function tree
in first source file expression parser.) style of starting top-down algorithm , switching bottom-up algorithm (such operator-precedence) when needed called "left corner" (lc) parsing.
so that's architecture said wouldn't surprise me, , didn't :).
it's worth noting compiler unearthed alexey (and @torek in comment post) not handle close consider c language these days. in particular, handles small subset of declaration syntax (no structs or unions, example), complicated part of k&r c grammar. not answer question how produce "simple" parser c.
c (mostly) parseable lalr(1) grammar, although need implement version of "lexer hack" in order correctly parse cast expressions. input parser (translation phase 7) stream of tokens produced preprocessing code (translation phase 4, incorporating phases 5 , 6), may draw upon (f)lex tokenizer (phase 3) input have been sanitized in fashion according phases 1 , 2. (see § 5.1.1.2 precise definition of phases).
sadly, (f)lex not designed part of pipeline; want handle task of reading source. however, flex can convinced let provide chunks of input redefining yy_input macro. handling trigraphs (if chose that) , line continuations can done using simple state machine; it's convenient these transformations shrink input, simplifying handling of maximum input length parameter yy_input
. (don't provide input 1 character @ time suggested example in flex manual.)
since preprocessor must produce stream of tokens (at point, whitespace no longer important), convenient use bison's push-parser interface. (indeed, more convenient use push api.) if take suggestion, end phase 4 top-level driver of parse.
you hand-build preprocessor-directive parser, getting #if
expressions , pragma
s right suggests use of separate bison parser preprocessing.
if want learn how build compiler, might want start simpler language such tiger, language used running example in andrew appel's excellent textbooks on compiler construction.
Comments
Post a Comment