Introduction to YACC
YACC (Yet Another Compiler-Compiler) is a powerful tool used in the field of computer science to generate parsers for compilers. It was designed to produce a parser from a given grammar specification, which is a set of rules defining the syntax of a particular language. The parser generated by YACC is an LALR (Look-Ahead, Left-to-Right, Rightmost derivation with 1 lookahead token) parser, operating from left to right and trying to derive the rightmost element of a sentence in a sentence structure according to the grammar.
YACC works in three main parts: declarations, translation rules, and supporting C routines. Each part plays a crucial role in the process of generating a parser for a given language.
Declarations
The declarations section of a YACC specification includes information about the tokens used in the syntax definition. Tokens are the smallest units of meaningful data in a programming language, and they could be keywords, identifiers, operators, literals, etc. YACC automatically assigns numbers for tokens, but this can be overridden by specifying a number after the token name. For example, %token NUMBER 621
. YACC also recognizes single characters as tokens, so the assigned token numbers should not overlap with ASCII codes.
Translation Rules
The translation rules section contains grammar definitions in a modified BNF (Backus-Naur Form) form. These rules define how the parser should interpret sequences of tokens. Each rule in YACC has a string specification that resembles a production of a grammar. It has a nonterminal on the left-hand side (LHS) and a few alternatives on the right-hand side (RHS). YACC generates an LALR(1) parser for the language from the productions, which is a bottom-up parser.
Supporting C Routines
The supporting C routines section includes C code external to the definition of the parser and variable declarations. It can also include the specification of the starting symbol in the grammar: %start nonterminal
. If the yylex()
function is not defined in the auxiliary routines sections, then it should be included with #include "lex.yy.c"
. If the YACC file contains the main()
definition, it must be compiled to be executable.
Building a C Compiler using Lex and Yacc
Building a C compiler involves several steps, including writing the lexical analyzer (Lex), writing the syntax analyzer (Yacc), and integrating the two. The Lex program reads the source code and breaks it down into tokens, while the Yacc program takes these tokens and checks if they conform to the grammar of the language.
To build a C compiler, you start by writing the Lex file, which defines the tokens for the C language. After that, you write the Yacc file, which defines the grammar of the C language and specifies how the tokens should be parsed. Once you have both Lex and Yacc files ready, you can integrate them by adding the #include "lex.yy.c"
statement in the Yacc file. Then, you can compile the Yacc file using the yacc -v -d parser1.y
command and link it with the Lex library using the gcc -ll y.tab.c
command.
YACC Exercises
- Implement both versions of simple and advanced calculator compilers with YACC or GNU BISON.
- Add the exponent operator, ∧, to your calculator.
- Add error recovery methods discussed in the previous section to your calculator compiler.
- Discuss other bottom-up parser generators (e.g., GNU BISON) within your groups.