Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

Discuss target code generation. Explain register allocation and the issues in the design of a code generator.

Target Code Generation

The code generator is the final phase of a compiler. It takes the optimized intermediate representation (IR) of the source program (e.g. three-address code) together with information in the symbol table and produces equivalent target code (assembly or machine code). The generated code must be correct and of high quality (fast and compact), and the generator itself should run efficiently.

Input: IR (three-address code, DAGs, etc.)\text{IR (three-address code, DAGs, etc.)} + symbol table. Output: relocatable machine code / assembly.

Register Allocation and Assignment

Because register operations are far faster than memory operations, keeping frequently used values in registers improves performance. This is handled in two sub-problems:

  • Register allocation — decide which variables/values will reside in registers at each point in the program.
  • Register assignment — decide the specific physical register each selected value will occupy.

Finding an optimal assignment is NP-complete, so heuristics are used. A common technique is graph coloring: build an interference graph whose nodes are values whose live ranges overlap; two nodes are connected if they are simultaneously live and therefore cannot share a register. Coloring the graph with kk colors (where kk = number of available registers) gives a valid assignment; if it cannot be colored, some values are spilled to memory. Usage counts ((uses+defs)×cloop depth\sum (\text{uses} + \text{defs}) \times c^{\text{loop depth}}) help decide which values are most beneficial to keep in registers.

Issues in the Design of a Code Generator

  1. Input to the code generator — form of IR (postfix, three-address code, syntax trees, DAGs) and the assumption that the front end has done type checking.
  2. Target program / target language — absolute machine code, relocatable machine code, or assembly language; affects how addresses and linking are handled.
  3. Instruction selection — choosing machine instructions that implement each IR operation; must consider instruction speed, cost, and addressing modes (e.g. using INC a instead of ADD a, #1).
  4. Register allocation and assignment — choosing which values stay in registers (discussed above).
  5. Evaluation order — the order in which computations are performed affects the number of registers needed and the efficiency of the code.
  6. Memory management — mapping names in the symbol table to run-time addresses and handling activation records.
  7. Approach / target machine details — the instruction set, number of registers, and cost model strongly influence the design.

A good code generator balances correctness, code quality, and its own efficiency, given that optimal code generation is theoretically intractable.

code-generation
2long10 marks

Explain semantic analysis and type checking. Discuss type expressions, type systems and type conversion.

Semantic Analysis

Semantic analysis is the compiler phase that checks the source program for semantic (meaning-level) consistency that cannot be captured by context-free grammar alone. It uses the syntax tree and the symbol table to verify that the program is meaningful — e.g. variables are declared before use, the number/types of arguments match, operators are applied to compatible operands, and break is used only inside a loop. It also gathers type information for the code-generation phase. It is commonly implemented using a syntax-directed definition with semantic rules attached to grammar productions.

Type Checking

Type checking is the major task of semantic analysis. The type checker verifies that each operation in the program receives operands of permissible types according to the language's type rules; otherwise it reports a type error. It can be static (done at compile time) or dynamic (done at run time). Example rule:

e1:inte2:inte1+e2:int\frac{e_1 : \text{int} \quad e_2 : \text{int}}{e_1 + e_2 : \text{int}}

Type Expressions

A type expression denotes the type of a language construct, built from basic types and type constructors:

  • Basic types: int, char, boolean, real, plus type_error and void.
  • Type constructors applied to type expressions:
    • Array: array(I, T) — array with index set II and element type TT.
    • Product: T1×T2T_1 \times T_2.
    • Record: a product with field names.
    • Pointer: pointer(T).
    • Function: DRD \rightarrow R (domain type to range type).

For example, int x[10] has type array(0..9, int), and int f(char a, char b) has type char×charintchar \times char \rightarrow int.

Type Systems

A type system is the collection of rules that assigns type expressions to the parts of a program. It determines well-typed programs and is the basis of the type checker.

  • Statically typed: types are checked at compile time (C, Java, Pascal) → catches errors early, faster code.
  • Dynamically typed: types are checked at run time (Python, Lisp) → more flexible but slower and errors appear later.
  • A type system is sound if it never accepts a program that would commit a type error at run time. A strongly typed language guarantees no type errors escape detection.

Type Conversion (Coercion vs Casting)

When an operator is applied to operands of different types, a conversion is needed:

  • Implicit conversion (coercion) — performed automatically by the compiler, e.g. in x = i + 3.14 the integer i is converted to real (widening) before the addition.
  • Explicit conversion (casting) — written by the programmer, e.g. (int) y in C.
  • Widening (e.g. int → real) is usually safe; narrowing (e.g. real → int) may lose information and often needs an explicit cast.

The type checker inserts conversion operations and records the result type so that subsequent phases generate correct code.

semantic-analysistype-checking
3long10 marks

Explain the different phases of a compiler with a suitable diagram, showing the output of each phase for the statement position = initial + rate * 60.

Phases of a Compiler

A compiler is organized into a sequence of phases, each transforming the program from one representation to another. The first four constitute the analysis (front end) and the last two the synthesis (back end). The symbol table and error handler interact with all phases.

Source program
      |
[1] Lexical Analyzer  ---> tokens
      |
[2] Syntax Analyzer   ---> parse/syntax tree     <----> Symbol Table
      |                                            <----> Error Handler
[3] Semantic Analyzer ---> annotated syntax tree
      |
[4] Intermediate Code Generator ---> three-address code
      |
[5] Code Optimizer    ---> optimized IR
      |
[6] Target Code Generator ---> target machine code
  1. Lexical Analysis (Scanner) — reads source characters and groups them into tokens, stripping whitespace and comments; enters identifiers into the symbol table.
  2. Syntax Analysis (Parser) — groups tokens into grammatical phrases, producing a parse tree / syntax tree, and reports syntax errors.
  3. Semantic Analysis — checks meaning (type checking, declaration before use) and annotates the tree; inserts type-conversion operations.
  4. Intermediate Code Generation — produces a machine-independent IR, typically three-address code.
  5. Code Optimization — improves the IR to make it faster/smaller without changing its meaning.
  6. Target Code Generation — produces the final relocatable machine/assembly code, performing register allocation.

Output of Each Phase for position = initial + rate * 60

Lexical Analyzer (tokens):

id1 = id2 + id3 * 60

where id1=position, id2=initial, id3=rate (entered in the symbol table) and 60 is a number constant.

Syntax Analyzer (syntax tree):

        =
       / \
     id1   +
          /  \
        id2    *
              /  \
            id3   60

Semantic Analyzer60 (integer) is coerced to real to match rate:

        =
       / \
     id1   +
          /  \
        id2    *
              /  \
            id3   inttofloat
                     |
                     60

Intermediate Code Generator (three-address code):

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

Code Optimizer:

t1 = id3 * 60.0
id1 = id2 + t1

Target Code Generator (assembly, register R1/R2):

LDF  R2, id3
MULF R2, R2, #60.0
LDF  R1, id2
ADDF R1, R1, R2
STF  id1, R1
phasescompiler
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

Differentiate between a compiler and an interpreter.

Compiler vs Interpreter

CompilerInterpreter
Translates the whole source program into machine code at once, before execution.Translates and executes the program statement by statement.
Produces a separate object/executable file.Produces no intermediate object file.
Execution is faster (translation done once).Execution is slower (re-translates each time/line).
Reports all errors after scanning the whole program.Stops at the first error encountered.
Needs more memory (stores object code).Needs less memory.
Debugging is comparatively harder.Debugging is easier (line by line).
Examples: C, C++, Pascal compilers.Examples: Python, BASIC, JavaScript interpreters.

In short: a compiler translates the entire program once and then runs the fast object code, whereas an interpreter translates and runs one statement at a time.

compiler
5short5 marks

Eliminate left recursion and perform left factoring on the grammar A -> Aab | Ac | bd | f.

Given Grammar

AAabAcbdfA \rightarrow Aab \mid Ac \mid bd \mid f

Step 1: Eliminate Left Recursion

The rule has the form AAα1Aα2β1β2A \rightarrow A\alpha_1 \mid A\alpha_2 \mid \beta_1 \mid \beta_2 with

  • α1=ab\alpha_1 = ab, α2=c\alpha_2 = c (left-recursive parts)
  • β1=bd\beta_1 = bd, β2=f\beta_2 = f (non-left-recursive parts)

Using AβAA \rightarrow \beta A', AαAεA' \rightarrow \alpha A' \mid \varepsilon:

AbdAfAA \rightarrow bd\,A' \mid f\,A' AabAcAεA' \rightarrow ab\,A' \mid c\,A' \mid \varepsilon

Step 2: Left Factoring

Look for common prefixes among alternatives.

  • In AbdAfAA \rightarrow bd\,A' \mid f\,A' — the first symbols are b and f, no common prefix, so no factoring needed.
  • In AabAcAεA' \rightarrow ab\,A' \mid c\,A' \mid \varepsilon — first symbols a, c, ε\varepsilon, no common prefix, so no factoring needed.

Final Grammar (left-recursion-free, left-factored)

AbdAfAA \rightarrow bd\,A' \mid f\,A' AabAcAεA' \rightarrow ab\,A' \mid c\,A' \mid \varepsilon

This grammar is now suitable for top-down (LL/predictive) parsing.

grammarparsing
6short5 marks

What is a symbol table? Explain its role in a compiler.

Symbol Table

A symbol table is a data structure maintained by the compiler that stores information about each identifier (names of variables, functions, objects, classes, etc.) appearing in the source program. For each name it keeps attributes such as:

  • name (lexeme)
  • type and data type
  • scope and visibility
  • memory location / offset / address
  • for functions: number and types of parameters, return type.

It is usually implemented as a hash table (or linked list / tree) for fast insertion and lookup.

Role in a Compiler

The symbol table is created in early phases and used by almost every phase:

  1. Lexical analyzer — inserts identifiers into the table as it recognizes them.
  2. Syntax analyzer — adds structural/scope information.
  3. Semantic analyzer — checks declarations, scope rules, and uses stored types for type checking.
  4. Intermediate / target code generation — retrieves addresses, sizes, and types to allocate storage and generate correct code.
  5. Error handling — detects undeclared variables, multiple declarations, and type mismatches.

Thus the symbol table is the central repository that links the phases together and enables efficient storage and retrieval of identifier information throughout compilation.

symbol-table
7short5 marks

Differentiate between a parse tree and a syntax tree.

Parse Tree vs Syntax Tree

Parse Tree (Concrete Syntax Tree)Syntax Tree (Abstract Syntax Tree)
Shows the complete derivation according to the grammar.Shows the abstract structure of the program.
Contains all grammar symbols, including non-terminals and intermediate productions.Contains only the essential elements; intermediate non-terminals are removed.
Operators and keywords appear as leaf nodes.Operators appear as internal (root) nodes, operands as children.
Contains redundant information (e.g. single-production chains).Compact / condensed; redundancy removed.
Produced directly during parsing.Derived from the parse tree; used by later phases.

Example for a + b * c:

Parse tree (using E -> E + T | T, T -> T * F | F, F -> id) expands every non-terminal E, T, F. The syntax tree is simply:

      +
     / \
    a   *
       / \
      b   c

The syntax tree is more convenient for semantic analysis and code generation.

parsing
8short5 marks

What is an ambiguous grammar? Show with an example.

Ambiguous Grammar

A grammar is ambiguous if there exists at least one string in its language that has more than one distinct parse tree (equivalently, more than one leftmost or rightmost derivation). Ambiguity is undesirable because the meaning/precedence of a construct becomes unclear and parsers cannot decide which tree to build.

Example

Consider the grammar:

EE+EEEidE \rightarrow E + E \mid E * E \mid id

For the string id + id * id, two different parse trees are possible:

Tree 1 (interpreting as id + (id * id)):

      +
     / \
   id    *
        / \
      id   id

Tree 2 (interpreting as (id + id) * id):

        *
       / \
      +   id
     / \
   id   id

Since the same string id + id * id has two distinct parse trees, the grammar is ambiguous. It can be made unambiguous by introducing precedence and associativity, e.g.

EE+TT,TTFF,FidE \rightarrow E + T \mid T, \quad T \rightarrow T * F \mid F, \quad F \rightarrow id
grammar
9short5 marks

Explain the role of the lexical analyzer in a compiler.

Role of the Lexical Analyzer

The lexical analyzer (scanner) is the first phase of a compiler. It reads the source program as a stream of characters and groups them into meaningful units called tokens (e.g. keywords, identifiers, constants, operators, punctuation). Its main roles are:

  1. Tokenization — scan the input and produce a sequence of <token-name, attribute-value> pairs that are passed to the parser (usually one token per request, on demand).
  2. Remove non-essential characters — strip out whitespace, tabs, newlines, and comments.
  3. Symbol table interaction — enter identifiers and constants into the symbol table and store/return pointers to their entries.
  4. Error reporting — detect and report lexical errors (e.g. illegal characters, malformed tokens) along with the line number.
  5. Bookkeeping — keep track of the line number (for error messages) and may expand macros / handle preprocessor directives.

It is typically specified using regular expressions and implemented with finite automata (DFA). By separating scanning from parsing, the compiler design becomes simpler, more efficient, and more portable.

Source chars --> [ Lexical Analyzer ] --token--> [ Parser ]
                        ^  |
                        |  v
                    Symbol Table
lexical-analysis
10short5 marks

Differentiate between LL(1) and LR parsers.

LL(1) vs LR Parsers

LL(1) ParserLR Parser
Top-down parsing.Bottom-up parsing.
Builds the parse tree from the root down to the leaves.Builds the parse tree from the leaves up to the root.
Produces a leftmost derivation.Produces a rightmost derivation in reverse.
First L = scan left-to-right, second L = leftmost derivation, 1 = one token lookahead.First L = scan left-to-right, R = rightmost derivation, with kk lookahead.
Cannot handle left-recursive or non-left-factored grammars (must be removed first).Can handle left-recursive grammars directly.
Less powerful — accepts a smaller class of grammars.More powerful — accepts a larger class of grammars (every LL(1) grammar is LR(1)).
Uses a predictive parsing table with a stack; simpler to construct (often by hand).Uses action and goto tables (SLR, LALR, CLR); harder to build, usually via tools like YACC.
Easier error detection/recovery.Detects errors as early as a left-to-right scan permits, but tables are complex.

Summary: LL(1) is top-down, simpler, and weaker; LR is bottom-up, more powerful, and handles a wider class of grammars but needs larger parsing tables.

parsing
11short5 marks

What are regular expressions? Write a regular expression for identifiers.

Regular Expressions

A regular expression (RE) is a notation used to specify patterns of strings, i.e. it describes a regular language (a set of strings over an alphabet). Regular expressions are used in the lexical-analysis phase to define the structure of tokens such as identifiers, numbers, and keywords.

Basic rules (over alphabet Σ\Sigma), with the operators in order of decreasing precedence: closure *, concatenation, alternation |:

  • ε\varepsilon is an RE denoting {ε}\{\varepsilon\} (empty string).
  • For each symbol aΣa \in \Sigma, aa is an RE denoting {a}\{a\}.
  • If rr and ss are REs: rsr \mid s (union), rsrs (concatenation), rr^* (zero or more), and (r)(r) are REs.

Extensions: r+r^+ = one or more, r?r? = optional.

Regular Expression for Identifiers

An identifier begins with a letter or underscore, followed by any number of letters, digits, or underscores. Define:

letter[A-Za-z]  (i.e. ABz)letter \rightarrow [A\text{-}Za\text{-}z]\ \ \text{(i.e. } A\mid B\mid\dots\mid z) digit[0-9]  (i.e. 019)digit \rightarrow [0\text{-}9]\ \ \text{(i.e. } 0\mid 1\mid\dots\mid 9)

Then the regular expression for an identifier is:

identifier(letter_)(letterdigit_)identifier \rightarrow (letter \mid \_)\,(letter \mid digit \mid \_)^*

For example, this matches count, _temp, x1, total_sum but not 9abc or a-b.

regular-expressionlexical-analysis
12short5 marks

Explain the handle and handle pruning in shift-reduce parsing.

Handle and Handle Pruning in Shift-Reduce Parsing

Shift-reduce parsing is a bottom-up parsing technique that builds the parse tree from the leaves to the root by repeatedly shifting input symbols onto a stack and reducing a substring on top of the stack to a non-terminal.

Handle

A handle of a right-sentential form is a substring that matches the right-hand side of a production, and whose reduction to the left-hand-side non-terminal represents one step of a rightmost derivation in reverse. Formally, if SrmαAwrmαβwS \Rightarrow_{rm}^* \alpha A w \Rightarrow_{rm} \alpha \beta w, then the production AβA \rightarrow \beta at the position following α\alpha is a handle of αβw\alpha \beta w. Choosing the correct handle is the central decision of a shift-reduce parser.

Handle Pruning

Handle pruning is the process of obtaining the rightmost derivation in reverse by repeatedly finding a handle and reducing (replacing) it with the non-terminal on the left of the matching production, until the start symbol is reached. The sequence of reductions traces the parse bottom-up.

Example

Grammar: EE+EEEidE \rightarrow E + E \mid E * E \mid id. Parse id + id * id:

Right-sentential formHandleReducing production
id + id * idfirst idEidE \rightarrow id
E + id * ididEidE \rightarrow id
E + E * ididEidE \rightarrow id
E + E * EE * EEEEE \rightarrow E * E
E + EE + EEE+EE \rightarrow E + E
E(start symbol reached)

Each highlighted substring is a handle; replacing it is one step of handle pruning, ultimately reducing the input to the start symbol EE.

parsing

Frequently asked questions

Where can I find the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) question paper 2081?
The full BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2081 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Compiler Design and Construction (BSc CSIT, CSC365) 2081 paper come with solutions?
Yes. Every question on this Compiler Design and Construction (BSc CSIT, CSC365) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2081 paper?
The BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2081 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Compiler Design and Construction (BSc CSIT, CSC365) past paper free?
Yes — reading and attempting this Compiler Design and Construction (BSc CSIT, CSC365) past paper on Kekkei is completely free.