Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

What is a directed acyclic graph (DAG)? Represent the expression a + a * (b - c) + (b - c) * d using three-address code, quadruples and triples.

Directed Acyclic Graph (DAG)

A Directed Acyclic Graph (DAG) is a directed graph that contains no cycles. For expressions it is similar to a syntax tree but common sub-expressions are shared: identical sub-trees are represented by a single node having more than one parent. DAGs help the compiler identify common sub-expressions so that redundant computation is eliminated during optimization of basic blocks.

DAG for a + a * (b - c) + (b - c) * d

The sub-expression (b - c) occurs twice, so it is computed only once and shared.

Nodes (with the operands/labels attached):

  • t1 = b - c (shared by two parents)
  • t2 = a * t1
  • t3 = a + t2
  • t4 = t1 * d
  • t5 = t3 + t4 (root)

Description of the DAG: a single - node with leaves b and c has two outgoing parents — the * node (a * (b-c)) and the * node ((b-c) * d). The leaf a is also shared by the + and * nodes.

Three-Address Code

t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4

Quadruples

Format: (op, arg1, arg2, result)

#oparg1arg2result
(0)-bct1
(1)*at1t2
(2)+at2t3
(3)*t1dt4
(4)+t3t4t5

Triples

Results are referred to by the instruction index (i) instead of temporary names.

#oparg1arg2
(0)-bc
(1)*a(0)
(2)+a(1)
(3)*(0)d
(4)+(2)(3)

Note how triple (0) (the value of b - c) is referenced by both (1) and (3), reflecting the shared node of the DAG.

dagintermediate-code
2long10 marks

Explain the general structure of an LR parser. Construct the LR(0) item sets for the grammar S -> AA, A -> aA | b.

General Structure of an LR Parser

An LR parser is a bottom-up, shift-reduce parser that scans the input Left-to-right and produces a Rightmost derivation in reverse. It is table-driven and consists of:

  1. Input buffer – holds the string to be parsed, terminated by $.
  2. Stack – holds a sequence of states (and grammar symbols); the top state is used to index the parsing tables.
  3. Driver (parsing) program – the same for all LR parsers; it consults the tables and performs actions.
  4. Parsing table – two parts:
    • ACTION[state, terminal] → shift, reduce, accept, or error.
    • GOTO[state, nonterminal] → next state after a reduction.

Working: The driver looks at the top-of-stack state s and the current input symbol a.

  • ACTION[s,a] = shift s' → push a and state s'.
  • ACTION[s,a] = reduce A→β → pop 2*|β| symbols, then push A and GOTO[top, A].
  • ACTION[s,a] = accept → parsing succeeds.
  • otherwise → syntax error.

LR(0) Item Sets for S → AA, A → aA | b

Augment the grammar with S' → S. An LR(0) item is a production with a dot .

I0 = CLOSURE({S' → •S})

S' → •S
S  → •AA
A  → •aA
A  → •b

Transitions from I0:

  • GOTO(I0, S) = I1: S' → S•
  • GOTO(I0, A) = I2: S → A•A, A → •aA, A → •b
  • GOTO(I0, a) = I3: A → a•A, A → •aA, A → •b
  • GOTO(I0, b) = I4: A → b•

I2 transitions:

  • GOTO(I2, A) = I5: S → AA•
  • GOTO(I2, a) = I3
  • GOTO(I2, b) = I4

I3 transitions:

  • GOTO(I3, A) = I6: A → aA•
  • GOTO(I3, a) = I3
  • GOTO(I3, b) = I4

Summary of states:

StateItems
I0S'→•S, S→•AA, A→•aA, A→•b
I1S'→S•
I2S→A•A, A→•aA, A→•b
I3A→a•A, A→•aA, A→•b
I4A→b•
I5S→AA•
I6A→aA•

These 7 item sets form the canonical collection of LR(0) items (the DFA of viable prefixes) for the grammar.

lr-parsing
3long10 marks

What is syntax-directed translation? Explain syntax-directed definitions with an example of a translation scheme.

Syntax-Directed Translation (SDT)

Syntax-Directed Translation is a method of compiler implementation in which the source-language translation is driven by the grammar: each grammar production is associated with a set of semantic rules or semantic actions that compute attribute values or generate output (intermediate code, type information, etc.). It connects the syntax (parse tree) to its meaning.

Syntax-Directed Definitions (SDD)

A Syntax-Directed Definition is a context-free grammar in which each grammar symbol has a set of attributes, and each production has semantic rules to compute the values of those attributes. Attributes are of two kinds:

  • Synthesized attribute – computed from the attributes of the children (and the node itself). Used in S-attributed definitions and evaluated bottom-up.
  • Inherited attribute – computed from the parent and/or sibling nodes; flows top-down/sideways.

An SDD only specifies what to compute, not when; it has no side effects and order is implied by attribute dependencies.

Example SDD (evaluating an arithmetic expression)

ProductionSemantic Rule
L → E nprint(E.val)
E → E1 + TE.val = E1.val + T.val
E → TE.val = T.val
T → T1 * FT.val = T1.val * F.val
T → FT.val = F.val
F → ( E )F.val = E.val
F → digitF.val = digit.lexval

Here val is a synthesized attribute; this is an S-attributed definition.

Translation Scheme

A translation scheme is an SDD in which semantic actions are embedded within the right-hand side of productions using braces { ... }, indicating exactly when the action is executed during parsing (left-to-right). It can have side effects.

Example: binary-to-output translation scheme

E → E1 + T   { E.val = E1.val + T.val }
E → T        { E.val = T.val }
T → T1 * F   { T.val = T1.val * F.val }
T → F        { T.val = F.val }
F → ( E )    { F.val = E.val }
F → digit    { F.val = digit.lexval }

For input 3 * 4 + 5, the actions evaluate the synthesized val bottom-up to produce 17. Such schemes are the basis of action code in tools like yacc/bison.

syntax-directed-translation
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

Differentiate between a compiler and an interpreter.

Compiler vs Interpreter

CompilerInterpreter
Translates the entire source program into machine/target code at once before execution.Translates and executes the program line by line (statement by statement).
Generates an independent object/executable file.Does not produce a separate object file.
Execution is faster (translation done beforehand).Execution is slower (translation happens at run time).
Reports all errors after scanning the whole program.Reports the first error and stops; debugging is interactive.
Needs more memory (intermediate object code).Needs less memory.
Examples: C, C++ compilers.Examples: Python, BASIC, classic LISP.

In short, a compiler trades a slower, one-time translation for fast repeated execution, whereas an interpreter offers easier debugging at the cost of slower runtime.

compiler
5short5 marks

Eliminate left recursion and perform left factoring on the grammar A -> Aab | Ac | bd | f.

Given Grammar

A → Aab | Ac | bd | f

Step 1: Eliminate Left Recursion

The grammar has immediate left recursion of the form A → Aα | β, where:

  • α-alternatives (left-recursive): ab, c
  • β-alternatives (non-left-recursive): bd, f

Using the rule A → βA' and A' → αA' | ε:

A  → bd A' | f A'
A' → ab A' | c A' | ε

Step 2: Left Factoring

In the rule for A, the two alternatives bd A' and f A' begin with different symbols (b vs f), so no common prefix — no factoring needed there.

In the rule for A', the alternatives ab A', c A', ε also have no common prefix (start with a, c, or ε), so no factoring is needed.

Final Grammar

A  → bd A' | f A'
A' → ab A' | c A' | ε

This grammar is free of left recursion and is already left-factored, making it suitable for predictive (LL(1)) top-down parsing.

grammarparsing
6short5 marks

What is a symbol table? Explain its role in a compiler.

Symbol Table

A symbol table is a data structure maintained by the compiler that stores information about each identifier (variables, constants, functions, procedures, objects, labels, etc.) appearing in the source program. For every entry it records attributes such as the name, type, scope, memory location/offset, size, parameter list, and whether it is initialized.

It is typically implemented using hash tables, linked lists, or trees, and supports two main operations: insert (add a new identifier) and lookup (search for an existing one).

Role in a Compiler

The symbol table is used by almost every phase of the compiler:

  • Lexical analysis – inserts identifiers (or their names) when tokens are recognized.
  • Syntax analysis – associates names with their declarations.
  • Semantic analysis – performs type checking and scope/declaration checking (e.g., undeclared or doubly-declared variables) using stored types.
  • Intermediate & code generation – provides addresses, offsets and sizes for generating target code and allocating storage.
  • Optimization – supplies information about variables used.
  • Error detection – flags undeclared identifiers, type mismatches and scope violations.

Thus the symbol table acts as the central repository of identifier information, enabling consistent and correct translation across all compilation phases.

symbol-table
7short5 marks

Differentiate between a parse tree and a syntax tree.

Parse Tree vs Syntax Tree

Parse Tree (Concrete Syntax Tree): A tree that depicts the complete derivation of a string from the start symbol of a grammar. Every grammar symbol — including all non-terminals, terminals and intermediate productions (like E → T → F) — appears as a node. Leaves are terminals; interior nodes are non-terminals.

Syntax Tree (Abstract Syntax Tree, AST): A condensed/abstract form of the parse tree that retains only the essential structure. Operators become internal nodes and operands become leaves; redundant non-terminals, single-child chains, and grammar-only symbols (parentheses, intermediate non-terminals) are removed.

Parse TreeSyntax Tree
Contains every non-terminal and terminal of the grammar.Contains only operators (interior) and operands (leaves).
Shows the full derivation / grammar structure.Shows the abstract structure / meaning of the construct.
Larger, includes redundant nodes.Compact, redundant nodes removed.
Used during/just after parsing.Used for semantic analysis and intermediate code generation.

Example: a + b * c

  • Parse tree includes nodes like E → E + T, T → T * F, F → id, etc.
  • Syntax tree is simply:
      +
     / \
    a   *
       / \
      b   c
parsing
8short5 marks

What is an ambiguous grammar? Show with an example.

Ambiguous Grammar

A context-free grammar is said to be ambiguous if there exists at least one string in its language that can be derived by more than one distinct parse tree (equivalently, more than one leftmost derivation or more than one rightmost derivation). Ambiguity is undesirable because it makes the meaning of a program unclear and complicates parser construction.

Example

Consider the grammar:

E → E + E | E * E | id

The string id + id * id has two different parse trees:

Derivation 1 (treats + first):

E → E + E → id + E → id + (E * E) → id + id * id

Structure: id + (id * id)

Derivation 2 (treats * first):

E → E * E → (E + E) * E → id + id * id

Structure: (id + id) * id

Since the same string has two leftmost derivations / two parse trees with different meanings, the grammar is ambiguous.

Removal: Ambiguity can usually be removed by rewriting the grammar to enforce precedence and associativity, e.g.:

E → E + T | T
T → T * F | F
F → id
grammar
9short5 marks

Explain the role of the lexical analyzer in a compiler.

Role of the Lexical Analyzer

The lexical analyzer (also called scanner) is the first phase of a compiler. Its primary job is to read the stream of characters of the source program and group them into meaningful sequences called lexemes, producing a stream of tokens for the syntax analyzer (parser).

Main Functions

  1. Tokenization: Identifies tokens such as keywords, identifiers, constants, operators and punctuation, each represented as <token-name, attribute-value> (e.g., <id, pointer-to-symbol-table>).
  2. Removing whitespace and comments: Strips blanks, tabs, newlines and comments that have no role in parsing.
  3. Interaction with symbol table: Inserts identifiers/constants into the symbol table and returns a pointer as the token attribute.
  4. Correlating error messages: Keeps track of line numbers / positions so that errors can be reported accurately.
  5. Macro expansion / preprocessing (in some compilers).
  6. Error reporting: Detects lexical errors such as illegal characters or malformed tokens.

Working

The parser calls the lexical analyzer with getNextToken(); the analyzer scans characters, matches them against patterns (specified by regular expressions and implemented as a DFA/finite automaton), and returns the next token. This separation simplifies the parser and improves compiler efficiency and portability.

lexical-analysis
10short5 marks

Differentiate between LL(1) and LR parsers.

LL(1) vs LR Parsers

LL(1) ParserLR Parser
Top-down parsing.Bottom-up parsing.
First L = scan left-to-right, second L = Leftmost derivation.L = scan left-to-right, R = Rightmost derivation (in reverse).
Builds the parse tree from the root down to leaves.Builds the parse tree from the leaves up to the root.
Uses 1 lookahead symbol; predictive (table-driven).Uses shift-reduce actions with ACTION/GOTO tables.
Cannot handle left recursion; needs left-factoring; grammar must be LL(1).Handles a larger class of grammars (left recursion is fine).
Simpler to build by hand; smaller tables.More powerful but tables are larger and harder to build manually.
Variants: recursive-descent, predictive parser.Variants: SLR, CLR (LR(1)), LALR.

Summary: Every LL(1) grammar is LR(1), but not vice-versa. LR parsers are strictly more powerful and recognize most programming-language constructs, while LL(1) parsers are simpler and easier to implement.

parsing
11short5 marks

What are regular expressions? Write a regular expression for identifiers.

Regular Expressions

A regular expression (RE) is a notation used to describe regular languages — i.e., the set of strings (patterns) that form the lexemes/tokens recognized by a lexical analyzer. Over an alphabet Σ, regular expressions are built from the following rules:

  1. ε is a RE denoting the language {ε}.
  2. Each symbol a ∈ Σ is a RE denoting {a}.
  3. If r and s are REs, then:
    • r | s (union/alternation) denotes L(r) ∪ L(s),
    • rs (concatenation) denotes L(r)L(s),
    • r* (Kleene closure) denotes zero or more repetitions of L(r).
  4. ( r ) denotes L(r) (grouping).

Operator precedence (highest to lowest): *, concatenation, |. Regular expressions are equivalent in power to finite automata and are the basis of token specification in tools like lex/flex.

Regular Expression for Identifiers

An identifier is a letter (or underscore) followed by any number of letters, digits, or underscores:

letter[A-Za-z_]\textbf{letter} \rightarrow [A\text{-}Za\text{-}z\_] digit[0-9]\textbf{digit} \rightarrow [0\text{-}9] idletter(letterdigit)\textbf{id} \rightarrow \textbf{letter}\,(\textbf{letter} \mid \textbf{digit})^{*}

Equivalently, in expanded form:

id = (letter | _) (letter | digit | _)*

This matches valid identifiers such as count, _temp1, myVar2 but rejects strings starting with a digit like 2var.

regular-expressionlexical-analysis
12short5 marks

Explain the handle and handle pruning in shift-reduce parsing.

Handle and Handle Pruning in Shift-Reduce Parsing

Handle

During bottom-up (shift-reduce) parsing, a handle is a substring of the current right-sentential form that matches the right-hand side of a production, and whose reduction to the left-hand side represents one step in the reverse of a rightmost derivation.

Formally, if S ⇒*rm αAw ⇒rm αβw, then A → β (at the position right after α) is a handle of the right-sentential form αβw. The string w to the right of the handle contains only terminals.

Handle Pruning

Handle pruning is the process of obtaining a rightmost derivation in reverse by repeatedly:

  1. Locating the handle β in the current right-sentential form, and
  2. Reducing it by replacing β with the left-hand side A of the matching production A → β.

Repeating this until the form reduces to the start symbol gives the bottom-up parse.

Example

Grammar:

E → E + E | E * E | id

Parse id + id * id by handle pruning (reductions shown bottom-up):

Right-sentential formHandleProduction
id + id * idid (1st)E → id
E + id * idid (2nd)E → id
E + E * idid (3rd)E → id
E + E * EE * EE → E * E
E + EE + EE → E + E
E(start symbol reached)

This reverse-rightmost sequence of handle reductions is exactly what a shift-reduce parser performs using its stack (shift symbols until a handle appears on top, then reduce).

parsing

Frequently asked questions

Where can I find the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) question paper 2079?
The full BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2079 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Compiler Design and Construction (BSc CSIT, CSC365) 2079 paper come with solutions?
Yes. Every question on this Compiler Design and Construction (BSc CSIT, CSC365) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2079 paper?
The BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2079 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Compiler Design and Construction (BSc CSIT, CSC365) past paper free?
Yes — reading and attempting this Compiler Design and Construction (BSc CSIT, CSC365) past paper on Kekkei is completely free.