Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

Explain the general structure of an LR parser. Construct the LR(0) item sets for the grammar S -> AA, A -> aA | b.

General Structure of an LR Parser

An LR parser is a bottom-up, shift-reduce parser that scans input Left-to-right and produces a Rightmost derivation in reverse. It consists of:

  1. Input buffer — holds the remaining input string followed by the end marker $.
  2. Stack — stores a sequence of states (and grammar symbols). The top state summarises all the information needed to make parsing decisions.
  3. Parsing program (driver) — a fixed routine, the same for every LR grammar.
  4. Parsing table — divided into two parts:
    • ACTION[state, terminal]shift, reduce, accept, or error.
    • GOTO[state, nonterminal] → next state after a reduction.

Working: Using the state on top of the stack and the current input symbol, the driver consults ACTION. On shift it pushes the symbol/state; on reduce A → β it pops 2·|β| entries, then pushes GOTO[top, A]; on accept parsing succeeds.

LR(0) Item Sets for the Grammar

Augmented grammar:

SS,SAA,AaAbS' \to S,\quad S \to AA,\quad A \to aA \mid b

Using CLOSURE and GOTO:

  • I0 = CLOSURE({S'→·S}): S'→·S, S→·AA, A→·aA, A→·b
  • I1 = GOTO(I0,S): S'→S·
  • I2 = GOTO(I0,A): S→A·A, A→·aA, A→·b
  • I3 = GOTO(I0,a): A→a·A, A→·aA, A→·b
  • I4 = GOTO(I0,b): A→b·
  • I5 = GOTO(I2,A): S→AA·
  • I6 = GOTO(I3,A): A→aA·
  • GOTO(I2,a)=I3, GOTO(I2,b)=I4, GOTO(I3,a)=I3, GOTO(I3,b)=I4

There are 7 LR(0) states (I0–I6). The DFA of items recognises viable prefixes; states I1, I4, I5, I6 contain reducing/accepting items.

lr-parsing
2long10 marks

What is syntax-directed translation? Explain syntax-directed definitions with an example of a translation scheme.

Syntax-Directed Translation (SDT)

Syntax-directed translation is a method of translating a source program in which semantic rules (actions) are attached to the productions of a context-free grammar. As the parser recognises grammar productions, the associated semantic rules are evaluated to compute attribute values, build the syntax tree, generate intermediate code, etc. Translation is thus driven by the syntax of the language.

Syntax-Directed Definitions (SDD)

An SDD is a CFG together with attributes associated with grammar symbols and semantic rules associated with productions. Attributes are of two kinds:

  • Synthesized attribute — computed from the attributes of the children (bottom-up).
  • Inherited attribute — computed from the parent and/or sibling attributes (top-down).

An SDD specifies what to compute without specifying when/order; the dependency graph determines evaluation order.

Translation Scheme (Example)

A translation scheme is an SDD in which semantic actions are enclosed in { } and embedded at specific positions within the right side of productions, indicating the order of evaluation.

Example — evaluating an arithmetic expression (synthesized attribute val):

L → E         { print(E.val) }
E → E1 + T    { E.val = E1.val + T.val }
E → T         { E.val = T.val }
T → T1 * F    { T.val = T1.val * F.val }
T → F         { T.val = F.val }
F → ( E )     { F.val = E.val }
F → digit     { F.val = digit.lexval }

For input 3 + 4 * 5, the actions compute T.val = 20, then E.val = 3 + 20 = 23, and L prints 23. The actions fire as each production is reduced, demonstrating bottom-up syntax-directed translation.

syntax-directed-translation
3long10 marks

Discuss the specification and recognition of tokens. Explain the role of finite automata and input buffering in lexical analysis.

Specification of Tokens

Tokens are specified using regular expressions (and regular definitions). A token is a category (e.g. id, num, relop), a pattern is the rule describing its form, and a lexeme is the actual string matched. Example regular definitions:

digit  → [0-9]
letter → [A-Za-z]
id     → letter (letter | digit)*
num    → digit+ ( . digit+ )? ( E [+-]? digit+ )?

Recognition of Tokens

Regular expressions are converted to a transition diagram / finite automaton that the scanner simulates to recognise lexemes. On reaching an accepting state the longest matching lexeme is reported as a token; ties are broken by the longest-match and rule-priority (first listed) conventions.

Role of Finite Automata

The construction pipeline is:

RE    NFA (Thompson)    DFA (subset construction)    minimised DFA\text{RE} \;\to\; \text{NFA (Thompson)} \;\to\; \text{DFA (subset construction)} \;\to\; \text{minimised DFA}
  • An NFA allows ε-moves and is easy to build from an RE but is non-deterministic.
  • A DFA has exactly one move per (state, input) and is what the lexer actually executes — fast, O(n) in input length.

Tools like LEX/Flex automatically build a DFA from the supplied regular expressions.

Role of Input Buffering

Reading the source one character at a time from disk is slow, and a lexer often needs look-ahead (e.g. to tell = from ==). To make this efficient, the lexer uses a two-buffer scheme with two halves of size N. Two pointers are maintained:

  • lexemeBegin — start of the current lexeme.
  • forward — scans ahead to find the end of the lexeme.

When forward crosses the end of one half, the other half is reloaded. The sentinel (eof) technique places an eof marker at the end of each half so that a single test detects both end-of-buffer and end-of-input, reducing comparisons per character.

lexical-analysis
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

What is three-address code? Write the three-address code for x = (a + b) * (c + d).

Three-address code (TAC) is an intermediate representation in which each instruction has at most three addresses (operands) — typically of the form x = y op z, using at most one operator on the right-hand side. Complex expressions are broken into a sequence of such instructions using temporary variables, which makes it easy to rearrange, optimise, and generate target code.

Three-address code for x = (a + b) * (c + d):

t1 = a + b
t2 = c + d
t3 = t1 * t2
x  = t3

Here t1, t2, t3 are compiler-generated temporaries.

intermediate-code
5short5 marks

Explain peephole optimization with an example.

Peephole Optimization

Peephole optimization is a simple, local code-improvement technique that examines a short sliding window (the peephole) of a few consecutive target/intermediate instructions and replaces them with a shorter or faster equivalent sequence. It is applied repeatedly until no further improvement is possible.

Common transformations:

  • Redundant load/store elimination — remove LD R, a immediately after ST a, R.
  • Constant folding — replace x = 2 * 3 with x = 6.
  • Algebraic simplification — remove x = x + 0 or x = x * 1.
  • Strength reduction — replace x = y * 2 with x = y + y or a shift.
  • Dead-code / unreachable-code elimination.
  • Flow-of-control optimization — remove jumps to jumps.

Example (redundant store/load):

MOV R0, a      MOV R0, a
MOV a,  R0  →  ... (second MOV removed)

The instruction MOV a, R0 is unnecessary because a already holds the value just loaded into R0, so it is deleted, producing faster code.

code-optimization
6short5 marks

Differentiate between synthesized and inherited attributes.

Synthesized vs Inherited Attributes

AspectSynthesized AttributeInherited Attribute
DefinitionValue computed from attributes of the children of the nodeValue computed from attributes of the parent and/or siblings
Direction of flowBottom-up (leaves → root)Top-down / sideways (root, siblings → node)
Production formFor A → X Y Z, A.s depends on X,Y,ZFor A → X Y Z, X.i (or Y.i, Z.i) depends on A and other RHS symbols
Parsing fitNaturally evaluated during bottom-up (LR) parsingNaturally suited to top-down parsing / passing context down
Grammar classAn SDD using only synthesized attributes is S-attributedA grammar with restricted inherited attributes is L-attributed
ExampleE → E1 + T { E.val = E1.val + T.val }D → T L { L.type = T.type } (type passed down to the list)

In short, synthesized attributes carry information up the parse tree, whereas inherited attributes carry context down (or across) the tree.

syntax-directed-translation
7short5 marks

What is bootstrapping in compiler construction?

Bootstrapping

Bootstrapping is the technique of writing a compiler (or part of it) for a language in the same language it is meant to compile, and then using a smaller existing compiler to build a full self-hosting compiler in stages.

The idea is described with T-diagrams showing source language S, target/implementation language, and the implementing language. A typical bootstrapping process:

  1. Write a small/simple compiler for a subset of language L in an existing language (e.g. assembly or C). Call it C0C_0.
  2. Rewrite the compiler for L in L itself, and compile this source using C0C_0 to obtain a working compiler C1C_1.
  3. Use C1C_1 to compile its own source again, optionally adding optimisations, producing an improved, fully self-hosting compiler C2C_2.

Advantages: the compiler becomes self-hosting, it serves as a large test program for the language, and improvements to the compiler automatically benefit later versions.

Example: an early C compiler was first written in assembly, then rewritten in C and compiled by the earlier version.

bootstrapping
8short5 marks

Construct an NFA for the regular expression (a|b)*abb.

NFA for (a|b)*abb

Using Thompson's construction, an NFA (with ε-moves) for the regular expression (a|b)*abb accepts any string of a's and b's that ends in the suffix abb.

A compact equivalent NFA (after collapsing the (a|b)* loop into a single self-looping start state) has states 0–3:

Stateon aon b
→0 (start){0, 1}{0}
1{2}
2{3}
*3 (accept)

Description of the transition diagram:

  • State 0 has a self-loop on both a and b (this realises (a|b)*).
  • On a, state 0 can also move to state 1 (begin matching abb).
  • State 1 --b--> State 2, State 2 --b--> State 3.
  • State 3 is the single accepting state.

The nondeterminism at state 0 (staying in the loop vs. starting abb) is exactly what an NFA permits. For example, the string aabb is accepted: 0 --a--> 0 --a--> 1 --b--> 2 --b--> 3.

finite-automatalexical-analysis
9short5 marks

What is operator precedence parsing?

Operator Precedence Parsing

Operator-precedence parsing is a bottom-up (shift-reduce) parsing technique applicable to operator grammars — grammars in which no production has two adjacent nonterminals on the right side and no ε-production. It is mainly used for parsing arithmetic and boolean expressions.

The parser defines three precedence relations between pairs of terminals:

  • aba \lessdot b : a yields precedence to b (b has higher precedence / starts a handle)
  • aba \doteq b : a and b have equal precedence (same handle)
  • aba \gtrdot b : a takes precedence over b (end of a handle — reduce)

These relations are stored in an operator-precedence table. During parsing, the relation between the top-of-stack terminal and the current input terminal decides whether to shift (\lessdot or \doteq) or reduce (\gtrdot) the handle delimited by ⋖ … ⋗.

Advantages: simple and easy to construct by hand; Disadvantages: works only for a small class of grammars, cannot handle the unary minus / token ambiguity well, and may accept some erroneous inputs.

parsing
10short5 marks

Explain activation records and their role in run-time storage management.

Activation Records and Run-Time Storage Management

An activation record (or stack frame) is a contiguous block of storage that holds all the information needed for a single execution (activation) of a procedure. Each time a procedure is called, a new activation record is pushed onto the run-time control stack, and it is popped when the procedure returns. This naturally supports nested and recursive calls.

Typical fields of an activation record (from top to bottom):

  • Return value — space for the value returned to the caller.
  • Actual parameters — arguments supplied by the caller.
  • Control link (dynamic link) — pointer to the caller's activation record.
  • Access link (static link) — pointer used to reach non-local data in enclosing scopes.
  • Saved machine status — return address and saved registers.
  • Local data — local variables of the procedure.
  • Temporaries — values generated during expression evaluation.

Role in storage management: activation records implement stack-based allocation. The stack pointer / frame pointer is adjusted on call and return, giving automatic, efficient allocation and deallocation of locals, parameters, and bookkeeping data, and enabling recursion since each call gets its own private frame.

runtime-environment
11short5 marks

What is a basic block and a flow graph?

Basic Block and Flow Graph

Basic block: A basic block is a maximal sequence of consecutive intermediate-code (three-address) instructions such that:

  • Control enters only at the first instruction (the leader), and
  • Control leaves only at the last instruction,

with no jumps into or out of the middle. Thus the whole block executes as a straight-line unit.

Leaders are identified as: (1) the first instruction; (2) any target of a (conditional/unconditional) jump; (3) any instruction immediately following a jump. A basic block runs from a leader up to (but not including) the next leader.

Flow graph: A flow graph (control-flow graph) is a directed graph whose nodes are basic blocks and whose edges represent the flow of control. There is an edge from block B1B_1 to block B2B_2 if control can pass from the last statement of B1B_1 to the first statement of B2B_2 (i.e. B2B_2 is the jump target, or B2B_2 immediately follows B1B_1 and B1B_1 does not end in an unconditional jump). An entry and sometimes an exit node are added.

Flow graphs are the basis for control-flow and data-flow analysis used in code optimization (e.g. loop detection, dead-code elimination).

code-optimization
12short5 marks

Differentiate between top-down and bottom-up parsing.

Top-Down vs Bottom-Up Parsing

AspectTop-Down ParsingBottom-Up Parsing
DirectionBuilds the parse tree from the root (start symbol) down to the leavesBuilds the parse tree from the leaves (input) up to the root
DerivationProduces a leftmost derivationProduces a rightmost derivation in reverse
Basic operationExpand a nonterminal by predicting/matching productionsShift symbols and reduce handles
Also calledPredictive / recursive-descent parsingShift-reduce parsing
Grammar restrictionsCannot handle left recursion; needs left-factoringHandles left recursion; more powerful
ExamplesRecursive-descent, LL(1)LR(0), SLR, LALR, CLR, operator-precedence
PowerAccepts a smaller class of grammarsAccepts a larger class of grammars

In short, top-down parsing starts from the start symbol and tries to derive the input (predictive), whereas bottom-up parsing starts from the input tokens and reduces them back to the start symbol (shift-reduce).

parsing

Frequently asked questions

Where can I find the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) question paper 2075?
The full BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2075 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Compiler Design and Construction (BSc CSIT, CSC365) 2075 paper come with solutions?
Yes. Every question on this Compiler Design and Construction (BSc CSIT, CSC365) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2075 paper?
The BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2075 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Compiler Design and Construction (BSc CSIT, CSC365) past paper free?
Yes — reading and attempting this Compiler Design and Construction (BSc CSIT, CSC365) past paper on Kekkei is completely free.