Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

Discuss the specification and recognition of tokens. Explain the role of finite automata and input buffering in lexical analysis.

Specification and Recognition of Tokens

Lexical analysis is the first phase of a compiler. It reads the source program character-by-character and groups characters into meaningful units called tokens.

Specification of Tokens

Tokens are specified using regular expressions (REs), which describe the lexical patterns of a language.

  • Alphabet (Σ\Sigma): finite set of symbols.
  • String / Lexeme: an actual sequence of characters matching a pattern.
  • Pattern: rule describing the set of lexemes a token can represent.
  • Token: a category (identifier, number, keyword, operator, etc.).

Example regular definitions:

letterABz,digit019letter \rightarrow A|B|\dots|z, \quad digit \rightarrow 0|1|\dots|9 idletter(letterdigit)id \rightarrow letter\,(letter\,|\,digit)^{*} numdigit+(.digit+)?(E(+)?digit+)?num \rightarrow digit^{+}(.\,digit^{+})?(E(+|-)?digit^{+})?

Recognition of Tokens

Once specified, tokens are recognized using transition diagrams / finite automata. The lexer matches the longest possible prefix of the input against the patterns (the longest-match / maximal-munch rule) and, on ties, uses the rule listed first. Recognition steps:

  1. Convert each RE to an NFA (Thompson's construction).
  2. Convert the NFA to a DFA (subset construction) and minimize it.
  3. Use the DFA as a table-driven scanner that emits <token, attribute-value> pairs.

Role of Finite Automata

  • NFA / DFA are the computational models used to recognize the regular languages described by token patterns.
  • A DFA is preferred for the actual scanner because it has exactly one transition per (state, symbol), giving deterministic, O(n)O(n) scanning.
  • Tools like Lex/Flex internally build a DFA from the supplied regular expressions to drive token recognition.

Role of Input Buffering

Reading one character at a time from disk is slow, so the scanner uses input buffering to read blocks of characters efficiently.

  • Two-buffer scheme: two halves of size NN each; when one half is exhausted, the other is reloaded, so the scanner can look ahead without losing earlier characters (needed because some tokens are recognized only after seeing extra characters, e.g. < vs <=).
  • Sentinels: a special end-of-buffer marker (e.g. eof) is placed at the end of each half so a single test per character handles both end-of-buffer and end-of-input, halving the number of comparisons.
  • Two pointers are maintained: lexemeBegin (start of current lexeme) and forward (scans ahead). On recognizing a token, lexemeBegin is moved to forward.

Conclusion

Tokens are specified with regular expressions and recognized by finite automata (DFA), while input buffering with sentinels makes the character-level scanning fast and supports the look-ahead that token recognition requires.

lexical-analysis
2long10 marks

Explain top-down parsing. Construct the predictive (LL(1)) parsing table for the grammar E -> E+T | T, T -> T*F | F, F -> (E) | id after removing left recursion.

Top-Down Parsing

Top-down parsing builds the parse tree from the root (start symbol) down to the leaves (terminals), attempting a leftmost derivation of the input. The two main approaches are:

  • Recursive-descent parsing (may use backtracking).
  • Predictive / LL(1) parsing — a non-backtracking, table-driven method that uses one symbol of look-ahead. It requires the grammar to be free of left recursion and to be left-factored.

Step 1: Remove Left Recursion

Given grammar:

EE+TT,TTFF,F(E)idE \to E+T \mid T, \quad T \to T*F \mid F, \quad F \to (E) \mid id

After eliminating left recursion:

ETEE \to T\,E' E+TEεE' \to +T\,E' \mid \varepsilon TFTT \to F\,T' TFTεT' \to *F\,T' \mid \varepsilon F(E)idF \to (E) \mid id

Step 2: FIRST and FOLLOW Sets

Non-terminalFIRSTFOLLOW
EE{ (, id }{ ), $ }
EE'{ +, ε\varepsilon }{ ), $ }
TT{ (, id }{ +, ), $ }
TT'{ *, ε\varepsilon }{ +, ), $ }
FF{ (, id }{ +, *, ), $ }

Step 3: LL(1) Predictive Parsing Table

Entry M[A, a] holds the production used when non-terminal AA is on the stack and aa is the look-ahead.

id+*()$
EETEE\to TE'ETEE\to TE'
E'E+TEE'\to +TE'EεE'\to\varepsilonEεE'\to\varepsilon
TTFTT\to FT'TFTT\to FT'
T'TεT'\to\varepsilonTFTT'\to *FT'TεT'\to\varepsilonTεT'\to\varepsilon
FFidF\to idF(E)F\to (E)

Blank cells denote error entries. Since no cell has more than one production, the grammar is LL(1) and can be parsed top-down without backtracking.

top-down-parsingll1
3long10 marks

Explain code optimization. Discuss the principal sources of optimization and the various machine-independent optimization techniques.

Code Optimization

Code optimization is the compiler phase that transforms intermediate (or target) code into a form that runs faster and/or uses fewer resources, while preserving the program's meaning (semantic equivalence). Optimization should never change the output of the program; it only improves efficiency.

Principal Sources of Optimization

These are the situations the optimizer exploits, mostly local and global transformations on basic blocks and flow graphs:

  1. Common Sub-expression Elimination — reuse an already-computed value instead of recomputing it. e.g. compute t = a*b once.
  2. Copy Propagation — after x = y, replace later uses of x by y.
  3. Dead-Code Elimination — remove instructions whose results are never used.
  4. Constant Folding & Constant Propagation — evaluate constant expressions at compile time (e.g. 3*4 → 12) and propagate constants.
  5. Loop Optimizations, which are especially valuable because loops run repeatedly:
    • Code Motion / Loop-Invariant Code Motion — move computations that don't change inside the loop to outside it.
    • Induction-Variable Elimination — simplify variables that change in lock-step.
    • Strength Reduction — replace costly operations by cheaper ones (e.g. replace multiplication i*4 by repeated addition).

Machine-Independent Optimization Techniques

These improve intermediate code without knowledge of the target machine:

TechniqueDescription
Common sub-expression eliminationAvoid recomputing identical expressions
Constant foldingPre-compute constant expressions
Copy propagationSubstitute equals for equals
Dead-code eliminationDrop unreachable / unused code
Loop-invariant code motionHoist invariant code out of loops
Strength reductionReplace expensive ops with cheaper ones
Algebraic simplificationUse identities, e.g. x+0=x, x*1=x, x*0=0
Loop unrolling / jammingReduce loop overhead

(In contrast, machine-dependent optimizations such as register allocation, instruction selection and peephole optimization depend on the target architecture.)

Conclusion

Code optimization improves execution time and memory use; its principal sources lie in redundant, constant, dead and loop computations, and machine-independent techniques remove these inefficiencies at the intermediate-code level.

code-optimization
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

What is three-address code? Write the three-address code for x = (a + b) * (c + d).

Three-Address Code (TAC)

Three-address code is an intermediate representation in which each instruction has at most three operands and one operator, usually in the form:

x=y  op  zx = y \; op \; z

where x is the result and y, z are operands or constants. Complex expressions are broken down using temporary variables (t1,t2,t_1, t_2, \dots). It is easy to optimize and to translate to machine code.

TAC for x = (a + b) * (c + d)

t1 = a + b
t2 = c + d
t3 = t1 * t2
x  = t3

Each line has at most three addresses, and temporaries hold the intermediate results.

intermediate-code
5short5 marks

Explain peephole optimization with an example.

Peephole Optimization

Peephole optimization is a simple, machine-dependent optimization that examines a small sliding window (the peephole) of a few consecutive target/intermediate instructions and replaces them with a shorter or faster equivalent sequence. It is usually applied repeatedly until no further improvement is possible.

Common peephole transformations

  • Redundant load/store elimination — remove a store immediately followed by a load of the same value.
  • Elimination of unreachable / dead code.
  • Flow-of-control optimization — collapse jumps to jumps.
  • Algebraic simplificationx = x + 0, x = x * 1 removed.
  • Strength reduction — replace x = x * 2 with x = x + x or a shift.

Example (redundant load/store)

Before:

MOV R0, a
MOV a, R0

The second instruction stores back the value just loaded, so it is redundant and can be deleted. After:

MOV R0, a

This shortens the code and removes an unnecessary memory access.

code-optimization
6short5 marks

Differentiate between synthesized and inherited attributes.

Synthesized vs. Inherited Attributes

In syntax-directed translation, every grammar symbol can carry attributes whose values are computed by semantic rules attached to productions.

AspectSynthesized AttributeInherited Attribute
Value computed fromAttributes of the node's children (and itself)Attributes of the node's parent and/or siblings
Direction of flowBottom-up (leaves → root)Top-down / sideways (root → leaves)
Associated withTypically the left-hand side of a productionTypically a right-hand-side non-terminal
Grammar classAn S-attributed grammar uses only synthesized attributesL-attributed grammars allow both (inherited from left siblings)
EvaluationCan be evaluated during bottom-up (LR) parsingSuited to top-down (LL) parsing
ExampleE.val computed from E.val + T.valType info passed down in int id, id;

Summary: A synthesized attribute gets its value from below (children), whereas an inherited attribute gets its value from above or beside (parent/siblings).

syntax-directed-translation
7short5 marks

What is bootstrapping in compiler construction?

Bootstrapping

Bootstrapping is the technique of writing a compiler for a language in (a subset of) the same language it is meant to compile, and then using an initial/simple version of the compiler to build the full, self-hosting compiler.

Idea

To compile a language LL to target machine MM using a compiler written in LL, we first need a way to run some compiler — this is solved in stages.

T-diagram notation

A compiler is described by a T-diagram STS \rightarrow T written in II, meaning it translates source SS to target TT and is itself implemented in language II.

Typical bootstrapping steps

  1. Write a small compiler for a subset SS of language LL in an existing available language (e.g. assembly or C). Call it C0C_0.
  2. Use C0C_0 to compile a fuller compiler for LL that is written in the subset SS, producing a working compiler C1C_1.
  3. Use C1C_1 to compile the complete compiler written in LL itself, yielding the final self-hosting compiler.

Advantages

  • Demonstrates the language is powerful enough to write its own compiler.
  • Reduces dependence on other languages; eases porting to new machines (cross-compilation + bootstrapping).

Example: The GCC C compiler and many language compilers are bootstrapped — an early C compiler compiles a C-written C compiler.

bootstrapping
8short5 marks

Construct an NFA for the regular expression (a|b)*abb.

NFA for (a|b)*abb

We build the NFA using Thompson's construction. The expression accepts any string over {a, b} that ends in abb.

States and transitions (described)

Let states be q0,q1,q2,q3q_0, q_1, q_2, q_3 where q0q_0 is the start state and q3q_3 is the only accepting (final) state.

  • q0q_0 (start): on a go to {q0,q1}\{q_0, q_1\}; on b go to {q0}\{q_0\}. (The self-loops on both a and b realize the (a|b)* prefix.)
  • q1q_1: on b go to q2q_2.
  • q2q_2: on b go to q3q_3.
  • q3q_3 (final): no outgoing transitions needed.

Transition table

Stateab
q0q_0{q0,q1}\{q_0, q_1\}{q0}\{q_0\}
q1q_1{q2}\{q_2\}
q2q_2{q3}\{q_3\}
*q3q_3

Diagram (in words)

q0q_0 has self-loops labelled a and b. From q0q_0 an a edge goes to q1q_1, a b edge from q1q_1 to q2q_2, and a b edge from q2q_2 to q3q_3 (accepting). Thus the automaton can consume any number of a/b symbols in q0q_0 and then accept only after reading the final abb.

This NFA correctly recognizes the language (ab)abb(a|b)^*abb.

finite-automatalexical-analysis
9short5 marks

What is operator precedence parsing?

Operator Precedence Parsing

Operator precedence parsing is a bottom-up (shift-reduce) parsing technique applicable to a restricted class of grammars called operator grammars — grammars in which no production has two adjacent non-terminals on its right-hand side and no ε\varepsilon-production.

Key idea

Parsing decisions are guided by precedence relations defined between pairs of terminals (operators). Three relations are used:

RelationMeaning
aba \lessdot baa yields precedence to bb (shift)
aba \doteq baa has equal precedence with bb
aba \gtrdot baa takes precedence over bb (reduce)

These relations are stored in a precedence table. During parsing, the relation between the top-of-stack terminal and the input terminal decides whether to shift or to reduce the handle (the substring between \lessdot and \gtrdot).

Characteristics

  • Simple to implement by hand; suited to expression grammars with operators like +, *.
  • Advantages: easy, fast, no large parse tables.
  • Limitations: works only for a small class of grammars; cannot handle grammars with ε\varepsilon-productions or adjacent non-terminals; difficulty handling the unary minus.
parsing
10short5 marks

Explain activation records and their role in run-time storage management.

Activation Records

An activation record (also called a stack frame) is a block of memory allocated on the run-time stack for each invocation (activation) of a procedure/function. It holds all the information needed to manage that single call.

Typical contents (fields)

FieldPurpose
Return valueSpace to return a result to the caller
Actual parametersArguments passed by the caller
Control linkPointer to the caller's activation record (dynamic link)
Access / static linkPointer for accessing non-local data (for nested scopes)
Saved machine statusReturn address, saved registers
Local variablesLocals declared in the procedure
TemporariesIntermediate values during expression evaluation

Role in Run-Time Storage Management

  • Stack allocation: when a procedure is called, its activation record is pushed onto the stack; when it returns, the record is popped. This naturally supports the last-in-first-out nesting of calls, including recursion (each recursive call gets its own fresh record).
  • The control link restores the caller's frame on return, and the access link lets a nested procedure reference variables in enclosing scopes.
  • The compiler computes the offset of each local/parameter within the record so they can be addressed relative to a frame pointer.

Thus activation records are the basic unit of the run-time stack and enable correct memory management for procedure calls and returns.

runtime-environment
11short5 marks

What is a basic block and a flow graph?

Basic Block and Flow Graph

Basic Block

A basic block is a maximal sequence of consecutive intermediate-code (three-address) statements with the properties:

  • Control enters only at the first statement (no jumps into the middle).
  • Control leaves only at the last statement (no jumps out of the middle). Thus once execution starts at the top, all statements run in order without branching.

Identifying leaders (first statement of a block):

  1. The very first statement of the program.
  2. Any statement that is the target of a jump.
  3. Any statement immediately following a jump (conditional or unconditional). Each leader and the statements up to (but not including) the next leader form one basic block.

Flow Graph

A flow graph is a directed graph in which:

  • Nodes are the basic blocks.
  • Edges represent the possible flow of control — an edge from block B1B_1 to B2B_2 exists if execution can pass from the end of B1B_1 to the start of B2B_2 (via fall-through or a jump to B2B_2's leader). It has a designated initial (entry) node (the block with the program's first statement).

Use

Basic blocks and flow graphs are the foundation of control-flow analysis and code optimization (loop detection, data-flow analysis, register allocation).

code-optimization
12short5 marks

Differentiate between top-down and bottom-up parsing.

Top-Down vs. Bottom-Up Parsing

AspectTop-Down ParsingBottom-Up Parsing
Parse-tree constructionFrom the root (start symbol) to leavesFrom the leaves to the root
Derivation producedLeftmost derivationReverse of a rightmost derivation
Basic actionPredict / expand a productionShift–reduce (reduce handles)
Look-ahead useChooses production using look-ahead (LL)Decides when to reduce a handle (LR)
Grammar restrictionsNeeds no left recursion and left-factoringHandles a larger class of grammars, including left recursion
Typical methodsRecursive-descent, LL(1) predictive parsingOperator-precedence, LR(0)/SLR/LALR/CLR parsing
PowerLess powerfulMore powerful
ImplementationEasier to write by handUsually generated by tools (e.g. YACC)

Summary: Top-down parsing predicts the input by expanding the start symbol (leftmost derivation), while bottom-up parsing reduces the input tokens back to the start symbol (reverse rightmost derivation) and accepts a wider class of grammars.

parsing

Frequently asked questions

Where can I find the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) question paper 2080?
The full BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2080 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Compiler Design and Construction (BSc CSIT, CSC365) 2080 paper come with solutions?
Yes. Every question on this Compiler Design and Construction (BSc CSIT, CSC365) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2080 paper?
The BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2080 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Compiler Design and Construction (BSc CSIT, CSC365) past paper free?
Yes — reading and attempting this Compiler Design and Construction (BSc CSIT, CSC365) past paper on Kekkei is completely free.