Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

Explain top-down parsing. Construct the predictive (LL(1)) parsing table for the grammar E -> E+T | T, T -> T*F | F, F -> (E) | id after removing left recursion.

Top-Down Parsing

Top-down parsing builds the parse tree from the root (start symbol) down to the leaves (terminals), trying to find a leftmost derivation of the input string. At each step it predicts which production to apply for a non-terminal. Predictive parsing (LL(1)) is a non-backtracking, table-driven form of top-down parsing that uses one lookahead symbol and the FIRST/FOLLOW sets to decide the production deterministically.

Step 1: Remove Left Recursion

Given grammar:

EE+TT,TTFF,F(E)idE \to E+T \mid T,\quad T \to T*F \mid F,\quad F \to (E) \mid id

Applying the rule AAαβAβA, AαAεA \to A\alpha \mid \beta \Rightarrow A \to \beta A',\ A' \to \alpha A' \mid \varepsilon:

ETEE \to T\,E' E+TEεE' \to +T\,E' \mid \varepsilon TFTT \to F\,T' TFTεT' \to *F\,T' \mid \varepsilon F(E)idF \to (E) \mid id

Step 2: Compute FIRST and FOLLOW

SymbolFIRSTFOLLOW
EE{(, id}\{\,(,\ id\,\}\{\,),\ \,}$
EE'{+, ε}\{\,+,\ \varepsilon\,\}\{\,),\ \,}$
TT{(, id}\{\,(,\ id\,\}\{\,+,\ ),\ \,}$
TT'{, ε}\{\,*,\ \varepsilon\,\}\{\,+,\ ),\ \,}$
FF{(, id}\{\,(,\ id\,\}\{\,*,\ +,\ ),\ \,}$

Step 3: LL(1) Predictive Parsing Table

For each production AαA\to\alpha, place it under terminals in FIRST(α)FIRST(\alpha); if εFIRST(α)\varepsilon\in FIRST(\alpha), also under FOLLOW(A)FOLLOW(A).

NTid+*()$
EETEE\to TE'ETEE\to TE'
E'E+TEE'\to +TE'EεE'\to\varepsilonEεE'\to\varepsilon
TTFTT\to FT'TFTT\to FT'
T'TεT'\to\varepsilonTFTT'\to *FT'TεT'\to\varepsilonTεT'\to\varepsilon
FFidF\to idF(E)F\to (E)

Since no cell contains more than one entry, the grammar is LL(1) and can be parsed top-down without backtracking.

top-down-parsingll1
2long10 marks

Explain code optimization. Discuss the principal sources of optimization and the various machine-independent optimization techniques.

Code Optimization

Code optimization is the compiler phase that transforms intermediate (or target) code into a form that runs faster and/or uses less memory, while preserving the program's meaning (semantics). A good optimization must be correct, beneficial on average, and worth its compile-time cost.

Principal Sources of Optimization

Optimizations exploit redundancy and predictable behaviour in code:

  1. Common Sub-expression Elimination — if an expression is already computed and operands are unchanged, reuse the result instead of recomputing.
  2. Dead-code Elimination — remove code whose results are never used.
  3. Copy Propagation — replace uses of a variable after x = y with y.
  4. Constant Folding / Propagation — evaluate constant expressions at compile time (e.g. 3*4 → 12).
  5. Loop Optimizations — the biggest payoff, since inner loops run most often:
    • Code Motion (Loop-invariant code motion): move computations that don't change inside a loop to before it.
    • Induction Variable Elimination: simplify variables that change by a fixed amount each iteration.
    • Strength Reduction: replace expensive operations with cheaper ones (e.g. i*4 → i+i+i+i or repeated addition instead of multiplication).

Machine-Independent Optimization Techniques

These do not depend on the target machine's registers or instruction set:

TechniqueEffect
Constant foldingCompute constant expressions at compile time
Common sub-expression eliminationAvoid recomputing identical expressions
Copy propagationSubstitute copies to expose further optimization
Dead-code eliminationDelete unreachable/unused code
Loop-invariant code motionHoist invariant code out of loops
Strength reductionReplace costly operators with cheaper ones
Induction-variable eliminationRemove redundant loop counters

Levels: optimization can be local (within a basic block), global (across blocks using a flow graph and data-flow analysis), or loop-level. In contrast, machine-dependent optimizations (register allocation, instruction selection, peephole optimization) depend on the target hardware.

code-optimization
3long10 marks

Discuss target code generation. Explain register allocation and the issues in the design of a code generator.

Target Code Generation

Code generation is the final phase of a compiler. It takes the optimized intermediate representation (e.g. three-address code) and produces equivalent target code (assembly or machine code). The generated code must be correct, of high quality (fast/compact), and produced efficiently.

A typical instruction has the form:

LD  reg, mem      ; load
ST  mem, reg      ; store
OP  reg, reg, reg ; arithmetic

Register Allocation and Assignment

Registers are the fastest storage, but few in number, so their use must be planned:

  • Register allocation decides which variables/values reside in registers at each point in the program.
  • Register assignment decides the specific physical register for each chosen value.

Good allocation minimizes loads/stores (memory traffic). Common approaches include usage counts, graph-coloring of the register-interference graph, and keeping the most frequently used values (especially loop variables) in registers. The problem of optimal allocation is NP-hard, so heuristics are used.

Issues in the Design of a Code Generator

  1. Input to the code generator — the IR (three-address code, trees, DAGs) plus the symbol table for addresses; assumes earlier phases produced error-free, type-checked input.
  2. Target program / instruction set — output may be absolute machine code, relocatable code, or assembly; the richness and regularity of the instruction set affects code quality.
  3. Instruction selection — map IR operations to machine instructions; uniformity and cost of instructions matter (e.g. INC vs ADD #1).
  4. Register allocation and assignment — using limited registers efficiently (as above).
  5. Evaluation order — the order in which computations are performed affects the number of registers needed and the efficiency of the code.
  6. Memory management / addressing modes — mapping names to runtime addresses; exploiting available addressing modes.
  7. Approach / target-machine details — the design must handle the specifics of the target architecture while keeping the generator maintainable.

A simple code generator processes one basic block at a time, using a register descriptor (what each register holds) and address descriptor (where each value's current value can be found) to emit minimal load/store instructions.

code-generation
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

Differentiate between a compiler and an interpreter.

Compiler vs Interpreter

AspectCompilerInterpreter
TranslationTranslates the whole source program into machine/target code at onceTranslates and executes the program line by line
OutputProduces a separate object/executable fileProduces no separate object file
Execution speedFaster (translated once, runs many times)Slower (re-translated each run)
Error reportingReports all errors after scanning the whole programStops at the first error it encounters
MemoryNeeds more memory (generates object code)Generally needs less memory
ExamplesC, C++ compilersPython, BASIC interpreters

Summary: A compiler converts the entire program before execution, giving faster runtime; an interpreter translates and runs statement by statement, making debugging easier but execution slower.

compiler
5short5 marks

Eliminate left recursion and perform left factoring on the grammar A -> Aab | Ac | bd | f.

Grammar: AAabAcbdfA \to Aab \mid Ac \mid bd \mid f

Step 1: Eliminate Left Recursion

The grammar is left recursive (productions AAabA\to Aab and AAcA\to Ac). Using AAα1Aα2β1β2A \to A\alpha_1 \mid A\alpha_2 \mid \beta_1 \mid \beta_2 where:

  • α1=ab, α2=c\alpha_1 = ab,\ \alpha_2 = c (recursive parts)
  • β1=bd, β2=f\beta_1 = bd,\ \beta_2 = f (non-recursive parts)

The rule gives Aβ1Aβ2AA \to \beta_1 A' \mid \beta_2 A' and Aα1Aα2AεA' \to \alpha_1 A' \mid \alpha_2 A' \mid \varepsilon:

AbdAfAA \to bd\,A' \mid f\,A' AabAcAεA' \to ab\,A' \mid c\,A' \mid \varepsilon

Step 2: Left Factoring

Examine each non-terminal for common prefixes among alternatives.

  • AbdAfAA \to bd\,A' \mid f\,A' — the alternatives begin with bb and ff, which are different, so no common prefix; no factoring needed.
  • AabAcAεA' \to ab\,A' \mid c\,A' \mid \varepsilon — alternatives begin with aa, cc, and ε\varepsilon, all distinct; no common prefix.

Final Grammar

AbdAfAA \to bd\,A' \mid f\,A' AabAcAεA' \to ab\,A' \mid c\,A' \mid \varepsilon

The grammar is now free of left recursion, and no left factoring is required because no two alternatives share a common prefix.

grammarparsing
6short5 marks

What is a symbol table? Explain its role in a compiler.

Symbol Table

A symbol table is a data structure maintained by the compiler that stores information about each identifier (variables, functions, constants, objects, etc.) occurring in the source program. For each name it records attributes such as the name, type, scope, memory location/offset, dimensions/size, and number/type of arguments (for functions).

Role in a Compiler

The symbol table is used by almost every phase of the compiler:

  1. Lexical analysis — enters newly seen identifiers (names) into the table.
  2. Syntax analysis — adds structural/scope information as declarations are parsed.
  3. Semantic analysis — uses stored types to perform type checking and to detect undeclared or multiply-declared variables.
  4. Intermediate / code generation — supplies the storage addresses, offsets, and types needed to generate correct code.
  5. Optimization & error handling — provides scope and lifetime information.

It supports operations like insert (add a name with attributes) and lookup (retrieve attributes). It is commonly implemented using a hash table (for fast access), linked lists, or balanced trees, and must handle scope and nesting (e.g. using a stack of tables for block-structured languages).

symbol-table
7short5 marks

Differentiate between a parse tree and a syntax tree.

Parse Tree vs Syntax Tree

Both represent the structure of a string, but at different levels of abstraction.

AspectParse Tree (Concrete Syntax Tree)Syntax Tree (Abstract Syntax Tree)
ContentsContains every grammar symbol — all non-terminals and terminals used in the derivationContains only the essential operators and operands
Interior nodesNon-terminals of the grammarOperators
Leaf nodesTerminals (tokens)Operands (identifiers, constants)
RedundancyIncludes redundant info (e.g. single productions, parentheses, intermediate non-terminals)Removes redundant details; more compact
PurposeShows the full derivation according to the grammarA condensed IR used for semantic analysis and code generation

Example: a + b * c

  • The parse tree shows the full chain EE+TE\to E+T\to \dots with all intermediate non-terminals (E,T,FE, T, F).
  • The syntax tree is simply:
      +
     / \
    a   *
       / \
      b   c

showing only the operators +, * and operands a, b, c.

parsing
8short5 marks

What is an ambiguous grammar? Show with an example.

Ambiguous Grammar

A grammar is ambiguous if there exists at least one string in its language that has more than one distinct parse tree (equivalently, more than one leftmost or more than one rightmost derivation). Ambiguity is undesirable because the compiler cannot uniquely determine the structure/meaning of a statement.

Example

Consider the grammar:

EE+EEEidE \to E + E \mid E * E \mid id

The string id+ididid + id * id has two different parse trees:

Derivation 1 (interpreting as id+(idid)id + (id * id)):

EE+Eid+Eid+EEid+ididE \Rightarrow E + E \Rightarrow id + E \Rightarrow id + E*E \Rightarrow id + id*id

Derivation 2 (interpreting as (id+id)id(id + id) * id):

EEEE+EEid+ididE \Rightarrow E * E \Rightarrow E+E * E \Rightarrow id+id*id

Since the same string yields two distinct leftmost derivations / parse trees, the grammar is ambiguous. Ambiguity can usually be removed by rewriting the grammar to enforce operator precedence and associativity (e.g. EE+T, TTF, FidE\to E+T,\ T\to T*F,\ F\to id).

grammar
9short5 marks

Explain the role of the lexical analyzer in a compiler.

Role of the Lexical Analyzer

The lexical analyzer (scanner) is the first phase of a compiler. It reads the source program as a stream of characters and groups them into meaningful units called tokens (e.g. keywords, identifiers, constants, operators, punctuation). Its main roles are:

  1. Tokenization — read the input character by character and produce a sequence of tokens in the form <token-name, attribute-value> (e.g. <id, pointer-to-symbol-table-entry>). The actual character string is the lexeme.
  2. Removing whitespace and comments — strip blanks, tabs, newlines, and comments that are not needed by later phases.
  3. Interface with the symbol table — enter identifiers and constants into the symbol table.
  4. Correlate error messages — keep track of line numbers so syntax/semantic errors can be reported accurately.
  5. Supply tokens to the parser — usually works as a subroutine called by the syntax analyzer, returning the next token on demand.

Lexical analysis is typically specified using regular expressions and implemented with finite automata (DFA); tools like Lex/Flex generate scanners automatically. It also detects simple lexical errors such as illegal characters or malformed tokens.

lexical-analysis
10short5 marks

Differentiate between LL(1) and LR parsers.

LL(1) vs LR Parsers

AspectLL(1) ParserLR Parser
TypeTop-down parsingBottom-up parsing
Derivation producedLeftmost derivation (in reverse none — it builds leftmost)Rightmost derivation in reverse
First L / scanningLeft-to-right scanLeft-to-right scan
Second letterL = Leftmost derivationR = Rightmost derivation
PowerHandles a smaller class of grammarsHandles a larger class (all LL grammars and more)
Left recursionCannot handle left-recursive grammarsCan handle left recursion
ConstructionSimpler; uses FIRST/FOLLOW and a predictive tableMore complex; uses item sets / DFA of LR(0) items and ACTION/GOTO tables
Error detectionDetects errors reasonablyDetects errors as early as possible (viable-prefix property)
ExamplesRecursive-descent, predictive parserSLR, CLR (LR(1)), LALR; tools like YACC/Bison

Summary: LL(1) is top-down, easier to build by hand, but weaker; LR is bottom-up, more powerful (handles left recursion and more grammars), but harder to construct and usually generated by tools.

parsing
11short5 marks

What are regular expressions? Write a regular expression for identifiers.

Regular Expressions

A regular expression (RE) is a notation for describing regular languages — the patterns of strings (lexemes) that form tokens. REs are built from symbols of an alphabet using three basic operations:

  • Union / alternation: rsr \mid s — a string matching rr or ss.
  • Concatenation: rsrs — a string of rr followed by ss.
  • Kleene closure: rr^{*}zero or more repetitions of rr (and r+r^{+} = one or more).

The base cases are ε\varepsilon (empty string) and each single symbol aa. Regular expressions are used in lexical analysis to specify the patterns of tokens, and each RE can be converted to a finite automaton that recognizes it.

Regular Expression for Identifiers

An identifier begins with a letter or underscore, followed by any number of letters, digits, or underscores. Let:

letterABZabz_letter \to A \mid B \mid \dots \mid Z \mid a \mid b \mid \dots \mid z \mid \_ digit019digit \to 0 \mid 1 \mid \dots \mid 9

Then:

idletter(letterdigit)id \to letter\,(\,letter \mid digit\,)^{*}

In regex shorthand: [A-Za-z_][A-Za-z0-9_]*

regular-expressionlexical-analysis
12short5 marks

Explain the handle and handle pruning in shift-reduce parsing.

Handle and Handle Pruning in Shift-Reduce Parsing

Shift-reduce parsing is a bottom-up technique that builds the parse tree from the leaves to the root by repeatedly reducing a string to the start symbol. It uses a stack and the input buffer, performing shift (push the next input symbol) and reduce (replace a handle on top of the stack by a non-terminal) actions.

Handle

A handle is a substring of the current right-sentential form that:

  1. matches the right side of some production AβA \to \beta, and
  2. whose reduction to the left-side non-terminal AA represents one step of a reverse rightmost derivation.

Formally, if SrmαAwrmαβwS \xRightarrow{*}_{rm} \alpha A w \xRightarrow{}_{rm} \alpha\beta w, then AβA\to\beta at the position right after α\alpha is a handle. (It is not just any substring matching a production body — it must be the correct one for the rightmost derivation.)

Handle Pruning

Handle pruning is the process of obtaining a rightmost derivation in reverse: repeatedly locate the handle in the current right-sentential form and replace (reduce) it by the left side of its production, until the start symbol SS is reached.

Example

Grammar: EE+EEEidE \to E+E \mid E*E \mid id, input id1+id2id3id_1 + id_2 * id_3:

Right-sentential formHandleReducing production
id1+id2id3id_1 + id_2 * id_3id1id_1EidE \to id
E+id2id3E + id_2 * id_3id2id_2EidE \to id
E+Eid3E + E * id_3id3id_3EidE \to id
E+EEE + E * EEEE*EEEEE \to E*E
E+EE + EE+EE+EEE+EE \to E+E
EE(start symbol reached)

Reducing each handle in turn yields the start symbol, completing the parse.

parsing

Frequently asked questions

Where can I find the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) question paper 2077?
The full BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2077 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Compiler Design and Construction (BSc CSIT, CSC365) 2077 paper come with solutions?
Yes. Every question on this Compiler Design and Construction (BSc CSIT, CSC365) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2077 paper?
The BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2077 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Compiler Design and Construction (BSc CSIT, CSC365) past paper free?
Yes — reading and attempting this Compiler Design and Construction (BSc CSIT, CSC365) past paper on Kekkei is completely free.