BSc CSIT (TU) Science Compiler Design and Construction (BSc CSIT, CSC365) Question Paper 2080 Nepal

Q: Where can I find the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) question paper 2080?

The full BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2080 Regular (annual) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Compiler Design and Construction (BSc CSIT, CSC365) 2080 paper come with solutions?

Yes. Every question on this Compiler Design and Construction (BSc CSIT, CSC365) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2080 paper?

The BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2080 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Compiler Design and Construction (BSc CSIT, CSC365) past paper free?

Yes — reading and attempting this Compiler Design and Construction (BSc CSIT, CSC365) past paper on Kekkei is completely free.

Question

1. Long answer, 10 marks

Discuss the specification and recognition of tokens. Explain the role of finite automata and input buffering in lexical analysis.

lexical-analysis

Answer 1

Specification and Recognition of Tokens

Lexical analysis is the first phase of a compiler. It reads the source program character-by-character and groups characters into meaningful units called tokens.

Specification of Tokens

Tokens are specified using regular expressions (REs), which describe the lexical patterns of a language.

Alphabet ( $\Sigma$ ): finite set of symbols.
String / Lexeme: an actual sequence of characters matching a pattern.
Pattern: rule describing the set of lexemes a token can represent.
Token: a category (identifier, number, keyword, operator, etc.).

Example regular definitions:

letter \rightarrow A|B|\dots|Z|a|b|\dots|z, \quad digit \rightarrow 0|1|\dots|9

id \rightarrow letter\,(letter\,|\,digit)^{*}

num \rightarrow digit^{+}(.\,digit^{+})?(E(+|-)?digit^{+})?

Recognition of Tokens

Once specified, tokens are recognized using transition diagrams / finite automata. The lexer matches the longest possible prefix of the input against the patterns (the longest-match / maximal-munch rule) and, on ties, uses the rule listed first. Recognition steps:

Convert each RE to an NFA (Thompson's construction).
Convert the NFA to a DFA (subset construction) and minimize it.
Use the DFA as a table-driven scanner that emits <token, attribute-value> pairs.
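
The longest-match and first-rule-wins policies can be sketched in a few lines. This is only an illustration: it leans on Python's `re` module instead of an explicitly constructed DFA, and the token names and the small keyword set are assumptions, not part of the question.

```python
import re

# Token patterns in priority order (hypothetical names); earlier rules win ties.
TOKEN_SPEC = [
    ("KEYWORD", r"if|else|while"),
    ("ID",      r"[A-Za-z][A-Za-z0-9]*"),
    ("NUM",     r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?"),
    ("RELOP",   r"<=|>=|<>|<|>|="),
    ("WS",      r"[ \t\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPEC]

def tokenize(text):
    """Return (token, lexeme) pairs using longest match, first rule on ties."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # Only a strictly longer match replaces the best one,
            # so on a tie the rule listed first is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if best_name != "WS":                 # whitespace is discarded
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

For example, `tokenize("iffy")` yields a single `ID` because the identifier pattern matches a longer prefix than the keyword `if`, while `tokenize("if")` yields a `KEYWORD` because the tie is broken by rule order.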

Role of Finite Automata

NFA / DFA are the computational models used to recognize the regular languages described by token patterns.
A DFA is preferred for the actual scanner because it has exactly one transition per (state, symbol), giving deterministic, $O(n)$ scanning.
Tools like Lex/Flex internally build a DFA from the supplied regular expressions to drive token recognition.

Role of Input Buffering

Reading one character at a time from disk is slow, so the scanner uses input buffering to read blocks of characters efficiently.

Two-buffer scheme: two halves of size $N$ each; when one half is exhausted, the other is reloaded, so the scanner can look ahead without losing earlier characters (needed because some tokens are recognized only after seeing extra characters, e.g. < vs <=).
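Sentinels: a special end-of-buffer marker (e.g. eof) is placed at the end of each half so that, in the common case, a single test per character replaces the two tests otherwise needed (end of buffer, then the character itself). Only when the sentinel is seen does the scanner check whether it marks the end of a half or the end of the input.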
Two pointers are maintained: lexemeBegin (start of current lexeme) and forward (scans ahead). On recognizing a token, lexemeBegin is moved to forward.
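
A minimal sketch of the buffer pair with sentinels follows. The class name `TwoBufferReader`, the tiny default half size and the choice of `"\0"` as the sentinel are assumptions made for illustration, and the `lexemeBegin` pointer is left out to keep the example short.

```python
import io

EOF = "\0"   # sentinel character, assumed never to occur in the source text

class TwoBufferReader:
    def __init__(self, stream, n=8):
        self.stream, self.n = stream, n
        # Layout: [half 0 (n chars), sentinel, half 1 (n chars), sentinel]
        self.buf = [EOF] * (2 * n + 2)
        self.forward = 0
        self._load(0)

    def _load(self, half):
        start = half * (self.n + 1)
        chunk = self.stream.read(self.n)
        for i, ch in enumerate(chunk):
            self.buf[start + i] = ch
        # A short chunk leaves the sentinel inside the half: real end of input.
        self.buf[start + len(chunk)] = EOF

    def next_char(self):
        ch = self.buf[self.forward]
        if ch != EOF:                        # common case: one test per character
            self.forward += 1
            return ch
        if self.forward == self.n:           # sentinel at the end of half 0
            self._load(1)
            self.forward = self.n + 1
            return self.next_char()
        if self.forward == 2 * self.n + 1:   # sentinel at the end of half 1
            self._load(0)
            self.forward = 0
            return self.next_char()
        return None                          # sentinel inside a half: end of input
```

Reading from `io.StringIO("a<=b")` with `n=2` returns the four characters one by one and then `None`, reloading the second half transparently in between.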

Conclusion

Tokens are specified with regular expressions and recognized by finite automata (DFA), while input buffering with sentinels makes the character-level scanning fast and supports the look-ahead that token recognition requires.

Answer 2

Top-Down Parsing

Top-down parsing builds the parse tree from the root (start symbol) down to the leaves (terminals), attempting a leftmost derivation of the input. The two main approaches are:

Recursive-descent parsing (may use backtracking).
Predictive / LL(1) parsing — a non-backtracking, table-driven method that uses one symbol of look-ahead. It requires the grammar to be free of left recursion and to be left-factored.

Step 1: Remove Left Recursion

Given grammar:

E \to E+T \mid T, \quad T \to T*F \mid F, \quad F \to (E) \mid id

After eliminating left recursion:

E \to T\,E'

E' \to +T\,E' \mid \varepsilon

T \to F\,T'

T' \to *F\,T' \mid \varepsilon

F \to (E) \mid id
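
The rewriting rule applied here, A → Aα | β becoming A → βA' and A' → αA' | ε, can be sketched as a small function. The list-of-symbols representation and the function name are assumptions chosen for this example; only immediate left recursion is handled.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """productions: list of right-hand sides, each a list of symbols.
    Returns a dict mapping nonterminal names to their new right-hand sides."""
    # A -> A alpha : keep alpha (the part after the recursive A)
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    # A -> beta    : alternatives that do not start with A
    others = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}
    new = nonterminal + "'"
    return {
        nonterminal: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [["ε"]],
    }
```

Calling it with `"E"` and `[["E", "+", "T"], ["T"]]` gives `E → T E'` and `E' → + T E' | ε`, matching the first two lines of the transformed grammar.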

Step 2: FIRST and FOLLOW Sets

Non-terminal	FIRST	FOLLOW
$E$	{ (, id }	{ ), $ }
$E'$	{ +, $\varepsilon$ }	{ ), $ }
$T$	{ (, id }	{ +, ), $ }
$T'$	{ *, $\varepsilon$ }	{ +, ), $ }
$F$	{ (, id }	{ +, *, ), $ }
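
The sets in the table can be checked mechanically. The sketch below computes FIRST and FOLLOW by iterating to a fixed point; the dictionary encoding of the grammar and the helper names are assumptions for illustration.

```python
EPS, END = "ε", "$"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in first else {sym}  # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)            # every symbol in the sequence can derive ε
    return result

def compute_first_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:             # repeat until no set grows any further
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                first[head] |= first_of_string(body, first)
                changed |= len(first[head]) != before
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                       # only non-terminals have FOLLOW
                    rest = first_of_string(body[i + 1:], first)
                    before = len(follow[sym])
                    follow[sym] |= rest - {EPS}
                    if EPS in rest:                    # the tail can vanish
                        follow[sym] |= follow[head]
                    changed |= len(follow[sym]) != before
    return first, follow
```

Running `compute_first_follow(GRAMMAR, "E")` reproduces the table, for instance FOLLOW(F) = { +, *, ), $ }.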

Step 3: LL(1) Predictive Parsing Table

Entry M[A, a] holds the production used when non-terminal $A$ is on the stack and $a$ is the look-ahead.

	id	+	*	(	)	$
E	$E\to TE'$			$E\to TE'$
E'		$E'\to +TE'$			$E'\to\varepsilon$	$E'\to\varepsilon$
T	$T\to FT'$			$T\to FT'$
T'		$T'\to\varepsilon$	$T'\to *FT'$		$T'\to\varepsilon$	$T'\to\varepsilon$
F	$F\to id$			$F\to (E)$

Blank cells denote error entries. Since no cell has more than one production, the grammar is LL(1) and can be parsed top-down without backtracking.
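
To see the table in action, here is a small sketch of the stack-driven predictive parser. The table is transcribed from the one above, with an empty list standing for an ε-production; the function name `ll1_parse` and the pre-tokenized input are assumptions.

```python
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Return True if the token list is derivable from E, else False."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]                    # start symbol on top of the end marker
    i = 0
    while stack:
        top = stack.pop()
        look = tokens[i]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:              # blank cell: syntax error
                return False
            stack.extend(reversed(body))  # leftmost body symbol ends up on top
        elif top == look:                 # terminal on the stack matches the input
            i += 1
        else:
            return False
    return i == len(tokens)
```

`ll1_parse(["id", "+", "id", "*", "id"])` returns `True`, while `ll1_parse(["id", "+"])` returns `False` because `M[T, $]` is blank.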

Answer 3

Code Optimization

Code optimization is the compiler phase that transforms intermediate (or target) code into a form that runs faster and/or uses fewer resources, while preserving the program's meaning (semantic equivalence). Optimization should never change the output of the program; it only improves efficiency.

Principal Sources of Optimization

These are the situations the optimizer exploits, mostly local and global transformations on basic blocks and flow graphs:

Common Sub-expression Elimination — reuse an already-computed value instead of recomputing it. e.g. compute t = a*b once.
Copy Propagation — after x = y, replace later uses of x by y.
Dead-Code Elimination — remove instructions whose results are never used.
Constant Folding & Constant Propagation — evaluate constant expressions at compile time (e.g. 3*4 → 12) and propagate constants.
Loop Optimizations, which are especially valuable because loops run repeatedly:
- Code Motion / Loop-Invariant Code Motion — move computations that don't change inside the loop to outside it.
- Induction-Variable Elimination — simplify variables that change in lock-step.
- Strength Reduction — replace costly operations by cheaper ones (e.g. replace multiplication i*4 by repeated addition).
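
As one concrete instance, constant folding combined with constant propagation inside a single basic block might look like the sketch below. The quadruple format `(dest, left, op, right)` and the function name are assumptions for illustration; a real optimizer would also handle copies, other operators and control flow.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(block):
    """block: list of (dest, left, op, right); operands are ints or variable names.
    Returns a new list with constants propagated and folded."""
    known = {}     # variable -> constant value known at this point
    out = []
    for dest, left, op, right in block:
        left = known.get(left, left)       # propagate known constants
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            value = OPS[op](left, right)   # fold at "compile time"
            known[dest] = value
            out.append((dest, value, None, None))   # dest = constant
        else:
            known.pop(dest, None)          # dest no longer holds a known constant
            out.append((dest, left, op, right))
    return out
```

Given `t1 = 3 * 4; t2 = t1 + a; t3 = t1 + 1`, the pass produces `t1 = 12; t2 = 12 + a; t3 = 13`.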

Machine-Independent Optimization Techniques

These improve intermediate code without knowledge of the target machine:

Technique	Description
Common sub-expression elimination	Avoid recomputing identical expressions
Constant folding	Pre-compute constant expressions
Copy propagation	Substitute equals for equals
Dead-code elimination	Drop unreachable / unused code
Loop-invariant code motion	Hoist invariant code out of loops
Strength reduction	Replace expensive ops with cheaper ones
Algebraic simplification	Use identities, e.g. `x+0=x`, `x*1=x`, `x*0=0`
Loop unrolling / jamming	Reduce loop overhead

(In contrast, machine-dependent optimizations such as register allocation, instruction selection and peephole optimization depend on the target architecture.)

Conclusion

Code optimization improves execution time and memory use; its principal sources lie in redundant, constant, dead and loop computations, and machine-independent techniques remove these inefficiencies at the intermediate-code level.

Answer 4

Three-Address Code (TAC)

Three-address code is an intermediate representation in which each instruction has at most three addresses (two operands and one result) and at most one operator on the right-hand side, usually in the form:

x = y \; op \; z

where x is the result and y, z are operands or constants. Complex expressions are broken down using temporary variables ( $t_1, t_2, \dots$ ). It is easy to optimize and to translate to machine code.

TAC for `x = (a + b) * (c + d)`

t1 = a + b
t2 = c + d
t3 = t1 * t2
x  = t3

Each line has at most three addresses, and temporaries hold the intermediate results.
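
The same translation can be produced mechanically by a post-order walk of the expression tree, emitting one temporary per operator. The sketch below borrows Python's own `ast` parser as a stand-in front end, which is an assumption made purely for brevity; the function name `to_tac` is hypothetical.

```python
import ast

def to_tac(statement):
    """Translate one assignment such as 'x = (a + b) * (c + d)' into TAC lines."""
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    code, counter = [], 0

    def gen(node):
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            # Post-order: translate both operands first, then emit the operator.
            left, right = gen(node.left), gen(node.right)
            counter += 1
            temp = f"t{counter}"
            code.append(f"{temp} = {left} {ops[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    tree = ast.parse(statement).body[0]
    if not isinstance(tree, ast.Assign) or not isinstance(tree.targets[0], ast.Name):
        raise ValueError("expected a simple assignment")
    result = gen(tree.value)
    code.append(f"{tree.targets[0].id} = {result}")
    return code
```

`to_tac("x = (a + b) * (c + d)")` returns the four lines shown above, in the same order.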

Answer 5

Peephole Optimization

Peephole optimization is a simple, machine-dependent optimization that examines a small sliding window (the peephole) of a few consecutive target/intermediate instructions and replaces them with a shorter or faster equivalent sequence. It is usually applied repeatedly until no further improvement is possible.

Common peephole transformations

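Redundant load/store elimination: remove a load or store that merely repeats the transfer made by the instruction just before it (for example, a store of a register back to the location it was just loaded from).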
Elimination of unreachable / dead code.
Flow-of-control optimization — collapse jumps to jumps.
Algebraic simplification — x = x + 0, x = x * 1 removed.
Strength reduction — replace x = x * 2 with x = x + x or a shift.

Example (redundant load/store)

Before:

MOV R0, a
MOV a, R0

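Reading MOV as MOV destination, source, the first instruction loads a into R0 and the second stores R0 back into a. The second instruction writes a value the location already holds, so it is redundant and can be deleted, provided it is not the target of a jump. After: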

MOV R0, a

This shortens the code and removes an unnecessary memory access.
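
A two-instruction peephole for this pattern can be sketched as follows. The tuple encoding `(opcode, dst, src)` is an assumption for illustration, and the sketch also assumes the instructions carry no labels, so no jump can land on the deleted move.

```python
def remove_redundant_moves(code):
    """code: list of (opcode, dst, src) tuples, reading MOV as 'dst <- src'.
    Drops a MOV that copies a value straight back to where it just came from."""
    out = []
    for instr in code:
        if out:
            prev_op, prev_dst, prev_src = out[-1]
            op, dst, src = instr
            if op == prev_op == "MOV" and dst == prev_src and src == prev_dst:
                continue      # both locations already hold the same value
        out.append(instr)
    return out
```

Applied to `MOV R0, a` followed by `MOV a, R0`, the pass keeps only the first instruction; if any other instruction sits between the two moves, nothing is removed.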

Answer 6

Synthesized vs. Inherited Attributes

In syntax-directed translation, every grammar symbol can carry attributes whose values are computed by semantic rules attached to productions.

Aspect	Synthesized Attribute	Inherited Attribute
Value computed from	Attributes of the node's children (and itself)	Attributes of the node's parent and/or siblings
Direction of flow	Bottom-up (leaves → root)	Top-down / sideways (root → leaves)
Associated with	Typically the left-hand side of a production	Typically a right-hand-side non-terminal
Grammar class	An S-attributed grammar uses only synthesized attributes	L-attributed grammars allow both (inherited from left siblings)
Evaluation	Can be evaluated during bottom-up (LR) parsing	Suited to top-down (LL) parsing
Example	`E.val = E1.val + T.val` for `E → E1 + T`	Type info passed down to each identifier in `int id, id;`

Summary: A synthesized attribute gets its value from below (children), whereas an inherited attribute gets its value from above or beside (parent/siblings).
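
The two directions of flow can be illustrated on tiny hand-built trees. The tuple encodings of the expression tree and of the identifier list are assumptions for this sketch, as are the function names.

```python
def eval_val(node):
    """Synthesized attribute: a node's val is computed from its children's val.
    A node is either an int (a leaf) or a tuple (op, left, right)."""
    if isinstance(node, int):
        return node
    op, left, right = node
    l, r = eval_val(left), eval_val(right)
    return l + r if op == "+" else l * r

def declare(l_node, inh, symtab):
    """Inherited attribute: the declared type 'inh' is passed down the id list.
    L -> L1 , id  is ("list", L1, name);  L -> id  is ("id", name)."""
    if l_node[0] == "list":
        _, l1, name = l_node
        declare(l1, inh, symtab)     # L1.inh = L.inh
        symtab[name] = inh           # record the type for this id
    else:
        symtab[l_node[1]] = inh
    return symtab
```

`eval_val(("+", 3, ("*", 4, 5)))` gives 23 by passing values upward, whereas `declare(("list", ("id", "a"), "b"), "int", {})` gives `{"a": "int", "b": "int"}` by passing the type downward.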

Answer 7

Bootstrapping

Bootstrapping is the technique of writing a compiler for a language in (a subset of) the same language it is meant to compile, and then using an initial/simple version of the compiler to build the full, self-hosting compiler.

Idea

To compile a language $L$ to target machine $M$ using a compiler written in $L$ , we first need a way to run some compiler — this is solved in stages.

T-diagram notation

A compiler is described by a T-diagram $S \rightarrow T$ written in $I$ , meaning it translates source $S$ to target $T$ and is itself implemented in language $I$ .

Typical bootstrapping steps

Write a small compiler for a subset $S$ of language $L$ in an existing available language (e.g. assembly or C). Call it $C_0$ .
Use $C_0$ to compile a fuller compiler for $L$ that is written in the subset $S$ , producing a working compiler $C_1$ .
Use $C_1$ to compile the complete compiler written in $L$ itself, yielding the final self-hosting compiler.

Advantages

Demonstrates the language is powerful enough to write its own compiler.
Reduces dependence on other languages; eases porting to new machines (cross-compilation + bootstrapping).

Example: GCC and many other compilers are bootstrapped: an existing C compiler first compiles a C compiler that is itself written in C.

Answer 8

NFA for `(a|b)*abb`

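The automaton below is the compact NFA without ε-transitions, written down directly from the expression; Thompson's construction yields an equivalent but larger NFA with ε-moves. The expression accepts any string over {a, b} that ends in abb.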

States and transitions (described)

Let states be $q_0, q_1, q_2, q_3$ where $q_0$ is the start state and $q_3$ is the only accepting (final) state.

$q_0$ (start): on a go to $\{q_0, q_1\}$ ; on b go to $\{q_0\}$ . (The self-loops on both a and b realize the (a|b)* prefix.)
$q_1$ : on b go to $q_2$ .
$q_2$ : on b go to $q_3$ .
$q_3$ (final): no outgoing transitions needed.

Transition table

State	a	b
→ $q_0$	$\{q_0, q_1\}$	$\{q_0\}$
$q_1$	–	$\{q_2\}$
$q_2$	–	$\{q_3\}$
* $q_3$	–	–

Diagram (in words)

$q_0$ has self-loops labelled a and b. From $q_0$ an a edge goes to $q_1$ , a b edge from $q_1$ to $q_2$ , and a b edge from $q_2$ to $q_3$ (accepting). Thus the automaton can consume any number of a/b symbols in $q_0$ and then accept only after reading the final abb.

This NFA correctly recognizes the language $(a|b)^*abb$ .
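
The transition table can be checked by simulating the NFA on a set of current states. The dictionary encoding and the function name `nfa_accepts` are assumptions for this sketch.

```python
DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
    ("q2", "b"): {"q3"},
}
START, ACCEPT = "q0", {"q3"}

def nfa_accepts(word):
    """Track every state the NFA could be in after each input symbol."""
    current = {START}
    for ch in word:
        # Union of the moves from all current states; missing entries mean no move.
        current = set().union(*(DELTA.get((q, ch), set()) for q in current))
    return bool(current & ACCEPT)
```

For the input `abb` the state sets are {q0}, {q0, q1}, {q0, q2}, {q0, q3}, so the string is accepted; `abba` ends in {q0, q1} and is rejected.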

Answer 9

Operator Precedence Parsing

Operator precedence parsing is a bottom-up (shift-reduce) parsing technique applicable to a restricted class of grammars called operator grammars — grammars in which no production has two adjacent non-terminals on its right-hand side and no $\varepsilon$ -production.

Key idea

Parsing decisions are guided by precedence relations defined between pairs of terminals (operators). Three relations are used:

Relation	Meaning
$a \lessdot b$	$a$ yields precedence to $b$ (shift)
$a \doteq b$	$a$ has equal precedence with $b$
$a \gtrdot b$	$a$ takes precedence over $b$ (reduce)

These relations are stored in a precedence table. During parsing, the relation between the top-of-stack terminal and the input terminal decides whether to shift or to reduce the handle (the substring between $\lessdot$ and $\gtrdot$ ).
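
A small sketch shows how the relations drive shifting and reducing. It assumes a grammar with only `id`, `+` and `*` (no parentheses), treats integers as `id` tokens, and evaluates the expression while reducing; the table entries and the function name are choices made for this example.

```python
PREC = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def evaluate(tokens):
    """Evaluate a list such as [3, '+', 4, '*', 5] by operator-precedence parsing."""
    symbols = [("id", t) if isinstance(t, int) else (t, None) for t in tokens]
    symbols.append(("$", None))
    stack, values, i = [("$", None)], [], 0
    while True:
        top, look = stack[-1][0], symbols[i][0]
        if top == "$" and look == "$":            # accept
            if len(values) != 1:
                raise SyntaxError("malformed expression")
            return values[0]
        rel = PREC.get((top, look))
        if rel == "<":                            # shift
            stack.append(symbols[i])
            i += 1
        elif rel == ">":                          # reduce the handle on top
            term, val = stack.pop()
            if term == "id":
                values.append(val)
            else:
                if len(values) < 2:
                    raise SyntaxError("missing operand")
                right, left = values.pop(), values.pop()
                values.append(left + right if term == "+" else left * right)
        else:                                     # blank table entry
            raise SyntaxError(f"no precedence relation between {top} and {look}")
```

Because `+ ⋖ *`, the input `[3, "+", 4, "*", 5]` shifts `*` on top of `+` and reduces the multiplication first, giving 23.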

Characteristics

Simple to implement by hand; suited to expression grammars with operators like +, *.
Advantages: easy, fast, no large parse tables.
Limitations: works only for a small class of grammars; cannot handle grammars with $\varepsilon$ -productions or adjacent non-terminals; difficulty handling the unary minus.

Answer 10

Activation Records

An activation record (also called a stack frame) is a block of memory allocated on the run-time stack for each invocation (activation) of a procedure/function. It holds all the information needed to manage that single call.

Typical contents (fields)

Field	Purpose
Return value	Space to return a result to the caller
Actual parameters	Arguments passed by the caller
Control link	Pointer to the caller's activation record (dynamic link)
Access / static link	Pointer for accessing non-local data (for nested scopes)
Saved machine status	Return address, saved registers
Local variables	Locals declared in the procedure
Temporaries	Intermediate values during expression evaluation

Role in Run-Time Storage Management

Stack allocation: when a procedure is called, its activation record is pushed onto the stack; when it returns, the record is popped. This naturally supports the last-in-first-out nesting of calls, including recursion (each recursive call gets its own fresh record).
The control link restores the caller's frame on return, and the access link lets a nested procedure reference variables in enclosing scopes.
The compiler computes the offset of each local/parameter within the record so they can be addressed relative to a frame pointer.

Thus activation records are the basic unit of the run-time stack and enable correct memory management for procedure calls and returns.
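
The push-on-call and pop-on-return discipline can be mimicked with an explicit stack. In this sketch the class names, the chosen fields and the factorial procedure are all assumptions for illustration; a real compiler lays the record out in memory at fixed offsets instead.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActivationRecord:
    procedure: str
    params: dict
    control_link: Optional["ActivationRecord"] = None   # caller's record
    local_vars: dict = field(default_factory=dict)
    return_value: Optional[int] = None

class Machine:
    """Hypothetical run-time stack that pushes one record per call."""
    def __init__(self):
        self.stack = []
        self.max_depth = 0

    def fact(self, n):
        caller = self.stack[-1] if self.stack else None
        frame = ActivationRecord("fact", {"n": n}, control_link=caller)
        self.stack.append(frame)                       # push on call
        self.max_depth = max(self.max_depth, len(self.stack))
        if n <= 1:
            frame.return_value = 1
        else:
            # Each recursive call gets its own fresh record.
            frame.local_vars["t"] = self.fact(n - 1)
            frame.return_value = n * frame.local_vars["t"]
        self.stack.pop()                               # pop on return
        return frame.return_value
```

Calling `Machine().fact(5)` returns 120; at the deepest point five records are on the stack, and the stack is empty again once the outermost call returns.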

Answer 11

Basic Block and Flow Graph

Basic Block

A basic block is a maximal sequence of consecutive intermediate-code (three-address) statements with the properties:

Control enters only at the first statement (no jumps into the middle).
Control leaves only at the last statement (no jumps out of the middle).

Thus once execution starts at the top, all statements run in order without branching.

Identifying leaders (first statement of a block):

The very first statement of the program.
Any statement that is the target of a jump.
Any statement immediately following a jump (conditional or unconditional).

Each leader and the statements up to (but not including) the next leader form one basic block.

Flow Graph

A flow graph is a directed graph in which:

Nodes are the basic blocks.
Edges represent the possible flow of control: an edge from block $B_1$ to $B_2$ exists if execution can pass from the end of $B_1$ to the start of $B_2$ (via fall-through or a jump to $B_2$ 's leader).

The graph has a designated initial (entry) node: the block containing the program's first statement.
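
The leader rules and the edge rule can be sketched together. The sketch assumes non-empty code given as strings, jumps written as `goto k` or `if ... goto k` with `k` a valid 0-based index into the list, and a hypothetical function name `build_flow_graph`.

```python
import re

def jump_target(instr):
    """Return the index a jump instruction goes to, or None for non-jumps."""
    m = re.search(r"goto (\d+)$", instr)
    return int(m.group(1)) if m else None

def build_flow_graph(code):
    # Leaders: the first statement, every jump target,
    # and every statement that follows a jump.
    leaders = {0}
    for i, instr in enumerate(code):
        target = jump_target(instr)
        if target is not None:
            leaders.add(target)
            if i + 1 < len(code):
                leaders.add(i + 1)
    starts = sorted(leaders)
    # A block runs from its leader up to (not including) the next leader.
    blocks = [list(range(s, e)) for s, e in zip(starts, starts[1:] + [len(code)])]
    block_of = {b[0]: n for n, b in enumerate(blocks)}   # leader index -> block number
    edges = set()
    for n, block in enumerate(blocks):
        last = code[block[-1]]
        target = jump_target(last)
        if target is not None:
            edges.add((n, block_of[target]))             # edge for the jump
        # Fall through unless the block ends in an unconditional jump.
        if not last.startswith("goto") and n + 1 < len(blocks):
            edges.add((n, n + 1))
    return blocks, edges
```

For the six statements `i = 0`, `if i >= n goto 5`, `s = s + i`, `i = i + 1`, `goto 1`, `return s`, this yields four blocks and a back edge from the loop body to the test, which is how the loop shows up in the flow graph.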

Use

Basic blocks and flow graphs are the foundation of control-flow analysis and code optimization (loop detection, data-flow analysis, register allocation).

Answer 12

Top-Down vs. Bottom-Up Parsing

Aspect	Top-Down Parsing	Bottom-Up Parsing
Parse-tree construction	From the root (start symbol) to leaves	From the leaves to the root
Derivation produced	Leftmost derivation	Reverse of a rightmost derivation
Basic action	Predict / expand a production	Shift–reduce (reduce handles)
Look-ahead use	Chooses production using look-ahead (LL)	Decides when to reduce a handle (LR)
Grammar restrictions	Grammar must be free of left recursion and be left-factored	Handles a larger class of grammars, including left-recursive ones
Typical methods	Recursive-descent, LL(1) predictive parsing	Operator-precedence, LR(0)/SLR/LALR/CLR parsing
Power	Less powerful	More powerful
Implementation	Easier to write by hand	Usually generated by tools (e.g. YACC)

Summary: Top-down parsing predicts the input by expanding the start symbol (leftmost derivation), while bottom-up parsing reduces the input tokens back to the start symbol (reverse rightmost derivation) and accepts a wider class of grammars.

Level	BSc CSIT (TU)
Stream	Science
Subject	Compiler Design and Construction (BSc CSIT, CSC365)
Year	2080 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

Section A: Long Answer Questions (Answers 1 to 3)


Section B: Short Answer Questions (Answers 4 to 12)
