BSc CSIT (TU) Science Compiler Design and Construction (BSc CSIT, CSC365) Question Paper 2080 Nepal
This is the official BSc CSIT (TU) (Science stream) Compiler Design and Construction (BSc CSIT, CSC365) question paper for 2080, as set in the regular annual examination. It carries 60 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Compiler Design and Construction (BSc CSIT, CSC365) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) exam or solving previous years' question papers, this 2080 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt any TWO questions.
Discuss the specification and recognition of tokens. Explain the role of finite automata and input buffering in lexical analysis.
Specification and Recognition of Tokens
Lexical analysis is the first phase of a compiler. It reads the source program character-by-character and groups characters into meaningful units called tokens.
Specification of Tokens
Tokens are specified using regular expressions (REs), which describe the lexical patterns of a language.
- Alphabet (): finite set of symbols.
- String / Lexeme: an actual sequence of characters matching a pattern.
- Pattern: rule describing the set of lexemes a token can represent.
- Token: a category (identifier, number, keyword, operator, etc.).
Example regular definitions:
Recognition of Tokens
Once specified, tokens are recognized using transition diagrams / finite automata. The lexer matches the longest possible prefix of the input against the patterns (the longest-match / maximal-munch rule) and, on ties, uses the rule listed first. Recognition steps:
- Convert each RE to an NFA (Thompson's construction).
- Convert the NFA to a DFA (subset construction) and minimize it.
- Use the DFA as a table-driven scanner that emits
<token, attribute-value>pairs.
Role of Finite Automata
- NFA / DFA are the computational models used to recognize the regular languages described by token patterns.
- A DFA is preferred for the actual scanner because it has exactly one transition per (state, symbol), giving deterministic, scanning.
- Tools like Lex/Flex internally build a DFA from the supplied regular expressions to drive token recognition.
Role of Input Buffering
Reading one character at a time from disk is slow, so the scanner uses input buffering to read blocks of characters efficiently.
- Two-buffer scheme: two halves of size each; when one half is exhausted, the other is reloaded, so the scanner can look ahead without losing earlier characters (needed because some tokens are recognized only after seeing extra characters, e.g.
<vs<=). - Sentinels: a special end-of-buffer marker (e.g.
eof) is placed at the end of each half so a single test per character handles both end-of-buffer and end-of-input, halving the number of comparisons. - Two pointers are maintained:
lexemeBegin(start of current lexeme) andforward(scans ahead). On recognizing a token,lexemeBeginis moved toforward.
Conclusion
Tokens are specified with regular expressions and recognized by finite automata (DFA), while input buffering with sentinels makes the character-level scanning fast and supports the look-ahead that token recognition requires.
Explain top-down parsing. Construct the predictive (LL(1)) parsing table for the grammar E -> E+T | T, T -> T*F | F, F -> (E) | id after removing left recursion.
Top-Down Parsing
Top-down parsing builds the parse tree from the root (start symbol) down to the leaves (terminals), attempting a leftmost derivation of the input. The two main approaches are:
- Recursive-descent parsing (may use backtracking).
- Predictive / LL(1) parsing — a non-backtracking, table-driven method that uses one symbol of look-ahead. It requires the grammar to be free of left recursion and to be left-factored.
Step 1: Remove Left Recursion
Given grammar:
After eliminating left recursion:
Step 2: FIRST and FOLLOW Sets
| Non-terminal | FIRST | FOLLOW |
|---|---|---|
| { (, id } | { ), $ } | |
| { +, } | { ), $ } | |
| { (, id } | { +, ), $ } | |
| { *, } | { +, ), $ } | |
| { (, id } | { +, *, ), $ } |
Step 3: LL(1) Predictive Parsing Table
Entry M[A, a] holds the production used when non-terminal is on the stack and is the look-ahead.
| id | + | * | ( | ) | $ | |
|---|---|---|---|---|---|---|
| E | ||||||
| E' | ||||||
| T | ||||||
| T' | ||||||
| F |
Blank cells denote error entries. Since no cell has more than one production, the grammar is LL(1) and can be parsed top-down without backtracking.
Explain code optimization. Discuss the principal sources of optimization and the various machine-independent optimization techniques.
Code Optimization
Code optimization is the compiler phase that transforms intermediate (or target) code into a form that runs faster and/or uses fewer resources, while preserving the program's meaning (semantic equivalence). Optimization should never change the output of the program; it only improves efficiency.
Principal Sources of Optimization
These are the situations the optimizer exploits, mostly local and global transformations on basic blocks and flow graphs:
- Common Sub-expression Elimination — reuse an already-computed value instead of recomputing it. e.g. compute
t = a*bonce. - Copy Propagation — after
x = y, replace later uses ofxbyy. - Dead-Code Elimination — remove instructions whose results are never used.
- Constant Folding & Constant Propagation — evaluate constant expressions at compile time (e.g.
3*4 → 12) and propagate constants. - Loop Optimizations, which are especially valuable because loops run repeatedly:
- Code Motion / Loop-Invariant Code Motion — move computations that don't change inside the loop to outside it.
- Induction-Variable Elimination — simplify variables that change in lock-step.
- Strength Reduction — replace costly operations by cheaper ones (e.g. replace multiplication
i*4by repeated addition).
Machine-Independent Optimization Techniques
These improve intermediate code without knowledge of the target machine:
| Technique | Description |
|---|---|
| Common sub-expression elimination | Avoid recomputing identical expressions |
| Constant folding | Pre-compute constant expressions |
| Copy propagation | Substitute equals for equals |
| Dead-code elimination | Drop unreachable / unused code |
| Loop-invariant code motion | Hoist invariant code out of loops |
| Strength reduction | Replace expensive ops with cheaper ones |
| Algebraic simplification | Use identities, e.g. x+0=x, x*1=x, x*0=0 |
| Loop unrolling / jamming | Reduce loop overhead |
(In contrast, machine-dependent optimizations such as register allocation, instruction selection and peephole optimization depend on the target architecture.)
Conclusion
Code optimization improves execution time and memory use; its principal sources lie in redundant, constant, dead and loop computations, and machine-independent techniques remove these inefficiencies at the intermediate-code level.
Section B: Short Answer Questions
Attempt any EIGHT questions.
What is three-address code? Write the three-address code for x = (a + b) * (c + d).
Three-Address Code (TAC)
Three-address code is an intermediate representation in which each instruction has at most three operands and one operator, usually in the form:
where x is the result and y, z are operands or constants. Complex expressions are broken down using temporary variables (). It is easy to optimize and to translate to machine code.
TAC for x = (a + b) * (c + d)
t1 = a + b
t2 = c + d
t3 = t1 * t2
x = t3
Each line has at most three addresses, and temporaries hold the intermediate results.
Explain peephole optimization with an example.
Peephole Optimization
Peephole optimization is a simple, machine-dependent optimization that examines a small sliding window (the peephole) of a few consecutive target/intermediate instructions and replaces them with a shorter or faster equivalent sequence. It is usually applied repeatedly until no further improvement is possible.
Common peephole transformations
- Redundant load/store elimination — remove a store immediately followed by a load of the same value.
- Elimination of unreachable / dead code.
- Flow-of-control optimization — collapse jumps to jumps.
- Algebraic simplification —
x = x + 0,x = x * 1removed. - Strength reduction — replace
x = x * 2withx = x + xor a shift.
Example (redundant load/store)
Before:
MOV R0, a
MOV a, R0
The second instruction stores back the value just loaded, so it is redundant and can be deleted. After:
MOV R0, a
This shortens the code and removes an unnecessary memory access.
Differentiate between synthesized and inherited attributes.
Synthesized vs. Inherited Attributes
In syntax-directed translation, every grammar symbol can carry attributes whose values are computed by semantic rules attached to productions.
| Aspect | Synthesized Attribute | Inherited Attribute |
|---|---|---|
| Value computed from | Attributes of the node's children (and itself) | Attributes of the node's parent and/or siblings |
| Direction of flow | Bottom-up (leaves → root) | Top-down / sideways (root → leaves) |
| Associated with | Typically the left-hand side of a production | Typically a right-hand-side non-terminal |
| Grammar class | An S-attributed grammar uses only synthesized attributes | L-attributed grammars allow both (inherited from left siblings) |
| Evaluation | Can be evaluated during bottom-up (LR) parsing | Suited to top-down (LL) parsing |
| Example | E.val computed from E.val + T.val | Type info passed down in int id, id; |
Summary: A synthesized attribute gets its value from below (children), whereas an inherited attribute gets its value from above or beside (parent/siblings).
What is bootstrapping in compiler construction?
Bootstrapping
Bootstrapping is the technique of writing a compiler for a language in (a subset of) the same language it is meant to compile, and then using an initial/simple version of the compiler to build the full, self-hosting compiler.
Idea
To compile a language to target machine using a compiler written in , we first need a way to run some compiler — this is solved in stages.
T-diagram notation
A compiler is described by a T-diagram written in , meaning it translates source to target and is itself implemented in language .
Typical bootstrapping steps
- Write a small compiler for a subset of language in an existing available language (e.g. assembly or C). Call it .
- Use to compile a fuller compiler for that is written in the subset , producing a working compiler .
- Use to compile the complete compiler written in itself, yielding the final self-hosting compiler.
Advantages
- Demonstrates the language is powerful enough to write its own compiler.
- Reduces dependence on other languages; eases porting to new machines (cross-compilation + bootstrapping).
Example: The GCC C compiler and many language compilers are bootstrapped — an early C compiler compiles a C-written C compiler.
Construct an NFA for the regular expression (a|b)*abb.
NFA for (a|b)*abb
We build the NFA using Thompson's construction. The expression accepts any string over {a, b} that ends in abb.
States and transitions (described)
Let states be where is the start state and is the only accepting (final) state.
- (start): on a go to ; on b go to . (The self-loops on both
aandbrealize the(a|b)*prefix.) - : on b go to .
- : on b go to .
- (final): no outgoing transitions needed.
Transition table
| State | a | b |
|---|---|---|
| → | ||
| – | ||
| – | ||
| * | – | – |
Diagram (in words)
has self-loops labelled a and b. From an a edge goes to , a b edge from to , and a b edge from to (accepting). Thus the automaton can consume any number of a/b symbols in and then accept only after reading the final abb.
This NFA correctly recognizes the language .
What is operator precedence parsing?
Operator Precedence Parsing
Operator precedence parsing is a bottom-up (shift-reduce) parsing technique applicable to a restricted class of grammars called operator grammars — grammars in which no production has two adjacent non-terminals on its right-hand side and no -production.
Key idea
Parsing decisions are guided by precedence relations defined between pairs of terminals (operators). Three relations are used:
| Relation | Meaning |
|---|---|
| yields precedence to (shift) | |
| has equal precedence with | |
| takes precedence over (reduce) |
These relations are stored in a precedence table. During parsing, the relation between the top-of-stack terminal and the input terminal decides whether to shift or to reduce the handle (the substring between and ).
Characteristics
- Simple to implement by hand; suited to expression grammars with operators like
+,*. - Advantages: easy, fast, no large parse tables.
- Limitations: works only for a small class of grammars; cannot handle grammars with -productions or adjacent non-terminals; difficulty handling the unary minus.
Explain activation records and their role in run-time storage management.
Activation Records
An activation record (also called a stack frame) is a block of memory allocated on the run-time stack for each invocation (activation) of a procedure/function. It holds all the information needed to manage that single call.
Typical contents (fields)
| Field | Purpose |
|---|---|
| Return value | Space to return a result to the caller |
| Actual parameters | Arguments passed by the caller |
| Control link | Pointer to the caller's activation record (dynamic link) |
| Access / static link | Pointer for accessing non-local data (for nested scopes) |
| Saved machine status | Return address, saved registers |
| Local variables | Locals declared in the procedure |
| Temporaries | Intermediate values during expression evaluation |
Role in Run-Time Storage Management
- Stack allocation: when a procedure is called, its activation record is pushed onto the stack; when it returns, the record is popped. This naturally supports the last-in-first-out nesting of calls, including recursion (each recursive call gets its own fresh record).
- The control link restores the caller's frame on return, and the access link lets a nested procedure reference variables in enclosing scopes.
- The compiler computes the offset of each local/parameter within the record so they can be addressed relative to a frame pointer.
Thus activation records are the basic unit of the run-time stack and enable correct memory management for procedure calls and returns.
What is a basic block and a flow graph?
Basic Block and Flow Graph
Basic Block
A basic block is a maximal sequence of consecutive intermediate-code (three-address) statements with the properties:
- Control enters only at the first statement (no jumps into the middle).
- Control leaves only at the last statement (no jumps out of the middle). Thus once execution starts at the top, all statements run in order without branching.
Identifying leaders (first statement of a block):
- The very first statement of the program.
- Any statement that is the target of a jump.
- Any statement immediately following a jump (conditional or unconditional). Each leader and the statements up to (but not including) the next leader form one basic block.
Flow Graph
A flow graph is a directed graph in which:
- Nodes are the basic blocks.
- Edges represent the possible flow of control — an edge from block to exists if execution can pass from the end of to the start of (via fall-through or a jump to 's leader). It has a designated initial (entry) node (the block with the program's first statement).
Use
Basic blocks and flow graphs are the foundation of control-flow analysis and code optimization (loop detection, data-flow analysis, register allocation).
Differentiate between top-down and bottom-up parsing.
Top-Down vs. Bottom-Up Parsing
| Aspect | Top-Down Parsing | Bottom-Up Parsing |
|---|---|---|
| Parse-tree construction | From the root (start symbol) to leaves | From the leaves to the root |
| Derivation produced | Leftmost derivation | Reverse of a rightmost derivation |
| Basic action | Predict / expand a production | Shift–reduce (reduce handles) |
| Look-ahead use | Chooses production using look-ahead (LL) | Decides when to reduce a handle (LR) |
| Grammar restrictions | Needs no left recursion and left-factoring | Handles a larger class of grammars, including left recursion |
| Typical methods | Recursive-descent, LL(1) predictive parsing | Operator-precedence, LR(0)/SLR/LALR/CLR parsing |
| Power | Less powerful | More powerful |
| Implementation | Easier to write by hand | Usually generated by tools (e.g. YACC) |
Summary: Top-down parsing predicts the input by expanding the start symbol (leftmost derivation), while bottom-up parsing reduces the input tokens back to the start symbol (reverse rightmost derivation) and accepts a wider class of grammars.
Frequently asked questions
- Where can I find the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) question paper 2080?
- The full BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2080 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Compiler Design and Construction (BSc CSIT, CSC365) 2080 paper come with solutions?
- Yes. Every question on this Compiler Design and Construction (BSc CSIT, CSC365) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2080 paper?
- The BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2080 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Compiler Design and Construction (BSc CSIT, CSC365) past paper free?
- Yes — reading and attempting this Compiler Design and Construction (BSc CSIT, CSC365) past paper on Kekkei is completely free.