BSc CSIT (TU) Science Compiler Design and Construction (BSc CSIT, CSC365) Question Paper 2077 Nepal
This is the official BSc CSIT (TU) (Science stream) Compiler Design and Construction (BSc CSIT, CSC365) question paper for 2077, as set in the regular annual examination. It carries 60 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Compiler Design and Construction (BSc CSIT, CSC365) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) exam or solving previous years' question papers, this 2077 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt any TWO questions.
Explain top-down parsing. Construct the predictive (LL(1)) parsing table for the grammar E -> E+T | T, T -> T*F | F, F -> (E) | id after removing left recursion.
Top-Down Parsing
Top-down parsing builds the parse tree from the root (start symbol) down to the leaves (terminals), trying to find a leftmost derivation of the input string. At each step it predicts which production to apply for a non-terminal. Predictive parsing (LL(1)) is a non-backtracking, table-driven form of top-down parsing that uses one lookahead symbol and the FIRST/FOLLOW sets to decide the production deterministically.
Step 1: Remove Left Recursion
Given grammar:
Applying the rule :
Step 2: Compute FIRST and FOLLOW
| Symbol | FIRST | FOLLOW |
|---|---|---|
| \{\,),\ \,}$ | ||
| \{\,),\ \,}$ | ||
| \{\,+,\ ),\ \,}$ | ||
| \{\,+,\ ),\ \,}$ | ||
| \{\,*,\ +,\ ),\ \,}$ |
Step 3: LL(1) Predictive Parsing Table
For each production , place it under terminals in ; if , also under .
| NT | id | + | * | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | ||||||
| E' | ||||||
| T | ||||||
| T' | ||||||
| F |
Since no cell contains more than one entry, the grammar is LL(1) and can be parsed top-down without backtracking.
Explain code optimization. Discuss the principal sources of optimization and the various machine-independent optimization techniques.
Code Optimization
Code optimization is the compiler phase that transforms intermediate (or target) code into a form that runs faster and/or uses less memory, while preserving the program's meaning (semantics). A good optimization must be correct, beneficial on average, and worth its compile-time cost.
Principal Sources of Optimization
Optimizations exploit redundancy and predictable behaviour in code:
- Common Sub-expression Elimination — if an expression is already computed and operands are unchanged, reuse the result instead of recomputing.
- Dead-code Elimination — remove code whose results are never used.
- Copy Propagation — replace uses of a variable after
x = ywithy. - Constant Folding / Propagation — evaluate constant expressions at compile time (e.g.
3*4 → 12). - Loop Optimizations — the biggest payoff, since inner loops run most often:
- Code Motion (Loop-invariant code motion): move computations that don't change inside a loop to before it.
- Induction Variable Elimination: simplify variables that change by a fixed amount each iteration.
- Strength Reduction: replace expensive operations with cheaper ones (e.g.
i*4 → i+i+i+ior repeated addition instead of multiplication).
Machine-Independent Optimization Techniques
These do not depend on the target machine's registers or instruction set:
| Technique | Effect |
|---|---|
| Constant folding | Compute constant expressions at compile time |
| Common sub-expression elimination | Avoid recomputing identical expressions |
| Copy propagation | Substitute copies to expose further optimization |
| Dead-code elimination | Delete unreachable/unused code |
| Loop-invariant code motion | Hoist invariant code out of loops |
| Strength reduction | Replace costly operators with cheaper ones |
| Induction-variable elimination | Remove redundant loop counters |
Levels: optimization can be local (within a basic block), global (across blocks using a flow graph and data-flow analysis), or loop-level. In contrast, machine-dependent optimizations (register allocation, instruction selection, peephole optimization) depend on the target hardware.
Discuss target code generation. Explain register allocation and the issues in the design of a code generator.
Target Code Generation
Code generation is the final phase of a compiler. It takes the optimized intermediate representation (e.g. three-address code) and produces equivalent target code (assembly or machine code). The generated code must be correct, of high quality (fast/compact), and produced efficiently.
A typical instruction has the form:
LD reg, mem ; load
ST mem, reg ; store
OP reg, reg, reg ; arithmetic
Register Allocation and Assignment
Registers are the fastest storage, but few in number, so their use must be planned:
- Register allocation decides which variables/values reside in registers at each point in the program.
- Register assignment decides the specific physical register for each chosen value.
Good allocation minimizes loads/stores (memory traffic). Common approaches include usage counts, graph-coloring of the register-interference graph, and keeping the most frequently used values (especially loop variables) in registers. The problem of optimal allocation is NP-hard, so heuristics are used.
Issues in the Design of a Code Generator
- Input to the code generator — the IR (three-address code, trees, DAGs) plus the symbol table for addresses; assumes earlier phases produced error-free, type-checked input.
- Target program / instruction set — output may be absolute machine code, relocatable code, or assembly; the richness and regularity of the instruction set affects code quality.
- Instruction selection — map IR operations to machine instructions; uniformity and cost of instructions matter (e.g.
INCvsADD #1). - Register allocation and assignment — using limited registers efficiently (as above).
- Evaluation order — the order in which computations are performed affects the number of registers needed and the efficiency of the code.
- Memory management / addressing modes — mapping names to runtime addresses; exploiting available addressing modes.
- Approach / target-machine details — the design must handle the specifics of the target architecture while keeping the generator maintainable.
A simple code generator processes one basic block at a time, using a register descriptor (what each register holds) and address descriptor (where each value's current value can be found) to emit minimal load/store instructions.
Section B: Short Answer Questions
Attempt any EIGHT questions.
Differentiate between a compiler and an interpreter.
Compiler vs Interpreter
| Aspect | Compiler | Interpreter |
|---|---|---|
| Translation | Translates the whole source program into machine/target code at once | Translates and executes the program line by line |
| Output | Produces a separate object/executable file | Produces no separate object file |
| Execution speed | Faster (translated once, runs many times) | Slower (re-translated each run) |
| Error reporting | Reports all errors after scanning the whole program | Stops at the first error it encounters |
| Memory | Needs more memory (generates object code) | Generally needs less memory |
| Examples | C, C++ compilers | Python, BASIC interpreters |
Summary: A compiler converts the entire program before execution, giving faster runtime; an interpreter translates and runs statement by statement, making debugging easier but execution slower.
Eliminate left recursion and perform left factoring on the grammar A -> Aab | Ac | bd | f.
Grammar:
Step 1: Eliminate Left Recursion
The grammar is left recursive (productions and ). Using where:
- (recursive parts)
- (non-recursive parts)
The rule gives and :
Step 2: Left Factoring
Examine each non-terminal for common prefixes among alternatives.
- — the alternatives begin with and , which are different, so no common prefix; no factoring needed.
- — alternatives begin with , , and , all distinct; no common prefix.
Final Grammar
The grammar is now free of left recursion, and no left factoring is required because no two alternatives share a common prefix.
What is a symbol table? Explain its role in a compiler.
Symbol Table
A symbol table is a data structure maintained by the compiler that stores information about each identifier (variables, functions, constants, objects, etc.) occurring in the source program. For each name it records attributes such as the name, type, scope, memory location/offset, dimensions/size, and number/type of arguments (for functions).
Role in a Compiler
The symbol table is used by almost every phase of the compiler:
- Lexical analysis — enters newly seen identifiers (names) into the table.
- Syntax analysis — adds structural/scope information as declarations are parsed.
- Semantic analysis — uses stored types to perform type checking and to detect undeclared or multiply-declared variables.
- Intermediate / code generation — supplies the storage addresses, offsets, and types needed to generate correct code.
- Optimization & error handling — provides scope and lifetime information.
It supports operations like insert (add a name with attributes) and lookup (retrieve attributes). It is commonly implemented using a hash table (for fast access), linked lists, or balanced trees, and must handle scope and nesting (e.g. using a stack of tables for block-structured languages).
Differentiate between a parse tree and a syntax tree.
Parse Tree vs Syntax Tree
Both represent the structure of a string, but at different levels of abstraction.
| Aspect | Parse Tree (Concrete Syntax Tree) | Syntax Tree (Abstract Syntax Tree) |
|---|---|---|
| Contents | Contains every grammar symbol — all non-terminals and terminals used in the derivation | Contains only the essential operators and operands |
| Interior nodes | Non-terminals of the grammar | Operators |
| Leaf nodes | Terminals (tokens) | Operands (identifiers, constants) |
| Redundancy | Includes redundant info (e.g. single productions, parentheses, intermediate non-terminals) | Removes redundant details; more compact |
| Purpose | Shows the full derivation according to the grammar | A condensed IR used for semantic analysis and code generation |
Example: a + b * c
- The parse tree shows the full chain with all intermediate non-terminals ().
- The syntax tree is simply:
+
/ \
a *
/ \
b c
showing only the operators +, * and operands a, b, c.
What is an ambiguous grammar? Show with an example.
Ambiguous Grammar
A grammar is ambiguous if there exists at least one string in its language that has more than one distinct parse tree (equivalently, more than one leftmost or more than one rightmost derivation). Ambiguity is undesirable because the compiler cannot uniquely determine the structure/meaning of a statement.
Example
Consider the grammar:
The string has two different parse trees:
Derivation 1 (interpreting as ):
Derivation 2 (interpreting as ):
Since the same string yields two distinct leftmost derivations / parse trees, the grammar is ambiguous. Ambiguity can usually be removed by rewriting the grammar to enforce operator precedence and associativity (e.g. ).
Explain the role of the lexical analyzer in a compiler.
Role of the Lexical Analyzer
The lexical analyzer (scanner) is the first phase of a compiler. It reads the source program as a stream of characters and groups them into meaningful units called tokens (e.g. keywords, identifiers, constants, operators, punctuation). Its main roles are:
- Tokenization — read the input character by character and produce a sequence of tokens in the form
<token-name, attribute-value>(e.g.<id, pointer-to-symbol-table-entry>). The actual character string is the lexeme. - Removing whitespace and comments — strip blanks, tabs, newlines, and comments that are not needed by later phases.
- Interface with the symbol table — enter identifiers and constants into the symbol table.
- Correlate error messages — keep track of line numbers so syntax/semantic errors can be reported accurately.
- Supply tokens to the parser — usually works as a subroutine called by the syntax analyzer, returning the next token on demand.
Lexical analysis is typically specified using regular expressions and implemented with finite automata (DFA); tools like Lex/Flex generate scanners automatically. It also detects simple lexical errors such as illegal characters or malformed tokens.
Differentiate between LL(1) and LR parsers.
LL(1) vs LR Parsers
| Aspect | LL(1) Parser | LR Parser |
|---|---|---|
| Type | Top-down parsing | Bottom-up parsing |
| Derivation produced | Leftmost derivation (in reverse none — it builds leftmost) | Rightmost derivation in reverse |
| First L / scanning | Left-to-right scan | Left-to-right scan |
| Second letter | L = Leftmost derivation | R = Rightmost derivation |
| Power | Handles a smaller class of grammars | Handles a larger class (all LL grammars and more) |
| Left recursion | Cannot handle left-recursive grammars | Can handle left recursion |
| Construction | Simpler; uses FIRST/FOLLOW and a predictive table | More complex; uses item sets / DFA of LR(0) items and ACTION/GOTO tables |
| Error detection | Detects errors reasonably | Detects errors as early as possible (viable-prefix property) |
| Examples | Recursive-descent, predictive parser | SLR, CLR (LR(1)), LALR; tools like YACC/Bison |
Summary: LL(1) is top-down, easier to build by hand, but weaker; LR is bottom-up, more powerful (handles left recursion and more grammars), but harder to construct and usually generated by tools.
What are regular expressions? Write a regular expression for identifiers.
Regular Expressions
A regular expression (RE) is a notation for describing regular languages — the patterns of strings (lexemes) that form tokens. REs are built from symbols of an alphabet using three basic operations:
- Union / alternation: — a string matching or .
- Concatenation: — a string of followed by .
- Kleene closure: — zero or more repetitions of (and = one or more).
The base cases are (empty string) and each single symbol . Regular expressions are used in lexical analysis to specify the patterns of tokens, and each RE can be converted to a finite automaton that recognizes it.
Regular Expression for Identifiers
An identifier begins with a letter or underscore, followed by any number of letters, digits, or underscores. Let:
Then:
In regex shorthand: [A-Za-z_][A-Za-z0-9_]*
Explain the handle and handle pruning in shift-reduce parsing.
Handle and Handle Pruning in Shift-Reduce Parsing
Shift-reduce parsing is a bottom-up technique that builds the parse tree from the leaves to the root by repeatedly reducing a string to the start symbol. It uses a stack and the input buffer, performing shift (push the next input symbol) and reduce (replace a handle on top of the stack by a non-terminal) actions.
Handle
A handle is a substring of the current right-sentential form that:
- matches the right side of some production , and
- whose reduction to the left-side non-terminal represents one step of a reverse rightmost derivation.
Formally, if , then at the position right after is a handle. (It is not just any substring matching a production body — it must be the correct one for the rightmost derivation.)
Handle Pruning
Handle pruning is the process of obtaining a rightmost derivation in reverse: repeatedly locate the handle in the current right-sentential form and replace (reduce) it by the left side of its production, until the start symbol is reached.
Example
Grammar: , input :
| Right-sentential form | Handle | Reducing production |
|---|---|---|
| — | (start symbol reached) |
Reducing each handle in turn yields the start symbol, completing the parse.
Frequently asked questions
- Where can I find the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) question paper 2077?
- The full BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2077 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Compiler Design and Construction (BSc CSIT, CSC365) 2077 paper come with solutions?
- Yes. Every question on this Compiler Design and Construction (BSc CSIT, CSC365) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2077 paper?
- The BSc CSIT (TU) Compiler Design and Construction (BSc CSIT, CSC365) 2077 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Compiler Design and Construction (BSc CSIT, CSC365) past paper free?
- Yes — reading and attempting this Compiler Design and Construction (BSc CSIT, CSC365) past paper on Kekkei is completely free.