BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) Question Paper 2079 Nepal
This is the official BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) question paper for 2079, as set in the regular annual examination. It carries 80 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) exam or solving previous years' question papers, this 2079 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt all / any as specified.
(a) Draw the hardware block diagram for the Booth's multiplication algorithm and explain the function of each register. [6]
(b) Using Booth's algorithm, multiply the signed numbers (-7) × (+3) in 4-bit 2's complement representation. Show the contents of the A, Q and Q₋₁ registers after each step. [6]
(a) Booth's Multiplication Hardware [6]
Block diagram (described in words):
+-----------+
| Sequence | (counts n bits)
| Counter |
+-----------+
|
+------+ +------+ +-----+
| A | | Q | Q-1 |
+------+ +------+-----+
| Arithmetic/Logic |
| Unit (ALU) |
+-------------------+
|
+-----------+
| Register M | (Multiplicand BR)
+-----------+
Function of each register:
- M (BR): holds the multiplicand. Added to or subtracted from the partial product depending on the recoded bit pair.
- Q (QR): initially holds the multiplier; the low-order half of the product accumulates here at the end.
- A (AC): accumulator, initialized to 0; holds the high-order part of the partial product.
- Q₋₁: a 1-bit flip-flop appended to the right of Q. Together controls the operation.
- Sequence Counter (SC): loaded with (number of multiplier bits); decremented each cycle; stops when 0.
Booth's rule (examine ):
10→01→00or11→ no add/subtract
Then arithmetic shift right the combined register.
(b) Compute , 4-bit [6]
, . Multiplier . , , .
| Step | Operation | A | Q | |
|---|---|---|---|---|
| Init | 0000 | 1001 | 0 | |
| 1 | 1101 | 1001 | 0 | |
| ASR | 1110 | 1100 | 1 | |
| 2 | 0001 | 1100 | 1 | |
| ASR | 0000 | 1110 | 0 | |
| 3 | no-op | 0000 | 1110 | 0 |
| ASR | 0000 | 0111 | 0 | |
| 4 | 1101 | 0111 | 0 | |
| ASR | 1110 | 1011 | 1 |
Result . As 8-bit 2's complement this is .
Check: ✓
Marking: 6 marks for diagram + register roles; 6 marks for correct step-by-step trace reaching .
A control unit can be implemented as hardwired or microprogrammed.
(a) Draw the block diagram of a microprogrammed control unit and explain the role of the control address register (CAR), control memory and sequencer. [7]
(b) Compare hardwired and microprogrammed control units on the basis of speed, flexibility, cost and implementation complexity. [5]
(a) Microprogrammed Control Unit [7]
Block diagram (described):
Instruction opcode
|
v
[ Mapping Logic ] --> [ Control Address Register (CAR) ]
|
v
[ Control Memory (ROM) ]
|
v
[ Control Data Register / micro-instruction ]
/ \
control signals to datapath next-address field
|
v
[ Sequencer / Next-Address
Generator ] --> back to CAR
(uses status/condition bits)
Roles:
- Control Address Register (CAR): holds the address of the next microinstruction to be read from control memory; acts like a microprogram counter (μPC).
- Control Memory: a ROM/PROM that stores the microprogram — the sequence of microinstructions. Each word emits the control signals for one micro-operation plus the next-address information.
- Sequencer (next-address generator): determines the next value loaded into CAR — it may increment CAR, take a branch address from the microinstruction, use mapping logic for the opcode, or use a subroutine register — selecting based on status/condition flags.
(b) Hardwired vs Microprogrammed [5]
| Basis | Hardwired | Microprogrammed |
|---|---|---|
| Speed | Faster (pure logic) | Slower (control-memory fetch each cycle) |
| Flexibility | Rigid; design change needs rewiring | Flexible; change microprogram in control memory |
| Cost | Lower for simple, fixed control | Higher base cost; economical for complex sets |
| Implementation complexity | Hard to design/modify for large instruction sets | Systematic, easier to design and debug |
| Suited to | RISC, simple control | CISC, complex instruction sets |
Marking: diagram 3, CAR/CM/sequencer roles 4; comparison table 5 (1 per basis).
(a) Explain the four-segment instruction pipeline with a space-time diagram, and discuss the different types of pipeline hazards (structural, data and control) along with at least one technique to handle each. [8]
(b) A non-pipelined processor executes a task in 80 ns. Suppose the same task is divided into a 5-stage pipeline with equal stage delays and a clock skew/latch overhead of 2 ns per stage. Compute the clock period, the speedup and the throughput of the pipeline for executing 1000 instructions. [6]
(a) Four-Segment Instruction Pipeline [8]
The instruction cycle is split into four segments:
- FI – Fetch Instruction
- DA – Decode instruction & calculate effective Address
- FO – Fetch Operand
- EX – Execute
Space-time diagram (rows = instructions, columns = clock cycles):
| Instr \ Clock | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| I1 | FI | DA | FO | EX | |||
| I2 | FI | DA | FO | EX | |||
| I3 | FI | DA | FO | EX | |||
| I4 | FI | DA | FO | EX |
After the pipeline fills, one instruction completes every clock cycle.
Hazards and remedies:
- Structural hazard: two segments need the same resource at once (e.g. FI and FO both access memory). Fix: separate instruction/data memories or caches (Harvard) or stall one cycle.
- Data hazard: an instruction needs a result not yet produced (RAW dependency). Fix: operand forwarding/bypassing, or stall/insert bubbles, or compiler reordering.
- Control (branch) hazard: a taken branch invalidates prefetched instructions. Fix: branch prediction, delayed branch, or flushing the pipeline.
(b) Numerical [6]
Non-pipelined time ns. 5 stages with equal delay ns; latch overhead ns per stage.
- Clock period ns.
- Time for 1000 instructions (pipelined): ns.
- Non-pipelined time: ns.
- Speedup .
- Throughput MIPS (i.e. instr/s).
Marking: 8 for pipeline diagram + three hazards with fixes; 6 for clock period 18 ns, speedup ≈4.43, throughput ≈55 MIPS.
Consider a main memory of size 4 K words and a cache of 128 words with a block size of 8 words.
(a) For direct mapping, determine the number of bits in the tag, line (block) and word fields of the CPU address, and draw the address format. [6]
(b) Explain set-associative mapping and, for a 2-way set-associative organization of the same cache, recompute the tag, set and word field widths. State one advantage it has over direct mapping. [6]
Main memory words → 12-bit physical address. Cache words, block size words.
(a) Direct Mapping [6]
- Word (offset) field: block size 3 bits.
- Number of cache lines Line (block) field = 4 bits.
- Tag field 5 bits.
Address format (12 bits):
| Tag (5) | Line (4) | Word (3) |
|---|
(Total bits. ✓)
(b) Set-Associative Mapping [6]
In set-associative mapping the cache is divided into sets, each set holding lines (k-way). A memory block maps to exactly one set (set = block number mod number-of-sets) but may occupy any line within that set. It is a compromise between direct (1 line/set) and fully associative.
For 2-way set-associative with the same cache:
- Total lines ; lines per set number of sets .
- Word field bits (unchanged).
- Set field bits.
- Tag field 6 bits.
| Tag (6) | Set (3) | Word (3) |
|---|
Advantage over direct mapping: because a block can go into any of the 2 lines of its set, fewer conflict (collision) misses occur when several blocks contend for the same line, giving a higher hit ratio.
Marking: (a) word=3, line=4, tag=5 with format — 6 marks; (b) explanation + word=3, set=3, tag=6 + one advantage — 6 marks.
Section B: Short Answer Questions
Attempt all / any as specified.
Write a program to evaluate the arithmetic statement X = (A + B) * (C + D) using three-address, two-address, one-address and zero-address (stack) instruction formats. Comment on the number of instructions and memory references required by each.
Evaluate X = (A + B) * (C + D).
Three-address:
ADD R1, A, B ; R1 = A + B
ADD R2, C, D ; R2 = C + D
MUL X, R1, R2 ; X = R1 * R2
3 instructions.
Two-address:
MOV R1, A ; R1 = A
ADD R1, B ; R1 = A + B
MOV R2, C ; R2 = C
ADD R2, D ; R2 = C + D
MUL R1, R2 ; R1 = R1 * R2
MOV X, R1 ; X = R1
6 instructions.
One-address (single accumulator AC):
LOAD A ; AC = A
ADD B ; AC = A + B
STORE T ; T = AC
LOAD C ; AC = C
ADD D ; AC = C + D
MUL T ; AC = (C+D)*(A+B)
STORE X ; X = AC
7 instructions.
Zero-address (stack):
PUSH A
PUSH B
ADD ; (A+B)
PUSH C
PUSH D
ADD ; (C+D)
MUL ; (A+B)*(C+D)
POP X
8 instructions.
Comment: As we move from three- to zero-address, the number of instructions increases (3 → 6 → 7 → 8) but each instruction becomes shorter (fewer/no address fields). Three-address needs the fewest instructions; the stack machine has the shortest instructions (operands implicit). Memory references for operands also differ: three-address makes the program compact, while one-/zero-address trade instruction count for smaller opcodes and simpler hardware.
Marking: ~1.5 marks per correct format program + comment on instruction count/length.
Explain the following addressing modes with a suitable example for each, and state one typical use: (a) immediate, (b) register indirect, (c) indexed, and (d) relative addressing. For the instruction LOAD 500, if PC = 200, R1 = 400 and memory[500] = 800, determine the effective address and the loaded operand in direct, indexed (using R1) and relative modes.
Addressing modes [4]
- (a) Immediate: the operand itself is in the instruction (the address field is the data). E.g.
MOV R1, #5→ R1 = 5. Use: loading constants/initializing registers. - (b) Register indirect: the instruction names a register that holds the address of the operand in memory. E.g.
LOAD (R1)loadsmemory[R1]. Use: pointer/array traversal. - (c) Indexed: effective address = (base/address in instruction) + (index register). E.g.
LOAD 500(R1)→ EA = 500 + R1. Use: accessing array elements with a varying index. - (d) Relative: effective address = (address field) + (program counter, PC). E.g. branch targets. Use: position-independent code and branch instructions.
Computation for LOAD 500, PC = 200, R1 = 400, memory[500] = 800 [2]
| Mode | Effective Address | Operand loaded |
|---|---|---|
| Direct | 500 | memory[500] = 800 |
| Indexed (R1) | 500 + 400 = 900 | memory[900] |
| Relative (PC) | 500 + 200 = 700 | memory[700] |
(For indexed and relative the operand is the value stored at EA 900 and 700 respectively; only memory[500] = 800 is given.)
Marking: 1 each for the four modes with example/use; 2 for the three effective-address computations 500/900/700.
(a) Differentiate between programmed I/O, interrupt-driven I/O and DMA on the basis of CPU involvement and data transfer speed. [4]
(b) Explain how daisy chaining is used for establishing interrupt priority among multiple I/O devices. [2]
(a) Programmed I/O vs Interrupt-driven I/O vs DMA [4]
| Basis | Programmed I/O | Interrupt-driven I/O | DMA |
|---|---|---|---|
| CPU involvement | Highest — CPU continuously polls the status flag and moves every word (busy-wait) | Moderate — CPU does other work; device interrupts when ready, CPU services each word | Lowest — CPU only sets up the transfer; DMA controller moves the whole block |
| Data-transfer speed | Slowest (CPU bottleneck) | Faster than polling but per-word interrupt overhead | Fastest — block transfer directly between device and memory |
| Best for | Few bytes, simple devices | Devices with sporadic data | High-volume transfers (disk, network) |
(b) Daisy Chaining for Interrupt Priority [2]
In daisy chaining, all devices share a single interrupt-request line to the CPU, and an acknowledge (INTA) signal is passed serially from one device to the next. Devices are connected in a chain ordered by priority — the device closest to the CPU has the highest priority. When the CPU acknowledges, the signal enters the first device; if that device requested the interrupt it places its vector on the bus and blocks the acknowledge from propagating; otherwise it passes the acknowledge to the next device. Thus priority is fixed by physical position in the chain.
Marking: 4 for the three-way comparison (CPU involvement + speed); 2 for daisy-chain priority mechanism.
Draw the block diagram of a DMA controller and explain the cycle stealing mode of data transfer. How does the DMA controller gain control of the system bus from the CPU?
DMA Controller Block Diagram [described]
Address bus <---+
Data bus <--->| +------------------------+
Read/Write <----| | DMA Controller |
| | +------------------+ |
CPU --BR/BG (bus req/ +-----| | Address Register | |
grant)----------| | +------------------+ |
DMA req/ack <---| | | Word-Count Reg. | |
I/O device <---------->| | +------------------+ |
| | Control Register | |
| +------------------+ |
+------------------------+
Key registers: an address register (points to the memory location), a word-count register (number of words remaining, decremented each transfer), and a control register (read/write, mode). The DMA also has DMA request/acknowledge lines to the device and bus request (BR)/bus grant (BG) lines to the CPU.
Cycle-Stealing Mode
In cycle stealing, the DMA controller transfers one word per bus cycle and then returns control of the bus to the CPU. It "steals" occasional memory cycles from the CPU: whenever the device is ready, DMA grabs the bus for a single cycle, moves one word, and releases it, so the CPU is only slowed slightly rather than halted. (Contrast with burst/block mode, where DMA holds the bus for the entire block.)
How DMA Gains the Bus
- The I/O device asserts DMA request to the controller.
- The DMA controller raises Bus Request (BR) to the CPU.
- The CPU finishes its current bus cycle, floats (tri-states) its address, data and control lines, and asserts Bus Grant (BG).
- The DMA controller now masters the bus, sends DMA acknowledge to the device, and performs the transfer directly between the device and memory.
- On completion (word count = 0) it drops BR, the CPU regains the bus, and the DMA may raise an interrupt to signal completion.
Marking: 2 for diagram + registers; 2 for cycle stealing; 1 for BR/BG bus-acquisition sequence.
(a) State Flynn's classification of computer architectures (SISD, SIMD, MISD, MIMD) with one example of each. [4]
(b) Differentiate between tightly coupled and loosely coupled multiprocessor systems. [2]
(a) Flynn's Classification [4]
Based on the number of concurrent instruction and data streams:
- SISD (Single Instruction, Single Data): one processor executes one instruction stream on one data stream. Example: a traditional uniprocessor (classic von Neumann PC).
- SIMD (Single Instruction, Multiple Data): one instruction operates on many data elements in parallel. Example: vector/array processors, GPUs.
- MISD (Multiple Instruction, Single Data): many instruction streams operate on the same data; rare. Example: systolic arrays / fault-tolerant pipelined systems (e.g. Space Shuttle flight control).
- MIMD (Multiple Instruction, Multiple Data): multiple processors execute different instructions on different data. Example: multicore/multiprocessor systems, clusters.
(b) Tightly vs Loosely Coupled Multiprocessors [2]
| Tightly coupled | Loosely coupled |
|---|---|
| Share a common (global) main memory | Each processor has its own local memory (distributed) |
| Communicate through shared memory; fast | Communicate by message passing over an interconnection network; slower |
| Higher memory contention; smaller scale | Scales to many nodes; e.g. clusters/distributed systems |
Marking: 1 per correct class + example (4); 2 for the tightly/loosely coupled distinction.
Explain the organization of a stack in a CPU. Differentiate between a register stack and a memory stack, and show how the PUSH and POP operations modify the stack pointer (SP) in each case.
Stack Organization in a CPU
A stack is a Last-In-First-Out (LIFO) storage. A stack pointer (SP) register holds the address of the top of stack (TOS). Two operations: PUSH (insert) and POP (remove).
Register Stack vs Memory Stack
| Register stack | Memory stack | |
|---|---|---|
| Storage | A fixed set of high-speed registers (finite depth) | A reserved region of main memory (RAM) |
| Capacity | Small, limited by hardware | Large, limited only by allocated memory |
| SP | Counts registers; FULL/EMPTY flags needed | An address register pointing into memory |
| Speed | Very fast | Slower (memory access) |
PUSH / POP and SP
Register stack (SP counts up on push; with FULL & EMTY flags):
PUSH: SP ← SP + 1
M[SP] ← DR ; store data at new TOS
if (SP = 0) FULL ← 1
EMTY ← 0
POP: DR ← M[SP] ; read TOS
SP ← SP - 1
if (SP = 0) EMTY ← 1
FULL ← 0
Memory stack (typical: stack grows toward lower addresses):
PUSH: SP ← SP - 1
M[SP] ← data
POP: data ← M[SP]
SP ← SP + 1
Thus in a memory stack PUSH decrements SP and POP increments it (for a downward-growing stack), whereas in the register stack of Mano's model PUSH increments and POP decrements SP.
Marking: 1 for stack/LIFO + SP; 2 for register vs memory differences; 2 for PUSH/POP SP updates.
Define virtual memory. Explain paging as an address-mapping technique and describe the function of the page table and the translation lookaside buffer (TLB) in speeding up address translation.
Virtual Memory
Virtual memory is a memory-management technique that gives a program the illusion of a very large, contiguous main memory by using main memory and secondary storage (disk) together. Programs use logical (virtual) addresses; only the actively used portions are kept in physical RAM, and the rest reside on disk, being brought in on demand. It lets programs larger than physical memory run, and provides protection/relocation.
Paging
In paging, the virtual address space is divided into fixed-size pages and physical memory into equal-size frames. A virtual address is split into a page number and a page offset:
The page number indexes the page table to obtain a frame number; the physical address = (frame number, same offset). This maps any page to any frame, removing external fragmentation.
Page Table
The page table is a data structure (one entry per virtual page) that stores the frame number for each page plus control bits (valid/present, dirty, reference, protection). On each access the page number is used to look up the corresponding frame.
Translation Lookaside Buffer (TLB)
The page table itself lives in main memory, so a naive translation needs an extra memory access per reference. The TLB is a small, fast associative cache that stores recently used (page → frame) translations.
- TLB hit: frame number obtained directly — no page-table access, fast translation.
- TLB miss: the page table in memory is consulted and the new entry is loaded into the TLB.
Because of locality of reference, the TLB hit ratio is high, so address translation is sped up significantly.
Marking: 1 VM definition; 2 paging + page table; 2 TLB function and hit/miss.
Compare RISC and CISC architectures with respect to instruction set size, addressing modes, instruction length, use of registers and control unit implementation. Give one example processor of each type.
RISC vs CISC
| Basis | RISC | CISC |
|---|---|---|
| Instruction-set size | Small set of simple instructions | Large set, many complex instructions |
| Addressing modes | Few (load/store; mostly register) | Many, complex addressing modes |
| Instruction length | Fixed length, easy to decode/pipeline | Variable length |
| Use of registers | Large register file; memory accessed only by LOAD/STORE | Fewer registers; instructions can operate directly on memory |
| Control unit | Mostly hardwired | Mostly microprogrammed |
| Execution / cycles | Usually 1 instruction per cycle; pipelining friendly | Multiple cycles per instruction |
| Code size vs hardware | Larger code, simpler hardware | Compact code, complex hardware |
| Example processor | ARM, MIPS, RISC-V, SPARC | Intel x86 (8086/Pentium), Motorola 68000, VAX |
Summary: RISC favors simple, uniform, fast instructions with compiler-managed registers and hardwired control; CISC packs complex operations into single instructions using microprogrammed control, reducing program size at the cost of decoding/hardware complexity.
Marking: ~0.5 per correctly contrasted attribute + 1 for correct example of each (ARM/MIPS vs x86).
Frequently asked questions
- Where can I find the BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) question paper 2079?
- The full BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 paper come with solutions?
- Yes. Every question on this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 paper?
- The BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 paper carries 80 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper free?
- Yes — reading and attempting this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper on Kekkei is completely free.