BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) Question Paper 2079 Nepal

Q: Where can I find the BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) question paper 2079?

The full BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 paper come with solutions?

Yes. Every question on this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 paper?

The BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 paper carries 80 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper free?

Yes — reading and attempting this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper on Kekkei is completely free.

Question

1Long answer12 marks

(a) Draw the hardware block diagram for the Booth's multiplication algorithm and explain the function of each register. [6]

(b) Using Booth's algorithm, multiply the signed numbers (-7) × (+3) in 4-bit 2's complement representation. Show the contents of the A, Q and Q₋₁ registers after each step. [6]

alu-organizationarithmetic-algorithms

Answer 1

(a) Booth's Multiplication Hardware [6]

Block diagram (described in words):

        +-----------+
        |  Sequence |  (counts n bits)
        |  Counter  |
        +-----------+
              |
   +------+   +------+   +-----+
   |  A   |   |  Q   | Q-1 |
   +------+   +------+-----+
      |  Arithmetic/Logic |
      |     Unit (ALU)    |
      +-------------------+
              |
        +-----------+
        | Register M | (Multiplicand BR)
        +-----------+

Function of each register:

M (BR): holds the multiplicand. Added to or subtracted from the partial product depending on the recoded bit pair.
Q (QR): initially holds the multiplier; the low-order half of the product accumulates here at the end.
A (AC): accumulator, initialized to 0; holds the high-order part of the partial product.
Q₋₁: a 1-bit flip-flop appended to the right of Q. Together $(Q_0, Q_{-1})$ controls the operation.
Sequence Counter (SC): loaded with $n$ (number of multiplier bits); decremented each cycle; stops when 0.

Booth's rule (examine $Q_0 Q_{-1}$ ):

10 → $A \leftarrow A - M$
01 → $A \leftarrow A + M$
00 or 11 → no add/subtract

Then arithmetic shift right the combined $A\,Q\,Q_{-1}$ register.

(b) Compute $(-7)\times(+3)$ , 4-bit [6]

$M = +3 = 0011$ , $-M = 1101$ . Multiplier $Q = -7 = 1001$ . $A=0000$ , $Q_{-1}=0$ , $SC=4$ .

Step	Operation	A	Q	$Q_{-1}$
Init		0000	1001	0
1	$Q_0Q_{-1}=10 \Rightarrow A-M$	1101	1001	0
	ASR	1110	1100	1
2	$Q_0Q_{-1}=01 \Rightarrow A+M$	0001	1100	1
	ASR	0000	1110	0
3	$Q_0Q_{-1}=00 \Rightarrow$ no-op	0000	1110	0
	ASR	0000	0111	0
4	$Q_0Q_{-1}=10 \Rightarrow A-M$	1101	0111	0
	ASR	1110	1011	1

Result $A\,Q = 1110\,1011$ . As 8-bit 2's complement this is $-(00010101_2) = -21$ .

Check: $(-7)\times(+3) = -21.$ ✓

Marking: 6 marks for diagram + register roles; 6 marks for correct step-by-step trace reaching $-21$ .

Answer 2

(a) Microprogrammed Control Unit [7]

Block diagram (described):

Instruction opcode
        |
        v
  [ Mapping Logic ] --> [ Control Address Register (CAR) ]
                              |
                              v
                     [ Control Memory (ROM) ]
                              |
                              v
                  [ Control Data Register / micro-instruction ]
               /                              \
   control signals to datapath          next-address field
                                              |
                                              v
                                     [ Sequencer / Next-Address
                                       Generator ] --> back to CAR
                                       (uses status/condition bits)

Roles:

Control Address Register (CAR): holds the address of the next microinstruction to be read from control memory; acts like a microprogram counter (μPC).
Control Memory: a ROM/PROM that stores the microprogram — the sequence of microinstructions. Each word emits the control signals for one micro-operation plus the next-address information.
Sequencer (next-address generator): determines the next value loaded into CAR — it may increment CAR, take a branch address from the microinstruction, use mapping logic for the opcode, or use a subroutine register — selecting based on status/condition flags.

(b) Hardwired vs Microprogrammed [5]

Basis	Hardwired	Microprogrammed
Speed	Faster (pure logic)	Slower (control-memory fetch each cycle)
Flexibility	Rigid; design change needs rewiring	Flexible; change microprogram in control memory
Cost	Lower for simple, fixed control	Higher base cost; economical for complex sets
Implementation complexity	Hard to design/modify for large instruction sets	Systematic, easier to design and debug
Suited to	RISC, simple control	CISC, complex instruction sets

Marking: diagram 3, CAR/CM/sequencer roles 4; comparison table 5 (1 per basis).

Answer 3

(a) Four-Segment Instruction Pipeline [8]

The instruction cycle is split into four segments:

FI – Fetch Instruction
DA – Decode instruction & calculate effective Address
FO – Fetch Operand
EX – Execute

Space-time diagram (rows = instructions, columns = clock cycles):

Instr \ Clock	1	2	3	4	5	6	7
I1	FI	DA	FO	EX
I2		FI	DA	FO	EX
I3			FI	DA	FO	EX
I4				FI	DA	FO	EX

After the pipeline fills, one instruction completes every clock cycle.

Hazards and remedies:

Structural hazard: two segments need the same resource at once (e.g. FI and FO both access memory). Fix: separate instruction/data memories or caches (Harvard) or stall one cycle.
Data hazard: an instruction needs a result not yet produced (RAW dependency). Fix: operand forwarding/bypassing, or stall/insert bubbles, or compiler reordering.
Control (branch) hazard: a taken branch invalidates prefetched instructions. Fix: branch prediction, delayed branch, or flushing the pipeline.

(b) Numerical [6]

Non-pipelined time $T = 80$ ns. 5 stages with equal delay $= 80/5 = 16$ ns; latch overhead $= 2$ ns per stage.

Clock period $t_p = 16 + 2 = 18$ ns.
Time for 1000 instructions (pipelined): $(k + n - 1)\,t_p = (5 + 1000 - 1)\times 18 = 1004 \times 18 = 18072$ ns.
Non-pipelined time: $n\times T = 1000 \times 80 = 80000$ ns.
Speedup $S = \dfrac{80000}{18072} \approx 4.43$ .
Throughput $= \dfrac{1000}{18072\text{ ns}} \approx 55.3$ MIPS (i.e. $5.53\times10^7$ instr/s).

Marking: 8 for pipeline diagram + three hazards with fixes; 6 for clock period 18 ns, speedup ≈4.43, throughput ≈55 MIPS.

Answer 4

Main memory $= 4\text{K} = 4096 = 2^{12}$ words → 12-bit physical address. Cache $= 128$ words, block size $= 8$ words.

(a) Direct Mapping [6]

Word (offset) field: block size $= 8 = 2^3 \Rightarrow$ 3 bits.
Number of cache lines $= 128/8 = 16 = 2^4 \Rightarrow$ Line (block) field = 4 bits.
Tag field $= 12 - 4 - 3 =$ 5 bits.

Address format (12 bits):

Tag (5)	Line (4)	Word (3)

(Total $= 5+4+3 = 12$ bits. ✓)

(b) Set-Associative Mapping [6]

In set-associative mapping the cache is divided into sets, each set holding $k$ lines (k-way). A memory block maps to exactly one set (set = block number mod number-of-sets) but may occupy any line within that set. It is a compromise between direct (1 line/set) and fully associative.

For 2-way set-associative with the same cache:

Total lines $= 16$ ; lines per set $= 2 \Rightarrow$ number of sets $= 16/2 = 8 = 2^3$ .
Word field $= 3$ bits (unchanged).
Set field $= 3$ bits.
Tag field $= 12 - 3 - 3 =$ 6 bits.

Tag (6)	Set (3)	Word (3)

Advantage over direct mapping: because a block can go into any of the 2 lines of its set, fewer conflict (collision) misses occur when several blocks contend for the same line, giving a higher hit ratio.

Marking: (a) word=3, line=4, tag=5 with format — 6 marks; (b) explanation + word=3, set=3, tag=6 + one advantage — 6 marks.

Answer 5

Evaluate X = (A + B) * (C + D).

Three-address:

ADD  R1, A, B    ; R1 = A + B
ADD  R2, C, D    ; R2 = C + D
MUL  X,  R1, R2  ; X  = R1 * R2

3 instructions.

Two-address:

MOV  R1, A       ; R1 = A
ADD  R1, B       ; R1 = A + B
MOV  R2, C       ; R2 = C
ADD  R2, D       ; R2 = C + D
MUL  R1, R2      ; R1 = R1 * R2
MOV  X,  R1      ; X  = R1

6 instructions.

One-address (single accumulator AC):

LOAD  A      ; AC = A
ADD   B      ; AC = A + B
STORE T      ; T  = AC
LOAD  C      ; AC = C
ADD   D      ; AC = C + D
MUL   T      ; AC = (C+D)*(A+B)
STORE X      ; X  = AC

7 instructions.

Zero-address (stack):

PUSH A
PUSH B
ADD          ; (A+B)
PUSH C
PUSH D
ADD          ; (C+D)
MUL          ; (A+B)*(C+D)
POP  X

8 instructions.

Comment: As we move from three- to zero-address, the number of instructions increases (3 → 6 → 7 → 8) but each instruction becomes shorter (fewer/no address fields). Three-address needs the fewest instructions; the stack machine has the shortest instructions (operands implicit). Memory references for operands also differ: three-address makes the program compact, while one-/zero-address trade instruction count for smaller opcodes and simpler hardware.

Marking: ~1.5 marks per correct format program + comment on instruction count/length.

Answer 6

Addressing modes [4]

(a) Immediate: the operand itself is in the instruction (the address field is the data). E.g. MOV R1, #5 → R1 = 5. Use: loading constants/initializing registers.
(b) Register indirect: the instruction names a register that holds the address of the operand in memory. E.g. LOAD (R1) loads memory[R1]. Use: pointer/array traversal.
(c) Indexed: effective address = (base/address in instruction) + (index register). E.g. LOAD 500(R1) → EA = 500 + R1. Use: accessing array elements with a varying index.
(d) Relative: effective address = (address field) + (program counter, PC). E.g. branch targets. Use: position-independent code and branch instructions.

Computation for `LOAD 500`, PC = 200, R1 = 400, memory[500] = 800 [2]

Mode	Effective Address	Operand loaded
Direct	500	memory[500] = 800
Indexed (R1)	500 + 400 = 900	memory[900]
Relative (PC)	500 + 200 = 700	memory[700]

(For indexed and relative the operand is the value stored at EA 900 and 700 respectively; only memory[500] = 800 is given.)

Marking: 1 each for the four modes with example/use; 2 for the three effective-address computations 500/900/700.

Answer 7

(a) Programmed I/O vs Interrupt-driven I/O vs DMA [4]

Basis	Programmed I/O	Interrupt-driven I/O	DMA
CPU involvement	Highest — CPU continuously polls the status flag and moves every word (busy-wait)	Moderate — CPU does other work; device interrupts when ready, CPU services each word	Lowest — CPU only sets up the transfer; DMA controller moves the whole block
Data-transfer speed	Slowest (CPU bottleneck)	Faster than polling but per-word interrupt overhead	Fastest — block transfer directly between device and memory
Best for	Few bytes, simple devices	Devices with sporadic data	High-volume transfers (disk, network)

(b) Daisy Chaining for Interrupt Priority [2]

In daisy chaining, all devices share a single interrupt-request line to the CPU, and an acknowledge (INTA) signal is passed serially from one device to the next. Devices are connected in a chain ordered by priority — the device closest to the CPU has the highest priority. When the CPU acknowledges, the signal enters the first device; if that device requested the interrupt it places its vector on the bus and blocks the acknowledge from propagating; otherwise it passes the acknowledge to the next device. Thus priority is fixed by physical position in the chain.

Marking: 4 for the three-way comparison (CPU involvement + speed); 2 for daisy-chain priority mechanism.

Answer 8

DMA Controller Block Diagram [described]

          Address bus  <---+
          Data bus    <--->|     +------------------------+
          Read/Write  <----|     |   DMA Controller       |
                           |     |  +------------------+  |
   CPU  --BR/BG (bus req/  +-----|  | Address Register |  |
          grant)----------|     |  +------------------+  |
          DMA req/ack <---|     |  | Word-Count Reg.  |  |
   I/O device <---------->|     |  +------------------+  |
                                |  | Control Register |  |
                                |  +------------------+  |
                                +------------------------+

Key registers: an address register (points to the memory location), a word-count register (number of words remaining, decremented each transfer), and a control register (read/write, mode). The DMA also has DMA request/acknowledge lines to the device and bus request (BR)/bus grant (BG) lines to the CPU.

Cycle-Stealing Mode

In cycle stealing, the DMA controller transfers one word per bus cycle and then returns control of the bus to the CPU. It "steals" occasional memory cycles from the CPU: whenever the device is ready, DMA grabs the bus for a single cycle, moves one word, and releases it, so the CPU is only slowed slightly rather than halted. (Contrast with burst/block mode, where DMA holds the bus for the entire block.)

How DMA Gains the Bus

The I/O device asserts DMA request to the controller.
The DMA controller raises Bus Request (BR) to the CPU.
The CPU finishes its current bus cycle, floats (tri-states) its address, data and control lines, and asserts Bus Grant (BG).
The DMA controller now masters the bus, sends DMA acknowledge to the device, and performs the transfer directly between the device and memory.
On completion (word count = 0) it drops BR, the CPU regains the bus, and the DMA may raise an interrupt to signal completion.

Marking: 2 for diagram + registers; 2 for cycle stealing; 1 for BR/BG bus-acquisition sequence.

Answer 9

(a) Flynn's Classification [4]

Based on the number of concurrent instruction and data streams:

SISD (Single Instruction, Single Data): one processor executes one instruction stream on one data stream. Example: a traditional uniprocessor (classic von Neumann PC).
SIMD (Single Instruction, Multiple Data): one instruction operates on many data elements in parallel. Example: vector/array processors, GPUs.
MISD (Multiple Instruction, Single Data): many instruction streams operate on the same data; rare. Example: systolic arrays / fault-tolerant pipelined systems (e.g. Space Shuttle flight control).
MIMD (Multiple Instruction, Multiple Data): multiple processors execute different instructions on different data. Example: multicore/multiprocessor systems, clusters.

(b) Tightly vs Loosely Coupled Multiprocessors [2]

Tightly coupled	Loosely coupled
Share a common (global) main memory	Each processor has its own local memory (distributed)
Communicate through shared memory; fast	Communicate by message passing over an interconnection network; slower
Higher memory contention; smaller scale	Scales to many nodes; e.g. clusters/distributed systems

Marking: 1 per correct class + example (4); 2 for the tightly/loosely coupled distinction.

Answer 10

Stack Organization in a CPU

A stack is a Last-In-First-Out (LIFO) storage. A stack pointer (SP) register holds the address of the top of stack (TOS). Two operations: PUSH (insert) and POP (remove).

Register Stack vs Memory Stack

	Register stack	Memory stack
Storage	A fixed set of high-speed registers (finite depth)	A reserved region of main memory (RAM)
Capacity	Small, limited by hardware	Large, limited only by allocated memory
SP	Counts registers; FULL/EMPTY flags needed	An address register pointing into memory
Speed	Very fast	Slower (memory access)

PUSH / POP and SP

Register stack (SP counts up on push; with FULL & EMTY flags):

PUSH:  SP ← SP + 1
       M[SP] ← DR        ; store data at new TOS
       if (SP = 0) FULL ← 1
       EMTY ← 0
POP:   DR ← M[SP]        ; read TOS
       SP ← SP - 1
       if (SP = 0) EMTY ← 1
       FULL ← 0

Memory stack (typical: stack grows toward lower addresses):

PUSH:  SP ← SP - 1
       M[SP] ← data
POP:   data ← M[SP]
       SP ← SP + 1

Thus in a memory stack PUSH decrements SP and POP increments it (for a downward-growing stack), whereas in the register stack of Mano's model PUSH increments and POP decrements SP.

Marking: 1 for stack/LIFO + SP; 2 for register vs memory differences; 2 for PUSH/POP SP updates.

Answer 11

Virtual Memory

Virtual memory is a memory-management technique that gives a program the illusion of a very large, contiguous main memory by using main memory and secondary storage (disk) together. Programs use logical (virtual) addresses; only the actively used portions are kept in physical RAM, and the rest reside on disk, being brought in on demand. It lets programs larger than physical memory run, and provides protection/relocation.

Paging

In paging, the virtual address space is divided into fixed-size pages and physical memory into equal-size frames. A virtual address is split into a page number and a page offset:

\text{virtual address} = (\text{page number},\ \text{offset})

The page number indexes the page table to obtain a frame number; the physical address = (frame number, same offset). This maps any page to any frame, removing external fragmentation.

Page Table

The page table is a data structure (one entry per virtual page) that stores the frame number for each page plus control bits (valid/present, dirty, reference, protection). On each access the page number is used to look up the corresponding frame.

Translation Lookaside Buffer (TLB)

The page table itself lives in main memory, so a naive translation needs an extra memory access per reference. The TLB is a small, fast associative cache that stores recently used (page → frame) translations.

TLB hit: frame number obtained directly — no page-table access, fast translation.
TLB miss: the page table in memory is consulted and the new entry is loaded into the TLB.

Because of locality of reference, the TLB hit ratio is high, so address translation is sped up significantly.

Marking: 1 VM definition; 2 paging + page table; 2 TLB function and hit/miss.

Answer 12

RISC vs CISC

Basis	RISC	CISC
Instruction-set size	Small set of simple instructions	Large set, many complex instructions
Addressing modes	Few (load/store; mostly register)	Many, complex addressing modes
Instruction length	Fixed length, easy to decode/pipeline	Variable length
Use of registers	Large register file; memory accessed only by LOAD/STORE	Fewer registers; instructions can operate directly on memory
Control unit	Mostly hardwired	Mostly microprogrammed
Execution / cycles	Usually 1 instruction per cycle; pipelining friendly	Multiple cycles per instruction
Code size vs hardware	Larger code, simpler hardware	Compact code, complex hardware
Example processor	ARM, MIPS, RISC-V, SPARC	Intel x86 (8086/Pentium), Motorola 68000, VAX

Summary: RISC favors simple, uniform, fast instructions with compiler-managed registers and hardwired control; CISC packs complex operations into single instructions using microprogrammed control, reducing program size at the cost of decoding/hardware complexity.

Marking: ~0.5 per correctly contrasted attribute + 1 for correct example of each (ARM/MIPS vs x86).

Level	BE Computer Engineering (IOE, TU)
Subject	Computer Organization and Architecture (IOE, CT 603 / ENCT 303)
Year	2079 BS
Exam session	Regular (annual)
Full marks	80
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) Question Paper 2079 Nepal

Section A: Long Answer Questions

(a) Booth's Multiplication Hardware [6]

(b) Compute $(-7)\times(+3)$ , 4-bit [6]

(a) Microprogrammed Control Unit [7]

(b) Hardwired vs Microprogrammed [5]

(a) Four-Segment Instruction Pipeline [8]

(b) Numerical [6]

(a) Direct Mapping [6]

(b) Set-Associative Mapping [6]

Section B: Short Answer Questions

Addressing modes [4]

Computation for `LOAD 500`, PC = 200, R1 = 400, memory[500] = 800 [2]

(a) Programmed I/O vs Interrupt-driven I/O vs DMA [4]

(b) Daisy Chaining for Interrupt Priority [2]

DMA Controller Block Diagram [described]

Cycle-Stealing Mode

How DMA Gains the Bus

(a) Flynn's Classification [4]

(b) Tightly vs Loosely Coupled Multiprocessors [2]

Stack Organization in a CPU

Register Stack vs Memory Stack

PUSH / POP and SP

Virtual Memory

Paging

Page Table

Translation Lookaside Buffer (TLB)

RISC vs CISC

Frequently asked questions

Section A: Long Answer Questions

(a) Booth's Multiplication Hardware [6]

(b) Compute (−7)×(+3)(-7)\times(+3)(−7)×(+3), 4-bit [6]

(a) Microprogrammed Control Unit [7]

(b) Hardwired vs Microprogrammed [5]

(a) Four-Segment Instruction Pipeline [8]

(b) Numerical [6]

(a) Direct Mapping [6]

(b) Set-Associative Mapping [6]

Section B: Short Answer Questions

Addressing modes [4]

Computation for LOAD 500, PC = 200, R1 = 400, memory[500] = 800 [2]

(a) Programmed I/O vs Interrupt-driven I/O vs DMA [4]

(b) Daisy Chaining for Interrupt Priority [2]

DMA Controller Block Diagram [described]

Cycle-Stealing Mode

How DMA Gains the Bus

(a) Flynn's Classification [4]

(b) Tightly vs Loosely Coupled Multiprocessors [2]

Stack Organization in a CPU

Register Stack vs Memory Stack

PUSH / POP and SP

Virtual Memory

Paging

Page Table

Translation Lookaside Buffer (TLB)

RISC vs CISC

Frequently asked questions

(b) Compute $(-7)\times(+3)$ , 4-bit [6]

Computation for `LOAD 500`, PC = 200, R1 = 400, memory[500] = 800 [2]