Browse papers
A

Section A: Long Answer Questions

Attempt all / any as specified.

4 questions
1long12 marks

(a) Draw the hardware block diagram for the Booth's multiplication algorithm and explain the function of each register. [6]

(b) Using Booth's algorithm, multiply the signed numbers (-7) × (+3) in 4-bit 2's complement representation. Show the contents of the A, Q and Q₋₁ registers after each step. [6]

(a) Booth's Multiplication Hardware [6]

Block diagram (described in words):

        +-----------+
        |  Sequence |  (counts n bits)
        |  Counter  |
        +-----------+
              |
   +------+   +------+   +-----+
   |  A   |   |  Q   | Q-1 |
   +------+   +------+-----+
      |  Arithmetic/Logic |
      |     Unit (ALU)    |
      +-------------------+
              |
        +-----------+
        | Register M | (Multiplicand BR)
        +-----------+

Function of each register:

  • M (BR): holds the multiplicand. Added to or subtracted from the partial product depending on the recoded bit pair.
  • Q (QR): initially holds the multiplier; the low-order half of the product accumulates here at the end.
  • A (AC): accumulator, initialized to 0; holds the high-order part of the partial product.
  • Q₋₁: a 1-bit flip-flop appended to the right of Q. Together (Q0,Q1)(Q_0, Q_{-1}) controls the operation.
  • Sequence Counter (SC): loaded with nn (number of multiplier bits); decremented each cycle; stops when 0.

Booth's rule (examine Q0Q1Q_0 Q_{-1}):

  • 10AAMA \leftarrow A - M
  • 01AA+MA \leftarrow A + M
  • 00 or 11 → no add/subtract

Then arithmetic shift right the combined AQQ1A\,Q\,Q_{-1} register.

(b) Compute (7)×(+3)(-7)\times(+3), 4-bit [6]

M=+3=0011M = +3 = 0011, M=1101-M = 1101. Multiplier Q=7=1001Q = -7 = 1001. A=0000A=0000, Q1=0Q_{-1}=0, SC=4SC=4.

StepOperationAQQ1Q_{-1}
Init000010010
1Q0Q1=10AMQ_0Q_{-1}=10 \Rightarrow A-M110110010
ASR111011001
2Q0Q1=01A+MQ_0Q_{-1}=01 \Rightarrow A+M000111001
ASR000011100
3Q0Q1=00Q_0Q_{-1}=00 \Rightarrow no-op000011100
ASR000001110
4Q0Q1=10AMQ_0Q_{-1}=10 \Rightarrow A-M110101110
ASR111010111

Result AQ=11101011A\,Q = 1110\,1011. As 8-bit 2's complement this is (000101012)=21-(00010101_2) = -21.

Check: (7)×(+3)=21.(-7)\times(+3) = -21.

Marking: 6 marks for diagram + register roles; 6 marks for correct step-by-step trace reaching 21-21.

alu-organizationarithmetic-algorithms
2long12 marks

A control unit can be implemented as hardwired or microprogrammed.

(a) Draw the block diagram of a microprogrammed control unit and explain the role of the control address register (CAR), control memory and sequencer. [7]

(b) Compare hardwired and microprogrammed control units on the basis of speed, flexibility, cost and implementation complexity. [5]

(a) Microprogrammed Control Unit [7]

Block diagram (described):

Instruction opcode
        |
        v
  [ Mapping Logic ] --> [ Control Address Register (CAR) ]
                              |
                              v
                     [ Control Memory (ROM) ]
                              |
                              v
                  [ Control Data Register / micro-instruction ]
               /                              \
   control signals to datapath          next-address field
                                              |
                                              v
                                     [ Sequencer / Next-Address
                                       Generator ] --> back to CAR
                                       (uses status/condition bits)

Roles:

  • Control Address Register (CAR): holds the address of the next microinstruction to be read from control memory; acts like a microprogram counter (μPC).
  • Control Memory: a ROM/PROM that stores the microprogram — the sequence of microinstructions. Each word emits the control signals for one micro-operation plus the next-address information.
  • Sequencer (next-address generator): determines the next value loaded into CAR — it may increment CAR, take a branch address from the microinstruction, use mapping logic for the opcode, or use a subroutine register — selecting based on status/condition flags.

(b) Hardwired vs Microprogrammed [5]

BasisHardwiredMicroprogrammed
SpeedFaster (pure logic)Slower (control-memory fetch each cycle)
FlexibilityRigid; design change needs rewiringFlexible; change microprogram in control memory
CostLower for simple, fixed controlHigher base cost; economical for complex sets
Implementation complexityHard to design/modify for large instruction setsSystematic, easier to design and debug
Suited toRISC, simple controlCISC, complex instruction sets

Marking: diagram 3, CAR/CM/sequencer roles 4; comparison table 5 (1 per basis).

control-unitmicroprogrammed-control
3long14 marks

(a) Explain the four-segment instruction pipeline with a space-time diagram, and discuss the different types of pipeline hazards (structural, data and control) along with at least one technique to handle each. [8]

(b) A non-pipelined processor executes a task in 80 ns. Suppose the same task is divided into a 5-stage pipeline with equal stage delays and a clock skew/latch overhead of 2 ns per stage. Compute the clock period, the speedup and the throughput of the pipeline for executing 1000 instructions. [6]

(a) Four-Segment Instruction Pipeline [8]

The instruction cycle is split into four segments:

  1. FI – Fetch Instruction
  2. DA – Decode instruction & calculate effective Address
  3. FO – Fetch Operand
  4. EX – Execute

Space-time diagram (rows = instructions, columns = clock cycles):

Instr \ Clock1234567
I1FIDAFOEX
I2FIDAFOEX
I3FIDAFOEX
I4FIDAFOEX

After the pipeline fills, one instruction completes every clock cycle.

Hazards and remedies:

  • Structural hazard: two segments need the same resource at once (e.g. FI and FO both access memory). Fix: separate instruction/data memories or caches (Harvard) or stall one cycle.
  • Data hazard: an instruction needs a result not yet produced (RAW dependency). Fix: operand forwarding/bypassing, or stall/insert bubbles, or compiler reordering.
  • Control (branch) hazard: a taken branch invalidates prefetched instructions. Fix: branch prediction, delayed branch, or flushing the pipeline.

(b) Numerical [6]

Non-pipelined time T=80T = 80 ns. 5 stages with equal delay =80/5=16= 80/5 = 16 ns; latch overhead =2= 2 ns per stage.

  • Clock period tp=16+2=18t_p = 16 + 2 = 18 ns.
  • Time for 1000 instructions (pipelined): (k+n1)tp=(5+10001)×18=1004×18=18072(k + n - 1)\,t_p = (5 + 1000 - 1)\times 18 = 1004 \times 18 = 18072 ns.
  • Non-pipelined time: n×T=1000×80=80000n\times T = 1000 \times 80 = 80000 ns.
  • Speedup S=80000180724.43S = \dfrac{80000}{18072} \approx 4.43.
  • Throughput =100018072 ns55.3= \dfrac{1000}{18072\text{ ns}} \approx 55.3 MIPS (i.e. 5.53×1075.53\times10^7 instr/s).

Marking: 8 for pipeline diagram + three hazards with fixes; 6 for clock period 18 ns, speedup ≈4.43, throughput ≈55 MIPS.

pipelininginstruction-pipelinehazards
4long12 marks

Consider a main memory of size 4 K words and a cache of 128 words with a block size of 8 words.

(a) For direct mapping, determine the number of bits in the tag, line (block) and word fields of the CPU address, and draw the address format. [6]

(b) Explain set-associative mapping and, for a 2-way set-associative organization of the same cache, recompute the tag, set and word field widths. State one advantage it has over direct mapping. [6]

Main memory =4K=4096=212= 4\text{K} = 4096 = 2^{12} words → 12-bit physical address. Cache =128= 128 words, block size =8= 8 words.

(a) Direct Mapping [6]

  • Word (offset) field: block size =8=23= 8 = 2^3 \Rightarrow 3 bits.
  • Number of cache lines =128/8=16=24= 128/8 = 16 = 2^4 \Rightarrow Line (block) field = 4 bits.
  • Tag field =1243== 12 - 4 - 3 = 5 bits.

Address format (12 bits):

Tag (5)Line (4)Word (3)

(Total =5+4+3=12= 5+4+3 = 12 bits. ✓)

(b) Set-Associative Mapping [6]

In set-associative mapping the cache is divided into sets, each set holding kk lines (k-way). A memory block maps to exactly one set (set = block number mod number-of-sets) but may occupy any line within that set. It is a compromise between direct (1 line/set) and fully associative.

For 2-way set-associative with the same cache:

  • Total lines =16= 16; lines per set =2= 2 \Rightarrow number of sets =16/2=8=23= 16/2 = 8 = 2^3.
  • Word field =3= 3 bits (unchanged).
  • Set field =3= 3 bits.
  • Tag field =1233== 12 - 3 - 3 = 6 bits.
Tag (6)Set (3)Word (3)

Advantage over direct mapping: because a block can go into any of the 2 lines of its set, fewer conflict (collision) misses occur when several blocks contend for the same line, giving a higher hit ratio.

Marking: (a) word=3, line=4, tag=5 with format — 6 marks; (b) explanation + word=3, set=3, tag=6 + one advantage — 6 marks.

memory-hierarchycachemapping
B

Section B: Short Answer Questions

Attempt all / any as specified.

8 questions
5short6 marks

Write a program to evaluate the arithmetic statement X = (A + B) * (C + D) using three-address, two-address, one-address and zero-address (stack) instruction formats. Comment on the number of instructions and memory references required by each.

Evaluate X = (A + B) * (C + D).

Three-address:

ADD  R1, A, B    ; R1 = A + B
ADD  R2, C, D    ; R2 = C + D
MUL  X,  R1, R2  ; X  = R1 * R2

3 instructions.

Two-address:

MOV  R1, A       ; R1 = A
ADD  R1, B       ; R1 = A + B
MOV  R2, C       ; R2 = C
ADD  R2, D       ; R2 = C + D
MUL  R1, R2      ; R1 = R1 * R2
MOV  X,  R1      ; X  = R1

6 instructions.

One-address (single accumulator AC):

LOAD  A      ; AC = A
ADD   B      ; AC = A + B
STORE T      ; T  = AC
LOAD  C      ; AC = C
ADD   D      ; AC = C + D
MUL   T      ; AC = (C+D)*(A+B)
STORE X      ; X  = AC

7 instructions.

Zero-address (stack):

PUSH A
PUSH B
ADD          ; (A+B)
PUSH C
PUSH D
ADD          ; (C+D)
MUL          ; (A+B)*(C+D)
POP  X

8 instructions.

Comment: As we move from three- to zero-address, the number of instructions increases (3 → 6 → 7 → 8) but each instruction becomes shorter (fewer/no address fields). Three-address needs the fewest instructions; the stack machine has the shortest instructions (operands implicit). Memory references for operands also differ: three-address makes the program compact, while one-/zero-address trade instruction count for smaller opcodes and simpler hardware.

Marking: ~1.5 marks per correct format program + comment on instruction count/length.

instruction-set-architectureinstruction-formats
6short6 marks

Explain the following addressing modes with a suitable example for each, and state one typical use: (a) immediate, (b) register indirect, (c) indexed, and (d) relative addressing. For the instruction LOAD 500, if PC = 200, R1 = 400 and memory[500] = 800, determine the effective address and the loaded operand in direct, indexed (using R1) and relative modes.

Addressing modes [4]

  • (a) Immediate: the operand itself is in the instruction (the address field is the data). E.g. MOV R1, #5 → R1 = 5. Use: loading constants/initializing registers.
  • (b) Register indirect: the instruction names a register that holds the address of the operand in memory. E.g. LOAD (R1) loads memory[R1]. Use: pointer/array traversal.
  • (c) Indexed: effective address = (base/address in instruction) + (index register). E.g. LOAD 500(R1) → EA = 500 + R1. Use: accessing array elements with a varying index.
  • (d) Relative: effective address = (address field) + (program counter, PC). E.g. branch targets. Use: position-independent code and branch instructions.

Computation for LOAD 500, PC = 200, R1 = 400, memory[500] = 800 [2]

ModeEffective AddressOperand loaded
Direct500memory[500] = 800
Indexed (R1)500 + 400 = 900memory[900]
Relative (PC)500 + 200 = 700memory[700]

(For indexed and relative the operand is the value stored at EA 900 and 700 respectively; only memory[500] = 800 is given.)

Marking: 1 each for the four modes with example/use; 2 for the three effective-address computations 500/900/700.

addressing-modes
7short6 marks

(a) Differentiate between programmed I/O, interrupt-driven I/O and DMA on the basis of CPU involvement and data transfer speed. [4]

(b) Explain how daisy chaining is used for establishing interrupt priority among multiple I/O devices. [2]

(a) Programmed I/O vs Interrupt-driven I/O vs DMA [4]

BasisProgrammed I/OInterrupt-driven I/ODMA
CPU involvementHighest — CPU continuously polls the status flag and moves every word (busy-wait)Moderate — CPU does other work; device interrupts when ready, CPU services each wordLowest — CPU only sets up the transfer; DMA controller moves the whole block
Data-transfer speedSlowest (CPU bottleneck)Faster than polling but per-word interrupt overheadFastest — block transfer directly between device and memory
Best forFew bytes, simple devicesDevices with sporadic dataHigh-volume transfers (disk, network)

(b) Daisy Chaining for Interrupt Priority [2]

In daisy chaining, all devices share a single interrupt-request line to the CPU, and an acknowledge (INTA) signal is passed serially from one device to the next. Devices are connected in a chain ordered by priority — the device closest to the CPU has the highest priority. When the CPU acknowledges, the signal enters the first device; if that device requested the interrupt it places its vector on the bus and blocks the acknowledge from propagating; otherwise it passes the acknowledge to the next device. Thus priority is fixed by physical position in the chain.

Marking: 4 for the three-way comparison (CPU involvement + speed); 2 for daisy-chain priority mechanism.

io-organizationinterrupts
8short5 marks

Draw the block diagram of a DMA controller and explain the cycle stealing mode of data transfer. How does the DMA controller gain control of the system bus from the CPU?

DMA Controller Block Diagram [described]

          Address bus  <---+
          Data bus    <--->|     +------------------------+
          Read/Write  <----|     |   DMA Controller       |
                           |     |  +------------------+  |
   CPU  --BR/BG (bus req/  +-----|  | Address Register |  |
          grant)----------|     |  +------------------+  |
          DMA req/ack <---|     |  | Word-Count Reg.  |  |
   I/O device <---------->|     |  +------------------+  |
                                |  | Control Register |  |
                                |  +------------------+  |
                                +------------------------+

Key registers: an address register (points to the memory location), a word-count register (number of words remaining, decremented each transfer), and a control register (read/write, mode). The DMA also has DMA request/acknowledge lines to the device and bus request (BR)/bus grant (BG) lines to the CPU.

Cycle-Stealing Mode

In cycle stealing, the DMA controller transfers one word per bus cycle and then returns control of the bus to the CPU. It "steals" occasional memory cycles from the CPU: whenever the device is ready, DMA grabs the bus for a single cycle, moves one word, and releases it, so the CPU is only slowed slightly rather than halted. (Contrast with burst/block mode, where DMA holds the bus for the entire block.)

How DMA Gains the Bus

  1. The I/O device asserts DMA request to the controller.
  2. The DMA controller raises Bus Request (BR) to the CPU.
  3. The CPU finishes its current bus cycle, floats (tri-states) its address, data and control lines, and asserts Bus Grant (BG).
  4. The DMA controller now masters the bus, sends DMA acknowledge to the device, and performs the transfer directly between the device and memory.
  5. On completion (word count = 0) it drops BR, the CPU regains the bus, and the DMA may raise an interrupt to signal completion.

Marking: 2 for diagram + registers; 2 for cycle stealing; 1 for BR/BG bus-acquisition sequence.

io-organizationdma
9short6 marks

(a) State Flynn's classification of computer architectures (SISD, SIMD, MISD, MIMD) with one example of each. [4]

(b) Differentiate between tightly coupled and loosely coupled multiprocessor systems. [2]

(a) Flynn's Classification [4]

Based on the number of concurrent instruction and data streams:

  • SISD (Single Instruction, Single Data): one processor executes one instruction stream on one data stream. Example: a traditional uniprocessor (classic von Neumann PC).
  • SIMD (Single Instruction, Multiple Data): one instruction operates on many data elements in parallel. Example: vector/array processors, GPUs.
  • MISD (Multiple Instruction, Single Data): many instruction streams operate on the same data; rare. Example: systolic arrays / fault-tolerant pipelined systems (e.g. Space Shuttle flight control).
  • MIMD (Multiple Instruction, Multiple Data): multiple processors execute different instructions on different data. Example: multicore/multiprocessor systems, clusters.

(b) Tightly vs Loosely Coupled Multiprocessors [2]

Tightly coupledLoosely coupled
Share a common (global) main memoryEach processor has its own local memory (distributed)
Communicate through shared memory; fastCommunicate by message passing over an interconnection network; slower
Higher memory contention; smaller scaleScales to many nodes; e.g. clusters/distributed systems

Marking: 1 per correct class + example (4); 2 for the tightly/loosely coupled distinction.

parallel-processingflynn-classification
10short5 marks

Explain the organization of a stack in a CPU. Differentiate between a register stack and a memory stack, and show how the PUSH and POP operations modify the stack pointer (SP) in each case.

Stack Organization in a CPU

A stack is a Last-In-First-Out (LIFO) storage. A stack pointer (SP) register holds the address of the top of stack (TOS). Two operations: PUSH (insert) and POP (remove).

Register Stack vs Memory Stack

Register stackMemory stack
StorageA fixed set of high-speed registers (finite depth)A reserved region of main memory (RAM)
CapacitySmall, limited by hardwareLarge, limited only by allocated memory
SPCounts registers; FULL/EMPTY flags neededAn address register pointing into memory
SpeedVery fastSlower (memory access)

PUSH / POP and SP

Register stack (SP counts up on push; with FULL & EMTY flags):

PUSH:  SP ← SP + 1
       M[SP] ← DR        ; store data at new TOS
       if (SP = 0) FULL ← 1
       EMTY ← 0
POP:   DR ← M[SP]        ; read TOS
       SP ← SP - 1
       if (SP = 0) EMTY ← 1
       FULL ← 0

Memory stack (typical: stack grows toward lower addresses):

PUSH:  SP ← SP - 1
       M[SP] ← data
POP:   data ← M[SP]
       SP ← SP + 1

Thus in a memory stack PUSH decrements SP and POP increments it (for a downward-growing stack), whereas in the register stack of Mano's model PUSH increments and POP decrements SP.

Marking: 1 for stack/LIFO + SP; 2 for register vs memory differences; 2 for PUSH/POP SP updates.

cpu-organizationregister-organization
11short5 marks

Define virtual memory. Explain paging as an address-mapping technique and describe the function of the page table and the translation lookaside buffer (TLB) in speeding up address translation.

Virtual Memory

Virtual memory is a memory-management technique that gives a program the illusion of a very large, contiguous main memory by using main memory and secondary storage (disk) together. Programs use logical (virtual) addresses; only the actively used portions are kept in physical RAM, and the rest reside on disk, being brought in on demand. It lets programs larger than physical memory run, and provides protection/relocation.

Paging

In paging, the virtual address space is divided into fixed-size pages and physical memory into equal-size frames. A virtual address is split into a page number and a page offset:

virtual address=(page number, offset)\text{virtual address} = (\text{page number},\ \text{offset})

The page number indexes the page table to obtain a frame number; the physical address = (frame number, same offset). This maps any page to any frame, removing external fragmentation.

Page Table

The page table is a data structure (one entry per virtual page) that stores the frame number for each page plus control bits (valid/present, dirty, reference, protection). On each access the page number is used to look up the corresponding frame.

Translation Lookaside Buffer (TLB)

The page table itself lives in main memory, so a naive translation needs an extra memory access per reference. The TLB is a small, fast associative cache that stores recently used (page → frame) translations.

  • TLB hit: frame number obtained directly — no page-table access, fast translation.
  • TLB miss: the page table in memory is consulted and the new entry is loaded into the TLB.

Because of locality of reference, the TLB hit ratio is high, so address translation is sped up significantly.

Marking: 1 VM definition; 2 paging + page table; 2 TLB function and hit/miss.

memory-hierarchyvirtual-memory
12short4 marks

Compare RISC and CISC architectures with respect to instruction set size, addressing modes, instruction length, use of registers and control unit implementation. Give one example processor of each type.

RISC vs CISC

BasisRISCCISC
Instruction-set sizeSmall set of simple instructionsLarge set, many complex instructions
Addressing modesFew (load/store; mostly register)Many, complex addressing modes
Instruction lengthFixed length, easy to decode/pipelineVariable length
Use of registersLarge register file; memory accessed only by LOAD/STOREFewer registers; instructions can operate directly on memory
Control unitMostly hardwiredMostly microprogrammed
Execution / cyclesUsually 1 instruction per cycle; pipelining friendlyMultiple cycles per instruction
Code size vs hardwareLarger code, simpler hardwareCompact code, complex hardware
Example processorARM, MIPS, RISC-V, SPARCIntel x86 (8086/Pentium), Motorola 68000, VAX

Summary: RISC favors simple, uniform, fast instructions with compiler-managed registers and hardwired control; CISC packs complex operations into single instructions using microprogrammed control, reducing program size at the cost of decoding/hardware complexity.

Marking: ~0.5 per correctly contrasted attribute + 1 for correct example of each (ARM/MIPS vs x86).

instruction-set-architecturerisc-cisc

Frequently asked questions

Where can I find the BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) question paper 2079?
The full BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 paper come with solutions?
Yes. Every question on this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 paper?
The BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2079 paper carries 80 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper free?
Yes — reading and attempting this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper on Kekkei is completely free.