Browse papers
A

Section A: Long Answer Questions

Attempt all / any as specified.

4 questions
1long12 marks

(a) Differentiate between hardwired control unit and microprogrammed control unit on the basis of speed, flexibility, implementation cost and ease of modification. [6]

(b) Design a hardwired control unit for a simple processor. Draw the block diagram showing the instruction register, sequence counter, instruction decoder and control logic gates, and explain how the control signals are generated for a fetch cycle. [6]

(a) Hardwired vs Microprogrammed Control Unit

BasisHardwired Control UnitMicroprogrammed Control Unit
SpeedFaster — control signals are produced directly by combinational logic, so no extra memory fetch is needed.Slower — each control word (microinstruction) must be fetched from the control memory before signals are issued.
FlexibilityInflexible — implemented as fixed gates, flip-flops and decoders.Highly flexible — the instruction set/behaviour can be changed simply by rewriting the microprogram.
Implementation costCheaper for a small instruction set, but design/wiring cost rises sharply as complexity grows.More economical for complex/large instruction sets because the control logic is replaced by a regular control ROM.
Ease of modificationDifficult — any change requires redesigning the hardware circuit.Easy — modify or add microroutines in control memory; no hardware rewiring.

In short, hardwired control suits simple, high-speed RISC machines, whereas microprogrammed control suits complex CISC machines where ease of design and modification matter more than raw speed.

(b) Design of a Hardwired Control Unit (Fetch Cycle)

Block diagram (described):

        +------------------+
        | Instruction Reg  |  (IR) -- holds current opcode
        +--------+---------+
                 | opcode bits
                 v
        +------------------+
        | Instruction      |
        | Decoder (n->2^n) |  --> D0, D1, ... Dk decoded outputs
        +------------------+
                 |
  +-----------+   |   +------------------+
  | Sequence  |---+-->|  Control Logic   |--> control signals
  | Counter   |       |  Gates (AND/OR)  |    (Read, Write, LD-IR,
  | (SC) T0..Tn       +------------------+     INR-PC, bus select...)
  +-----------+              ^
        ^                    |
        | timing             | status flags / clock
      Clock

Components

  • Instruction Register (IR): stores the opcode of the fetched instruction.
  • Instruction Decoder: decodes the opcode into individual operation lines D0,D1,D_0,D_1,\dots.
  • Sequence Counter (SC): a counter decoded into timing signals T0,T1,T2,T_0,T_1,T_2,\dots that mark successive clock steps; it is incremented each clock and cleared at the end of a cycle.
  • Control Logic Gates: combinational AND/OR gates that combine the decoded operation lines, the timing signals and status flags to produce the actual control signals.

How control signals are generated for the fetch cycle

The fetch cycle is implemented over timing steps T0,T1,T2T_0, T_1, T_2:

T0:  MARPCT_0: \; MAR \leftarrow PC T1:  MBRM[MAR],PCPC+1T_1: \; MBR \leftarrow M[MAR], \quad PC \leftarrow PC + 1 T2:  IRMBRT_2: \; IR \leftarrow MBR

Each control signal is the logical product of the relevant timing line. For example, the signal that loads the IR is asserted by the gate term T2T_2, the memory-read signal by T1T_1, and the PC-increment signal by T1T_1. Because these depend only on the timing counter (and not yet on the opcode), the fetch logic is fixed combinational hardware — exactly the hardwired approach. After T2T_2, control passes to the decode/execute steps where the decoded opcode lines DiD_i are ANDed with later timing signals to generate execution control signals.

control-unitmicroprogrammed-control
2long12 marks

(a) Explain the principle of locality of reference and how the memory hierarchy exploits it to bridge the speed gap between the CPU and main memory. [4]

(b) A computer has a 256 KB main memory and a 2 KB cache with a block size of 16 bytes. Show the address format (number of tag, line/set and word bits) for (i) direct mapping and (ii) 4-way set-associative mapping. [8]

(a) Locality of Reference and the Memory Hierarchy

Locality of reference is the observed tendency of programs to access a relatively small portion of their address space at any given time. It has two forms:

  • Temporal locality: a memory location that has been referenced recently is likely to be referenced again soon (e.g. loop variables, instructions inside a loop).
  • Spatial locality: if a location is referenced, nearby locations are likely to be referenced soon (e.g. sequential instruction fetch, array traversal).

How the hierarchy exploits it: Memory is arranged in levels — registers → cache (SRAM) → main memory (DRAM) → secondary storage — with speed and cost decreasing down the hierarchy. By keeping recently used (temporal) data and whole blocks of neighbouring (spatial) data in the small, fast upper levels, most accesses are served quickly from cache, so the CPU rarely waits for slow main memory. This makes the effective access time close to that of the fast level while the capacity is close to that of the slow level, bridging the CPU–memory speed gap.

(b) Address Format

Given: Main memory = 256 KB =218= 2^{18} bytes → physical address = 18 bits. Cache = 2 KB =211= 2^{11} bytes. Block size = 16 bytes =24= 2^{4}word/offset = 4 bits.

Number of cache blocks (lines) =2KB16B=21124=27=128= \dfrac{2\text{KB}}{16\text{B}} = \dfrac{2^{11}}{2^{4}} = 2^{7} = 128 lines.

(i) Direct Mapping

  • Line bits =log2128=7= \log_2 128 = 7 bits
  • Word bits =4= 4 bits
  • Tag bits =1874=7= 18 - 7 - 4 = 7 bits
TagLineWord
774

(ii) 4-way Set-Associative Mapping

Number of sets =linesways=1284=32=25= \dfrac{\text{lines}}{\text{ways}} = \dfrac{128}{4} = 32 = 2^{5}.

  • Set bits =log232=5= \log_2 32 = 5 bits
  • Word bits =4= 4 bits
  • Tag bits =1854=9= 18 - 5 - 4 = 9 bits
TagSetWord
954
cache-memorymemory-hierarchy
3long12 marks

(a) What is instruction pipelining? Describe the different segments of a four-stage instruction pipeline (FI, DA, FO, EX) with the help of a space-time diagram. [6]

(b) Define pipeline hazards. Explain data hazards and control hazards, and discuss any two techniques used to handle the control hazard caused by branch instructions. [6]

(a) Instruction Pipelining and the Four-Stage Pipeline

Instruction pipelining is a technique that overlaps the execution of successive instructions by dividing the instruction cycle into independent stages, each handled by a dedicated hardware segment. While one instruction is in one stage, the next instruction occupies the previous stage, so several instructions are processed simultaneously, increasing throughput.

Four segments:

  1. FI (Fetch Instruction): fetch the instruction from memory into the instruction register.
  2. DA (Decode & Calculate Address): decode the opcode and compute the effective operand address.
  3. FO (Fetch Operand): read the operand from memory/registers.
  4. EX (Execute): perform the operation in the ALU and store the result.

Space–time diagram (rows = pipeline stages, columns = clock cycles); I1,I2,I3,I4I_1,I_2,I_3,I_4 are instructions:

Stage \ Clock1234567
FII1I2I3I4
DAI1I2I3I4
FOI1I2I3I4
EXI1I2I3I4

Without a pipeline, 4 instructions ×\times 4 stages =16= 16 cycles; with the pipeline they finish in 4+(41)=74 + (4-1) = 7 cycles. In general nn instructions through a kk-stage pipeline take k+(n1)k + (n-1) cycles, giving a speedup approaching kk for large nn.

(b) Pipeline Hazards

A pipeline hazard is a situation that prevents the next instruction from executing in its designated clock cycle, stalling the pipeline.

  • Data hazard: arises when an instruction depends on the result of a previous instruction that has not yet completed (e.g. ADD R1,R2,R3 followed by SUB R4,R1,R5). The operand of the second instruction is not ready, causing a read-after-write dependency.
  • Control (branch) hazard: arises from branch/jump instructions. Until the branch outcome is known, the pipeline does not know which instructions to fetch next, so wrongly-fetched instructions may have to be discarded.

Two techniques to handle control hazards:

  1. Branch prediction: the hardware predicts whether the branch will be taken (static, e.g. always-not-taken, or dynamic using a branch-history/prediction table) and prefetches accordingly. If the prediction is correct, no penalty occurs; if wrong, the speculatively fetched instructions are flushed.
  2. Delayed branch: the instruction(s) immediately following the branch (the branch delay slot) are always executed regardless of the branch outcome. The compiler fills the slot with a useful, independent instruction, so the pipeline does no wasted work. (Other valid techniques: branch-target buffer/prefetching both paths, or pipeline stall/flush.)
pipelininginstruction-pipeline
4long12 marks

(a) Draw the hardware block diagram for Booth's multiplication algorithm and explain its operation. [6]

(b) Multiply the signed numbers (-5) and (+3) using Booth's algorithm, showing the contents of the registers (A, Q, Q₋₁) at each step. [6]

(a) Booth's Multiplication — Hardware and Operation

Hardware block diagram (described): A multiplicand register M (n bits), a register pair A (accumulator, initially 0) and Q (holds the multiplier), an extra 1-bit register Q₋₁ appended to the right of Q, an n-bit ALU/adder–subtractor feeding A, a control/sequence counter holding the bit count, and a combined right-shift path over the A-Q-Q₋₁ register set.

        +----+        +-------------------+
        | M  |------->|  Adder/Subtractor |---> A
        +----+        +---------+---------+
                                ^
                                |
   [ A | Q | Q-1 ]  <-- arithmetic right shift
        |   |   |
        +---+---+--> control examines Q0,Q-1

Operation: Booth's algorithm multiplies two signed (2's-complement) numbers by examining the pair of bits (Q0,Q1)(Q_0, Q_{-1}) at each step:

  • 0000 or 1111 → no arithmetic, just shift.
  • 0101AA+MA \leftarrow A + M (end of a string of 1s), then shift.
  • 1010AAMA \leftarrow A - M (start of a string of 1s), then shift.

After each step, an arithmetic right shift is applied to the combined register AQQ1A\,Q\,Q_{-1} (sign bit preserved). The process repeats nn times (n = number of multiplier bits). The final 2n2n-bit product appears in A:QA:Q. It handles signed numbers directly and skips runs of identical bits, making it efficient.

(b) Multiply (-5) × (+3) using Booth's Algorithm

Use 4-bit 2's-complement (n=4n=4).

  • Multiplicand M=+3=0011M = +3 = 0011, M=1101-M = 1101.
  • Multiplier Q=5=1011Q = -5 = 1011.
  • A=0000A = 0000, Q1=0Q_{-1}=0, count =4=4.
StepOperationAQQ₋₁
Init000010110
1Q0Q1=10A=AMQ_0Q_{-1}=10 \Rightarrow A=A-M110110110
ASR111011011
2Q0Q1=11Q_0Q_{-1}=11 \Rightarrow shift111011011
ASR111101101
3Q0Q1=01A=A+MQ_0Q_{-1}=01 \Rightarrow A=A+M001001101
ASR000100110
4Q0Q1=10A=AMQ_0Q_{-1}=10 \Rightarrow A=A-M111000110
ASR111100011

Result: A:Q=11110001A:Q = 1111\,0001 (8-bit 2's-complement).

This equals (00001111)=15-(0000\,1111) = -15.

Check: (5)×(+3)=15(-5)\times(+3) = -15. ✓

alu-organizationarithmetic-algorithms
B

Section B: Short Answer Questions

Attempt all / any as specified.

8 questions
5short6 marks

Explain the following addressing modes with a suitable example of each, and state one practical use of each: (a) immediate, (b) register indirect, (c) indexed, (d) relative addressing mode.

Addressing Modes

(a) Immediate addressing: the operand itself is contained in the instruction (no memory reference for the operand).

  • Example: MOV R1, #5 → loads the constant 5 into R1.
  • Use: initialising registers with constants and defining counters/loop limits.

(b) Register indirect addressing: the instruction names a register that holds the address of the operand; the operand is in memory at that address.

  • Example: MOV R1, (R2) → R2 holds an address, the data at that address is loaded into R1.
  • Use: accessing data through pointers and traversing data structures (linked lists, arrays).

(c) Indexed addressing: the effective address == base address in the instruction ++ contents of an index register: EA=A+(Index)EA = A + (Index).

  • Example: LDA A(X) → loads M[A+X]M[A + X].
  • Use: sequential access to array/table elements by incrementing the index register.

(d) Relative addressing: the effective address == contents of the Program Counter ++ the (signed) address field: EA=(PC)+AEA = (PC) + A.

  • Example: BRANCH +6 → jump to a location 6 words ahead of the current PC.
  • Use: writing position-independent (relocatable) code, especially branch/jump targets.
addressing-modes
6short6 marks

Differentiate between RISC and CISC architectures with respect to instruction format, addressing modes, execution and the use of registers. Give one example processor of each type.

RISC vs CISC

FeatureRISCCISC
Instruction formatFixed-length, uniform, simple instructions.Variable-length, complex instructions of differing sizes.
Addressing modesFew, simple addressing modes (mostly load/store).Many, complex addressing modes.
ExecutionMostly single-cycle execution; register-to-register operations; memory accessed only by LOAD/STORE; heavily pipelined; control is usually hardwired.Multi-cycle instructions; instructions can operate directly on memory operands; control is usually microprogrammed.
Use of registersLarge set of general-purpose registers; operands kept in registers to reduce memory traffic.Fewer general-purpose registers; relies more on memory.
Code size / compilerLarger code, more instructions, more work for the compiler.More compact code, more work done in hardware.

Example processors: RISC — ARM / MIPS; CISC — Intel x86 (8086/Pentium).

instruction-set-architecture
7short6 marks

What is an interrupt-driven I/O? With the help of a flowchart, explain how the CPU handles an interrupt request issued by an I/O device, and compare it with programmed I/O in terms of CPU utilization.

Interrupt-Driven I/O

Definition: In interrupt-driven I/O, the CPU does not continuously poll the device. Instead, when an I/O device becomes ready, it raises an interrupt request to the CPU. The CPU finishes its current instruction, suspends the running program, and executes an Interrupt Service Routine (ISR) to transfer the data, then resumes the interrupted program.

Flowchart of CPU handling an interrupt (described):

   Execute current instruction
              |
              v
   Check interrupt line at end of instruction
              |
        Interrupt? --No--> continue normal execution
              | Yes
              v
   Save PC and processor status (push onto stack)
              |
              v
   Acknowledge interrupt / identify device (vector)
              |
              v
   Load PC with ISR start address; execute ISR (transfer data)
              |
              v
   Restore PC and status (pop from stack)
              |
              v
   Resume interrupted program

Comparison with programmed I/O (CPU utilisation): In programmed I/O the CPU repeatedly tests the device status flag in a busy-wait loop, so it is fully occupied waiting and cannot do useful work while the slow device prepares data — wasting CPU time. In interrupt-driven I/O the CPU is interrupted only when the device is actually ready, so between requests it is free to run other programs. This gives far better CPU utilisation and overlap of computation with I/O, at the small cost of interrupt-handling overhead.

io-organizationinterrupt
8short6 marks

Explain the working of Direct Memory Access (DMA) with a block diagram. Describe the cycle-stealing and burst transfer modes of DMA operation.

Direct Memory Access (DMA)

Working: DMA allows an I/O device to transfer a block of data directly to/from main memory without continuous CPU involvement. A DMA controller (DMAC) takes over the system bus from the CPU to perform the transfer. The CPU only initialises the controller by loading: the memory address register (starting address), the word/byte count register (block size), and the control register (read/write direction, start). The DMAC then:

  1. Requests the bus by asserting HOLD (bus request) to the CPU.
  2. The CPU completes its current bus cycle and replies with HLDA (bus grant), releasing the bus.
  3. The DMAC transfers data directly between the device and memory, incrementing the address and decrementing the count after each word.
  4. When the count reaches zero, the DMAC releases the bus and raises an interrupt to inform the CPU that the transfer is complete.

Block diagram (described):

  CPU <--HOLD/HLDA--> DMA Controller <--DMA req/ack--> I/O Device
   |                       |                              
   +------ Address / Data / Control Bus ------+----- Main Memory

The DMA controller contains an address register, a count register, a control/status register and data lines, and connects to the bus alongside the CPU and memory.

Transfer modes:

  • Cycle stealing: the DMAC transfers one word at a time, "stealing" a single bus cycle from the CPU for each word and then returning bus control to the CPU. The CPU is slowed slightly but can still execute between transfers. Suitable for slower devices.
  • Burst (block) transfer: the DMAC holds the bus and transfers the entire block of words in one continuous burst before releasing the bus. This is fast and efficient for high-speed devices (e.g. disk), but the CPU is completely halted from using the bus for the duration of the burst.
io-organizationdma
9short6 marks

State and explain Flynn's classification of parallel computer architectures (SISD, SIMD, MISD, MIMD) with a representative example or application of each category.

Flynn's Classification

Flynn classifies computer architectures by the multiplicity of instruction streams and data streams:

1. SISD (Single Instruction, Single Data): one processing unit executes a single instruction stream on a single data stream. This is the classical sequential (von Neumann) uniprocessor; instruction-level pipelining may be used. Example: a traditional single-core PC / classic CPU.

2. SIMD (Single Instruction, Multiple Data): one control unit issues the same instruction to many processing elements, each operating on its own data. Ideal for data-parallel work. Example: vector/array processors, GPUs; applications like image processing and matrix operations.

3. MISD (Multiple Instruction, Single Data): many instruction streams act on the same data stream. It is mostly theoretical and rarely built. Example/application: fault-tolerant systems where several units process the same data and results are compared (e.g. systolic arrays are sometimes cited).

4. MIMD (Multiple Instruction, Multiple Data): multiple processors each execute their own instruction stream on their own data stream independently. Most modern parallel systems fall here. Example: multicore processors, multiprocessors and distributed clusters; general-purpose parallel/distributed computing.

parallel-processingflynn-taxonomy
10short6 marks

(a) List the major registers of a basic computer CPU and state the function of the Program Counter (PC), Memory Address Register (MAR) and Instruction Register (IR). [3]

(b) Write the register transfer microoperations for the fetch phase of an instruction. [3]

(a) Major CPU Registers

Major registers of a basic computer CPU: PC (Program Counter), MAR/AR (Memory Address Register), MBR/DR (Memory Buffer/Data Register), IR (Instruction Register), AC (Accumulator), TR (Temporary Register), and the general-purpose registers.

  • Program Counter (PC): holds the memory address of the next instruction to be fetched; it is incremented after each fetch.
  • Memory Address Register (MAR): holds the address of the memory location to be read from or written to; it is connected to the address bus.
  • Instruction Register (IR): holds the current instruction (opcode) being decoded and executed.

(b) Fetch-Phase Microoperations

The register-transfer microoperations for the fetch phase, over timing steps T0,T1,T2T_0, T_1, T_2:

T0:MARPCT_0: \quad MAR \leftarrow PC T1:MBRM[MAR],PCPC+1T_1: \quad MBR \leftarrow M[MAR], \quad PC \leftarrow PC + 1 T2:IRMBRT_2: \quad IR \leftarrow MBR

That is, the PC supplies the address, the addressed memory word is read into the buffer while the PC is incremented to point to the next instruction, and finally the instruction is copied into the IR for decoding.

cpu-organizationregister-transfer
11short6 marks

A processor has a cache with a hit ratio of 0.92. The cache access time is 10 ns and the main memory access time is 100 ns. (a) Compute the average memory access time. [3] (b) Explain the write-through and write-back cache write policies and state one advantage of each. [3]

(a) Average Memory Access Time

Given hit ratio h=0.92h = 0.92, cache time tc=10t_c = 10 ns, memory time tm=100t_m = 100 ns.

Using the miss-penalty model where a miss costs cache time plus memory time:

Tavg=htc+(1h)(tc+tm)T_{avg} = h \cdot t_c + (1-h)\,(t_c + t_m) Tavg=0.92(10)+0.08(10+100)=9.2+0.08(110)=9.2+8.8=18 nsT_{avg} = 0.92(10) + 0.08(10 + 100) = 9.2 + 0.08(110) = 9.2 + 8.8 = \mathbf{18\ ns}

(If the simpler model Tavg=htc+(1h)tmT_{avg} = h\,t_c + (1-h)\,t_m is assumed: 0.92(10)+0.08(100)=9.2+8=17.20.92(10) + 0.08(100) = 9.2 + 8 = 17.2 ns.)

(b) Write Policies

  • Write-through: every write updates both the cache and main memory at the same time. Advantage: main memory always holds the current (consistent) data, making it simple and reliable, which is important in multiprocessor/DMA environments.
  • Write-back (copy-back): a write updates only the cache; the modified ("dirty") block is written to main memory only when it is evicted. Advantage: far less memory traffic, since multiple writes to the same block cause only one memory update — giving higher performance.
cache-memorycache-performance
12short6 marks

Write short notes on any two of the following: (a) Vector processing and array processors, (b) Hypercube interconnection network, (c) Amdahl's law and its significance in parallel computing.

Short Notes (any two)

(a) Vector Processing and Array Processors

A vector processor has instructions that operate on entire vectors (one-dimensional arrays) of data with a single instruction, using deeply pipelined arithmetic units so successive element operations overlap. It exploits data parallelism and reduces instruction-fetch/loop overhead (one vector instruction replaces an inner loop). An array processor is an SIMD machine consisting of many identical processing elements (PEs), each with its own local memory, all executing the same instruction broadcast by a single control unit but on different data. Vector processors pipeline operations in time; array processors replicate hardware in space. Both are used for scientific/matrix computations, signal/image processing and simulations.

(b) Hypercube Interconnection Network

A hypercube of dimension nn connects N=2nN = 2^{n} nodes, where each node has a unique nn-bit address. Two nodes are directly connected iff their binary addresses differ in exactly one bit. Hence each node has exactly nn neighbours (degree =n=log2N= n = \log_2 N), and the maximum distance (diameter) between any two nodes is n=log2Nn = \log_2 N, since routing simply flips the differing address bits one at a time. It offers good scalability, low diameter and multiple redundant paths (fault tolerance), and is widely used in massively parallel computers (e.g. nCUBE).

(c) Amdahl's Law

Amdahl's law gives the maximum speedup achievable by parallelising a program with pp processors when a fraction ff of the work is inherently sequential:

Speedup=1f+1fp\text{Speedup} = \frac{1}{f + \dfrac{1-f}{p}}

As pp \to \infty, the speedup is bounded by 1f\dfrac{1}{f}. Significance: the serial fraction limits the benefit of adding processors — even a small non-parallelisable portion caps the attainable speedup (e.g. if f=0.1f = 0.1, maximum speedup is only 10×10\times no matter how many processors are used). It guides designers to minimise the sequential part rather than just adding hardware.

parallel-processinginterconnection-network

Frequently asked questions

Where can I find the BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) question paper 2078?
The full BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2078 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2078 paper come with solutions?
Yes. Every question on this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2078 paper?
The BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2078 paper carries 80 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper free?
Yes — reading and attempting this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper on Kekkei is completely free.