BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) Question Paper 2078 Nepal
This is the official BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) question paper for 2078, as set in the regular annual examination. It carries 80 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) exam or solving previous years' question papers, this 2078 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt all / any as specified.
(a) Differentiate between hardwired control unit and microprogrammed control unit on the basis of speed, flexibility, implementation cost and ease of modification. [6]
(b) Design a hardwired control unit for a simple processor. Draw the block diagram showing the instruction register, sequence counter, instruction decoder and control logic gates, and explain how the control signals are generated for a fetch cycle. [6]
(a) Hardwired vs Microprogrammed Control Unit
| Basis | Hardwired Control Unit | Microprogrammed Control Unit |
|---|---|---|
| Speed | Faster — control signals are produced directly by combinational logic, so no extra memory fetch is needed. | Slower — each control word (microinstruction) must be fetched from the control memory before signals are issued. |
| Flexibility | Inflexible — implemented as fixed gates, flip-flops and decoders. | Highly flexible — the instruction set/behaviour can be changed simply by rewriting the microprogram. |
| Implementation cost | Cheaper for a small instruction set, but design/wiring cost rises sharply as complexity grows. | More economical for complex/large instruction sets because the control logic is replaced by a regular control ROM. |
| Ease of modification | Difficult — any change requires redesigning the hardware circuit. | Easy — modify or add microroutines in control memory; no hardware rewiring. |
In short, hardwired control suits simple, high-speed RISC machines, whereas microprogrammed control suits complex CISC machines where ease of design and modification matter more than raw speed.
(b) Design of a Hardwired Control Unit (Fetch Cycle)
Block diagram (described):
+------------------+
| Instruction Reg | (IR) -- holds current opcode
+--------+---------+
| opcode bits
v
+------------------+
| Instruction |
| Decoder (n->2^n) | --> D0, D1, ... Dk decoded outputs
+------------------+
|
+-----------+ | +------------------+
| Sequence |---+-->| Control Logic |--> control signals
| Counter | | Gates (AND/OR) | (Read, Write, LD-IR,
| (SC) T0..Tn +------------------+ INR-PC, bus select...)
+-----------+ ^
^ |
| timing | status flags / clock
Clock
Components
- Instruction Register (IR): stores the opcode of the fetched instruction.
- Instruction Decoder: decodes the opcode into individual operation lines .
- Sequence Counter (SC): a counter decoded into timing signals that mark successive clock steps; it is incremented each clock and cleared at the end of a cycle.
- Control Logic Gates: combinational AND/OR gates that combine the decoded operation lines, the timing signals and status flags to produce the actual control signals.
How control signals are generated for the fetch cycle
The fetch cycle is implemented over timing steps :
Each control signal is the logical product of the relevant timing line. For example, the signal that loads the IR is asserted by the gate term , the memory-read signal by , and the PC-increment signal by . Because these depend only on the timing counter (and not yet on the opcode), the fetch logic is fixed combinational hardware — exactly the hardwired approach. After , control passes to the decode/execute steps where the decoded opcode lines are ANDed with later timing signals to generate execution control signals.
(a) Explain the principle of locality of reference and how the memory hierarchy exploits it to bridge the speed gap between the CPU and main memory. [4]
(b) A computer has a 256 KB main memory and a 2 KB cache with a block size of 16 bytes. Show the address format (number of tag, line/set and word bits) for (i) direct mapping and (ii) 4-way set-associative mapping. [8]
(a) Locality of Reference and the Memory Hierarchy
Locality of reference is the observed tendency of programs to access a relatively small portion of their address space at any given time. It has two forms:
- Temporal locality: a memory location that has been referenced recently is likely to be referenced again soon (e.g. loop variables, instructions inside a loop).
- Spatial locality: if a location is referenced, nearby locations are likely to be referenced soon (e.g. sequential instruction fetch, array traversal).
How the hierarchy exploits it: Memory is arranged in levels — registers → cache (SRAM) → main memory (DRAM) → secondary storage — with speed and cost decreasing down the hierarchy. By keeping recently used (temporal) data and whole blocks of neighbouring (spatial) data in the small, fast upper levels, most accesses are served quickly from cache, so the CPU rarely waits for slow main memory. This makes the effective access time close to that of the fast level while the capacity is close to that of the slow level, bridging the CPU–memory speed gap.
(b) Address Format
Given: Main memory = 256 KB bytes → physical address = 18 bits. Cache = 2 KB bytes. Block size = 16 bytes → word/offset = 4 bits.
Number of cache blocks (lines) lines.
(i) Direct Mapping
- Line bits bits
- Word bits bits
- Tag bits bits
| Tag | Line | Word |
|---|---|---|
| 7 | 7 | 4 |
(ii) 4-way Set-Associative Mapping
Number of sets .
- Set bits bits
- Word bits bits
- Tag bits bits
| Tag | Set | Word |
|---|---|---|
| 9 | 5 | 4 |
(a) What is instruction pipelining? Describe the different segments of a four-stage instruction pipeline (FI, DA, FO, EX) with the help of a space-time diagram. [6]
(b) Define pipeline hazards. Explain data hazards and control hazards, and discuss any two techniques used to handle the control hazard caused by branch instructions. [6]
(a) Instruction Pipelining and the Four-Stage Pipeline
Instruction pipelining is a technique that overlaps the execution of successive instructions by dividing the instruction cycle into independent stages, each handled by a dedicated hardware segment. While one instruction is in one stage, the next instruction occupies the previous stage, so several instructions are processed simultaneously, increasing throughput.
Four segments:
- FI (Fetch Instruction): fetch the instruction from memory into the instruction register.
- DA (Decode & Calculate Address): decode the opcode and compute the effective operand address.
- FO (Fetch Operand): read the operand from memory/registers.
- EX (Execute): perform the operation in the ALU and store the result.
Space–time diagram (rows = pipeline stages, columns = clock cycles); are instructions:
| Stage \ Clock | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| FI | I1 | I2 | I3 | I4 | |||
| DA | I1 | I2 | I3 | I4 | |||
| FO | I1 | I2 | I3 | I4 | |||
| EX | I1 | I2 | I3 | I4 |
Without a pipeline, 4 instructions 4 stages cycles; with the pipeline they finish in cycles. In general instructions through a -stage pipeline take cycles, giving a speedup approaching for large .
(b) Pipeline Hazards
A pipeline hazard is a situation that prevents the next instruction from executing in its designated clock cycle, stalling the pipeline.
- Data hazard: arises when an instruction depends on the result of a previous instruction that has not yet completed (e.g.
ADD R1,R2,R3followed bySUB R4,R1,R5). The operand of the second instruction is not ready, causing a read-after-write dependency. - Control (branch) hazard: arises from branch/jump instructions. Until the branch outcome is known, the pipeline does not know which instructions to fetch next, so wrongly-fetched instructions may have to be discarded.
Two techniques to handle control hazards:
- Branch prediction: the hardware predicts whether the branch will be taken (static, e.g. always-not-taken, or dynamic using a branch-history/prediction table) and prefetches accordingly. If the prediction is correct, no penalty occurs; if wrong, the speculatively fetched instructions are flushed.
- Delayed branch: the instruction(s) immediately following the branch (the branch delay slot) are always executed regardless of the branch outcome. The compiler fills the slot with a useful, independent instruction, so the pipeline does no wasted work. (Other valid techniques: branch-target buffer/prefetching both paths, or pipeline stall/flush.)
(a) Draw the hardware block diagram for Booth's multiplication algorithm and explain its operation. [6]
(b) Multiply the signed numbers (-5) and (+3) using Booth's algorithm, showing the contents of the registers (A, Q, Q₋₁) at each step. [6]
(a) Booth's Multiplication — Hardware and Operation
Hardware block diagram (described): A multiplicand register M (n bits), a register pair A (accumulator, initially 0) and Q (holds the multiplier), an extra 1-bit register Q₋₁ appended to the right of Q, an n-bit ALU/adder–subtractor feeding A, a control/sequence counter holding the bit count, and a combined right-shift path over the A-Q-Q₋₁ register set.
+----+ +-------------------+
| M |------->| Adder/Subtractor |---> A
+----+ +---------+---------+
^
|
[ A | Q | Q-1 ] <-- arithmetic right shift
| | |
+---+---+--> control examines Q0,Q-1
Operation: Booth's algorithm multiplies two signed (2's-complement) numbers by examining the pair of bits at each step:
- or → no arithmetic, just shift.
- → (end of a string of 1s), then shift.
- → (start of a string of 1s), then shift.
After each step, an arithmetic right shift is applied to the combined register (sign bit preserved). The process repeats times (n = number of multiplier bits). The final -bit product appears in . It handles signed numbers directly and skips runs of identical bits, making it efficient.
(b) Multiply (-5) × (+3) using Booth's Algorithm
Use 4-bit 2's-complement ().
- Multiplicand , .
- Multiplier .
- , , count .
| Step | Operation | A | Q | Q₋₁ |
|---|---|---|---|---|
| Init | 0000 | 1011 | 0 | |
| 1 | 1101 | 1011 | 0 | |
| ASR | 1110 | 1101 | 1 | |
| 2 | shift | 1110 | 1101 | 1 |
| ASR | 1111 | 0110 | 1 | |
| 3 | 0010 | 0110 | 1 | |
| ASR | 0001 | 0011 | 0 | |
| 4 | 1110 | 0011 | 0 | |
| ASR | 1111 | 0001 | 1 |
Result: (8-bit 2's-complement).
This equals .
Check: . ✓
Section B: Short Answer Questions
Attempt all / any as specified.
Explain the following addressing modes with a suitable example of each, and state one practical use of each: (a) immediate, (b) register indirect, (c) indexed, (d) relative addressing mode.
Addressing Modes
(a) Immediate addressing: the operand itself is contained in the instruction (no memory reference for the operand).
- Example:
MOV R1, #5→ loads the constant 5 into R1. - Use: initialising registers with constants and defining counters/loop limits.
(b) Register indirect addressing: the instruction names a register that holds the address of the operand; the operand is in memory at that address.
- Example:
MOV R1, (R2)→ R2 holds an address, the data at that address is loaded into R1. - Use: accessing data through pointers and traversing data structures (linked lists, arrays).
(c) Indexed addressing: the effective address base address in the instruction contents of an index register: .
- Example:
LDA A(X)→ loads . - Use: sequential access to array/table elements by incrementing the index register.
(d) Relative addressing: the effective address contents of the Program Counter the (signed) address field: .
- Example:
BRANCH +6→ jump to a location 6 words ahead of the current PC. - Use: writing position-independent (relocatable) code, especially branch/jump targets.
Differentiate between RISC and CISC architectures with respect to instruction format, addressing modes, execution and the use of registers. Give one example processor of each type.
RISC vs CISC
| Feature | RISC | CISC |
|---|---|---|
| Instruction format | Fixed-length, uniform, simple instructions. | Variable-length, complex instructions of differing sizes. |
| Addressing modes | Few, simple addressing modes (mostly load/store). | Many, complex addressing modes. |
| Execution | Mostly single-cycle execution; register-to-register operations; memory accessed only by LOAD/STORE; heavily pipelined; control is usually hardwired. | Multi-cycle instructions; instructions can operate directly on memory operands; control is usually microprogrammed. |
| Use of registers | Large set of general-purpose registers; operands kept in registers to reduce memory traffic. | Fewer general-purpose registers; relies more on memory. |
| Code size / compiler | Larger code, more instructions, more work for the compiler. | More compact code, more work done in hardware. |
Example processors: RISC — ARM / MIPS; CISC — Intel x86 (8086/Pentium).
What is an interrupt-driven I/O? With the help of a flowchart, explain how the CPU handles an interrupt request issued by an I/O device, and compare it with programmed I/O in terms of CPU utilization.
Interrupt-Driven I/O
Definition: In interrupt-driven I/O, the CPU does not continuously poll the device. Instead, when an I/O device becomes ready, it raises an interrupt request to the CPU. The CPU finishes its current instruction, suspends the running program, and executes an Interrupt Service Routine (ISR) to transfer the data, then resumes the interrupted program.
Flowchart of CPU handling an interrupt (described):
Execute current instruction
|
v
Check interrupt line at end of instruction
|
Interrupt? --No--> continue normal execution
| Yes
v
Save PC and processor status (push onto stack)
|
v
Acknowledge interrupt / identify device (vector)
|
v
Load PC with ISR start address; execute ISR (transfer data)
|
v
Restore PC and status (pop from stack)
|
v
Resume interrupted program
Comparison with programmed I/O (CPU utilisation): In programmed I/O the CPU repeatedly tests the device status flag in a busy-wait loop, so it is fully occupied waiting and cannot do useful work while the slow device prepares data — wasting CPU time. In interrupt-driven I/O the CPU is interrupted only when the device is actually ready, so between requests it is free to run other programs. This gives far better CPU utilisation and overlap of computation with I/O, at the small cost of interrupt-handling overhead.
Explain the working of Direct Memory Access (DMA) with a block diagram. Describe the cycle-stealing and burst transfer modes of DMA operation.
Direct Memory Access (DMA)
Working: DMA allows an I/O device to transfer a block of data directly to/from main memory without continuous CPU involvement. A DMA controller (DMAC) takes over the system bus from the CPU to perform the transfer. The CPU only initialises the controller by loading: the memory address register (starting address), the word/byte count register (block size), and the control register (read/write direction, start). The DMAC then:
- Requests the bus by asserting HOLD (bus request) to the CPU.
- The CPU completes its current bus cycle and replies with HLDA (bus grant), releasing the bus.
- The DMAC transfers data directly between the device and memory, incrementing the address and decrementing the count after each word.
- When the count reaches zero, the DMAC releases the bus and raises an interrupt to inform the CPU that the transfer is complete.
Block diagram (described):
CPU <--HOLD/HLDA--> DMA Controller <--DMA req/ack--> I/O Device
| |
+------ Address / Data / Control Bus ------+----- Main Memory
The DMA controller contains an address register, a count register, a control/status register and data lines, and connects to the bus alongside the CPU and memory.
Transfer modes:
- Cycle stealing: the DMAC transfers one word at a time, "stealing" a single bus cycle from the CPU for each word and then returning bus control to the CPU. The CPU is slowed slightly but can still execute between transfers. Suitable for slower devices.
- Burst (block) transfer: the DMAC holds the bus and transfers the entire block of words in one continuous burst before releasing the bus. This is fast and efficient for high-speed devices (e.g. disk), but the CPU is completely halted from using the bus for the duration of the burst.
State and explain Flynn's classification of parallel computer architectures (SISD, SIMD, MISD, MIMD) with a representative example or application of each category.
Flynn's Classification
Flynn classifies computer architectures by the multiplicity of instruction streams and data streams:
1. SISD (Single Instruction, Single Data): one processing unit executes a single instruction stream on a single data stream. This is the classical sequential (von Neumann) uniprocessor; instruction-level pipelining may be used. Example: a traditional single-core PC / classic CPU.
2. SIMD (Single Instruction, Multiple Data): one control unit issues the same instruction to many processing elements, each operating on its own data. Ideal for data-parallel work. Example: vector/array processors, GPUs; applications like image processing and matrix operations.
3. MISD (Multiple Instruction, Single Data): many instruction streams act on the same data stream. It is mostly theoretical and rarely built. Example/application: fault-tolerant systems where several units process the same data and results are compared (e.g. systolic arrays are sometimes cited).
4. MIMD (Multiple Instruction, Multiple Data): multiple processors each execute their own instruction stream on their own data stream independently. Most modern parallel systems fall here. Example: multicore processors, multiprocessors and distributed clusters; general-purpose parallel/distributed computing.
(a) List the major registers of a basic computer CPU and state the function of the Program Counter (PC), Memory Address Register (MAR) and Instruction Register (IR). [3]
(b) Write the register transfer microoperations for the fetch phase of an instruction. [3]
(a) Major CPU Registers
Major registers of a basic computer CPU: PC (Program Counter), MAR/AR (Memory Address Register), MBR/DR (Memory Buffer/Data Register), IR (Instruction Register), AC (Accumulator), TR (Temporary Register), and the general-purpose registers.
- Program Counter (PC): holds the memory address of the next instruction to be fetched; it is incremented after each fetch.
- Memory Address Register (MAR): holds the address of the memory location to be read from or written to; it is connected to the address bus.
- Instruction Register (IR): holds the current instruction (opcode) being decoded and executed.
(b) Fetch-Phase Microoperations
The register-transfer microoperations for the fetch phase, over timing steps :
That is, the PC supplies the address, the addressed memory word is read into the buffer while the PC is incremented to point to the next instruction, and finally the instruction is copied into the IR for decoding.
A processor has a cache with a hit ratio of 0.92. The cache access time is 10 ns and the main memory access time is 100 ns. (a) Compute the average memory access time. [3] (b) Explain the write-through and write-back cache write policies and state one advantage of each. [3]
(a) Average Memory Access Time
Given hit ratio , cache time ns, memory time ns.
Using the miss-penalty model where a miss costs cache time plus memory time:
(If the simpler model is assumed: ns.)
(b) Write Policies
- Write-through: every write updates both the cache and main memory at the same time. Advantage: main memory always holds the current (consistent) data, making it simple and reliable, which is important in multiprocessor/DMA environments.
- Write-back (copy-back): a write updates only the cache; the modified ("dirty") block is written to main memory only when it is evicted. Advantage: far less memory traffic, since multiple writes to the same block cause only one memory update — giving higher performance.
Write short notes on any two of the following: (a) Vector processing and array processors, (b) Hypercube interconnection network, (c) Amdahl's law and its significance in parallel computing.
Short Notes (any two)
(a) Vector Processing and Array Processors
A vector processor has instructions that operate on entire vectors (one-dimensional arrays) of data with a single instruction, using deeply pipelined arithmetic units so successive element operations overlap. It exploits data parallelism and reduces instruction-fetch/loop overhead (one vector instruction replaces an inner loop). An array processor is an SIMD machine consisting of many identical processing elements (PEs), each with its own local memory, all executing the same instruction broadcast by a single control unit but on different data. Vector processors pipeline operations in time; array processors replicate hardware in space. Both are used for scientific/matrix computations, signal/image processing and simulations.
(b) Hypercube Interconnection Network
A hypercube of dimension connects nodes, where each node has a unique -bit address. Two nodes are directly connected iff their binary addresses differ in exactly one bit. Hence each node has exactly neighbours (degree ), and the maximum distance (diameter) between any two nodes is , since routing simply flips the differing address bits one at a time. It offers good scalability, low diameter and multiple redundant paths (fault tolerance), and is widely used in massively parallel computers (e.g. nCUBE).
(c) Amdahl's Law
Amdahl's law gives the maximum speedup achievable by parallelising a program with processors when a fraction of the work is inherently sequential:
As , the speedup is bounded by . Significance: the serial fraction limits the benefit of adding processors — even a small non-parallelisable portion caps the attainable speedup (e.g. if , maximum speedup is only no matter how many processors are used). It guides designers to minimise the sequential part rather than just adding hardware.
Frequently asked questions
- Where can I find the BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) question paper 2078?
- The full BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2078 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2078 paper come with solutions?
- Yes. Every question on this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2078 paper?
- The BE Computer Engineering (IOE, TU) Computer Organization and Architecture (IOE, CT 603 / ENCT 303) 2078 paper carries 80 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper free?
- Yes — reading and attempting this Computer Organization and Architecture (IOE, CT 603 / ENCT 303) past paper on Kekkei is completely free.