BSc CSIT (TU) Science Computer Architecture (BSc CSIT, CSC208) Question Paper 2075 Nepal

Q: Where can I find the BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) question paper 2075?

The full BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2075 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Computer Architecture (BSc CSIT, CSC208) 2075 paper come with solutions?

Yes. Every question on this Computer Architecture (BSc CSIT, CSC208) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2075 paper?

The BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2075 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Computer Architecture (BSc CSIT, CSC208) past paper free?

Yes — reading and attempting this Computer Architecture (BSc CSIT, CSC208) past paper on Kekkei is completely free.

Question

1Long answer10 marks

What is cache memory? Explain the different cache mapping techniques (direct, associative and set-associative) with suitable diagrams.

memorycache

Answer 1

Cache Memory

Cache memory is a small, fast (usually SRAM) memory placed between the CPU and main memory. It holds copies of the most frequently/recently used instructions and data so the CPU can access them faster than going to slower main memory. It works on the principle of locality of reference (temporal and spatial locality).

When the CPU needs a word, it first checks the cache:

Cache hit: word is present, served quickly.
Cache miss: word is fetched from main memory (a whole block/line) and stored in cache.

The mapping function decides where a main-memory block is placed in the cache.

1. Direct Mapping

Each main-memory block maps to exactly one fixed cache line:

\text{cache line} = (\text{block number}) \bmod (\text{number of cache lines})

The address is split into three fields:

Tag	Line (Index)	Word (Offset)

Simple and cheap; only one tag comparison is needed.
Drawback: high conflict misses — two active blocks mapping to the same line keep evicting each other.

Diagram (in words): main memory blocks 0, m, 2m, … all map to cache line 0; blocks 1, m+1, … map to line 1, and so on (m = number of cache lines).

2. Associative (Fully Associative) Mapping

A block can be placed in any cache line. The address has only two fields:

Tag	Word (Offset)

On access, the tag is compared with all lines in parallel (using associative/CAM hardware).
Most flexible, lowest conflict misses; needs a replacement policy (LRU, FIFO, random).
Drawback: expensive — requires a comparator for every line.

3. Set-Associative Mapping

A compromise: the cache is divided into sets, each containing k lines (k-way set-associative). A block maps to one fixed set but can occupy any line within that set:

\text{set number} = (\text{block number}) \bmod (\text{number of sets})

Address fields:

Tag	Set (Index)	Word (Offset)

Only k tags (within the set) are compared in parallel.
Balances cost and performance; widely used (e.g. 4-way, 8-way).
Special cases: 1-way = direct mapping; k = total lines = fully associative.

Summary

Technique	Block placement	Comparators	Conflict misses
Direct	One fixed line	1	High
Associative	Any line	All lines	Lowest
Set-associative	Any line in one set	k (set size)	Moderate (low)

Answer 2

Pipelining

Pipelining is a technique of overlapping the execution of multiple instructions by dividing the processing into independent stages. Each stage works on a different instruction simultaneously, like an assembly line, increasing instruction throughput (instructions completed per unit time) without reducing the latency of a single instruction.

Instruction Pipeline Stages

A classic 5-stage RISC instruction pipeline:

IF (Instruction Fetch) – fetch instruction from memory using PC.
ID (Instruction Decode) – decode opcode and read source registers.
EX (Execute) – ALU performs the operation / computes address.
MEM (Memory Access) – read from or write to data memory.
WB (Write Back) – write the result back into the register file.

In an ideal pipeline of k stages, after the pipeline fills, one instruction completes every clock cycle. For n instructions and k stages:

\text{Total cycles} = k + (n-1), \qquad \text{Speedup} \approx k \;(\text{ideal})

Pipeline Hazards and Solutions

Hazards prevent the next instruction from executing in its designated cycle.

1. Structural Hazard

Two instructions need the same hardware resource in the same cycle (e.g. one memory for both fetch and data access).

Solutions: add resources (separate instruction and data memories/caches — Harvard organization), or stall.

2. Data Hazard

An instruction depends on the result of a previous instruction still in the pipeline (RAW – read after write):

ADD R1, R2, R3   ; produces R1
SUB R4, R1, R5   ; needs R1

Solutions: forwarding/bypassing (route ALU result directly to the next stage), pipeline stalls/bubbles, and compiler instruction reordering.

3. Control (Branch) Hazard

The outcome/target of a branch is not known until later stages, so the next fetch is uncertain.

Solutions: branch prediction (static/dynamic), delayed branch (fill delay slot), flushing mispredicted instructions, and computing the branch target early.

Conclusion

Pipelining boosts throughput but introduces hazards; forwarding, stalling, and branch prediction are used to keep the pipeline full and correct.

Answer 3

Modes of Data Transfer between CPU and I/O

Data transfer between the processor and I/O devices is handled in three main modes.

1. Programmed I/O

The CPU continuously controls the entire transfer. It executes a program that polls the device status flag in a loop:

LOOP: read status register
      if (flag == 0) goto LOOP   ; busy-wait / polling
      read/write data word

Every word passes through the CPU register.
Advantage: simple hardware.
Disadvantage: CPU is kept busy in a polling loop and wastes time — very inefficient for slow devices.

2. Interrupt-Driven I/O

The CPU issues the I/O command and then continues other work. When the device is ready, it raises an interrupt signal.

The CPU finishes the current instruction, saves its state (PC, registers), and runs the Interrupt Service Routine (ISR) to transfer the word, then resumes.
Removes wasteful polling, so the CPU is freed during the wait.
Disadvantage: each word still requires CPU intervention (interrupt + ISR overhead), costly for large/fast transfers.

3. Direct Memory Access (DMA)

A dedicated DMA controller transfers a whole block of data directly between memory and the I/O device without the CPU handling each word.

Steps:

CPU initializes the DMA controller with the starting memory address, word count, and direction (read/write).
DMA requests the system bus via HOLD/Bus Request; CPU grants control (HLDA/Bus Grant) and releases the bus (cycle stealing or burst mode).
DMA transfers data block directly to/from memory.
On completion, DMA interrupts the CPU to signal the transfer is done.

Advantage: fastest, very low CPU overhead — ideal for high-speed devices (disk, network).

Comparison

Mode	CPU involvement	Speed	Use
Programmed I/O	Full (polling)	Slow	Simple/slow devices
Interrupt-driven	Per word (ISR)	Medium	Moderate-speed devices
DMA	Only setup + end	Fast	Block transfer, high-speed

Answer 4

System Bus

A system bus is a shared set of parallel conducting lines (wires) that interconnect the CPU, main memory, and I/O modules, allowing them to exchange data, addresses and control signals. It is broadly divided into three functional buses:

Address Bus – carries the address of the memory location or I/O port to be accessed. It is unidirectional (CPU → memory/I/O). Its width determines the addressable memory size; e.g. an n-bit address bus can address $2^n$ locations.
Data Bus – carries the actual data/instructions being transferred. It is bidirectional (data flows both to and from the CPU). Its width (e.g. 8, 16, 32, 64 bits) determines how many bits are transferred at once, affecting performance.
Control Bus – carries control and timing signals that coordinate the operations, such as Memory Read, Memory Write, I/O Read, I/O Write, clock, interrupt request, and bus grant. It is mostly bidirectional (individual lines unidirectional).

Answer 5

DMA (Direct Memory Access)

DMA is a data-transfer technique in which a dedicated DMA controller transfers a block of data directly between an I/O device and main memory without the CPU processing each word, greatly reducing CPU overhead for high-speed transfers.

How DMA Transfers Data

Initialization: The CPU programs the DMA controller with the starting memory address, the number of words (count) to transfer, and the direction (read/write), then continues with other tasks.
Bus request: When the I/O device is ready, the DMA controller requests the system bus by asserting HOLD / Bus Request.
Bus grant: The CPU completes its current bus cycle, releases the address, data and control buses, and acknowledges with HLDA / Bus Grant. The CPU is now temporarily disconnected from the bus.
Transfer: The DMA controller takes over the buses and transfers data directly between the device and memory, incrementing the address and decrementing the count for each word. This is done in cycle-stealing or burst mode.
Completion: When the count reaches zero, the DMA controller releases the bus and sends an interrupt to inform the CPU that the transfer is complete.

Thus the CPU is involved only at the start (setup) and end (interrupt), making DMA much faster than programmed or interrupt-driven I/O for block transfers.

Answer 6

RISC vs CISC

Feature	RISC	CISC
Instruction set	Small, simple instructions	Large, complex instructions
Instruction length	Fixed length	Variable length
Execution time	Mostly single clock cycle	Multiple clock cycles
Addressing modes	Few	Many
Memory access	Only via load/store instructions	Many instructions access memory directly
Registers	Large number of registers	Fewer registers
Control unit	Mostly hardwired	Mostly microprogrammed
Pipelining	Easy and efficient	Harder due to varying instructions
Code size	Larger (more instructions)	Smaller (fewer, denser instructions)
Complexity	In compiler/software	In hardware/microcode
Examples	ARM, MIPS, SPARC, RISC-V	Intel x86, VAX

Summary: RISC emphasizes a small set of simple, fixed-length, single-cycle, register-based instructions that pipeline well, shifting complexity to the compiler. CISC provides powerful, variable-length instructions that reduce program size but complicate the hardware.

Answer 7

Register Transfer Language (RTL)

Register Transfer Language (RTL) is a symbolic notation used to describe the micro-operations — the elementary operations performed on data stored in registers — and the flow of information between registers in a digital system. Registers are denoted by capital letters (e.g. R1, PC, MAR) and a transfer is shown with the replacement/assignment operator ←.

General form of a conditional transfer:

P:\; R2 \leftarrow R1

meaning “if control condition P = 1, copy the contents of R1 into R2.”

Types of Micro-operations with Examples

1. Register-transfer micro-operations – move data between registers:

R2 \leftarrow R1

2. Arithmetic micro-operations – add, subtract, increment, decrement, complement:

R3 \leftarrow R1 + R2, \qquad R1 \leftarrow R1 + 1 \;(\text{increment})

3. Logic micro-operations – bitwise AND, OR, XOR, NOT:

R1 \leftarrow R1 \land R2 \;(\text{AND}), \qquad R1 \leftarrow \overline{R1}\;(\text{complement})

4. Shift micro-operations – shift register contents left/right (logical, circular, arithmetic):

R1 \leftarrow shl\,R1, \qquad R2 \leftarrow shr\,R2

RTL is hardware-independent and is used to specify the internal operations of a CPU precisely.

Answer 8

Restoring Division (Unsigned Integers)

Restoring division divides an unsigned dividend by an unsigned divisor using repeated shift and subtract. It uses three registers: A (initially 0, holds partial remainder), Q (holds the dividend, finally the quotient) and M (the divisor). For n-bit numbers, repeat n times:

Algorithm

1. Initialize A = 0, Q = dividend, M = divisor, count = n
2. Repeat n times:
   a. Shift left (A, Q) by one bit
   b. A = A - M            (subtract divisor)
   c. If A < 0 (MSB = 1):  // failed
         Q0 = 0
         A = A + M         // RESTORE
      Else:                // succeeded
         Q0 = 1
3. After n steps: Q = quotient, A = remainder

Example: 7 ÷ 3 (4-bit, n = 4)

Dividend Q = 0111, Divisor M = 0011, A = 0000.

Step	Operation	A	Q	Q0
Init		0000	0111
1	shift; A−M=0000−0011=1101 (<0) → restore	0000	1110	0
2	shift; A−M=0001−0011=1110 (<0) → restore	0001	1100	0
3	shift; A−M=0011−0011=0000 (≥0)	0000	1001	1
4	shift; A−M=0001−0011=1110 (<0) → restore	0001	0011	0

Result: Quotient Q = 0010 = 2, Remainder A = 0001 = 1. Check: $3\times 2 + 1 = 7$ . ✓

Answer 9

Microprogrammed Control

A control unit generates the control signals that drive all micro-operations of the CPU. In a microprogrammed control unit, these control signals are not generated by fixed hardwired logic but are stored as microinstructions (a sequence of bits) in a special fast memory called the control memory (control store).

Each machine instruction corresponds to a microprogram — a sequence of microinstructions.
A Control Address Register (CAR) holds the address of the next microinstruction; the Control Data Register (CDR) holds the microinstruction currently being executed.
Executing a microinstruction means activating the control lines specified by its bits, then determining the address of the next microinstruction (via a sequencer).

Advantages: flexible, easy to design and modify (just change the microprogram). Disadvantage: slower than hardwired control.

Microinstruction Format

A microinstruction is typically divided into fields:

Control / Micro-operation field	Condition (select) field	Branch field	Next-address field

Control / micro-operation field: bits that specify which control signals/micro-operations to activate (may be horizontal — one bit per signal, or vertical — encoded).
Condition field: selects the status flag/condition to test for branching.
Branch field: specifies the type of branch (jump, call, map, etc.).
Address field: gives the address of the next microinstruction in control memory.

Horizontal microinstructions have long, mostly-unencoded fields (fast, more memory); vertical ones use encoded fields (compact, slower due to decoding).

Answer 10

Virtual Memory

Virtual memory is a memory-management technique that gives a program the illusion of a very large, contiguous main memory by using secondary storage (disk) as an extension of physical RAM. Programs use virtual (logical) addresses, which the system translates into physical addresses; only the actively needed portions of a program are kept in RAM while the rest stays on disk.

Benefits: programs larger than physical memory can run, better multiprogramming, and memory protection/relocation.

Paging

Paging is the most common implementation of virtual memory:

The virtual address space is divided into fixed-size blocks called pages.
The physical memory is divided into blocks of the same size called frames.
A page table maps each virtual page number to a physical frame number.

A virtual address is split as:

Page Number	Offset

The page number indexes the page table to obtain the frame number; the offset is added to locate the exact word.

If the referenced page is not in memory, a page fault occurs; the OS loads the page from disk into a free frame (replacing one using LRU/FIFO if needed) and updates the page table.
A TLB (Translation Lookaside Buffer) caches recent page-table entries to speed up address translation.

Paging eliminates external fragmentation and allows non-contiguous allocation of a program in physical memory.

Answer 11

Instruction Format

An instruction format defines the layout of bits in a machine instruction. A typical instruction is divided into fields:

Opcode	Mode	Operand / Address field(s)

Opcode – specifies the operation to be performed (e.g. ADD, LOAD).
Mode – specifies the addressing mode (how operands are interpreted).
Address/operand field(s) – specify the location(s) of operands or the data itself.

Types of Instructions Based on Number of Addresses

1. Three-address instruction – specifies two source operands and one destination:

ADD R1, R2, R3   ; R1 ← R2 + R3

Short programs, but long instructions.

2. Two-address instruction – one operand serves as both a source and the destination:

ADD R1, R2       ; R1 ← R1 + R2

Most common in commercial machines.

3. One-address instruction – uses an implied accumulator (AC); only one operand is specified:

LOAD A           ; AC ← M[A]
ADD  B           ; AC ← AC + M[B]
STORE C          ; M[C] ← AC

4. Zero-address instruction – operands are implied on a stack (used in stack machines):

PUSH A
PUSH B
ADD              ; TOS ← (TOS) + (TOS-1)
POP  C

Fewer addresses give shorter instructions but require more instructions to perform the same task.

Answer 12

Set-Associative Mapping

In set-associative mapping, the cache is divided into a number of sets, and each set contains a fixed number of lines, k (this is called a k-way set-associative cache). A main-memory block maps to one specific set (determined by a modulo function) but can be placed in any line within that set. It combines the advantages of direct mapping (simple indexing) and associative mapping (flexible placement).

\text{set number} = (\text{block number}) \bmod (\text{number of sets})

The memory address is divided into three fields:

Tag	Set (index)	Word (offset)

On access, the set field selects a set, and the tag is compared in parallel with the tags of all k lines only in that set. A hit occurs if any matches.

Example

Suppose a cache has 8 lines organized as 2-way set-associative ⇒ number of sets $= 8 / 2 = 4$ sets. Block placement uses:

\text{set} = (\text{block number}) \bmod 4

Block 0 → set 0, Block 1 → set 1, Block 2 → set 2, Block 3 → set 3
Block 4 → set 0, Block 5 → set 1, …

So blocks 0, 4, 8, 12, … all map to set 0, but since each set has 2 lines, two of these blocks can reside in the cache at the same time without conflict. A replacement policy (e.g. LRU) chooses which line to evict when the set is full.

Advantage: fewer conflict misses than direct mapping, and cheaper than fully associative (only k tag comparisons per access).

Level	BSc CSIT (TU)
Stream	Science
Subject	Computer Architecture (BSc CSIT, CSC208)
Year	2075 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions