BSc CSIT (TU) Science Computer Architecture (BSc CSIT, CSC208) Question Paper 2080 Nepal

Q: Where can I find the BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) question paper 2080?

The full BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2080 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Computer Architecture (BSc CSIT, CSC208) 2080 paper come with solutions?

Yes. Every question on this Computer Architecture (BSc CSIT, CSC208) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2080 paper?

The BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2080 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Computer Architecture (BSc CSIT, CSC208) past paper free?

Yes — reading and attempting this Computer Architecture (BSc CSIT, CSC208) past paper on Kekkei is completely free.

Question

1Long answer10 marks

What is cache memory? Explain the different cache mapping techniques (direct, associative and set-associative) with suitable diagrams.

memorycache

Answer 1

Cache Memory

Cache memory is a small, high-speed memory placed between the CPU and main memory (RAM). It stores copies of frequently accessed instructions and data so that the CPU can access them faster than fetching from slower main memory. It works on the principle of locality of reference (temporal and spatial locality).

When the CPU needs a word, it first checks the cache:

Cache hit — the word is found in cache (fast access).
Cache miss — the word is not in cache, so it is fetched from main memory and a block is copied into cache.

The ratio of hits to total references is the hit ratio, a key measure of cache performance.

Main memory is divided into blocks and cache into lines of the same size. A mapping function decides which main-memory block goes to which cache line.

Cache Mapping Techniques

1. Direct Mapping

Each main-memory block maps to exactly one fixed cache line, given by:

\text{Cache line} = (\text{Block number}) \bmod (\text{Number of cache lines})

The physical address is split into three fields:

Tag	Line (Index)	Word (Offset)

Word selects the byte/word within a block.
Line selects the cache line directly.
Tag is stored with the line and compared to verify the right block.

Diagram (in words): Each cache line has a tag field and a data block. The line field of the address indexes one line; its stored tag is compared with the address tag — equal means hit.

Advantage: Simple and cheap (one comparison). Disadvantage: High conflict misses — two blocks mapping to the same line repeatedly evict each other.

2. Associative (Fully Associative) Mapping

A main-memory block can be placed in any cache line. The address has only two fields:

Tag	Word

The tag must be compared with the tags of all cache lines simultaneously (using associative/content-addressable memory).

Diagram (in words): All cache-line tags are searched in parallel by comparators; any match is a hit.

Advantage: Most flexible, lowest conflict misses, best hit ratio. Disadvantage: Expensive hardware (many comparators) and needs a replacement algorithm (LRU/FIFO).

3. Set-Associative Mapping

A compromise between the two. The cache is divided into sets, each containing $k$ lines (a k-way set-associative cache). A block maps to one fixed set but can occupy any line within that set:

\text{Set number} = (\text{Block number}) \bmod (\text{Number of sets})

The address has three fields:

Tag	Set	Word

Diagram (in words): The set field indexes one set; the tag is compared in parallel with the $k$ lines of that set only.

Advantage: Lower conflict misses than direct mapping and far cheaper than fully associative ( $k$ comparators only). Most real caches use 2-way, 4-way or 8-way set-associative mapping.

Summary

Direct mapping is the special case $k=1$ ; fully associative is the case where the whole cache is one set. Set-associative balances cost and performance.

Answer 2

Pipelining

Pipelining is a technique of implementing instruction-level parallelism in which multiple instructions are overlapped in execution. The processor is divided into stages, each performing a part of an instruction, like an assembly line. While one instruction is being decoded, another can be fetched, and so on, increasing throughput (instructions completed per unit time).

For a $k$ -stage pipeline executing $n$ instructions, ideal speedup approaches $k$ :

\text{Speedup} = \frac{n \times k}{k + (n-1)} \to k \text{ for large } n

Instruction Pipeline Stages

A classic RISC pipeline has five stages:

IF (Instruction Fetch) — fetch the instruction from memory (using PC).
ID (Instruction Decode) — decode the opcode and read source registers.
EX (Execute) — perform the ALU operation or compute the effective address.
MEM (Memory Access) — read from or write to data memory (for load/store).
WB (Write Back) — write the result back to the register file.

In steady state one instruction finishes every clock cycle.

Pipeline Hazards and Solutions

Hazards prevent the next instruction from executing in its designated cycle.

1. Structural Hazard

Two instructions need the same hardware resource in the same cycle (e.g., a single memory port for IF and MEM).

Solution: Provide separate instruction and data memories/caches (Harvard organization) or duplicate resources; stall if unavoidable.

2. Data Hazard

An instruction depends on the result of a previous instruction still in the pipeline (RAW — read after write being the most common).

Solutions:
- Operand forwarding (bypassing): route the ALU result directly to a dependent instruction before write-back.
- Pipeline stalls / bubbles inserted by hardware.
- Compiler instruction reordering to fill the gap.

3. Control (Branch) Hazard

Caused by branch/jump instructions — the next instruction's address is not known until the branch is resolved.

Solutions:
- Branch prediction (static or dynamic) and speculative execution.
- Delayed branch (fill the delay slot with a useful instruction).
- Flushing wrongly-fetched instructions when a branch is taken.
- Compute branch target early in the pipeline to reduce the penalty.

Conclusion

Pipelining greatly improves throughput, but hazards limit the ideal speedup; forwarding, stalling, branch prediction and resource duplication are used to mitigate them.

Answer 3

Modes of Data Transfer between CPU and I/O

Data transfer between the CPU/memory and I/O devices occurs in three modes: programmed I/O, interrupt-driven I/O, and DMA.

1. Programmed I/O

The CPU is fully responsible for the transfer. It continuously polls (checks) the status flag of the I/O interface in a loop until the device is ready, then transfers the data word.

Steps:

CPU reads the device status register.
If not ready, it loops back (busy-waiting).
When ready, CPU reads/writes the data word through a CPU register.

Drawback: The CPU wastes most of its time busy-waiting; very inefficient. Suitable only for slow, simple systems.

2. Interrupt-Driven I/O

Instead of polling, the device interrupts the CPU when it is ready.

Steps:

CPU issues an I/O command and continues other work.
When the device is ready, it raises an interrupt request.
CPU finishes the current instruction, saves state (PC, registers), and jumps to the Interrupt Service Routine (ISR).
The ISR transfers the data word, then control returns to the interrupted program.

Advantage: CPU is free during the wait. Drawback: Each word still requires CPU intervention (one interrupt per word), so it is slow for high-speed bulk transfers.

3. Direct Memory Access (DMA)

A dedicated DMA controller transfers a whole block of data directly between memory and the I/O device without the CPU handling each word.

Steps:

CPU programs the DMA controller with the starting memory address, word count, and direction (read/write).
The DMA controller requests the system bus (bus request / HOLD); CPU grants it (bus grant / HLDA) and releases the bus.
DMA transfers the block word-by-word directly to/from memory (cycle stealing or burst mode).
On completion, the DMA controller interrupts the CPU to signal that the block transfer is done.

Advantage: Very fast, frees the CPU; ideal for disk and high-speed devices.

Comparison

Feature	Programmed I/O	Interrupt-driven I/O	DMA
CPU involvement	Highest (polling)	Per word (ISR)	Lowest (per block)
Speed	Slow	Moderate	Fast
Best for	Few/slow devices	Moderate-speed devices	Bulk/high-speed transfers

Answer 4

System Bus

A system bus is a set of parallel conducting wires (a shared communication pathway) that connects the CPU, main memory, and I/O devices, allowing them to transfer data, addresses, and control signals. It is logically divided into three functional groups:

1. Address Bus

Carries the address of the memory location or I/O port to be accessed.
Unidirectional (CPU → memory/I/O).
Its width determines the addressable memory size: an $n$ -line address bus can address $2^n$ locations.

2. Data Bus

Carries the actual data/instructions being transferred between CPU, memory, and I/O.
Bidirectional (data flows both ways).
Its width (e.g., 8, 16, 32, 64 bits) determines how many bits are transferred at once, affecting performance.

3. Control Bus

Carries control and timing signals that coordinate and manage operations, e.g., Memory Read, Memory Write, I/O Read, I/O Write, clock, interrupt, and bus-request/grant signals.
A mix of unidirectional lines that synchronise the devices.

Answer 5

Direct Memory Access (DMA)

DMA is a data-transfer technique in which a dedicated hardware unit called the DMA controller transfers a block of data directly between an I/O device and main memory without continuous CPU intervention. The CPU is only involved at the start and end of the transfer.

How DMA Transfers Data

Setup: The CPU programs the DMA controller registers with the starting memory address, the word/byte count (block size), and the direction (read or write).
Bus request: When the I/O device is ready, the DMA controller sends a bus request (HOLD) signal to the CPU.
Bus grant: The CPU completes its current bus cycle, releases the system bus (address, data, control), and responds with a bus grant (HLDA). The CPU is now temporarily idle on the bus.
Transfer: The DMA controller takes control of the buses and transfers data word-by-word directly between memory and the device, incrementing the address and decrementing the count after each word. Transfer can be by cycle stealing (one word at a time, returning the bus) or burst mode (entire block).
Completion: When the count reaches zero, the DMA controller releases the bus and sends an interrupt to the CPU indicating the transfer is complete.

Because the CPU does not handle each individual word, DMA is much faster and far more efficient than programmed or interrupt-driven I/O for large/high-speed transfers (e.g., disk, network).

Answer 6

RISC vs CISC

Feature	RISC (Reduced Instruction Set Computer)	CISC (Complex Instruction Set Computer)
Instruction set	Small, simple, fixed-length instructions	Large, complex, variable-length instructions
Instructions per program	More instructions needed	Fewer instructions (one does more)
Execution	Mostly single-cycle	Multi-cycle, varies per instruction
Addressing modes	Few, simple	Many, complex
Memory access	Only via load/store instructions	Memory operands allowed in many instructions
Control unit	Hardwired control	Usually microprogrammed control
Registers	Large number of registers	Fewer registers
Pipelining	Easy and efficient	Harder due to varying instruction lengths
Code size	Larger	Smaller (compact)
Examples	ARM, MIPS, SPARC, RISC-V	x86 (Intel/AMD), VAX, IBM System/360

Summary: RISC emphasises simple instructions executed quickly with a hardwired control unit and heavy use of registers and pipelining, putting complexity in the compiler. CISC emphasises powerful, complex instructions that reduce program length, putting complexity in the hardware/microcode.

Answer 7

Register Transfer Language (RTL)

Register Transfer Language is a symbolic notation used to describe the operations and information flow between the registers of a digital system. A micro-operation is an elementary operation performed on the data stored in registers during one clock pulse.

Notation

Registers are denoted by capital letters: R1, R2, AC, PC, MAR.
A transfer is shown with the replacement operator ←.
Conditional transfers use a control function with :.

Example of a conditional transfer (transfer R2 into R1 only when control signal P = 1):

P: R1 ← R2

Types of Micro-operations with Examples

1. Register transfer micro-operation — move data between registers:

R1 ← R2          ; copy contents of R2 into R1

2. Arithmetic micro-operations — add, subtract, increment, decrement:

R3 ← R1 + R2     ; addition
R3 ← R1 + R2' + 1 ; subtraction using 2's complement
R1 ← R1 + 1      ; increment

3. Logic micro-operations — bitwise AND, OR, XOR, complement:

R1 ← R1 ∧ R2     ; bitwise AND
R1 ← R1 ∨ R2     ; bitwise OR
R1 ← R1'         ; complement

4. Shift micro-operations — logical, circular, arithmetic shifts:

R1 ← shl R1      ; shift left
R1 ← shr R1      ; shift right

RTL provides a concise, hardware-independent way to specify the internal organisation and control sequences of a computer.

Answer 8

Restoring Division of Unsigned Integers

Restoring division performs binary division of an $n$ -bit dividend by a divisor using repeated subtraction and shifting, with a register A (initially 0, the partial remainder), register Q (the dividend, ends holding the quotient), and register M (the divisor).

Algorithm

Repeat $n$ times (n = number of dividend bits):

Shift left the combined register pair A,Q by one bit.
Subtract: A ← A − M.
Test sign of A:
- If A ≥ 0 (sign bit 0): set the least-significant bit Q0 ← 1.
- If A < 0 (sign bit 1): restore by A ← A + M and set Q0 ← 0.

After $n$ iterations: Q = quotient, A = remainder.

Example: 7 ÷ 3 (4-bit)

Dividend Q = 0111, Divisor M = 0011 (so −M = 1101 in 2's complement). Start A = 0000.

Step	Operation	A	Q
Init		0000	0111
1	Shift left	0000	111_
	A − M	1101	—
	A<0 → restore, Q0=0	0000	1110
2	Shift left	0001	110_
	A − M	1110	—
	A<0 → restore, Q0=0	0001	1100
3	Shift left	0011	100_
	A − M	0000	—
	A≥0 → Q0=1	0000	1001
4	Shift left	0001	001_
	A − M	1110	—
	A<0 → restore, Q0=0	0001	0010

Result: Quotient Q = 0010 = 2, Remainder A = 0001 = 1.

Check: $7 = 3 \times 2 + 1$ . ✓

Answer 9

Microprogrammed Control

In a microprogrammed control unit, the control signals required to execute each instruction are generated by reading microinstructions stored in a special memory called the control memory (control store). Each machine instruction corresponds to a microprogram — a sequence of microinstructions. This contrasts with hardwired control, where control signals are produced by fixed logic gates.

Key Components

Control Memory (ROM): stores all microprograms.
Control Address Register (CAR): holds the address of the next microinstruction.
Control Data Register (CDR): holds the microinstruction currently being executed.
Sequencer / Next-address generator: determines the address of the next microinstruction (sequential, branch, or based on instruction opcode).

Operation: CAR addresses control memory → the microinstruction is read into CDR → its control field activates the control signals → the sequencer updates CAR for the next microinstruction.

Microinstruction Format

A typical microinstruction contains the following fields:

Field	Purpose
Control / Micro-operation field	Bits specifying which control signals are activated (the micro-operations to perform)
Condition (CD) field	Specifies the status/condition to test for branching (e.g., zero, carry, unconditional)
Branch (BR) field	Type of branch/next-address selection (jump, call, return, map)
Address field	The address in control memory of the next microinstruction if a branch is taken

Advantages: flexible, easy to modify/design, supports complex instructions. Disadvantage: slower than hardwired control due to control-memory access.

Answer 10

Virtual Memory

Virtual memory is a memory-management technique that gives a program the illusion of a very large, contiguous main memory by using a combination of RAM and secondary storage (disk). Only the parts of a program currently in use are kept in physical RAM; the rest reside on disk and are brought in on demand. This allows programs larger than physical memory to run and provides isolation between processes.

The addresses generated by the CPU are logical (virtual) addresses, which are translated to physical addresses by the Memory Management Unit (MMU).

Paging

Paging is a virtual-memory scheme that removes the need for contiguous allocation:

The virtual address space is divided into fixed-size blocks called pages.
Physical memory is divided into blocks of the same size called frames.
A page table maps each virtual page number to a physical frame number.

Address Translation

A virtual address is split into a page number and an offset:

Virtual address = | Page number | Offset |

The page number indexes the page table to get the frame number; the frame number combined with the offset gives the physical address.

Page hit: the page is in a frame → access proceeds.
Page fault: the page is on disk → the OS loads it into a free/replaced frame, updates the page table, and retries.

A TLB (Translation Lookaside Buffer) caches recent page-table entries to speed up translation. Paging eliminates external fragmentation but can cause internal fragmentation in the last page.

Answer 11

Instruction Format

An instruction format defines the layout of bits in a machine instruction. It specifies the fields that the control unit interprets. A typical instruction contains:

Opcode	Operand / Address field(s)	Addressing mode

Opcode (operation code): specifies the operation to perform (ADD, LOAD, JMP, etc.).
Operand/Address field: specifies the data or the address of the operand(s).
Mode field: specifies how the operand address is interpreted (direct, indirect, indexed, etc.).

The number of address fields gives rise to the following instruction types.

Types of Instructions by Number of Addresses

1. Three-address instruction — two source operands and a destination.

ADD R1, R2, R3   ; R1 ← R2 + R3

For X = (A+B)*(C+D): ADD T1,A,B ADD T2,C,D MUL X,T1,T2. Shorter programs but long instructions.

2. Two-address instruction — one operand is both source and destination.

ADD R1, R2       ; R1 ← R1 + R2

Most common in real machines.

3. One-address instruction — uses an implied accumulator (AC).

LOAD A           ; AC ← A
ADD  B           ; AC ← AC + B
STORE X          ; X ← AC

4. Zero-address instruction — operands are implicit on a stack (stack-based machine).

PUSH A
PUSH B
ADD              ; pops A, B, pushes A+B
POP X

As the number of address fields decreases, instructions become shorter but more instructions are needed to perform the same task.

Answer 12

Set-Associative Mapping

Set-associative mapping is a cache mapping technique that combines the features of direct and fully associative mapping. The cache is divided into a number of sets, and each set contains $k$ lines. This is called a k-way set-associative cache.

A main-memory block maps to one specific set (like direct mapping):

\text{Set number} = (\text{Block number}) \bmod (\text{Number of sets})

Within that set, the block may be placed in any of the $k$ lines (like associative mapping).

The physical address is divided into three fields:

Tag	Set	Word (offset)

On access, the set field selects one set, and the tag is compared in parallel with the tags of all $k$ lines in that set only. A match is a hit.

Example

Suppose:

Cache = 8 lines, 2-way set-associative ⇒ number of sets = 8 / 2 = 4 sets.
Main memory has 256 blocks.

Then block number $B$ maps to set $= B \bmod 4$ .

Block 0 → set 0, Block 1 → set 1, Block 4 → set 0, Block 5 → set 1, etc.

Block 4 can be stored in either of the 2 lines of set 0. So blocks 0 and 4 (both mapping to set 0) can reside in cache simultaneously — which would conflict in direct mapping. This reduces conflict misses.

Advantages: Fewer conflict misses than direct mapping; far cheaper than fully associative (only $k$ comparators). A replacement policy (e.g., LRU) selects the victim line within the set. Most modern caches are 2-, 4-, or 8-way set-associative.

Level	BSc CSIT (TU)
Stream	Science
Subject	Computer Architecture (BSc CSIT, CSC208)
Year	2080 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

Section A: Long Answer Questions

Cache Memory

Cache Mapping Techniques

1. Direct Mapping

2. Associative (Fully Associative) Mapping

3. Set-Associative Mapping

Summary

Pipelining

Instruction Pipeline Stages

Pipeline Hazards and Solutions

1. Structural Hazard

2. Data Hazard

3. Control (Branch) Hazard

Conclusion

Modes of Data Transfer between CPU and I/O

1. Programmed I/O

2. Interrupt-Driven I/O

3. Direct Memory Access (DMA)

Comparison

Section B: Short Answer Questions

System Bus

1. Address Bus

2. Data Bus

3. Control Bus

Direct Memory Access (DMA)

How DMA Transfers Data

RISC vs CISC

Register Transfer Language (RTL)

Notation

Types of Micro-operations with Examples

Restoring Division of Unsigned Integers

Algorithm

Example: 7 ÷ 3 (4-bit)

Microprogrammed Control

Key Components

Microinstruction Format

Virtual Memory

Paging

Address Translation

Instruction Format

Types of Instructions by Number of Addresses

Set-Associative Mapping

Example

Frequently asked questions