BSc CSIT (TU) Science Computer Architecture (BSc CSIT, CSC208) Question Paper 2081 Nepal

Q: Where can I find the BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) question paper 2081?

The full BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2081 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Computer Architecture (BSc CSIT, CSC208) 2081 paper come with solutions?

Yes. Every question on this Computer Architecture (BSc CSIT, CSC208) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2081 paper?

The BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2081 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Computer Architecture (BSc CSIT, CSC208) past paper free?

Yes — reading and attempting this Computer Architecture (BSc CSIT, CSC208) past paper on Kekkei is completely free.

Question

1Long answer10 marks

Explain the different modes of data transfer between CPU and I/O devices: programmed I/O, interrupt-driven I/O and DMA.

iointerrupt

Answer 1

Modes of Data Transfer Between CPU and I/O Devices

Data transfer between the CPU/memory and peripheral I/O devices can be carried out in three modes.

1. Programmed I/O

In programmed I/O, the data transfer is controlled entirely by instructions executed by the CPU. The processor continuously checks (polls) the status flag of the interface to know whether the device is ready.

The CPU reads the status register; if the flag is not set, it loops (busy-waiting).
When the flag indicates the device is ready, the CPU executes a transfer of one word/byte and clears the flag.
Drawback: The CPU is kept busy in the polling loop and wastes processing time. Suitable only for slow devices and simple systems.

LOOP: read status register
      if flag = 0 goto LOOP   ; wait
      read/write data
      clear flag

2. Interrupt-Driven I/O

Here the CPU does not poll. Instead, the device interface sends an interrupt signal to the CPU when it is ready for transfer.

The CPU continues with other useful work.
On receiving the interrupt, the CPU completes the current instruction, saves the program counter and status (PSW) on the stack, and branches to an Interrupt Service Routine (ISR).
The ISR performs the single word transfer, then control returns to the interrupted program.
Advantage: No CPU time is wasted in waiting. Drawback: Each transfer still passes through the CPU and involves context-saving overhead, so it is inefficient for high-speed bulk transfer.

3. Direct Memory Access (DMA)

For high-speed block transfer, a DMA controller transfers data directly between memory and the I/O device without CPU involvement for each word.

The CPU initializes the DMA controller with the starting memory address, word count (block size) and direction (read/write), then continues other work.
The DMA controller requests the bus from the CPU using bus request (HOLD); the CPU grants the bus via bus grant (HLDA) — this is called cycle stealing.
The DMA controller transfers the whole block, incrementing the address and decrementing the count, and finally raises an interrupt to inform the CPU that the transfer is complete.
Advantage: Very fast; CPU is freed during the block transfer. Used for disk, network and memory-to-memory transfers.

Comparison

Feature	Programmed I/O	Interrupt-driven I/O	DMA
CPU involvement	Continuous (polling)	Per word (ISR)	Only at start/end
CPU efficiency	Lowest	Moderate	Highest
Speed	Slow	Medium	Fast (block)
Hardware cost	Least	Moderate	DMA controller needed

Answer 2

Basic Organization of a CPU

The Central Processing Unit (CPU) is the part of the computer that fetches, decodes and executes instructions. It is organized into three major parts:

Register Set – fast storage that holds operands, addresses and intermediate results during processing.
Arithmetic and Logic Unit (ALU) – performs arithmetic (add, subtract) and logical (AND, OR, NOT, shift) operations.
Control Unit (CU) – generates control signals that direct the operation of the ALU, registers and the rest of the system; it sequences the instruction cycle.

These units are interconnected by an internal common bus and the status (flag) register.

        +--------------------------------------+
        |               CPU                    |
        |  +---------+   +-----+   +---------+  |
        |  | Register|<->| ALU |<->| Control |  |
        |  |   Set   |   +-----+   |  Unit   |  |
        |  +---------+      ^      +---------+  |
        |        ^          |          |       |
        +--------|----------|----------|-------+
                 v          v          v
               Internal common bus  ->  Memory / I/O

Function of Various Registers

Register	Function
PC (Program Counter)	Holds the address of the next instruction to be fetched.
IR (Instruction Register)	Holds the instruction currently being executed.
MAR (Memory Address Register)	Holds the address of the memory location to be accessed.
MBR/MDR (Memory Buffer/Data Register)	Holds data read from or to be written to memory.
AC (Accumulator)	General-purpose register that holds operands and ALU results.
TR (Temporary Register)	Holds temporary data during operations.
DR (Data Register)	Holds the operand read from memory.
PSW / Flags	Holds status bits (carry, zero, sign, overflow).
SP (Stack Pointer)	Points to the top of the stack in memory.

Instruction Cycle

Every instruction passes through these phases:

Fetch: Copy PC to MAR, read memory into MBR, copy MBR to IR, and increment PC ( $PC \leftarrow PC+1$ ).
Decode: The control unit interprets the opcode in the IR to determine the operation.
Operand (Effective Address) Fetch: If the instruction needs an operand from memory, its address is computed and the operand is fetched.
Execute: The ALU performs the required operation and results are stored in a register/memory.
Interrupt check: If an interrupt is pending, it is serviced; otherwise the cycle repeats.

  +-------+   +--------+   +---------+   +---------+
  | Fetch |-->| Decode |-->| Operand |-->| Execute |---+
  +-------+   +--------+   +---------+   +---------+   |
     ^                                                 |
     +----------------- (next instruction) <-----------+

Thus the CPU repeatedly performs the fetch–decode–execute cycle to run a program.

Answer 3

Flynn's Classification of Computer Architectures

Michael Flynn (1966) classified computer architectures according to the number of concurrent instruction streams and data streams a machine can handle, giving four categories.

1. SISD (Single Instruction, Single Data)

One instruction stream operates on one data stream; instructions are executed sequentially.
This is the conventional uniprocessor (von Neumann) machine.
Examples: Traditional single-core PCs, older mainframes (e.g., IBM PC, classic Intel 8086 based systems).

Instruction --> [ CU ] --> [ PU ] --> Data

2. SIMD (Single Instruction, Multiple Data)

A single instruction is executed simultaneously on multiple data items by many processing units under one control unit.
Ideal for data-parallel problems (vector/matrix, image processing).
Examples: Array processors, vector processors (Cray), GPUs, and SSE/AVX/MMX instructions.

            +--> [PU1] --> Data1
Instr --> CU +--> [PU2] --> Data2
            +--> [PU3] --> Data3

3. MISD (Multiple Instruction, Single Data)

Multiple instruction streams operate on a single data stream.
Rare/largely theoretical. Practical use is in fault-tolerant systems where different processors run different algorithms on the same data and compare results.
Example: Systolic arrays, redundant flight-control computers (e.g., Space Shuttle).

4. MIMD (Multiple Instruction, Multiple Data)

Multiple processors execute different instructions on different data independently and concurrently. The most general and flexible class.
Subtypes: shared-memory (multiprocessors) and distributed-memory (multicomputers/clusters).
Examples: Modern multicore CPUs, multiprocessor servers, computer clusters, distributed systems.

Summary Table

Class	Instruction Streams	Data Streams	Example
SISD	1	1	Uniprocessor PC
SIMD	1	Many	GPU, vector/array processor
MISD	Many	1	Systolic array, fault-tolerant systems
MIMD	Many	Many	Multicore CPU, clusters

Answer 4

Register Transfer Language (RTL)

Register Transfer Language is a symbolic notation used to describe the micro-operations that transfer information among the registers of a digital system. It precisely specifies what data movement and processing happen on each clock pulse.

Notation:

Registers are denoted by capital letters: $R1, R2, PC, MAR$ , etc.
A transfer is shown by an arrow ( $\leftarrow$ ): $R2 \leftarrow R1$ means the contents of $R1$ are copied into $R2$ .
Conditional transfer uses a control function before a colon: $P: R2 \leftarrow R1$ means transfer occurs only when $P = 1$ .

Types of Micro-operations (with examples)

Register transfer (move): copies data between registers.
- $R1 \leftarrow R2$
Arithmetic micro-operations: add, subtract, increment, decrement, complement.
- $R3 \leftarrow R1 + R2$ (addition)
- $R1 \leftarrow R1 + 1$ (increment)
- $R2 \leftarrow \overline{R2} + 1$ (2's complement / negation)
Logic micro-operations: bit-wise AND, OR, XOR, NOT.
- $R1 \leftarrow R1 \wedge R2$ (AND)
- $R1 \leftarrow R1 \vee R2$ (OR)
- $R1 \leftarrow \overline{R1}$ (complement)
Shift micro-operations: logical, circular and arithmetic shifts.
- $R1 \leftarrow shl\ R1$ (shift left)
- $R2 \leftarrow shr\ R2$ (shift right)

Memory example: Reading from memory: $MAR \leftarrow PC$ , then $MBR \leftarrow M[MAR]$ .

RTL therefore provides a concise, hardware-independent way to describe the internal data flow of a CPU.

Answer 5

Restoring Division of Unsigned Integers

The restoring division algorithm divides an unsigned dividend by an unsigned divisor to produce a quotient and remainder, using shift and subtract operations.

Registers

$Q$ = dividend (quotient bits replace it), $M$ = divisor, $A$ = partial remainder (initially 0). $n$ = number of bits.

Algorithm

A = 0,  load dividend in Q,  divisor in M
repeat n times:
    1. Shift left the pair (A, Q) by one bit
    2. A = A - M
    3. if A < 0 (MSB = 1):   // unsuccessful
          Q0 = 0
          A = A + M          // restore A
       else:                 // A >= 0, successful
          Q0 = 1
end repeat
Quotient = Q,  Remainder = A

Example: $11 \div 3$ (4-bit, dividend $Q = 1011$ , divisor $M = 0011$ )

Initially $A = 0000$ , $Q = 1011$ . In each of the $n = 4$ iterations the pair $(A,Q)$ is shifted left (the MSB of $Q$ moves into $A$ ), then $A = A - M$ is tried.

Step	After shift left (A : Q)	A = A - M (= A - 0011)	Decision	Result (A : Q)
1	0001 : 0110	0001 - 0011 < 0	restore, Q0 = 0	0001 : 0110
2	0010 : 1100	0010 - 0011 < 0	restore, Q0 = 0	0010 : 1100
3	0101 : 1000	0101 - 0011 = 0010 (>=0)	keep, Q0 = 1	0010 : 1001
4	0101 : 0010	0101 - 0011 = 0010 (>=0)	keep, Q0 = 1	0010 : 0011

Final answer: Quotient $Q = 0011 = 3$ and Remainder $A = 0010 = 2$ , which satisfies $11 = 3 \times 3 + 2$ . The quotient ends up in $Q$ and the remainder in $A$ .

Answer 6

Microprogrammed Control

In a microprogrammed control unit, the control signals required to execute each instruction are not generated by hard-wired logic, but are stored as microinstructions (microcode) in a special control memory (a fast ROM/Control Store).

Concept

Each machine instruction is implemented by a sequence of microinstructions called a microprogram.
Each microinstruction contains the control bits (micro-operations) that must be active in one clock cycle, plus information about the address of the next microinstruction.
The Control Address Register (CAR) holds the address of the next microinstruction in control memory; the Control Data Register (CDR) holds the microinstruction read out.
A sequencer determines the next CAR value (next address, branch, or mapping from the opcode).

Opcode --> Mapping --> CAR --> Control Memory --> CDR --> control signals
                        ^                              |
                        +------ Sequencer <------------+

Advantages: Flexible, easy to design and modify (just change microcode), supports complex instructions. Disadvantage: Slower than hard-wired control.

Microinstruction Format

A microinstruction is typically divided into fields:

Field	Purpose
Control / micro-operation field	Bits that specify which micro-operations (control signals) to activate. Often grouped into encoded sub-fields.
Condition (CD) field	Selects the status bit / condition to test for branching.
Branch (BR) field	Specifies the type of branch (next, jump, call, return, map).
Address field	Address of the next microinstruction if a branch is taken.

Formats can be horizontal (long, one bit per control signal, fast, less encoding) or vertical (short, encoded fields decoded by a decoder, more compact but slower).

Answer 7

Virtual Memory

Virtual memory is a memory-management technique that gives the programmer the illusion of a very large main memory by using a combination of main memory (RAM) and secondary storage (disk). Programs are written using logical (virtual) addresses that are independent of the available physical memory; the system automatically transfers data between disk and RAM as needed.

Benefits: programs larger than physical RAM can run, allows multiprogramming, and provides protection and relocation through address translation.

A Memory Management Unit (MMU) translates virtual addresses into physical addresses at run time.

Paging

Paging is the most common implementation of virtual memory:

The virtual address space is divided into fixed-size blocks called pages.
The physical memory is divided into blocks of the same size called page frames.
Any page can be loaded into any free frame, so memory need not be contiguous (eliminates external fragmentation).
A page table (one entry per page) maps each virtual page number to a physical frame number and holds a valid/present bit.

Address translation: A virtual address is split into a page number and an offset:

\text{Virtual Address} = (\text{Page Number}, \text{Offset})

The page number indexes the page table to get the frame number; the offset is appended to give the physical address.

If the present bit shows the page is not in RAM, a page fault occurs; the required page is fetched from disk into a frame (replacing a victim page if needed, e.g., by LRU/FIFO).
A Translation Lookaside Buffer (TLB) caches recent translations to speed up the lookup.

Answer 8

Instruction Format and Address Types

Instruction Format

An instruction format defines the layout of bits in a machine instruction. It is divided into fields:

Field	Purpose
Opcode	Specifies the operation to be performed (ADD, LOAD, JMP, ...).
Mode	Specifies the addressing mode (direct, indirect, immediate, indexed, ...).
Operand / Address	Specifies the operand value or the address of the operand(s).

+--------+------+-----------------+
| Opcode | Mode | Address/Operand |
+--------+------+-----------------+

The number of bits determines how many operations (opcode width) and how large an address space the instruction can reference.

Types of Instructions Based on Number of Addresses

1. Three-address instruction: Two source operands and one destination are specified.

ADD R1, R2, R3 → $R1 \leftarrow R2 + R3$ . Programs are short but instructions are long.

2. Two-address instruction: One operand is both source and destination.

ADD R1, R2 → $R1 \leftarrow R1 + R2$ . Most common in commercial CPUs.

3. One-address instruction: Uses an implied accumulator (AC) for one operand and the result.

ADD X → $AC \leftarrow AC + M[X]$ .

4. Zero-address instruction: Operands are taken implicitly from a stack (used in stack machines).

ADD → pops top two stack elements, adds them, pushes the result.

Fewer addresses give shorter instructions but require more instructions to perform a task (e.g., evaluating an expression).

Answer 9

Set-Associative Mapping of Cache Memory

Set-associative mapping is a compromise between direct mapping and fully associative mapping. The cache is divided into a number of sets, and each set contains $k$ lines (called a k-way set-associative cache). A main-memory block maps to a fixed set (like direct mapping) but can be placed in any line within that set (like associative mapping).

Address Division

The main-memory address is divided into three fields:

\text{Address} = (\text{Tag}, \text{Set number}, \text{Word/Offset})

Set field: selects which set the block maps to: $\text{set} = (\text{block number}) \bmod (\text{number of sets})$ .
Tag field: stored with each cache line and compared (in parallel for all lines in the set) to identify the block.
Word/Offset: selects the word within the block.

Number of sets $= \dfrac{\text{number of cache lines}}{k}$ .

Example

Suppose a cache has 8 lines, block size = 1 word, and it is 2-way set-associative.

Number of sets $= 8 / 2 = 4$ sets (set 0–3).
Main-memory block $B$ maps to set $B \bmod 4$ .
Example: memory block 13 → set $13 \bmod 4 = 1$ . It can be stored in either of the 2 lines of set 1.
Block 9 → set $9 \bmod 4 = 1$ as well; both blocks 13 and 9 can coexist in set 1 (two lines), avoiding the conflict that direct mapping would cause.

Advantages

Reduces conflict (collision) misses compared to direct mapping.
Cheaper and faster than fully associative mapping (only $k$ tag comparisons per access, not all lines).
Replacement (LRU/FIFO) is applied only within the small set.

Answer 10

IEEE 754 Single-Precision Floating-Point Representation

The IEEE 754 single-precision format stores a real number in 32 bits, divided into three fields:

Field	Bits	Description
Sign (S)	1	0 = positive, 1 = negative
Exponent (E)	8	Biased exponent (bias = 127)
Mantissa / Fraction (M)	23	Fractional part of the normalized significand

| S |  E (8 bits)  |        M (23 bits)        |
 31  30        23  22                       0

The value represented (for normalized numbers) is:

N = (-1)^S \times 1.M \times 2^{(E - 127)}

The leading 1 before the binary point is implicit ("hidden bit").

Example: Represent $-6.75$

Step 1 — Convert to binary: $6.75 = 110.11_2$ .

Step 2 — Normalize: $110.11 = 1.1011 \times 2^{2}$ . So the exponent is $2$ and the mantissa is $1011$ .

Step 3 — Sign: number is negative → $S = 1$ .

Step 4 — Biased exponent: $E = 2 + 127 = 129 = 10000001_2$ .

Step 5 — Mantissa (23 bits): $10110000000000000000000$ .

Result (32 bits):

1 10000001 10110000000000000000000

In hexadecimal this is 0xC0D80000.

Answer 11

Memory Hierarchy in a Computer System

The memory hierarchy organizes the different types of memory in a system into levels according to speed, cost per bit and capacity. As we move up the hierarchy memory becomes faster, smaller and costlier per bit; as we move down it becomes slower, larger and cheaper.

            ^  Faster, smaller, costlier
            |   +------------------+
            |   |    Registers     |   (inside CPU, fastest)
            |   +------------------+
            |   |   Cache (L1/L2/L3)|  (SRAM)
            |   +------------------+
            |   |  Main Memory (RAM)|  (DRAM)
            |   +------------------+
            |   | Secondary Storage |  (SSD/HDD)
            |   +------------------+
            |   |  Tertiary/Backup  |  (magnetic tape, optical)
            v   +------------------+
               Slower, larger, cheaper

Levels

Registers: Inside the CPU; fastest, very small capacity; hold operands being processed.
Cache memory (SRAM): Small, fast buffer between CPU and main memory; stores frequently used data/instructions (L1, L2, L3).
Main memory (DRAM/RAM): Primary working memory holding currently running programs and data; volatile.
Secondary storage (HDD/SSD): Large, non-volatile storage for files and programs not in current use.
Tertiary / backup storage (tapes, optical discs): Very large capacity, slowest, used for archival/backup.

Why a Hierarchy Works

It exploits the principle of locality of reference (temporal and spatial): recently or nearby accessed data is likely to be accessed again. Keeping such data in faster upper levels gives the speed of the fast memory at nearly the cost of the cheap memory.

Answer 12

Interrupts

An interrupt is a signal to the processor (from hardware or software) that an event needs immediate attention, causing the CPU to suspend its current program, save its state, and transfer control to a special routine called the Interrupt Service Routine (ISR). After servicing, the CPU resumes the interrupted program from where it left off. Interrupts allow the CPU to respond to events without continuous polling, improving efficiency.

Types of Interrupts

1. Hardware Interrupts – generated by external hardware devices through interrupt request lines.

Maskable interrupts: can be enabled/disabled (masked) by the CPU (e.g., normal device I/O interrupts via INTR).
Non-maskable interrupts (NMI): cannot be disabled; used for critical events such as power failure or memory parity error.

2. Software Interrupts – generated by executing a special instruction (e.g., INT n, system calls / traps). Used by programs to request OS services.

3. Internal Interrupts / Exceptions (Traps) – generated inside the processor due to error or special conditions during execution, such as divide-by-zero, arithmetic overflow, invalid opcode, or page fault.

Other classifications

Vectored vs Non-vectored: In a vectored interrupt the device supplies the address of its ISR (interrupt vector); in non-vectored the ISR address is fixed.
Priority: When several interrupts occur, a priority scheme (or daisy chaining) decides which is serviced first.

Level	BSc CSIT (TU)
Stream	Science
Subject	Computer Architecture (BSc CSIT, CSC208)
Year	2081 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

Section A: Long Answer Questions