BSc CSIT (TU) Science Computer Architecture (BSc CSIT, CSC208) Question Paper 2078 Nepal

Q: Where can I find the BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) question paper 2078?

The full BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2078 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Computer Architecture (BSc CSIT, CSC208) 2078 paper come with solutions?

Yes. Every question on this Computer Architecture (BSc CSIT, CSC208) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2078 paper?

The BSc CSIT (TU) Computer Architecture (BSc CSIT, CSC208) 2078 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Computer Architecture (BSc CSIT, CSC208) past paper free?

Yes — reading and attempting this Computer Architecture (BSc CSIT, CSC208) past paper on Kekkei is completely free.

Question

1Long answer10 marks

Explain Flynn's classification of computer architectures (SISD, SIMD, MISD, MIMD) with examples.

parallel

Answer 1

Flynn's Classification of Computer Architectures

Michael Flynn (1966) classified computer architectures based on the number of concurrent instruction streams and data streams that can be processed. This gives four categories.

Class	Instruction Stream	Data Stream	Meaning
SISD	Single	Single	One instruction on one data item
SIMD	Single	Multiple	One instruction on many data items
MISD	Multiple	Single	Many instructions on one data item
MIMD	Multiple	Multiple	Many instructions on many data items

1. SISD (Single Instruction, Single Data)

A single processing unit fetches and executes one instruction at a time, operating on a single data stream. It corresponds to the classical von Neumann sequential machine. Instruction-level parallelism (pipelining) may exist, but there is only one instruction and one data stream logically.

Example: Traditional uniprocessor machines such as early IBM PCs, the Intel 8085/8086, and any single-core CPU.

2. SIMD (Single Instruction, Multiple Data)

A single control unit broadcasts one instruction to many processing elements, each operating on its own data. Ideal for data-parallel problems (vector/matrix/image processing).

Example: Vector processors (Cray-1), array processors, GPUs, and CPU SIMD extensions like Intel MMX/SSE/AVX.

3. MISD (Multiple Instruction, Single Data)

Several processing units execute different instructions on the same data stream. It is mostly theoretical and rarely built.

Example: Fault-tolerant systems that run the same data through redundant pipelines (e.g., flight-control voting systems); systolic arrays are sometimes cited.

4. MIMD (Multiple Instruction, Multiple Data)

Multiple autonomous processors each execute their own instruction stream on their own data stream. This is the most general and widely used parallel model.

Shared-memory MIMD (multiprocessors) or distributed-memory MIMD (multicomputers/clusters).
Example: Multi-core CPUs, symmetric multiprocessors (SMP), and clusters/supercomputers.

Summary

SISD = serial computer, SIMD = data parallelism (one op on many data), MISD = many ops on one datum (rare), MIMD = full multiprocessing. Modern systems often combine these (e.g., a multi-core CPU = MIMD with SIMD units inside each core).

Answer 2

Addressing Modes

An addressing mode is the method (specified in the instruction) used to determine the effective address (EA) of the operand, i.e., the way the operand of an instruction is located. Addressing modes give flexibility in accessing operands (registers, immediate values, memory) and support features like pointers, arrays, and program relocation.

Assume R1, R2 are registers, M[] denotes memory, and AC is an accumulator.

1. Implied / Implicit Mode

The operand is specified implicitly in the instruction itself; no address field is needed.

Example: CMA (complement accumulator), CLC (clear carry). Stack operations also use implied SP.

2. Immediate Mode

The operand (a constant) is given directly in the instruction's address field.

Example: MOV R1, #5 → R1 = 5.

3. Register Mode

The operand is in a CPU register; the address field names the register. EA = register.

Example: MOV R1, R2 → R1 = R2.

4. Register Indirect Mode

The register holds the address of the operand in memory. EA = contents of register.

Example: MOV R1, (R2) → R1 = M[R2].

5. Direct (Absolute) Mode

The address field contains the memory address of the operand. EA = address field.

Example: MOV R1, 2000 → R1 = M[2000].

6. Indirect Mode

The address field points to a memory location that holds the address of the operand. EA = M[address field].

Example: if M[2000] = 3000, then MOV R1, @2000 → R1 = M[3000].

7. Indexed Mode

EA = base address + contents of an index register X. Useful for arrays.

Example: EA = Address + X; element A[i] accessed by varying X.

8. Base-Register Mode

EA = base register + offset in instruction. Used for relocation/segmentation.

Example: EA = BR + offset.

9. Relative Mode

EA = contents of Program Counter (PC) + signed offset. Used in branch instructions.

Example: EA = PC + offset → jump relative to current instruction.

10. Auto-Increment / Auto-Decrement Mode

Like register indirect, but the register is automatically incremented/decremented after/before access. Useful for stepping through arrays/stacks.

Example: MOV R1, (R2)+ reads M[R2] then increments R2.

Summary

Immediate and register modes are fastest (no memory access for operand); direct/indirect/indexed/relative trade speed for flexibility and support arrays, pointers, and relocatable code.

Answer 3

Binary Adder-Subtractor

A binary adder-subtractor performs both addition and subtraction using a single circuit built from full adders, exploiting the fact that subtraction is addition of the 2's complement.

Design

For two n-bit numbers $A$ and $B$ with a mode control line $M$ :

Each bit $B_i$ is XORed with $M$ before entering full adder $i$ .
$M$ is also fed as the carry-in $C_0$ to the least-significant full adder.

S_i = A_i \oplus (B_i \oplus M) \oplus C_i

Operation:

When $M = 0$ : $B_i \oplus 0 = B_i$ and $C_0 = 0$ , so the circuit computes $A + B$ (addition).
When $M = 1$ : $B_i \oplus 1 = \overline{B_i}$ (1's complement) and $C_0 = 1$ , so it computes $A + \overline{B} + 1 = A - B$ (2's complement subtraction).

The overflow is detected by $V = C_n \oplus C_{n-1}$ (XOR of the last two carries) for signed numbers.

         M (mode)
          |
B3  B2  B1  B0
 |   |   |   |  (each XOR with M)
A3  A2  A1  A0
 FA<-FA<-FA<-FA <- C0 = M
 |   |   |   |
S3  S2  S1  S0

M=0 → A+B, M=1 → A−B.

Booth's Multiplication Algorithm

Booth's algorithm multiplies two signed 2's-complement numbers efficiently by examining pairs of bits. Let the multiplicand be M, multiplier be Q. Use an extra bit $Q_{-1}$ (initially 0), accumulator A (initially 0), and a counter for n bits.

Rule (examine $Q_0 Q_{-1}$ ):

00 or 11 → arithmetic shift right only.
01 → A = A + M, then arithmetic shift right.
10 → A = A − M, then arithmetic shift right.

Repeat for n cycles; result is in A,Q.

Multiply (−5) × (3) using Booth's Algorithm

Use 4-bit numbers ( $n = 4$ ):

Multiplicand M = −5 = 1011, so −M = +5 = 0101.
Multiplier Q = +3 = 0011.
Initially A = 0000, Q = 0011, Q₋₁ = 0.

Step	Operation	A	Q	Q₋₁
Init		0000	0011	0
1	Q₀Q₋₁=10 → A = A − M (A+0101)	0101	0011	0
	ASR	0010	1001	1
2	Q₀Q₋₁=11 → ASR	0001	0100	1
3	Q₀Q₋₁=01 → A = A + M (A+1011)	1100	0100	1
	ASR	1110	0010	0
4	Q₀Q₋₁=00 → ASR	1111	0001	0

Final result A,Q = 1111 0001.

Interpreting 11110001 as an 8-bit 2's-complement number:

11110001_2 = -(00001111_2) = -15

Verification: $(-5) \times (3) = -15$ . ✔

Answer 4

Virtual Memory

Virtual memory is a memory-management technique that gives a program the illusion of a very large, contiguous main memory by using a combination of RAM and secondary storage (disk). The total addressable (virtual) address space can be larger than the physical RAM. The OS, with hardware support (MMU), transparently moves data between disk and RAM as needed.

Benefits: programs larger than RAM can run, multiprogramming with isolation/protection, and efficient memory utilization.

Paging

Paging is the most common implementation of virtual memory:

The virtual address space is divided into fixed-size blocks called pages.
Physical memory is divided into equal-size blocks called frames (page size = frame size, e.g., 4 KB).
A page table (one per process) maps each virtual page number to a physical frame number.

Address translation: A virtual address is split into a page number and an offset:

\text{Physical Address} = (\text{Frame No. from page table}) \times \text{page size} + \text{offset}

When a referenced page is not in RAM, a page fault occurs; the OS loads the page from disk into a free frame (replacing one using LRU/FIFO if memory is full) and updates the page table. A TLB (Translation Lookaside Buffer) caches recent translations to speed this up.

Advantage: eliminates external fragmentation (fixed-size blocks). Paging is transparent to the programmer.

Answer 5

Instruction Format

An instruction format defines the layout of bits in a machine instruction. It typically consists of:

Opcode field – specifies the operation to be performed (e.g., ADD, LOAD).
Operand/Address field(s) – specify the operands or their addresses.
Mode field – (optional) specifies the addressing mode.

| Opcode | Mode | Address/Operand 1 | Address 2 | ... |

The number of address fields depends on the CPU organization.

Types of Instructions Based on Number of Addresses

1. Three-Address Instruction

Specifies two source operands and one destination. (General-register machines.)

Format: ADD R1, R2, R3 → R1 = R2 + R3.
Computes X = (A+B)*(C+D):

ADD T1, A, B
ADD T2, C, D
MUL X, T1, T2

Merit: short programs. Demerit: long instructions (many address bits).

2. Two-Address Instruction

Most common; one operand acts as both source and destination.

Format: ADD R1, R2 → R1 = R1 + R2.

MOV R1, A
ADD R1, B

3. One-Address Instruction

Uses an implied accumulator (AC) for one operand and result.

Format: ADD B → AC = AC + B.

LOAD A
ADD B
STORE X

4. Zero-Address Instruction

Uses a stack; operands are implicitly on top of the stack (no address field for arithmetic ops). Found in stack machines using reverse Polish notation.

Format: ADD → pops two top elements, pushes their sum.

PUSH A
PUSH B
ADD
POP X

Trade-off

More addresses → fewer instructions but longer instructions; fewer addresses → shorter instructions but more of them per program.

Answer 6

Set-Associative Mapping

Set-associative mapping is a cache mapping technique that combines the advantages of direct mapping and fully-associative mapping. The cache is divided into a number of sets, and each set contains k lines (called a k-way set-associative cache). A memory block maps to exactly one set (like direct mapping), but can be placed in any line within that set (like associative mapping).

Mapping Rule

\text{Set number} = (\text{Block number}) \bmod (\text{Number of sets})

The memory address is divided into three fields:

| Tag | Set (Index) | Word (Offset) |

Word/Offset: selects the byte/word within a block.
Set index: selects the set.
Tag: compared against the tags of all k lines in that set to detect a hit.

Example

Suppose:

Main memory = 4 K blocks (block numbers 0–4095).
Cache = 128 lines, organized as 2-way set associative ⇒ Number of sets = 128 / 2 = 64 sets.

Then block B maps to set B mod 64.

Block 0 → set 0, block 64 → set 0, block 128 → set 0, ... (all may reside in either of the 2 lines of set 0).
Block 5 → set 5, block 69 → set 5, etc.

When the CPU accesses an address, the set index selects one set; the tags of both lines in that set are compared in parallel with the address tag. A match = cache hit; otherwise a miss and a line is replaced (using LRU within the set).

Advantages

Fewer conflict misses than direct mapping (k choices per set).
Cheaper and faster than fully associative mapping (only k tag comparisons, not all lines).
Most modern caches are 2-way, 4-way, or 8-way set associative.

Answer 7

IEEE 754 Single-Precision Floating-Point Representation

The IEEE 754 single-precision format uses 32 bits to represent a real number, divided into three fields:

| Sign (1 bit) | Exponent (8 bits) | Mantissa/Fraction (23 bits) |

Sign (S): 1 bit — 0 for positive, 1 for negative.
Exponent (E): 8 bits — stored in biased form with a bias of 127. So $E_{stored} = E_{actual} + 127$ .
Mantissa (M): 23 bits — the fractional part of the normalized significand. A hidden/implicit leading 1 is assumed, so the actual significand is 1.M.

Value (for normalized numbers):

N = (-1)^S \times (1.M)_2 \times 2^{(E_{stored} - 127)}

Example: Represent $-12.625$

Step 1 – Convert to binary:

12.625_{10} = 1100.101_2

( $12 = 1100$ , $0.625 = 0.101$ )

Step 2 – Normalize (scientific form):

1100.101 = 1.100101 \times 2^{3}

Step 3 – Determine fields:

Sign $S = 1$ (number is negative).
Exponent: $E_{stored} = 3 + 127 = 130 = 10000010_2$ .
Mantissa: digits after the point = 100101, padded to 23 bits → 10010100000000000000000.

Step 4 – Assemble (32 bits):

1 10000010 10010100000000000000000

Grouping in hex: 1100 0001 0100 1010 0000 0000 0000 0000 = 0xC14A0000.

So $-12.625$ is stored as C1 4A 00 00 in IEEE 754 single precision.

Special values: $E=0, M=0$ → ±0; $E=255, M=0$ → ±∞; $E=255, M\ne0$ → NaN.

Answer 8

Memory Hierarchy

The memory hierarchy organizes the various types of memory in a computer system into levels based on speed, cost per bit, and capacity. As we move up the hierarchy (toward the CPU), memory becomes faster, smaller, and more expensive; as we move down, it becomes slower, larger, and cheaper.

        ^  Faster, Smaller, Costlier
        |
   [ CPU Registers ]        <- fastest, smallest
   [ Cache (L1, L2, L3) ]
   [ Main Memory (RAM) ]
   [ Secondary Storage (SSD/HDD) ]
   [ Tertiary Storage (Tape/Optical) ]  <- slowest, largest
        |
        v  Slower, Larger, Cheaper

Levels (from fastest to slowest)

Registers – inside the CPU; fastest, hold operands/results currently in use; capacity in bytes; accessed in 1 clock cycle.
Cache memory (SRAM) – L1/L2/L3; holds frequently used data/instructions to bridge the CPU–RAM speed gap.
Main memory (RAM, DRAM) – holds active programs and data; volatile; access in tens of nanoseconds.
Secondary storage (SSD/HDD) – non-volatile, large capacity for permanent storage; access in microseconds/milliseconds.
Tertiary/Backup storage (magnetic tape, optical disks) – very large, very slow, used for archival/backup.

Principle of Locality

The hierarchy works efficiently because of the principle of locality — programs tend to access a small portion of memory repeatedly (temporal locality) and nearby addresses (spatial locality). Frequently used data is kept in faster upper levels, so the system approaches the speed of the fastest memory at the cost close to the cheapest memory.

Answer 9

Interrupt

An interrupt is a signal sent to the CPU (by hardware or software) that temporarily suspends the currently executing program, transfers control to a special routine called an Interrupt Service Routine (ISR), and resumes the original program afterward. Interrupts allow the CPU to respond promptly to events (e.g., I/O completion) without continuously polling, improving efficiency and concurrency.

Basic sequence: current instruction completes → CPU saves PC and status (PSW) onto the stack → loads the ISR address (via interrupt vector) → executes ISR → restores saved state → resumes the interrupted program.

Types of Interrupts

1. Hardware Interrupts

Generated by external hardware devices.

(a) Maskable interrupts: can be enabled/disabled (ignored) by the CPU using interrupt-enable flags (e.g., INTR line). Used by most peripherals.
(b) Non-maskable interrupts (NMI): cannot be disabled; reserved for critical events such as power failure or hardware error.

2. Software Interrupts

Generated by executing a special instruction within a program (e.g., INT n in x86, system calls/traps). Used to request OS services.

3. Internal Interrupts / Exceptions (Traps)

Generated internally by the processor due to exceptional conditions during execution, such as:

Division by zero
Invalid/illegal opcode
Arithmetic overflow
Page fault / memory-protection violation

Classification Summary

By source: internal (exceptions/traps) vs external (hardware) vs software.
By maskability: maskable vs non-maskable.
By vectoring: vectored (device supplies ISR address) vs non-vectored (fixed ISR address).

Answer 10

System Bus

A system bus is a set of parallel conducting wires (communication pathway) that connects the major components of a computer — the CPU, main memory, and I/O devices — and carries data, addresses, and control signals among them. It is shared by all components, so only one device can drive the bus at a time (controlled by a bus arbiter). The system bus is logically divided into three functional groups:

1. Address Bus

Carries the memory or I/O address of the location the CPU wants to access.
It is unidirectional (from CPU to memory/I/O).
Its width determines the addressable memory size: an n-line address bus can address $2^n$ locations. (e.g., 32-bit address bus → $2^{32}$ = 4 GB addressable space).

2. Data Bus

Carries the actual data being transferred between CPU, memory, and I/O.
It is bidirectional (data can flow both ways: read and write).
Its width determines how many bits are transferred at once (e.g., 32-bit or 64-bit), affecting performance.

3. Control Bus

Carries control and timing signals that coordinate and synchronize the operations of all components.
Typical signals: Memory Read, Memory Write, I/O Read, I/O Write, Clock, Interrupt Request (INTR), Bus Request/Grant, Reset.
It is generally bidirectional (different lines flow in different directions).

Summary

The address bus says where, the data bus says what, and the control bus says how/when a transfer happens. Together they enable communication across the system.

Answer 11

DMA (Direct Memory Access)

Direct Memory Access (DMA) is an I/O technique in which a dedicated controller — the DMA controller (DMAC) — transfers a block of data directly between an I/O device and main memory without involving the CPU for each word. This frees the CPU from the overhead of programmed I/O or interrupt-driven I/O, greatly increasing throughput for high-speed devices (disks, network cards).

How DMA Transfers Data Without CPU Intervention

Setup (CPU involvement): The CPU programs the DMA controller by writing into its registers:
- the memory address (where data should go/come from),
- the word/byte count (number of words to transfer),
- the direction (read from or write to memory), and a start command.
Bus request: When the I/O device is ready, the DMAC requests control of the system bus by asserting the HOLD / Bus Request (BR) line to the CPU.
Bus grant (cycle stealing / DMA): The CPU finishes its current bus cycle, releases (floats) the system bus, and acknowledges with HLDA / Bus Grant (BG). The CPU is temporarily disconnected from the bus.
Data transfer: The DMAC now acts as bus master and transfers data directly between the I/O device and memory over the address/data buses, incrementing the address and decrementing the count after each word. No CPU instructions are executed for the transfer.
Completion: When the word count reaches zero, the DMAC releases the bus (deasserts HOLD) and interrupts the CPU to signal that the transfer is complete. The CPU resumes normal operation.

Modes

Burst (block) mode: entire block transferred while DMAC holds the bus.
Cycle stealing: DMAC takes the bus for one word at a time, releasing it between words so the CPU can still work.
Transparent/hidden mode: transfers only when CPU is not using the bus.

Advantage: CPU is involved only at the start and end, so it is free to do other work during the bulk transfer — much faster than programmed/interrupt I/O for large blocks.

Answer 12

RISC vs CISC

RISC (Reduced Instruction Set Computer) uses a small set of simple, fixed-length instructions optimized for fast, pipelined execution. CISC (Complex Instruction Set Computer) provides a large set of powerful, variable-length instructions, some of which perform complex multi-step operations.

Feature	RISC	CISC
Instruction set size	Small, simple instructions	Large, complex instructions
Instruction length	Fixed length	Variable length
Execution time	Usually 1 clock cycle per instruction	Multiple cycles per instruction
Addressing modes	Few (simple)	Many (complex)
Memory access	Only via LOAD/STORE (load-store architecture)	Operands can be in memory directly
Registers	Large number of general-purpose registers	Fewer registers
Control unit	Hardwired control	Microprogrammed control
Pipelining	Simple and highly efficient	Harder to pipeline
Code size	Larger (more instructions per program)	Smaller (fewer, denser instructions)
Complexity	In compiler (software)	In hardware/microcode
Examples	ARM, MIPS, SPARC, RISC-V, PowerPC	Intel x86, Motorola 68000, VAX

Summary

RISC pushes complexity to the compiler with many simple, uniform instructions that pipeline well, while CISC builds complex operations into hardware/microcode, reducing code size at the cost of slower, harder-to-pipeline execution. Modern x86 CPUs internally translate CISC instructions into RISC-like micro-operations, blending both philosophies.

Level	BSc CSIT (TU)
Stream	Science
Subject	Computer Architecture (BSc CSIT, CSC208)
Year	2078 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

Section A: Long Answer Questions

Flynn's Classification of Computer Architectures

1. SISD (Single Instruction, Single Data)

2. SIMD (Single Instruction, Multiple Data)

3. MISD (Multiple Instruction, Single Data)

4. MIMD (Multiple Instruction, Multiple Data)

Summary

Addressing Modes

1. Implied / Implicit Mode

2. Immediate Mode

3. Register Mode

4. Register Indirect Mode

5. Direct (Absolute) Mode

6. Indirect Mode

7. Indexed Mode

8. Base-Register Mode

9. Relative Mode

10. Auto-Increment / Auto-Decrement Mode

Summary

Binary Adder-Subtractor

Design

Booth's Multiplication Algorithm

Multiply (−5) × (3) using Booth's Algorithm

Section B: Short Answer Questions

Virtual Memory

Paging

Instruction Format

Types of Instructions Based on Number of Addresses

1. Three-Address Instruction

2. Two-Address Instruction

3. One-Address Instruction

4. Zero-Address Instruction

Trade-off

Set-Associative Mapping

Mapping Rule

Example

Advantages

IEEE 754 Single-Precision Floating-Point Representation

Example: Represent −12.625-12.625−12.625

Memory Hierarchy

Levels (from fastest to slowest)

Principle of Locality

Interrupt

Types of Interrupts

1. Hardware Interrupts

2. Software Interrupts

3. Internal Interrupts / Exceptions (Traps)

Classification Summary

System Bus

1. Address Bus

2. Data Bus

3. Control Bus

Summary

DMA (Direct Memory Access)

How DMA Transfers Data Without CPU Intervention

Modes

RISC vs CISC

Summary

Frequently asked questions

Example: Represent $-12.625$