BSc CSIT (TU) Science Distributed System (BSc CSIT, CSC462) Question Paper 2075 Nepal

Q: Where can I find the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) question paper 2075?

The full BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2075 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Distributed System (BSc CSIT, CSC462) 2075 paper come with solutions?

Yes. Every question on this Distributed System (BSc CSIT, CSC462) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2075 paper?

The BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2075 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Distributed System (BSc CSIT, CSC462) past paper free?

Yes — reading and attempting this Distributed System (BSc CSIT, CSC462) past paper on Kekkei is completely free.

Question

1Long answer10 marks

What are election algorithms in distributed systems? Explain the Bully algorithm and the Ring algorithm with suitable examples.

election

Answer 1

Election Algorithms

In a distributed system many algorithms require one process to act as a coordinator (e.g., for mutual exclusion, replica master, or transaction commit). An election algorithm is a procedure by which the processes choose a unique coordinator among themselves, typically the active process with the highest process identifier (ID). An election is started whenever a process notices that the current coordinator has crashed. A correct election algorithm must guarantee that, when it terminates, all non-faulty processes agree on the same new coordinator.

Bully Algorithm (Garcia-Molina)

Assumes each process has a unique ID and processes know each other's IDs. When process $P$ detects coordinator failure:

Election message: $P$ sends an ELECTION message to every process with a higher ID.
Response: If no higher process replies (within timeout), $P$ wins and becomes coordinator. It sends a COORDINATOR message to all lower-ID processes.
If some higher process $Q$ replies with OK, $P$ drops out and $Q$ takes over the election, repeating step 1 among still-higher processes.
The eventual winner is always the highest-numbered live process — hence it "bullies" the others into submission.

Example: Processes $P_0 \ldots P_7$ , coordinator $P_7$ crashes. $P_4$ notices and sends ELECTION to $P_5, P_6, P_7$ . $P_5$ and $P_6$ reply OK; each then holds its own election. $P_6$ gets no reply (only $P_7$ is higher and it is dead), so $P_6$ wins and broadcasts COORDINATOR to $P_0 \ldots P_5$ .

Ring Algorithm (Chang–Roberts)

Processes are logically arranged in a ring; each process knows its successor. When a process detects coordinator failure:

It builds an ELECTION message containing its own ID and forwards it to the next live successor (skipping dead ones).
Each process receiving the message appends its own ID and passes it on.
When the message returns to the initiator (it sees its own ID in the list), the highest ID in the list is the new coordinator. It then circulates a COORDINATOR message around the ring to inform everyone.

Example: Ring $P_1 \to P_2 \to P_4 \to P_5 \to P_1$ , with $P_6$ (coordinator) crashed. $P_2$ starts: message $\{2\}$ , then $\{2,4\}$ , $\{2,4,5\}$ , $\{2,4,5,1\}$ returns to $P_2$ . Highest = $P_5$ , which becomes coordinator.

Comparison

Aspect	Bully	Ring
Topology	Fully connected	Logical ring
Messages (worst case)	$O(n^2)$	$O(2n)$
Winner	Highest live ID	Highest live ID

Answer 2

Data Replication

Data replication is the process of maintaining multiple copies (replicas) of the same data on different machines in a distributed system. Replicas are created to improve performance (clients access a nearby copy), availability/reliability (the system survives node failures), and scalability (load is spread across copies). The central challenge is consistency: when one replica is updated, the others must be kept in agreement, which costs bandwidth and coordination.

Consistency Models

A consistency model is a contract between the data store and the processes specifying what guarantees are given about the order and visibility of read/write operations.

Data-centric models:

Strict consistency: A read returns the value of the most recent write (requires absolute global time; impractical).
Sequential consistency (Lamport): The result of execution is the same as if all operations were executed in some sequential order, and each process's operations appear in program order.
Causal consistency: Writes that are causally related must be seen in the same order by all processes; concurrent writes may be seen in any order.
FIFO / PRAM consistency: Writes by a single process are seen by all others in the order issued; writes by different processes may be reordered.
Weak / Release / Entry consistency: Use synchronization variables (acquire/release); consistency is enforced only at synchronization points.

Client-centric models (for mostly-read stores): monotonic reads, monotonic writes, read-your-writes, writes-follow-reads — guarantee consistency from a single client's viewpoint despite eventual consistency overall.

Replica Management Techniques

Replica placement: deciding where copies live — permanent replicas (origin servers), server-initiated replicas (dynamically created near demand), and client-initiated replicas (caches).
Update propagation: push (server proactively sends updates) vs pull (client fetches updates), and propagating either the update operation or the modified data.
Consistency / write protocols:
- Primary-based protocols: all writes go to a single primary replica which then propagates them (remote-write or local-write).
- Replicated-write protocols: writes go to many replicas — active replication (operation sent to all) or quorum-based (read quorum $N_R$ and write quorum $N_W$ with $N_R + N_W > N$ and $N_W > N/2$ to avoid conflicts).

Answer 3

Distributed File System (DFS) Architecture

A distributed file system lets clients on different machines access and share files stored on remote servers as if the files were local, providing location transparency, network transparency, and a common namespace. Key architectural components:

Client module / VFS layer: intercepts file system calls and forwards remote ones over the network.
File service: operations on individual files (read, write, create, delete).
Directory service: maps human-readable names to file identifiers (name resolution).
Server module: stores file data and metadata on disk.

Desirable properties: transparency, concurrent file sharing, replication, fault tolerance, caching for performance, and security.

Network File System (NFS)

NFS (developed by Sun) is the classic example of a DFS using a client–server, stateless design.

Design / working:

Architecture: Built on the Virtual File System (VFS) interface, so NFS-mounted remote directories look identical to local ones. A client mounts a remote directory into its local tree.
Communication: Uses Remote Procedure Calls (RPC) with External Data Representation (XDR) for machine-independent data, originally over UDP.
Statelessness (NFSv2/v3): The server keeps no client state between requests; every request (via a file handle) is self-contained. This makes crash recovery trivial — the client simply retries after the server reboots. Operations are idempotent.
Caching: Clients cache file blocks and attributes to reduce traffic, validated by timestamps; this gives only approximate consistency.
Consistency: NFS provides close-to-open / weak consistency rather than strict UNIX semantics.

Andrew File System (AFS) — alternative

Designed for large scale; uses whole-file caching on the client's local disk.
On open, the entire file is fetched to the client; reads/writes are local; on close the modified file is written back.
Uses callbacks: the server promises to notify a client if a cached file changes, giving better scalability than NFS's repeated validation.
Provides session semantics and is stateful, reducing server load and supporting thousands of clients.

Answer 4

Two-Phase Commit (2PC) Protocol

2PC is an atomic commit protocol ensuring that a distributed transaction either commits at all participating sites or aborts at all of them. One process is the coordinator; the others are participants.

Phase 1 — Voting (Prepare) phase

Coordinator sends a VOTE-REQUEST (prepare) to all participants.
Each participant that can commit writes its data to stable log, replies VOTE-COMMIT (ready); otherwise it replies VOTE-ABORT.

Phase 2 — Decision (Commit) phase 3. If all votes are COMMIT, the coordinator logs and sends GLOBAL-COMMIT to all; if any vote is ABORT (or a timeout occurs), it sends GLOBAL-ABORT. 4. Each participant carries out the decision (commit/abort) and sends an ACK.

Drawback: 2PC is a blocking protocol — if the coordinator crashes after participants voted COMMIT, participants must wait (block) holding locks until it recovers. Three-phase commit (3PC) reduces this blocking.

Answer 5

Consistency Model

A consistency model is a contract between a distributed (replicated) data store and the processes using it. It specifies the guarantees the store makes about the order in which writes become visible and what value a read may return, given that data is replicated. It tells programmers what to expect when reads and writes are issued concurrently to different replicas.

Sequential Consistency

Defined by Lamport, sequential consistency requires that:

The result of any execution is the same as if the operations of all processes were executed in some single sequential (total) order, and the operations of each individual process appear in this sequence in the order specified by its program.

Key points:

All processes see the same interleaving of all operations.
A process's own operations keep their program order.
It does not require operations to follow real (wall-clock) time — only a consistent global order that everyone agrees on.

Example: If $P_1$ writes $x=1$ and $P_2$ writes $x=2$ , every process must observe the same final order (e.g., all see $1$ then $2$ , or all see $2$ then $1$ ) — never $P_3$ seeing $1,2$ while $P_4$ sees $2,1$ .

Answer 6

Naming and Name Resolution

Naming is the use of names to identify and refer to entities (hosts, files, processes, services, users) in a distributed system. A name is a string referring to an entity; an address is the name of the access point where the entity can be reached; an identifier uniquely names exactly one entity and never changes. Names provide a layer of transparency so entities can move without breaking references.

Types of names: human-friendly names (e.g., www.tu.edu.np), addresses, and pure identifiers.

Name resolution is the process of mapping a name to the entity (or its address) it refers to. It is performed by a name space organized as a hierarchical tree of directory nodes (e.g., the DNS hierarchy).

Name space: a labelled, directed graph; a path from the root spells out the name.
Resolution mechanisms:
- Iterative resolution: the client queries the root server, gets a referral to the next server, and queries each server itself.
- Recursive resolution: the client queries one server, which queries the next on the client's behalf and returns the final answer (allows better caching, more server load).

Example (DNS): Resolving mail.tu.edu.np proceeds root → .np → .edu.np → tu.edu.np, each step returning the address of the next-level name server until the host's IP is found. Caching of results speeds up repeated lookups.

Answer 7

Processes vs Threads

A process is an independent program in execution with its own address space and resources. A thread is a lightweight unit of execution within a process that shares the process's address space, code, and open files with sibling threads. In distributed systems, multithreaded servers (and clients) are used to handle many requests concurrently and to overlap communication latency with computation.

Aspect	Process	Thread
Address space	Separate, private	Shared with peer threads
Creation/switch cost	High	Low (lightweight)
Communication	IPC / message passing	Shared memory (fast)
Isolation/protection	Strong (one crash isolated)	Weak (a bad thread can crash whole process)
Resources	Own resources	Shares process resources

Relevance to distributed systems: A multithreaded server dedicates one thread per client request so that a thread blocked on remote I/O does not stall others, improving throughput. Threads make clients more responsive (e.g., one thread per RPC) and exploit multiprocessors, whereas processes are used when strong isolation/fault containment is needed across machines.

Answer 8

Vector Clock

A vector clock is a mechanism for capturing causality among events in a distributed system. Each of the $n$ processes maintains a vector $V$ of $n$ integer counters, where $V_i[j]$ is process $P_i$ 's knowledge of the number of events at process $P_j$ .

Rules:

Before each internal/send event, $P_i$ increments its own entry: $V_i[i] = V_i[i] + 1$ .
Each message carries the sender's vector timestamp.
On receiving a message with timestamp $V_{msg}$ , the receiver sets $V_i[k] = \max(V_i[k], V_{msg}[k])$ for all $k$ , then increments $V_i[i]$ .

Comparison: $V_a \le V_b$ iff $V_a[k] \le V_b[k]$ for all $k$ ; event $a \to b$ (causally precedes) iff $V_a < V_b$ . If neither $V_a \le V_b$ nor $V_b \le V_a$ , the events are concurrent.

Improvement over Lamport's Logical Clock

Lamport's scalar clock guarantees only one direction: $a \to b \Rightarrow L(a) < L(b)$ . The converse is not true — $L(a) < L(b)$ does not imply $a$ happened before $b$ , so a scalar clock cannot detect concurrency or causality precisely.

Vector clocks fix this: $V_a < V_b \iff a \to b$ (biconditional). Thus vector clocks can exactly determine whether two events are causally related or concurrent, at the cost of $O(n)$ storage per timestamp instead of $O(1)$ .

Answer 9

Replication

Replication is the technique of keeping multiple copies (replicas) of the same data or service on different nodes of a distributed system so that clients can access whichever copy is most convenient, and the system can keep working even if some copies fail.

Advantages

Improved reliability / fault tolerance: if one replica fails, others continue serving — no single point of failure.
Higher availability: data remains accessible despite node or network failures.
Better performance: clients read from a nearby replica, reducing latency and network traffic.
Increased scalability: read load is spread across many replicas (load balancing).

Challenges

Consistency maintenance: every update must be propagated to all replicas; keeping them in agreement is hard (the core problem).
Update propagation cost: extra bandwidth and overhead to synchronize replicas, which can reduce write performance.
Concurrency / conflict resolution: simultaneous writes at different replicas can conflict and must be ordered or reconciled.
Trade-off (CAP): under network partitions one must trade consistency against availability.
Replica placement and management complexity.

Answer 10

Network File System (NFS)

NFS, developed by Sun Microsystems, is a widely used distributed file system that lets a client transparently access files on a remote server as though they were local.

Key features:

Client–server model built on the Virtual File System (VFS) interface, so remote files look identical to local files.
Mounting: a client mounts a remote directory into its local directory tree, providing location transparency.
RPC + XDR: all operations are carried out via Remote Procedure Calls using External Data Representation for machine-independent data exchange (originally over UDP).
Stateless server (NFSv2/v3): the server keeps no per-client state; each request carries a self-describing file handle. This makes crash recovery simple — clients just retry after a server reboot — and operations are idempotent.
Caching: clients cache data and attributes (validated by timestamps) to improve performance, giving weak (close-to-open) consistency.
Later versions (NFSv4) are stateful, support compound operations, strong security, and work over TCP.

Advantages: transparency, simple recovery, OS-independence. Limitation: weak consistency and security in early versions.

Answer 11

Security Challenges in Distributed Systems

Because components communicate over open, untrusted networks, distributed systems face several security challenges:

Confidentiality: preventing eavesdropping / disclosure of data on the network.
Integrity: ensuring messages and data are not altered (tampering).
Authentication: verifying the true identity of communicating parties (preventing spoofing/masquerade).
Authorization / access control: ensuring an authenticated party may only do what it is permitted.
Availability: resisting denial-of-service (DoS) attacks.
Non-repudiation, replay attacks, key management, and trust across multiple administrative domains.

Authentication

Authentication is the process of verifying the identity of a user, process, or host before granting access. Mechanisms:

Password / shared-secret based.
Symmetric-key authentication: parties prove knowledge of a shared key, often via a trusted Key Distribution Center (KDC). The Needham–Schroeder protocol and Kerberos use a KDC that issues tickets, with nonces and timestamps to defeat replay attacks.
Public-key authentication: a party proves possession of a private key (digital signatures / challenge–response), with public keys vouched for by certificates from a Certificate Authority.

Authentication is usually combined with session key establishment so that subsequent messages are encrypted and integrity-protected.

Answer 12

Scalability

Scalability is the ability of a distributed system to handle growth — in the number of users/requests, in geographic spread, or in administrative size — without a significant loss of performance or manageability. The three dimensions are:

Size scalability: more users and resources can be added.
Geographical scalability: users and resources may be far apart.
Administrative scalability: the system spans many independent organizations.

Techniques to Achieve Scalability

Hiding communication latencies: use asynchronous communication and move computation to the client so it does not block waiting for remote replies.
Replication: keep multiple copies of data/services to balance load and place data near users (improves both performance and availability).
Caching: a special form of replication on the client side that reduces repeated remote access (e.g., DNS, web caches).
Distribution / partitioning: split a component or data set into parts spread across machines (e.g., DNS splits the name space into zones; databases use sharding).
Decentralization: avoid centralized servers, data, and algorithms; use distributed/peer-to-peer structures and decentralized routing to remove bottlenecks and single points of failure.

Replication and caching introduce a consistency cost, which is the main trade-off when scaling.

Level	BSc CSIT (TU)
Stream	Science
Subject	Distributed System (BSc CSIT, CSC462)
Year	2075 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

BSc CSIT (TU) Science Distributed System (BSc CSIT, CSC462) Question Paper 2075 Nepal

Section A: Long Answer Questions

Election Algorithms

Bully Algorithm (Garcia-Molina)

Ring Algorithm (Chang–Roberts)

Comparison

Data Replication

Consistency Models

Replica Management Techniques

Distributed File System (DFS) Architecture

Network File System (NFS)

Andrew File System (AFS) — alternative

Section B: Short Answer Questions

Two-Phase Commit (2PC) Protocol

Consistency Model

Sequential Consistency

Naming and Name Resolution

Processes vs Threads

Vector Clock

Improvement over Lamport's Logical Clock

Replication

Advantages

Challenges

Network File System (NFS)

Security Challenges in Distributed Systems

Authentication

Scalability

Techniques to Achieve Scalability

Frequently asked questions