Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

What are election algorithms in distributed systems? Explain the Bully algorithm and the Ring algorithm with suitable examples.

Election Algorithms

In a distributed system many algorithms require one process to act as a coordinator (e.g., for mutual exclusion, replica master, or transaction commit). An election algorithm is a procedure by which the processes choose a unique coordinator among themselves, typically the active process with the highest process identifier (ID). An election is started whenever a process notices that the current coordinator has crashed. A correct election algorithm must guarantee that, when it terminates, all non-faulty processes agree on the same new coordinator.

Bully Algorithm (Garcia-Molina)

Assumes each process has a unique ID and processes know each other's IDs. When process PP detects coordinator failure:

  1. Election message: PP sends an ELECTION message to every process with a higher ID.
  2. Response: If no higher process replies (within timeout), PP wins and becomes coordinator. It sends a COORDINATOR message to all lower-ID processes.
  3. If some higher process QQ replies with OK, PP drops out and QQ takes over the election, repeating step 1 among still-higher processes.
  4. The eventual winner is always the highest-numbered live process — hence it "bullies" the others into submission.

Example: Processes P0P7P_0 \ldots P_7, coordinator P7P_7 crashes. P4P_4 notices and sends ELECTION to P5,P6,P7P_5, P_6, P_7. P5P_5 and P6P_6 reply OK; each then holds its own election. P6P_6 gets no reply (only P7P_7 is higher and it is dead), so P6P_6 wins and broadcasts COORDINATOR to P0P5P_0 \ldots P_5.

Ring Algorithm (Chang–Roberts)

Processes are logically arranged in a ring; each process knows its successor. When a process detects coordinator failure:

  1. It builds an ELECTION message containing its own ID and forwards it to the next live successor (skipping dead ones).
  2. Each process receiving the message appends its own ID and passes it on.
  3. When the message returns to the initiator (it sees its own ID in the list), the highest ID in the list is the new coordinator. It then circulates a COORDINATOR message around the ring to inform everyone.

Example: Ring P1P2P4P5P1P_1 \to P_2 \to P_4 \to P_5 \to P_1, with P6P_6 (coordinator) crashed. P2P_2 starts: message {2}\{2\}, then {2,4}\{2,4\}, {2,4,5}\{2,4,5\}, {2,4,5,1}\{2,4,5,1\} returns to P2P_2. Highest = P5P_5, which becomes coordinator.

Comparison

AspectBullyRing
TopologyFully connectedLogical ring
Messages (worst case)O(n2)O(n^2)O(2n)O(2n)
WinnerHighest live IDHighest live ID
election
2long10 marks

What is data replication in distributed systems? Explain different consistency models and replica management techniques used to maintain consistency.

Data Replication

Data replication is the process of maintaining multiple copies (replicas) of the same data on different machines in a distributed system. Replicas are created to improve performance (clients access a nearby copy), availability/reliability (the system survives node failures), and scalability (load is spread across copies). The central challenge is consistency: when one replica is updated, the others must be kept in agreement, which costs bandwidth and coordination.

Consistency Models

A consistency model is a contract between the data store and the processes specifying what guarantees are given about the order and visibility of read/write operations.

Data-centric models:

  • Strict consistency: A read returns the value of the most recent write (requires absolute global time; impractical).
  • Sequential consistency (Lamport): The result of execution is the same as if all operations were executed in some sequential order, and each process's operations appear in program order.
  • Causal consistency: Writes that are causally related must be seen in the same order by all processes; concurrent writes may be seen in any order.
  • FIFO / PRAM consistency: Writes by a single process are seen by all others in the order issued; writes by different processes may be reordered.
  • Weak / Release / Entry consistency: Use synchronization variables (acquire/release); consistency is enforced only at synchronization points.

Client-centric models (for mostly-read stores): monotonic reads, monotonic writes, read-your-writes, writes-follow-reads — guarantee consistency from a single client's viewpoint despite eventual consistency overall.

Replica Management Techniques

  1. Replica placement: deciding where copies live — permanent replicas (origin servers), server-initiated replicas (dynamically created near demand), and client-initiated replicas (caches).
  2. Update propagation: push (server proactively sends updates) vs pull (client fetches updates), and propagating either the update operation or the modified data.
  3. Consistency / write protocols:
    • Primary-based protocols: all writes go to a single primary replica which then propagates them (remote-write or local-write).
    • Replicated-write protocols: writes go to many replicas — active replication (operation sent to all) or quorum-based (read quorum NRN_R and write quorum NWN_W with NR+NW>NN_R + N_W > N and NW>N/2N_W > N/2 to avoid conflicts).
replicationconsistency
3long10 marks

Explain the architecture of a distributed file system. Discuss the design and working of the Network File System (NFS) or Andrew File System (AFS).

Distributed File System (DFS) Architecture

A distributed file system lets clients on different machines access and share files stored on remote servers as if the files were local, providing location transparency, network transparency, and a common namespace. Key architectural components:

  • Client module / VFS layer: intercepts file system calls and forwards remote ones over the network.
  • File service: operations on individual files (read, write, create, delete).
  • Directory service: maps human-readable names to file identifiers (name resolution).
  • Server module: stores file data and metadata on disk.

Desirable properties: transparency, concurrent file sharing, replication, fault tolerance, caching for performance, and security.

Network File System (NFS)

NFS (developed by Sun) is the classic example of a DFS using a client–server, stateless design.

Design / working:

  • Architecture: Built on the Virtual File System (VFS) interface, so NFS-mounted remote directories look identical to local ones. A client mounts a remote directory into its local tree.
  • Communication: Uses Remote Procedure Calls (RPC) with External Data Representation (XDR) for machine-independent data, originally over UDP.
  • Statelessness (NFSv2/v3): The server keeps no client state between requests; every request (via a file handle) is self-contained. This makes crash recovery trivial — the client simply retries after the server reboots. Operations are idempotent.
  • Caching: Clients cache file blocks and attributes to reduce traffic, validated by timestamps; this gives only approximate consistency.
  • Consistency: NFS provides close-to-open / weak consistency rather than strict UNIX semantics.

Andrew File System (AFS) — alternative

  • Designed for large scale; uses whole-file caching on the client's local disk.
  • On open, the entire file is fetched to the client; reads/writes are local; on close the modified file is written back.
  • Uses callbacks: the server promises to notify a client if a cached file changes, giving better scalability than NFS's repeated validation.
  • Provides session semantics and is stateful, reducing server load and supporting thousands of clients.
distributed-fs
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

Explain the two-phase commit protocol for distributed transactions.

Two-Phase Commit (2PC) Protocol

2PC is an atomic commit protocol ensuring that a distributed transaction either commits at all participating sites or aborts at all of them. One process is the coordinator; the others are participants.

Phase 1 — Voting (Prepare) phase

  1. Coordinator sends a VOTE-REQUEST (prepare) to all participants.
  2. Each participant that can commit writes its data to stable log, replies VOTE-COMMIT (ready); otherwise it replies VOTE-ABORT.

Phase 2 — Decision (Commit) phase 3. If all votes are COMMIT, the coordinator logs and sends GLOBAL-COMMIT to all; if any vote is ABORT (or a timeout occurs), it sends GLOBAL-ABORT. 4. Each participant carries out the decision (commit/abort) and sends an ACK.

Drawback: 2PC is a blocking protocol — if the coordinator crashes after participants voted COMMIT, participants must wait (block) holding locks until it recovers. Three-phase commit (3PC) reduces this blocking.

transaction
5short5 marks

What is a consistency model? Explain sequential consistency.

Consistency Model

A consistency model is a contract between a distributed (replicated) data store and the processes using it. It specifies the guarantees the store makes about the order in which writes become visible and what value a read may return, given that data is replicated. It tells programmers what to expect when reads and writes are issued concurrently to different replicas.

Sequential Consistency

Defined by Lamport, sequential consistency requires that:

The result of any execution is the same as if the operations of all processes were executed in some single sequential (total) order, and the operations of each individual process appear in this sequence in the order specified by its program.

Key points:

  • All processes see the same interleaving of all operations.
  • A process's own operations keep their program order.
  • It does not require operations to follow real (wall-clock) time — only a consistent global order that everyone agrees on.

Example: If P1P_1 writes x=1x=1 and P2P_2 writes x=2x=2, every process must observe the same final order (e.g., all see 11 then 22, or all see 22 then 11) — never P3P_3 seeing 1,21,2 while P4P_4 sees 2,12,1.

consistency
6short5 marks

Explain naming and name resolution in distributed systems.

Naming and Name Resolution

Naming is the use of names to identify and refer to entities (hosts, files, processes, services, users) in a distributed system. A name is a string referring to an entity; an address is the name of the access point where the entity can be reached; an identifier uniquely names exactly one entity and never changes. Names provide a layer of transparency so entities can move without breaking references.

Types of names: human-friendly names (e.g., www.tu.edu.np), addresses, and pure identifiers.

Name resolution is the process of mapping a name to the entity (or its address) it refers to. It is performed by a name space organized as a hierarchical tree of directory nodes (e.g., the DNS hierarchy).

  • Name space: a labelled, directed graph; a path from the root spells out the name.
  • Resolution mechanisms:
    • Iterative resolution: the client queries the root server, gets a referral to the next server, and queries each server itself.
    • Recursive resolution: the client queries one server, which queries the next on the client's behalf and returns the final answer (allows better caching, more server load).

Example (DNS): Resolving mail.tu.edu.np proceeds root → .np.edu.nptu.edu.np, each step returning the address of the next-level name server until the host's IP is found. Caching of results speeds up repeated lookups.

naming
7short5 marks

Differentiate between processes and threads in the context of distributed systems.

Processes vs Threads

A process is an independent program in execution with its own address space and resources. A thread is a lightweight unit of execution within a process that shares the process's address space, code, and open files with sibling threads. In distributed systems, multithreaded servers (and clients) are used to handle many requests concurrently and to overlap communication latency with computation.

AspectProcessThread
Address spaceSeparate, privateShared with peer threads
Creation/switch costHighLow (lightweight)
CommunicationIPC / message passingShared memory (fast)
Isolation/protectionStrong (one crash isolated)Weak (a bad thread can crash whole process)
ResourcesOwn resourcesShares process resources

Relevance to distributed systems: A multithreaded server dedicates one thread per client request so that a thread blocked on remote I/O does not stall others, improving throughput. Threads make clients more responsive (e.g., one thread per RPC) and exploit multiprocessors, whereas processes are used when strong isolation/fault containment is needed across machines.

threadprocess
8short5 marks

What is a vector clock? How does it improve over Lamport's logical clock?

Vector Clock

A vector clock is a mechanism for capturing causality among events in a distributed system. Each of the nn processes maintains a vector VV of nn integer counters, where Vi[j]V_i[j] is process PiP_i's knowledge of the number of events at process PjP_j.

Rules:

  1. Before each internal/send event, PiP_i increments its own entry: Vi[i]=Vi[i]+1V_i[i] = V_i[i] + 1.
  2. Each message carries the sender's vector timestamp.
  3. On receiving a message with timestamp VmsgV_{msg}, the receiver sets Vi[k]=max(Vi[k],Vmsg[k])V_i[k] = \max(V_i[k], V_{msg}[k]) for all kk, then increments Vi[i]V_i[i].

Comparison: VaVbV_a \le V_b iff Va[k]Vb[k]V_a[k] \le V_b[k] for all kk; event aba \to b (causally precedes) iff Va<VbV_a < V_b. If neither VaVbV_a \le V_b nor VbVaV_b \le V_a, the events are concurrent.

Improvement over Lamport's Logical Clock

Lamport's scalar clock guarantees only one direction: abL(a)<L(b)a \to b \Rightarrow L(a) < L(b). The converse is not trueL(a)<L(b)L(a) < L(b) does not imply aa happened before bb, so a scalar clock cannot detect concurrency or causality precisely.

Vector clocks fix this: Va<Vb    abV_a < V_b \iff a \to b (biconditional). Thus vector clocks can exactly determine whether two events are causally related or concurrent, at the cost of O(n)O(n) storage per timestamp instead of O(1)O(1).

vector-clock
9short5 marks

What is replication? List the advantages and challenges of data replication.

Replication

Replication is the technique of keeping multiple copies (replicas) of the same data or service on different nodes of a distributed system so that clients can access whichever copy is most convenient, and the system can keep working even if some copies fail.

Advantages

  • Improved reliability / fault tolerance: if one replica fails, others continue serving — no single point of failure.
  • Higher availability: data remains accessible despite node or network failures.
  • Better performance: clients read from a nearby replica, reducing latency and network traffic.
  • Increased scalability: read load is spread across many replicas (load balancing).

Challenges

  • Consistency maintenance: every update must be propagated to all replicas; keeping them in agreement is hard (the core problem).
  • Update propagation cost: extra bandwidth and overhead to synchronize replicas, which can reduce write performance.
  • Concurrency / conflict resolution: simultaneous writes at different replicas can conflict and must be ordered or reconciled.
  • Trade-off (CAP): under network partitions one must trade consistency against availability.
  • Replica placement and management complexity.
replication
10short5 marks

Write short notes on the Network File System (NFS).

Network File System (NFS)

NFS, developed by Sun Microsystems, is a widely used distributed file system that lets a client transparently access files on a remote server as though they were local.

Key features:

  • Client–server model built on the Virtual File System (VFS) interface, so remote files look identical to local files.
  • Mounting: a client mounts a remote directory into its local directory tree, providing location transparency.
  • RPC + XDR: all operations are carried out via Remote Procedure Calls using External Data Representation for machine-independent data exchange (originally over UDP).
  • Stateless server (NFSv2/v3): the server keeps no per-client state; each request carries a self-describing file handle. This makes crash recovery simple — clients just retry after a server reboot — and operations are idempotent.
  • Caching: clients cache data and attributes (validated by timestamps) to improve performance, giving weak (close-to-open) consistency.
  • Later versions (NFSv4) are stateful, support compound operations, strong security, and work over TCP.

Advantages: transparency, simple recovery, OS-independence. Limitation: weak consistency and security in early versions.

distributed-fs
11short5 marks

Explain the security challenges in distributed systems and discuss authentication.

Security Challenges in Distributed Systems

Because components communicate over open, untrusted networks, distributed systems face several security challenges:

  • Confidentiality: preventing eavesdropping / disclosure of data on the network.
  • Integrity: ensuring messages and data are not altered (tampering).
  • Authentication: verifying the true identity of communicating parties (preventing spoofing/masquerade).
  • Authorization / access control: ensuring an authenticated party may only do what it is permitted.
  • Availability: resisting denial-of-service (DoS) attacks.
  • Non-repudiation, replay attacks, key management, and trust across multiple administrative domains.

Authentication

Authentication is the process of verifying the identity of a user, process, or host before granting access. Mechanisms:

  • Password / shared-secret based.
  • Symmetric-key authentication: parties prove knowledge of a shared key, often via a trusted Key Distribution Center (KDC). The Needham–Schroeder protocol and Kerberos use a KDC that issues tickets, with nonces and timestamps to defeat replay attacks.
  • Public-key authentication: a party proves possession of a private key (digital signatures / challenge–response), with public keys vouched for by certificates from a Certificate Authority.

Authentication is usually combined with session key establishment so that subsequent messages are encrypted and integrity-protected.

security
12short5 marks

What is scalability? Explain the techniques used to achieve scalability in distributed systems.

Scalability

Scalability is the ability of a distributed system to handle growth — in the number of users/requests, in geographic spread, or in administrative size — without a significant loss of performance or manageability. The three dimensions are:

  • Size scalability: more users and resources can be added.
  • Geographical scalability: users and resources may be far apart.
  • Administrative scalability: the system spans many independent organizations.

Techniques to Achieve Scalability

  1. Hiding communication latencies: use asynchronous communication and move computation to the client so it does not block waiting for remote replies.
  2. Replication: keep multiple copies of data/services to balance load and place data near users (improves both performance and availability).
  3. Caching: a special form of replication on the client side that reduces repeated remote access (e.g., DNS, web caches).
  4. Distribution / partitioning: split a component or data set into parts spread across machines (e.g., DNS splits the name space into zones; databases use sharding).
  5. Decentralization: avoid centralized servers, data, and algorithms; use distributed/peer-to-peer structures and decentralized routing to remove bottlenecks and single points of failure.

Replication and caching introduce a consistency cost, which is the main trade-off when scaling.

scalability

Frequently asked questions

Where can I find the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) question paper 2075?
The full BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2075 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Distributed System (BSc CSIT, CSC462) 2075 paper come with solutions?
Yes. Every question on this Distributed System (BSc CSIT, CSC462) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2075 paper?
The BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2075 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Distributed System (BSc CSIT, CSC462) past paper free?
Yes — reading and attempting this Distributed System (BSc CSIT, CSC462) past paper on Kekkei is completely free.