Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

What is distributed mutual exclusion? Explain the centralized, token-ring and Ricart-Agrawala (distributed) algorithms, comparing them in terms of message complexity and fault tolerance.

Distributed Mutual Exclusion

Mutual exclusion ensures that only one process at a time accesses a shared resource (critical section). In a distributed system there is no shared memory and no common clock, so a special-purpose algorithm must coordinate access using message passing. Requirements: safety (at most one process in the CS), liveness (requests eventually granted, no deadlock/starvation) and ideally fairness/ordering (requests served in causal/FIFO order).

1. Centralized Algorithm

One process is elected coordinator. To enter the CS a process sends a REQUEST to the coordinator; the coordinator replies GRANT if the CS is free, otherwise queues the request (no reply). On exit the process sends RELEASE, and the coordinator grants the next queued request.

  • Messages per CS entry: 3 (request, grant, release).
  • Fault tolerance: poor — the coordinator is a single point of failure and a bottleneck. If it crashes, mutual exclusion stops until a new coordinator is elected.

2. Token-Ring Algorithm

Processes are arranged in a logical ring. A single token circulates. Only the holder of the token may enter the CS; after exiting (or if it does not need the CS) it passes the token to its successor.

  • Safety guaranteed because only one token exists.
  • Messages per entry: between 1 and ∞ (a process may have to wait while the token circulates even when no one needs it).
  • Fault tolerance: a lost token (process crash) halts the system; mechanisms are needed to regenerate the token and bypass dead nodes.

3. Ricart–Agrawala (Distributed) Algorithm

Uses Lamport logical timestamps and total ordering. To enter, a process multicasts REQUEST(timestamp, id) to all N1N-1 other processes and waits for an OK/REPLY from every one of them.

When a process receives a request:

  • if it is not interested, it replies OK immediately;
  • if it is in the CS, it queues the request;
  • if it is also requesting, it compares timestamps — the lower timestamp wins (ties broken by process id); the loser replies OK, the winner queues.

On exit, the process replies OK to all queued requests.

  • Messages per entry: 2(N1)2(N-1) (N−1 requests + N−1 replies).
  • Fault tolerance: worse in practice — the crash of any process is interpreted as permission denied, so all participants are points of failure unless failures are detected.

Comparison

AlgorithmMessages / entryDelay (msg times)Fault toleranceProblem
Centralized32Coordinator crashBottleneck / SPOF
Token Ring1 → ∞0 → N−1Lost token, dead nodeWasted messages, latency
Ricart–Agrawala2(N−1)2(N−1)Crash of any nodeHigh message cost, all nodes critical

Conclusion: the centralized algorithm is simplest and cheapest in messages but has a single point of failure; the token ring avoids starvation and is simple but can waste bandwidth and is sensitive to token loss; Ricart–Agrawala is fully distributed and fair but costly and the least fault-tolerant since every node must respond.

mutual-exclusion
2long10 marks

Explain inter-process communication in distributed systems. Discuss the Remote Procedure Call (RPC) mechanism in detail, including parameter passing, binding and marshalling.

Inter-Process Communication (IPC) in Distributed Systems

Since distributed processes do not share memory, all cooperation occurs through message passing over the network. IPC can be:

  • Message-oriented: send/receive primitives, which may be synchronous (blocking) or asynchronous (non-blocking), and persistent or transient.
  • Request–reply (RPC/RMI): higher-level abstraction making remote calls look like local procedure calls.
  • Group communication / multicast: delivering a message to a set of receivers (used for replication).

IPC relies on transport protocols (TCP/UDP), addressing, and (often) middleware that hides heterogeneity.

Remote Procedure Call (RPC)

RPC (Birrell & Nelson, 1984) lets a client invoke a procedure on a remote server as if it were local, hiding all communication details. This is access transparency.

Mechanism / Control Flow

  1. Client calls a local client stub.
  2. The stub marshals the parameters into a message and asks the OS to send it.
  3. The message travels to the server machine.
  4. The server stub unmarshals the parameters and calls the actual procedure.
  5. The procedure executes and returns results to the server stub.
  6. The server stub marshals the result and sends it back.
  7. The client stub unmarshals the result and returns it to the caller.

Steps 2–7 are invisible to the programmer.

Parameter Passing

  • Pass by value: simple data is copied into the message — straightforward.
  • Pass by reference: pointers are meaningless across address spaces, so it is emulated by copy/restore (send the value, return the modified value) or by passing object references in RMI.
  • Heterogeneity: machines may differ in byte order (big/little-endian) and data formats, so a common external representation (e.g., XDR, CORBA CDR, Protocol Buffers) is used.

Marshalling

Marshalling is packing the procedure parameters/results into a standard message format for transmission; unmarshalling reconstructs them on the other side. It handles type encoding and byte-order conversion so heterogeneous machines interoperate.

Binding

Binding locates the server and establishes the client↔server association. The server registers its service (interface, host, port) with a binder / directory service (e.g., a portmapper or registry). At call time the client stub queries the binder to obtain the server's endpoint. Binding may be static (compile-time) or dynamic (run-time lookup).

RPC Failure Semantics

Unlike local calls, RPCs can fail (lost messages, server/client crash). Common semantics: at-least-once (retry, may execute twice), at-most-once (duplicate filtering), and exactly-once (ideal, hard to guarantee). Idempotent operations tolerate retries safely.

Summary: RPC hides communication behind stubs; marshalling handles data representation, binding locates the server, and parameter passing must cope with separate address spaces and machine heterogeneity.

rpccommunication
3long10 marks

What are election algorithms in distributed systems? Explain the Bully algorithm and the Ring algorithm with suitable examples.

Election Algorithms

An election algorithm chooses a single process to act as coordinator (e.g., the central lock manager, sequencer or master). Any process may start an election; when it finishes every non-crashed process must agree on the new coordinator, usually the one with the highest process id (or priority). Elections are needed after the current coordinator crashes.

1. Bully Algorithm (Garcia-Molina)

Assumes each process knows the ids of all others and can message them; uses timeouts to detect crashes.

When a process P notices the coordinator has failed (or it just recovered):

  1. P sends an ELECTION message to all processes with higher ids.
  2. If no higher process answers within the timeout, P wins and sends a COORDINATOR message to all lower processes announcing itself.
  3. If a higher process answers with OK, P drops out; the higher process now holds its own election.
  4. Eventually the highest-id live process becomes coordinator and announces it.

Example: processes 1..7, with 7 the old coordinator. 7 crashes; 4 notices and sends ELECTION to 5,6,7. 5 and 6 reply OK and hold their own elections. 6 sends ELECTION to 7 (no reply) and finds itself highest, so 6 announces itself coordinator to all. The highest survivor 'bullies' the rest — hence the name.

  • Worst-case messages: O(N2)O(N^2).

2. Ring Algorithm

Processes form a logical ring; each knows its successor. No token is used.

  1. A process that detects coordinator failure builds an ELECTION message containing its own id and sends it to its successor (skipping dead nodes).
  2. Each process appends its id and forwards the message around the ring.
  3. When the message returns to the initiator (it sees its own id in the list), the highest id in the list is the new coordinator.
  4. The initiator circulates a COORDINATOR message naming the winner so all processes update.

Example: ring 1→2→3→4→5→6→7→1, coordinator 7 crashes. Process 2 starts an election with {2}; it becomes {2,3,4,5,6} as it travels (7 skipped) and returns to 2. Max = 6, so 6 is the new coordinator, announced via a second pass.

  • Messages: about 2N2N (NN for the election list + NN for the announcement).

Comparison

BullyRing
Topology knowledgeall process idssuccessor only
Messages (worst)O(N2)O(N^2)2N\approx 2N
Resulthighest-id live processhighest id in ring list

Both guarantee that all live processes agree on the same coordinator after the election completes.

election
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

Explain the two-phase commit protocol for distributed transactions.

Two-Phase Commit (2PC) Protocol

2PC is an atomic commitment protocol ensuring that a distributed transaction either commits at all participating sites or aborts at all of them, despite the data being spread across multiple nodes. One node acts as coordinator; the others are participants (cohorts).

Phase 1 — Voting (Prepare) Phase

  1. Coordinator sends PREPARE (vote-request) to all participants.
  2. Each participant that can commit writes its updates and a ready/prepare record to its log, then replies VOTE-COMMIT (Yes). If it cannot commit, it replies VOTE-ABORT (No) and aborts locally.

Phase 2 — Decision (Commit) Phase

  1. If all participants voted Yes, the coordinator logs commit and sends GLOBAL-COMMIT to all; if any voted No (or timed out), it logs abort and sends GLOBAL-ABORT.
  2. Each participant carries out the decision (commit or abort), logs it, and returns an ACK.
  3. The coordinator completes once all ACKs are received.

Properties and Drawback

  • Guarantees atomicity of the distributed transaction.
  • Blocking protocol: if the coordinator crashes after participants voted Yes but before the decision is delivered, participants are stuck in an uncertain state holding locks until it recovers. 3PC addresses this with an extra phase.

Prepare → vote → global-commit/abort → ack is the key sequence.

transaction
5short5 marks

What is a consistency model? Explain sequential consistency.

Consistency Model

A consistency model is a contract between a (replicated/shared) data store and its processes: if the processes obey certain rules when reading and writing, the store promises to make the results of read/write operations appear in a well-defined, predictable order. Models range from strict (strongest, every read returns the most recent write — only theoretical) to weak/eventual (replicas converge over time). The model defines what value a read may legally return.

Sequential Consistency (Lamport)

A data store is sequentially consistent if the result of any execution is the same as if the operations of all processes were executed in some single sequential (interleaved) order, and the operations of each individual process appear in this sequence in the order specified by its program.

Key points:

  • There is one global ordering of all reads/writes that all processes observe.
  • Program order per process is preserved, but operations from different processes may be interleaved in any consistent way.
  • It does not require ordering by real (wall-clock) time — a write need not be seen instantly, only that everyone agrees on the same total order.

Example: if P1 writes x=1 then P2 writes x=2, all processes must read these in the same order (everyone sees 1 then 2, or everyone sees 2 then 1) — not some seeing one order and others the opposite.

It is weaker than strict consistency (no real-time guarantee) but stronger than causal consistency.

consistency
6short5 marks

Explain naming and name resolution in distributed systems.

Naming and Name Resolution

Naming

A name is a string used to refer to an entity (host, file, user, service, object). Distributed systems use names to share resources, identify entities and locate them. Types:

  • Human-friendly names — readable (e.g., www.tu.edu.np).
  • Identifiers — unique, never reused, location-independent.
  • Addresses — the access point/location of an entity (e.g., an IP address + port); an entity may change address over time.

A name must ultimately be mapped to the address of an access point so the entity can be used.

Name Resolution

Name resolution is the process of looking up the address/value associated with a name using a name space and a name service. The name space is typically a hierarchical tree of directories (like the DNS hierarchy or a file-system path). Resolution traverses this tree from a known starting (root/context) node, resolving the name one component at a time.

Two strategies:

  • Iterative resolution: the resolver queries one name server, gets a referral to the next, and repeats itself.
  • Recursive resolution: each name server forwards the request to the next server on the client's behalf and returns the final result (allows better caching but loads servers).

Example — DNS: resolving mail.tu.edu.np proceeds root → .nptu.edu.np → host record. Caching of results greatly improves performance, and replication of servers (e.g., multiple root/TLD servers) provides scalability and fault tolerance.

naming
7short5 marks

Differentiate between processes and threads in the context of distributed systems.

Processes vs Threads

A process is an independent program in execution with its own address space, code, data, heap and OS resources. A thread is a lightweight unit of execution within a process that shares the process's address space and resources but has its own stack, registers and program counter.

AspectProcessThread
Address spaceSeparate, privateShared with peer threads
Creation costHigh (heavy context)Low (lightweight)
Context switchExpensive (TLB/MMU swap)Cheap (same address space)
CommunicationIPC (messages, pipes, sockets)Shared memory (fast, but needs synchronization)
Isolation / faultCrash isolated to that processA faulty thread can crash the whole process
ProtectionOS-protected from each otherNo protection between threads

Relevance to Distributed Systems

  • Multithreaded servers improve throughput: a server can spawn a worker thread per client request and overlap I/O (network/disk) with computation, hiding communication latency.
  • Multithreaded clients can hide network delay by issuing requests in parallel.
  • Processes give fault and security isolation across machines; threads give efficiency and concurrency within a node. Distributed middleware commonly combines both: multiple processes (possibly on different hosts) each running a thread pool.
threadprocess
8short5 marks

What is a vector clock? How does it improve over Lamport's logical clock?

Vector Clocks

A vector clock is a logical clock in which each of the NN processes keeps a vector VV of NN integers, where Vi[j]V_i[j] is process ii's current knowledge of the number of events at process jj. It captures causality between events precisely.

Rules

  1. Initially Vi=[0,0,,0]V_i = [0,0,\dots,0].
  2. Before any event (including a send), process ii increments its own component: Vi[i]+=1V_i[i] \mathrel{+}= 1.
  3. Each message carries the sender's vector timestamp VV.
  4. On receiving a message with timestamp VtV_t, the receiver updates component-wise maximum: Vi[k]=max(Vi[k],Vt[k])V_i[k] = \max(V_i[k], V_t[k]) for all kk, then increments Vi[i]V_i[i].

Comparison Rule

  • VaVbV_a \le V_b iff Va[k]Vb[k]V_a[k] \le V_b[k] for all kk.
  • Event aba \to b (a happened-before b) iff Va<VbV_a < V_b (≤ everywhere and < in at least one component).
  • If neither VaVbV_a \le V_b nor VbVaV_b \le V_a, the events are concurrent.

Improvement over Lamport's Logical Clock

Lamport clocks (single counter CC) guarantee only:

ab    C(a)<C(b)a \to b \;\Rightarrow\; C(a) < C(b)

but the converse is not true — C(a)<C(b)C(a) < C(b) does not imply aa happened before bb, so Lamport clocks cannot detect concurrency or causality.

Vector clocks give the biconditional:

ab    V(a)<V(b)a \to b \;\Longleftrightarrow\; V(a) < V(b)

so they can definitively determine whether two events are causally related or concurrent. The cost is O(N)O(N) storage and message overhead per timestamp instead of a single integer.

vector-clock
9short5 marks

What is replication? List the advantages and challenges of data replication.

Replication

Replication is the technique of maintaining copies (replicas) of the same data or service on multiple machines in a distributed system. The main reasons are to enhance reliability (tolerate failures) and improve performance (place data near clients, share load).

Advantages

  • Reliability / availability: if one replica fails, others continue to serve requests; redundancy masks crashes.
  • Performance / scalability: read requests are spread across replicas, reducing load on any single server.
  • Reduced latency: replicas placed close to clients (e.g., geographically, via CDNs) cut access time.
  • Fault tolerance: majority/quorum schemes let the system keep working and detect corrupt copies.

Challenges

  • Consistency: the central problem — keeping all replicas in agreement when updates occur; requires a consistency model (strict, sequential, causal, eventual) and update-propagation protocols.
  • Update propagation cost / bandwidth: every write must reach all replicas, adding overhead.
  • Conflict resolution: concurrent writes at different replicas may conflict and must be reconciled.
  • Trade-off (CAP): stronger consistency reduces availability/performance and vice versa; designers must choose between them under network partitions.

Summary: replication boosts availability, performance and fault tolerance, but the cost is the difficulty of keeping replicas consistent and resolving concurrent updates.

replication
10short5 marks

Write short notes on the Network File System (NFS).

Network File System (NFS)

NFS, developed by Sun Microsystems, is a widely used distributed file system that lets a client machine access files stored on a remote server as if they were local, providing access and location transparency. It follows a client–server model and is built on top of RPC with an external data representation (XDR) so heterogeneous machines interoperate.

Key Features

  • Mounting: a remote directory is mounted into the client's local namespace (e.g., at /mnt/data); afterwards files are accessed with ordinary system calls. The Virtual File System (VFS) layer routes operations to local or NFS code transparently.
  • Stateless server (NFS v2/v3): the server keeps no per-client state between requests. Each request (read/write) carries all needed information (file handle, offset). This makes crash recovery simple — after a server reboot, clients just retry; operations are idempotent. (NFSv4 became stateful to add locking, delegation and better security.)
  • Caching: clients cache file blocks and attributes to improve performance; weak cache consistency may allow temporary staleness.
  • Protocols: uses RPC over UDP/TCP, the Mount protocol to obtain a file handle, and the NFS protocol for file operations.

Pros and Cons

  • Advantages: transparent sharing, simple administration, portability, easy recovery (stateless).
  • Limitations: weaker consistency on concurrent writes, security relies on the client (in early versions), and performance bound by network latency.
distributed-fs
11short5 marks

Explain the security challenges in distributed systems and discuss authentication.

Security in Distributed Systems

Because data and computation are spread over an untrusted network, distributed systems face threats that standalone systems do not. The core security goals are Confidentiality, Integrity and Availability (CIA), plus authentication, authorization and non-repudiation.

Security Challenges

  • Insecure communication channels: messages travel over public networks and can be intercepted (eavesdropping), modified (tampering) or replayed.
  • Authentication of remote parties: a process must verify the identity of a remote, unseen entity.
  • No central control: many independent administrative domains with differing trust and policies.
  • Attacks: masquerading/spoofing, man-in-the-middle, replay attacks, denial-of-service (DoS) against availability, and key-distribution problems at scale.

Authentication

Authentication verifies the claimed identity of a communicating party (user, host or service) before granting access. Common mechanisms:

  • Shared-secret / symmetric authentication using a challenge–response protocol (prove knowledge of a secret without sending it in the clear).
  • Kerberos: a trusted Key Distribution Center (KDC) issues tickets so clients and servers can mutually authenticate using symmetric keys, avoiding password transmission.
  • Public-key authentication / digital signatures: the sender signs with its private key; the receiver verifies with the public key (often via certificates issued by a trusted CA).
  • Nonces/timestamps are included to defeat replay attacks.

Authentication is usually combined with encryption (confidentiality) and digital signatures/MACs (integrity), and is followed by authorization (access control) to enforce what an authenticated entity may do.

security
12short5 marks

What is scalability? Explain the techniques used to achieve scalability in distributed systems.

Scalability

Scalability is the ability of a distributed system to grow — in number of users, resources, or geographic spread — while continuing to perform well and remain manageable. A system is scalable if adding load (and resources) does not cause unacceptable loss of performance. Three dimensions:

  • Size scalability: add more users/resources.
  • Geographical scalability: users and resources far apart (across regions).
  • Administrative scalability: span many independent administrative domains.

Techniques to Achieve Scalability

  1. Hiding communication latencies (asynchrony): use asynchronous communication and move work to the client so it does not block waiting for the server; do other work while requests are outstanding.
  2. Distribution / partitioning: split a component or data set into parts spread over multiple machines (e.g., DNS partitions the name space into zones; database sharding). Each part handles a portion of the load.
  3. Replication and caching: keep multiple copies of data/services close to clients to spread load, increase availability and reduce latency. Caching is replication driven by client demand. The cost is keeping replicas consistent.
  4. Decentralization: avoid centralized servers, data and algorithms (single points become bottlenecks); use peer-to-peer and decentralized algorithms that need no global state and tolerate node failures.
  5. Load balancing: distribute requests across servers so no single node is overwhelmed.

Trade-off: replication and caching improve scalability but introduce consistency problems, so a balance must be struck.

scalability

Frequently asked questions

Where can I find the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) question paper 2080?
The full BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2080 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Distributed System (BSc CSIT, CSC462) 2080 paper come with solutions?
Yes. Every question on this Distributed System (BSc CSIT, CSC462) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2080 paper?
The BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2080 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Distributed System (BSc CSIT, CSC462) past paper free?
Yes — reading and attempting this Distributed System (BSc CSIT, CSC462) past paper on Kekkei is completely free.