Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

Explain inter-process communication in distributed systems. Discuss the Remote Procedure Call (RPC) mechanism in detail, including parameter passing, binding and marshalling.

Inter-Process Communication (IPC) in Distributed Systems

In a distributed system there is no shared memory, so cooperating processes on different machines must communicate by passing messages over a network. IPC provides the mechanisms (message passing, request/reply, group communication) that let these processes exchange data and synchronize. The two foundational models are:

  • Message-Oriented Communication — send/receive primitives (synchronous/blocking or asynchronous/non-blocking; persistent or transient).
  • Request-Reply (client/server) — the basis of RPC and RMI, where a client sends a request and blocks for a reply.

Remote Procedure Call (RPC)

RPC (Birrell & Nelson, 1984) lets a program call a procedure located on a remote machine as if it were a local call, hiding the underlying message passing. It achieves access transparency.

How an RPC works (steps)

1. Client calls the client stub (a local procedure).
2. Client stub marshals (packs) parameters into a message; OS sends it.
3. Message travels over the network to the server machine.
4. Server OS hands the message to the server stub.
5. Server stub unmarshals parameters and calls the actual server procedure.
6. Server executes and returns the result to the server stub.
7. Server stub marshals the result; OS sends the reply back.
8. Client OS gives the message to the client stub; it unmarshals the result
   and returns to the caller.

The stubs are generated automatically from an Interface Definition Language (IDL) specification.

Parameter Passing

Because client and server do not share an address space:

  • Value parameters are simply copied into the request message — easy.
  • Reference parameters (pointers) cannot be passed directly. Common solutions: copy/restore (copy the referenced data in, copy modified data back) or flattening the data structure.
  • Heterogeneity of data representation (byte order, float format) is handled by a common format such as XDR.

Marshalling

Marshalling is packing procedure parameters (and results) into a standard, machine-independent message format; unmarshalling is the reverse on the receiving side. It resolves differences in byte ordering (big-endian vs little-endian) and data layout.

Binding

Binding is how a client locates and connects to the correct server before the call. A binder / directory service (e.g., a portmapper or name server):

  • The server registers its service (name, version, handle) with the binder.
  • The client queries the binder to look up the server's address/port (this is dynamic binding).
  • Binding can be static (fixed at compile time) or dynamic (resolved at runtime, supporting load balancing and fault tolerance).

RPC Semantics under Failure

  • At-least-once (retransmit on timeout; may execute twice).
  • At-most-once (duplicate filtering; executes 0 or 1 times).
  • Exactly-once (ideal but hard with failures).

Conclusion: RPC provides a clean, transparent abstraction for IPC, with stubs, marshalling, and binding hiding the complexity of network communication.

rpccommunication
2long10 marks

What are election algorithms in distributed systems? Explain the Bully algorithm and the Ring algorithm with suitable examples.

Election Algorithms in Distributed Systems

Many distributed algorithms (mutual exclusion, replica coordination, commit) need one process to act as coordinator. An election algorithm chooses a unique coordinator after the current one fails, and ensures every process agrees on which one is elected. The convention is that the process with the highest ID (priority number) wins.

Assumptions: each process has a unique number; processes can fail and recover; messages are eventually delivered.

1. Bully Algorithm (Garcia-Molina)

When a process PP notices the coordinator is not responding, it starts an election:

  1. PP sends an ELECTION message to every process with a higher ID.
  2. If no one with a higher ID replies, PP wins and becomes coordinator; it sends a COORDINATOR message to all.
  3. If a higher process replies with OK, PP drops out; that higher process now takes over the election.
  4. A recovered higher-ID process can always restart an election and "bully" the current coordinator into stepping down.

Example: Processes 1,2,3,4,5,6,7; coordinator 7 crashes. Process 4 notices and sends ELECTION to 5, 6, 7. 5 and 6 reply OK and start their own elections. Eventually 6 finds no higher process responding and announces itself as the new coordinator. Cost: O(n2)O(n^2) messages in the worst case.

2. Ring Algorithm

Processes are arranged in a logical ring (each knows its successor).

  1. A process that detects coordinator failure builds an ELECTION message containing its own ID and sends it to its successor.
  2. Each process appends its own ID to the message and forwards it around the ring (skipping dead nodes).
  3. When the message returns to the initiator, it sees all active IDs; the highest ID is the new coordinator.
  4. The initiator circulates a COORDINATOR message naming the winner, which everyone records.

Example: Ring 1→2→3→4→5→6→7→1, coordinator 7 crashes. Process 2 starts election: message [2] → 3 makes [2,3] → ... → reaches initiator as [2,3,4,5,6]. Highest = 6, so 6 is coordinator. Cost: about 2n2n messages.

Comparison

FeatureBullyRing
TopologyAny (needs all IDs)Logical ring
Messages (worst)O(n2)O(n^2)O(n)O(n)
Higher processBullies lowerDetected via traversal

Both guarantee that exactly one (the highest-numbered live) process is finally elected.

election
3long10 marks

What is data replication in distributed systems? Explain different consistency models and replica management techniques used to maintain consistency.

Data Replication in Distributed Systems

Data replication means keeping multiple copies (replicas) of the same data on different machines. Its main goals are:

  • Reliability / fault tolerance — survive node crashes.
  • Performance — serve reads from a nearby replica, reducing latency and balancing load.

The central problem is consistency: when one copy is updated, all others must be kept consistent, which costs bandwidth and coordination. This is the classic trade-off captured by the CAP theorem (Consistency, Availability, Partition-tolerance — pick two).

Consistency Models

A consistency model is a contract between the data store and processes specifying what guarantees reads give after writes.

Data-centric models:

  • Strict consistency — a read returns the most recent write instantly (needs absolute global time; impractical).
  • Sequential consistency — all processes see all operations in the same total order, consistent with each process's program order.
  • Causal consistency — writes that are causally related are seen in the same order by all; concurrent writes may differ.
  • FIFO / PRAM consistency — writes by a single process are seen in order; different processes' writes may interleave.

Client-centric models: weaker guarantees for a single client across replicas — eventual consistency, monotonic reads, monotonic writes, read-your-writes, writes-follow-reads.

Replica Management Techniques

(a) Replica placement — permanent replicas, server-initiated replicas (dynamic, based on demand/load), and client-initiated replicas (caches).

(b) Update propagation:

  • State vs operation transfer — send the new value, or send the update operation.
  • Push (server-initiated, good for high read/write ratio) vs Pull (client-initiated caches).

(c) Consistency protocols (how updates are coordinated):

  • Primary-based (master-slave): all writes go to a single primary, which orders and propagates them (remote-write or local-write variants).
  • Replicated-write: writes go to multiple replicas — active replication (operation sent to all) or quorum-based voting, where a write needs NWN_W and a read needs NRN_R replicas with NR+NW>NN_R + N_W > N and NW>N/2N_W > N/2 to avoid stale/conflicting reads.

Conclusion: Replication improves availability and performance, but requires choosing an appropriate consistency model and protocol to balance consistency against cost.

replicationconsistency
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

Explain the two-phase commit protocol for distributed transactions.

Two-Phase Commit (2PC) Protocol

2PC is an atomic-commitment protocol that guarantees a distributed transaction either commits at all participating sites or aborts at all of them. It uses one coordinator and several participants.

Phase 1 — Voting (Prepare) Phase

  1. The coordinator sends a PREPARE (can-commit?) message to every participant.
  2. Each participant that can commit writes a ready/prepared record to its log and replies VOTE-COMMIT; otherwise it replies VOTE-ABORT.

Phase 2 — Decision (Commit) Phase

  1. If all participants vote COMMIT, the coordinator logs GLOBAL-COMMIT and sends COMMIT to all; if any voted ABORT (or timed out), it sends GLOBAL-ABORT.
  2. Each participant carries out the decision (commit or roll back), writes it to its log, and sends an ACK.

Properties

  • Guarantees atomicity across sites.
  • Drawback — blocking: if the coordinator crashes after participants vote COMMIT but before the decision arrives, participants are left uncertain (blocked) holding locks. Three-Phase Commit (3PC) adds a pre-commit phase to reduce this blocking.
transaction
5short5 marks

What is a consistency model? Explain sequential consistency.

Consistency Model

A consistency model is a contract between a (replicated/shared) data store and its processes: if the processes obey certain rules when reading and writing data, the store promises that reads will return results according to specific guarantees. It defines which order and how recent values returned by reads must be.

Sequential Consistency

Defined by Lamport, sequential consistency requires:

The result of any execution is the same as if the operations of all processes were executed in some sequential (total) order, and the operations of each individual process appear in this sequence in the order specified by its program.

Key points:

  • There exists one global interleaving of all read/write operations that all processes agree on.
  • Each process's own operations keep their program order.
  • It does not require operations to follow real-time order (it is weaker than strict consistency) — only that everyone sees the same order.

Example: Two processes write x=1x=1 and x=2x=2. Sequential consistency allows either ordering, but every process must observe the same chosen order (all see 1 then 2, or all see 2 then 1) — not different orders.

consistency
6short5 marks

Explain naming and name resolution in distributed systems.

Naming and Name Resolution in Distributed Systems

Naming is the mechanism for identifying entities (hosts, files, services, users, objects) in a distributed system. A name is a string used to refer to an entity; it is bound to the entity's address (access point) and possibly an identifier (unique, never reused).

Types of names:

  • Human-friendly names — readable strings (e.g., www.tu.edu.np).
  • Addresses — location of an access point (e.g., IP 192.168.1.1:80).
  • Identifiers — pure, unique, location-independent references.

Name Resolution

Name resolution is the process of mapping a name to the entity (address) it refers to using a name service. Each name space is organized as a hierarchy (a naming graph of directory nodes), and resolution traverses it.

Resolution strategies:

  • Iterative resolution — the client contacts servers one by one, following referrals down the hierarchy.
  • Recursive resolution — a name server resolves the rest of the name on the client's behalf and returns the final result (enables caching, but more load on servers).

Example — DNS: the hierarchical Domain Name System resolves www.tu.edu.np by querying the root, then .np, then .edu.np, then tu servers, returning the IP address. Caching at each level speeds up future lookups.

naming
7short5 marks

Differentiate between processes and threads in the context of distributed systems.

Processes vs Threads in Distributed Systems

A process is an independent program in execution with its own address space and resources. A thread is a lightweight unit of execution within a process that shares the process's address space with sibling threads but has its own stack and program counter.

AspectProcessThread
Address spaceSeparate, privateShared with peer threads
Creation/switch costHeavy (involves OS, new space)Light (cheap context switch)
CommunicationIPC / message passing (costly)Shared memory (fast, direct)
Isolation/protectionStrong — one crash isolatedWeak — a fault can crash whole process
ResourcesOwns resourcesShares process resources

Relevance to Distributed Systems

  • Multithreaded servers handle many client requests concurrently: while one thread blocks on I/O or a remote call, others keep working, improving throughput and hiding network latency.
  • Multithreaded clients can hide communication delays (e.g., a browser fetching multiple objects in parallel).
  • Threads are cheaper than spawning a process per request, which is critical for the scalability of distributed servers.
threadprocess
8short5 marks

What is a vector clock? How does it improve over Lamport's logical clock?

Vector Clocks

A vector clock is a mechanism for capturing causality among events in a distributed system. Each of the NN processes maintains a vector VV of NN integers, where Vi[j]V_i[j] is process ii's knowledge of the event count at process jj.

Rules:

  1. Before an internal/send event, PiP_i increments its own entry: Vi[i]Vi[i]+1V_i[i] \leftarrow V_i[i] + 1.
  2. PiP_i attaches ViV_i to every message it sends.
  3. On receiving a message with timestamp VmV_m, PjP_j updates element-wise: Vj[k]max(Vj[k],Vm[k])V_j[k] \leftarrow \max(V_j[k], V_m[k]) for all kk, then increments Vj[j]V_j[j].

Comparison rule: event aba \to b (a happened-before b) iff V(a)<V(b)V(a) < V(b), i.e. V(a)[k]V(b)[k]V(a)[k] \le V(b)[k] for all kk and strictly less for at least one. If neither V(a)V(b)V(a) \le V(b) nor V(b)V(a)V(b) \le V(a), the events are concurrent.

Improvement over Lamport's Logical Clock

Lamport's scalar clock guarantees only: if aba \to b then C(a)<C(b)C(a) < C(b). The converse does not holdC(a)<C(b)C(a) < C(b) does not imply aba \to b, so scalar clocks cannot detect concurrency or tell whether two events are truly causally related.

Vector clocks fix this: V(a)<V(b)V(a) < V(b) if and only if aba \to b. Thus they precisely characterize the happened-before relation and let us distinguish causal ordering from concurrent events — at the cost of O(N)O(N) space per timestamp.

vector-clock
9short5 marks

What is replication? List the advantages and challenges of data replication.

Replication

Replication is the technique of maintaining multiple copies (replicas) of the same data or service on different nodes of a distributed system, kept consistent so clients can use any copy.

Advantages

  • Increased reliability / fault tolerance — if one replica fails, others continue serving (no single point of failure).
  • Improved performance — reads are served from a nearby replica, reducing latency.
  • Load balancing / scalability — requests are spread across replicas, increasing throughput.
  • Higher availability — data remains accessible during partial failures or network partitions.

Challenges

  • Consistency — keeping all replicas up to date when data changes is the core difficulty; updates must propagate correctly.
  • Concurrency control — concurrent updates at different replicas can cause conflicts that must be resolved.
  • Update propagation overhead — extra network bandwidth and coordination cost.
  • CAP trade-off — under network partitions one must choose between consistency and availability.
  • Storage cost — multiple copies consume more space.

Summary: Replication boosts reliability, performance, and availability, but the price is the complexity of maintaining consistency among copies.

replication
10short5 marks

Write short notes on the Network File System (NFS).

Network File System (NFS)

NFS, developed by Sun Microsystems (1984), is a widely used distributed file system that lets a client machine access files on a remote server over a network as if they were local, providing access and location transparency.

Key Features

  • Client/server model built on RPC and Sun's XDR (external data representation) for machine independence.
  • Mounting: a remote directory is mounted into the client's local namespace via the mount protocol, after which remote files look like local files (a Virtual File System / VFS layer redirects calls).
  • Stateless server (classic NFSv2/v3): the server keeps no record of past client requests — each request (read at offset, write block) is self-contained. This makes crash recovery simple (a rebooted server can resume immediately) but complicates locking. (NFSv4 is stateful.)
  • Caching: clients cache file blocks and attributes to improve performance, using timestamps for validation; this can cause temporary inconsistency.

Advantages & Limitations

  • Advantages: transparency, centralized administration, hardware/OS heterogeneity support.
  • Limitations: weak consistency under caching, the stateless design limits file locking, and performance depends on the network.
distributed-fs
11short5 marks

Explain the security challenges in distributed systems and discuss authentication.

Security Challenges in Distributed Systems

Because components communicate over open, untrusted networks and span multiple administrative domains, distributed systems face many threats:

  • Eavesdropping (interception): attackers read messages in transit — countered by encryption (confidentiality).
  • Tampering / modification: messages altered en route — countered by message integrity (hashes/MACs, digital signatures).
  • Masquerading / spoofing: an entity pretends to be another — countered by authentication.
  • Replay attacks: old valid messages resent — countered by nonces/timestamps.
  • Denial of Service (DoS): overloading resources to deny access.
  • Access control / authorization across domains, and non-repudiation.

The core security goals are Confidentiality, Integrity, and Availability (CIA), plus authentication and authorization.

Authentication

Authentication is the process of verifying the claimed identity of a user, host, or service before granting access. Mechanisms include:

  • Password / shared-secret based.
  • Symmetric-key authentication — e.g., Kerberos, which uses a trusted Key Distribution Center (KDC) issuing tickets so parties prove identity without sending passwords over the network.
  • Public-key / digital certificates — a Certification Authority (CA) vouches for a public key; challenge-response with private keys proves identity.
  • Mutual authentication — both client and server verify each other.

Authentication is usually the first step, followed by authorization (deciding what the authenticated entity may do).

security
12short5 marks

What is scalability? Explain the techniques used to achieve scalability in distributed systems.

Scalability

Scalability is the ability of a distributed system to handle growth — more users, data, or resources — without significant loss of performance or major redesign. It has three dimensions:

  • Size scalability — add more users/resources.
  • Geographical scalability — components spread over large distances.
  • Administrative scalability — span many independent administrative domains.

Scalability is limited by centralized services, centralized data, and centralized algorithms (single points of bottleneck/failure).

Techniques to Achieve Scalability

1. Hiding communication latencies (asynchrony): use asynchronous / non-blocking communication and move work to the client so it does not wait idly for remote replies (e.g., client-side form validation).

2. Distribution: partition data and computation across many machines, each handling a portion of the namespace/load. Example: DNS splits the name space into zones served by different servers.

3. Replication and Caching: keep multiple copies of data/services close to clients to balance load and reduce latency. Caching is client-controlled replication. (Trade-off: must maintain consistency.)

4. Decentralized algorithms: avoid any machine having complete global state, avoid centralized coordination, and tolerate node failures — so no single bottleneck limits growth.

Summary: Distribution, replication/caching, and latency hiding, combined with decentralized algorithms, let distributed systems scale while keeping performance acceptable.

scalability

Frequently asked questions

Where can I find the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) question paper 2078?
The full BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2078 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Distributed System (BSc CSIT, CSC462) 2078 paper come with solutions?
Yes. Every question on this Distributed System (BSc CSIT, CSC462) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2078 paper?
The BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2078 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Distributed System (BSc CSIT, CSC462) past paper free?
Yes — reading and attempting this Distributed System (BSc CSIT, CSC462) past paper on Kekkei is completely free.