Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

Explain clock synchronization in distributed systems. Discuss Lamport's logical clocks and vector clocks with examples, and describe how the happened-before relation orders events.

Clock Synchronization in Distributed Systems

In a distributed system there is no global clock; each node has its own physical clock that drifts at a slightly different rate. Many tasks (ordering events, make, distributed transactions, log timestamps) need a consistent notion of time. Clock synchronization is the problem of making the clocks of all nodes agree, either on real (physical) time or on a consistent order of events.

We distinguish:

  • Physical clock synchronization — bringing clock values numerically close to one another / to UTC (e.g. Cristian's algorithm, Berkeley algorithm, NTP).
  • Logical clock synchronization — we do not care about real time, only about agreeing on the order of events. This is enough for most coordination problems and is captured by the happened-before relation.

The Happened-Before Relation (\rightarrow)

Defined by Lamport, aba \rightarrow b ('a happened before b') is the smallest relation satisfying:

  1. If aa and bb are events in the same process and aa occurs before bb, then aba \rightarrow b.
  2. If aa is the sending of a message and bb is the receipt of that same message, then aba \rightarrow b.
  3. Transitivity: if aba \rightarrow b and bcb \rightarrow c then aca \rightarrow c.

If neither aba \rightarrow b nor bab \rightarrow a, the events are concurrent (aba \parallel b) — they are causally independent. Happened-before is a partial order, because concurrent events are unordered.

Lamport's Logical Clocks

Each process PiP_i keeps an integer counter CiC_i. Rules:

  • Before each event, PiP_i increments: Ci:=Ci+1C_i := C_i + 1.
  • A message mm carries the sender's timestamp ts(m)=Cits(m) = C_i.
  • On receiving mm, the receiver sets Cj:=max(Cj,ts(m))+1C_j := \max(C_j, ts(m)) + 1.

This guarantees the clock condition: if aba \rightarrow b then C(a)<C(b)C(a) < C(b).

Example. Processes P1,P2P_1, P_2.

P1:  a(1) ----m1----> 
P2:        b receives m1

P1P_1 stamps event aa with 11 and sends m1m_1 with ts=1ts=1. P2P_2, whose clock was at 00, sets C2=max(0,1)+1=2C_2 = \max(0,1)+1 = 2 for the receive event bb. Hence C(a)=1<C(b)=2C(a)=1 < C(b)=2, consistent with aba \rightarrow b.

Limitation: the converse does NOT hold — C(a)<C(b)C(a) < C(b) does not imply aba \rightarrow b. Lamport clocks cannot detect concurrency or causal dependence.

Vector Clocks

Vector clocks fix this limitation. Each process PiP_i keeps a vector Vi[1..n]V_i[1..n] (one entry per process), initially all zeros. Rules:

  • Before an internal/send event: Vi[i]:=Vi[i]+1V_i[i] := V_i[i] + 1.
  • A message carries the whole vector ViV_i.
  • On receipt, PjP_j sets Vj[k]:=max(Vj[k],Vmsg[k])V_j[k] := \max(V_j[k], V_{msg}[k]) for all kk, then Vj[j]:=Vj[j]+1V_j[j] := V_j[j] + 1.

Ordering: V(a)<V(b)V(a) < V(b) iff V(a)[k]V(b)[k]V(a)[k] \le V(b)[k] for all kk and V(a)V(b)V(a) \ne V(b). Then

ab    V(a)<V(b),a \rightarrow b \iff V(a) < V(b),

and aba \parallel b iff neither V(a)<V(b)V(a) < V(b) nor V(b)<V(a)V(b) < V(a). Thus vector clocks exactly characterise causality.

Example (3 processes). P1P_1's send event has V=(1,0,0)V=(1,0,0) and sends it. P2P_2 receives it, giving the receive event V=(1,1,0)V=(1,1,0). Meanwhile P3P_3 does a local event with V=(0,0,1)V=(0,0,1). Comparing (1,1,0)(1,1,0) and (0,0,1)(0,0,1): neither dominates, so those two events are concurrent — correctly detected, whereas Lamport scalars could not tell.

Summary

Lamport clocks give a total ordering consistent with causality but lose concurrency information; vector clocks capture the full happened-before partial order at the cost of O(n)O(n) space per timestamp.

clock-synchronizationlogical-clock
2long10 marks

What is distributed mutual exclusion? Explain the centralized, token-ring and Ricart-Agrawala (distributed) algorithms, comparing them in terms of message complexity and fault tolerance.

Distributed Mutual Exclusion

Mutual exclusion ensures that when several processes in a distributed system contend for a shared resource (the critical section, CS), only one process is in the CS at a time. Because there is no shared memory and no global clock, classical OS mechanisms (semaphores, monitors) cannot be used directly; instead processes coordinate by message passing. A correct algorithm must guarantee:

  • Safety / Mutual exclusion — at most one process in CS.
  • Liveness / No starvation — every request eventually granted.
  • Fairness / Ordering — requests honoured in happened-before order.

1. Centralized Algorithm

A single coordinator process manages access:

  1. To enter CS, a process sends a REQUEST to the coordinator.
  2. If CS is free, the coordinator replies GRANT; otherwise it queues the request (no reply / DENY).
  3. On exiting, the process sends RELEASE; the coordinator grants the next queued request.
  • Messages per CS entry: 3 (request, grant, release).
  • Pros: simple, fair (FIFO), easy to implement.
  • Cons: the coordinator is a single point of failure and a performance bottleneck.

2. Token-Ring Algorithm

Processes are logically arranged in a ring. A single token circulates around the ring.

  1. A process may enter the CS only when it holds the token.
  2. After leaving the CS (or if it does not need it) it passes the token to its successor.
  • Messages per CS entry: between 1 and \infty — when load is low the token may circulate many times before anyone needs it; throughput is bounded by ring traversal time.
  • Pros: no starvation (bounded wait = one ring traversal), simple.
  • Cons: lost token must be detected and regenerated; token holder crash breaks the ring and requires reconstruction.

3. Ricart–Agrawala (Distributed) Algorithm

A fully distributed algorithm using Lamport timestamps and an all-to-all scheme. To enter CS, PiP_i:

  1. Sends a timestamped REQUEST(ts, i) to all other n1n-1 processes.
  2. Enters CS only after receiving OK/REPLY from all of them.

On receiving a request, process PjP_j decides:

  • If PjP_j is not interested in CS → reply OK immediately.
  • If PjP_j is in CS → queue the request, reply later.
  • If PjP_j also wants CS → compare timestamps; the lower timestamp wins (ties broken by process id). The loser replies OK; the winner queues the request and replies after exiting.

On exit, PiP_i sends OK to every queued request.

  • Messages per CS entry: 2(n1)2(n-1)(n1)(n-1) requests + (n1)(n-1) replies.
  • Pros: no single point of failure; ordering by Lamport timestamps ensures fairness.
  • Cons: nn points of failure (any crash blocks progress unless detected); high message overhead; every process must know the full membership.

Comparison

AlgorithmMessages / CS entrySingle point of failureFault tolerance
Centralized3Yes (coordinator)Poor — coordinator crash halts all
Token-Ring1 … ∞Token / any nodeModerate — must handle lost token & node crash
Ricart–Agrawala2(n1)2(n-1)nn nodesPoor without failure detection — any crash blocks

Conclusion: Centralized is cheapest in messages but least robust; the token ring bounds waiting but wastes messages under low load and is fragile to token loss; Ricart–Agrawala is genuinely distributed and fair but costly (O(n)O(n) messages) and equally sensitive to any single crash.

mutual-exclusion
3long10 marks

Explain inter-process communication in distributed systems. Discuss the Remote Procedure Call (RPC) mechanism in detail, including parameter passing, binding and marshalling.

Inter-Process Communication (IPC) in Distributed Systems

In a distributed system, processes run on different machines with no shared memory, so they cooperate by exchanging messages over a network. IPC mechanisms include:

  • Message passingsend/receive primitives, which may be synchronous (blocking) or asynchronous (non-blocking).
  • Sockets — endpoints over TCP (reliable, connection-oriented) or UDP (unreliable, datagram).
  • Remote Procedure Call (RPC) and Remote Method Invocation (RMI) — higher-level abstractions that hide message passing.
  • Message-Oriented Middleware (MOM) — persistent, often queue-based, asynchronous messaging.

Remote Procedure Call (RPC)

RPC (Birrell & Nelson, 1984) lets a program call a procedure on a remote machine as if it were local, hiding the underlying message passing. It provides access and location transparency.

Working / Control Flow

Client ── call ──> Client Stub ── marshal ──> [network] ── unmarshal ──> Server Stub ── call ──> Server procedure
Client <─ return <─ Client Stub <─ unmarshal <─ [network] <─ marshal <─ Server Stub <─ return <─ Server procedure
  1. The client calls a local client stub (proxy) as an ordinary procedure.
  2. The stub marshals the parameters into a message and traps to the OS, which sends it.
  3. The message travels to the server; the server stub unmarshals the parameters.
  4. The server stub calls the actual server procedure; the result returns by the reverse path.

Parameter Passing

  • Call-by-value is straightforward: the value is copied into the message.
  • Call-by-reference is a problem — pointers/addresses are meaningless on a remote machine. Solutions: copy/restore (copy the referenced data in, copy modified data back) or flatten structures.
  • Data heterogeneity: different machines use different byte order (big- vs little-endian) and representations. A common format such as XDR (External Data Representation), CDR, or ASN.1 is used so both sides agree on encoding.

Marshalling

Marshalling is packing the procedure name and parameters into a flat, machine-independent message; unmarshalling is the reverse on receipt. The stubs do this automatically, usually generated from an IDL (Interface Definition Language) specification.

Binding

Binding is how a client locates a suitable server. Typically a binder / directory service (e.g. portmapper, RPC registry) is used:

  1. The server registers (exports) its interface, name and address with the binder.
  2. The client queries (imports) the binder to obtain the server's address.
  3. The client then sends RPCs directly to that server. Binding may be static (fixed at compile time) or dynamic (resolved at run time via the binder).

Failure Semantics

Because networks/servers can fail, RPC defines call semantics:

  • At-least-once (retry; safe only for idempotent operations),
  • At-most-once (duplicate filtering; no double execution),
  • Exactly-once (ideal but hard), and maybe (no guarantee).

Advantages / Limitations

  • Advantages: simple, familiar procedure-call abstraction; transparency; supports heterogeneity.
  • Limitations: call-by-reference and global variables are hard; partial failures and slower-than-local calls must be handled by the programmer.
rpccommunication
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

What is a distributed system? Explain the goals and characteristics of a distributed system.

Distributed System

A distributed system is a collection of independent (autonomous) computers connected by a network that appears to its users as a single coherent system, cooperating through message passing to achieve a common goal (Tanenbaum). There is no shared memory and no global clock.

Goals

  1. Resource sharing — share hardware, data, files, printers, and services.
  2. Transparency — hide the distribution (access, location, replication, etc.) from users.
  3. Openness — use standard, well-defined interfaces/protocols so components from different vendors interoperate and the system is extensible.
  4. Scalability — work efficiently as the number of users, resources, and geographic spread grows.
  5. Reliability / Fault tolerance — keep working despite partial failures (via replication and redundancy).
  6. High performance & availability through parallelism and load distribution.

Characteristics

  • Concurrency — many components execute in parallel and share resources.
  • No global clock — coordination relies on message passing and logical/physical clock synchronization.
  • Independent / partial failures — components fail independently; the system must tolerate this without total failure.
  • Heterogeneity — different hardware, OS, networks, and languages, bridged by middleware.
  • Resource sharing and scalability as above.
characteristics
5short5 marks

Explain Lamport's logical clock with an example.

Lamport's Logical Clock

Lamport's logical clock assigns a monotonically increasing integer timestamp to events so that the timestamps respect the happened-before relation (\rightarrow). It does not measure real time; it only orders events causally. Each process PiP_i keeps a counter CiC_i.

Rules:

  1. Before each event in PiP_i: Ci:=Ci+1C_i := C_i + 1.
  2. When PiP_i sends a message mm, it attaches the timestamp ts(m)=Cits(m) = C_i.
  3. When PjP_j receives mm: Cj:=max(Cj,ts(m))+1C_j := \max(C_j, ts(m)) + 1.

This enforces the clock condition: if aba \rightarrow b then C(a)<C(b)C(a) < C(b).

Example. Two processes, both starting at 0.

P1:  e1=1   e2=2 ──m(ts=2)──┐
P2:                 e3 receive ──> C2 = max(0,2)+1 = 3
  • P1P_1: event e1e_1 gets 11, event e2e_2 gets 22, then sends mm with ts=2ts=2.
  • P2P_2 (clock 0) receives mm: sets C2=max(0,2)+1=3C_2 = \max(0,2)+1 = 3, so the receive event is stamped 33. Since e2e3e_2 \rightarrow e_3 (send before receive), indeed C(e2)=2<C(e3)=3C(e_2)=2 < C(e_3)=3, satisfying the clock condition.

Note: C(a)<C(b)C(a)<C(b) does not imply aba \rightarrow b, so Lamport clocks cannot by themselves detect concurrent events (vector clocks are needed for that).

logical-clock
6short5 marks

What is a Remote Procedure Call (RPC)? Explain its working with a diagram.

Remote Procedure Call (RPC)

RPC is a communication mechanism that lets a program invoke a procedure on a remote machine as if it were a local call, hiding all message passing. It provides access and location transparency (Birrell & Nelson, 1984).

Working (with diagram)

  CLIENT MACHINE                          SERVER MACHINE
 ┌──────────────┐                        ┌──────────────┐
 │   Client     │                        │   Server     │
 │  (1) call    │                        │  (5) execute │
 │      ▼       │                        │      ▲       │
 │ Client Stub  │                        │ Server Stub  │
 │ (2) marshal  │                        │ (4) unmarshal│
 └──────┬───────┘                        └──────▲───────┘
        │ (3) request msg  ───────────────►      │
        ▲      ◄─────────────── (6) reply msg    │
                       [ Network ]

Steps:

  1. The client makes an ordinary procedure call to the client stub.
  2. The client stub marshals the parameters into a message.
  3. The message is sent over the network to the server.
  4. The server stub receives and unmarshals the parameters.
  5. The server stub calls the actual server procedure, which executes.
  6. The result is marshalled, sent back, unmarshalled by the client stub, and returned to the client as the call's return value.

The client and server are blocked (synchronous call) until the result returns. A binding step (via a directory/binder) lets the client locate the server, and call semantics (at-least-once / at-most-once) define behaviour under failures.

rpc
7short5 marks

Differentiate between centralized and distributed mutual exclusion algorithms.

Centralized vs Distributed Mutual Exclusion

Both ensure only one process is in the critical section at a time, but differ in coordination.

AspectCentralized algorithmDistributed (e.g. Ricart–Agrawala) algorithm
ControlA single coordinator grants/denies accessAll processes participate; decision is collective
DecisionCoordinator alone decides (queues requests)Based on Lamport timestamps; lowest timestamp wins
Messages per CS entry3 (request, grant, release)2(n1)2(n-1) (request + reply to/from all others)
Single point of failureYes — coordinator crash halts the systemNo single coordinator, but any node crash can block progress (nn points of failure)
ImplementationSimple, easyComplex; each node needs full membership and a request queue
BottleneckCoordinator is a performance bottleneckLoad spread across nodes
FairnessFIFO order by coordinatorTotal order by timestamps (ties by process id)

Summary: the centralized scheme is simpler and uses fewer messages but is fragile and can bottleneck; the distributed scheme removes the single coordinator and is fairer, at the cost of far more messages and greater complexity, while still requiring failure detection.

mutual-exclusion
8short5 marks

Explain distributed deadlock detection. What is a wait-for graph?

Distributed Deadlock Detection

A deadlock occurs when a set of processes are each blocked waiting for a resource held by another process in the set, so none can proceed. In a distributed system the processes and resources are spread across multiple sites, so no single node has the complete picture — this makes detection harder than on a single machine.

Wait-For Graph (WFG)

A wait-for graph is a directed graph in which:

  • Each node represents a process.
  • A directed edge PiPjP_i \rightarrow P_j means "process PiP_i is waiting for a resource currently held by PjP_j."

A cycle in the WFG indicates a deadlock. In a distributed system the graph is split into per-site local WFGs; a deadlock may form a cycle spanning several sites (a global cycle) that no single local graph reveals.

Detection approaches

  • Centralized: a coordinator collects local WFGs into a global WFG and searches for cycles (can give false deadlocks if updates arrive out of order).
  • Distributed: sites cooperate, e.g. the Chandy–Misra–Haas edge-chasing algorithm sends special probe messages along waiting edges; if a probe returns to its initiator, a cycle (deadlock) exists.
  • Hierarchical: sites organised in a tree, each handling deadlocks in its subtree.

Once a cycle is found, the deadlock is resolved by aborting / rolling back a victim process and releasing its resources.

deadlock
9short5 marks

Explain the Bully algorithm for electing a coordinator.

Bully Algorithm (Garcia-Molina)

The Bully algorithm elects a coordinator among processes after the current one fails. It assumes each process has a unique id, processes can communicate reliably, and the process with the highest id should become coordinator. Any process that notices the coordinator is not responding starts an election.

Steps

When process PP detects the coordinator has failed, PP holds an election:

  1. PP sends an ELECTION message to all processes with a higher id than itself.
  2. If no higher process replies (within a timeout), PP wins: it becomes the coordinator and sends a COORDINATOR message to all processes announcing itself.
  3. If a higher process replies with an OK message, that process takes over the election; PP drops out and waits.
  4. Each higher process that receives an ELECTION replies OK and then starts its own election among still-higher processes.
  5. Eventually the highest-id live process wins and broadcasts COORDINATOR to everyone.

If a previously crashed process with a higher id recovers, it starts a fresh election and 'bullies' the current coordinator out — hence the name.

Example

Processes 1..61..6, 66 was coordinator and crashes. P3P_3 notices and sends ELECTION to 4,5,64,5,6. 44 and 55 reply OK; P3P_3 stops. 44 and 55 run their own elections; 55 gets no reply from 66 (crashed), so P5P_5 becomes coordinator and broadcasts COORDINATOR to 1..41..4.

Message complexity: O(n2)O(n^2) in the worst case.

election
10short5 marks

What are the different kinds of transparency in a distributed system?

Kinds of Transparency in a Distributed System

Transparency means hiding the fact that a system's resources are distributed, so that it appears to users and applications as a single coherent system. The ISO/ANSA standard identifies the following kinds:

  1. Access transparency — hide differences in data representation and how a resource is accessed; local and remote resources are accessed in the same way.
  2. Location transparency — hide where a resource is physically located (users use a name, not an address).
  3. Migration (mobility) transparency — hide that a resource may move to another location; its name stays valid.
  4. Relocation transparency — hide that a resource may move while in use.
  5. Replication transparency — hide that several copies of a resource exist; the user sees one logical resource.
  6. Concurrency transparency — hide that a resource may be shared by several users at the same time (no interference).
  7. Failure transparency — hide the failure and recovery of a resource.
  8. Persistence transparency — hide whether a resource is in memory or on disk.

Full transparency is not always desirable — hiding everything can hurt performance, so designers balance transparency against efficiency.

transparency
11short5 marks

What is middleware? Explain its role in a distributed system.

Middleware

Middleware is a software layer that sits between the operating system / network and the distributed applications, on top of the OS of each machine. It hides the heterogeneity and complexity of the underlying network, OS, hardware and programming languages, and offers a uniform, high-level programming model so that a distributed application appears as a single coherent system.

┌──────────────────────────────────────┐
│        Distributed Applications        │
├──────────────────────────────────────┤
│              MIDDLEWARE                │  ← provides distribution transparency
├──────────────┬───────────────────────┤
│  OS / Network │  OS / Network  │ ...   │
└──────────────┴───────────────────────┘

Role in a Distributed System

  • Provides transparency — hides location, access, replication, failure, and migration differences from the application.
  • Masks heterogeneity of hardware, OS, networks and languages, enabling interoperability.
  • Offers high-level communication abstractions — RPC, RMI, message queues / MOM, publish-subscribe.
  • Provides common services — naming/directory, security/authentication, transactions, concurrency control, persistence, replication and fault tolerance.
  • Eases development by giving programmers a single uniform API instead of low-level socket programming.

Examples: CORBA, Java RMI, Microsoft DCOM/.NET Remoting, gRPC, and message brokers such as RabbitMQ/JMS.

middleware
12short5 marks

Explain Cristian's algorithm for physical clock synchronization.

Cristian's Algorithm (Physical Clock Synchronization)

Cristian's algorithm (1989) synchronizes a client's clock to a time server that has access to an accurate reference time source (e.g. UTC/an atomic clock). It is suitable when round-trip times are small relative to the required accuracy.

Procedure

  1. The client PP records its local send time T0T_0 and sends a request to the time server.
  2. The time server replies with its current time TserverT_{server} (often denoted CUTCC_{UTC}).
  3. The client records the reply-arrival time T1T_1.
  4. The client estimates the message round-trip delay and sets its clock to:
Tclient=Tserver+T1T02T_{client} = T_{server} + \frac{T_1 - T_0}{2}

The term T1T02\dfrac{T_1 - T_0}{2} assumes the request and reply legs take equal time, so half the round-trip is added to compensate for the propagation delay of the reply.

Improving accuracy

If the server's interrupt-handling time II is known, a better estimate is:

Tclient=Tserver+(T1T0)I2.T_{client} = T_{server} + \frac{(T_1 - T_0) - I}{2}.

The error is bounded by ±(T1T02Tmin)\pm\left(\dfrac{T_1 - T_0}{2} - T_{min}\right), where TminT_{min} is the minimum possible one-way transit time. Several requests can be made and the one with the smallest round-trip time chosen for best accuracy.

Limitations

  • Assumes symmetric (equal) network delays, which may not hold.
  • The single time server is a single point of failure and can be a bottleneck — addressed by the Berkeley algorithm / NTP.
  • A clock must never be set backward; instead it is slowed or sped up gradually to converge.

Example

If T0=100T_0 = 100, reply arrives at T1=120T_1 = 120, and the server time in the reply is Tserver=500T_{server} = 500, then

Tclient=500+1201002=500+10=510.T_{client} = 500 + \frac{120 - 100}{2} = 500 + 10 = 510.
clock-synchronization

Frequently asked questions

Where can I find the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) question paper 2074?
The full BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2074 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Distributed System (BSc CSIT, CSC462) 2074 paper come with solutions?
Yes. Every question on this Distributed System (BSc CSIT, CSC462) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2074 paper?
The BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2074 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Distributed System (BSc CSIT, CSC462) past paper free?
Yes — reading and attempting this Distributed System (BSc CSIT, CSC462) past paper on Kekkei is completely free.