BSc CSIT (TU) Science Distributed System (BSc CSIT, CSC462) Question Paper 2077 Nepal

Q: Where can I find the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) question paper 2077?

The full BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2077 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Distributed System (BSc CSIT, CSC462) 2077 paper come with solutions?

Yes. Every question on this Distributed System (BSc CSIT, CSC462) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2077 paper?

The BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2077 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Distributed System (BSc CSIT, CSC462) past paper free?

Yes — reading and attempting this Distributed System (BSc CSIT, CSC462) past paper on Kekkei is completely free.

Question

1Long answer10 marks

What is fault tolerance? Explain failure models, the use of redundancy, and the concept of agreement in faulty systems (Byzantine generals problem).

fault-tolerance

Answer 1

Fault Tolerance

Fault tolerance is the ability of a distributed system to continue operating correctly (providing its specified service) even when one or more of its components fail. The aim is to mask failures so that they are not visible to users, achieving high availability, reliability, safety and maintainability.

Key terms:

Fault – the underlying defect (e.g. a broken link).
Error – the manifestation of a fault in the system state.
Failure – when the system deviates from its specification.

Failure Models

Different kinds of failure must be anticipated:

Failure type	Description
Crash (fail-stop)	A process halts and stays halted; it had been working correctly until then.
Omission	A process or channel fails to send (send-omission) or receive (receive-omission) messages.
Timing	A response is correct but arrives outside the specified time interval (relevant to synchronous systems).
Response	The server responds incorrectly: a wrong value or a wrong state transition.
Arbitrary (Byzantine)	The most severe: a process may behave arbitrarily, send conflicting/wrong messages, or act maliciously.

Use of Redundancy

Fault tolerance is achieved by masking faults using redundancy:

Information redundancy – extra bits to detect/correct errors (e.g. checksums, error-correcting codes).
Time redundancy – repeating an action (e.g. retransmitting a message, retrying a transaction).
Physical (hardware/process) redundancy – replicating components or processes so that survivors mask the failure of others (e.g. Triple Modular Redundancy with voting, replicated servers).

Agreement in Faulty Systems — Byzantine Generals Problem

Non-faulty processes often must reach agreement on a value (e.g. commit/abort, a coordinator's order). With arbitrary (Byzantine) failures this is hard because faulty processes may send different values to different peers.

The Byzantine Generals Problem models this: several generals (processes) must agree on a common plan (attack/retreat) by exchanging messages, while up to m of them are traitors sending conflicting information. The goal: all loyal generals decide the same plan, and if the commander is loyal every loyal general follows the commander's order.

Key result (Lamport, Shostak, Pease): with m faulty processes, agreement is possible only if the total number of processes satisfies

n \ge 3m + 1.

That is, more than two-thirds of the processes must be correct. The algorithm requires $m+1$ rounds of message exchange (oral messages). With unforgeable signed messages (written messages), agreement is possible for any n > m.

Conclusion: Fault tolerance combines well-defined failure models with redundancy; tolerating arbitrary faults requires costly agreement protocols bounded by the $n \ge 3m+1$ condition.

Answer 2

Clock Synchronization in Distributed Systems

In a distributed system each node has its own physical clock that drifts at a slightly different rate, so the clocks disagree over time (clock skew). Many tasks (ordering events, timestamps, consistency, make) need clocks that agree, so clock synchronization keeps clocks close to one another (internal) or to real time / UTC (external). Because message delays are unpredictable, perfect synchronization is impossible; instead we either synchronize physical clocks approximately (Cristian's algorithm, Berkeley, NTP) or use logical clocks to capture only the ordering of events.

Happened-Before Relation (→)

Lamport defined the happened-before relation, a partial order capturing causality:

If a and b are events in the same process and a occurs before b, then $a \rightarrow b$ .
If a is the sending of a message and b is its receipt, then $a \rightarrow b$ .
Transitivity: if $a \rightarrow b$ and $b \rightarrow c$ then $a \rightarrow c$ .

If neither $a \rightarrow b$ nor $b \rightarrow a$ , the events are concurrent ( $a \parallel b$ ).

Lamport's Logical Clocks

Each process $P_i$ keeps a counter $C_i$ . Rules:

Before each event, $C_i = C_i + 1$ .
A sender attaches its timestamp $t$ to every message.
On receipt, the receiver sets $C_j = \max(C_j, t) + 1$ .

Property: if $a \rightarrow b$ then $C(a) < C(b)$ . (The converse need not hold.)

Example: Process P1 sends message m to P2.

In P1, event send(m) gets timestamp, say, 4.
P2's clock is currently 1; on receiving m it sets $\max(1,4)+1 = 5$ .

Thus send (4) < receive (5), preserving causality. Ties (equal timestamps in different processes) are broken using process IDs to give a total order.

Vector Clocks

Lamport clocks cannot tell whether two events are causally related or concurrent. Vector clocks fix this. Each process $P_i$ keeps a vector $V_i$ of size n:

Before an event: $V_i[i] = V_i[i] + 1$ .
Send: attach the whole vector $V_i$ to the message.
Receive vector $V_m$ : for all k, $V_j[k] = \max(V_j[k], V_m[k])$ , then increment $V_j[j]$ .

Comparison: $a \rightarrow b \iff V(a) < V(b)$ (every component $\le$ and at least one strictly $<$ ). If neither vector dominates, the events are concurrent.

Example: Start $V_1=(0,0,0)$ , $V_2=(0,0,0)$ . P1 does an event → $(1,0,0)$ and sends. P2 (currently $(0,0,0)$ ) receives, takes max and increments its own component → $(1,1,0)$ . Here $(1,0,0) < (1,1,0)$ , correctly showing the send happened-before the receive.

Conclusion

Logical and vector clocks order events without synchronized physical clocks; Lamport clocks give a consistent total order, while vector clocks additionally detect concurrency, both grounded in the happened-before relation.

Answer 3

Distributed Mutual Exclusion

Mutual exclusion ensures that at most one process at a time enters a critical section (CS) accessing a shared resource. In a distributed system there is no shared memory or common clock, so coordination is done purely by message passing. A correct algorithm must guarantee safety (at most one process in CS), liveness (no deadlock/starvation — requests eventually granted) and ideally fairness/ordering (requests served in happened-before order).

1. Centralized Algorithm

One process is elected coordinator. To enter the CS a process sends a REQUEST to the coordinator; the coordinator replies GRANT if the CS is free, otherwise queues the request (or sends nothing/denied). On exit the process sends RELEASE, and the coordinator grants the CS to the next queued request.

Messages per CS entry: 3 (request, grant, release).
Pros: simple, fair (FIFO queue), few messages.
Cons: single point of failure (coordinator crash blocks everyone); coordinator is a performance bottleneck; cannot distinguish a dead coordinator from a busy CS.

2. Token-Ring Algorithm

Processes are arranged in a logical ring. A single token circulates around the ring. A process may enter the CS only when it holds the token; it keeps the token while in the CS and passes it to the next neighbour afterwards. If it does not need the CS, it forwards the token immediately.

Messages per CS entry: between 1 and ∞ (the token may circulate many times when no one wants the CS).
Pros: no starvation; safety guaranteed by single token.
Cons: lost token must be detected and regenerated; a crashed process breaks the ring and must be bypassed; latency to acquire CS can be high.

3. Ricart–Agrawala (Distributed) Algorithm

A fully distributed algorithm using Lamport timestamps. To enter the CS, a process sends a timestamped REQUEST to all $n-1$ other processes and waits for REPLY (OK) from every one. A receiver replies OK immediately if it is not interested or not in the CS; if it is in the CS it queues the request; if it also wants the CS, it compares timestamps and replies OK only if the requester's timestamp is smaller (ties broken by process ID), otherwise queues. On exit, the process sends OK to all queued requests.

Messages per CS entry: $2(n-1)$ — one request and one reply per other process.
Pros: no single point of failure (no central coordinator); ensures ordering by timestamp; no starvation.
Cons: every process is a point of failure (one crash blocks all, since its reply is awaited — improved by Maekawa's quorum method); high message overhead; each process must know all others.

Comparison

Algorithm	Messages per entry	Single point of failure	Fault tolerance
Centralized	3	Yes (coordinator)	Poor
Token-Ring	1 → ∞	Lost token / broken ring	Moderate (needs token regeneration)
Ricart–Agrawala	$2(n-1)$	Any process crash blocks all	Poor unless extended

Conclusion: The centralized algorithm is cheapest in messages but least robust; the token ring avoids starvation but suffers from lost tokens; Ricart–Agrawala is truly distributed and ordering-correct but the most message-expensive and sensitive to crashes.

Answer 4

Distributed System

A distributed system is a collection of independent (autonomous) computers, connected by a network, that appears to its users as a single coherent system. The computers coordinate their actions only by passing messages, having no shared memory or global clock.

Goals

Resource sharing – let users share hardware, software, data and services.
Transparency – hide the distribution (location, access, replication, failure, etc.) so the system looks like one machine.
Openness – use standard interfaces/protocols so components can interoperate and be extended.
Scalability – continue to perform well as the number of users, resources or geographic span grows.
Reliability and fault tolerance – keep working despite partial failures.

Characteristics

Concurrency of components – many processes run and access shared resources simultaneously.
No global clock – coordination relies on message passing and logical ordering.
Independent failures – components can fail independently, and others may not immediately know.
Heterogeneity – varied hardware, OS, networks and languages, masked by middleware.
Geographical distribution of nodes communicating over a network.

In short: a distributed system shares resources across autonomous, concurrently-executing machines while presenting a single, transparent, scalable and fault-tolerant system to its users.

Answer 5

Lamport's Logical Clock

Physical clocks in a distributed system cannot be perfectly synchronized, yet processes still need to order events consistently. Lamport's logical clock provides a software counter that respects causality without using real time.

Happened-before relation (→):

Same process, a before b ⇒ $a \rightarrow b$ .
Send of a message a and its receipt b ⇒ $a \rightarrow b$ .
Transitive: $a \rightarrow b, b \rightarrow c \Rightarrow a \rightarrow c$ .

Clock rules: Each process $P_i$ keeps a counter $C_i$ .

Before every event: $C_i = C_i + 1$ .
Each message carries the sender's timestamp $t$ .
On receipt: $C_j = \max(C_j, t) + 1$ .

Property: $a \rightarrow b \Rightarrow C(a) < C(b)$ .

Example: Suppose P1's clock reaches 6 when it sends message m (timestamp $t=6$ ). P2's clock is currently 2. On receiving m, P2 sets

C_2 = \max(2, 6) + 1 = 7.

So send = 6 < receive = 7, correctly preserving the causal order. Equal timestamps in different processes are ordered by process ID to obtain a total order.

Answer 6

Remote Procedure Call (RPC)

A Remote Procedure Call lets a program call a procedure that executes in a different address space (typically on another machine) as if it were a normal local procedure call. RPC (Birrell & Nelson) hides the underlying message passing, providing access transparency so the programmer need not write explicit send/receive code.

Working

RPC uses client and server stubs plus the RPC runtime:

The client calls the client stub like an ordinary local procedure.
The client stub marshals (packs and serializes) the parameters into a message.
The client's OS/RPC runtime sends the message over the network to the server.
The server OS passes the message to the server stub, which unmarshals the parameters.
The server stub calls the actual server procedure, which executes and returns a result.
The server stub marshals the result and sends it back.
The client stub unmarshals the result and returns it to the caller, which resumes as from a local call.

Diagram (described)

 Client machine                         Server machine
 +-----------+   call   +-----------+   +-----------+   call   +-----------+
 |  Client   |--------->|  Client   |   |  Server   |--------->|  Server   |
 |  program  |<---------|   stub    |   |   stub    |<---------| procedure |
 +-----------+  return  +-----------+   +-----------+  return  +-----------+
                            |  ^             ^  |
                   marshal  |  | unmarshal   |  | marshal
                            v  |             |  v
                       +-----------+     +-----------+
                       | OS / net  |====>| OS / net  |  (message over network)
                       +-----------+     +-----------+

Note: RPC must handle issues local calls do not — parameter passing (no shared memory, so pass-by-value/marshalling), heterogeneous data representation (e.g. XDR), and failure semantics (at-least-once, at-most-once, exactly-once delivery guarantees).

Answer 7

Centralized vs Distributed Mutual Exclusion

Aspect	Centralized algorithm	Distributed algorithm (e.g. Ricart–Agrawala)
Coordination	A single elected coordinator grants/denies the critical section.	All processes participate equally; permission is obtained from all peers.
Messages per CS entry	3 (request, grant, release).	$2(n-1)$ (request to all, reply from all).
Decision basis	Coordinator's FIFO queue.	Lamport timestamps; lowest timestamp wins (ties by process ID).
Single point of failure	Yes — coordinator crash blocks the whole system.	No central point, but any process's crash blocks all (it must reply).
Bottleneck	Coordinator can be a performance bottleneck.	Load spread over all processes; more total traffic.
Complexity	Simple to implement.	More complex; every process must know all others.
Fairness	Coordinator can ensure FIFO fairness.	Ordering guaranteed by timestamps; no starvation.

Summary: The centralized scheme is simple and message-efficient but depends on one coordinator (single point of failure and bottleneck). The distributed scheme removes the central coordinator and orders requests by timestamps, at the cost of far more messages and sensitivity to any node failing.

Answer 8

Distributed Deadlock Detection

A deadlock occurs when a set of processes are each blocked waiting for a resource held by another process in the set, so none can proceed. In a distributed system, resources and processes are spread over many sites and no single site has the global state, so detecting deadlock is harder; the system must combine local information across sites.

Approach: Since prevention/avoidance are costly, distributed systems often use deadlock detection: allow deadlocks to form, then detect cycles and resolve them (by aborting/rolling back a victim process and releasing its resources). Common organizations:

Centralized detection – a coordinator builds a global wait-for graph from local graphs.
Distributed detection – sites cooperate (e.g. edge-chasing / Chandy–Misra–Haas algorithm, which circulates probe messages along wait-for edges; if a probe returns to its initiator, a cycle—deadlock—exists).
Hierarchical detection – sites organized in a tree, detecting deadlocks at the lowest common ancestor.

Detection must avoid phantom (false) deadlocks caused by out-of-date or inconsistent global snapshots.

Wait-For Graph (WFG)

A wait-for graph is a directed graph used to represent blocking relationships:

Nodes = processes (transactions).
An edge $P_i \rightarrow P_j$ means $P_i$ is waiting for a resource currently held by $P_j$ .

A cycle in the wait-for graph indicates a deadlock among the processes on the cycle. In a distributed system each site maintains a local WFG; a deadlock may span sites, so a global WFG (the union of local graphs plus inter-site edges) must be examined for cycles.

Example: $P_1 \rightarrow P_2 \rightarrow P_3 \rightarrow P_1$ forms a cycle, so $P_1, P_2, P_3$ are deadlocked; one is chosen as a victim and aborted to break the cycle.

Answer 9

Bully Algorithm (Election)

An election algorithm chooses one process to act as coordinator when the current one fails. The Bully algorithm (Garcia-Molina) assumes each process has a unique numeric ID and that processes can communicate reliably; the process with the highest ID should become the coordinator.

Steps: When a process P notices the coordinator is not responding, it starts an election:

P sends an ELECTION message to every process with a higher ID.
- If no higher process replies (within timeout), P wins, becomes the coordinator, and sends a COORDINATOR message to all processes announcing itself.
- If a higher process replies with an OK (alive) message, P drops out; that higher process now takes over the election.
Any process that receives an ELECTION message replies OK and then starts its own election (repeating step 1) among still-higher processes.
Eventually the highest-ID live process finds no one above it answers, declares victory and broadcasts COORDINATOR.
A recovered process with a higher ID than the current coordinator will start a new election and "bully" the others to reclaim the role — hence the name.

Messages: the COORDINATOR and ELECTION/OK exchange; worst case $O(n^2)$ messages.

Example: With processes 1–7 and coordinator 7 crashing, suppose 4 detects it. Process 4 sends ELECTION to 5, 6, 7. Processes 5 and 6 reply OK and start their own elections; 6 finds 7 dead, wins, and broadcasts COORDINATOR to all. Process 6 becomes the new coordinator.

Answer 10

Kinds of Transparency in a Distributed System

Transparency is the property of hiding the distributed nature of the system (separation of components, replication, failures, etc.) so that users and applications perceive a single coherent system. ISO/ANSA defines several kinds:

Transparency	What it hides
Access	Differences in data representation and how a resource is accessed (local vs remote access looks the same).
Location	Where a resource is physically located (a name carries no location information).
Migration	That a resource may move to another location; access remains unaffected.
Relocation	That a resource may move while in use.
Replication	That several copies of a resource exist; the user sees a single resource.
Concurrency	That a resource is shared by several competing users simultaneously.
Failure	The failure and recovery of a resource, so the system appears to work normally.
Persistence	Whether a (software) resource is in memory or on disk.

Summary: These transparencies (access, location, migration, relocation, replication, concurrency, failure, persistence) together let a distributed system present itself to users as one unified machine. Note that full transparency is not always desirable — e.g. hiding wide-area latency can mislead users.

Answer 11

Middleware

Middleware is a software layer that sits between the operating system / network and the distributed applications running on each machine. Its purpose is to mask the heterogeneity of the underlying hardware, operating systems, networks and programming languages, and to provide a common, higher-level programming abstraction so that the system appears as a single coherent system to applications and developers.

Role in a Distributed System

Hides heterogeneity – masks differences in machines, OSes, networks and data representations.
Provides high-level communication abstractions – such as RPC, Remote Method Invocation (RMI), message-oriented middleware (message queues), and publish/subscribe, instead of raw sockets.
Supports transparency – access, location, replication, failure and concurrency transparency are largely implemented in middleware.
Offers common services – naming/directory services, security/authentication, transactions, persistence, concurrency control and event notification.
Eases development and openness – standard interfaces (e.g. CORBA, Java RMI, web services / gRPC, DCOM) let independently developed components interoperate.

In short: Middleware is the "glue" that turns a collection of heterogeneous networked computers into a usable distributed system by providing communication, transparency and common services above the OS layer.

Answer 12

Cristian's Algorithm (Physical Clock Synchronization)

Cristian's algorithm (1989) synchronizes a client's physical clock to an external time server (e.g. a UTC-based time source) — a form of external synchronization suited to systems where round-trip times are small relative to the required accuracy.

Procedure:

The client records its local time $T_0$ and sends a request to the time server.
The server replies with its current time $T_{server}$ .
The client records the time $T_1$ when the reply arrives.

The round-trip time is $RTT = T_1 - T_0$ . Assuming the request and reply take roughly equal time, the message took about $RTT/2$ to travel back, so when the reply is received the true time is approximately

T_{client} = T_{server} + \frac{T_1 - T_0}{2}.

The client sets its clock to this value.

Handling server processing time: if the server reports its interrupt-handling/processing time $I$ , the estimate improves to

T_{client} = T_{server} + \frac{(T_1 - T_0) - I}{2}.

Example: $T_0 = 100$ , $T_1 = 120$ (so $RTT = 20$ ), server replies $T_{server} = 500$ . The client sets its clock to $500 + 20/2 = 510$ .

Points:

Clocks should be set forward smoothly (gradually advanced), and never set backward abruptly (to avoid breaking event ordering / timestamps); slow clocks are sped up and fast clocks slowed until corrected.
Limitation: accuracy depends on a symmetric, short round trip; variable network delay reduces precision, and the single time server is a point of failure (addressed by the Berkeley algorithm / NTP).

Level	BSc CSIT (TU)
Stream	Science
Subject	Distributed System (BSc CSIT, CSC462)
Year	2077 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions