BSc CSIT (TU) Science Distributed System (BSc CSIT, CSC462) Question Paper 2077 Nepal
This is the official BSc CSIT (TU) (Science stream) Distributed System (BSc CSIT, CSC462) question paper for 2077, as set in the regular annual examination. It carries 60 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Distributed System (BSc CSIT, CSC462) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) exam or solving previous years' question papers, this 2077 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt any TWO questions.
What is fault tolerance? Explain failure models, the use of redundancy, and the concept of agreement in faulty systems (Byzantine generals problem).
Fault Tolerance
Fault tolerance is the ability of a distributed system to continue operating correctly (providing its specified service) even when one or more of its components fail. The aim is to mask failures so that they are not visible to users, achieving high availability, reliability, safety and maintainability.
Key terms:
- Fault – the underlying defect (e.g. a broken link).
- Error – the manifestation of a fault in the system state.
- Failure – when the system deviates from its specification.
Failure Models
Different kinds of failure must be anticipated:
| Failure type | Description |
|---|---|
| Crash (fail-stop) | A process halts and stays halted; it had been working correctly until then. |
| Omission | A process or channel fails to send (send-omission) or receive (receive-omission) messages. |
| Timing | A response is correct but arrives outside the specified time interval (relevant to synchronous systems). |
| Response | The server responds incorrectly: a wrong value or a wrong state transition. |
| Arbitrary (Byzantine) | The most severe: a process may behave arbitrarily, send conflicting/wrong messages, or act maliciously. |
Use of Redundancy
Fault tolerance is achieved by masking faults using redundancy:
- Information redundancy – extra bits to detect/correct errors (e.g. checksums, error-correcting codes).
- Time redundancy – repeating an action (e.g. retransmitting a message, retrying a transaction).
- Physical (hardware/process) redundancy – replicating components or processes so that survivors mask the failure of others (e.g. Triple Modular Redundancy with voting, replicated servers).
Agreement in Faulty Systems — Byzantine Generals Problem
Non-faulty processes often must reach agreement on a value (e.g. commit/abort, a coordinator's order). With arbitrary (Byzantine) failures this is hard because faulty processes may send different values to different peers.
The Byzantine Generals Problem models this: several generals (processes) must agree on a common plan (attack/retreat) by exchanging messages, while up to m of them are traitors sending conflicting information. The goal: all loyal generals decide the same plan, and if the commander is loyal every loyal general follows the commander's order.
Key result (Lamport, Shostak, Pease): with m faulty processes, agreement is possible only if the total number of processes satisfies
That is, more than two-thirds of the processes must be correct. The algorithm requires rounds of message exchange (oral messages). With unforgeable signed messages (written messages), agreement is possible for any n > m.
Conclusion: Fault tolerance combines well-defined failure models with redundancy; tolerating arbitrary faults requires costly agreement protocols bounded by the condition.
Explain clock synchronization in distributed systems. Discuss Lamport's logical clocks and vector clocks with examples, and describe how the happened-before relation orders events.
Clock Synchronization in Distributed Systems
In a distributed system each node has its own physical clock that drifts at a slightly different rate, so the clocks disagree over time (clock skew). Many tasks (ordering events, timestamps, consistency, make) need clocks that agree, so clock synchronization keeps clocks close to one another (internal) or to real time / UTC (external). Because message delays are unpredictable, perfect synchronization is impossible; instead we either synchronize physical clocks approximately (Cristian's algorithm, Berkeley, NTP) or use logical clocks to capture only the ordering of events.
Happened-Before Relation (→)
Lamport defined the happened-before relation, a partial order capturing causality:
- If a and b are events in the same process and a occurs before b, then .
- If a is the sending of a message and b is its receipt, then .
- Transitivity: if and then .
If neither nor , the events are concurrent ().
Lamport's Logical Clocks
Each process keeps a counter . Rules:
- Before each event, .
- A sender attaches its timestamp to every message.
- On receipt, the receiver sets .
Property: if then . (The converse need not hold.)
Example: Process P1 sends message m to P2.
- In P1, event send(m) gets timestamp, say, 4.
- P2's clock is currently 1; on receiving m it sets .
Thus send (4) < receive (5), preserving causality. Ties (equal timestamps in different processes) are broken using process IDs to give a total order.
Vector Clocks
Lamport clocks cannot tell whether two events are causally related or concurrent. Vector clocks fix this. Each process keeps a vector of size n:
- Before an event: .
- Send: attach the whole vector to the message.
- Receive vector : for all k, , then increment .
Comparison: (every component and at least one strictly ). If neither vector dominates, the events are concurrent.
Example: Start , . P1 does an event → and sends. P2 (currently ) receives, takes max and increments its own component → . Here , correctly showing the send happened-before the receive.
Conclusion
Logical and vector clocks order events without synchronized physical clocks; Lamport clocks give a consistent total order, while vector clocks additionally detect concurrency, both grounded in the happened-before relation.
What is distributed mutual exclusion? Explain the centralized, token-ring and Ricart-Agrawala (distributed) algorithms, comparing them in terms of message complexity and fault tolerance.
Distributed Mutual Exclusion
Mutual exclusion ensures that at most one process at a time enters a critical section (CS) accessing a shared resource. In a distributed system there is no shared memory or common clock, so coordination is done purely by message passing. A correct algorithm must guarantee safety (at most one process in CS), liveness (no deadlock/starvation — requests eventually granted) and ideally fairness/ordering (requests served in happened-before order).
1. Centralized Algorithm
One process is elected coordinator. To enter the CS a process sends a REQUEST to the coordinator; the coordinator replies GRANT if the CS is free, otherwise queues the request (or sends nothing/denied). On exit the process sends RELEASE, and the coordinator grants the CS to the next queued request.
- Messages per CS entry: 3 (request, grant, release).
- Pros: simple, fair (FIFO queue), few messages.
- Cons: single point of failure (coordinator crash blocks everyone); coordinator is a performance bottleneck; cannot distinguish a dead coordinator from a busy CS.
2. Token-Ring Algorithm
Processes are arranged in a logical ring. A single token circulates around the ring. A process may enter the CS only when it holds the token; it keeps the token while in the CS and passes it to the next neighbour afterwards. If it does not need the CS, it forwards the token immediately.
- Messages per CS entry: between 1 and ∞ (the token may circulate many times when no one wants the CS).
- Pros: no starvation; safety guaranteed by single token.
- Cons: lost token must be detected and regenerated; a crashed process breaks the ring and must be bypassed; latency to acquire CS can be high.
3. Ricart–Agrawala (Distributed) Algorithm
A fully distributed algorithm using Lamport timestamps. To enter the CS, a process sends a timestamped REQUEST to all other processes and waits for REPLY (OK) from every one. A receiver replies OK immediately if it is not interested or not in the CS; if it is in the CS it queues the request; if it also wants the CS, it compares timestamps and replies OK only if the requester's timestamp is smaller (ties broken by process ID), otherwise queues. On exit, the process sends OK to all queued requests.
- Messages per CS entry: — one request and one reply per other process.
- Pros: no single point of failure (no central coordinator); ensures ordering by timestamp; no starvation.
- Cons: every process is a point of failure (one crash blocks all, since its reply is awaited — improved by Maekawa's quorum method); high message overhead; each process must know all others.
Comparison
| Algorithm | Messages per entry | Single point of failure | Fault tolerance |
|---|---|---|---|
| Centralized | 3 | Yes (coordinator) | Poor |
| Token-Ring | 1 → ∞ | Lost token / broken ring | Moderate (needs token regeneration) |
| Ricart–Agrawala | Any process crash blocks all | Poor unless extended |
Conclusion: The centralized algorithm is cheapest in messages but least robust; the token ring avoids starvation but suffers from lost tokens; Ricart–Agrawala is truly distributed and ordering-correct but the most message-expensive and sensitive to crashes.
Section B: Short Answer Questions
Attempt any EIGHT questions.
What is a distributed system? Explain the goals and characteristics of a distributed system.
Distributed System
A distributed system is a collection of independent (autonomous) computers, connected by a network, that appears to its users as a single coherent system. The computers coordinate their actions only by passing messages, having no shared memory or global clock.
Goals
- Resource sharing – let users share hardware, software, data and services.
- Transparency – hide the distribution (location, access, replication, failure, etc.) so the system looks like one machine.
- Openness – use standard interfaces/protocols so components can interoperate and be extended.
- Scalability – continue to perform well as the number of users, resources or geographic span grows.
- Reliability and fault tolerance – keep working despite partial failures.
Characteristics
- Concurrency of components – many processes run and access shared resources simultaneously.
- No global clock – coordination relies on message passing and logical ordering.
- Independent failures – components can fail independently, and others may not immediately know.
- Heterogeneity – varied hardware, OS, networks and languages, masked by middleware.
- Geographical distribution of nodes communicating over a network.
In short: a distributed system shares resources across autonomous, concurrently-executing machines while presenting a single, transparent, scalable and fault-tolerant system to its users.
Explain Lamport's logical clock with an example.
Lamport's Logical Clock
Physical clocks in a distributed system cannot be perfectly synchronized, yet processes still need to order events consistently. Lamport's logical clock provides a software counter that respects causality without using real time.
Happened-before relation (→):
- Same process, a before b ⇒ .
- Send of a message a and its receipt b ⇒ .
- Transitive: .
Clock rules: Each process keeps a counter .
- Before every event: .
- Each message carries the sender's timestamp .
- On receipt: .
Property: .
Example: Suppose P1's clock reaches 6 when it sends message m (timestamp ). P2's clock is currently 2. On receiving m, P2 sets
So send = 6 < receive = 7, correctly preserving the causal order. Equal timestamps in different processes are ordered by process ID to obtain a total order.
What is a Remote Procedure Call (RPC)? Explain its working with a diagram.
Remote Procedure Call (RPC)
A Remote Procedure Call lets a program call a procedure that executes in a different address space (typically on another machine) as if it were a normal local procedure call. RPC (Birrell & Nelson) hides the underlying message passing, providing access transparency so the programmer need not write explicit send/receive code.
Working
RPC uses client and server stubs plus the RPC runtime:
- The client calls the client stub like an ordinary local procedure.
- The client stub marshals (packs and serializes) the parameters into a message.
- The client's OS/RPC runtime sends the message over the network to the server.
- The server OS passes the message to the server stub, which unmarshals the parameters.
- The server stub calls the actual server procedure, which executes and returns a result.
- The server stub marshals the result and sends it back.
- The client stub unmarshals the result and returns it to the caller, which resumes as from a local call.
Diagram (described)
Client machine Server machine
+-----------+ call +-----------+ +-----------+ call +-----------+
| Client |--------->| Client | | Server |--------->| Server |
| program |<---------| stub | | stub |<---------| procedure |
+-----------+ return +-----------+ +-----------+ return +-----------+
| ^ ^ |
marshal | | unmarshal | | marshal
v | | v
+-----------+ +-----------+
| OS / net |====>| OS / net | (message over network)
+-----------+ +-----------+
Note: RPC must handle issues local calls do not — parameter passing (no shared memory, so pass-by-value/marshalling), heterogeneous data representation (e.g. XDR), and failure semantics (at-least-once, at-most-once, exactly-once delivery guarantees).
Differentiate between centralized and distributed mutual exclusion algorithms.
Centralized vs Distributed Mutual Exclusion
| Aspect | Centralized algorithm | Distributed algorithm (e.g. Ricart–Agrawala) |
|---|---|---|
| Coordination | A single elected coordinator grants/denies the critical section. | All processes participate equally; permission is obtained from all peers. |
| Messages per CS entry | 3 (request, grant, release). | (request to all, reply from all). |
| Decision basis | Coordinator's FIFO queue. | Lamport timestamps; lowest timestamp wins (ties by process ID). |
| Single point of failure | Yes — coordinator crash blocks the whole system. | No central point, but any process's crash blocks all (it must reply). |
| Bottleneck | Coordinator can be a performance bottleneck. | Load spread over all processes; more total traffic. |
| Complexity | Simple to implement. | More complex; every process must know all others. |
| Fairness | Coordinator can ensure FIFO fairness. | Ordering guaranteed by timestamps; no starvation. |
Summary: The centralized scheme is simple and message-efficient but depends on one coordinator (single point of failure and bottleneck). The distributed scheme removes the central coordinator and orders requests by timestamps, at the cost of far more messages and sensitivity to any node failing.
Explain distributed deadlock detection. What is a wait-for graph?
Distributed Deadlock Detection
A deadlock occurs when a set of processes are each blocked waiting for a resource held by another process in the set, so none can proceed. In a distributed system, resources and processes are spread over many sites and no single site has the global state, so detecting deadlock is harder; the system must combine local information across sites.
Approach: Since prevention/avoidance are costly, distributed systems often use deadlock detection: allow deadlocks to form, then detect cycles and resolve them (by aborting/rolling back a victim process and releasing its resources). Common organizations:
- Centralized detection – a coordinator builds a global wait-for graph from local graphs.
- Distributed detection – sites cooperate (e.g. edge-chasing / Chandy–Misra–Haas algorithm, which circulates probe messages along wait-for edges; if a probe returns to its initiator, a cycle—deadlock—exists).
- Hierarchical detection – sites organized in a tree, detecting deadlocks at the lowest common ancestor.
Detection must avoid phantom (false) deadlocks caused by out-of-date or inconsistent global snapshots.
Wait-For Graph (WFG)
A wait-for graph is a directed graph used to represent blocking relationships:
- Nodes = processes (transactions).
- An edge means is waiting for a resource currently held by .
A cycle in the wait-for graph indicates a deadlock among the processes on the cycle. In a distributed system each site maintains a local WFG; a deadlock may span sites, so a global WFG (the union of local graphs plus inter-site edges) must be examined for cycles.
Example: forms a cycle, so are deadlocked; one is chosen as a victim and aborted to break the cycle.
Explain the Bully algorithm for electing a coordinator.
Bully Algorithm (Election)
An election algorithm chooses one process to act as coordinator when the current one fails. The Bully algorithm (Garcia-Molina) assumes each process has a unique numeric ID and that processes can communicate reliably; the process with the highest ID should become the coordinator.
Steps: When a process P notices the coordinator is not responding, it starts an election:
- P sends an ELECTION message to every process with a higher ID.
-
- If no higher process replies (within timeout), P wins, becomes the coordinator, and sends a COORDINATOR message to all processes announcing itself.
- If a higher process replies with an OK (alive) message, P drops out; that higher process now takes over the election.
- Any process that receives an ELECTION message replies OK and then starts its own election (repeating step 1) among still-higher processes.
- Eventually the highest-ID live process finds no one above it answers, declares victory and broadcasts COORDINATOR.
- A recovered process with a higher ID than the current coordinator will start a new election and "bully" the others to reclaim the role — hence the name.
Messages: the COORDINATOR and ELECTION/OK exchange; worst case messages.
Example: With processes 1–7 and coordinator 7 crashing, suppose 4 detects it. Process 4 sends ELECTION to 5, 6, 7. Processes 5 and 6 reply OK and start their own elections; 6 finds 7 dead, wins, and broadcasts COORDINATOR to all. Process 6 becomes the new coordinator.
What are the different kinds of transparency in a distributed system?
Kinds of Transparency in a Distributed System
Transparency is the property of hiding the distributed nature of the system (separation of components, replication, failures, etc.) so that users and applications perceive a single coherent system. ISO/ANSA defines several kinds:
| Transparency | What it hides |
|---|---|
| Access | Differences in data representation and how a resource is accessed (local vs remote access looks the same). |
| Location | Where a resource is physically located (a name carries no location information). |
| Migration | That a resource may move to another location; access remains unaffected. |
| Relocation | That a resource may move while in use. |
| Replication | That several copies of a resource exist; the user sees a single resource. |
| Concurrency | That a resource is shared by several competing users simultaneously. |
| Failure | The failure and recovery of a resource, so the system appears to work normally. |
| Persistence | Whether a (software) resource is in memory or on disk. |
Summary: These transparencies (access, location, migration, relocation, replication, concurrency, failure, persistence) together let a distributed system present itself to users as one unified machine. Note that full transparency is not always desirable — e.g. hiding wide-area latency can mislead users.
What is middleware? Explain its role in a distributed system.
Middleware
Middleware is a software layer that sits between the operating system / network and the distributed applications running on each machine. Its purpose is to mask the heterogeneity of the underlying hardware, operating systems, networks and programming languages, and to provide a common, higher-level programming abstraction so that the system appears as a single coherent system to applications and developers.
Role in a Distributed System
- Hides heterogeneity – masks differences in machines, OSes, networks and data representations.
- Provides high-level communication abstractions – such as RPC, Remote Method Invocation (RMI), message-oriented middleware (message queues), and publish/subscribe, instead of raw sockets.
- Supports transparency – access, location, replication, failure and concurrency transparency are largely implemented in middleware.
- Offers common services – naming/directory services, security/authentication, transactions, persistence, concurrency control and event notification.
- Eases development and openness – standard interfaces (e.g. CORBA, Java RMI, web services / gRPC, DCOM) let independently developed components interoperate.
In short: Middleware is the "glue" that turns a collection of heterogeneous networked computers into a usable distributed system by providing communication, transparency and common services above the OS layer.
Explain Cristian's algorithm for physical clock synchronization.
Cristian's Algorithm (Physical Clock Synchronization)
Cristian's algorithm (1989) synchronizes a client's physical clock to an external time server (e.g. a UTC-based time source) — a form of external synchronization suited to systems where round-trip times are small relative to the required accuracy.
Procedure:
- The client records its local time and sends a request to the time server.
- The server replies with its current time .
- The client records the time when the reply arrives.
The round-trip time is . Assuming the request and reply take roughly equal time, the message took about to travel back, so when the reply is received the true time is approximately
The client sets its clock to this value.
Handling server processing time: if the server reports its interrupt-handling/processing time , the estimate improves to
Example: , (so ), server replies . The client sets its clock to .
Points:
- Clocks should be set forward smoothly (gradually advanced), and never set backward abruptly (to avoid breaking event ordering / timestamps); slow clocks are sped up and fast clocks slowed until corrected.
- Limitation: accuracy depends on a symmetric, short round trip; variable network delay reduces precision, and the single time server is a point of failure (addressed by the Berkeley algorithm / NTP).
Frequently asked questions
- Where can I find the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) question paper 2077?
- The full BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2077 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Distributed System (BSc CSIT, CSC462) 2077 paper come with solutions?
- Yes. Every question on this Distributed System (BSc CSIT, CSC462) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2077 paper?
- The BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2077 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Distributed System (BSc CSIT, CSC462) past paper free?
- Yes — reading and attempting this Distributed System (BSc CSIT, CSC462) past paper on Kekkei is completely free.