BSc CSIT (TU) Science Distributed System (BSc CSIT, CSC462) Question Paper 2079 Nepal
This is the official BSc CSIT (TU) (Science stream) Distributed System (BSc CSIT, CSC462) question paper for 2079, as set in the regular annual examination. It carries 60 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Distributed System (BSc CSIT, CSC462) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) exam or solving previous years' question papers, this 2079 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt any TWO questions.
Explain the architecture of a distributed file system. Discuss the design and working of the Network File System (NFS) or Andrew File System (AFS).
Distributed File System (DFS)
A distributed file system is a file system that allows files to be stored on multiple networked machines but accessed by clients as if they were on a single local file system. It provides location transparency, access transparency, and a uniform namespace.
Architecture (Client–Server model)
A DFS is typically built from three logical components:
- Client module – runs on the user machine, intercepts file system calls and forwards them to remote servers (often through a Virtual File System / VFS layer).
- File service – stores and manages the actual file data and provides operations (read, write, create, delete).
- Directory/name service – maps human-readable file names to file identifiers (location of the file).
Files may be replicated (for availability and performance) and cached at clients (to reduce network traffic). The design must balance consistency, performance, availability, and scalability.
Network File System (NFS) – Sun Microsystems
NFS is a stateless, RPC-based DFS.
- Architecture: Built on top of the Virtual File System (VFS) layer. The VFS routes requests for local files to the local FS and requests for remote files to the NFS client, which uses RPC + XDR (External Data Representation) to talk to the NFS server.
- Mount protocol: A remote directory is mounted into the client's local namespace; the server returns a file handle identifying the exported directory.
- Stateless server: The server keeps no per-client open-file state. Every request (e.g.
read(handle, offset, count)) is self-contained and idempotent, so after a crash the server simply restarts and clients retransmit — giving simple crash recovery. - Caching: Clients cache file blocks and attributes; consistency is weak (validated periodically), which can cause clients to see slightly stale data.
Andrew File System (AFS) – CMU
AFS is designed for large scale and is stateful.
- Whole-file caching: When a client opens a file, the entire file is copied to the client's local disk and all reads/writes happen locally; the file is written back on close. This minimizes server load and gives excellent scalability.
- Callbacks: The server keeps state (a callback promise). If another client modifies the file, the server sends a callback break to invalidate the cached copy, providing stronger consistency than NFS.
- Session semantics: Changes become visible to others only after the file is closed.
NFS vs AFS
| Feature | NFS | AFS |
|---|---|---|
| Server state | Stateless | Stateful (callbacks) |
| Caching unit | Blocks (in memory) | Whole file (on local disk) |
| Consistency | Weak (poll-based) | Stronger (callback-based) |
| Scalability | Moderate | High |
Conclusion: NFS favors simplicity and easy recovery via statelessness, while AFS favors scalability and consistency via whole-file caching and callbacks.
What is fault tolerance? Explain failure models, the use of redundancy, and the concept of agreement in faulty systems (Byzantine generals problem).
Fault Tolerance
Fault tolerance is the ability of a distributed system to continue providing correct service even when one or more of its components fail. A fault is the underlying defect, an error is its manifestation, and a failure occurs when the system deviates from its specification. The key requirements are availability, reliability, safety, and maintainability.
Failure Models
Failures are classified by how a faulty component behaves:
| Failure type | Description |
|---|---|
| Crash (fail-stop) | A process halts and stays halted; others can detect it. |
| Omission | A process/channel fails to send (send-omission) or receive (receive-omission) messages. |
| Timing | Response is correct but arrives too late/early (in synchronous systems). |
| Response | Server returns an incorrect value or wrong state transition. |
| Arbitrary / Byzantine | A component behaves arbitrarily — may send conflicting or malicious messages. This is the hardest to tolerate. |
Use of Redundancy
Fault tolerance is achieved primarily through redundancy (masking failures by replicating):
- Information redundancy: extra bits, e.g. error-correcting codes (Hamming, CRC).
- Time redundancy: repeat an action (retransmit a request, retry a transaction).
- Physical/Hardware redundancy: replicate components, e.g. multiple processors/disks (RAID), Triple Modular Redundancy (TMR with voters).
- Software/Process redundancy: replicate processes into groups and use voting so that the failure of a minority is masked.
Agreement in Faulty Systems — Byzantine Generals Problem
Non-faulty processes must reach agreement on a value even when some processes are faulty/treacherous.
- Statement: Several generals (processes) must agree on a common plan (attack/retreat) by exchanging messages, but some generals are traitors who send contradictory messages. Loyal generals must (i) all decide the same plan, and (ii) follow a loyal commander's order.
- Lamport–Shostak–Pease result: With arbitrary (Byzantine) failures and oral (unsigned) messages, agreement is possible only if , where is the total number of processes and is the number of faulty ones. Equivalently, more than two-thirds of the processes must be correct. It requires rounds of message exchange.
- With signed (authenticated) messages, agreement is possible for any .
Example: With traitor, at least generals are needed; with 3 generals one of whom is a traitor, the two loyal ones cannot agree.
Conclusion: Fault tolerance combines failure detection, redundancy for masking, and agreement protocols (like Byzantine agreement) to keep the system correct despite faults.
Explain clock synchronization in distributed systems. Discuss Lamport's logical clocks and vector clocks with examples, and describe how the happened-before relation orders events.
Clock Synchronization in Distributed Systems
In a distributed system every machine has its own physical clock, and these clocks drift apart over time. Since there is no global clock, processes need a way to order events and agree on time. Physical clock synchronization keeps real-time clocks close; logical clocks only capture the ordering of events, which is often all that is needed.
The Happened-Before Relation ()
Lamport defined the happened-before relation to capture causality:
- If and are events in the same process and occurs before , then .
- If is the sending of a message and is its receipt, then .
- Transitivity: if and , then .
If neither nor , the events are concurrent ().
Lamport's Logical Clocks
Each process keeps a counter . Rules:
- IR1: before each event, .
- IR2: a message carries its send timestamp . On receipt, .
Property: if then . (The converse need not hold.)
Example: sends a message at local time 6; it arrives at whose clock reads 4. By IR2 the receive event gets , so the send (6) < receive (7), preserving causal order.
Vector Clocks
Lamport clocks cannot tell whether implies causality. Vector clocks fix this. Each process keeps a vector :
- VR1: before an internal/send event, .
- VR2: a message carries ; on receipt, for all , then .
Comparison: iff every component and at least one is strictly smaller. Then:
If neither nor , the events are concurrent.
Example: Start , . does an event and sends to . (at ) receives then bump . Since , the send causally precedes the receive.
Conclusion: The happened-before relation defines partial event ordering; Lamport clocks give a consistent total order but lose information, while vector clocks precisely capture causality and detect concurrency.
Section B: Short Answer Questions
Attempt any EIGHT questions.
What is a distributed system? Explain the goals and characteristics of a distributed system.
Distributed System
A distributed system is a collection of independent (autonomous) computers connected by a network that appears to its users as a single coherent system. The components cooperate by passing messages and share no common physical clock or memory.
Goals
- Resource sharing – share hardware, software and data (printers, files, databases) across the network.
- Transparency – hide the distribution so the system looks like one machine (access, location, replication, concurrency, failure transparency).
- Openness – use standard, well-defined interfaces/protocols so components from different vendors can interoperate and be extended.
- Scalability – grow in size, geography, or administration without major performance loss or redesign.
Characteristics
- Concurrency: many processes execute simultaneously and may share resources, requiring coordination.
- No global clock: coordination relies on message passing; only logical ordering of events is possible.
- Independent failures: components can fail independently and partially while the rest of the system keeps running, so the system must tolerate faults.
- Heterogeneity: different hardware, OS, and networks are handled, usually via middleware.
In short: a distributed system provides resource sharing with transparency, openness, scalability and fault tolerance over autonomous, concurrently-executing, clock-less machines.
Explain Lamport's logical clock with an example.
Lamport's Logical Clock
Because distributed processes have no shared physical clock, Lamport's logical clock assigns a monotonically increasing counter to events so that causally related events are correctly ordered. It implements the happened-before () relation such that:
Rules
Each process keeps a counter :
- IR1 (internal/send): before timestamping any event, increment: .
- IR2 (receive): a message carries its send timestamp ; the receiver sets .
Example
Two processes with different tick rates:
- ticks: 1, 2, 3, 4, 5, 6 …
- sends message m at its time 6.
- 's clock currently reads 4 when m arrives.
- By IR2: receive timestamp .
Thus the send (6) < receive (7), so the causal order send happened-before receive is preserved. Without the rule, the receive (4) would wrongly appear earlier than the send (6).
Total ordering: ties (equal timestamps in different processes) are broken using process IDs, giving a consistent global total order used in algorithms like Lamport's mutual exclusion.
Limitation: does not imply ; logical clocks cannot detect concurrency (vector clocks are needed for that).
What is a Remote Procedure Call (RPC)? Explain its working with a diagram.
Remote Procedure Call (RPC)
RPC is a communication mechanism that lets a program call a procedure located on a remote machine as if it were a local procedure call, hiding the underlying message passing from the programmer. It provides access transparency and was introduced by Birrell and Nelson.
Working (steps)
Client Server
| 1. call proc(args) |
v |
[Client stub] --2. marshal args--> |
| (pack into message) |
[Client OS] --3. send msg over network--> [Server OS]
|
4. msg passed to [Server stub]
| 5. unmarshal args
v
6. call actual procedure
|
7. result -> marshal
[Client OS] <--8. reply msg-------------- [Server OS]
|
[Client stub] 9. unmarshal result
v
10. return value to client
- The client calls the client stub (a local proxy procedure).
- The client stub marshals (packs) the parameters into a message.
- The client OS sends the message to the remote server.
- The server OS hands the message to the server stub.
- The server stub unmarshals the parameters.
- The server stub calls the actual procedure on the server. 7–10. The result is marshaled, sent back, unmarshaled by the client stub, and returned to the caller.
Key points
- Stubs hide marshaling/network code; IDL (Interface Definition Language) generates them.
- Parameter passing: call-by-value works directly; call-by-reference is hard (no shared address space) and is handled by copy/restore.
- Failure semantics: because the network can lose messages, RPC offers semantics such as at-least-once or at-most-once delivery.
In short: RPC makes distributed communication look like ordinary procedure calls through client/server stubs and marshaling.
Differentiate between centralized and distributed mutual exclusion algorithms.
Centralized vs Distributed Mutual Exclusion
Mutual exclusion ensures that only one process at a time enters a critical section (CS) accessing a shared resource.
Centralized Algorithm
A single coordinator grants permission. To enter the CS a process sends a request to the coordinator; the coordinator replies with a grant if the resource is free, otherwise queues the request. On exit the process sends a release and the coordinator grants the next queued request.
Distributed Algorithm (e.g. Ricart–Agrawala)
No central node: a process wanting the CS multicasts a timestamped request to all other processes and enters only after receiving OK/reply from all. Requests are ordered by Lamport timestamps to break ties.
Differences
| Aspect | Centralized | Distributed |
|---|---|---|
| Control | One coordinator decides | All processes participate |
| Messages per CS entry | 3 (request, grant, release) | 2(n−1) (request + reply to/from all) |
| Single point of failure | Yes (coordinator) | No single point (but more points can fail) |
| Bottleneck | Coordinator can be a bottleneck | Load is distributed |
| Implementation | Simple, easy | More complex, needs ordering |
| Fairness/Starvation | Fair, no starvation (FIFO queue) | Fair via timestamps |
Conclusion: The centralized scheme is simple and message-efficient but has a single point of failure and a bottleneck; the distributed scheme removes the single coordinator at the cost of more messages and complexity.
Explain distributed deadlock detection. What is a wait-for graph?
Distributed Deadlock Detection
A deadlock occurs when a set of processes are each waiting for a resource held by another in the set, so none can proceed. In a distributed system the resources, processes and the wait-for information are spread across many sites, making detection harder.
Wait-For Graph (WFG)
A wait-for graph is a directed graph in which:
- each node represents a process, and
- a directed edge means is waiting for a resource currently held by .
A cycle in the WFG indicates a deadlock. In a distributed system the full graph is the union of local WFGs held at different sites (a global WFG).
Approaches to Distributed Deadlock Detection
- Centralized: a coordinator builds a global WFG from the local WFGs and searches for cycles. Simple but a single point of failure and can report phantom (false) deadlocks due to message delays.
- Distributed: every site participates in detection (e.g. Chandy–Misra–Haas edge-chasing): when a process blocks, it sends a probe message along its outgoing wait-for edges; if the probe returns to its initiator, a cycle (deadlock) exists.
- Hierarchical: sites are organized in a tree; deadlock detection is done at the lowest common ancestor.
Recovery
Once detected, a deadlock is broken by victim selection — aborting/rolling back one process to release its resources.
In short: distributed deadlock detection finds cycles in the (distributed) wait-for graph, commonly via edge-chasing probe messages, then resolves them by aborting a victim.
Explain the Bully algorithm for electing a coordinator.
Bully Algorithm (Coordinator Election)
The Bully algorithm (Garcia-Molina) elects the process with the highest process ID as the coordinator when the current coordinator fails. It assumes the system is synchronous, each process knows the IDs of all others, and messages are reliable.
Message types
- ELECTION – announces an election to higher-ID processes.
- OK (ANSWER) – a reply that a higher process is alive and takes over.
- COORDINATOR – announces the winner to all processes.
Algorithm
When a process notices the coordinator is not responding:
- sends an ELECTION message to all processes with a higher ID.
- If no one with a higher ID responds (within a timeout), wins: it becomes the coordinator and sends a COORDINATOR message to all lower-ID processes.
- If a higher-ID process replies with OK, drops out; that higher process now holds the election (it repeats step 1 among still-higher processes).
- Eventually the highest-ID live process wins and broadcasts COORDINATOR.
A recovered process (or one with the highest ID) can immediately bully the others by starting an election and taking over — hence the name.
Example
Processes 1–7, coordinator 7 crashes. Process 4 detects it and sends ELECTION to 5, 6, 7. Processes 5 and 6 reply OK; 4 stops. 5 and 6 each hold elections; 6 replies OK to 5, so 5 stops. 6 sends ELECTION to 7 (no reply), so 6 wins and broadcasts COORDINATOR to all.
Cost: worst case messages.
What are the different kinds of transparency in a distributed system?
Types of Transparency in a Distributed System
Transparency means hiding the fact that the system's resources and processes are distributed across multiple machines, so users and applications perceive a single coherent system. The ISO/ANSA reference model defines the following kinds (ISO RM-ODP):
| Transparency | What it hides |
|---|---|
| Access | Differences in data representation and how a resource is accessed (local vs remote access look the same). |
| Location | Where a resource is physically located (name does not reveal location). |
| Migration | That a resource may move to another location; names stay valid. |
| Relocation | That a resource may move while in use. |
| Replication | That a resource is replicated; the user sees one logical copy. |
| Concurrency | That a resource is shared by several competing users simultaneously. |
| Failure | The failure and recovery of a resource, so the system appears to keep working. |
| Persistence | Whether a resource is in memory or on disk. |
In short: these transparencies (access, location, migration, relocation, replication, concurrency, failure, persistence) let the distributed system appear as one unified machine to its users.
What is middleware? Explain its role in a distributed system.
Middleware
Middleware is a software layer that sits between the operating system/network and the distributed applications. It hides the heterogeneity of the underlying networks, hardware and operating systems and provides a uniform programming model and common services, so developers can build distributed applications more easily.
+-------------------------------------------------+
| Distributed Application |
+-------------------------------------------------+
| MIDDLEWARE (common services, API) | <-- provides transparency
+-------------------------------------------------+
| Local OS | Local OS | Local OS | ... |
+-------------------------------------------------+
| Network |
+-------------------------------------------------+
Role in a Distributed System
- Provides transparency – masks access, location, replication, and failure differences so remote resources look local.
- Hides heterogeneity – lets different OS, hardware, and languages interoperate via standard interfaces.
- Communication abstractions – offers RPC, remote method invocation (RMI), and message-oriented middleware (MOM) instead of raw sockets.
- Common services – naming, persistence, security/authentication, transactions, concurrency control, and replication.
- Eases development – higher-level API so programmers focus on application logic, not networking details.
Examples: CORBA, Java RMI, DCOM, gRPC, and message brokers such as RabbitMQ/JMS.
In short: middleware is the glue that makes a collection of heterogeneous networked machines behave as one coherent distributed system.
Explain Cristian's algorithm for physical clock synchronization.
Cristian's Algorithm (Physical Clock Synchronization)
Cristian's algorithm synchronizes a client's clock with a time server (which holds accurate UTC time, e.g. from a radio/atomic source). It is suitable when round-trip times are small compared to the required accuracy.
Procedure
- The client sends a request to the time server at local time .
- The server replies with its current time (UTC).
- The client records the reply-receipt time . The round-trip time is .
- The message took roughly to travel back, so the client sets its clock to:
If the server's interrupt-handling time is known, a better estimate is:
Accuracy
The error is bounded by , where is the minimum one-way transmission time. Taking the average of several requests (discarding outliers with large RTT) improves accuracy.
Example
If , (so ms) and the server time , the client sets its clock to .
Limitations
- Clocks must only ever move forward, so if the new time is behind the current time, the clock is slowed down gradually rather than set back.
- Relies on a single time server (single point of failure); the Berkeley algorithm avoids needing an accurate external server.
Frequently asked questions
- Where can I find the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) question paper 2079?
- The full BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2079 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Distributed System (BSc CSIT, CSC462) 2079 paper come with solutions?
- Yes. Every question on this Distributed System (BSc CSIT, CSC462) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2079 paper?
- The BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2079 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Distributed System (BSc CSIT, CSC462) past paper free?
- Yes — reading and attempting this Distributed System (BSc CSIT, CSC462) past paper on Kekkei is completely free.