BSc CSIT (TU) Science Distributed System (BSc CSIT, CSC462) Question Paper 2081 Nepal
This is the official BSc CSIT (TU) (Science stream) Distributed System (BSc CSIT, CSC462) question paper for 2081, as set in the regular annual examination. It carries 60 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Distributed System (BSc CSIT, CSC462) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) exam or solving previous years' question papers, this 2081 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt any TWO questions.
What is data replication in distributed systems? Explain different consistency models and replica management techniques used to maintain consistency.
Data Replication in Distributed Systems
Data replication is the technique of maintaining multiple copies (replicas) of the same data on different machines (nodes) in a distributed system. The main reasons for replication are:
- Reliability/availability – if one replica fails, others can serve requests.
- Performance – clients access a nearby replica, reducing latency, and load is shared among replicas (scalability).
The key challenge is consistency: when one replica is updated, all copies must eventually reflect the change. This creates a trade-off between consistency, availability and performance.
Consistency Models
A consistency model is a contract between the data store and processes about how reads and writes behave.
1. Data-centric models
- Strict consistency – any read returns the result of the most recent write (requires global time; impossible in practice).
- Sequential consistency – the result of any execution is the same as if all operations were executed in some sequential order that respects each process's program order.
- Causal consistency – writes that are causally related must be seen in the same order by all processes; concurrent writes may be seen in different orders.
- FIFO (PRAM) consistency – writes from a single process are seen by others in the order issued.
- Weak / release / entry consistency – use synchronization variables (acquire/release) so consistency is enforced only at synchronization points.
2. Client-centric models (for eventual-consistency stores):
- Eventual consistency – if no new updates occur, all replicas eventually converge.
- Monotonic reads, monotonic writes, read-your-writes, writes-follow-reads – guarantees for a single client across replicas.
Replica Management Techniques
Replica placement
- Permanent replicas (e.g., mirror servers).
- Server-initiated replicas (created dynamically near demand, e.g., CDNs).
- Client-initiated replicas (caches).
Update propagation
- Push (server-based) vs Pull (client-based), or a lease-based hybrid.
- Propagate the update operation vs propagate the modified data vs invalidate.
Consistency protocols
- Primary-based protocols – a primary copy serializes all writes (remote-write or local-write).
- Replicated-write protocols – active replication (write sent to all replicas, often with totally ordered multicast) or quorum-based protocols where a read quorum and write quorum must satisfy and to prevent read–write and write–write conflicts.
Conclusion
Replication improves availability and performance but requires a chosen consistency model and an appropriate replica-management/consistency protocol to keep copies coherent.
Explain the architecture of a distributed file system. Discuss the design and working of the Network File System (NFS) or Andrew File System (AFS).
Distributed File System (DFS) Architecture
A distributed file system allows files stored on remote servers to be accessed by clients over a network as if they were local, providing access, location and migration transparency.
General architecture components:
- Client module – intercepts file system calls (often via a Virtual File System / VFS layer) and forwards remote operations to servers.
- File service / file server – stores files and serves read/write/lookup requests.
- Directory (name) service – maps human-readable path names to file identifiers.
- Caching – client-side caches reduce network traffic and latency; require a cache-consistency mechanism.
Services are usually built on RPC, are typically stateless or stateful, and aim for transparency, scalability and fault tolerance.
Network File System (NFS)
NFS (by Sun) is the classic DFS model.
- Architecture: Client and server communicate via RPC. The VFS layer on the client distinguishes local from remote files; an NFS client maps VFS operations to NFS RPC calls handled by the NFS server.
- Mounting: A remote directory is mounted into the client's local namespace; after mounting it is accessed transparently.
- Statelessness (classic NFSv2/v3): The server keeps no per-client open-file state. Every request (e.g.,
read,write) is self-contained and carries a file handle plus offset. This makes crash recovery simple — after a server reboot, clients just retry. Operations are made idempotent so retries are safe. - Caching: Clients cache data and attributes; consistency is approximate (validated using timestamps), so NFS provides only near one-copy semantics.
- Access model: Remote access model (operations executed on the server).
Andrew File System (AFS) — alternative
- Design goal: scalability for thousands of clients.
- Whole-file caching: When a file is opened, the entire file is copied to the client's local disk; reads/writes happen locally; on close, the file is written back to the server (the upload/download model).
- Callbacks: The server keeps state and issues a callback promise; if another client modifies the file, the server sends a callback invalidating the cached copy. This gives stronger consistency than NFS while minimizing server load.
- Session semantics: Changes become visible to others only after the file is closed.
Comparison
| Feature | NFS (v3) | AFS |
|---|---|---|
| Caching unit | Blocks | Whole file (on local disk) |
| Server state | Stateless | Stateful (callbacks) |
| Consistency | Validation by timestamps | Callback + session semantics |
| Scalability | Moderate | High |
What is fault tolerance? Explain failure models, the use of redundancy, and the concept of agreement in faulty systems (Byzantine generals problem).
Fault Tolerance
Fault tolerance is the ability of a distributed system to continue providing correct service (possibly degraded) even when some of its components fail. A system is dependable if it is available, reliable, safe and maintainable. The basic terms: a fault (defect) may cause an error (incorrect state) which may lead to a failure (system deviates from correct service).
Failure Models
Failures are classified by how a process/server can behave incorrectly:
- Crash (fail-stop) failure – a process halts and stays halted; it was working correctly until then.
- Omission failure – a server fails to respond (receive omission: it doesn't receive incoming messages; send omission: it doesn't send replies).
- Timing failure – the response is correct but arrives outside the specified time interval (too late/early).
- Response failure – the response is incorrect (value failure: wrong value; state-transition failure: wrong control flow).
- Arbitrary / Byzantine failure – the most severe: a process may produce arbitrary, even malicious or contradictory, outputs at arbitrary times.
Redundancy
Fault tolerance is achieved primarily through redundancy (masking failures):
- Information redundancy – extra bits, e.g., error-correcting codes, checksums.
- Time redundancy – repeat an operation (retransmission, re-execution), useful for transient faults.
- Physical (hardware/software) redundancy – replicate components/processes. Example: Triple Modular Redundancy (TMR) uses three units feeding voters that mask one faulty unit. To mask up to crash faults, replicas suffice; to mask Byzantine faults, replicas are needed.
Agreement in Faulty Systems: Byzantine Generals Problem
Processes must reach agreement (consensus) on a value despite some processes being faulty/traitorous. In the Byzantine Generals Problem, several generals must agree on a common plan (attack/retreat), but some generals are traitors who send conflicting messages.
Lamport's result: Agreement among processes tolerating Byzantine (arbitrary) faulty processes is possible only if
Thus with traitor at least generals are required; with 3 generals and 1 traitor, no solution exists. With reliable signed messages (authentication) this bound can be relaxed. Reaching agreement also requires enough rounds of message exchange.
Conclusion
Fault tolerance combines a clear failure model, redundancy to mask faults, and agreement protocols (subject to the bound) to keep distributed systems dependable.
Section B: Short Answer Questions
Attempt any EIGHT questions.
What is a distributed system? Explain the goals and characteristics of a distributed system.
A distributed system is a collection of independent computers (nodes) that appears to its users as a single coherent system, with the nodes coordinating their actions by passing messages over a network (no shared memory or global clock).
Goals:
- Resource sharing – share hardware, software and data among users.
- Transparency – hide distribution (access, location, replication, etc.) so the system looks like one machine.
- Openness – use standard interfaces/protocols, enabling interoperability and portability.
- Scalability – handle growth in users, data and geographic size without major redesign.
Characteristics:
- No global clock – processes coordinate using message passing and logical clocks.
- Concurrency – many components execute in parallel and share resources.
- Independent failures – components can fail independently; the rest should keep working (fault tolerance).
- Heterogeneity – different hardware, OS and networks, hidden by middleware.
- Autonomy of nodes – each node manages its own resources.
Explain Lamport's logical clock with an example.
Lamport's Logical Clock
Because distributed systems lack a global physical clock, Lamport's logical clock assigns numeric timestamps to events so that the happened-before relation () is captured, giving a consistent ordering of events.
Happened-before rules:
- If and are in the same process and occurs before , then .
- If is the sending of a message and is its receipt, then .
- Transitivity: if and then .
Clock update rules (each process keeps a counter ):
- Before each event, increment the local clock: .
- A message carries the sender's timestamp .
- On receipt, the receiver sets .
This guarantees the clock condition: if then .
Example
Processes and . sends message at its event with timestamp.
- : event (send ), , so .
- had local clock when it does an internal event (), then receives (timestamp ): .
Since the send (1) happened-before the receive (6), — the ordering is preserved.
Note: Lamport clocks give , but not the converse; vector clocks are needed to detect concurrency.
What is a Remote Procedure Call (RPC)? Explain its working with a diagram.
Remote Procedure Call (RPC)
RPC is a communication mechanism that allows a program to call a procedure that executes on a different (remote) machine, while making the call look like an ordinary local procedure call. It hides the underlying message passing, providing access transparency.
Working (steps)
Client machine Server machine
client server
| 1. call |
client stub -- 2. marshal & send --> server stub
| | 3. unmarshal, call
RPC runtime ----- network ------> RPC runtime
| | 4. server executes
client stub <-- 6. send reply ----- server stub <- 5. result
| 7. unmarshal |
return value
- The client calls the client stub as if it were a normal local procedure.
- The client stub marshals (packs) the parameters into a message and asks the local OS/RPC runtime to send it.
- The message travels over the network to the server; the server stub unmarshals the parameters.
- The server stub calls the actual server procedure, which executes and produces a result.
- The result is returned to the server stub, which marshals it into a reply message.
- The reply is sent back to the client machine.
- The client stub unmarshals the result and returns it to the client, which resumes as if the call returned locally.
Key points
- Stubs hide marshalling/networking; generated from an IDL interface definition.
- Parameter passing uses call by value (binding by reference is hard across address spaces).
- Binding locates the server (e.g., via a binder/port-mapper).
- Failure semantics: at-least-once, at-most-once, or exactly-once (ideal but hard); idempotent operations ease retries.
Differentiate between centralized and distributed mutual exclusion algorithms.
Centralized vs Distributed Mutual Exclusion
Mutual exclusion ensures that only one process at a time enters a critical section (CS) accessing a shared resource.
Centralized algorithm: One process acts as a coordinator. A process sends a REQUEST to the coordinator, which grants permission (sends OK) if the CS is free, otherwise queues the request; the process sends RELEASE on exit, and the coordinator grants the next queued request.
Distributed algorithm (e.g., Ricart–Agrawala): A process wishing to enter the CS sends a timestamped REQUEST to all other processes and enters only after receiving OK/REPLY from all of them. Decisions are made collectively using logical-clock timestamps to order requests.
| Aspect | Centralized | Distributed |
|---|---|---|
| Decision making | Single coordinator decides | All processes participate |
| Messages per CS entry | 3 (request, grant, release) | 2(N−1) (request + reply to/from others) |
| Single point of failure | Yes (coordinator) | No single coordinator; but failure of any node can block (needs failure handling) |
| Bottleneck | Coordinator can be a bottleneck | Load distributed |
| Implementation | Simple, easy | Complex; needs synchronized logical clocks |
| Fairness/ordering | FIFO by coordinator | By timestamp ordering |
| Number of failure points | 1 | N |
Summary: The centralized scheme is simple and message-efficient but has a single point of failure and bottleneck; the distributed scheme removes the single coordinator but costs more messages and is more complex and sensitive to individual node failures.
Explain distributed deadlock detection. What is a wait-for graph?
Distributed Deadlock Detection
A deadlock occurs when a set of processes are each blocked waiting for resources held by others in the set, so none can proceed. In a distributed system, processes and resources are spread across nodes, so no single site has the complete picture, making detection harder.
Wait-For Graph (WFG)
A wait-for graph is a directed graph used to detect deadlock:
- Nodes = processes.
- An edge means " is waiting for a resource currently held by ."
A deadlock exists if and only if the WFG contains a cycle (with single-instance resources). Example: is a cycle, so are deadlocked.
In a distributed system the global WFG is the union of local WFGs at each site connected by edges crossing sites.
Detection approaches
- Centralized detection – a coordinator maintains the global WFG; sites report local edges. Simple but a single point of failure and can detect phantom (false) deadlocks due to message delays.
- Distributed detection – each site participates; e.g., edge-chasing / probe-based algorithms (Chandy–Misra–Haas): a blocked process sends a probe message along outgoing wait-for edges. If a probe returns to its initiator, a cycle (deadlock) is detected.
- Hierarchical detection – sites organized in a tree; a parent detects deadlocks among its children.
Resolution: break the cycle by aborting/rolling back a victim process and releasing its resources.
Explain the Bully algorithm for electing a coordinator.
Bully Algorithm (Coordinator Election)
The Bully algorithm (Garcia-Molina) elects the process with the highest ID as the new coordinator after the current coordinator fails. It assumes each process has a unique ID, processes know each other's IDs, and message delivery is reliable.
Algorithm
When a process notices the coordinator is not responding, it starts an election:
- sends an ELECTION message to all processes with a higher ID than itself.
- If no higher process responds, wins and becomes the coordinator; it sends a COORDINATOR message to all other processes announcing itself.
- If a higher process responds (with an OK/ALIVE message), that process takes over the election; steps down and waits.
- Each higher process that received an ELECTION then repeats the procedure (holds its own election among still-higher processes).
- Eventually the highest-ID alive process becomes coordinator and broadcasts the COORDINATOR message.
If a previously crashed process with a higher ID recovers, it starts a new election and "bullies" the current coordinator out — hence the name.
Example
Processes ; coordinator crashes. Process detects this and sends ELECTION to . Processes and reply OK and start their own elections; does not reply. sends ELECTION to , gets no reply, so becomes the new coordinator and broadcasts COORDINATOR to all.
Message cost: in the worst case.
What are the different kinds of transparency in a distributed system?
Kinds of Transparency in a Distributed System
Transparency means hiding the fact that resources are distributed across multiple computers, so the system appears as a single coherent system to users and applications. The main kinds (ISO/ANSA classification) are:
- Access transparency – hides differences in data representation and in how a resource is accessed; local and remote resources are accessed using identical operations.
- Location transparency – hides where a resource is physically located (names do not reveal location).
- Migration transparency – hides that a resource may move to another location; its name stays the same after moving.
- Relocation transparency – hides that a resource may be moved while in use.
- Replication transparency – hides that multiple copies of a resource exist; the user sees a single logical resource.
- Concurrency transparency – hides that a resource may be shared by several competing users; access appears exclusive/consistent.
- Failure transparency – hides the failure and recovery of a resource; the user is unaware of faults.
- Persistence transparency – hides whether a resource is in memory or on disk.
Full transparency is not always desirable (e.g., it can hurt performance or conflict with location-awareness), so it is balanced against other goals.
What is middleware? Explain its role in a distributed system.
Middleware
Middleware is a software layer that sits between the operating system/network and the distributed applications. It provides a common programming abstraction across a heterogeneous collection of computers, networks and operating systems, hiding their differences from application developers.
Application 1 Application 2 Application 3
---------------------------------------------------
Middleware (common services & abstractions)
---------------------------------------------------
OS/Network OS/Network OS/Network
Role / Functions in a Distributed System
- Provides transparency – masks heterogeneity of hardware, OS and networks, and offers access/location/replication transparency so distributed resources look local.
- Communication abstractions – offers high-level mechanisms such as RPC, Remote Method Invocation (RMI), message-oriented middleware (MOM) and publish/subscribe, instead of raw sockets.
- Common services – naming/directory, security/authentication, transactions, concurrency control, persistence, and replication services.
- Interoperability & openness – standard interfaces (e.g., CORBA, RMI, web services, gRPC) let components written in different languages/platforms work together.
- Eases development – application programmers focus on business logic, not low-level networking and synchronization.
Examples: CORBA, Java RMI, Microsoft DCOM/.NET Remoting, gRPC, message brokers (RabbitMQ/Kafka).
Explain Cristian's algorithm for physical clock synchronization.
Cristian's Algorithm (Physical Clock Synchronization)
Cristian's algorithm synchronizes a client's clock with a time server that has access to an accurate (UTC) time source. It is suitable when round-trip times are short compared to the required accuracy.
Procedure
- The client process sends a request to the time server at local time asking for the current time.
- The server, on receipt, replies with its time .
- The client receives the reply at local time .
- The client estimates the message round-trip delay as and assumes the reply took half of it, so it sets its clock to:
If the (minimum) one-way message time (interrupt-handling/processing time at server) is known, a better estimate is:
Accuracy bound
The time value lies within the interval , so the error is at most
Example
If ms, reply received at ms, and server time , then
Limitation: Relies on a single time server (single point of failure) and assumes symmetric, small network delays; clocks must never be set backward (adjust gradually instead).
Frequently asked questions
- Where can I find the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) question paper 2081?
- The full BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2081 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Distributed System (BSc CSIT, CSC462) 2081 paper come with solutions?
- Yes. Every question on this Distributed System (BSc CSIT, CSC462) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2081 paper?
- The BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2081 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Distributed System (BSc CSIT, CSC462) past paper free?
- Yes — reading and attempting this Distributed System (BSc CSIT, CSC462) past paper on Kekkei is completely free.