BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) Question Paper 2079 Nepal
This is the official BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) question paper for 2079, as set in the regular annual examination. It carries 80 full marks and a time allowance of 180 minutes, across 11 questions. On Kekkei you can attempt this Distributed System (IOE, CT 702 / ENCT 411) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) exam or solving previous years' question papers, this 2079 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt all / any as specified.
What is a distributed system? Explain its characteristics and the key challenges (heterogeneity, openness, scalability, failure handling, concurrency, transparency) faced in designing one. With suitable diagrams, differentiate between the client-server model and the peer-to-peer model of distributed systems, stating one practical application of each.
Distributed System
A distributed system is a collection of independent (autonomous) computers, connected by a network and equipped with distribution middleware, that appears to its users as a single coherent system. The components coordinate their actions only by passing messages, as they have no shared memory or global clock.
Characteristics
- Resource sharing of hardware, data and services across nodes.
- Concurrency — many users and processes execute simultaneously.
- No global clock — coordination is achieved purely by message passing.
- Independent failures — components can fail independently and partially.
Key Challenges in Design
| Challenge | Description |
|---|---|
| Heterogeneity | Nodes differ in hardware, OS, network and programming language; masked using middleware, common standards (IP, XML/JSON) and virtual machines. |
| Openness | The system can be extended and reimplemented; achieved through published, standard interfaces (e.g. RFCs, IDL). |
| Scalability | Should remain efficient as users/resources grow; needs avoidance of bottlenecks, replication, caching and decentralised algorithms. |
| Failure handling | Must detect, mask, tolerate and recover from partial failures using redundancy, timeouts and replication. |
| Concurrency | Shared resources accessed concurrently must remain consistent via synchronisation and mutual exclusion. |
| Transparency | Hide the distributed nature (access, location, replication, failure, concurrency, migration transparency). |
Client–Server vs Peer-to-Peer Model
Client–Server model (description of diagram): several clients send request messages to one (or a few) dedicated server(s); the server processes the request and returns a reply. Roles are fixed and asymmetric.
Client1 ---req---> +--------+
Client2 ---req---> | Server | ---> shared resource/DB
Client3 <--reply-- +--------+
Peer-to-Peer model (description of diagram): all nodes (peers) are functionally equivalent; each acts as both client and server, sharing resources directly with other peers without a central server.
Peer A <----> Peer B
^ \ / ^
| \ / |
v \/ v
Peer D <----> Peer C
| Aspect | Client–Server | Peer-to-Peer |
|---|---|---|
| Roles | Fixed (client / server) | Symmetric (every node both) |
| Control | Centralised | Decentralised |
| Scalability | Limited by server | High, self-scaling |
| Single point of failure | Yes (server) | No |
| Example application | Web (HTTP), email, NFS | BitTorrent file sharing, Bitcoin/blockchain |
Practical applications: Client–server → World Wide Web / a web mail service. Peer-to-peer → BitTorrent file distribution.
(a) Why is clock synchronization necessary in a distributed system? Explain Cristian's algorithm for synchronizing a client's clock with a time server, deriving the expression for the estimated time and the error bound. [7]
(b) Define Lamport's happened-before relation. For the following events, assign Lamport logical timestamps and explain why logical clocks alone cannot capture causality, motivating the use of vector clocks. [5]
(a) Clock Synchronization and Cristian's Algorithm [7]
Need for clock synchronization: Each computer has its own physical clock that drifts; there is no global clock. Yet many distributed tasks need a common notion of time — ordering events, timestamping transactions, cache/lease expiry, make/file-timestamp consistency, and authentication (e.g. Kerberos ticket validity). Hence clocks must be kept close to each other (internal sync) or to UTC (external sync).
Cristian's Algorithm
A client synchronizes with a time server holding an accurate (UTC) clock:
- Client records local send time and sends a request to the time server.
- Server inserts its current time in the reply.
- Client receives the reply at local time .
The round-trip time is . Assuming the request and reply paths take roughly equal time, the message took about to arrive, so the client sets its clock to:
Error bound: Let be the minimum one-way transit time. The server's timestamp was generated somewhere between and , so the uncertainty (accuracy) is bounded by:
A smaller and more symmetric gives a tighter bound; taking the reply with the smallest over several trials improves accuracy.
(b) Lamport's Happened-Before Relation [5]
Definition — the happened-before relation is the smallest relation satisfying:
- If and are events in the same process and comes before , then .
- If is the send of a message and is its receive, then .
- Transitivity: if and , then .
If neither nor , the events are concurrent ().
Lamport timestamps (rules): each process keeps a counter . Before every event . A message carries its sender's timestamp ; on receipt the receiver sets .
Example: Process P1 events a, b, c get . If b sends a message to event d in P2 (where P2's clock was 1), then . The clock guarantees: if then .
Why logical clocks cannot capture causality: The implication is one-way. , but the converse is false: does not imply — the two events may be concurrent yet receive ordered timestamps. So scalar timestamps cannot tell whether two events are causally related or independent.
Motivation for vector clocks: A vector clock keeps one counter per process. Then (component-wise, with at least one strictly smaller), and when neither vector dominates. Thus vector clocks fully and exactly characterise causality, which scalar Lamport clocks cannot.
(a) Describe the Ricart–Agrawala distributed algorithm for mutual exclusion. State the number of messages required per critical-section entry and compare it with the centralized and token-ring approaches in terms of message complexity and fault tolerance. [7]
(b) Explain the Bully election algorithm with an example of 5 processes when the current coordinator (highest-id process) crashes. How many messages are exchanged in the worst case? [5]
(a) Ricart–Agrawala Distributed Mutual Exclusion [7]
This is a timestamp-based, fully distributed algorithm using Lamport logical clocks among processes.
To request the critical section (CS), process :
- Sets its state to Wanted, builds a timestamped REQUEST and sends it to all other processes.
- Waits until it has received an OK/REPLY from all processes, then enters the CS.
On receiving a REQUEST , process :
- If is in the CS, it defers the reply (queues it).
- If is also Wanted, it compares timestamps: it replies OK if the incoming request has the lower pair, otherwise it defers.
- If is Released (not interested), it replies OK immediately.
On exiting the CS: sends the deferred OK replies to all queued processes.
Message complexity: messages per CS entry — REQUESTs + REPLYs.
Comparison
| Approach | Messages / CS entry | Fault tolerance |
|---|---|---|
| Centralized (coordinator) | 3 (request, grant, release) | Poor — coordinator is a single point of failure |
| Token-ring | 1 to ∞ (depends on token position) | Poor — token loss / node crash breaks it |
| Ricart–Agrawala | Poor — any crashed process blocks everyone (failure of any node prevents replies); needs failure detection to remove dead nodes |
Ricart–Agrawala removes the central bottleneck but costs more messages and tolerates failures worse, because every process must respond.
(b) Bully Election Algorithm [5]
Used to elect the process with the highest id as coordinator.
Steps: When a process notices the coordinator has crashed, it sends an ELECTION message to all processes with higher ids. If none replies, wins and broadcasts a COORDINATOR message. If a higher process replies with OK, drops out and that higher process restarts the election among still-higher processes.
Example (processes 1–5, coordinator 5 crashes):
- Suppose process 2 detects the crash. It sends ELECTION to 3, 4, 5.
- 3 and 4 reply OK (5 is dead); 2 steps down.
- 3 sends ELECTION to 4, 5; 4 replies OK; 3 steps down.
- 4 sends ELECTION to 5; no reply (5 dead). 4 wins and sends COORDINATOR to 1, 2, 3.
Process 4 becomes the new coordinator.
Worst-case messages: The worst case occurs when the lowest-id process detects the failure, giving messages — approximately on the order of messages.
Discuss the architecture and design goals of the Sun Network File System (NFS). Explain the role of the Virtual File System (VFS), stateless server design, and the use of mount protocol. How does NFS differ from the Andrew File System (AFS) with respect to caching, callbacks, and scalability?
Sun Network File System (NFS)
NFS allows a client to access files stored on a remote server transparently, as if they were local, over a network using RPC.
Design Goals
- Access transparency — same system calls (
open,read,write) work on remote files. - OS and hardware independence — protocol defined via an external IDL (XDR).
- Crash recovery — a stateless server simplifies recovery.
- Reasonable performance with caching, and good portability.
Architecture
Virtual File System (VFS): An abstraction layer in the kernel that presents a uniform file-system interface. It maintains a vnode for every open file. The VFS decides, per request, whether to route the operation to the local file system or to the NFS client, which issues an RPC to the remote server. This makes the location of the file invisible to applications.
Stateless server design: The NFS server keeps no per-client state (no open-file table, no record of past requests). Every request is self-contained, carrying a file handle and an explicit byte offset. Benefits: if the server crashes and restarts, the client simply retries — no recovery protocol is needed; operations are made idempotent. The drawback is that true open/close semantics and server-enforced locking are hard (handled by a separate lock manager).
Mount protocol: A client mounts a remote exported directory into its own namespace. The client sends the pathname to the server's mount service, which returns a file handle for the root of that subtree. Thereafter the client uses this handle. Mounting can be done at boot or on demand (automounter).
NFS vs Andrew File System (AFS)
| Feature | NFS | AFS |
|---|---|---|
| Caching | Caches blocks in memory; checks freshness with the server frequently | Caches whole files on local disk on open |
| Consistency model | Validation on access (timestamp checks), weak | Session semantics — writes visible on close |
| Callbacks | None — client must poll/validate (stateless) | Server keeps callbacks (promises); it notifies clients when a cached file changes |
| Server state | Stateless | Stateful (tracks callbacks) |
| Scalability | Limited — frequent server validation traffic | High — whole-file caching + callbacks drastically cut server load, so AFS scales to far more clients |
Summary: NFS favours simplicity and crash-resilience via statelessness but generates more validation traffic; AFS uses whole-file caching with server callbacks to greatly reduce server interaction, giving much better scalability at the cost of a stateful server.
Section B: Short Answer Questions
Attempt all / any as specified.
Explain the working of a Remote Procedure Call (RPC) with a neat diagram showing the role of client stub, server stub, and marshalling/unmarshalling. List the different RPC call semantics (at-least-once, at-most-once, exactly-once) and state when each is appropriate.
Remote Procedure Call (RPC)
RPC lets a program call a procedure located on another machine as if it were a local call, hiding the message passing from the programmer.
Working (description of diagram)
Client process Server process
call foo(x) foo(x){...}
| ^
v |
[Client Stub] --marshal--> network --> [Server Stub] --unmarshal--> server proc
^ |
| <--unmarshal-- network <--marshal-- v
return value return result
- The client calls a local client stub (proxy) with parameters.
- The client stub marshals the parameters into a message (packs them into a standard byte format) and sends it to the server.
- The server stub receives and unmarshals the message, extracting the parameters.
- The server stub calls the actual server procedure.
- The result is marshalled by the server stub, sent back, unmarshalled by the client stub, and returned to the caller.
Marshalling = converting parameters/results into a flat, machine-independent transmission format; unmarshalling = the reverse.
RPC Call Semantics
| Semantics | Behaviour | When appropriate |
|---|---|---|
| At-least-once | Client keeps retransmitting until a reply arrives; the call may execute more than once | Only safe for idempotent operations (e.g. read a value) |
| At-most-once | Server filters duplicate requests so the procedure runs 0 or 1 time; an error is returned if it failed | The usual, practical choice for general (non-idempotent) operations like "add money" |
| Exactly-once | The call always runs exactly once despite failures | The ideal but hard to guarantee fully; approximated by at-most-once with reliable retry/logging for critical transactions |
Maybe semantics (no retransmission) also exists but is unreliable and rarely used.
Differentiate between connection-oriented (TCP) and connectionless (UDP) interprocess communication. Explain the request-reply protocol and the problems of message loss, duplication, and ordering in IPC, along with the mechanisms used to handle them.
TCP vs UDP Interprocess Communication
| Feature | Connection-oriented (TCP) | Connectionless (UDP) |
|---|---|---|
| Connection | Handshake establishes a connection first | No connection; send and forget |
| Reliability | Reliable — acks, retransmission | Unreliable — best effort |
| Ordering | Guaranteed byte-stream order | No ordering guarantee |
| Duplicates | Removed | Possible |
| Flow/congestion control | Yes | No |
| Overhead / speed | Higher overhead, slower | Low overhead, fast |
| Use | File transfer, HTTP, email | DNS, VoIP, streaming, request-reply |
Request–Reply Protocol
The basic protocol underlying RPC/RMI. The client sends a request message (operation + arguments) and blocks; the server executes and returns a reply. Often a third acknowledge-reply is added. It is typically built over UDP for efficiency, with reliability added at the application level.
Problems in IPC and their Handling
- Message loss: A request or reply may be dropped. Handled by timeouts and retransmission of the request.
- Duplication: Retransmission can cause the server to receive the same request twice. Handled by attaching a unique request ID; the server keeps a history/cache to detect and discard duplicates and re-send the saved reply (giving at-most-once semantics).
- Ordering: Messages may arrive out of order over a connectionless transport. Handled by sequence numbers so the receiver can reorder or discard out-of-order messages.
TCP solves all three at the transport layer; with UDP these mechanisms must be implemented in the request-reply layer itself.
What is replication in a distributed system? Explain the difference between active replication and passive (primary-backup) replication, and discuss how each handles a replica failure.
Replication
Replication is maintaining multiple copies (replicas) of data or a service on different machines to improve reliability/fault tolerance, availability, and performance (load sharing and lower latency).
Active vs Passive Replication
Active replication (state-machine replication)
- Every replica is a full server and processes every client request in the same order (using a totally-ordered, reliable multicast).
- All replicas independently compute and produce identical results; the client takes the first or majority response.
- Replica failure handling: Because every replica does the work, the failure of one replica is transparent — the remaining replicas still produce the result; no failover delay. It can even mask Byzantine faults using voting (with enough replicas).
Passive (primary-backup) replication
- One replica is the primary; it handles all requests, updates its state, and propagates the new state to one or more backups, then replies to the client.
- Backups stay idle, only receiving state updates.
- Replica failure handling: If a backup fails, service continues unaffected. If the primary fails, a backup is promoted to primary (failover/election); requests in flight may need to be re-sent, causing a short interruption. Tolerates crash (fail-stop) faults but not Byzantine faults.
| Active | Passive | |
|---|---|---|
| Work done | All replicas | Only primary |
| Failure masking | Immediate, transparent | Failover needed for primary |
| Cost | High (every replica computes) | Lower |
| Ordering requirement | Totally ordered multicast | Primary defines order |
Describe the token-ring algorithm for distributed mutual exclusion. What happens when (a) the token is lost and (b) a process holding the token crashes? Suggest recovery mechanisms for each case.
Token-Ring Algorithm for Mutual Exclusion
The processes are arranged in a logical ring, each knowing its successor. A single token circulates around the ring in one direction.
Rule: A process may enter the critical section (CS) only if it holds the token. After leaving the CS (or if it does not need the CS), it passes the token to its successor. This guarantees mutual exclusion (only one token) and freedom from starvation (token visits everyone in bounded time).
Message cost: between 1 and infinitely many messages per entry, since the token keeps circulating even when no one needs the CS.
Failure Cases and Recovery
(a) Token is lost (e.g. dropped on a link or by a crash while in transit): No process can ever enter the CS — the system deadlocks. Recovery: Detect the loss using a timeout (no token seen for too long) and regenerate a single token. A monitor/coordinator or a leader election (e.g. Bully) is used to ensure exactly one new token is created, avoiding duplicate tokens.
(b) Process holding the token crashes: The token is lost and the ring is broken — the dead process can neither use nor forward the token, and its predecessor cannot reach its successor. Recovery: Detect the crash (acknowledged token hand-off: the sender keeps the token until the successor acknowledges receipt, so it can retry). Reconfigure the ring to bypass the dead node by connecting its predecessor to the next live successor, then regenerate the token via election. Acknowledged hand-off plus ring repair restores operation.
Explain the Berkeley algorithm for internal clock synchronization with an example. How does it differ from Cristian's algorithm, and why is it suitable when no machine has an accurate UTC source?
Berkeley Algorithm (Internal Synchronization)
The Berkeley algorithm keeps a group of machines' clocks mutually consistent without assuming any machine has accurate UTC.
Steps
- One machine is chosen as the master (coordinator); the rest are slaves.
- The master polls every slave, asking for its current time.
- Each slave replies; the master estimates each slave's time accounting for round-trip delay.
- The master computes the average of all the clock values (its own included), often discarding outliers.
- Instead of sending absolute times, the master sends each machine the adjustment (offset) it must apply to its clock to match the average.
Example
Master reads 3:00:00. Slaves report 2:59:50 and 3:00:10 (after RTT correction). Average . Master tells slave 1 to advance by +10 s, slave 2 to set back by −10 s, and itself by 0. All clocks now agree at 3:00:00. (Clocks are slowed/sped gradually, never set backward abruptly.)
Difference from Cristian's Algorithm
| Cristian's | Berkeley |
|---|---|
| External sync to a UTC time server | Internal sync (machines agree among themselves) |
| Passive server just gives its time; clients adjust to it | Active master polls all, computes average, sends offsets |
| Needs an accurate UTC source | Needs no accurate UTC source |
Why suitable without UTC: Berkeley only aims to keep clocks close to each other, using their average as the reference. It therefore works perfectly in an isolated network where no machine has access to a UTC/atomic source.
Define Remote Method Invocation (RMI). Explain the role of the proxy, the remote reference module, and the object registry. How does RMI differ from conventional RPC?
Remote Method Invocation (RMI)
RMI extends the RPC concept to objects: it lets an object in one process (JVM) invoke methods on a remote object living in another process/machine, as if it were local. It is object-oriented and supports passing object references and exceptions.
Key Components
- Proxy (stub): A local stand-in object on the client that implements the same remote interface. The client calls the proxy; it marshals the method id and arguments, forwards them to the remote object, and unmarshals the result. It makes the remote object look local.
- Remote Reference Module: Translates between local and remote object references. It maintains the mapping (remote reference ↔ local object via a remote object table) and is responsible for creating remote references and routing invocations to the correct object.
- Object Registry (e.g.
rmiregistry): A naming/lookup service. The server binds a remote object to a name; clients look up that name to obtain a remote reference (proxy). This is how a client first locates a remote object.
A skeleton (server-side stub) unmarshals the request and dispatches it to the actual object.
RMI vs Conventional RPC
| RPC | RMI |
|---|---|
| Procedure-oriented — calls functions | Object-oriented — invokes methods on objects |
| Parameters are simple data | Can pass objects (by value via serialization) and remote references |
| No object identity / inheritance / polymorphism | Supports object references, inheritance, polymorphism |
| Language/platform neutral (e.g. C) | Tied to an object model (e.g. Java RMI between JVMs) |
| No built-in distributed garbage collection | Supports distributed garbage collection |
In short, RMI is the object-oriented evolution of RPC.
List and briefly explain any four types of transparency (access, location, replication, failure, concurrency, migration) that a distributed system should provide to its users.
Types of Transparency in a Distributed System
Transparency means hiding the distributed nature of the system from users and programmers, so it appears as a single coherent system. Four important types:
-
Access transparency — Local and remote resources are accessed using identical operations. A user reads a remote file with the same
open/readcalls used for a local file; differences in data representation and access mechanism are hidden. -
Location transparency — Resources are accessed without knowing their physical (network/host) location. A URL or logical name identifies the resource regardless of which server holds it.
-
Replication transparency — Multiple copies (replicas) of a resource may exist for reliability and performance, but the user sees a single logical resource and is unaware that replication exists or which replica is used.
-
Failure transparency — Faults in components are hidden (masked); the system continues to function (via retries, replication) so users and applications can complete their work despite partial failures.
(Others: concurrency transparency — concurrent users share resources without interference; migration transparency — resources can move without affecting access.)
Frequently asked questions
- Where can I find the BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) question paper 2079?
- The full BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) 2079 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Distributed System (IOE, CT 702 / ENCT 411) 2079 paper come with solutions?
- Yes. Every question on this Distributed System (IOE, CT 702 / ENCT 411) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) 2079 paper?
- The BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) 2079 paper carries 80 full marks and is meant to be completed in 180 minutes, across 11 questions.
- Is practising this Distributed System (IOE, CT 702 / ENCT 411) past paper free?
- Yes — reading and attempting this Distributed System (IOE, CT 702 / ENCT 411) past paper on Kekkei is completely free.