BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) Question Paper 2078 Nepal
This is the official BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) question paper for 2078, as set in the regular annual examination. It carries 80 full marks and a time allowance of 180 minutes, across 11 questions. On Kekkei you can attempt this Distributed System (IOE, CT 702 / ENCT 411) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) exam or solving previous years' question papers, this 2078 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt all / any as specified.
Define a distributed system and explain its key characteristics. With suitable diagrams, compare and contrast the client-server model and the peer-to-peer model as architectural models of distributed systems. Discuss the major design challenges (such as heterogeneity, openness, scalability, and transparency) that a distributed system must address.
Distributed System
A distributed system is a collection of independent (autonomous) computers connected by a network that appears to its users as a single coherent system. The computers coordinate by passing messages and share resources without a global clock or shared memory.
Key Characteristics
- Concurrency of components — multiple machines execute simultaneously and share resources.
- No global clock — coordination relies on message passing, not a common physical time.
- Independent failures — components can fail individually without halting the whole system.
- Resource sharing — hardware, data, and services are shared across nodes.
- Transparency — the distribution is hidden from users.
Architectural Models
Client–Server Model
Client ---- request ----> [ Server ]
Client <--- response ---- (resource/service)
A central server holds resources/services; clients send requests and receive responses. Roles are fixed and asymmetric.
Peer-to-Peer (P2P) Model
Peer <----> Peer
^ \ / ^
| \ / |
Peer <-- X --> Peer (every node is both client and server)
All nodes (peers) are functionally equivalent — each acts as both a client and a server, sharing its own resources directly with others.
Comparison
| Aspect | Client–Server | Peer-to-Peer |
|---|---|---|
| Roles | Asymmetric (fixed client/server) | Symmetric (every peer is both) |
| Resource location | Centralized on server | Distributed across peers |
| Scalability | Limited — server is a bottleneck | High — capacity grows with peers |
| Reliability | Server is a single point of failure | No single point of failure |
| Management/security | Easier, centrally controlled | Harder, decentralized |
| Examples | Web (HTTP), email, DBMS | BitTorrent, blockchain, Gnutella |
Major Design Challenges
- Heterogeneity — diverse hardware, OS, networks, and languages; addressed by middleware and common standards/protocols (e.g., XML, IDL).
- Openness — the system can be extended and re-implemented via published, standard interfaces.
- Scalability — performance must remain acceptable as the number of users/resources grows; achieved through replication, caching, and avoiding centralized bottlenecks.
- Transparency — hiding distribution: access, location, concurrency, replication, failure, migration, performance, and scaling transparency.
- Fault tolerance, security, and concurrency control are further key concerns.
(a) Explain why clock synchronization is necessary in a distributed system and describe Cristian's algorithm for synchronizing a client clock with a time server, including how it compensates for message propagation delay. [6]
(b) Define Lamport's happened-before relation. Given the following events across three processes, assign Lamport logical clock timestamps to every event and show how the algorithm enforces causal ordering of messages. [6]
(a) Clock Synchronization and Cristian's Algorithm [6]
Why synchronization is needed: Each node has its own physical clock that drifts at a different rate, so clocks gradually diverge. Many distributed tasks — ordering events, timestamps, make-style consistency checks, cache expiry, authentication tickets — require nodes to agree on time. Hence clocks must be periodically synchronized.
Cristian's Algorithm
A passive time server holds the reference time (e.g., from UTC). A client synchronizes as follows:
- At time (by its own clock), sends a request to .
- receives it, reads its time , and replies with .
- receives the reply at (its own clock).
The round-trip time is . Assuming the request and reply legs take equal time, the one-way propagation delay is estimated as . The client therefore sets its clock to:
Compensation: because the server's timestamp is already old by the time it arrives, adding half the round-trip delay corrects for message propagation. If the minimum one-way transmission time is known, accuracy is bounded by . The clock should be adjusted gradually (slewing) rather than set backward, to avoid time running in reverse.
(b) Lamport's Happened-Before and Logical Clocks [6]
Happened-before relation (): the smallest relation satisfying:
- If and are events in the same process and occurs before , then .
- If is the send of a message and is its receive, then .
- Transitivity: if and then .
If neither nor , the events are concurrent ().
Lamport clock rules: each process keeps a counter .
- Before each event, .
- A message carries timestamp ; on receipt, .
Worked example — three processes; arrows = messages:
| Step | P1 | P2 | P3 |
|---|---|---|---|
| local event | 1 | ||
| send m1: P1→P2 | 2 | recv → max(0,2)+1 = 3 | |
| local event | 4 | ||
| send m2: P2→P3 | 5 | recv → max(0,5)+1 = 6 | |
| send m3: P3→P1 | recv → max(2,6)+1 = 7 | 7 |
For every message, the send timestamp is strictly less than the receive timestamp (e.g., m1: ; m2: ), so the property holds, enforcing causal ordering. (Note: the converse is not guaranteed — equal/ordered clock values do not imply causality; that requires vector clocks.)
(a) Describe the Ricart-Agrawala distributed mutual exclusion algorithm. State the number of messages required per critical-section entry and analyse its correctness with respect to safety, liveness, and fairness. [7]
(b) Explain the Bully election algorithm with an example of how a new coordinator is elected when the current coordinator crashes. [5]
(a) Ricart–Agrawala Algorithm [7]
A fully distributed, timestamp-based mutual exclusion algorithm (no coordinator). Each process keeps a Lamport clock.
To enter the critical section (CS), process :
- Sets its state to Wanted, builds timestamp , and sends a
REQUEST(T_i, i)to all other processes. - Waits until it has received an
OK(reply) from all processes, then enters the CS.
On receiving REQUEST(T_j, j) at :
- If is not interested in the CS → reply
OKimmediately. - If is in the CS → queue the request (defer reply).
- If also wants the CS → compare timestamps; the lower wins. If the requester wins, reply
OK; otherwise defer.
On exiting the CS: send OK to all deferred (queued) requests.
Message Cost
Per CS entry: requests replies messages.
Correctness
- Safety (mutual exclusion): two processes wanting the CS concurrently cannot both proceed, because the lower-timestamp request is granted first and the other's reply is deferred until exit. Lamport timestamps with process IDs break ties, giving a total order — only one can hold all OKs at once.
- Liveness (no deadlock/starvation): the timestamp total order ensures the oldest request is always served, so every request eventually succeeds; there is no circular wait.
- Fairness: requests are honored in increasing timestamp order (FCFS by logical time), so no process can be indefinitely overtaken.
(b) Bully Election Algorithm [5]
Used to elect a coordinator (highest process ID) when the current one fails. When notices the coordinator is unresponsive:
- sends an
ELECTIONmessage to all processes with higher IDs. - If no higher process replies (with
OK/ALIVE) within a timeout, wins and sendsCOORDINATORto all → it becomes the new coordinator. - If a higher process replies, drops out; that higher process now runs its own election.
Example — processes ; coordinator crashes. Suppose detects this:
- sends
ELECTIONto . - and reply
OK(so steps down) and each start their own elections. - sends
ELECTIONto ; is down, no reply. - gets no higher reply → becomes coordinator and broadcasts
COORDINATORto all lower processes.
The highest live process always "bullies" the others into submission. Worst-case message complexity is .
Explain the working of Remote Procedure Call (RPC) with a clear diagram showing the roles of the client stub, server stub (skeleton), and the RPC runtime. Discuss the steps of parameter marshalling and the different call semantics (at-least-once, at-most-once, maybe). How does Remote Method Invocation (RMI) differ from RPC?
Remote Procedure Call (RPC)
RPC lets a program call a procedure on a remote machine as if it were local, hiding the message passing. It provides access and location transparency.
Components and Flow
Client Server
------ ------
call f(args) real f(args)
| ^
v |
[Client Stub] --marshal--> network --> [Server Stub/Skeleton]
^ |
| <--unmarshal-- network <--marshal-- (result)
RPC Runtime <===== messages =====> RPC Runtime
- Client invokes the local client stub (proxy) like an ordinary procedure.
- The client stub marshals the parameters into a message and asks the RPC runtime to send it.
- The runtime transmits the request over the network to the server's runtime.
- The server stub (skeleton) unmarshals the parameters and calls the real server procedure.
- The result is marshalled back through the skeleton → runtime → network → client stub, which unmarshals it and returns the value to the client.
Parameter Marshalling
Marshalling is packing parameters/results into a flat, machine-independent message; unmarshalling reverses it. It handles differences in byte order (big- vs little-endian), data representation, and alignment, often using a standard format (e.g., XDR). Pointers cannot be passed directly, so call-by-reference is usually emulated by copy/restore of the referenced data.
Call Semantics
- Maybe — no retransmission; the call may or may not execute (used when failures are acceptable; weakest guarantee).
- At-least-once — the client retransmits on timeout, so the call executes one or more times. Safe only for idempotent operations.
- At-most-once — duplicates are filtered (via request IDs), so the call executes exactly zero or one time; the strongest practical guarantee.
RMI vs RPC
| RPC | RMI |
|---|---|
| Procedural (calls functions) | Object-oriented (invokes methods on remote objects) |
| Language-/platform-neutral (C-style) | Java-based; works with objects and inheritance |
| Passes simple data parameters | Can pass objects (serialized), including remote references |
| No object references | Uses remote object references and supports distributed garbage collection |
In short, RMI is the object-oriented analogue of RPC: it extends the same client-stub/server-skeleton mechanism to method invocation on remote objects with object serialization.
Section B: Short Answer Questions
Attempt all / any as specified.
Differentiate between synchronous and asynchronous communication, and between blocking and non-blocking send/receive primitives in interprocess communication. Explain how sockets support the message-passing model with a brief description of TCP-based and UDP-based communication.
Synchronous vs Asynchronous Communication
- Synchronous: sender and receiver are tightly coupled in time — the sending and receiving operations are coordinated. The sender blocks until the message is received (and often acknowledged); both must be ready together.
- Asynchronous: the send operation is decoupled — the sender continues without waiting for the receiver. Messages may be buffered, so sender and receiver need not be active simultaneously.
Blocking vs Non-Blocking Primitives
- Blocking send: the sender is suspended until the message has been transmitted (or buffered/delivered). Blocking receive: the receiver waits until a message arrives.
- Non-blocking send: control returns immediately after the message is copied to a buffer; the process continues, checking completion later. Non-blocking receive: returns at once whether or not a message is available (the process polls or is signaled later).
Blocking primitives are simpler and safer; non-blocking primitives give higher concurrency but require care (buffer must not be reused until the send completes).
Sockets and the Message-Passing Model
A socket is an endpoint of communication identified by an (IP address, port) pair. Two sockets, one per process, form a channel over which messages are exchanged — directly realizing message passing.
- TCP-based (stream) sockets: connection-oriented and reliable. A connection is established (3-way handshake); data is delivered as an ordered, error-checked byte stream with retransmission and flow control. Used when reliability matters (file transfer, HTTP).
- UDP-based (datagram) sockets: connectionless and unreliable. Independent datagrams are sent with no setup, no ordering, and no delivery guarantee, giving lower latency/overhead. Used for DNS, streaming, and latency-sensitive messaging.
Explain the architecture of a Distributed File System (DFS). With reference to Sun NFS, describe the role of the virtual file system (VFS), the use of stateless servers, and how caching is used to improve performance.
Distributed File System (DFS) Architecture
A DFS lets files stored on remote servers be accessed by clients across a network as if they were local, providing location and access transparency. Typical components:
- Client module — intercepts file operations and forwards remote ones.
- Flat file / storage service — stores file contents on servers.
- Directory (name) service — maps human-readable names to file identifiers.
Sun NFS
NFS is a widely used DFS based on the client–server model using RPC over (originally) UDP/TCP.
Virtual File System (VFS)
The VFS is an abstraction layer in the OS kernel that gives a uniform interface for both local and remote files. It distinguishes local files from NFS-mounted files using vnodes: each open file is represented by a vnode that points either to a local inode or to a remote NFS file (via a file handle). This lets applications use the same system calls regardless of where the file physically lives, achieving access transparency.
Stateless Servers
NFS servers (classically NFSv2/v3) are stateless — the server keeps no record of which clients have which files open. Each request (e.g., read with an explicit offset and file handle) is self-contained and idempotent.
- Advantage: crash recovery is trivial — after a server reboot, clients simply retry; there is no per-client state to rebuild.
- Trade-off: the server cannot easily support file locking on its own (handled by a separate lock manager).
Caching for Performance
- Client-side caching of file data blocks and attributes reduces repeated network round-trips and server load.
- Read-ahead and delayed write improve throughput.
- Cached attributes are revalidated using timestamps/timeouts (validation period), and writes are flushed periodically, balancing performance against consistency (NFS offers approximate, not strict, consistency).
What is replication and why is it used in distributed systems? Explain the difference between active replication and passive (primary-backup) replication, and discuss how each handles the failure of a replica.
Replication
Replication is maintaining multiple copies (replicas) of data or services on different machines. It is used to:
- Improve reliability / fault tolerance — if one replica fails, others continue.
- Improve performance / availability — requests are served from a nearby or less-loaded replica, increasing throughput and reducing latency.
The central challenge is keeping replicas consistent.
Active Replication
Every replica is a full server and processes every client request independently in the same order (using totally-ordered multicast). The client (or a front end) collects responses and uses the majority/first result.
- Handling failure: because all replicas execute every request, the crash of one replica is masked automatically — surviving replicas already have identical state and continue without any takeover delay. It can even tolerate Byzantine failures via voting (with enough replicas).
Passive (Primary–Backup) Replication
One replica is the primary; it handles all requests and, before responding, propagates its state updates to one or more backups. Backups stay idle, only receiving updates.
- Handling failure: if a backup fails, nothing changes for clients. If the primary fails, the remaining replicas elect a new primary from the backups, which takes over using the last received state. This introduces a short fail-over delay, and updates not yet propagated may be lost (tolerates only crash, not Byzantine, failures).
Comparison
| Active | Passive (Primary-Backup) | |
|---|---|---|
| Who processes requests | All replicas | Only the primary |
| Failure masking | Immediate, no takeover | Requires fail-over to new primary |
| Overhead | High (every replica computes) | Lower (backups idle) |
| Fault types | Crash + Byzantine (with voting) | Crash only |
Describe the Berkeley algorithm for clock synchronization. How does it differ from Cristian's algorithm, and in what situations is it more appropriate? Illustrate with an example of a master polling three slave clocks.
Berkeley Algorithm
The Berkeley algorithm is an internal clock synchronization method that brings a set of clocks into agreement without a reference UTC source.
Procedure
- A coordinator (master/time daemon) is chosen; it polls all slaves periodically for their clock values.
- The master estimates each slave's time (accounting for round-trip delay, like Cristian's) and computes the average of all clocks, discarding outliers (clocks differing too much).
- Instead of sending the absolute time, the master sends each machine an adjustment (offset) — how much to advance or retard its clock — and adjusts its own.
Clocks are slewed gradually so time never jumps backward.
Difference from Cristian's Algorithm
| Cristian's | Berkeley |
|---|---|
| Passive server; clients request the time | Active master polls all slaves |
| Needs an accurate external time source (UTC) | No external reference needed |
| Synchronizes a client to the server's time | Averages all clocks toward a common value |
| External synchronization | Internal synchronization |
When More Appropriate
Berkeley is better when no machine has an accurate UTC source and the goal is merely that all nodes agree with each other (internal consistency), e.g., an isolated LAN.
Example — master polls three slaves
Suppose master time = 3:00:00. After delay-corrected polling it gets: master 3:00:00, slave A 3:00:25, slave B 2:59:50, slave C 3:00:15.
- Sum of offsets relative to master s over 4 clocks → average offset s, so target time .
- Master sends adjustments: master +7.5 s, A −17.5 s, B +17.5 s, C −7.5 s.
All four clocks converge to the agreed average time.
Compare the centralized, token-ring, and distributed approaches to mutual exclusion in distributed systems in terms of the number of messages per entry, single point of failure, and delay before entry. State one advantage and one disadvantage of each approach.
Mutual Exclusion Approaches Compared
Comparison Table
| Approach | Messages per CS entry | Single point of failure | Delay before entry (in msg times) |
|---|---|---|---|
| Centralized | 3 (request, grant, release) | Yes — the coordinator | 2 (request + grant) |
| Token-ring | 1 to (token circulates) | Yes (token loss / process crash breaks ring) | 0 to |
| Distributed (Ricart–Agrawala) | No (but failure of any process can block it) |
Centralized
A single coordinator grants the token/permission.
- Advantage: simple, fair (FCFS), and minimal messages (only 3 per entry).
- Disadvantage: the coordinator is a single point of failure and a performance bottleneck.
Token-Ring
Processes are logically arranged in a ring; a token circulates and only its holder may enter the CS.
- Advantage: no starvation — every process gets the token in turn; correctness is easy to ensure.
- Disadvantage: if the token is lost or a process crashes, the ring breaks and must be regenerated; wasted messages when no one needs the CS.
Distributed (Ricart–Agrawala)
A process requests permission from all others using timestamps and enters when all reply.
- Advantage: no single coordinator — fully decentralized.
- Disadvantage: high message overhead () and the crash of any process can stall the system (every process is a potential point of failure).
Define fault tolerance and distinguish among the following failure types in a distributed system: crash failure, omission failure, timing failure, and Byzantine (arbitrary) failure. Briefly explain how redundancy can be used to mask such failures.
Fault Tolerance
Fault tolerance is the ability of a distributed system to continue providing correct service even when some of its components fail. A system is dependable if it remains available, reliable, safe, and maintainable despite faults.
Failure Types
- Crash (fail-stop) failure: a component halts completely and permanently; before failing it worked correctly, and after failing it does nothing (e.g., a server that stops responding).
- Omission failure: a component fails to perform some action — e.g., a send omission (server fails to send a reply) or receive omission (server never receives a request). Messages are lost.
- Timing failure: a (correct) response is produced outside its required time interval — too late (or too early). Relevant in real-time systems where deadlines matter.
- Byzantine (arbitrary) failure: the most severe — a component behaves arbitrarily/maliciously, may produce wrong or inconsistent results, send conflicting information to different nodes, or actively appear correct. Hardest to detect and tolerate.
Masking Failures with Redundancy
- Information redundancy: add extra bits (checksums, error-correcting codes) to detect/correct corrupted data.
- Time redundancy: repeat an operation (retransmit/retry); masks transient and omission failures.
- Physical (hardware/process) redundancy: replicate components — e.g., Triple Modular Redundancy (TMR) runs three units and votes on the output, masking a single faulty unit. Tolerating Byzantine faults requires replicas.
By combining replication with voting and acknowledgements, failures are masked so the system as a whole continues to deliver correct results.
Explain the Ring (Chang-Roberts) election algorithm. Show, with an example ring of five processes, how a coordinator is elected after the current coordinator fails, and compare its message complexity with that of the Bully algorithm.
Ring (Chang–Roberts) Election Algorithm
Processes are arranged in a logical ring; each knows its successor. The algorithm elects the process with the highest ID as coordinator.
Procedure
- A process noticing the coordinator has failed starts an election: it creates an
ELECTIONmessage containing its own ID and sends it to its successor (skipping dead nodes). - On receiving an
ELECTIONmessage, each process adds its ID (Chang–Roberts optimization: it forwards only if its own ID is higher; otherwise just relays the highest) and passes it on. - When the message returns to the initiator (it sees its own ID, now the maximum), that process knows the highest ID and sends a
COORDINATORmessage around the ring announcing the winner.
Example — ring of five processes
Ring order: . Coordinator fails. Suppose detects it:
- sends
ELECTION(1)to . - : → forwards
ELECTION(4)to . - : → forwards
ELECTION(4)to (skipping dead ). - : → forwards
ELECTION(4)to . - receives ID 4 (> its own 1) → forwards to ; sees its own ID → is the new coordinator and circulates
COORDINATOR(4).
(With dead, — the highest surviving ID — wins.)
Message Complexity vs Bully
| Ring (Chang–Roberts) | Bully | |
|---|---|---|
| Best case | — | |
| Typical/worst | — about to messages |
The Ring algorithm has lower (linear) message complexity and no message storms, whereas the Bully algorithm is in the worst case but can elect a coordinator faster when the highest process detects the failure itself.
Frequently asked questions
- Where can I find the BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) question paper 2078?
- The full BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) 2078 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Distributed System (IOE, CT 702 / ENCT 411) 2078 paper come with solutions?
- Yes. Every question on this Distributed System (IOE, CT 702 / ENCT 411) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) 2078 paper?
- The BE Computer Engineering (IOE, TU) Distributed System (IOE, CT 702 / ENCT 411) 2078 paper carries 80 full marks and is meant to be completed in 180 minutes, across 11 questions.
- Is practising this Distributed System (IOE, CT 702 / ENCT 411) past paper free?
- Yes — reading and attempting this Distributed System (IOE, CT 702 / ENCT 411) past paper on Kekkei is completely free.