BSc CSIT (TU) Science Distributed System (BSc CSIT, CSC462) Question Paper 2079 Nepal

Q: Where can I find the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) question paper 2079?

The full BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2079 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Distributed System (BSc CSIT, CSC462) 2079 paper come with solutions?

Yes. Every question on this Distributed System (BSc CSIT, CSC462) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2079 paper?

The BSc CSIT (TU) Distributed System (BSc CSIT, CSC462) 2079 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Distributed System (BSc CSIT, CSC462) past paper free?

Yes — reading and attempting this Distributed System (BSc CSIT, CSC462) past paper on Kekkei is completely free.

Question 1 (Long answer, 10 marks)

Explain the architecture of a distributed file system. Discuss the design and working of the Network File System (NFS) or Andrew File System (AFS).

Answer 1

Distributed File System (DFS)

A distributed file system is a file system that allows files to be stored on multiple networked machines but accessed by clients as if they were on a single local file system. It provides location transparency, access transparency, and a uniform namespace.

Architecture (Client–Server model)

A DFS is typically built from three logical components:

Client module – runs on the user machine, intercepts file system calls and forwards them to remote servers (often through a Virtual File System / VFS layer).
File service – stores and manages the actual file data and provides operations (read, write, create, delete).
Directory/name service – maps human-readable file names to file identifiers (location of the file).

Files may be replicated (for availability and performance) and cached at clients (to reduce network traffic). The design must balance consistency, performance, availability, and scalability.

Network File System (NFS) – Sun Microsystems

NFS is a stateless, RPC-based DFS.

Architecture: Built on top of the Virtual File System (VFS) layer. The VFS routes requests for local files to the local FS and requests for remote files to the NFS client, which uses RPC + XDR (External Data Representation) to talk to the NFS server.
Mount protocol: A remote directory is mounted into the client's local namespace; the server returns a file handle identifying the exported directory.
Stateless server: In NFS versions 2 and 3 the server keeps no per-client open-file state. Every request (e.g. read(handle, offset, count)) is self-contained, and most operations are idempotent, so after a crash the server simply restarts and clients retransmit, which keeps crash recovery simple.
Caching: Clients cache file blocks and attributes; consistency is weak (validated periodically), which can cause clients to see slightly stale data.
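
The stateless, self-contained read request described above can be illustrated with a minimal sketch. The names `FILES` and `nfs_read` are hypothetical and are not part of the real NFS protocol; the point is only that the request carries the handle, offset and count itself, so a retransmitted request returns the same result.

```python
# Hypothetical in-memory "server": file handle -> file contents.
FILES = {"fh-42": b"hello distributed world"}

def nfs_read(handle: str, offset: int, count: int) -> bytes:
    """Serve a read using only the request's own arguments (no per-client state)."""
    data = FILES[handle]
    return data[offset:offset + count]

# A client that times out can simply resend the identical request.
first = nfs_read("fh-42", 6, 11)
retry = nfs_read("fh-42", 6, 11)
```

Because the server remembers nothing between the two calls, a server restart between them would not change the outcome.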

Andrew File System (AFS) – CMU

AFS is designed for large scale and is stateful.

Whole-file caching: When a client opens a file, the entire file is copied to the client's local disk and all reads/writes happen locally; the file is written back on close. This minimizes server load and gives excellent scalability.
Callbacks: The server keeps state (a callback promise). If another client modifies the file, the server sends a callback break to invalidate the cached copy, providing stronger consistency than NFS.
Session semantics: Changes become visible to others only after the file is closed.

NFS vs AFS

Feature	NFS	AFS
Server state	Stateless	Stateful (callbacks)
Caching unit	Blocks (in memory)	Whole file (on local disk)
Consistency	Weak (poll-based)	Stronger (callback-based)
Scalability	Moderate	High

Conclusion: NFS favors simplicity and easy recovery via statelessness, while AFS favors scalability and consistency via whole-file caching and callbacks.

Answer 2

Fault Tolerance

Fault tolerance is the ability of a distributed system to continue providing correct service even when one or more of its components fail. A fault is the underlying defect, an error is its manifestation, and a failure occurs when the system deviates from its specification. The key requirements are availability, reliability, safety, and maintainability.

Failure Models

Failures are classified by how a faulty component behaves:

Failure type	Description
Crash / fail-stop	A process halts and stays halted. In the fail-stop case other processes can reliably detect the halt; after a plain crash they may not be able to.
Omission	A process/channel fails to send (send-omission) or receive (receive-omission) messages.
Timing	Response is correct but arrives too late/early (in synchronous systems).
Response	Server returns an incorrect value or wrong state transition.
Arbitrary / Byzantine	A component behaves arbitrarily — may send conflicting or malicious messages. This is the hardest to tolerate.

Use of Redundancy

Fault tolerance is achieved primarily through redundancy (masking failures by replicating):

Information redundancy: extra bits, e.g. error-detecting codes (CRC) and error-correcting codes (Hamming).
Time redundancy: repeat an action (retransmit a request, retry a transaction).
Physical/Hardware redundancy: replicate components, e.g. multiple processors/disks (RAID), Triple Modular Redundancy (TMR with voters).
Software/Process redundancy: replicate processes into groups and use voting so that the failure of a minority is masked.
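
The voting idea behind TMR and process groups can be sketched as follows. This is a minimal illustration, assuming each replica returns a hashable value; the function name `majority_vote` is made up for this example.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

# Three replicas, one of which is faulty: the wrong output is masked.
masked = majority_vote([7, 7, 9])
# No strict majority: the voter cannot mask the fault.
undecided = majority_vote([1, 2, 3])
```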

Agreement in Faulty Systems — Byzantine Generals Problem

Non-faulty processes must reach agreement on a value even when some processes are faulty/treacherous.

Statement: Several generals (processes) must agree on a common plan (attack/retreat) by exchanging messages, but some generals are traitors who send contradictory messages. Loyal generals must (i) all decide the same plan, and (ii) follow a loyal commander's order.
Lamport–Shostak–Pease result: With arbitrary (Byzantine) failures and oral (unsigned) messages, agreement is possible only if $n \ge 3m + 1$ , where $n$ is the total number of processes and $m$ is the number of faulty ones. Equivalently, more than two-thirds of the processes must be correct. It requires $m+1$ rounds of message exchange.
With signed (authenticated) messages, agreement is possible for any $n > m$ .

Example: With $m = 1$ traitor, at least $n = 4$ generals are needed; with 3 generals one of whom is a traitor, the two loyal ones cannot agree.
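
The bound $n \ge 3m + 1$ for oral messages can be checked with a couple of helper functions. This is only an arithmetic sketch of the bound; it does not implement the agreement protocol itself, and the function names are illustrative.

```python
def min_generals(m: int) -> int:
    """Smallest n that tolerates m traitors with oral messages (n >= 3m + 1)."""
    return 3 * m + 1

def max_traitors(n: int) -> int:
    """Largest m that n generals can tolerate with oral messages."""
    return (n - 1) // 3
```

For instance, `max_traitors(3)` evaluates to 0, which matches the statement that three generals cannot tolerate a single traitor.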

Conclusion: Fault tolerance combines failure detection, redundancy for masking, and agreement protocols (like Byzantine agreement) to keep the system correct despite faults.

Answer 3

Clock Synchronization in Distributed Systems

In a distributed system every machine has its own physical clock, and these clocks drift apart over time. Since there is no global clock, processes need a way to order events and agree on time. Physical clock synchronization keeps real-time clocks close; logical clocks only capture the ordering of events, which is often all that is needed.

The Happened-Before Relation ( $\rightarrow$ )

Lamport defined the happened-before relation to capture causality:

If $a$ and $b$ are events in the same process and $a$ occurs before $b$ , then $a \rightarrow b$ .
If $a$ is the sending of a message and $b$ is its receipt, then $a \rightarrow b$ .
Transitivity: if $a \rightarrow b$ and $b \rightarrow c$ , then $a \rightarrow c$ .

If neither $a \rightarrow b$ nor $b \rightarrow a$ , the events are concurrent ( $a \parallel b$ ).

Lamport's Logical Clocks

Each process $P_i$ keeps a counter $C_i$ . Rules:

IR1: before each event, $C_i = C_i + 1$ .
IR2: a message carries its send timestamp $t$ . On receipt, $C_j = \max(C_j, t) + 1$ .

Property: if $a \rightarrow b$ then $C(a) < C(b)$ . (The converse need not hold.)

Example: $P_1$ sends a message at local time 6; it arrives at $P_2$ whose clock reads 4. By IR2 the receive event gets $\max(4,6)+1 = 7$ , so the send (6) < receive (7), preserving causal order.
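
Rules IR1 and IR2 can be sketched as a small class. This is a minimal illustration, assuming integer counters and an in-process "message" (just the timestamp value); the class and method names are not taken from any library.

```python
class LamportClock:
    def __init__(self, start=0):
        self.time = start

    def tick(self):
        """IR1: increment before an internal or send event."""
        self.time += 1
        return self.time

    def receive(self, t):
        """IR2: on receipt of a message stamped t, jump past both clocks."""
        self.time = max(self.time, t) + 1
        return self.time

# Reproduce the example: P1 sends at time 6, P2's clock reads 4.
p1 = LamportClock(start=5)
p2 = LamportClock(start=4)
sent = p1.tick()
received = p2.receive(sent)
```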

Vector Clocks

A Lamport timestamp alone cannot tell whether $C(a) < C(b)$ reflects causality or just an arbitrary ordering of concurrent events. Vector clocks fix this. Each process keeps a vector $V_i[1..n]$ :

VR1: before an internal/send event, $V_i[i] = V_i[i] + 1$ .
VR2: a message carries $V_i$ ; on receipt, $V_j[k] = \max(V_j[k], V_{msg}[k])$ for all $k$ , then $V_j[j] = V_j[j] + 1$ .

Comparison: $V(a) < V(b)$ iff every component $V(a)[k] \le V(b)[k]$ and at least one is strictly smaller. Then:

$$a \rightarrow b \iff V(a) < V(b)$$

If neither $V(a) < V(b)$ nor $V(b) < V(a)$ , the events are concurrent.

Example: Start $P_1=(0,0,0)$ , $P_2=(0,0,0)$ . $P_1$ does an event $\Rightarrow (1,0,0)$ and sends to $P_2$ . $P_2$ (at $(0,0,0)$ ) receives $\Rightarrow \max((0,0,0),(1,0,0))=(1,0,0)$ then bump $\Rightarrow (1,1,0)$ . Since $(1,0,0) < (1,1,0)$ , the send causally precedes the receive.
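
Rules VR1 and VR2 and the comparison test can be sketched with plain lists. This is an illustrative sketch that uses 0-based process indices (so $P_1$ is index 0); the function names are assumptions made for this example.

```python
def vc_event(v, i):
    """VR1: process i increments its own component before an internal/send event."""
    out = list(v)
    out[i] += 1
    return out

def vc_receive(v, msg, j):
    """VR2: component-wise max with the message vector, then bump own entry."""
    merged = [max(a, b) for a, b in zip(v, msg)]
    merged[j] += 1
    return merged

def happened_before(va, vb):
    """V(a) < V(b): every component <= and at least one strictly smaller."""
    return all(a <= b for a, b in zip(va, vb)) and any(a < b for a, b in zip(va, vb))

def concurrent(va, vb):
    """Distinct events are concurrent when neither vector is smaller."""
    return va != vb and not happened_before(va, vb) and not happened_before(vb, va)

send = vc_event([0, 0, 0], 0)            # P1's send event
recv = vc_receive([0, 0, 0], send, 1)    # P2 receives the message
other = vc_event([0, 0, 0], 2)           # unrelated event at a third process
```

Here `send` and `recv` are causally ordered, while `send` and `other` are concurrent, which a single Lamport counter could not reveal.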

Conclusion: The happened-before relation defines partial event ordering; Lamport clocks give a consistent total order but lose information, while vector clocks precisely capture causality and detect concurrency.

Answer 4

Distributed System

A distributed system is a collection of independent (autonomous) computers connected by a network that appears to its users as a single coherent system. The components cooperate by passing messages and share no common physical clock or memory.

Goals

Resource sharing – share hardware, software and data (printers, files, databases) across the network.
Transparency – hide the distribution so the system looks like one machine (access, location, replication, concurrency, failure transparency).
Openness – use standard, well-defined interfaces/protocols so components from different vendors can interoperate and be extended.
Scalability – grow in size, geography, or administration without major performance loss or redesign.

Characteristics

Concurrency: many processes execute simultaneously and may share resources, requiring coordination.
No global clock: processes coordinate only by passing messages, and physical clocks can be synchronized only to a limited accuracy, so event ordering usually relies on logical clocks.
Independent failures: components can fail independently and partially while the rest of the system keeps running, so the system must tolerate faults.
Heterogeneity: different hardware, OS, and networks are handled, usually via middleware.

In short: a distributed system provides resource sharing with transparency, openness, scalability and fault tolerance over autonomous, concurrently-executing, clock-less machines.

Answer 5

Lamport's Logical Clock

Because distributed processes have no shared physical clock, Lamport's logical clock assigns a monotonically increasing counter to events so that causally related events are correctly ordered. It implements the happened-before ( $\rightarrow$ ) relation such that:

$$a \rightarrow b \Rightarrow C(a) < C(b)$$

Rules

Each process $P_i$ keeps a counter $C_i$ :

IR1 (internal/send): before timestamping any event, increment: $C_i = C_i + 1$ .
IR2 (receive): a message carries its send timestamp $t$ ; the receiver sets $C_j = \max(C_j, t) + 1$ .

Example

Two processes with different tick rates:

$P_1$ ticks: 1, 2, 3, 4, 5, 6 …
$P_1$ sends message m at its time 6.
$P_2$ 's clock currently reads 4 when m arrives.
By IR2: receive timestamp $= \max(4, 6) + 1 = 7$ .

Thus the send (6) < receive (7), so the causal order send happened-before receive is preserved. Without the rule, the receive (4) would wrongly appear earlier than the send (6).

Total ordering: ties (equal timestamps in different processes) are broken using process IDs, giving a consistent global total order used in algorithms like Lamport's mutual exclusion.
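
The tie-breaking rule can be sketched by sorting events on the pair (timestamp, process ID). The event list below is made-up illustrative data, not taken from the paper.

```python
# Each event is (lamport_timestamp, process_id, label).
events = [
    (7, 2, "recv m"),
    (6, 1, "send m"),
    (6, 3, "local"),
    (5, 1, "local"),
]

# Order by timestamp first; equal timestamps are broken by process ID.
ordered = sorted(events, key=lambda e: (e[0], e[1]))
```

Every process that sorts the same set of events this way obtains the same sequence, which is what algorithms such as Lamport's mutual exclusion rely on.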

Limitation: $C(a) < C(b)$ does not imply $a \rightarrow b$ ; logical clocks cannot detect concurrency (vector clocks are needed for that).

Answer 6

Remote Procedure Call (RPC)

RPC is a communication mechanism that lets a program call a procedure located on a remote machine as if it were a local procedure call, hiding the underlying message passing from the programmer. It provides access transparency and was introduced by Birrell and Nelson.

Working (steps)

Client                                Server
  | 1. call proc(args)                  |
  v                                     |
[Client stub] --2. marshal args-->      |
  | (pack into message)                 |
[Client OS] --3. send msg over network--> [Server OS]
                                          |
                                  4. msg passed to [Server stub]
                                          | 5. unmarshal args
                                          v
                                  6. call actual procedure
                                          |
                                  7. result -> marshal
[Client OS] <--8. reply msg-------------- [Server OS]
  |
[Client stub] 9. unmarshal result
  v
10. return value to client

1. The client calls the client stub (a local proxy procedure).
2. The client stub marshals (packs) the parameters into a message.
3. The client OS sends the message to the remote server.
4. The server OS hands the message to the server stub.
5. The server stub unmarshals the parameters.
6. The server stub calls the actual procedure on the server.
7–10. The result is marshaled, sent back, unmarshaled by the client stub, and returned to the caller.
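
The stub and marshaling steps can be sketched in a single process. This is a toy illustration, assuming JSON as the wire format and a direct function call (`transport`) standing in for the network; none of these names come from a real RPC framework.

```python
import json

def add(a, b):
    """The actual procedure that lives on the server."""
    return a + b

PROCEDURES = {"add": add}

def server_stub(request_bytes):
    """Unmarshal the request, call the procedure, marshal the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

def transport(request_bytes):
    """Stand-in for the client OS / network / server OS path (assumption)."""
    return server_stub(request_bytes)

def client_stub(proc, *args):
    """Marshal the call, send it, unmarshal the reply, return the value."""
    request_bytes = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply_bytes = transport(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]

# To the caller this looks like an ordinary local call.
total = client_stub("add", 2, 3)
```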

Key points

Stubs hide marshaling/network code; IDL (Interface Definition Language) generates them.
Parameter passing: call-by-value works directly; call-by-reference is hard (no shared address space) and is handled by copy/restore.
Failure semantics: because the network can lose messages and servers can crash, RPC systems define call semantics such as at-least-once or at-most-once execution of the remote procedure.

In short: RPC makes distributed communication look like ordinary procedure calls through client/server stubs and marshaling.

Answer 7

Centralized vs Distributed Mutual Exclusion

Mutual exclusion ensures that only one process at a time enters a critical section (CS) accessing a shared resource.

Centralized Algorithm

A single coordinator grants permission. To enter the CS a process sends a request to the coordinator; the coordinator replies with a grant if the resource is free, otherwise queues the request. On exit the process sends a release and the coordinator grants the next queued request.
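
The coordinator's request/grant/release logic can be sketched as below. This is a minimal single-process illustration in which method calls stand in for messages; the class name `Coordinator` and its methods are assumptions made for this example.

```python
from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None      # process currently in the critical section
        self.queue = deque()    # waiting requests in FIFO order

    def request(self, pid):
        """Return True if granted immediately, False if the request is queued."""
        if self.holder is None:
            self.holder = pid
            return True
        self.queue.append(pid)
        return False

    def release(self, pid):
        """Release the CS and return the next process granted, or None."""
        assert pid == self.holder, "only the holder may release"
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder
```

Granting from a FIFO queue is what makes the scheme fair: requests are served in the order the coordinator received them.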

Distributed Algorithm (e.g. Ricart–Agrawala)

No central node: a process wanting the CS multicasts a timestamped request to all other processes and enters only after receiving OK/reply from all. Requests are ordered by Lamport timestamps to break ties.
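
The decision a process makes when it receives a request can be sketched as a pure function. This is an illustrative sketch, assuming requests are (Lamport timestamp, process ID) pairs and the three state names shown; message transport and the deferred-reply queue are left out.

```python
def should_reply_now(state, my_request, incoming_request):
    """Decide whether to send OK immediately or defer the reply.

    state is 'RELEASED', 'WANTED' or 'HELD'; requests are (timestamp, pid).
    """
    if state == "HELD":
        return False                      # defer until we leave the CS
    if state == "WANTED" and my_request < incoming_request:
        return False                      # our own request is older: we go first
    return True

def messages_per_entry(n):
    """n - 1 requests plus n - 1 replies per critical-section entry."""
    return 2 * (n - 1)
```

Comparing the pairs as tuples gives the timestamp ordering with process IDs as the tie-breaker.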

Differences

Aspect	Centralized	Distributed
Control	One coordinator decides	All processes participate
Messages per CS entry	3 (request, grant, release)	2(n−1) (request + reply to/from all)
Single point of failure	Yes (coordinator)	No single coordinator, but the crash of any one process can block the rest (n points of failure)
Bottleneck	Coordinator can be a bottleneck	Load is distributed
Implementation	Simple, easy	More complex, needs ordering
Fairness/Starvation	Fair, no starvation (FIFO queue)	Fair via timestamps

Conclusion: The centralized scheme is simple and message-efficient but has a single point of failure and a bottleneck; the distributed scheme removes the single coordinator at the cost of more messages and complexity.

Answer 8

Distributed Deadlock Detection

A deadlock occurs when a set of processes are each waiting for a resource held by another in the set, so none can proceed. In a distributed system the resources, processes and the wait-for information are spread across many sites, making detection harder.

Wait-For Graph (WFG)

A wait-for graph is a directed graph in which:

each node represents a process, and
a directed edge $P_i \rightarrow P_j$ means $P_i$ is waiting for a resource currently held by $P_j$ .

A cycle in the WFG indicates a deadlock. In a distributed system the full graph is the union of local WFGs held at different sites (a global WFG).
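
Cycle detection on a wait-for graph can be sketched with a depth-first search. This is a minimal sketch, assuming the graph is a dictionary from each process to the processes it waits for, and that each process's outgoing edges are stored at exactly one site (so the local graphs can be merged by a simple dictionary union).

```python
def has_deadlock(wfg):
    """Return True if the wait-for graph contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {}

    def visit(p):
        colour[p] = GREY
        for q in wfg.get(p, []):
            c = colour.get(q, WHITE)
            if c == GREY:                  # back edge: q is on the current path
                return True
            if c == WHITE and visit(q):
                return True
        colour[p] = BLACK
        return False

    return any(colour.get(p, WHITE) == WHITE and visit(p) for p in list(wfg))

# Local WFGs at two sites; neither contains a cycle on its own.
site_a = {"P1": ["P2"]}
site_b = {"P2": ["P3"], "P3": ["P1"]}
global_wfg = {**site_a, **site_b}
```

The example shows why a global view is needed: the cycle P1, P2, P3 only appears once the local graphs are combined.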

Approaches to Distributed Deadlock Detection

Centralized: a coordinator builds a global WFG from the local WFGs and searches for cycles. Simple but a single point of failure and can report phantom (false) deadlocks due to message delays.
Distributed: every site participates in detection (e.g. Chandy–Misra–Haas edge-chasing): when a process blocks, it sends a probe message along its outgoing wait-for edges; if the probe returns to its initiator, a cycle (deadlock) exists.
Hierarchical: sites are organized in a tree; a deadlock that spans several sites is detected by the lowest common ancestor of those sites.

Recovery

Once detected, a deadlock is broken by victim selection — aborting/rolling back one process to release its resources.

In short: distributed deadlock detection finds cycles in the (distributed) wait-for graph, commonly via edge-chasing probe messages, then resolves them by aborting a victim.

Answer 9

Bully Algorithm (Coordinator Election)

The Bully algorithm (Garcia-Molina) elects the process with the highest process ID as the coordinator when the current coordinator fails. It assumes the system is synchronous, each process knows the IDs of all others, and messages are reliable.

Message types

ELECTION – announces an election to higher-ID processes.
OK (ANSWER) – a reply that a higher process is alive and takes over.
COORDINATOR – announces the winner to all processes.

Algorithm

When a process $P$ notices the coordinator is not responding:

1. $P$ sends an ELECTION message to all processes with a higher ID.
2. If no one with a higher ID responds (within a timeout), $P$ wins: it becomes the coordinator and sends a COORDINATOR message to all lower-ID processes.
3. If a higher-ID process replies with OK, $P$ drops out; that higher process now holds the election (it repeats step 1 among still-higher processes).
4. Eventually the highest-ID live process wins and broadcasts COORDINATOR.

A recovered process (or one with the highest ID) can immediately bully the others by starting an election and taking over — hence the name.

Example

Processes 1–7, coordinator 7 crashes. Process 4 detects it and sends ELECTION to 5, 6, 7. Processes 5 and 6 reply OK; 4 stops. 5 and 6 each hold elections; 6 replies OK to 5, so 5 stops. 6 sends ELECTION to 7 (no reply), so 6 wins and broadcasts COORDINATOR to all.
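
The outcome of an election can be sketched with a simplified, sequential trace. This is not a full implementation: real responders start their elections concurrently and rely on timeouts, whereas this sketch follows one chain of hand-offs and assumes the initiator is alive and the set of live processes is known to the simulation.

```python
def bully_election(initiator, alive):
    """Return the elected coordinator; `alive` is the set of live process IDs."""
    candidate = initiator
    while True:
        higher_alive = [p for p in alive if p > candidate]
        if not higher_alive:
            return candidate              # nobody higher answered: candidate wins
        candidate = min(higher_alive)     # a higher process takes over the election

# Processes 1..7 with coordinator 7 crashed; process 4 starts the election.
winner = bully_election(4, {1, 2, 3, 4, 5, 6})
```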

Cost: worst case $O(n^2)$ messages.

Answer 10

Types of Transparency in a Distributed System

Transparency means hiding the fact that the system's resources and processes are distributed across multiple machines, so users and applications perceive a single coherent system. The ISO Reference Model for Open Distributed Processing (RM-ODP), together with the ANSA reference manual, identifies the following kinds:

Transparency	What it hides
Access	Differences in data representation and how a resource is accessed (local vs remote access look the same).
Location	Where a resource is physically located (name does not reveal location).
Migration	That a resource may move to another location; names stay valid.
Relocation	That a resource may move while in use.
Replication	That a resource is replicated; the user sees one logical copy.
Concurrency	That a resource is shared by several competing users simultaneously.
Failure	The failure and recovery of a resource, so the system appears to keep working.
Persistence	Whether a resource is in memory or on disk.

In short: these transparencies (access, location, migration, relocation, replication, concurrency, failure, persistence) let the distributed system appear as one unified machine to its users.

Answer 11

Middleware

Middleware is a software layer that sits between the operating system/network and the distributed applications. It hides the heterogeneity of the underlying networks, hardware and operating systems and provides a uniform programming model and common services, so developers can build distributed applications more easily.

+-------------------------------------------------+
|              Distributed Application             |
+-------------------------------------------------+
|        MIDDLEWARE (common services, API)         |  <-- provides transparency
+-------------------------------------------------+
|   Local OS   |   Local OS   |   Local OS  | ...   |
+-------------------------------------------------+
|                   Network                        |
+-------------------------------------------------+

Role in a Distributed System

Provides transparency – masks access, location, replication, and failure differences so remote resources look local.
Hides heterogeneity – lets different OS, hardware, and languages interoperate via standard interfaces.
Communication abstractions – offers RPC, remote method invocation (RMI), and message-oriented middleware (MOM) instead of raw sockets.
Common services – naming, persistence, security/authentication, transactions, concurrency control, and replication.
Eases development – higher-level API so programmers focus on application logic, not networking details.

Examples: CORBA, Java RMI, DCOM, gRPC, and message brokers such as RabbitMQ/JMS.

In short: middleware is the glue that makes a collection of heterogeneous networked machines behave as one coherent distributed system.

Answer 12

Cristian's Algorithm (Physical Clock Synchronization)

Cristian's algorithm synchronizes a client's clock with a time server (which holds accurate UTC time, e.g. from a radio/atomic source). It is suitable when round-trip times are small compared to the required accuracy.

Procedure

The client sends a request to the time server at local time $T_0$ .
The server replies with its current time $T$ (UTC).
The client records the reply-receipt time $T_1$ . The round-trip time is $RTT = T_1 - T_0$ .
The message took roughly $RTT/2$ to travel back, so the client sets its clock to:

$$T_{client} = T + \frac{T_1 - T_0}{2}$$

If the server's interrupt-handling time $I$ is known, a better estimate is:

$$T_{client} = T + \frac{(T_1 - T_0) - I}{2}$$

Accuracy

The error is bounded by $\pm\left(\dfrac{RTT}{2} - T_{min}\right)$ , where $T_{min}$ is the minimum one-way transmission time. Taking the average of several requests (discarding outliers with large RTT) improves accuracy.

Example

If $T_0 = 5{:}00{:}00.000$ , $T_1 = 5{:}00{:}00.020$ (so $RTT = 20$ ms) and the server time $T = 5{:}10{:}00.000$ , the client sets its clock to $5{:}10{:}00.000 + 10\text{ ms} = 5{:}10{:}00.010$ .
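
The adjustment and the error bound can be sketched as two small functions. This is an illustrative sketch that measures all times in milliseconds from an arbitrary common origin; the function and parameter names are assumptions made for this example.

```python
def cristian_time(t0_ms, t1_ms, server_ms, interrupt_ms=0.0):
    """New client time: server time plus half the (adjusted) round trip."""
    return server_ms + ((t1_ms - t0_ms) - interrupt_ms) / 2

def error_bound(t0_ms, t1_ms, t_min_ms):
    """Accuracy is within +/- (RTT/2 - T_min)."""
    return (t1_ms - t0_ms) / 2 - t_min_ms

# The example above, in ms after 5:00:00.000 on the client's clock:
# T0 = 0, T1 = 20, and the server reports 5:10:00.000 = 600000 ms.
new_time = cristian_time(0, 20, 600_000)
```

Here `new_time` corresponds to 5:10:00.010, matching the worked example.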

Limitations

Clocks must only ever move forward, so if the new time is behind the current time, the clock is slowed down gradually rather than set back.
Relies on a single time server (single point of failure); the Berkeley algorithm avoids needing an accurate external server.

Level	BSc CSIT (TU)
Stream	Science
Subject	Distributed System (BSc CSIT, CSC462)
Year	2079 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions
