BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2080 Nepal

Q: Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2080?

The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2080 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Advanced Database (BSc CSIT, CSC461) 2080 paper come with solutions?

Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2080 paper?

The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2080 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?

Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.

Question

1Long answer10 marks

What is a distributed database management system (DDBMS)? Explain data fragmentation, replication and allocation techniques used in distributed databases with suitable examples.

distributed-database

Answer 1

Distributed Database Management System (DDBMS)

A distributed database is a collection of logically interrelated databases distributed over a computer network. A DDBMS is the software that manages the distributed database and makes the distribution transparent to users, so the system appears as a single, centralized database even though the data is physically stored across multiple sites.

Key transparencies: location transparency, fragmentation transparency, replication transparency, and concurrency/failure transparency.

Data Fragmentation

Fragmentation divides a global relation into smaller pieces (fragments) that can be stored at different sites. Rules: completeness (every tuple/attribute appears in some fragment), reconstruction (the original relation can be rebuilt), and disjointness (for horizontal fragmentation).

Horizontal fragmentation — splits a relation by rows using a selection predicate.

EMP_1 = \sigma_{Dept = 'Sales'}(EMP), \quad EMP_2 = \sigma_{Dept = 'HR'}(EMP)

Reconstruction: $EMP = EMP_1 \cup EMP_2$ .

Vertical fragmentation — splits a relation by columns, each fragment keeping the primary key.

EMP_A = \pi_{Eno, Name}(EMP), \quad EMP_B = \pi_{Eno, Salary}(EMP)

Reconstruction: $EMP = EMP_A \bowtie EMP_B$ (join on key).

Mixed/Hybrid fragmentation — combination of horizontal and vertical (e.g., horizontally fragment, then vertically fragment a fragment).

Replication

Replication stores copies of a fragment/relation at multiple sites.

Full replication — every site has a full copy: fast reads, high availability, but expensive updates.
Partial replication — only selected fragments are replicated (frequently accessed data).
No replication — each fragment stored at exactly one site: cheap updates, but lower availability.

Replication improves read performance and reliability but increases update cost and the burden of keeping copies consistent.

Data Allocation

Allocation decides which fragment is stored at which site. Strategies are guided by an allocation model that minimizes total cost (storage + query + update + communication) based on the access frequency of each fragment at each site. A fragment is typically allocated to the site(s) where it is most frequently accessed.

Example: A bank with branches in Kathmandu and Pokhara horizontally fragments the ACCOUNT table by branch; the Kathmandu fragment is allocated to the Kathmandu server (where it is mostly used) and replicated read-only to Pokhara for occasional reporting.

Answer 2

Object-Oriented Database (OODB) Model

An object-oriented database stores data as objects, integrating object-oriented programming concepts with database capabilities (persistence, concurrency, recovery). Core features: object identity (OID), encapsulation (state + behavior/methods), classes, inheritance, polymorphism, and support for complex objects. It overcomes the impedance mismatch between the relational model and OO programming languages.

ODMG Object Model

The ODMG (Object Data Management Group) standard defines a common model for OODBs so that applications are portable across compliant systems. Components:

Object Model — defines objects (with mutable state, identified by OID) and literals (immutable values). Supports atomic, structured and collection types (Set, Bag, List, Array, Dictionary), relationships, attributes and operations.
ODL — Object Definition Language (schema definition).
OQL — Object Query Language (declarative querying).
A binding to OO languages (C++, Java, Smalltalk).

Object Definition Language (ODL)

ODL is a programming-language-independent language used to define the schema (classes, attributes, relationships, methods).

class Student (extent students key rollNo) {
  attribute string  rollNo;
  attribute string  name;
  attribute integer age;
  relationship Set<Course> takes inverse Course::takenBy;
  void enroll(in Course c);
};

class Course (extent courses key code) {
  attribute string code;
  attribute string title;
  relationship Set<Student> takenBy inverse Student::takes;
};

Here extent is the set of all objects of the class, key enforces uniqueness, and relationship ... inverse declares a bidirectional relationship.

Object Query Language (OQL)

OQL is an SQL-like declarative query language for the ODMG model that supports path expressions, method calls and complex/nested results.

-- Names of students older than 20
SELECT s.name
FROM   s IN students
WHERE  s.age > 20;

-- Path expression: titles of courses taken by 'Ram'
SELECT c.title
FROM   s IN students, c IN s.takes
WHERE  s.name = "Ram";

Unlike SQL, OQL can return objects, structures, and nested collections, and navigate relationships through path expressions (s.takes).

Answer 3

Concurrency Control

Concurrency control is the activity of coordinating the concurrent (interleaved) execution of transactions in a multi-user database so that the result is equivalent to some serial execution (serializability), while preserving the ACID properties. Without it, anomalies such as lost update, dirty read, and unrepeatable read can occur.

Two-Phase Locking (2PL) Protocol

2PL is a lock-based protocol. A transaction must obtain a shared lock (S) before reading and an exclusive lock (X) before writing a data item. Locking proceeds in two phases:

Growing phase — the transaction may acquire locks but cannot release any.
Shrinking phase — the transaction may release locks but cannot acquire any new lock.

The point after the last lock is acquired is the lock point. 2PL guarantees conflict serializability.

Variants: Strict 2PL holds all exclusive locks until commit/abort (prevents cascading rollback); Rigorous 2PL holds all locks until commit.

Example:

T1: lock-X(A); read(A); A=A-50; write(A);
    lock-X(B); release(A); read(B); B=B+50; write(B); release(B);

All locks are acquired before any is released, so T1 obeys 2PL. A drawback is the possibility of deadlock.

Timestamp-Based Ordering Protocol

Each transaction $T_i$ gets a unique timestamp $TS(T_i)$ at start. Each data item $X$ has two values: W-timestamp(X) (largest timestamp of a transaction that wrote X) and R-timestamp(X) (largest timestamp that read X). The protocol orders transactions by their timestamps and ensures the schedule is equivalent to executing them in timestamp order.

Read rule (Ti reads X):

If $TS(T_i) < W\text{-}TS(X)$ → $T_i$ is too late; reject and roll back $T_i$ .
Else read is allowed; set $R\text{-}TS(X) = \max(R\text{-}TS(X), TS(T_i))$ .

Write rule (Ti writes X):

If $TS(T_i) < R\text{-}TS(X)$ or $TS(T_i) < W\text{-}TS(X)$ → reject and roll back $T_i$ .
Else write is allowed; set $W\text{-}TS(X) = TS(T_i)$ .

Example: Suppose $TS(T_1)=5$ , $TS(T_2)=10$ . If $T_2$ writes $X$ (so $W\text{-}TS(X)=10$ ) and then $T_1$ (timestamp 5) tries to write $X$ , since $TS(T_1)=5 < W\text{-}TS(X)=10$ , $T_1$ is rolled back and restarted with a new timestamp.

Timestamp ordering is deadlock-free (no waiting) but can cause repeated rollbacks/starvation.

Answer 4

Deductive Database

A deductive database is a database system that can derive new facts (the intensional database, IDB) from a set of stored base facts (the extensional database, EDB) using logical inference rules. It combines a relational database with a logic-programming style rule engine, allowing recursive queries that are hard to express in plain SQL.

Datalog

Datalog is a declarative, rule-based query language (a restricted, non-procedural subset of Prolog). A Datalog program consists of:

Facts (ground atoms) — the stored data.
Rules of the form head :- body, meaning if the body is true, infer the head.

Datalog supports recursion, making it powerful for transitive queries.

Example — ancestor relationship:

% Facts (EDB)
parent(ram, sita).
parent(sita, hari).

% Rules (IDB)
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

% Query
?- ancestor(ram, hari).   % returns true

The recursive rule derives ancestor(ram, hari) even though it is not stored directly, demonstrating inference that ordinary relational queries cannot easily express.

Answer 5

Two-Phase Locking (2PL) Protocol

2PL is a lock-based concurrency-control protocol that guarantees conflict-serializable schedules. A transaction acquires a shared lock (S) before reading and an exclusive lock (X) before writing a data item, and all its locking activity is divided into two non-overlapping phases:

Growing phase — the transaction may acquire (upgrade) locks but may not release any lock.
Shrinking phase — the transaction may release (downgrade) locks but may not acquire any new lock.

The boundary between the two phases is called the lock point. Holding to this rule ensures serializability.

Variants

Strict 2PL — all exclusive locks are held until commit/abort, preventing cascading rollbacks.
Rigorous 2PL — all locks (S and X) held until commit.

Example

lock-X(A); read(A); write(A);     -- growing
lock-X(B);                        -- growing (last lock = lock point)
unlock(A); read(B); write(B);     -- shrinking
unlock(B);

Drawback: 2PL can lead to deadlocks, which must be handled by deadlock detection or prevention.

Answer 6

Two-Phase Commit (2PC) Protocol

2PC is an atomic commit protocol that ensures all sites participating in a distributed transaction either commit or abort together, preserving atomicity across the network. One site acts as the coordinator; the others are participants (cohorts).

Phase 1 — Voting / Prepare Phase

The coordinator sends a PREPARE message to all participants.
Each participant that can commit writes a <ready T> record to its log, forces the log to disk, and replies VOTE-COMMIT (Yes). If it cannot commit, it writes <abort T> and replies VOTE-ABORT (No).

Phase 2 — Decision / Commit Phase

If all participants voted Yes, the coordinator writes <commit T> to its log and sends GLOBAL-COMMIT to all; otherwise it writes <abort T> and sends GLOBAL-ABORT.
Each participant carries out the decision, writes the corresponding log record, and sends an ACK to the coordinator.
After receiving all ACKs, the coordinator completes the transaction.

Drawback

2PC is a blocking protocol: if the coordinator fails after participants have voted Yes, the participants are stuck holding locks (in-doubt) until the coordinator recovers. Three-phase commit (3PC) addresses this blocking problem.

Answer 7

Indexing

An index is an auxiliary data structure (typically a sorted file of (search-key, pointer) entries or a B+-tree) built on one or more attributes of a relation to speed up data retrieval, avoiding a full scan. It trades extra storage and update overhead for faster query access.

Comparison of Index Types

Feature	Primary Index	Clustering Index	Secondary Index
Built on	Ordering key field (unique)	Ordering non-key field	Non-ordering field (key or non-key)
File order	File sorted on this field	File sorted on this field	File not sorted on this field
Entries	One per block (sparse)	One per distinct value	One per record (dense)
Number per table	At most one	At most one	Many allowed
Density	Sparse	Sparse	Dense

Key points

A primary index is defined on the field by which the file is physically sorted and which is the primary key; it is sparse (one entry per data block).
A clustering index is also on the field by which the file is sorted, but that field has duplicate values (non-key); one index entry per distinct value points to the first block containing it.
A secondary index is defined on a field that is not the physical ordering field; it is dense and a table may have many secondary indexes.

Answer 8

Parallel Database Architecture

A parallel database uses multiple CPUs and disks operating in parallel to improve performance (throughput and response time) through speedup and scaleup. The main hardware architectures are:

Shared-memory — all processors share a common main memory and disks via an interconnect. Easy to program; limited scalability due to memory-bus contention.
Shared-disk — each processor has its own private memory but all share the disks. Good fault tolerance; interconnect to disks can be a bottleneck.
Shared-nothing — each processor has its own memory and disks; nodes communicate only by message passing. Highly scalable (used in big-data systems); more complex to program.
Hierarchical (hybrid) — combines the above (e.g., shared-nothing cluster of shared-memory nodes).

Inter-query vs Intra-query Parallelism

Inter-query parallelism — different queries/transactions execute simultaneously on different processors. It increases throughput (transactions per second) but does not speed up a single query. Common in OLTP systems.
Intra-query parallelism — a single query is executed faster by running its parts in parallel. It is of two kinds:
- Intra-operation parallelism — a single operation (e.g., sort, join) is parallelized across processors.
- Inter-operation parallelism — different operators of one query plan run in parallel (pipelined or independent). This reduces the response time of an individual query and is useful for OLAP/decision-support workloads.

Answer 9

Object-Relational Database (ORDBMS)

An object-relational database extends the traditional relational model with object-oriented features while retaining SQL and the relational foundation. It is a hybrid: data is still organized in tables, but the system supports rich, user-defined and complex types. ORDBMS features (in SQL:1999+) include:

User-defined / abstract data types (UDTs) and structured types.
Type inheritance (subtypes/supertypes) and table inheritance.
Collection types — arrays, multisets, nested tables.
Methods / user-defined functions attached to types.
References (REF types) to row objects and object identifiers (OIDs).
Large objects (BLOB/CLOB).

Difference from a Purely Relational Database

Aspect	Relational DB	Object-Relational DB
Data types	Only atomic (1NF) built-in types	Atomic plus complex/user-defined types
Structure	Flat tables, attributes are atomic	Tables may contain nested/complex attributes
Behavior	No methods stored with data	Methods/functions attached to types
Inheritance	Not supported	Type and table inheritance supported
Identity	Tuples identified by primary key	Row objects can have OIDs and REFs

In short, a purely relational database enforces the first normal form (atomic attribute values) and stores only data, whereas an ORDBMS relaxes 1NF to allow complex objects, inheritance, and behavior — bridging the relational and object-oriented worlds while remaining backward-compatible with SQL.

Answer 10

XML Data Model

XML (eXtensible Markup Language) is a self-describing, semi-structured data model that represents data as a hierarchy (tree) of nested, user-defined tags (elements) with optional attributes. Unlike the rigid tabular relational model, XML is flexible — schemas may be irregular or partially defined — which makes it ideal for data exchange between heterogeneous systems and the web.

Key constructs: elements (nested, ordered), attributes, text content, a single root element, and structure validation via DTD or XML Schema (XSD).

Example document:

<library>
  <book isbn="123">
    <title>Advanced Database</title>
    <author>Ram</author>
    <price>500</price>
  </book>
</library>

Representing and Querying XML in Databases

Storage: XML can be stored as (a) a native XML database, (b) a column of an XML data type in an object-relational DBMS, or (c) shredded into relational tables (mapping elements/attributes to columns).

Querying:

XPath — navigates the XML tree using path expressions to select nodes.
```
/library/book[price > 400]/title
```
XQuery — a full query language (FLWOR: For-Let-Where-Order-Return) for transforming and combining XML.
```
for $b in doc("library.xml")/library/book
where $b/price > 400
return $b/title
```

These let applications query hierarchical XML data much as SQL queries relational tables.

Answer 11

Log-Based Recovery

A log (transaction log) is a sequential file recording every update so the database can be restored to a consistent state after a failure. Each log record stores <Ti, X, old_value, new_value> for a write, plus <Ti start>, <Ti commit>, <Ti abort> markers. Recovery relies on the Write-Ahead Logging (WAL) rule: the log record must be written to stable storage before the corresponding data change reaches the database.

1. Deferred Database Modification

Updates are not applied to the database until the transaction commits — they are only recorded in the log. On recovery, transactions with both <Ti start> and <Ti commit> are redone (redo(Ti)); those without a commit are simply ignored (no undo needed, since nothing was written).

2. Immediate Database Modification

Updates are written to the database before commit. The log keeps both old and new values. On recovery:

redo(Ti) if the log has both <Ti start> and <Ti commit> — reapply new values.
undo(Ti) if the log has <Ti start> but no <Ti commit> — restore old values (in reverse order).

Checkpoints

To limit the amount of log scanned, the system periodically writes a checkpoint record after flushing buffers and the log to disk. During recovery only transactions active at or after the most recent checkpoint need to be examined, greatly speeding up recovery.

Answer 12

MapReduce Programming Model

MapReduce is a programming model and framework (popularized by Google, implemented in Apache Hadoop) for processing very large data sets in parallel across a shared-nothing cluster of commodity machines. The programmer writes two functions; the framework handles parallelization, data distribution, fault tolerance and load balancing automatically.

Two Phases

Map — takes an input (key, value) pair and emits a set of intermediate (key, value) pairs.

map(k_1, v_1) \rightarrow list(k_2, v_2)

Shuffle & Sort — the framework groups all intermediate values by key and routes them to reducers.
Reduce — takes a key and the list of all its values and produces the final output.

reduce(k_2, list(v_2)) \rightarrow list(v_3)

Example — Word Count

Map(doc):    for each word w in doc -> emit(w, 1)
Reduce(w, counts): emit(w, sum(counts))

Input "a b a" → Map emits (a,1)(b,1)(a,1) → Shuffle groups a:[1,1], b:[1] → Reduce gives (a,2)(b,1).

Features

Scalability — scales horizontally to thousands of nodes.
Fault tolerance — failed tasks are automatically re-executed; data replicated in HDFS.
Data locality — computation moved to where data resides, reducing network traffic.
Simplicity — hides the complexity of distributed/parallel programming.

Limitation: batch-oriented and disk-heavy; less suited to iterative or real-time workloads (addressed by Spark).

Level	BSc CSIT (TU)
Stream	Science
Subject	Advanced Database (BSc CSIT, CSC461)
Year	2080 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2080 Nepal

Section A: Long Answer Questions

Distributed Database Management System (DDBMS)

Data Fragmentation

Replication

Data Allocation

Object-Oriented Database (OODB) Model

ODMG Object Model

Object Definition Language (ODL)

Object Query Language (OQL)

Concurrency Control

Two-Phase Locking (2PL) Protocol

Timestamp-Based Ordering Protocol

Section B: Short Answer Questions

Deductive Database

Datalog

Two-Phase Locking (2PL) Protocol

Two-Phase Commit (2PC) Protocol

Phase 1 — Voting / Prepare Phase

Phase 2 — Decision / Commit Phase

Drawback

Indexing

Comparison of Index Types

Parallel Database Architecture

Inter-query vs Intra-query Parallelism

Object-Relational Database (ORDBMS)

Difference from a Purely Relational Database

XML Data Model

Representing and Querying XML in Databases

Log-Based Recovery

1. Deferred Database Modification

2. Immediate Database Modification

Checkpoints

MapReduce Programming Model

Two Phases

Example — Word Count

Features

Frequently asked questions