BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2080 Nepal
This is the official BSc CSIT (TU) (Science stream) Advanced Database (BSc CSIT, CSC461) question paper for 2080, as set in the regular annual examination. It carries 60 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Advanced Database (BSc CSIT, CSC461) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) exam or solving previous years' question papers, this 2080 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt any TWO questions.
What is a distributed database management system (DDBMS)? Explain data fragmentation, replication and allocation techniques used in distributed databases with suitable examples.
Distributed Database Management System (DDBMS)
A distributed database is a collection of logically interrelated databases distributed over a computer network. A DDBMS is the software that manages the distributed database and makes the distribution transparent to users, so the system appears as a single, centralized database even though the data is physically stored across multiple sites.
Key transparencies: location transparency, fragmentation transparency, replication transparency, and concurrency/failure transparency.
Data Fragmentation
Fragmentation divides a global relation into smaller pieces (fragments) that can be stored at different sites. Rules: completeness (every tuple/attribute appears in some fragment), reconstruction (the original relation can be rebuilt), and disjointness (for horizontal fragmentation).
- Horizontal fragmentation — splits a relation by rows using a selection predicate.
Reconstruction: .
- Vertical fragmentation — splits a relation by columns, each fragment keeping the primary key.
Reconstruction: (join on key).
- Mixed/Hybrid fragmentation — combination of horizontal and vertical (e.g., horizontally fragment, then vertically fragment a fragment).
Replication
Replication stores copies of a fragment/relation at multiple sites.
- Full replication — every site has a full copy: fast reads, high availability, but expensive updates.
- Partial replication — only selected fragments are replicated (frequently accessed data).
- No replication — each fragment stored at exactly one site: cheap updates, but lower availability.
Replication improves read performance and reliability but increases update cost and the burden of keeping copies consistent.
Data Allocation
Allocation decides which fragment is stored at which site. Strategies are guided by an allocation model that minimizes total cost (storage + query + update + communication) based on the access frequency of each fragment at each site. A fragment is typically allocated to the site(s) where it is most frequently accessed.
Example: A bank with branches in Kathmandu and Pokhara horizontally fragments the ACCOUNT table by branch; the Kathmandu fragment is allocated to the Kathmandu server (where it is mostly used) and replicated read-only to Pokhara for occasional reporting.
Explain the object-oriented database model. Discuss the ODMG object model along with the Object Definition Language (ODL) and Object Query Language (OQL) with examples.
Object-Oriented Database (OODB) Model
An object-oriented database stores data as objects, integrating object-oriented programming concepts with database capabilities (persistence, concurrency, recovery). Core features: object identity (OID), encapsulation (state + behavior/methods), classes, inheritance, polymorphism, and support for complex objects. It overcomes the impedance mismatch between the relational model and OO programming languages.
ODMG Object Model
The ODMG (Object Data Management Group) standard defines a common model for OODBs so that applications are portable across compliant systems. Components:
- Object Model — defines objects (with mutable state, identified by OID) and literals (immutable values). Supports atomic, structured and collection types (Set, Bag, List, Array, Dictionary), relationships, attributes and operations.
- ODL — Object Definition Language (schema definition).
- OQL — Object Query Language (declarative querying).
- A binding to OO languages (C++, Java, Smalltalk).
Object Definition Language (ODL)
ODL is a programming-language-independent language used to define the schema (classes, attributes, relationships, methods).
class Student (extent students key rollNo) {
attribute string rollNo;
attribute string name;
attribute integer age;
relationship Set<Course> takes inverse Course::takenBy;
void enroll(in Course c);
};
class Course (extent courses key code) {
attribute string code;
attribute string title;
relationship Set<Student> takenBy inverse Student::takes;
};
Here extent is the set of all objects of the class, key enforces uniqueness, and relationship ... inverse declares a bidirectional relationship.
Object Query Language (OQL)
OQL is an SQL-like declarative query language for the ODMG model that supports path expressions, method calls and complex/nested results.
-- Names of students older than 20
SELECT s.name
FROM s IN students
WHERE s.age > 20;
-- Path expression: titles of courses taken by 'Ram'
SELECT c.title
FROM s IN students, c IN s.takes
WHERE s.name = "Ram";
Unlike SQL, OQL can return objects, structures, and nested collections, and navigate relationships through path expressions (s.takes).
What is concurrency control? Explain the two-phase locking protocol and timestamp-based ordering protocol for concurrency control with suitable examples.
Concurrency Control
Concurrency control is the activity of coordinating the concurrent (interleaved) execution of transactions in a multi-user database so that the result is equivalent to some serial execution (serializability), while preserving the ACID properties. Without it, anomalies such as lost update, dirty read, and unrepeatable read can occur.
Two-Phase Locking (2PL) Protocol
2PL is a lock-based protocol. A transaction must obtain a shared lock (S) before reading and an exclusive lock (X) before writing a data item. Locking proceeds in two phases:
- Growing phase — the transaction may acquire locks but cannot release any.
- Shrinking phase — the transaction may release locks but cannot acquire any new lock.
The point after the last lock is acquired is the lock point. 2PL guarantees conflict serializability.
Variants: Strict 2PL holds all exclusive locks until commit/abort (prevents cascading rollback); Rigorous 2PL holds all locks until commit.
Example:
T1: lock-X(A); read(A); A=A-50; write(A);
lock-X(B); release(A); read(B); B=B+50; write(B); release(B);
All locks are acquired before any is released, so T1 obeys 2PL. A drawback is the possibility of deadlock.
Timestamp-Based Ordering Protocol
Each transaction gets a unique timestamp at start. Each data item has two values: W-timestamp(X) (largest timestamp of a transaction that wrote X) and R-timestamp(X) (largest timestamp that read X). The protocol orders transactions by their timestamps and ensures the schedule is equivalent to executing them in timestamp order.
Read rule (Ti reads X):
- If → is too late; reject and roll back .
- Else read is allowed; set .
Write rule (Ti writes X):
- If or → reject and roll back .
- Else write is allowed; set .
Example: Suppose , . If writes (so ) and then (timestamp 5) tries to write , since , is rolled back and restarted with a new timestamp.
Timestamp ordering is deadlock-free (no waiting) but can cause repeated rollbacks/starvation.
Section B: Short Answer Questions
Attempt any EIGHT questions.
What is a deductive database? Explain the use of Datalog with an example.
Deductive Database
A deductive database is a database system that can derive new facts (the intensional database, IDB) from a set of stored base facts (the extensional database, EDB) using logical inference rules. It combines a relational database with a logic-programming style rule engine, allowing recursive queries that are hard to express in plain SQL.
Datalog
Datalog is a declarative, rule-based query language (a restricted, non-procedural subset of Prolog). A Datalog program consists of:
- Facts (ground atoms) — the stored data.
- Rules of the form
head :- body, meaning if the body is true, infer the head.
Datalog supports recursion, making it powerful for transitive queries.
Example — ancestor relationship:
% Facts (EDB)
parent(ram, sita).
parent(sita, hari).
% Rules (IDB)
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
% Query
?- ancestor(ram, hari). % returns true
The recursive rule derives ancestor(ram, hari) even though it is not stored directly, demonstrating inference that ordinary relational queries cannot easily express.
Explain the two-phase locking (2PL) protocol for concurrency control.
Two-Phase Locking (2PL) Protocol
2PL is a lock-based concurrency-control protocol that guarantees conflict-serializable schedules. A transaction acquires a shared lock (S) before reading and an exclusive lock (X) before writing a data item, and all its locking activity is divided into two non-overlapping phases:
- Growing phase — the transaction may acquire (upgrade) locks but may not release any lock.
- Shrinking phase — the transaction may release (downgrade) locks but may not acquire any new lock.
The boundary between the two phases is called the lock point. Holding to this rule ensures serializability.
Variants
- Strict 2PL — all exclusive locks are held until commit/abort, preventing cascading rollbacks.
- Rigorous 2PL — all locks (S and X) held until commit.
Example
lock-X(A); read(A); write(A); -- growing
lock-X(B); -- growing (last lock = lock point)
unlock(A); read(B); write(B); -- shrinking
unlock(B);
Drawback: 2PL can lead to deadlocks, which must be handled by deadlock detection or prevention.
Explain the two-phase commit (2PC) protocol used in distributed transactions.
Two-Phase Commit (2PC) Protocol
2PC is an atomic commit protocol that ensures all sites participating in a distributed transaction either commit or abort together, preserving atomicity across the network. One site acts as the coordinator; the others are participants (cohorts).
Phase 1 — Voting / Prepare Phase
- The coordinator sends a PREPARE message to all participants.
- Each participant that can commit writes a
<ready T>record to its log, forces the log to disk, and replies VOTE-COMMIT (Yes). If it cannot commit, it writes<abort T>and replies VOTE-ABORT (No).
Phase 2 — Decision / Commit Phase
- If all participants voted Yes, the coordinator writes
<commit T>to its log and sends GLOBAL-COMMIT to all; otherwise it writes<abort T>and sends GLOBAL-ABORT. - Each participant carries out the decision, writes the corresponding log record, and sends an ACK to the coordinator.
- After receiving all ACKs, the coordinator completes the transaction.
Drawback
2PC is a blocking protocol: if the coordinator fails after participants have voted Yes, the participants are stuck holding locks (in-doubt) until the coordinator recovers. Three-phase commit (3PC) addresses this blocking problem.
What is indexing? Differentiate between primary, secondary and clustering indexes.
Indexing
An index is an auxiliary data structure (typically a sorted file of (search-key, pointer) entries or a B+-tree) built on one or more attributes of a relation to speed up data retrieval, avoiding a full scan. It trades extra storage and update overhead for faster query access.
Comparison of Index Types
| Feature | Primary Index | Clustering Index | Secondary Index |
|---|---|---|---|
| Built on | Ordering key field (unique) | Ordering non-key field | Non-ordering field (key or non-key) |
| File order | File sorted on this field | File sorted on this field | File not sorted on this field |
| Entries | One per block (sparse) | One per distinct value | One per record (dense) |
| Number per table | At most one | At most one | Many allowed |
| Density | Sparse | Sparse | Dense |
Key points
- A primary index is defined on the field by which the file is physically sorted and which is the primary key; it is sparse (one entry per data block).
- A clustering index is also on the field by which the file is sorted, but that field has duplicate values (non-key); one index entry per distinct value points to the first block containing it.
- A secondary index is defined on a field that is not the physical ordering field; it is dense and a table may have many secondary indexes.
Explain the architecture of parallel databases. What is inter-query and intra-query parallelism?
Parallel Database Architecture
A parallel database uses multiple CPUs and disks operating in parallel to improve performance (throughput and response time) through speedup and scaleup. The main hardware architectures are:
- Shared-memory — all processors share a common main memory and disks via an interconnect. Easy to program; limited scalability due to memory-bus contention.
- Shared-disk — each processor has its own private memory but all share the disks. Good fault tolerance; interconnect to disks can be a bottleneck.
- Shared-nothing — each processor has its own memory and disks; nodes communicate only by message passing. Highly scalable (used in big-data systems); more complex to program.
- Hierarchical (hybrid) — combines the above (e.g., shared-nothing cluster of shared-memory nodes).
Inter-query vs Intra-query Parallelism
- Inter-query parallelism — different queries/transactions execute simultaneously on different processors. It increases throughput (transactions per second) but does not speed up a single query. Common in OLTP systems.
- Intra-query parallelism — a single query is executed faster by running its parts in parallel. It is of two kinds:
- Intra-operation parallelism — a single operation (e.g., sort, join) is parallelized across processors.
- Inter-operation parallelism — different operators of one query plan run in parallel (pipelined or independent). This reduces the response time of an individual query and is useful for OLAP/decision-support workloads.
What is an object-relational database? How does it differ from a purely relational database?
Object-Relational Database (ORDBMS)
An object-relational database extends the traditional relational model with object-oriented features while retaining SQL and the relational foundation. It is a hybrid: data is still organized in tables, but the system supports rich, user-defined and complex types. ORDBMS features (in SQL:1999+) include:
- User-defined / abstract data types (UDTs) and structured types.
- Type inheritance (subtypes/supertypes) and table inheritance.
- Collection types — arrays, multisets, nested tables.
- Methods / user-defined functions attached to types.
- References (REF types) to row objects and object identifiers (OIDs).
- Large objects (BLOB/CLOB).
Difference from a Purely Relational Database
| Aspect | Relational DB | Object-Relational DB |
|---|---|---|
| Data types | Only atomic (1NF) built-in types | Atomic plus complex/user-defined types |
| Structure | Flat tables, attributes are atomic | Tables may contain nested/complex attributes |
| Behavior | No methods stored with data | Methods/functions attached to types |
| Inheritance | Not supported | Type and table inheritance supported |
| Identity | Tuples identified by primary key | Row objects can have OIDs and REFs |
In short, a purely relational database enforces the first normal form (atomic attribute values) and stores only data, whereas an ORDBMS relaxes 1NF to allow complex objects, inheritance, and behavior — bridging the relational and object-oriented worlds while remaining backward-compatible with SQL.
What is XML data model? Explain how XML is used to represent and query data in databases.
XML Data Model
XML (eXtensible Markup Language) is a self-describing, semi-structured data model that represents data as a hierarchy (tree) of nested, user-defined tags (elements) with optional attributes. Unlike the rigid tabular relational model, XML is flexible — schemas may be irregular or partially defined — which makes it ideal for data exchange between heterogeneous systems and the web.
Key constructs: elements (nested, ordered), attributes, text content, a single root element, and structure validation via DTD or XML Schema (XSD).
Example document:
<library>
<book isbn="123">
<title>Advanced Database</title>
<author>Ram</author>
<price>500</price>
</book>
</library>
Representing and Querying XML in Databases
Storage: XML can be stored as (a) a native XML database, (b) a column of an XML data type in an object-relational DBMS, or (c) shredded into relational tables (mapping elements/attributes to columns).
Querying:
- XPath — navigates the XML tree using path expressions to select nodes.
/library/book[price > 400]/title - XQuery — a full query language (FLWOR: For-Let-Where-Order-Return) for transforming and combining XML.
for $b in doc("library.xml")/library/book where $b/price > 400 return $b/title
These let applications query hierarchical XML data much as SQL queries relational tables.
Explain log-based recovery techniques in a database system.
Log-Based Recovery
A log (transaction log) is a sequential file recording every update so the database can be restored to a consistent state after a failure. Each log record stores <Ti, X, old_value, new_value> for a write, plus <Ti start>, <Ti commit>, <Ti abort> markers. Recovery relies on the Write-Ahead Logging (WAL) rule: the log record must be written to stable storage before the corresponding data change reaches the database.
1. Deferred Database Modification
Updates are not applied to the database until the transaction commits — they are only recorded in the log. On recovery, transactions with both <Ti start> and <Ti commit> are redone (redo(Ti)); those without a commit are simply ignored (no undo needed, since nothing was written).
2. Immediate Database Modification
Updates are written to the database before commit. The log keeps both old and new values. On recovery:
- redo(Ti) if the log has both
<Ti start>and<Ti commit>— reapply new values. - undo(Ti) if the log has
<Ti start>but no<Ti commit>— restore old values (in reverse order).
Checkpoints
To limit the amount of log scanned, the system periodically writes a checkpoint record after flushing buffers and the log to disk. During recovery only transactions active at or after the most recent checkpoint need to be examined, greatly speeding up recovery.
Write short notes on the MapReduce programming model for big data processing.
MapReduce Programming Model
MapReduce is a programming model and framework (popularized by Google, implemented in Apache Hadoop) for processing very large data sets in parallel across a shared-nothing cluster of commodity machines. The programmer writes two functions; the framework handles parallelization, data distribution, fault tolerance and load balancing automatically.
Two Phases
- Map — takes an input
(key, value)pair and emits a set of intermediate(key, value)pairs.
- Shuffle & Sort — the framework groups all intermediate values by key and routes them to reducers.
- Reduce — takes a key and the list of all its values and produces the final output.
Example — Word Count
Map(doc): for each word w in doc -> emit(w, 1)
Reduce(w, counts): emit(w, sum(counts))
Input "a b a" → Map emits (a,1)(b,1)(a,1) → Shuffle groups a:[1,1], b:[1] → Reduce gives (a,2)(b,1).
Features
- Scalability — scales horizontally to thousands of nodes.
- Fault tolerance — failed tasks are automatically re-executed; data replicated in HDFS.
- Data locality — computation moved to where data resides, reducing network traffic.
- Simplicity — hides the complexity of distributed/parallel programming.
Limitation: batch-oriented and disk-heavy; less suited to iterative or real-time workloads (addressed by Spark).
Frequently asked questions
- Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2080?
- The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2080 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Advanced Database (BSc CSIT, CSC461) 2080 paper come with solutions?
- Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2080 paper?
- The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2080 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?
- Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.