Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

Explain the object-oriented database model. Discuss the ODMG object model along with the Object Definition Language (ODL) and Object Query Language (OQL) with examples.

Object-Oriented Database (OODB) Model

An object-oriented database stores data as objects, the same way object-oriented programming languages represent entities. It combines database capabilities (persistence, concurrency, recovery, querying) with OO concepts such as objects, classes, encapsulation, inheritance, polymorphism, and object identity (OID). Each object has a unique, system-generated OID independent of its attribute values, and complex objects (sets, lists, nested objects) can be stored directly without flattening into rows.

Key features: complex object support, persistence of program objects, object identity, encapsulation of state and behaviour (methods), class hierarchy with inheritance, and overcoming the impedance mismatch between programming languages and relational tables.

ODMG Object Model

The ODMG (Object Data Management Group) standard defines a common object model and languages so OODBs are portable across vendors. Its main components are:

  • Objects and Literals – an object has an OID, state and behaviour; a literal has only a value (no OID).
  • Types: atomic, structured, collection (set, bag, list, array, dictionary).
  • Interfaces and Classes defining attributes, relationships and operations.
  • Three languages: ODL (schema definition), OQL (querying), and language bindings (C++, Java, Smalltalk).

Object Definition Language (ODL)

ODL is used to define the schema (classes, attributes, relationships, methods), independent of any programming language.

class Student (extent students key roll_no) {
  attribute string  roll_no;
  attribute string  name;
  attribute short   age;
  relationship set<Course> enrolled_in
     inverse Course::has_students;
  short  totalCredits();
};

class Course (extent courses key code) {
  attribute string code;
  attribute string title;
  relationship set<Student> has_students
     inverse Student::enrolled_in;
};

Object Query Language (OQL)

OQL is a declarative, SQL-like query language for ODMG objects that also supports path expressions, method calls and complex results.

-- names of students older than 20
SELECT s.name
FROM   students s
WHERE  s.age > 20;

-- path expression over a relationship
SELECT c.title
FROM   students s, s.enrolled_in c
WHERE  s.name = "Ram";

OQL returns objects, structures or collections and can invoke object methods (e.g. s.totalCredits()), which pure SQL cannot.

Conclusion: The OODB model + ODMG standard (ODL for schema, OQL for queries) lets applications store and query rich, behaviour-bearing objects directly, removing impedance mismatch.

object-oriented-database
2long10 marks

What is concurrency control? Explain the two-phase locking protocol and timestamp-based ordering protocol for concurrency control with suitable examples.

Concurrency Control

Concurrency control is the activity of coordinating the concurrent execution of transactions in a multi-user database so that the result is equivalent to some serial execution (serializability) and the database stays consistent. It prevents anomalies such as lost update, dirty read, unrepeatable read and phantom problems that arise from interleaving.

1. Two-Phase Locking (2PL) Protocol

2PL is a lock-based protocol. Each transaction acquires shared (S) locks for reads and exclusive (X) locks for writes, and obeys two phases:

  • Growing phase: the transaction may acquire locks but cannot release any.
  • Shrinking phase: once the transaction releases its first lock, it may not acquire any new lock.

The point between the two phases (when it holds all its locks) is the lock point. 2PL guarantees conflict-serializability.

Example:

T1: lock-X(A); read A; A=A-50; write A;
    lock-X(B); read B; B=B+50; write B;
    unlock(A); unlock(B);   <- all acquire before any release

Variants: Strict 2PL (hold all X-locks till commit, avoids cascading rollback) and Rigorous 2PL (hold all locks till commit). Drawback: 2PL can cause deadlock.

2. Timestamp-Ordering (TO) Protocol

Each transaction TiT_i gets a unique timestamp TS(Ti)TS(T_i) at start (e.g. system clock or counter); older transactions have smaller timestamps. Each data item XX keeps:

  • W-TS(X)W\text{-}TS(X) = largest timestamp of any transaction that wrote XX;
  • R-TS(X)R\text{-}TS(X) = largest timestamp of any transaction that read XX.

The protocol enforces that conflicting operations execute in timestamp order:

Read XX by TiT_i: if TS(Ti)<W-TS(X)TS(T_i) < W\text{-}TS(X) → reject and roll back TiT_i (it is too late); else read, and set R-TS(X)=max(R-TS(X),TS(Ti))R\text{-}TS(X)=\max(R\text{-}TS(X),TS(T_i)).

Write XX by TiT_i: if TS(Ti)<R-TS(X)TS(T_i) < R\text{-}TS(X) or TS(Ti)<W-TS(X)TS(T_i) < W\text{-}TS(X) → reject and roll back TiT_i; else write and set W-TS(X)=TS(Ti)W\text{-}TS(X)=TS(T_i).

Example: Let TS(T1)=5TS(T_1)=5, TS(T2)=10TS(T_2)=10. If T2T_2 writes XX (so W-TS(X)=10W\text{-}TS(X)=10) and then T1T_1 tries to write XX, since TS(T1)=5<W-TS(X)=10TS(T_1)=5 < W\text{-}TS(X)=10, T1T_1 is rolled back and restarted with a new timestamp.

TO is deadlock-free (no waiting) but may cause starvation/cascading rollbacks; the Thomas Write Rule ignores obsolete writes to improve concurrency.

Comparison: 2PL uses locking and may deadlock; TO uses timestamps, avoids deadlock but rolls back more transactions.

transactionconcurrency
3long10 marks

What is a NoSQL database? Explain the characteristics of NoSQL systems and discuss the CAP theorem. Compare document-based, key-value, column-based and graph-based NoSQL data stores.

NoSQL Database

A NoSQL (Not Only SQL) database is a non-relational data store designed to handle large volumes of unstructured/semi-structured data with horizontal scalability, high availability and flexible (schema-less) data models. They emerged to meet Big-Data and web-scale needs that relational databases struggle with.

Characteristics of NoSQL Systems

  • Schema-less / flexible schema – records need not share a fixed structure.
  • Horizontal scalability – scale out across commodity servers via sharding/replication.
  • High availability and partition tolerance – distributed, fault-tolerant design.
  • BASE model (Basically Available, Soft state, Eventual consistency) instead of strict ACID.
  • Simple, fast access using keys/queries rather than complex joins.
  • Distributed and replicated for performance and reliability.

CAP Theorem

The CAP theorem (Brewer's theorem) states that a distributed data system can simultaneously guarantee at most two of the following three properties:

  • Consistency (C): every read sees the most recent write.
  • Availability (A): every request gets a (non-error) response.
  • Partition tolerance (P): the system keeps working despite network partitions.

Since network partitions are unavoidable in distributed systems, P must be tolerated, forcing a trade-off between C and A: a CP system (e.g. HBase, MongoDB defaults) sacrifices availability for consistency, while an AP system (e.g. Cassandra, DynamoDB) sacrifices strong consistency for availability (eventual consistency).

Comparison of NoSQL Data Stores

TypeData modelBest forExamples
Key-Value(key → value) opaque blobCaching, sessions, simple lookupsRedis, DynamoDB, Riak
DocumentSelf-describing documents (JSON/BSON/XML), nestedContent mgmt, catalogs, flexible recordsMongoDB, CouchDB
Column-family (wide-column)Rows with dynamic column families, stored by columnAnalytics, time-series, huge sparse tablesCassandra, HBase
GraphNodes + edges with propertiesSocial networks, recommendations, path queriesNeo4j, OrientDB

Summary: Choose key-value for speed and simplicity, document for flexible records, column-family for write-heavy analytical workloads, and graph when relationships are first-class.

nosqlbigdata
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

What is a deductive database? Explain the use of Datalog with an example.

Deductive Database

A deductive database is a database system that can derive (deduce) new facts from the facts and rules stored in it, using logical inference. It combines a relational-style fact base (extensional database, EDB) with deduction rules (intensional database, IDB) written in a logic language, and answers queries by applying the rules.

Datalog

Datalog is a declarative, logic-based query language (a subset of Prolog) used in deductive databases. A Datalog program consists of:

  • Facts – ground assertions, e.g. parent(ram, sita).
  • Rules – of the form Head :- Body, meaning the head is true if the body holds.

Example – ancestor relation:

% Facts (EDB)
parent(ram, sita).
parent(sita, hari).

% Rules (IDB)
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

% Query
?- ancestor(ram, Y).

The recursive rule lets Datalog deduce ancestor(ram, sita) and ancestor(ram, hari) — something a single SQL statement (without recursion) cannot express naturally. Datalog rules are safe (every head variable appears in a non-negated body literal) and evaluated bottom-up to a fixpoint.

deductive-database
5short5 marks

Explain the two-phase locking (2PL) protocol for concurrency control.

Two-Phase Locking (2PL) Protocol

2PL is a concurrency-control protocol that ensures conflict-serializability by requiring every transaction to acquire and release locks in two distinct phases. Locks are of two kinds: shared (S) for reading and exclusive (X) for writing.

The two phases:

  1. Growing (expanding) phase – the transaction may acquire locks (S or X) but may not release any lock.
  2. Shrinking (contracting) phase – once the transaction releases its first lock, it may release more locks but may not acquire any new lock.

The moment the transaction holds all the locks it will ever need is called the lock point. Because all acquisitions precede all releases, 2PL guarantees a serializable schedule.

Example:

T1: lock-X(A) read/write A  -- growing
    lock-X(B) read/write B  -- growing (still no release)
    unlock(A) unlock(B)     -- shrinking

Variants: Strict 2PL (release all X-locks only at commit, prevents cascading rollback) and Rigorous 2PL (release all locks at commit). Limitation: standard 2PL can lead to deadlocks, which must be handled by detection or prevention.

concurrency
6short5 marks

Explain the two-phase commit (2PC) protocol used in distributed transactions.

Two-Phase Commit (2PC) Protocol

2PC is an atomic commitment protocol that ensures all participating sites in a distributed transaction either all commit or all abort, preserving atomicity across sites. One site acts as the coordinator and the others as participants (cohorts).

Phase 1 – Prepare / Voting phase

  1. The coordinator sends a PREPARE message to all participants.
  2. Each participant decides whether it can commit; it writes a ready/prepare record to its log and replies VOTE-COMMIT (Yes) if able, or VOTE-ABORT (No) if not.

Phase 2 – Commit / Decision phase

  1. If the coordinator receives Yes from all participants, it writes a commit record and sends GLOBAL-COMMIT; if any participant voted No (or timed out), it sends GLOBAL-ABORT.
  2. Each participant commits or aborts accordingly, writes the outcome to its log, and sends an ACK to the coordinator, which then completes.

Logging at each step lets the protocol recover after failures. Drawback: 2PC is a blocking protocol — if the coordinator fails after participants vote Yes but before the decision is sent, participants holding locks must wait (block) until the coordinator recovers. (Three-Phase Commit reduces this blocking.)

distributed-transaction
7short5 marks

What is indexing? Differentiate between primary, secondary and clustering indexes.

Indexing

Indexing is a data-structure technique that creates an auxiliary access structure (typically <search-key, pointer> entries, often organized as a B+-tree or hash) to speed up retrieval of records based on a search key, avoiding full-table scans. It trades extra storage and update cost for faster reads.

Primary vs Secondary vs Clustering Index

FeaturePrimary indexClustering indexSecondary index
Built onOrdering key field that is unique (primary key)Ordering field that is non-key (duplicates allowed)A non-ordering field (key or non-key)
Data file orderFile is physically ordered on this fieldFile is physically ordered on this fieldFile not ordered on this field
EntriesOne per data block (sparse)One per distinct value of the fieldUsually dense (one per record)
Number allowedOne per tableOne per tableMany per table

Key points:

  • A primary index is a sparse index on the unique ordering key.
  • A clustering index orders the file on a non-unique field (e.g. Department), grouping records with the same value together.
  • A secondary index provides an alternate access path on a field the file is not sorted by; it is dense and there can be several secondary indexes on a table.
indexing
8short5 marks

Explain the architecture of parallel databases. What is inter-query and intra-query parallelism?

Architecture of Parallel Databases

A parallel database uses multiple CPUs and disks working in parallel to improve throughput (transactions/second) and reduce response time. Three main hardware architectures exist:

  • Shared-memory: all processors share a common memory and disks via a fast bus. Easy programming but limited scalability (memory-bus contention).
  • Shared-disk: each processor has private memory but all share the disks. Better fault tolerance; interconnect to disks can bottleneck.
  • Shared-nothing: each processor has its own memory and disk; nodes communicate only by message passing. Most scalable — used by most large parallel/NoSQL systems.
  • (Hierarchical/hybrid) combines the above.

Data is partitioned (round-robin, hash, or range) across disks so operations run in parallel.

Inter-query vs Intra-query Parallelism

  • Inter-query parallelism: different queries/transactions run in parallel on different processors. It increases overall throughput but does not speed up an individual query. Common in OLTP systems.
  • Intra-query parallelism: a single query is executed in parallel across processors, reducing that query's response time. It has two forms:
    • Intra-operation parallelism – one operation (e.g. a large sort or join) is parallelized across processors.
    • Inter-operation parallelism – different operators of the same query plan run in parallel (e.g. pipelined or independent operators).
parallel-database
9short5 marks

What is an object-relational database? How does it differ from a purely relational database?

Object-Relational Database (ORDBMS)

An object-relational database extends the traditional relational model with object-oriented features while keeping SQL and tables as the foundation. It adds support for user-defined types (UDTs), complex/structured attributes, collection types (arrays, nested tables), inheritance of types, methods, references (REF), and large objects (BLOB/CLOB). SQL:1999 standardized these features; examples include Oracle, PostgreSQL and IBM DB2.

Example:

CREATE TYPE Address AS (street VARCHAR(50), city VARCHAR(30));
CREATE TABLE Person (
  id   INT PRIMARY KEY,
  name VARCHAR(40),
  addr Address          -- structured (object) attribute
);

Difference from a Purely Relational Database

AspectPure RelationalObject-Relational
Data valuesAtomic (1NF, simple types only)Complex/structured, multi-valued, nested types allowed
TypesBuilt-in scalar typesUser-defined types + methods
InheritanceNot supportedType/table inheritance supported
BehaviourData onlyData + methods/functions on types
Object identity / referencesNone (value-based keys)REF types give object references

In short, an ORDBMS keeps relational completeness and SQL but relaxes the atomic-value (1NF) restriction and adds OO concepts, bridging relational and object-oriented databases.

object-relational
10short5 marks

What is XML data model? Explain how XML is used to represent and query data in databases.

XML Data Model

XML (eXtensible Markup Language) is a self-describing, text-based markup language for representing semi-structured, hierarchical data. The XML data model is a tree (ordered, labelled tree) in which:

  • Elements (tags) are the nodes and may nest to any depth,
  • Attributes carry additional named values on elements,
  • Text (PCDATA) holds leaf content,
  • and a single root element contains the whole document.

Schemas are described using a DTD or XML Schema (XSD), which constrain element/attribute structure and types.

Example:

<students>
  <student id="1">
    <name>Ram</name>
    <age>22</age>
  </student>
</students>

Using XML in Databases (Representation & Querying)

  • Representation/storage: XML can be stored as a whole document (native XML / XML column type), shredded into relational tables, or kept in a native XML database.
  • Querying: XML is queried with XPath (path navigation) and XQuery (FLWOR — for, let, where, order by, return); transformation uses XSLT.
(: names of students older than 20 :)
for $s in doc("students.xml")//student
where $s/age > 20
return $s/name

XML is widely used for data exchange and interoperability between heterogeneous databases and web services because it is platform-independent and self-describing.

xml
11short5 marks

Explain log-based recovery techniques in a database system.

Log-Based Recovery

Log-based recovery uses a log (journal) — a sequential file of records describing every database modification — to restore the database to a consistent state after a failure, ensuring atomicity and durability.

Log records

For a transaction TiT_i updating data item XX from old to new value, the log holds:

  • <T_i start>
  • <T_i, X, old_value, new_value> (update record)
  • <T_i commit> or <T_i abort>

Log records are written before the corresponding data is written to disk — the Write-Ahead Logging (WAL) rule.

Two main techniques

1. Deferred update (NO-UNDO): database changes are written to disk only after the transaction commits. On failure, uncommitted transactions left no disk changes, so only REDO (using new values) is needed for committed transactions; no UNDO required.

2. Immediate update (UNDO/REDO): changes may be written to disk before commit. Recovery uses both:

  • UNDO (restore old values) for transactions that started but did not commit,
  • REDO (apply new values) for transactions that committed before the crash.

Checkpoints

Periodic checkpoints (<checkpoint>) flush buffers and log to disk so recovery need only scan the log back to the last checkpoint, limiting the amount of work. Transactions committed before the checkpoint need no redo.

recovery
12short5 marks

Write short notes on the MapReduce programming model for big data processing.

MapReduce Programming Model

MapReduce is a programming model (introduced by Google, implemented in Hadoop) for processing very large datasets in parallel across a cluster of commodity machines. The programmer writes two functions and the framework handles parallelization, data distribution, scheduling and fault tolerance.

The two phases

  • Map: takes input key-value pairs and emits a set of intermediate key-value pairs. map(k1,v1)list(k2,v2)\text{map}(k_1, v_1) \rightarrow \text{list}(k_2, v_2)
  • Shuffle & Sort: the framework groups all intermediate values by key.
  • Reduce: processes each key with its list of values and produces the final output. reduce(k2,list(v2))list(v3)\text{reduce}(k_2, \text{list}(v_2)) \rightarrow \text{list}(v_3)

Example – Word Count

map(docID, text):
   for each word w in text:
       emit(w, 1)

reduce(word, counts):
   emit(word, sum(counts))

Key points

  • Runs on a distributed file system (e.g. HDFS); computation is moved to the data (data locality).
  • Scalable and fault-tolerant – failed tasks are automatically re-executed.
  • Suited to batch processing of big data; less suited to iterative or low-latency tasks (where Spark is preferred).
mapreduce

Frequently asked questions

Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2078?
The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2078 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Advanced Database (BSc CSIT, CSC461) 2078 paper come with solutions?
Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2078 paper?
The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2078 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?
Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.