BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2078 Nepal

Q: Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2078?

The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2078 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Advanced Database (BSc CSIT, CSC461) 2078 paper come with solutions?

Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2078 paper?

The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2078 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?

Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.

Question

1Long answer10 marks

Explain the object-oriented database model. Discuss the ODMG object model along with the Object Definition Language (ODL) and Object Query Language (OQL) with examples.

object-oriented-database

Answer 1

Object-Oriented Database (OODB) Model

An object-oriented database stores data as objects, the same way object-oriented programming languages represent entities. It combines database capabilities (persistence, concurrency, recovery, querying) with OO concepts such as objects, classes, encapsulation, inheritance, polymorphism, and object identity (OID). Each object has a unique, system-generated OID independent of its attribute values, and complex objects (sets, lists, nested objects) can be stored directly without flattening into rows.

Key features: complex object support, persistence of program objects, object identity, encapsulation of state and behaviour (methods), class hierarchy with inheritance, and overcoming the impedance mismatch between programming languages and relational tables.

ODMG Object Model

The ODMG (Object Data Management Group) standard defines a common object model and languages so OODBs are portable across vendors. Its main components are:

Objects and Literals – an object has an OID, state and behaviour; a literal has only a value (no OID).
Types: atomic, structured, collection (set, bag, list, array, dictionary).
Interfaces and Classes defining attributes, relationships and operations.
Three languages: ODL (schema definition), OQL (querying), and language bindings (C++, Java, Smalltalk).

Object Definition Language (ODL)

ODL is used to define the schema (classes, attributes, relationships, methods), independent of any programming language.

class Student (extent students key roll_no) {
  attribute string  roll_no;
  attribute string  name;
  attribute short   age;
  relationship set<Course> enrolled_in
     inverse Course::has_students;
  short  totalCredits();
};

class Course (extent courses key code) {
  attribute string code;
  attribute string title;
  relationship set<Student> has_students
     inverse Student::enrolled_in;
};

Object Query Language (OQL)

OQL is a declarative, SQL-like query language for ODMG objects that also supports path expressions, method calls and complex results.

-- names of students older than 20
SELECT s.name
FROM   students s
WHERE  s.age > 20;

-- path expression over a relationship
SELECT c.title
FROM   students s, s.enrolled_in c
WHERE  s.name = "Ram";

OQL returns objects, structures or collections and can invoke object methods (e.g. s.totalCredits()), which pure SQL cannot.

Conclusion: The OODB model + ODMG standard (ODL for schema, OQL for queries) lets applications store and query rich, behaviour-bearing objects directly, removing impedance mismatch.

Answer 2

Concurrency Control

Concurrency control is the activity of coordinating the concurrent execution of transactions in a multi-user database so that the result is equivalent to some serial execution (serializability) and the database stays consistent. It prevents anomalies such as lost update, dirty read, unrepeatable read and phantom problems that arise from interleaving.

1. Two-Phase Locking (2PL) Protocol

2PL is a lock-based protocol. Each transaction acquires shared (S) locks for reads and exclusive (X) locks for writes, and obeys two phases:

Growing phase: the transaction may acquire locks but cannot release any.
Shrinking phase: once the transaction releases its first lock, it may not acquire any new lock.

The point between the two phases (when it holds all its locks) is the lock point. 2PL guarantees conflict-serializability.

Example:

T1: lock-X(A); read A; A=A-50; write A;
    lock-X(B); read B; B=B+50; write B;
    unlock(A); unlock(B);   <- all acquire before any release

Variants: Strict 2PL (hold all X-locks till commit, avoids cascading rollback) and Rigorous 2PL (hold all locks till commit). Drawback: 2PL can cause deadlock.

2. Timestamp-Ordering (TO) Protocol

Each transaction $T_i$ gets a unique timestamp $TS(T_i)$ at start (e.g. system clock or counter); older transactions have smaller timestamps. Each data item $X$ keeps:

$W\text{-}TS(X)$ = largest timestamp of any transaction that wrote $X$ ;
$R\text{-}TS(X)$ = largest timestamp of any transaction that read $X$ .

The protocol enforces that conflicting operations execute in timestamp order:

Read $X$ by $T_i$ : if $TS(T_i) < W\text{-}TS(X)$ → reject and roll back $T_i$ (it is too late); else read, and set $R\text{-}TS(X)=\max(R\text{-}TS(X),TS(T_i))$ .

Write $X$ by $T_i$ : if $TS(T_i) < R\text{-}TS(X)$ or $TS(T_i) < W\text{-}TS(X)$ → reject and roll back $T_i$ ; else write and set $W\text{-}TS(X)=TS(T_i)$ .

Example: Let $TS(T_1)=5$ , $TS(T_2)=10$ . If $T_2$ writes $X$ (so $W\text{-}TS(X)=10$ ) and then $T_1$ tries to write $X$ , since $TS(T_1)=5 < W\text{-}TS(X)=10$ , $T_1$ is rolled back and restarted with a new timestamp.

TO is deadlock-free (no waiting) but may cause starvation/cascading rollbacks; the Thomas Write Rule ignores obsolete writes to improve concurrency.

Comparison: 2PL uses locking and may deadlock; TO uses timestamps, avoids deadlock but rolls back more transactions.

Answer 3

NoSQL Database

A NoSQL (Not Only SQL) database is a non-relational data store designed to handle large volumes of unstructured/semi-structured data with horizontal scalability, high availability and flexible (schema-less) data models. They emerged to meet Big-Data and web-scale needs that relational databases struggle with.

Characteristics of NoSQL Systems

Schema-less / flexible schema – records need not share a fixed structure.
Horizontal scalability – scale out across commodity servers via sharding/replication.
High availability and partition tolerance – distributed, fault-tolerant design.
BASE model (Basically Available, Soft state, Eventual consistency) instead of strict ACID.
Simple, fast access using keys/queries rather than complex joins.
Distributed and replicated for performance and reliability.

CAP Theorem

The CAP theorem (Brewer's theorem) states that a distributed data system can simultaneously guarantee at most two of the following three properties:

Consistency (C): every read sees the most recent write.
Availability (A): every request gets a (non-error) response.
Partition tolerance (P): the system keeps working despite network partitions.

Since network partitions are unavoidable in distributed systems, P must be tolerated, forcing a trade-off between C and A: a CP system (e.g. HBase, MongoDB defaults) sacrifices availability for consistency, while an AP system (e.g. Cassandra, DynamoDB) sacrifices strong consistency for availability (eventual consistency).

Comparison of NoSQL Data Stores

Type	Data model	Best for	Examples
Key-Value	`(key → value)` opaque blob	Caching, sessions, simple lookups	Redis, DynamoDB, Riak
Document	Self-describing documents (JSON/BSON/XML), nested	Content mgmt, catalogs, flexible records	MongoDB, CouchDB
Column-family (wide-column)	Rows with dynamic column families, stored by column	Analytics, time-series, huge sparse tables	Cassandra, HBase
Graph	Nodes + edges with properties	Social networks, recommendations, path queries	Neo4j, OrientDB

Summary: Choose key-value for speed and simplicity, document for flexible records, column-family for write-heavy analytical workloads, and graph when relationships are first-class.

Answer 4

Deductive Database

A deductive database is a database system that can derive (deduce) new facts from the facts and rules stored in it, using logical inference. It combines a relational-style fact base (extensional database, EDB) with deduction rules (intensional database, IDB) written in a logic language, and answers queries by applying the rules.

Datalog

Datalog is a declarative, logic-based query language (a subset of Prolog) used in deductive databases. A Datalog program consists of:

Facts – ground assertions, e.g. parent(ram, sita).
Rules – of the form Head :- Body, meaning the head is true if the body holds.

Example – ancestor relation:

% Facts (EDB)
parent(ram, sita).
parent(sita, hari).

% Rules (IDB)
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

% Query
?- ancestor(ram, Y).

The recursive rule lets Datalog deduce ancestor(ram, sita) and ancestor(ram, hari) — something a single SQL statement (without recursion) cannot express naturally. Datalog rules are safe (every head variable appears in a non-negated body literal) and evaluated bottom-up to a fixpoint.

Answer 5

Two-Phase Locking (2PL) Protocol

2PL is a concurrency-control protocol that ensures conflict-serializability by requiring every transaction to acquire and release locks in two distinct phases. Locks are of two kinds: shared (S) for reading and exclusive (X) for writing.

The two phases:

Growing (expanding) phase – the transaction may acquire locks (S or X) but may not release any lock.
Shrinking (contracting) phase – once the transaction releases its first lock, it may release more locks but may not acquire any new lock.

The moment the transaction holds all the locks it will ever need is called the lock point. Because all acquisitions precede all releases, 2PL guarantees a serializable schedule.

Example:

T1: lock-X(A) read/write A  -- growing
    lock-X(B) read/write B  -- growing (still no release)
    unlock(A) unlock(B)     -- shrinking

Variants: Strict 2PL (release all X-locks only at commit, prevents cascading rollback) and Rigorous 2PL (release all locks at commit). Limitation: standard 2PL can lead to deadlocks, which must be handled by detection or prevention.

Answer 6

Two-Phase Commit (2PC) Protocol

2PC is an atomic commitment protocol that ensures all participating sites in a distributed transaction either all commit or all abort, preserving atomicity across sites. One site acts as the coordinator and the others as participants (cohorts).

Phase 1 – Prepare / Voting phase

The coordinator sends a PREPARE message to all participants.
Each participant decides whether it can commit; it writes a ready/prepare record to its log and replies VOTE-COMMIT (Yes) if able, or VOTE-ABORT (No) if not.

Phase 2 – Commit / Decision phase

If the coordinator receives Yes from all participants, it writes a commit record and sends GLOBAL-COMMIT; if any participant voted No (or timed out), it sends GLOBAL-ABORT.
Each participant commits or aborts accordingly, writes the outcome to its log, and sends an ACK to the coordinator, which then completes.

Logging at each step lets the protocol recover after failures. Drawback: 2PC is a blocking protocol — if the coordinator fails after participants vote Yes but before the decision is sent, participants holding locks must wait (block) until the coordinator recovers. (Three-Phase Commit reduces this blocking.)

Answer 7

Indexing

Indexing is a data-structure technique that creates an auxiliary access structure (typically <search-key, pointer> entries, often organized as a B+-tree or hash) to speed up retrieval of records based on a search key, avoiding full-table scans. It trades extra storage and update cost for faster reads.

Primary vs Secondary vs Clustering Index

Feature	Primary index	Clustering index	Secondary index
Built on	Ordering key field that is unique (primary key)	Ordering field that is non-key (duplicates allowed)	A non-ordering field (key or non-key)
Data file order	File is physically ordered on this field	File is physically ordered on this field	File not ordered on this field
Entries	One per data block (sparse)	One per distinct value of the field	Usually dense (one per record)
Number allowed	One per table	One per table	Many per table

Key points:

A primary index is a sparse index on the unique ordering key.
A clustering index orders the file on a non-unique field (e.g. Department), grouping records with the same value together.
A secondary index provides an alternate access path on a field the file is not sorted by; it is dense and there can be several secondary indexes on a table.

Answer 8

Architecture of Parallel Databases

A parallel database uses multiple CPUs and disks working in parallel to improve throughput (transactions/second) and reduce response time. Three main hardware architectures exist:

Shared-memory: all processors share a common memory and disks via a fast bus. Easy programming but limited scalability (memory-bus contention).
Shared-disk: each processor has private memory but all share the disks. Better fault tolerance; interconnect to disks can bottleneck.
Shared-nothing: each processor has its own memory and disk; nodes communicate only by message passing. Most scalable — used by most large parallel/NoSQL systems.
(Hierarchical/hybrid) combines the above.

Data is partitioned (round-robin, hash, or range) across disks so operations run in parallel.

Inter-query vs Intra-query Parallelism

Inter-query parallelism: different queries/transactions run in parallel on different processors. It increases overall throughput but does not speed up an individual query. Common in OLTP systems.
Intra-query parallelism: a single query is executed in parallel across processors, reducing that query's response time. It has two forms:
- Intra-operation parallelism – one operation (e.g. a large sort or join) is parallelized across processors.
- Inter-operation parallelism – different operators of the same query plan run in parallel (e.g. pipelined or independent operators).

Answer 9

Object-Relational Database (ORDBMS)

An object-relational database extends the traditional relational model with object-oriented features while keeping SQL and tables as the foundation. It adds support for user-defined types (UDTs), complex/structured attributes, collection types (arrays, nested tables), inheritance of types, methods, references (REF), and large objects (BLOB/CLOB). SQL:1999 standardized these features; examples include Oracle, PostgreSQL and IBM DB2.

Example:

CREATE TYPE Address AS (street VARCHAR(50), city VARCHAR(30));
CREATE TABLE Person (
  id   INT PRIMARY KEY,
  name VARCHAR(40),
  addr Address          -- structured (object) attribute
);

Difference from a Purely Relational Database

Aspect	Pure Relational	Object-Relational
Data values	Atomic (1NF, simple types only)	Complex/structured, multi-valued, nested types allowed
Types	Built-in scalar types	User-defined types + methods
Inheritance	Not supported	Type/table inheritance supported
Behaviour	Data only	Data + methods/functions on types
Object identity / references	None (value-based keys)	REF types give object references

In short, an ORDBMS keeps relational completeness and SQL but relaxes the atomic-value (1NF) restriction and adds OO concepts, bridging relational and object-oriented databases.

Answer 10

XML Data Model

XML (eXtensible Markup Language) is a self-describing, text-based markup language for representing semi-structured, hierarchical data. The XML data model is a tree (ordered, labelled tree) in which:

Elements (tags) are the nodes and may nest to any depth,
Attributes carry additional named values on elements,
Text (PCDATA) holds leaf content,
and a single root element contains the whole document.

Schemas are described using a DTD or XML Schema (XSD), which constrain element/attribute structure and types.

Example:

<students>
  <student id="1">
    <name>Ram</name>
    <age>22</age>
  </student>
</students>

Using XML in Databases (Representation & Querying)

Representation/storage: XML can be stored as a whole document (native XML / XML column type), shredded into relational tables, or kept in a native XML database.
Querying: XML is queried with XPath (path navigation) and XQuery (FLWOR — for, let, where, order by, return); transformation uses XSLT.

(: names of students older than 20 :)
for $s in doc("students.xml")//student
where $s/age > 20
return $s/name

XML is widely used for data exchange and interoperability between heterogeneous databases and web services because it is platform-independent and self-describing.

Answer 11

Log-Based Recovery

Log-based recovery uses a log (journal) — a sequential file of records describing every database modification — to restore the database to a consistent state after a failure, ensuring atomicity and durability.

Log records

For a transaction $T_i$ updating data item $X$ from old to new value, the log holds:

<T_i start>
<T_i, X, old_value, new_value> (update record)
<T_i commit> or <T_i abort>

Log records are written before the corresponding data is written to disk — the Write-Ahead Logging (WAL) rule.

Two main techniques

1. Deferred update (NO-UNDO): database changes are written to disk only after the transaction commits. On failure, uncommitted transactions left no disk changes, so only REDO (using new values) is needed for committed transactions; no UNDO required.

2. Immediate update (UNDO/REDO): changes may be written to disk before commit. Recovery uses both:

UNDO (restore old values) for transactions that started but did not commit,
REDO (apply new values) for transactions that committed before the crash.

Checkpoints

Periodic checkpoints (<checkpoint>) flush buffers and log to disk so recovery need only scan the log back to the last checkpoint, limiting the amount of work. Transactions committed before the checkpoint need no redo.

Answer 12

MapReduce Programming Model

MapReduce is a programming model (introduced by Google, implemented in Hadoop) for processing very large datasets in parallel across a cluster of commodity machines. The programmer writes two functions and the framework handles parallelization, data distribution, scheduling and fault tolerance.

The two phases

Map: takes input key-value pairs and emits a set of intermediate key-value pairs. $\text{map}(k_1, v_1) \rightarrow \text{list}(k_2, v_2)$
Shuffle & Sort: the framework groups all intermediate values by key.
Reduce: processes each key with its list of values and produces the final output. $\text{reduce}(k_2, \text{list}(v_2)) \rightarrow \text{list}(v_3)$

Example – Word Count

map(docID, text):
   for each word w in text:
       emit(w, 1)

reduce(word, counts):
   emit(word, sum(counts))

Key points

Runs on a distributed file system (e.g. HDFS); computation is moved to the data (data locality).
Scalable and fault-tolerant – failed tasks are automatically re-executed.
Suited to batch processing of big data; less suited to iterative or low-latency tasks (where Spark is preferred).

Level	BSc CSIT (TU)
Stream	Science
Subject	Advanced Database (BSc CSIT, CSC461)
Year	2078 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2078 Nepal

Section A: Long Answer Questions

Object-Oriented Database (OODB) Model

ODMG Object Model

Object Definition Language (ODL)

Object Query Language (OQL)

Concurrency Control

1. Two-Phase Locking (2PL) Protocol

2. Timestamp-Ordering (TO) Protocol

NoSQL Database

Characteristics of NoSQL Systems

CAP Theorem

Comparison of NoSQL Data Stores

Section B: Short Answer Questions

Deductive Database

Datalog

Two-Phase Locking (2PL) Protocol

Two-Phase Commit (2PC) Protocol

Phase 1 – Prepare / Voting phase

Phase 2 – Commit / Decision phase

Indexing

Primary vs Secondary vs Clustering Index

Architecture of Parallel Databases

Inter-query vs Intra-query Parallelism

Object-Relational Database (ORDBMS)

Difference from a Purely Relational Database

XML Data Model

Using XML in Databases (Representation & Querying)

Log-Based Recovery

Log records

Two main techniques

Checkpoints

MapReduce Programming Model

The two phases

Example – Word Count

Key points

Frequently asked questions