BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2075 Nepal

Q: Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2075?

The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2075 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Advanced Database (BSc CSIT, CSC461) 2075 paper come with solutions?

Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2075 paper?

The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2075 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?

Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.

Question

1Long answer10 marks

What is concurrency control? Explain the two-phase locking protocol and timestamp-based ordering protocol for concurrency control with suitable examples.

transactionconcurrency

Answer 1

Concurrency Control

Concurrency control is the activity of coordinating the simultaneous execution of multiple transactions in a multi-user database so that the result is equivalent to some serial (one-at-a-time) execution. It prevents anomalies such as lost updates, dirty reads, unrepeatable reads and phantom reads, thereby preserving the Isolation and Consistency properties of transactions.

1. Two-Phase Locking (2PL) Protocol

2PL is a lock-based concurrency-control protocol. A transaction must obtain a shared lock (S) before reading a data item and an exclusive lock (X) before writing it. Each transaction executes in two phases:

Growing (expanding) phase: the transaction may acquire locks but cannot release any.
Shrinking (contracting) phase: the transaction may release locks but cannot acquire any new lock.

The point where the transaction holds all its locks is called the lock point. 2PL guarantees conflict-serializability.

Example:

Step	T1	T2
1	`lock-X(A)`
2	`read(A); A=A-50; write(A)`
3	`lock-X(B)` (growing)
4	`unlock(A)` (shrinking)
5		`lock-X(A)` (waits, then proceeds)

Variants: Strict 2PL (all exclusive locks held until commit/abort — avoids cascading rollback) and Rigorous 2PL (all locks held until commit). Drawback: 2PL can cause deadlocks.

2. Timestamp-Ordering (TO) Protocol

Each transaction $T_i$ is assigned a unique timestamp $TS(T_i)$ at start (smaller = older). Each data item $Q$ keeps:

$W\text{-}TS(Q)$ : largest timestamp of any transaction that successfully wrote $Q$ .
$R\text{-}TS(Q)$ : largest timestamp of any transaction that successfully read $Q$ .

The protocol orders conflicting operations by timestamp:

On read(Q) by $T_i$ :

If $TS(T_i) < W\text{-}TS(Q)$ → $T_i$ needs a value already overwritten → reject and roll back $T_i$ .
Else execute read; set $R\text{-}TS(Q)=\max(R\text{-}TS(Q),\,TS(T_i))$ .

On write(Q) by $T_i$ :

If $TS(T_i) < R\text{-}TS(Q)$ → a younger transaction already read the old value → reject/roll back.
If $TS(T_i) < W\text{-}TS(Q)$ → write is obsolete → ignore (Thomas Write Rule) or reject.
Else execute write; set $W\text{-}TS(Q)=TS(T_i)$ .

Example: Suppose $TS(T1)=5$ , $TS(T2)=10$ . If $T2$ writes $A$ (now $W\text{-}TS(A)=10$ ) and later the older $T1$ tries to read(A), then $TS(T1)=5 < W\text{-}TS(A)=10$ , so $T1$ is rolled back and restarted with a new timestamp.

The TO protocol ensures serializability in timestamp order and is deadlock-free (no waiting), but may cause starvation/cascading rollbacks.

Answer 2

NoSQL Database

A NoSQL (Not Only SQL) database is a non-relational data store designed to handle large volumes of unstructured/semi-structured data with high scalability, availability and flexible schemas. NoSQL systems emerged to meet the needs of Big Data and web-scale applications where the fixed-schema relational model and ACID overhead become bottlenecks.

Characteristics of NoSQL Systems

Schema-less / flexible schema: records need not follow a fixed structure.
Horizontal scalability: scale out by adding commodity servers (sharding/partitioning).
High availability & fault tolerance: data is replicated across nodes.
BASE model (Basically Available, Soft state, Eventually consistent) instead of strict ACID.
Distributed architecture with no single point of failure.
Optimized for specific data models (documents, key-value, columns, graphs).

CAP Theorem

Proposed by Eric Brewer, the CAP theorem states that a distributed data store can simultaneously guarantee only two of the following three properties:

Consistency (C): every read receives the most recent write (all nodes see the same data).
Availability (A): every request receives a (non-error) response.
Partition tolerance (P): the system continues to operate despite network partitions.

Since network partitions are unavoidable in distributed systems, real systems must choose between CP (e.g., HBase, MongoDB) and AP (e.g., Cassandra, DynamoDB) during a partition.

Comparison of NoSQL Data Stores

Type	Data model	Example DBs	Best for
Key-Value	Simple `(key → value)` pairs; value is opaque	Redis, DynamoDB, Riak	Caching, session stores, fast lookups
Document	Self-describing documents (JSON/BSON/XML) keyed by id; nested fields queryable	MongoDB, CouchDB	Content management, catalogs, flexible records
Column-family (wide-column)	Rows grouped into column families; sparse, columnar storage	Cassandra, HBase, Bigtable	Time-series, analytics, write-heavy big data
Graph	Nodes + edges with properties; relationships are first-class	Neo4j, JanusGraph	Social networks, recommendation, fraud detection

Summary: Key-value gives fastest simple access; document adds queryable structure; column-family scales for huge sparse datasets; graph excels where relationships dominate.

Answer 3

Enhanced Entity-Relationship (EER) Model

The EER model extends the basic ER model with additional semantic (object-oriented) concepts to capture complex relationships in modern applications. It adds subclasses/superclasses, specialization, generalization, categorization (union types), and aggregation, along with attribute and relationship inheritance.

1. Specialization (Top-Down)

The process of defining a set of subclasses of an entity type (the superclass) based on some distinguishing characteristic. The subclasses inherit the attributes and relationships of the superclass and add their own specific ones.

Example: EMPLOYEE is specialized into SECRETARY, ENGINEER, and TECHNICIAN. Diagram: a superclass box EMPLOYEE connected through a circle ○ (with subset symbol ⊂) to the three subclass boxes; JobType is the defining attribute.

2. Generalization (Bottom-Up)

The reverse of specialization: identifying common features of several entity types and abstracting them into a single generalized superclass.

Example: CAR and TRUCK share attributes (VehicleId, Price) → generalized into superclass VEHICLE. Diagram: CAR and TRUCK boxes connected upward through a circle to VEHICLE.

Constraints on specialization/generalization:

Disjointness: disjoint (d) — an entity belongs to at most one subclass; overlapping (o) — may belong to several.
Completeness: total (double line) — every superclass entity must belong to some subclass; partial (single line) — may belong to none.

3. Categorization (Union Type)

A category (union type) is a subclass whose members are a subset of the union of entities from several different superclasses. Unlike a shared subclass, each member relates to one of the superclasses.

Example: OWNER is a category of the union of PERSON, BANK, and COMPANY (an account owner can be any one of these). Diagram: a circle marked ∪ connecting PERSON, BANK, COMPANY to the OWNER category box.

4. Aggregation

Aggregation is an abstraction that treats a relationship (together with its participating entities) as a higher-level entity, allowing relationships among relationships.

Example: the relationship WORKS-ON between EMPLOYEE and PROJECT is aggregated, and this aggregate participates in a USES relationship with MACHINERY. Diagram: a dashed rectangle enclosing the EMPLOYEE—WORKS-ON—PROJECT relationship, with a line from that box to the USES diamond and on to MACHINERY.

Conclusion: These EER constructs provide richer, object-oriented semantics (inheritance, union, abstraction) that the basic ER model cannot express.

Answer 4

Deductive Database

A deductive database is a database system that can derive new facts (implicit data) from the facts and rules stored in it using logical inference. It combines a relational database with a rule-based logic system, typically based on first-order logic / Datalog. It has two components:

Facts (extensional database, EDB): the base data, like relational tuples.
Rules (intensional database, IDB): logical rules used to deduce additional facts.

Datalog

Datalog is a declarative, non-procedural query language (a subset of Prolog, without function symbols) used to express facts and rules. A rule has the form:

HEAD :- BODY.   % read as: HEAD is true IF BODY is true

Example — family relationships:

% Facts (EDB)
parent(ram, sita).
parent(sita, hari).
parent(hari, gita).

% Rules (IDB)
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
ancestor(X, Y)    :- parent(X, Y).
ancestor(X, Y)    :- parent(X, Z), ancestor(Z, Y).

Query: ?- grandparent(ram, Q). deduces Q = hari (because parent(ram,sita) and parent(sita,hari)). The recursive ancestor rule shows Datalog's key strength: expressing recursive queries that plain SQL handled poorly.

Answer 5

Two-Phase Locking (2PL) Protocol

2PL is a lock-based concurrency-control protocol that guarantees conflict-serializability. A transaction must acquire a shared lock (S) before reading and an exclusive lock (X) before writing a data item. Every transaction follows two phases:

Growing (expanding) phase — the transaction may acquire locks but may not release any lock.
Shrinking (contracting) phase — the transaction may release locks but may not acquire any new lock.

Once a transaction releases its first lock, it enters the shrinking phase and can never lock again. The moment it holds the maximum number of locks is the lock point; serializing transactions by their lock points yields a serializable schedule.

Variants:

Strict 2PL: holds all exclusive (write) locks until commit/abort → prevents cascading rollbacks.
Rigorous 2PL: holds all locks (S and X) until commit.

Advantages: ensures serializability and recoverability (strict version). Disadvantages: can cause deadlocks and reduces concurrency due to waiting.

Answer 6

Two-Phase Commit (2PC) Protocol

2PC is an atomic commitment protocol that ensures all sites participating in a distributed transaction either all commit or all abort, preserving atomicity across the network. One site acts as the coordinator and the rest are participants (cohorts).

Phase 1 — Prepare / Voting Phase

The coordinator sends a PREPARE message to all participants.
Each participant executes the transaction up to commit point, force-writes a <ready T> log record, and replies VOTE-COMMIT (Yes) if it can commit, or VOTE-ABORT (No) otherwise.

Phase 2 — Commit / Decision Phase

If all participants voted Yes, the coordinator writes <commit T> to its log and sends GLOBAL-COMMIT to all; otherwise it sends GLOBAL-ABORT.
Each participant commits or aborts as instructed, writes the outcome to its log, and returns an ACK.
After receiving all ACKs, the coordinator writes <complete T> and the transaction ends.

Properties: guarantees atomicity even with failures (timeouts trigger abort). Drawback: it is a blocking protocol — if the coordinator crashes after participants have voted Yes but before sending the decision, participants remain blocked holding locks. (Three-Phase Commit reduces this blocking.)

Answer 7

Indexing

Indexing is a data-structure technique that creates an auxiliary access structure on one or more attributes of a file to speed up data retrieval, avoiding a full scan of the data file. An index entry is typically a (search-key value, pointer) pair, and indexes are usually kept sorted so they can be searched quickly (e.g., binary search, B+-trees).

Primary, Secondary and Clustering Indexes

Feature	Primary Index	Clustering Index	Secondary Index
Defined on	Ordering key field (unique, e.g., primary key)	Ordering non-key field (duplicates allowed)	Any non-ordering field (key or non-key)
File ordering	File is physically ordered on this field	File is physically ordered on this field	File is not ordered on this field
Density	Sparse (one entry per data block, points to first/anchor record)	Sparse (one entry per distinct value / block)	Dense (one entry per record, or per distinct value)
Number per file	At most one	At most one	Can have many

Key points:

A file can have only one primary or clustering index (because a file is physically ordered on only one field), but several secondary indexes.
Primary index is built on the unique ordering key; clustering index is similar but the ordering field has duplicate values.
Secondary index improves access on non-ordering fields but is larger (denser) and adds storage/maintenance overhead.

Answer 8

Architecture of Parallel Databases

A parallel database uses multiple CPUs and disks operating in parallel to improve performance (throughput and response time) through query and transaction parallelism. The three classic architectures are:

Shared-Memory (shared everything): all processors share a common main memory and disks via a fast interconnect. Easy to program but limited scalability (memory-bus contention).
Shared-Disk (shared nothing but disk): each processor has its own memory but all share the disks. Good fault tolerance; suffers disk-interconnect bottlenecks.
Shared-Nothing: each processor has its own memory and disk; nodes communicate only via the network. Highly scalable — the model used by most large parallel/Big-Data systems.
(Hierarchical/hybrid) combines the above (e.g., shared-nothing clusters of shared-memory nodes).

Inter-Query Parallelism

Different queries/transactions are executed in parallel on different processors. This increases transaction throughput (more queries per second) but does not speed up an individual query. Mainly used in OLTP workloads; requires careful concurrency control (e.g., cache coherence in shared-disk systems).

Intra-Query Parallelism

A single query is executed in parallel across multiple processors to reduce its response time. It has two forms:

Intra-operation parallelism: parallelize one operation (e.g., a parallel sort or parallel join over partitioned data).
Inter-operation parallelism: execute different operators of the same query in parallel (pipelined or independent parallelism).

Intra-query parallelism is especially beneficial for large analytical (OLAP) queries.

Answer 9

Object-Relational Database (ORDBMS)

An object-relational database is a database management system that combines the relational model (tables, SQL, ACID transactions) with object-oriented features such as user-defined types, inheritance, and complex objects. It extends SQL (SQL:1999 and later) so that columns can hold rich, structured data rather than only atomic values. Examples: Oracle, PostgreSQL, IBM DB2.

How it Differs from a Purely Relational Database

Aspect	Pure Relational (RDBMS)	Object-Relational (ORDBMS)
Data types	Only atomic/simple types (1NF, scalar values)	User-defined types (UDTs), structured types, references
Complex objects	Not supported directly	Supports arrays, nested tables, collections, multi-valued attributes
Inheritance	None	Type and table inheritance (sub-types/super-types)
Behaviour	Data only	Methods/functions can be attached to types (encapsulation)
Object identity	Identified by primary-key value	Supports object identifiers (OIDs) and reference types
Standard	SQL-92	SQL:1999+ object extensions

Summary: A pure relational DB strictly stores flat, atomic-valued tuples, whereas an ORDBMS lets you model complex, real-world objects (with structure, inheritance and behaviour) while retaining the maturity, querying power and reliability of the relational model. It thus bridges RDBMS and pure OODBMS.

Answer 10

XML Data Model

XML (eXtensible Markup Language) is a self-describing, text-based markup language for representing semi-structured, hierarchical data. Its data model is a tree (ordered, labelled) in which:

Elements are nodes delimited by tags, e.g. <book>...</book>, and may be nested to any depth.
Attributes provide name–value properties of an element, e.g. <book id="b1">.
Text (PCDATA) forms the leaf content.
A document has exactly one root element and must be well-formed; it may be valid against a schema (DTD or XML Schema/XSD).

XML is platform-independent, making it ideal for data exchange between heterogeneous systems and the web.

Example:

<library>
  <book id="b1">
    <title>Database Systems</title>
    <author>Elmasri</author>
    <price>500</price>
  </book>
</library>

Representing and Querying XML in Databases

Storage: XML can be stored as a CLOB/text, shredded into relational tables, or kept natively as an XML data type in XML-enabled/native XML databases (e.g., the SQL/XML XML type).
Querying:
- XPath — navigates the tree with path expressions, e.g. /library/book/title selects all titles; //book[price>400] filters.
- XQuery — a full query language with FLWOR expressions (For, Let, Where, Order by, Return):

for $b in doc("library.xml")/library/book
where $b/price > 400
return $b/title

SQL/XML — SQL functions (XMLELEMENT, XMLAGG, XMLQUERY) to produce/query XML from relational data.

Answer 11

Log-Based Recovery

Log-based recovery restores the database to a consistent state after a failure using a log (journal) — a sequential file recording every update before it reaches the database (the Write-Ahead Logging, WAL rule). Each log record typically has the form <Ti, X, old_value, new_value>, plus markers <Ti start>, <Ti commit>, <Ti abort>.

Two Update/Recovery Techniques

1. Deferred Database Modification (NO-UNDO/REDO):

Updates are written only to the log; the actual database is modified after the transaction commits.
On recovery, REDO committed transactions; ignore uncommitted ones (their changes never reached the DB). No UNDO needed.

2. Immediate Database Modification (UNDO/REDO):

Updates are written to the database while the transaction is still active (using WAL).
On recovery: UNDO transactions that have <start> but no <commit> (restore old_value), and REDO transactions that have <commit> (apply new_value).

Checkpoints

To limit the amount of log scanned, the system periodically writes a <checkpoint> record after flushing buffers to disk. During recovery only transactions active at or after the last checkpoint need to be examined — REDO from the checkpoint, UNDO active transactions — greatly reducing recovery time.

Answer 12

MapReduce Programming Model

MapReduce is a programming model (introduced by Google, implemented in Hadoop) for processing very large data sets in parallel across a cluster of commodity machines. The programmer expresses the computation as two functions, and the framework handles parallelization, data distribution, fault tolerance and load balancing automatically.

Two Phases

Map: takes input (key, value) pairs and emits a set of intermediate (key, value) pairs.

map(k_1, v_1) \rightarrow list(k_2, v_2)

Reduce: the framework shuffles and sorts intermediate pairs so that all values with the same key go to one reducer, which aggregates them into the final output.

reduce(k_2, list(v_2)) \rightarrow list(v_3)

Between them is the Shuffle & Sort phase that groups values by key.

Example — Word Count

map(docName, text):
    for each word w in text:
        emit(w, 1)

reduce(word, counts):
    emit(word, sum(counts))

Key Features

Data locality: computation is moved to where data resides (HDFS blocks).
Scalability & fault tolerance: failed tasks are re-executed on other nodes; works on thousands of nodes.
Simplicity: developers write only map/reduce; the framework manages the rest.

Limitations: batch-oriented (high latency), poor for iterative/interactive jobs — addressed by newer engines like Apache Spark.

Level	BSc CSIT (TU)
Stream	Science
Subject	Advanced Database (BSc CSIT, CSC461)
Year	2075 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions