Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

What is query processing? Explain the different steps involved in query processing and optimization with the help of a block diagram. Discuss how heuristic-based optimization transforms a relational algebra query into an efficient query plan.

Query Processing

Query processing is the set of activities a DBMS performs to translate a high-level query (e.g., SQL) into an efficient sequence of low-level operations that retrieve data from the database, execute them, and return the result.

Steps in Query Processing and Optimization

The overall flow can be shown as the following block diagram (described in words):

  High-level SQL query
          |
          v
  [Scanner / Parser / Validator]   --> checks syntax, attributes, relations
          |
          v
  Query in internal form (parse tree)
          |
          v
  [Query Optimizer] <---- Statistics / Catalog / Cost model
          |
          v
  Execution Plan (annotated relational algebra tree)
          |
          v
  [Code Generator]
          |
          v
  [Runtime / Query Execution Engine]
          |
          v
      Result of query
  1. Parsing and translation (Scanner, Parser, Validator): The query is scanned for tokens, parsed for correct syntax, and validated against the system catalog (relation and attribute names exist, types match). It is translated into an internal representation, usually a relational-algebra expression (query tree).
  2. Optimization: The optimizer generates several equivalent evaluation plans and selects the one with the least estimated cost (disk I/O, CPU, etc.) using catalog statistics and a cost model. It chooses access methods (index vs. full scan), join orders, and join algorithms.
  3. Code/plan generation: The chosen plan is turned into executable code (an execution plan with iterators).
  4. Execution: The query-evaluation engine runs the plan and returns the result.

Heuristic-Based Optimization

Heuristic optimization applies transformation rules to the relational-algebra tree to convert it into an equivalent but cheaper form, without a full cost analysis. The main heuristic is to reduce the size of intermediate results early. Typical rules applied in order:

  • Push selections (σ\sigma) down the tree (perform them as early as possible) so that fewer tuples flow upward — selection is the most beneficial operation to do first.
  • Push projections (π\pi) down so that only needed attributes are carried forward, reducing tuple width.
  • Convert Cartesian products followed by selection into joins (σθ(R×S)RθS\sigma_\theta(R \times S) \equiv R \bowtie_\theta S).
  • Reorder joins so the most restrictive (smallest intermediate result) joins execute first.
  • Combine cascading selections and use associativity/commutativity of joins.

Example: A query πname(σdept=CS(StudentEnroll))\pi_{name}(\sigma_{dept='CS'}(Student \bowtie Enroll)) is improved by pushing σdept=CS\sigma_{dept='CS'} onto the Student relation before the join, so the join processes far fewer tuples. The result is an efficient query plan that is then executed.

query-optimization
2long10 marks

What is a distributed database management system (DDBMS)? Explain data fragmentation, replication and allocation techniques used in distributed databases with suitable examples.

Distributed Database Management System (DDBMS)

A distributed database is a collection of logically interrelated databases distributed over a computer network. A DDBMS is the software that manages this distributed database and makes the distribution transparent to users — they see a single logical database even though the data physically resides on multiple sites. It supports location, fragmentation, and replication transparency, distributed query processing, and distributed transaction management.

Data Fragmentation

Fragmentation splits a global relation into smaller pieces (fragments) stored at different sites. Conditions: completeness (every tuple/attribute appears in some fragment), reconstruction (the original relation can be rebuilt), and disjointness.

  • Horizontal fragmentation: divides a relation by rows (tuples) using a selection predicate. Example: EMP split as EMP1=σdept=Sales(EMP)EMP_1 = \sigma_{dept='Sales'}(EMP) stored at the Sales site and EMP2=σdept=HR(EMP)EMP_2 = \sigma_{dept='HR'}(EMP) at the HR site. Reconstruction: EMP=EMP1EMP2EMP = EMP_1 \cup EMP_2.
  • Vertical fragmentation: divides a relation by columns (attributes), repeating the primary key in each fragment. Example: EMPA=πeid,name,dept(EMP)EMP_A = \pi_{eid, name, dept}(EMP) and EMPB=πeid,salary,tax(EMP)EMP_B = \pi_{eid, salary, tax}(EMP). Reconstruction: EMP=EMPAEMPBEMP = EMP_A \bowtie EMP_B.
  • Mixed (hybrid) fragmentation: combines horizontal and vertical fragmentation.

Data Replication

Replication stores copies of a fragment (or whole relation) at multiple sites.

  • Full replication: a copy of the whole database at every site — best read performance and availability, but costly updates.
  • Partial replication: only some fragments are replicated at selected sites.
  • No replication: each fragment stored at exactly one site.

Advantages: improved availability and faster local reads. Disadvantage: updates must be propagated to all copies to keep them consistent.

Data Allocation

Allocation decides which fragment/replica is stored at which site, based on access patterns and cost.

  • Non-redundant allocation: each fragment assigned to exactly one site.
  • Redundant allocation: fragments allocated with replicas to optimize availability and local access. The goal is to place data close to where it is most frequently used to minimize communication cost.

Summary: Fragmentation decides how to split data, replication decides how many copies, and allocation decides where to place them.

distributed-database
3long10 marks

Explain the object-oriented database model. Discuss the ODMG object model along with the Object Definition Language (ODL) and Object Query Language (OQL) with examples.

Object-Oriented Database (OODB) Model

An object-oriented database combines database capabilities (persistence, concurrency, recovery, querying) with the object-oriented programming paradigm. Data is represented as objects, each with a unique object identifier (OID), a state (attribute values), and behavior (methods). Key OO concepts supported: encapsulation, inheritance, polymorphism, complex objects, and class/type hierarchies. OODBs avoid the impedance mismatch between programming languages and relational tables.

ODMG Object Model

The ODMG (Object Data Management Group) standard defines a common model so OODB products are portable. Its main components are:

  • Objects and Literals: an object has an OID and mutable state; a literal has only a value (no OID).
  • Types: built from interfaces (behavior only) and classes (state + behavior, instantiable).
  • Atomic, structured and collection types (set, bag, list, array, dictionary).
  • Relationships (with inverse traversal paths) and methods.
  • Supports inheritance through : (subtyping).

Object Definition Language (ODL)

ODL is the DDL of the ODMG model — used to define object schemas (classes, attributes, relationships, methods) independent of any programming language.

class Student (extent students key sid) {
    attribute int sid;
    attribute string name;
    relationship Set<Course> enrolled
        inverse Course::students;
    float gpa();
};
class Course (extent courses key cid) {
    attribute string cid;
    attribute string title;
    relationship Set<Student> students
        inverse Student::enrolled;
};

Object Query Language (OQL)

OQL is the declarative query language of ODMG, SQL-like but supporting objects, path expressions, method calls, and collections.

-- Names of students with GPA above 3.5
select s.name
from s in students
where s.gpa() > 3.5;

-- Path expression: titles of courses a student is enrolled in
select c.title
from s in students, c in s.enrolled
where s.name = "Ram";

OQL queries return collections of objects or literals and can be embedded in host languages such as C++ or Java.

object-oriented-database
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

Why is hashing important to store data in databases? What is primary file organization?

Importance of hashing: Hashing applies a hash function to a record's search key to compute the address (bucket) of the disk block where the record is stored. This allows records to be located in (ideally) O(1) time / a single disk access without scanning the file or traversing an index, making equality searches, insertions, and deletions very fast. It is especially useful for equality selections on the key. (Its limitation: it does not efficiently support range queries.)

Primary file organization: It determines the physical placement/order of records in the data file on disk, i.e., how records are physically arranged and accessed. Common primary organizations are:

  • Heap (unordered) file — records placed in insertion order.
  • Sequential (ordered) file — records sorted on an ordering field.
  • Hash file — record placement determined by a hash function on the hash key.
  • B+-tree / clustered organization.

It differs from a secondary organization (a secondary index), which only provides an extra access path without changing the physical storage of records.

file-organization
5short5 marks

Explain aggregation with a suitable example.

Aggregation is an abstraction in the (Enhanced) Entity-Relationship model used to treat a relationship (along with its participating entities) as a single higher-level entity, so that this relationship can itself participate in another relationship. It overcomes the ER limitation that a relationship cannot directly be connected to another relationship.

Example: Consider entities Employee and Project linked by a relationship WorksOn. Now suppose we want to record which Manager monitors a particular employee-on-a-project assignment. A manager monitors the WorksOn relationship instance, not the employee or project alone. Using aggregation, the WorksOn relationship is abstracted into a single composite object, and a new relationship Monitors is drawn between Manager and this aggregated WorksOn object.

In a diagram this is shown by enclosing the Employee–WorksOn–Project relationship inside a dashed box, and connecting Manager to that box via Monitors. This captures the semantics that monitoring applies to the assignment as a whole.

aggregation
6short5 marks

What is the ODMG object model? What is Object Definition Language (ODL)?

ODMG Object Model: The standard object model defined by the Object Data Management Group (ODMG) to provide a portable, common data model for object-oriented and object databases. Its main features:

  • Basic building blocks are objects (have a unique OID and mutable state) and literals (have only a value, no OID).
  • Types are described by interfaces (behavior only) and classes (state + behavior, instantiable).
  • Supports atomic, structured, and collection types (set, bag, list, array, dictionary).
  • Supports inheritance, relationships with inverses, attributes, and methods (operations).
  • Provides three sub-languages: ODL (schema definition), OQL (querying), and language bindings (C++, Java, Smalltalk).

Object Definition Language (ODL): ODL is the data-definition language of the ODMG standard, used to specify the object schema — defining classes, their attributes, relationships (with inverse), and method signatures — in a programming-language-independent way.

class Employee (extent employees key eid) {
    attribute int eid;
    attribute string name;
    relationship Department dept inverse Department::staff;
};
odmg
7short5 marks

Explain the different steps in query processing.

Steps in Query Processing

Query processing translates a high-level query into a result through the following main steps:

  1. Parsing and translation: The query is scanned and parsed to check syntactic correctness, validated against the system catalog (relations/attributes exist, types match), and translated into an internal relational-algebra expression / query tree.
  2. Optimization: The optimizer generates equivalent execution plans and selects the one with the lowest estimated cost (disk I/O, CPU), choosing access paths (index vs. scan), join orders, and join algorithms using catalog statistics.
  3. Code/plan generation: The selected plan is converted into an executable query-execution plan (a tree of physical operators/iterators).
  4. Execution (evaluation): The runtime engine executes the plan against the stored data and returns the result to the user.

In short: SQL query → parse & validate → relational algebra → optimize → execution plan → execute → result.

query-processing
8short5 marks

Why is query optimization essential in databases? What is heuristic optimization?

Why query optimization is essential: A single SQL query can be evaluated by many different but equivalent execution plans, and their costs (disk I/O, CPU time, memory, response time) can differ by orders of magnitude. Query optimization is essential because it:

  • selects the most efficient execution plan, minimizing resource usage and response time;
  • relieves the user from writing low-level, efficient code (declarative SQL just states what, not how);
  • improves throughput and scalability, especially for large relations and complex joins where a poor join order or a full scan instead of an index scan can be extremely costly.

Heuristic optimization: It is a rule-based optimization approach that applies a set of transformation (heuristic) rules to the relational-algebra query tree to obtain an equivalent but cheaper plan, without exhaustively computing the cost of every alternative. The core heuristic is to reduce the size of intermediate results early, mainly by:

  • pushing selections (σ\sigma) down the tree to filter tuples as early as possible;
  • pushing projections (π\pi) down to drop unneeded columns early;
  • converting Cartesian product + selection into joins and choosing a good join order.

This is faster to compute than full cost-based optimization but may not always yield the absolute optimal plan.

query-optimization
9short5 marks

Define fragmentation. Explain horizontal fragmentation with an example.

Fragmentation is the process in a distributed database of dividing a global relation into smaller pieces called fragments, which are stored at different sites of the network. It must satisfy completeness (every data item appears in some fragment), reconstruction (the original relation can be rebuilt from its fragments), and disjointness. Types: horizontal, vertical, and mixed (hybrid).

Horizontal fragmentation divides a relation by rows (tuples), where each fragment is the set of tuples satisfying a selection predicate. Each fragment is defined as Ri=σPi(R)R_i = \sigma_{P_i}(R), and the original relation is reconstructed by the union of all fragments: R=R1R2RnR = R_1 \cup R_2 \cup \dots \cup R_n.

Example: Given a relation EMPLOYEE(eid, name, branch) for a bank with branches in Kathmandu and Pokhara:

  • EMP1=σbranch=Kathmandu(EMPLOYEE)EMP_1 = \sigma_{branch='Kathmandu'}(EMPLOYEE) — stored at the Kathmandu site.
  • EMP2=σbranch=Pokhara(EMPLOYEE)EMP_2 = \sigma_{branch='Pokhara'}(EMPLOYEE) — stored at the Pokhara site.

Each site holds only the employees relevant to it (improving local access), and EMPLOYEE=EMP1EMP2EMPLOYEE = EMP_1 \cup EMP_2 reconstructs the full relation.

fragmentation
10short5 marks

What are the characteristics of NoSQL systems? Explain.

Characteristics of NoSQL Systems

NoSQL ("Not Only SQL") systems are non-relational databases designed for large-scale, distributed, and high-availability data management, especially for big data and web applications. Their key characteristics:

  1. Schema-less / flexible schema: Data need not conform to a fixed predefined schema; records can have different structures (good for semi-structured data like JSON documents).
  2. Horizontal scalability: Designed to scale out across many commodity servers (sharding) rather than scaling up a single machine, handling huge data volumes and high throughput.
  3. Distribution and replication: Data is partitioned and replicated across nodes for fault tolerance and high availability.
  4. BASE instead of ACID: Favor Basically Available, Soft state, Eventual consistency — relaxing strict consistency for availability and performance (per the CAP theorem, NoSQL systems often prioritize Availability and Partition tolerance over strong Consistency).
  5. Variety of data models: include key-value stores (Redis), document stores (MongoDB), column-family stores (Cassandra, HBase), and graph databases (Neo4j).
  6. High performance for simple read/write operations and no (or limited) support for complex joins and full SQL.

These features make NoSQL suitable for big-data, real-time, and web-scale applications where flexibility and scalability matter more than rigid relational structure.

nosql
11short5 marks

What is the concept of an active database? What are triggers?

Active database: An active database is a database system that can react automatically to events (changes/conditions) by executing predefined actions without explicit user request. It extends a conventional (passive) DBMS with reactive behavior, typically specified using ECA (Event–Condition–Action) rules:

  • Event — an occurrence such as an INSERT, UPDATE, or DELETE.
  • Condition — a predicate checked when the event fires.
  • Action — the operation executed if the condition is true.

Active databases are used to enforce integrity constraints, business rules, automatic derived-data maintenance, alerting, and logging/auditing.

Triggers: A trigger is the most common mechanism that implements ECA rules. It is a named procedural block stored in the database that is automatically executed ("fired") in response to a specified data-modification event on a table.

CREATE TRIGGER check_salary
BEFORE INSERT ON Employee
FOR EACH ROW
WHEN (NEW.salary < 0)
BEGIN
   RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be negative');
END;

Here the event is INSERT, the condition is salary < 0, and the action raises an error — enforcing a business rule automatically.

active-databasetriggers
12short5 marks

Write short notes on: (a) Big Data (b) Information Retrieval.

Short Notes

(a) Big Data

Big Data refers to datasets that are too large, fast, and varied to be captured, stored, managed, and processed efficiently by traditional database tools. It is characterized by the "V"s:

  • Volume — enormous amounts of data (terabytes to petabytes and beyond).
  • Velocity — high speed of data generation and the need for real-time/near-real-time processing.
  • Variety — structured, semi-structured, and unstructured data (text, images, logs, sensor data, social media).
  • Veracity — uncertainty/quality of data, and Value — usefulness extracted from it.

Big Data is typically processed using distributed frameworks such as Hadoop (HDFS + MapReduce), Spark, and NoSQL databases that scale horizontally across clusters of commodity machines.

(b) Information Retrieval

Information Retrieval (IR) is the area concerned with finding relevant documents/information (usually unstructured text) from a large collection in response to a user's query, as opposed to retrieving exact structured records like a DBMS.

  • A typical IR system indexes documents (e.g., using an inverted index), processes the query, ranks documents by relevance, and returns the most relevant results.
  • Common models: Boolean, Vector Space Model (using TF–IDF weighting and cosine similarity), and Probabilistic models.
  • Effectiveness is measured by precision (fraction of retrieved that are relevant) and recall (fraction of relevant that are retrieved).
  • Web search engines (e.g., Google) are the most familiar large-scale IR systems.
bigdatainformation-retrieval

Frequently asked questions

Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2074?
The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2074 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Advanced Database (BSc CSIT, CSC461) 2074 paper come with solutions?
Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2074 paper?
The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2074 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?
Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.