Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

What is a NoSQL database? Explain the characteristics of NoSQL systems and discuss the CAP theorem. Compare document-based, key-value, column-based and graph-based NoSQL data stores.

NoSQL Databases

A NoSQL (Not Only SQL) database is a non-relational data store designed to handle large volumes of structured, semi-structured and unstructured data with high scalability and availability. Unlike relational databases, NoSQL systems do not require a fixed tabular schema and typically avoid costly join operations, making them well suited to Big Data and real-time web applications.

Characteristics of NoSQL Systems

  • Schema-less / flexible schema: Records can have different fields; the structure can evolve without altering existing data.
  • Horizontal scalability: Data is distributed (sharded) across many commodity servers; capacity is increased by adding nodes (scale-out) rather than upgrading one machine (scale-up).
  • High availability and replication: Data is replicated across nodes to tolerate failures.
  • BASE rather than ACID: Most systems favour Basically Available, Soft state, Eventual consistency over strict ACID transactions.
  • Distributed and partition-tolerant: Built to run on clusters spanning data centres.
  • Simple query interface / API-based access instead of full SQL (though many add SQL-like layers).

CAP Theorem

The CAP theorem (Brewer's theorem) states that a distributed data store can simultaneously provide at most two of the following three guarantees:

  • Consistency (C): Every read returns the most recent write (all nodes see the same data).
  • Availability (A): Every request receives a (non-error) response, even if some nodes fail.
  • Partition tolerance (P): The system keeps operating despite network partitions (lost/dropped messages between nodes).

Since network partitions are unavoidable in distributed systems, P must be tolerated, so a real system must choose between CP (sacrifice availability, e.g. HBase, MongoDB in strong mode) and AP (sacrifice strong consistency, e.g. Cassandra, DynamoDB).

Comparison of NoSQL Data Stores

TypeData ModelBest ForExample
Key-Value(key → opaque value) pairs; fast hash lookupCaching, session stores, simple lookupsRedis, DynamoDB
DocumentSelf-describing documents (JSON/BSON/XML); nested fields, queryableContent management, catalogues, user profilesMongoDB, CouchDB
Column-based (wide-column)Rows grouped into column families; sparse, column-oriented storageTime-series, analytics over huge datasetsCassandra, HBase
GraphNodes + edges with properties; traverses relationshipsSocial networks, recommendation, fraud detectionNeo4j, JanusGraph

Summary: Key-value stores are the simplest and fastest but least expressive; document stores add rich query over flexible records; column stores excel at write-heavy, large-scale analytics; graph stores are unmatched for highly connected, relationship-centric data.

nosqlbigdata
2long10 marks

Explain the Enhanced Entity-Relationship (EER) model. Discuss specialization, generalization, categorization and aggregation with suitable diagrams and examples.

Enhanced Entity-Relationship (EER) Model

The EER model extends the basic ER model with additional semantic concepts needed to model complex applications. It adds specialization, generalization, categorization (union types), aggregation, and the notions of subclass/superclass with attribute and relationship inheritance. A subclass inherits all attributes and relationships of its superclass.

1. Specialization (Top-Down)

The process of defining a set of subclasses of a superclass based on some distinguishing characteristic.

  • Example: Entity EMPLOYEE is specialized into SECRETARY, ENGINEER, and MANAGER based on job type.
  • Diagram (described): A superclass box EMPLOYEE connects through a circle (with subset symbol on the lines) down to the three subclass boxes.
  • Constraints: disjoint (d) vs overlapping (o), and total (double line) vs partial (single line) participation.

2. Generalization (Bottom-Up)

The reverse of specialization: several entities sharing common attributes are combined into a higher-level generalized superclass.

  • Example: CAR and TRUCK, both having attributes like VehicleID, Price, are generalized into superclass VEHICLE.

3. Categorization (Union Type)

A category is a subclass whose members are a union of entities from several different superclasses (each of possibly different type).

  • Example: OWNER is a category that is the union of PERSON, BANK, and COMPANY, because a vehicle owner may be any of these.
  • Diagram (described): A circle marked with (union) connecting the three superclasses down to the OWNER subclass.

4. Aggregation

Aggregation treats a relationship (and its participating entities) as a higher-level entity, allowing a relationship to participate in another relationship.

  • Example: The relationship WORKS_ON between EMPLOYEE and PROJECT is aggregated, and this aggregate then relates via USES to entity MACHINERY (i.e., an employee working on a project uses certain machinery).
  • Diagram (described): A dashed box drawn around EMPLOYEEWORKS_ONPROJECT, with a relationship line from that box to MACHINERY.

Summary

ConceptDirectionIdea
SpecializationTop-downSplit superclass into subclasses
GeneralizationBottom-upMerge entities into superclass
CategorizationUnionSubclass = union of distinct superclasses
AggregationAbstractionTreat a relationship as an entity
extended-erer-model
3long10 marks

What is a data warehouse? Explain its architecture, the concept of OLAP and the star schema, and differentiate between OLTP and OLAP systems.

Data Warehouse

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data (Inmon's definition) used to support management decision-making. It consolidates historical data from multiple operational sources into a single repository optimized for analysis and reporting rather than transaction processing.

Architecture

A typical three-tier architecture:

  1. Bottom tier (Data sources + ETL + Warehouse server): Operational databases and external sources feed the ETL process — Extract, Transform (clean, integrate), Load — into the warehouse and data marts. Metadata is also stored here.
  2. Middle tier (OLAP server): A ROLAP (relational) or MOLAP (multidimensional) engine that presents data as multidimensional cubes.
  3. Top tier (Front-end tools): Query, reporting, analysis, and data-mining tools used by analysts.

OLAP (Online Analytical Processing)

OLAP enables fast, multidimensional analysis of data organized as cubes (measures viewed across dimensions). Core operations:

  • Roll-up: aggregate to a higher level (day → month → year).
  • Drill-down: the reverse, view finer detail.
  • Slice: fix one dimension to a value.
  • Dice: select a sub-cube on multiple dimensions.
  • Pivot (rotate): reorient the cube to view another perspective.

Star Schema

The star schema has a central fact table containing numeric measures and foreign keys, surrounded by denormalized dimension tables.

        DIM_TIME      DIM_PRODUCT
              \          /
               FACT_SALES (qty, amount, time_id, product_id, store_id, customer_id)
              /          \
      DIM_STORE        DIM_CUSTOMER

When dimension tables are normalized into sub-dimensions it becomes a snowflake schema.

OLTP vs OLAP

FeatureOLTPOLAP
PurposeDay-to-day transactionsAnalysis / decision support
DataCurrent, detailedHistorical, summarized
OperationsInsert/Update/Delete (write-heavy)Mostly complex reads
SchemaNormalized (3NF)Star / snowflake (denormalized)
QueriesSimple, shortComplex, long-running, aggregate
UsersClerks, online usersAnalysts, managers
Design goalThroughput, integrityQuery performance
data-warehousedata-mining
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

Why is hashing important to store data in databases? What is primary file organization?

Importance of Hashing

Hashing applies a hash function to a record's key to compute the address (bucket) where the record is stored. It is important because it allows direct (random) access to a record in approximately O(1) average time, without scanning the file or traversing an index tree. This makes equality searches (key = value) extremely fast and is ideal for primary key lookups and large files. Collisions are handled by methods such as open addressing or chaining/overflow buckets.

Primary File Organization

Primary file organization determines the physical placement of records on disk and how they are accessed. Common types are:

  • Heap (unordered): records placed in insertion order — fast insert, slow search.
  • Sequential (ordered): records ordered by a field — efficient ordered/range access.
  • Hash: records placed using a hash function on a hash key — fast equality access.
  • B-tree / clustered: tree-structured ordering for efficient search and range queries.

It differs from a secondary organization (index), which provides an alternative access path without changing the primary placement of the records.

file-organization
5short5 marks

Explain aggregation with a suitable example.

Aggregation

Aggregation is an abstraction in the (E)ER model in which a relationship together with its participating entities is treated as a single higher-level entity, so that the relationship itself can participate in another relationship. It is used when we need to model a relationship between a relationship and an entity, which the basic ER model cannot express directly.

Example

Consider entities EMPLOYEE and PROJECT linked by the relationship WORKS_ON. We now want to record which MACHINERY is used by an employee while working on a project.

  • We aggregate EMPLOYEEWORKS_ONPROJECT into one abstract entity.
  • This aggregate participates in a new relationship USES with the entity MACHINERY.

Diagram (described): Draw a dashed rectangle enclosing EMPLOYEE, the diamond WORKS_ON, and PROJECT. From this dashed box, draw a line to the diamond USES, which connects to the MACHINERY entity box.

This avoids the redundancy and loss of meaning that would occur if MACHINERY were related to EMPLOYEE and PROJECT separately.

aggregation
6short5 marks

What is the ODMG object model? What is Object Definition Language (ODL)?

ODMG Object Model

The ODMG (Object Data Management Group) object model is the standard data model proposed by the ODMG consortium for object-oriented databases (OODBMS). It defines a common framework so that object databases from different vendors can be portable and interoperable. Key features:

  • Basic building blocks are objects (have an OID and mutable state) and literals (values without an OID).
  • Supports atomic, collection (set, bag, list, array), and structured types.
  • Provides classes, inheritance, attributes, relationships, and methods (behaviour).
  • Defines three component languages: ODL (schema), OQL (queries), and OML language bindings (C++, Java, Smalltalk).

Object Definition Language (ODL)

ODL is the schema-definition language of the ODMG standard, analogous to the DDL of relational systems. It is used to define the interfaces (classes), their attributes, relationships, and method signatures independently of any programming language, so the schema can be mapped to multiple language bindings.

Example:

class Employee (extent employees key empId) {
  attribute int    empId;
  attribute string name;
  relationship Department worksIn inverse Department::staff;
  void giveRaise(in float amt);
};
odmg
7short5 marks

Explain the different steps in query processing.

Steps in Query Processing

Query processing is the sequence of activities that transform a high-level query into an efficient execution and produce results.

  1. Parsing and Translation (Syntactic analysis): The SQL query is scanned, parsed, and checked for syntactic and semantic correctness (valid relation/attribute names, types). It is then translated into an internal form, usually a relational algebra expression / query tree.

  2. Optimization: The query optimizer generates alternative execution plans (different join orders, access methods, algorithms) and uses heuristic rules and/or cost estimates (based on statistics in the system catalog) to select the cheapest plan.

  3. Evaluation / Execution: The execution engine runs the chosen plan — invoking the physical operators (table scan, index scan, sort, hash join, etc.) — against the stored data and returns the result to the user.

Diagram (described): Query → [Parser & Translator] → relational-algebra expression → [Optimizer + Statistics/Catalog] → execution plan → [Evaluation Engine + Data] → Result.

query-processing
8short5 marks

Why is query optimization essential in databases? What is heuristic optimization?

Why Query Optimization is Essential

A single SQL query can usually be executed by many different equivalent plans (different join orders, join algorithms, and access paths), and their costs can differ by orders of magnitude. Query optimization chooses an efficient plan so as to minimize resource usage — disk I/O, CPU time, memory and (in distributed systems) communication cost — thereby reducing response time. Because users write declarative queries (what, not how), it is the DBMS's job to find a good execution strategy; without optimization, query performance over large databases would be unacceptable.

Heuristic Optimization

Heuristic (rule-based) optimization improves a query by applying transformation rules to the relational-algebra query tree, regardless of exact data statistics. It follows general guidelines such as:

  • Push selections (σ) down the tree to filter rows as early as possible.
  • Push projections (π) down to drop unneeded columns early.
  • Perform the most restrictive (smallest result) operations first.
  • Convert Cartesian products followed by selection into joins.
  • Combine a selection sequence into a single conjunctive condition.

The result is an optimized query tree that typically reduces the size of intermediate relations, lowering processing cost. It is simpler and faster than full cost-based optimization but does not guarantee the globally optimal plan.

query-optimization
9short5 marks

Define fragmentation. Explain horizontal fragmentation with an example.

Fragmentation

In a distributed database, fragmentation is the process of dividing a global relation into smaller pieces called fragments, which are stored at different sites. It improves locality of reference, parallelism, and performance. Fragmentation must satisfy:

  • Completeness: every tuple/attribute appears in some fragment.
  • Reconstruction: the original relation can be rebuilt from the fragments.
  • Disjointness: fragments do not overlap (except the key in vertical fragmentation).

Types: horizontal, vertical, and hybrid (mixed) fragmentation.

Horizontal Fragmentation

Horizontal fragmentation divides a relation by rows (tuples) using a selection (σ) predicate; each fragment contains a subset of the rows. The original relation is reconstructed using UNION (∪).

Example

Relation EMPLOYEE(EmpID, Name, Dept, City). Fragment by city:

  • EMP_KTM = σ_{City = 'Kathmandu'}(EMPLOYEE) → stored at the Kathmandu site.
  • EMP_PKR = σ_{City = 'Pokhara'}(EMPLOYEE) → stored at the Pokhara site.

Reconstruction: EMPLOYEE = EMP_KTM ∪ EMP_PKR. Each site holds and processes mostly its own local employee records, reducing network traffic.

fragmentation
10short5 marks

What are the characteristics of NoSQL systems? Explain.

Characteristics of NoSQL Systems

NoSQL (Not Only SQL) systems are non-relational data stores built for large-scale, distributed, high-availability applications. Their main characteristics are:

  • Flexible / schema-less data model: Records need not share a fixed structure; schemas can evolve dynamically (e.g., JSON documents).
  • Horizontal scalability (scale-out): Data is sharded/partitioned across many commodity servers; throughput grows by adding nodes.
  • High availability and replication: Data is automatically replicated across nodes so the system tolerates node and network failures.
  • BASE consistency model: Favours Basically Available, Soft state, Eventual consistency instead of strict ACID — guided by the CAP theorem trade-off.
  • Distributed architecture: Designed to run on clusters, often across data centres.
  • Simple / specialized query interface: Access is via APIs or simple queries rather than full SQL joins; joins are usually avoided.
  • Handles large volume and variety of data: Suited to Big Data — structured, semi-structured and unstructured.

These properties make NoSQL well suited to web-scale, real-time, and Big Data workloads where relational databases struggle to scale.

nosql
11short5 marks

What is the concept of an active database? What are triggers?

Active Database

An active database is a database system that can react automatically to events without explicit user or application requests. In addition to normal (passive) data storage, it monitors specified conditions and automatically performs actions when they occur. This reactive behaviour is expressed using ECA (Event–Condition–Action) rules:

  • Event: something that happens (e.g., an INSERT, UPDATE, or DELETE).
  • Condition: a predicate checked when the event fires.
  • Action: the operation executed if the condition is true.

Active databases are used to enforce business rules, maintain integrity constraints, derive data, log changes, and send alerts automatically.

Triggers

A trigger is the most common mechanism for implementing ECA rules — a named, stored procedural block that is automatically executed (fired) by the DBMS in response to a specified data-modification event on a table.

  • Specified as BEFORE / AFTER / INSTEAD OF an INSERT / UPDATE / DELETE.
  • Can be row-level (fires once per affected row) or statement-level.

Example:

CREATE TRIGGER log_salary_change
AFTER UPDATE OF salary ON Employee
FOR EACH ROW
BEGIN
  INSERT INTO Salary_Audit(emp_id, old_sal, new_sal, changed_on)
  VALUES (:OLD.emp_id, :OLD.salary, :NEW.salary, SYSDATE);
END;

Uses: enforcing complex constraints, auditing, automatic derivation, and maintaining consistency.

active-databasetriggers
12short5 marks

Write short notes on: (a) Big Data (b) Information Retrieval.

(a) Big Data

Big Data refers to datasets so large, fast-growing, and varied that traditional database tools cannot capture, store, manage, and analyze them efficiently. It is characterized by the 3 V's (often extended to 5):

  • Volume: enormous amounts of data (terabytes to petabytes).
  • Velocity: data generated and processed at high speed (often real-time/streaming).
  • Variety: structured, semi-structured and unstructured data (text, images, logs, sensor data).
  • (Extended) Veracity (data quality/uncertainty) and Value (usefulness extracted).

It is handled using distributed frameworks such as Hadoop (HDFS, MapReduce), Spark, and NoSQL stores, enabling analytics, machine learning and decision support.

(b) Information Retrieval

Information Retrieval (IR) is the process of finding relevant unstructured documents (mainly text) from a large collection in response to a user's query. Unlike database querying, which returns exact matches over structured data, IR returns documents ranked by relevance.

Key components and concepts:

  • Indexing: building an inverted index (term → list of documents) for fast lookup.
  • Query processing: matching query terms against the index.
  • Ranking models: Boolean, Vector Space Model with TF–IDF weighting, and probabilistic models, used to score and order results.
  • Evaluation metrics: Precision (fraction of retrieved that are relevant) and Recall (fraction of relevant that are retrieved).

Search engines (e.g., Google) are the most common application of IR.

bigdatainformation-retrieval

Frequently asked questions

Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2081?
The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2081 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Advanced Database (BSc CSIT, CSC461) 2081 paper come with solutions?
Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2081 paper?
The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2081 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?
Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.