BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2081 Nepal
This is the official BSc CSIT (TU) (Science stream) Advanced Database (BSc CSIT, CSC461) question paper for 2081, as set in the regular annual examination. It carries 60 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Advanced Database (BSc CSIT, CSC461) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) exam or solving previous years' question papers, this 2081 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt any TWO questions.
What is a NoSQL database? Explain the characteristics of NoSQL systems and discuss the CAP theorem. Compare document-based, key-value, column-based and graph-based NoSQL data stores.
NoSQL Databases
A NoSQL (Not Only SQL) database is a non-relational data store designed to handle large volumes of structured, semi-structured and unstructured data with high scalability and availability. Unlike relational databases, NoSQL systems do not require a fixed tabular schema and typically avoid costly join operations, making them well suited to Big Data and real-time web applications.
Characteristics of NoSQL Systems
- Schema-less / flexible schema: Records can have different fields; the structure can evolve without altering existing data.
- Horizontal scalability: Data is distributed (sharded) across many commodity servers; capacity is increased by adding nodes (scale-out) rather than upgrading one machine (scale-up).
- High availability and replication: Data is replicated across nodes to tolerate failures.
- BASE rather than ACID: Most systems favour Basically Available, Soft state, Eventual consistency over strict ACID transactions.
- Distributed and partition-tolerant: Built to run on clusters spanning data centres.
- Simple query interface / API-based access instead of full SQL (though many add SQL-like layers).
CAP Theorem
The CAP theorem (Brewer's theorem) states that a distributed data store can simultaneously provide at most two of the following three guarantees:
- Consistency (C): Every read returns the most recent write (all nodes see the same data).
- Availability (A): Every request receives a (non-error) response, even if some nodes fail.
- Partition tolerance (P): The system keeps operating despite network partitions (lost/dropped messages between nodes).
Since network partitions are unavoidable in distributed systems, P must be tolerated, so a real system must choose between CP (sacrifice availability, e.g. HBase, MongoDB in strong mode) and AP (sacrifice strong consistency, e.g. Cassandra, DynamoDB).
Comparison of NoSQL Data Stores
| Type | Data Model | Best For | Example |
|---|---|---|---|
| Key-Value | (key → opaque value) pairs; fast hash lookup | Caching, session stores, simple lookups | Redis, DynamoDB |
| Document | Self-describing documents (JSON/BSON/XML); nested fields, queryable | Content management, catalogues, user profiles | MongoDB, CouchDB |
| Column-based (wide-column) | Rows grouped into column families; sparse, column-oriented storage | Time-series, analytics over huge datasets | Cassandra, HBase |
| Graph | Nodes + edges with properties; traverses relationships | Social networks, recommendation, fraud detection | Neo4j, JanusGraph |
Summary: Key-value stores are the simplest and fastest but least expressive; document stores add rich query over flexible records; column stores excel at write-heavy, large-scale analytics; graph stores are unmatched for highly connected, relationship-centric data.
Explain the Enhanced Entity-Relationship (EER) model. Discuss specialization, generalization, categorization and aggregation with suitable diagrams and examples.
Enhanced Entity-Relationship (EER) Model
The EER model extends the basic ER model with additional semantic concepts needed to model complex applications. It adds specialization, generalization, categorization (union types), aggregation, and the notions of subclass/superclass with attribute and relationship inheritance. A subclass inherits all attributes and relationships of its superclass.
1. Specialization (Top-Down)
The process of defining a set of subclasses of a superclass based on some distinguishing characteristic.
- Example: Entity
EMPLOYEEis specialized intoSECRETARY,ENGINEER, andMANAGERbased on job type. - Diagram (described): A superclass box
EMPLOYEEconnects through a circle (with subset symbol⊂on the lines) down to the three subclass boxes. - Constraints: disjoint (d) vs overlapping (o), and total (double line) vs partial (single line) participation.
2. Generalization (Bottom-Up)
The reverse of specialization: several entities sharing common attributes are combined into a higher-level generalized superclass.
- Example:
CARandTRUCK, both having attributes likeVehicleID,Price, are generalized into superclassVEHICLE.
3. Categorization (Union Type)
A category is a subclass whose members are a union of entities from several different superclasses (each of possibly different type).
- Example:
OWNERis a category that is the union ofPERSON,BANK, andCOMPANY, because a vehicle owner may be any of these. - Diagram (described): A circle marked with
∪(union) connecting the three superclasses down to theOWNERsubclass.
4. Aggregation
Aggregation treats a relationship (and its participating entities) as a higher-level entity, allowing a relationship to participate in another relationship.
- Example: The relationship
WORKS_ONbetweenEMPLOYEEandPROJECTis aggregated, and this aggregate then relates viaUSESto entityMACHINERY(i.e., an employee working on a project uses certain machinery). - Diagram (described): A dashed box drawn around
EMPLOYEE–WORKS_ON–PROJECT, with a relationship line from that box toMACHINERY.
Summary
| Concept | Direction | Idea |
|---|---|---|
| Specialization | Top-down | Split superclass into subclasses |
| Generalization | Bottom-up | Merge entities into superclass |
| Categorization | Union | Subclass = union of distinct superclasses |
| Aggregation | Abstraction | Treat a relationship as an entity |
What is a data warehouse? Explain its architecture, the concept of OLAP and the star schema, and differentiate between OLTP and OLAP systems.
Data Warehouse
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data (Inmon's definition) used to support management decision-making. It consolidates historical data from multiple operational sources into a single repository optimized for analysis and reporting rather than transaction processing.
Architecture
A typical three-tier architecture:
- Bottom tier (Data sources + ETL + Warehouse server): Operational databases and external sources feed the ETL process — Extract, Transform (clean, integrate), Load — into the warehouse and data marts. Metadata is also stored here.
- Middle tier (OLAP server): A ROLAP (relational) or MOLAP (multidimensional) engine that presents data as multidimensional cubes.
- Top tier (Front-end tools): Query, reporting, analysis, and data-mining tools used by analysts.
OLAP (Online Analytical Processing)
OLAP enables fast, multidimensional analysis of data organized as cubes (measures viewed across dimensions). Core operations:
- Roll-up: aggregate to a higher level (day → month → year).
- Drill-down: the reverse, view finer detail.
- Slice: fix one dimension to a value.
- Dice: select a sub-cube on multiple dimensions.
- Pivot (rotate): reorient the cube to view another perspective.
Star Schema
The star schema has a central fact table containing numeric measures and foreign keys, surrounded by denormalized dimension tables.
DIM_TIME DIM_PRODUCT
\ /
FACT_SALES (qty, amount, time_id, product_id, store_id, customer_id)
/ \
DIM_STORE DIM_CUSTOMER
When dimension tables are normalized into sub-dimensions it becomes a snowflake schema.
OLTP vs OLAP
| Feature | OLTP | OLAP |
|---|---|---|
| Purpose | Day-to-day transactions | Analysis / decision support |
| Data | Current, detailed | Historical, summarized |
| Operations | Insert/Update/Delete (write-heavy) | Mostly complex reads |
| Schema | Normalized (3NF) | Star / snowflake (denormalized) |
| Queries | Simple, short | Complex, long-running, aggregate |
| Users | Clerks, online users | Analysts, managers |
| Design goal | Throughput, integrity | Query performance |
Section B: Short Answer Questions
Attempt any EIGHT questions.
Why is hashing important to store data in databases? What is primary file organization?
Importance of Hashing
Hashing applies a hash function to a record's key to compute the address (bucket) where the record is stored. It is important because it allows direct (random) access to a record in approximately O(1) average time, without scanning the file or traversing an index tree. This makes equality searches (key = value) extremely fast and is ideal for primary key lookups and large files. Collisions are handled by methods such as open addressing or chaining/overflow buckets.
Primary File Organization
Primary file organization determines the physical placement of records on disk and how they are accessed. Common types are:
- Heap (unordered): records placed in insertion order — fast insert, slow search.
- Sequential (ordered): records ordered by a field — efficient ordered/range access.
- Hash: records placed using a hash function on a hash key — fast equality access.
- B-tree / clustered: tree-structured ordering for efficient search and range queries.
It differs from a secondary organization (index), which provides an alternative access path without changing the primary placement of the records.
Explain aggregation with a suitable example.
Aggregation
Aggregation is an abstraction in the (E)ER model in which a relationship together with its participating entities is treated as a single higher-level entity, so that the relationship itself can participate in another relationship. It is used when we need to model a relationship between a relationship and an entity, which the basic ER model cannot express directly.
Example
Consider entities EMPLOYEE and PROJECT linked by the relationship WORKS_ON. We now want to record which MACHINERY is used by an employee while working on a project.
- We aggregate
EMPLOYEE–WORKS_ON–PROJECTinto one abstract entity. - This aggregate participates in a new relationship
USESwith the entityMACHINERY.
Diagram (described): Draw a dashed rectangle enclosing EMPLOYEE, the diamond WORKS_ON, and PROJECT. From this dashed box, draw a line to the diamond USES, which connects to the MACHINERY entity box.
This avoids the redundancy and loss of meaning that would occur if MACHINERY were related to EMPLOYEE and PROJECT separately.
What is the ODMG object model? What is Object Definition Language (ODL)?
ODMG Object Model
The ODMG (Object Data Management Group) object model is the standard data model proposed by the ODMG consortium for object-oriented databases (OODBMS). It defines a common framework so that object databases from different vendors can be portable and interoperable. Key features:
- Basic building blocks are objects (have an OID and mutable state) and literals (values without an OID).
- Supports atomic, collection (set, bag, list, array), and structured types.
- Provides classes, inheritance, attributes, relationships, and methods (behaviour).
- Defines three component languages: ODL (schema), OQL (queries), and OML language bindings (C++, Java, Smalltalk).
Object Definition Language (ODL)
ODL is the schema-definition language of the ODMG standard, analogous to the DDL of relational systems. It is used to define the interfaces (classes), their attributes, relationships, and method signatures independently of any programming language, so the schema can be mapped to multiple language bindings.
Example:
class Employee (extent employees key empId) {
attribute int empId;
attribute string name;
relationship Department worksIn inverse Department::staff;
void giveRaise(in float amt);
};
Explain the different steps in query processing.
Steps in Query Processing
Query processing is the sequence of activities that transform a high-level query into an efficient execution and produce results.
-
Parsing and Translation (Syntactic analysis): The SQL query is scanned, parsed, and checked for syntactic and semantic correctness (valid relation/attribute names, types). It is then translated into an internal form, usually a relational algebra expression / query tree.
-
Optimization: The query optimizer generates alternative execution plans (different join orders, access methods, algorithms) and uses heuristic rules and/or cost estimates (based on statistics in the system catalog) to select the cheapest plan.
-
Evaluation / Execution: The execution engine runs the chosen plan — invoking the physical operators (table scan, index scan, sort, hash join, etc.) — against the stored data and returns the result to the user.
Diagram (described): Query → [Parser & Translator] → relational-algebra expression → [Optimizer + Statistics/Catalog] → execution plan → [Evaluation Engine + Data] → Result.
Why is query optimization essential in databases? What is heuristic optimization?
Why Query Optimization is Essential
A single SQL query can usually be executed by many different equivalent plans (different join orders, join algorithms, and access paths), and their costs can differ by orders of magnitude. Query optimization chooses an efficient plan so as to minimize resource usage — disk I/O, CPU time, memory and (in distributed systems) communication cost — thereby reducing response time. Because users write declarative queries (what, not how), it is the DBMS's job to find a good execution strategy; without optimization, query performance over large databases would be unacceptable.
Heuristic Optimization
Heuristic (rule-based) optimization improves a query by applying transformation rules to the relational-algebra query tree, regardless of exact data statistics. It follows general guidelines such as:
- Push selections (σ) down the tree to filter rows as early as possible.
- Push projections (π) down to drop unneeded columns early.
- Perform the most restrictive (smallest result) operations first.
- Convert Cartesian products followed by selection into joins.
- Combine a selection sequence into a single conjunctive condition.
The result is an optimized query tree that typically reduces the size of intermediate relations, lowering processing cost. It is simpler and faster than full cost-based optimization but does not guarantee the globally optimal plan.
Define fragmentation. Explain horizontal fragmentation with an example.
Fragmentation
In a distributed database, fragmentation is the process of dividing a global relation into smaller pieces called fragments, which are stored at different sites. It improves locality of reference, parallelism, and performance. Fragmentation must satisfy:
- Completeness: every tuple/attribute appears in some fragment.
- Reconstruction: the original relation can be rebuilt from the fragments.
- Disjointness: fragments do not overlap (except the key in vertical fragmentation).
Types: horizontal, vertical, and hybrid (mixed) fragmentation.
Horizontal Fragmentation
Horizontal fragmentation divides a relation by rows (tuples) using a selection (σ) predicate; each fragment contains a subset of the rows. The original relation is reconstructed using UNION (∪).
Example
Relation EMPLOYEE(EmpID, Name, Dept, City). Fragment by city:
EMP_KTM = σ_{City = 'Kathmandu'}(EMPLOYEE)→ stored at the Kathmandu site.EMP_PKR = σ_{City = 'Pokhara'}(EMPLOYEE)→ stored at the Pokhara site.
Reconstruction: EMPLOYEE = EMP_KTM ∪ EMP_PKR. Each site holds and processes mostly its own local employee records, reducing network traffic.
What are the characteristics of NoSQL systems? Explain.
Characteristics of NoSQL Systems
NoSQL (Not Only SQL) systems are non-relational data stores built for large-scale, distributed, high-availability applications. Their main characteristics are:
- Flexible / schema-less data model: Records need not share a fixed structure; schemas can evolve dynamically (e.g., JSON documents).
- Horizontal scalability (scale-out): Data is sharded/partitioned across many commodity servers; throughput grows by adding nodes.
- High availability and replication: Data is automatically replicated across nodes so the system tolerates node and network failures.
- BASE consistency model: Favours Basically Available, Soft state, Eventual consistency instead of strict ACID — guided by the CAP theorem trade-off.
- Distributed architecture: Designed to run on clusters, often across data centres.
- Simple / specialized query interface: Access is via APIs or simple queries rather than full SQL joins; joins are usually avoided.
- Handles large volume and variety of data: Suited to Big Data — structured, semi-structured and unstructured.
These properties make NoSQL well suited to web-scale, real-time, and Big Data workloads where relational databases struggle to scale.
What is the concept of an active database? What are triggers?
Active Database
An active database is a database system that can react automatically to events without explicit user or application requests. In addition to normal (passive) data storage, it monitors specified conditions and automatically performs actions when they occur. This reactive behaviour is expressed using ECA (Event–Condition–Action) rules:
- Event: something that happens (e.g., an
INSERT,UPDATE, orDELETE). - Condition: a predicate checked when the event fires.
- Action: the operation executed if the condition is true.
Active databases are used to enforce business rules, maintain integrity constraints, derive data, log changes, and send alerts automatically.
Triggers
A trigger is the most common mechanism for implementing ECA rules — a named, stored procedural block that is automatically executed (fired) by the DBMS in response to a specified data-modification event on a table.
- Specified as BEFORE / AFTER / INSTEAD OF an INSERT / UPDATE / DELETE.
- Can be row-level (fires once per affected row) or statement-level.
Example:
CREATE TRIGGER log_salary_change
AFTER UPDATE OF salary ON Employee
FOR EACH ROW
BEGIN
INSERT INTO Salary_Audit(emp_id, old_sal, new_sal, changed_on)
VALUES (:OLD.emp_id, :OLD.salary, :NEW.salary, SYSDATE);
END;
Uses: enforcing complex constraints, auditing, automatic derivation, and maintaining consistency.
Write short notes on: (a) Big Data (b) Information Retrieval.
(a) Big Data
Big Data refers to datasets so large, fast-growing, and varied that traditional database tools cannot capture, store, manage, and analyze them efficiently. It is characterized by the 3 V's (often extended to 5):
- Volume: enormous amounts of data (terabytes to petabytes).
- Velocity: data generated and processed at high speed (often real-time/streaming).
- Variety: structured, semi-structured and unstructured data (text, images, logs, sensor data).
- (Extended) Veracity (data quality/uncertainty) and Value (usefulness extracted).
It is handled using distributed frameworks such as Hadoop (HDFS, MapReduce), Spark, and NoSQL stores, enabling analytics, machine learning and decision support.
(b) Information Retrieval
Information Retrieval (IR) is the process of finding relevant unstructured documents (mainly text) from a large collection in response to a user's query. Unlike database querying, which returns exact matches over structured data, IR returns documents ranked by relevance.
Key components and concepts:
- Indexing: building an inverted index (term → list of documents) for fast lookup.
- Query processing: matching query terms against the index.
- Ranking models: Boolean, Vector Space Model with TF–IDF weighting, and probabilistic models, used to score and order results.
- Evaluation metrics: Precision (fraction of retrieved that are relevant) and Recall (fraction of relevant that are retrieved).
Search engines (e.g., Google) are the most common application of IR.
Frequently asked questions
- Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2081?
- The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2081 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Advanced Database (BSc CSIT, CSC461) 2081 paper come with solutions?
- Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2081 paper?
- The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2081 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?
- Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.