BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2079 Nepal
This is the official BSc CSIT (TU) (Science stream) Advanced Database (BSc CSIT, CSC461) question paper for 2079, as set in the regular annual examination. It carries 60 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Advanced Database (BSc CSIT, CSC461) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) exam or solving previous years' question papers, this 2079 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt any TWO questions.
Explain the Enhanced Entity-Relationship (EER) model. Discuss specialization, generalization, categorization and aggregation with suitable diagrams and examples.
Enhanced Entity-Relationship (EER) Model
The EER model extends the basic ER model with additional semantic concepts needed for complex, real-world applications (e.g., engineering design, telecommunications, GIS). It adds the concepts of subclasses/superclasses, specialization/generalization, inheritance, categories (union types), and aggregation on top of entities, attributes and relationships.
Key idea: a superclass is a general entity type; a subclass is a specialized grouping of its entities. A subclass inherits all attributes and relationships of its superclass — this is attribute and relationship inheritance.
1. Specialization (Top-Down)
Specialization is the process of defining a set of subclasses of a superclass based on some distinguishing characteristic. It is a top-down refinement.
Example: EMPLOYEE is specialized into SECRETARY, ENGINEER and TECHNICIAN based on job type. Each subclass may have its own specific attributes (e.g., TypingSpeed for SECRETARY) plus the inherited attributes (Name, SSN, Salary).
Diagram (text): EMPLOYEE at the top with a circle (∪ symbol) below it; the circle connects (via a subset symbol ⊂ on each line) downward to SECRETARY, ENGINEER, TECHNICIAN.
Constraints: disjointness (d = disjoint, an entity belongs to at most one subclass; o = overlapping, may belong to several) and completeness (total = double line, every superclass member must belong to a subclass; partial = single line).
2. Generalization (Bottom-Up)
Generalization is the reverse of specialization: several entity types sharing common features are combined into a single higher-level superclass. It is bottom-up abstraction.
Example: CAR and TRUCK both have VehicleId, Price, LicensePlate → generalized into superclass VEHICLE. The common attributes move up to VEHICLE; the specific attributes stay in CAR/TRUCK.
3. Categorization (Union Type / Category)
A category is a subclass that is a subset of the union of two or more superclasses of different entity types. It models a single relationship to a collection of distinct entity types.
Example: The owner of a registered vehicle (OWNER) can be a PERSON, a BANK, or a COMPANY. OWNER is a category formed from the union PERSON ∪ BANK ∪ COMPANY. Unlike a shared subclass, a category member inherits attributes from only one of its superclasses (selective inheritance). Represented by a circle with the ∪ symbol and a ⊂ toward the category.
4. Aggregation
Aggregation is an abstraction in which a relationship together with its participating entities is treated as a single higher-level entity, allowing a relationship to participate in another relationship. It models “relationship of a relationship.”
Example: An EMPLOYEE works-on a PROJECT (relationship). To record that a MANAGER manages this work assignment, we aggregate the works-on relationship into a single abstract entity and connect a manages relationship from MANAGER to that aggregate. In a diagram the works-on relationship and its entities are enclosed in a dashed box, treated as one unit.
Summary Table
| Concept | Direction | Result |
|---|---|---|
| Specialization | Top-down | Subclasses from a superclass |
| Generalization | Bottom-up | Superclass from entity types |
| Categorization | Union | Subclass of a union of superclasses |
| Aggregation | Abstraction | Relationship treated as an entity |
What is a data warehouse? Explain its architecture, the concept of OLAP and the star schema, and differentiate between OLTP and OLAP systems.
Data Warehouse
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that supports management decision-making (Inmon). It consolidates historical data from multiple operational (OLTP) sources into a single repository optimized for analysis and reporting rather than transaction processing.
Four defining properties:
- Subject-oriented – organized around major subjects (customer, product, sales).
- Integrated – data from heterogeneous sources cleaned to a consistent format.
- Time-variant – stores historical snapshots over long periods.
- Non-volatile – data is loaded and read, but not updated/deleted in normal use.
Architecture
Typical three-tier architecture:
- Bottom tier (Data sources + ETL + Warehouse server): Operational databases and external data are passed through an ETL process (Extract → Transform/clean → Load) into the warehouse and its metadata repository and data marts.
- Middle tier (OLAP server): A ROLAP (relational) or MOLAP (multidimensional) engine that presents data as cubes.
- Top tier (Front-end tools): Query, reporting, analysis, and data-mining tools used by analysts.
Sources → [ETL] → Data Warehouse (+ metadata, data marts) → OLAP Server → Front-end tools
OLAP (Online Analytical Processing)
OLAP enables multidimensional analysis of data using a data cube with dimensions (e.g., Time, Location, Product) and measures (e.g., Sales). Core operations:
- Roll-up – aggregate by climbing a hierarchy (city → country).
- Drill-down – the reverse, more detail (country → city).
- Slice – select one dimension value; Dice – select a sub-cube on several dimensions.
- Pivot (rotate) – reorient the cube for viewing.
Star Schema
The star schema is the simplest data-warehouse logical model: one central fact table holding measures and foreign keys, surrounded by several dimension tables.
Example: SALES_FACT(time_key, product_key, store_key, units_sold, amount) linked to dimensions TIME, PRODUCT, STORE. The diagram resembles a star — fact at center, dimensions as points. (A snowflake schema normalizes the dimension tables.)
OLTP vs OLAP
| Feature | OLTP | OLAP |
|---|---|---|
| Purpose | Day-to-day transactions | Decision support / analysis |
| Data | Current, detailed | Historical, summarized |
| Operations | INSERT/UPDATE/DELETE, short | Complex read queries |
| Schema | Normalized (3NF) | Star/Snowflake (denormalized) |
| Users | Clerks, online users | Analysts, managers |
| Access | Many short transactions | Few long complex queries |
| Design goal | High throughput, integrity | Fast query/aggregation |
What is query processing? Explain the different steps involved in query processing and optimization with the help of a block diagram. Discuss how heuristic-based optimization transforms a relational algebra query into an efficient query plan.
Query Processing
Query processing is the set of activities a DBMS performs to translate a high-level query (e.g., SQL) into an efficient sequence of low-level operations that retrieve the requested data from the database.
Steps in Query Processing
SQL Query
│
▼
[1] Parser & Translator → produces relational-algebra expression (parse tree)
│
▼
[2] Optimizer ← uses catalog statistics → chooses best execution plan
│
▼
[3] Evaluation Engine (executor) ← reads Data + Statistics
│
▼
Query Result
- Parsing & Translation – the query is scanned, parsed for syntax, and checked semantically (do tables/attributes exist?). It is translated into an internal relational algebra expression (query tree).
- Optimization – several equivalent evaluation plans are generated; using cost estimates based on catalog statistics (relation sizes, indexes, selectivity), the optimizer picks the lowest-cost plan. This combines heuristic (rule-based) and cost-based techniques.
- Evaluation/Execution – the execution engine runs the chosen plan and returns results to the user.
Heuristic (Rule-Based) Optimization
Heuristic optimization applies transformation rules to a relational-algebra query tree to produce an equivalent but cheaper tree, guided by general rules rather than exact cost computation. Main heuristics:
- Push selections (σ) down as far as possible toward the leaves — apply selection before joins to reduce intermediate tuples early.
- Push projections (π) down to discard unneeded attributes early, reducing tuple width.
- Apply the most restrictive selections / smallest relations first so joins operate on the smallest possible inputs.
- Convert Cartesian products followed by selection into joins (σ over × → ⋈).
- Combine cascades of selections/projections and reorder commutative/associative joins for efficiency.
Example
Query: names of employees in the 'Research' department.
Heuristic optimizer pushes the selection down to DEPT and the projection down, yielding the cheaper plan:
so the join processes far fewer DEPT tuples. The transformed tree is the optimized query plan passed to the execution engine.
Section B: Short Answer Questions
Attempt any EIGHT questions.
Why is hashing important to store data in databases? What is primary file organization?
Importance of hashing: Hashing provides direct (random) access to a record using a hash function h(key) that computes the disk bucket/address from the search key, giving average O(1) retrieval without scanning the file or traversing an index. This makes equality searches (key = value), insertions and deletions very fast, which is essential for large databases needing quick lookups. (Limitation: it does not efficiently support range queries.)
Primary file organization: It is the way records of a file are physically arranged on the storage medium and how they are accessed. Common primary organizations are: heap (unordered) – records placed in insertion order; sequential/ordered – records sorted on an ordering key; and hash – records placed in buckets determined by a hash function. The primary organization determines the basic access path to the records (as opposed to a secondary organization/index built additionally for alternative access).
Explain aggregation with a suitable example.
Aggregation is an abstraction (used in the EER model) in which a relationship, together with its participating entities, is treated as a single higher-level (aggregate) entity so that this aggregate can itself participate in another relationship. It overcomes the limitation that a basic ER relationship cannot connect directly to another relationship.
Example: Consider entities EMPLOYEE and PROJECT connected by the relationship works-on. Now suppose we must record which MANAGER manages a particular employee–project assignment. We aggregate the works-on relationship (with EMPLOYEE and PROJECT) into a single abstract entity and create a manages relationship from MANAGER to this aggregate.
Diagram (text): EMPLOYEE—works-on—PROJECT enclosed in a dashed rectangle (the aggregate); a line from MANAGER connects through a manages diamond to that dashed box.
Thus aggregation lets a relationship behave like an entity in another relationship.
What is the ODMG object model? What is Object Definition Language (ODL)?
ODMG Object Model: The ODMG (Object Data Management Group) model is the standard data model for object-oriented databases (OODBMS), providing a common framework so that object databases and their languages are portable across products. Its main building blocks are:
- Objects – have a unique OID (object identifier) and a mutable state; literals – have a value but no OID.
- Type / class with attributes, relationships (with inverse references), and operations (methods).
- Support for inheritance (interface inheritance
:and classextends), collections (Set, Bag, List, Array, Dictionary), and extents/keys. - A defined set of atomic, structured, and collection literal types.
The ODMG standard also specifies ODL, OQL, and language bindings (C++, Java, Smalltalk).
Object Definition Language (ODL): ODL is the data-definition language of the ODMG standard, the object-database counterpart of SQL's DDL. It is used to define the schema — the object types/classes with their attributes, relationships (and inverses), and operation signatures — independently of any programming language, so the schema is portable across ODMG-compliant systems.
class Employee (extent employees key ssn) {
attribute string name;
attribute string ssn;
relationship Department works_in inverse Department::has_emps;
void give_raise(in float amt);
};
Explain the different steps in query processing.
Steps in Query Processing
-
Parsing and Translation – The SQL query is scanned and parsed to check syntax, then semantically validated against the system catalog (do the named relations and attributes exist, are types correct?). The valid query is translated into an internal relational-algebra expression (query tree).
-
Optimization – The optimizer generates alternative equivalent execution plans for the algebra expression and, using catalog statistics (relation sizes, available indexes, selectivity) and cost estimates (heuristic + cost-based methods), selects the plan with the lowest estimated cost.
-
Evaluation (Execution) – The execution engine takes the chosen query-evaluation plan, runs its low-level operations against the stored data, and returns the result to the user.
SQL → [Parser/Translator] → Relational Algebra → [Optimizer] → Execution Plan → [Executor] → Result
These steps convert a declarative high-level query into an efficient procedural plan that retrieves the correct data quickly.
Why is query optimization essential in databases? What is heuristic optimization?
Why query optimization is essential: For any non-trivial query there are many equivalent ways to execute it (different join orders, access methods, and operator implementations), and their costs can differ by orders of magnitude. Optimization chooses an efficient execution plan that minimizes resource usage — disk I/O, CPU, memory and response time — so that queries run fast and the DBMS scales to large data and many users. Without it, the system might pick an extremely slow plan (e.g., a full Cartesian product instead of an indexed join).
Heuristic optimization: A rule-based optimization technique that transforms the query (relational-algebra tree) into an equivalent, cheaper form by applying general heuristic rules rather than computing exact costs. Key rules:
- Push selections (σ) down toward the leaves so filtering happens early and intermediate results shrink.
- Push projections (π) down to drop unneeded columns early.
- Apply the most restrictive operations first, and convert
σover a Cartesian product into a join (×+σ→⋈). - Combine cascades of selections/projections and reorder joins.
Heuristic optimization is cheap and fast, and is often applied before cost-based optimization to prune the search space.
Define fragmentation. Explain horizontal fragmentation with an example.
Fragmentation: In a distributed database, fragmentation is the process of dividing a global relation into smaller pieces (fragments) that can be stored at different sites. It improves locality of access, parallelism and availability. The three types are horizontal, vertical, and mixed (hybrid) fragmentation. Correct fragmentation must satisfy completeness, reconstruction, and disjointness.
Horizontal fragmentation: Divides a relation by rows (tuples) using a selection (σ) predicate on some attribute; each fragment contains a subset of the tuples but all the attributes. The original relation is reconstructed by the UNION (∪) of the fragments.
Example: Relation EMPLOYEE(eid, name, dept, location) fragmented by location:
EMP_1 (all Kathmandu employees) is stored at the Kathmandu site and EMP_2 at the Pokhara site, so each office mostly accesses local data. Reconstruction: .
What are the characteristics of NoSQL systems? Explain.
Characteristics of NoSQL Systems
NoSQL ("Not Only SQL") systems are non-relational data stores designed for big data, high scalability, and flexible schemas. Main characteristics:
- Scalability (horizontal): Designed to scale out across many commodity servers via sharding/partitioning, rather than scaling up a single machine.
- Distribution & replication: Data is automatically partitioned and replicated across nodes for high availability and fault tolerance.
- Schema flexibility (schema-less): No fixed table schema; records can have varying fields, suited to semi-structured/unstructured data (often JSON/document or key-value form).
- High availability with BASE / eventual consistency: Following the CAP theorem, most NoSQL stores favor Availability and Partition tolerance, offering BASE (Basically Available, Soft state, Eventual consistency) instead of strict ACID.
- Simple, non-relational query/access: Access via APIs/keys rather than full SQL joins; queries are simpler and optimized for performance.
- Data-model variety: Four common categories — key-value (Redis, DynamoDB), document (MongoDB), column-family (Cassandra, HBase), and graph (Neo4j) stores.
These features make NoSQL suitable for web-scale, rapidly changing, and very large data workloads where rigid relational schemas and vertical scaling are limiting.
What is the concept of an active database? What are triggers?
Active Database: An active database is a database system that can react automatically to specified events/conditions in the database without explicit user request, by executing predefined actions. It extends a passive DBMS with active rules — typically expressed as ECA (Event–Condition–Action) rules: when an Event occurs, if a Condition is true, then perform the Action. This is used to enforce business rules, integrity constraints, alerts, and automatic derivations (e.g., automatically updating a stock total when a sale is recorded).
Triggers: A trigger is the most common implementation of an active rule — a named database object that is automatically fired by the DBMS in response to a data-modification event (INSERT, UPDATE, or DELETE) on a table. It follows the ECA model: the event is the DML operation, the optional condition is a WHEN clause, and the action is the SQL body, executed BEFORE/AFTER the event and at row or statement level.
CREATE TRIGGER check_salary
BEFORE INSERT ON Employee
FOR EACH ROW
WHEN (NEW.salary < 0)
BEGIN
RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be negative');
END;
Write short notes on: (a) Big Data (b) Information Retrieval.
(a) Big Data
Big Data refers to data sets that are so large, fast, and varied that traditional database tools cannot capture, store, manage, or analyze them efficiently. It is commonly characterized by the 5 V's:
- Volume – enormous amounts of data (terabytes to petabytes).
- Velocity – data generated and processed at high speed (often real-time/streaming).
- Variety – structured, semi-structured and unstructured data (text, images, logs, video).
- Veracity – uncertainty/quality of the data.
- Value – useful insight extracted from the data.
Big Data is handled by distributed frameworks such as Hadoop (HDFS + MapReduce), Spark, and NoSQL stores, enabling parallel processing across clusters.
(b) Information Retrieval (IR)
Information Retrieval is the process of finding relevant documents/information (usually unstructured text) from a large collection in response to a user's query. Unlike database querying (which returns exact matches on structured data), IR returns ranked, relevant results based on similarity. Key elements:
- Indexing – building an inverted index mapping terms to documents.
- Query processing & ranking – using models such as the Boolean and Vector Space Model with TF-IDF weighting and cosine similarity.
- Evaluation – measured by precision and recall.
Web search engines (e.g., Google) are the largest application of IR.
Frequently asked questions
- Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2079?
- The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2079 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Advanced Database (BSc CSIT, CSC461) 2079 paper come with solutions?
- Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2079 paper?
- The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2079 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?
- Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.