BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2079 Nepal

Q: Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2079?

The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2079 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Advanced Database (BSc CSIT, CSC461) 2079 paper come with solutions?

Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2079 paper?

The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2079 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?

Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.

Question

1Long answer10 marks

Explain the Enhanced Entity-Relationship (EER) model. Discuss specialization, generalization, categorization and aggregation with suitable diagrams and examples.

extended-erer-model

Answer 1

Enhanced Entity-Relationship (EER) Model

The EER model extends the basic ER model with additional semantic concepts needed for complex, real-world applications (e.g., engineering design, telecommunications, GIS). It adds the concepts of subclasses/superclasses, specialization/generalization, inheritance, categories (union types), and aggregation on top of entities, attributes and relationships.

Key idea: a superclass is a general entity type; a subclass is a specialized grouping of its entities. A subclass inherits all attributes and relationships of its superclass — this is attribute and relationship inheritance.

1. Specialization (Top-Down)

Specialization is the process of defining a set of subclasses of a superclass based on some distinguishing characteristic. It is a top-down refinement.

Example: EMPLOYEE is specialized into SECRETARY, ENGINEER and TECHNICIAN based on job type. Each subclass may have its own specific attributes (e.g., TypingSpeed for SECRETARY) plus the inherited attributes (Name, SSN, Salary).

Diagram (text): EMPLOYEE at the top with a circle (∪ symbol) below it; the circle connects (via a subset symbol ⊂ on each line) downward to SECRETARY, ENGINEER, TECHNICIAN.

Constraints: disjointness (d = disjoint, an entity belongs to at most one subclass; o = overlapping, may belong to several) and completeness (total = double line, every superclass member must belong to a subclass; partial = single line).

2. Generalization (Bottom-Up)

Generalization is the reverse of specialization: several entity types sharing common features are combined into a single higher-level superclass. It is bottom-up abstraction.

Example: CAR and TRUCK both have VehicleId, Price, LicensePlate → generalized into superclass VEHICLE. The common attributes move up to VEHICLE; the specific attributes stay in CAR/TRUCK.

3. Categorization (Union Type / Category)

A category is a subclass that is a subset of the union of two or more superclasses of different entity types. It models a single relationship to a collection of distinct entity types.

Example: The owner of a registered vehicle (OWNER) can be a PERSON, a BANK, or a COMPANY. OWNER is a category formed from the union PERSON ∪ BANK ∪ COMPANY. Unlike a shared subclass, a category member inherits attributes from only one of its superclasses (selective inheritance). Represented by a circle with the ∪ symbol and a ⊂ toward the category.

4. Aggregation

Aggregation is an abstraction in which a relationship together with its participating entities is treated as a single higher-level entity, allowing a relationship to participate in another relationship. It models “relationship of a relationship.”

Example: An EMPLOYEE works-on a PROJECT (relationship). To record that a MANAGER manages this work assignment, we aggregate the works-on relationship into a single abstract entity and connect a manages relationship from MANAGER to that aggregate. In a diagram the works-on relationship and its entities are enclosed in a dashed box, treated as one unit.

Summary Table

Concept	Direction	Result
Specialization	Top-down	Subclasses from a superclass
Generalization	Bottom-up	Superclass from entity types
Categorization	Union	Subclass of a union of superclasses
Aggregation	Abstraction	Relationship treated as an entity

Answer 2

Data Warehouse

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that supports management decision-making (Inmon). It consolidates historical data from multiple operational (OLTP) sources into a single repository optimized for analysis and reporting rather than transaction processing.

Four defining properties:

Subject-oriented – organized around major subjects (customer, product, sales).
Integrated – data from heterogeneous sources cleaned to a consistent format.
Time-variant – stores historical snapshots over long periods.
Non-volatile – data is loaded and read, but not updated/deleted in normal use.

Architecture

Typical three-tier architecture:

Bottom tier (Data sources + ETL + Warehouse server): Operational databases and external data are passed through an ETL process (Extract → Transform/clean → Load) into the warehouse and its metadata repository and data marts.
Middle tier (OLAP server): A ROLAP (relational) or MOLAP (multidimensional) engine that presents data as cubes.
Top tier (Front-end tools): Query, reporting, analysis, and data-mining tools used by analysts.

Sources → [ETL] → Data Warehouse (+ metadata, data marts) → OLAP Server → Front-end tools

OLAP (Online Analytical Processing)

OLAP enables multidimensional analysis of data using a data cube with dimensions (e.g., Time, Location, Product) and measures (e.g., Sales). Core operations:

Roll-up – aggregate by climbing a hierarchy (city → country).
Drill-down – the reverse, more detail (country → city).
Slice – select one dimension value; Dice – select a sub-cube on several dimensions.
Pivot (rotate) – reorient the cube for viewing.

Star Schema

The star schema is the simplest data-warehouse logical model: one central fact table holding measures and foreign keys, surrounded by several dimension tables.

Example: SALES_FACT(time_key, product_key, store_key, units_sold, amount) linked to dimensions TIME, PRODUCT, STORE. The diagram resembles a star — fact at center, dimensions as points. (A snowflake schema normalizes the dimension tables.)

OLTP vs OLAP

Feature	OLTP	OLAP
Purpose	Day-to-day transactions	Decision support / analysis
Data	Current, detailed	Historical, summarized
Operations	INSERT/UPDATE/DELETE, short	Complex read queries
Schema	Normalized (3NF)	Star/Snowflake (denormalized)
Users	Clerks, online users	Analysts, managers
Access	Many short transactions	Few long complex queries
Design goal	High throughput, integrity	Fast query/aggregation

Answer 3

Query Processing

Query processing is the set of activities a DBMS performs to translate a high-level query (e.g., SQL) into an efficient sequence of low-level operations that retrieve the requested data from the database.

Steps in Query Processing

  SQL Query
      │
      ▼
[1] Parser & Translator  → produces relational-algebra expression (parse tree)
      │
      ▼
[2] Optimizer  ← uses catalog statistics → chooses best execution plan
      │
      ▼
[3] Evaluation Engine (executor)  ← reads Data + Statistics
      │
      ▼
   Query Result

Parsing & Translation – the query is scanned, parsed for syntax, and checked semantically (do tables/attributes exist?). It is translated into an internal relational algebra expression (query tree).
Optimization – several equivalent evaluation plans are generated; using cost estimates based on catalog statistics (relation sizes, indexes, selectivity), the optimizer picks the lowest-cost plan. This combines heuristic (rule-based) and cost-based techniques.
Evaluation/Execution – the execution engine runs the chosen plan and returns results to the user.

Heuristic (Rule-Based) Optimization

Heuristic optimization applies transformation rules to a relational-algebra query tree to produce an equivalent but cheaper tree, guided by general rules rather than exact cost computation. Main heuristics:

Push selections (σ) down as far as possible toward the leaves — apply selection before joins to reduce intermediate tuples early.
Push projections (π) down to discard unneeded attributes early, reducing tuple width.
Apply the most restrictive selections / smallest relations first so joins operate on the smallest possible inputs.
Convert Cartesian products followed by selection into joins (σ over × → ⋈).
Combine cascades of selections/projections and reorder commutative/associative joins for efficiency.

Example

Query: names of employees in the 'Research' department.

\pi_{name}\big(\sigma_{dname='Research'}(EMP \bowtie DEPT)\big)

Heuristic optimizer pushes the selection down to DEPT and the projection down, yielding the cheaper plan:

\pi_{name}\Big(EMP \bowtie \big(\pi_{dno}(\sigma_{dname='Research'}(DEPT))\big)\Big)

so the join processes far fewer DEPT tuples. The transformed tree is the optimized query plan passed to the execution engine.

Answer 4

Importance of hashing: Hashing provides direct (random) access to a record using a hash function h(key) that computes the disk bucket/address from the search key, giving average O(1) retrieval without scanning the file or traversing an index. This makes equality searches (key = value), insertions and deletions very fast, which is essential for large databases needing quick lookups. (Limitation: it does not efficiently support range queries.)

Primary file organization: It is the way records of a file are physically arranged on the storage medium and how they are accessed. Common primary organizations are: heap (unordered) – records placed in insertion order; sequential/ordered – records sorted on an ordering key; and hash – records placed in buckets determined by a hash function. The primary organization determines the basic access path to the records (as opposed to a secondary organization/index built additionally for alternative access).

Answer 5

Aggregation is an abstraction (used in the EER model) in which a relationship, together with its participating entities, is treated as a single higher-level (aggregate) entity so that this aggregate can itself participate in another relationship. It overcomes the limitation that a basic ER relationship cannot connect directly to another relationship.

Example: Consider entities EMPLOYEE and PROJECT connected by the relationship works-on. Now suppose we must record which MANAGER manages a particular employee–project assignment. We aggregate the works-on relationship (with EMPLOYEE and PROJECT) into a single abstract entity and create a manages relationship from MANAGER to this aggregate.

Diagram (text): EMPLOYEE—works-on—PROJECT enclosed in a dashed rectangle (the aggregate); a line from MANAGER connects through a manages diamond to that dashed box.

Thus aggregation lets a relationship behave like an entity in another relationship.

Answer 6

ODMG Object Model: The ODMG (Object Data Management Group) model is the standard data model for object-oriented databases (OODBMS), providing a common framework so that object databases and their languages are portable across products. Its main building blocks are:

Objects – have a unique OID (object identifier) and a mutable state; literals – have a value but no OID.
Type / class with attributes, relationships (with inverse references), and operations (methods).
Support for inheritance (interface inheritance : and class extends), collections (Set, Bag, List, Array, Dictionary), and extents/keys.
A defined set of atomic, structured, and collection literal types.

The ODMG standard also specifies ODL, OQL, and language bindings (C++, Java, Smalltalk).

Object Definition Language (ODL): ODL is the data-definition language of the ODMG standard, the object-database counterpart of SQL's DDL. It is used to define the schema — the object types/classes with their attributes, relationships (and inverses), and operation signatures — independently of any programming language, so the schema is portable across ODMG-compliant systems.

class Employee (extent employees key ssn) {
  attribute string  name;
  attribute string  ssn;
  relationship Department works_in inverse Department::has_emps;
  void give_raise(in float amt);
};

Answer 7

Steps in Query Processing

Parsing and Translation – The SQL query is scanned and parsed to check syntax, then semantically validated against the system catalog (do the named relations and attributes exist, are types correct?). The valid query is translated into an internal relational-algebra expression (query tree).
Optimization – The optimizer generates alternative equivalent execution plans for the algebra expression and, using catalog statistics (relation sizes, available indexes, selectivity) and cost estimates (heuristic + cost-based methods), selects the plan with the lowest estimated cost.
Evaluation (Execution) – The execution engine takes the chosen query-evaluation plan, runs its low-level operations against the stored data, and returns the result to the user.

SQL → [Parser/Translator] → Relational Algebra → [Optimizer] → Execution Plan → [Executor] → Result

These steps convert a declarative high-level query into an efficient procedural plan that retrieves the correct data quickly.

Answer 8

Why query optimization is essential: For any non-trivial query there are many equivalent ways to execute it (different join orders, access methods, and operator implementations), and their costs can differ by orders of magnitude. Optimization chooses an efficient execution plan that minimizes resource usage — disk I/O, CPU, memory and response time — so that queries run fast and the DBMS scales to large data and many users. Without it, the system might pick an extremely slow plan (e.g., a full Cartesian product instead of an indexed join).

Heuristic optimization: A rule-based optimization technique that transforms the query (relational-algebra tree) into an equivalent, cheaper form by applying general heuristic rules rather than computing exact costs. Key rules:

Push selections (σ) down toward the leaves so filtering happens early and intermediate results shrink.
Push projections (π) down to drop unneeded columns early.
Apply the most restrictive operations first, and convert σ over a Cartesian product into a join (× + σ → ⋈).
Combine cascades of selections/projections and reorder joins.

Heuristic optimization is cheap and fast, and is often applied before cost-based optimization to prune the search space.

Answer 9

Fragmentation: In a distributed database, fragmentation is the process of dividing a global relation into smaller pieces (fragments) that can be stored at different sites. It improves locality of access, parallelism and availability. The three types are horizontal, vertical, and mixed (hybrid) fragmentation. Correct fragmentation must satisfy completeness, reconstruction, and disjointness.

Horizontal fragmentation: Divides a relation by rows (tuples) using a selection (σ) predicate on some attribute; each fragment contains a subset of the tuples but all the attributes. The original relation is reconstructed by the UNION (∪) of the fragments.

Example: Relation EMPLOYEE(eid, name, dept, location) fragmented by location:

EMP_1 = \sigma_{location='Kathmandu'}(EMPLOYEE)

EMP_2 = \sigma_{location='Pokhara'}(EMPLOYEE)

EMP_1 (all Kathmandu employees) is stored at the Kathmandu site and EMP_2 at the Pokhara site, so each office mostly accesses local data. Reconstruction: $EMPLOYEE = EMP_1 \cup EMP_2$ .

Answer 10

Characteristics of NoSQL Systems

NoSQL ("Not Only SQL") systems are non-relational data stores designed for big data, high scalability, and flexible schemas. Main characteristics:

Scalability (horizontal): Designed to scale out across many commodity servers via sharding/partitioning, rather than scaling up a single machine.
Distribution & replication: Data is automatically partitioned and replicated across nodes for high availability and fault tolerance.
Schema flexibility (schema-less): No fixed table schema; records can have varying fields, suited to semi-structured/unstructured data (often JSON/document or key-value form).
High availability with BASE / eventual consistency: Following the CAP theorem, most NoSQL stores favor Availability and Partition tolerance, offering BASE (Basically Available, Soft state, Eventual consistency) instead of strict ACID.
Simple, non-relational query/access: Access via APIs/keys rather than full SQL joins; queries are simpler and optimized for performance.
Data-model variety: Four common categories — key-value (Redis, DynamoDB), document (MongoDB), column-family (Cassandra, HBase), and graph (Neo4j) stores.

These features make NoSQL suitable for web-scale, rapidly changing, and very large data workloads where rigid relational schemas and vertical scaling are limiting.

Answer 11

Active Database: An active database is a database system that can react automatically to specified events/conditions in the database without explicit user request, by executing predefined actions. It extends a passive DBMS with active rules — typically expressed as ECA (Event–Condition–Action) rules: when an Event occurs, if a Condition is true, then perform the Action. This is used to enforce business rules, integrity constraints, alerts, and automatic derivations (e.g., automatically updating a stock total when a sale is recorded).

Triggers: A trigger is the most common implementation of an active rule — a named database object that is automatically fired by the DBMS in response to a data-modification event (INSERT, UPDATE, or DELETE) on a table. It follows the ECA model: the event is the DML operation, the optional condition is a WHEN clause, and the action is the SQL body, executed BEFORE/AFTER the event and at row or statement level.

CREATE TRIGGER check_salary
BEFORE INSERT ON Employee
FOR EACH ROW
WHEN (NEW.salary < 0)
BEGIN
  RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be negative');
END;

Answer 12

(a) Big Data

Big Data refers to data sets that are so large, fast, and varied that traditional database tools cannot capture, store, manage, or analyze them efficiently. It is commonly characterized by the 5 V's:

Volume – enormous amounts of data (terabytes to petabytes).
Velocity – data generated and processed at high speed (often real-time/streaming).
Variety – structured, semi-structured and unstructured data (text, images, logs, video).
Veracity – uncertainty/quality of the data.
Value – useful insight extracted from the data.

Big Data is handled by distributed frameworks such as Hadoop (HDFS + MapReduce), Spark, and NoSQL stores, enabling parallel processing across clusters.

(b) Information Retrieval (IR)

Information Retrieval is the process of finding relevant documents/information (usually unstructured text) from a large collection in response to a user's query. Unlike database querying (which returns exact matches on structured data), IR returns ranked, relevant results based on similarity. Key elements:

Indexing – building an inverted index mapping terms to documents.
Query processing & ranking – using models such as the Boolean and Vector Space Model with TF-IDF weighting and cosine similarity.
Evaluation – measured by precision and recall.

Web search engines (e.g., Google) are the largest application of IR.

Level	BSc CSIT (TU)
Stream	Science
Subject	Advanced Database (BSc CSIT, CSC461)
Year	2079 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions