BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2077 Nepal

Q: Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2077?

The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2077 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Advanced Database (BSc CSIT, CSC461) 2077 paper come with solutions?

Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2077 paper?

The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2077 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?

Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.

Question

1Long answer10 marks

What is a data warehouse? Explain its architecture, the concept of OLAP and the star schema, and differentiate between OLTP and OLAP systems.

data-warehousedata-mining

Answer 1

Data Warehouse

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data that supports management decision making. It consolidates historical data from multiple operational sources into a single repository optimized for analysis and reporting rather than transaction processing.

Key properties (Inmon's definition):

Subject-oriented – organized around major subjects (customer, product, sales) rather than applications.
Integrated – data from heterogeneous sources is cleaned and made consistent.
Time-variant – stores historical snapshots; every record carries a time element.
Non-volatile – data is loaded and read, but not updated/deleted by end users.

Architecture

A typical three-tier architecture:

Bottom tier (Data sources + Staging/ETL + Warehouse server) – operational databases and external files. An ETL process (Extract → Transform → Load) cleans, integrates and loads data into the warehouse and its metadata repository.
Middle tier (OLAP server) – a ROLAP (relational) or MOLAP (multidimensional) engine that presents the data as data cubes for fast multidimensional queries.
Top tier (Front-end tools) – query, reporting, analysis and data-mining tools used by analysts.

Data marts (departmental subsets) are often built on top of the central warehouse.

OLAP (Online Analytical Processing)

OLAP enables interactive, multidimensional analysis of summarized data using a data cube with dimensions (e.g., time, location, product) and measures (e.g., sales). Core OLAP operations:

Roll-up – aggregate to a higher level (city → country).
Drill-down – go to finer detail (country → city).
Slice – select one dimension value to get a sub-cube.
Dice – select values on several dimensions.
Pivot (rotate) – reorient the cube for a different view.

Star Schema

The star schema is the most common dimensional model. It has one central fact table (containing measures and foreign keys) surrounded by several dimension tables, forming a star shape.

            +-----------+
            |  Time dim |
            +-----------+
                  |
+--------+   +-----------+   +-----------+
|Product |---|  SALES    |---| Location  |
| dim    |   |  (fact)   |   |  dim      |
+--------+   +-----------+   +-----------+
                  |
            +-----------+
            | Customer  |
            +-----------+

The fact table (SALES) holds measures such as amount, quantity plus foreign keys to each dimension. Dimensions are denormalized for fast joins. (When dimensions are normalized into sub-dimensions it becomes a snowflake schema.)

OLTP vs OLAP

Feature	OLTP	OLAP
Purpose	Day-to-day transaction processing	Decision support / analysis
Data	Current, detailed	Historical, summarized
Operations	Short read/write (INSERT, UPDATE, DELETE)	Complex read-mostly queries
Schema	Normalized (3NF)	Dimensional (star/snowflake)
Users	Clerks, online customers (many)	Analysts, managers (few)
DB size	MB–GB	GB–TB
Access	Many short transactions	Fewer, long, ad-hoc queries
Design goal	High throughput, data integrity	Fast query response, aggregation

Answer 2

Query Processing

Query processing is the set of activities a DBMS performs to translate a high-level query (e.g., SQL) into an efficient sequence of low-level operations that retrieve the required data, execute them, and return the result.

Steps in Query Processing and Optimization

  High-level (SQL) query
          |
          v
  +--------------------+
  | 1. Scanning,       |   --> syntax & semantic check
  |    Parsing,        |       against the system catalog
  |    Validating      |
  +--------------------+
          |
          v
  +--------------------+
  | 2. Translation to  |   --> initial relational
  |    relational      |       algebra query tree
  |    algebra         |
  +--------------------+
          |
          v
  +--------------------+ <--- catalog statistics,
  | 3. Query Optimizer |      access paths, cost model
  +--------------------+
          |
          v
   Execution plan (optimized)
          |
          v
  +--------------------+
  | 4. Code generator /|
  |    runtime engine  |
  +--------------------+
          |
          v
       Query result

Parsing and translation – the query is scanned, parsed for syntax, and validated (do the relations/attributes exist?). It is then translated into an internal relational algebra expression, usually represented as a query tree.
Optimization – the optimizer generates alternative equivalent execution plans and chooses the cheapest. Two approaches: heuristic (rule-based) optimization and cost-based optimization (uses catalog statistics to estimate cost).
Evaluation/Execution – the chosen plan (with specific access methods for each operation) is executed by the query-evaluation engine, which returns the result.

Heuristic-Based Optimization

Heuristic optimization applies a set of transformation rules to the relational-algebra query tree to obtain an equivalent but more efficient tree, without computing actual costs. The main heuristic is "perform selections and projections as early as possible" to reduce intermediate result sizes. Typical steps:

Push down selections (σ) – move selection operations down the tree, closest to the base relations, so fewer tuples flow upward.
Push down projections (π) – move projections down to eliminate unneeded attributes early.
Replace Cartesian product + selection by a join (⋈) – σ_θ(R × S) becomes R ⋈_θ S.
Reorder joins – execute the most restrictive (smallest-result) joins first.
Combine/cascade selections and projections to avoid repeated passes.

Example

σ_dept='CS'(EMP ⋈ DEPT)

is transformed to push the selection onto EMP before the join:

(σ_dept='CS'(EMP)) ⋈ DEPT

so the join processes far fewer tuples — producing an efficient query plan.

Answer 3

Distributed DBMS (DDBMS)

A distributed database is a single logical database whose data is physically stored across multiple computers (sites) connected by a network. A DDBMS is the software that manages this distributed database and makes the distribution transparent to users — they see one logical database even though it is fragmented and spread over several sites.

Goals: improved reliability/availability, local autonomy, better performance through parallelism, and easier expansion.

1. Data Fragmentation

Fragmentation divides a relation into smaller pieces (fragments) that can be stored at different sites. Correctness rules: completeness (every tuple/attribute appears in some fragment), reconstruction (original relation can be rebuilt), and disjointness.

Horizontal fragmentation – splits a relation by rows using selection predicates. Example: EMP1 = σ_branch='KTM'(EMP), EMP2 = σ_branch='PKR'(EMP). Reconstruct with UNION.
Vertical fragmentation – splits by columns using projection; each fragment keeps the primary key. Example: EMP_A = π_(eid,name)(EMP), EMP_B = π_(eid,salary)(EMP). Reconstruct with JOIN on eid.
Mixed (hybrid) fragmentation – combination of horizontal and vertical fragmentation.

2. Replication

Replication stores copies of a fragment/relation at more than one site.

Full replication – every site has a complete copy (fast reads/high availability, costly updates).
Partial replication – only selected fragments are replicated where they are most needed.
No replication – each fragment stored at exactly one site.

Trade-off: replication improves availability and read performance but increases the cost and complexity of keeping copies consistent during updates.

3. Data Allocation

Allocation decides which fragment is stored at which site.

Centralized – single database at one site.
Partitioned (non-replicated) – each fragment assigned to exactly one site.
Replicated – fragments allocated with copies at multiple sites.

Allocation is driven by an analysis of query/update frequencies at each site so that data resides close to where it is most used, minimizing communication cost.

Example

A bank with branches in Kathmandu and Pokhara horizontally fragments ACCOUNT by branch, stores each fragment at its own branch site (allocation), and keeps a replica of a small BRANCH table at both sites (replication) for fast local lookups.

Answer 4

Importance of hashing: Hashing computes the disk address (bucket) of a record directly from its hash-key value using a hash function h(key), allowing a record to be located in (ideally) a single block access, i.e., average $O(1)$ retrieval. This is far faster than scanning or even index traversal for equality searches, supports very large files efficiently, and spreads records evenly to avoid hotspots. (Limitation: it does not help with range queries.)

Primary file organization: It is the way records of a file are physically arranged on disk and the method used to access them. Common primary organizations are:

Heap (unordered) – records placed in insertion order.
Sequential/Ordered – records sorted on an ordering field.
Hashed – record placement determined by a hash function on the hash field.

It determines how each record is stored and retrieved, in contrast to a secondary organization (e.g., a secondary index) that provides an alternative access path to the same data.

Answer 5

Aggregation is an abstraction in the Entity-Relationship model that treats a relationship (together with its participating entities) as a higher-level entity, so that this relationship can itself participate in another relationship. It is used when we need to model a relationship among relationships — something a plain ER diagram cannot express directly.

Example: Consider an EMPLOYEE working on a PROJECT — modeled by the relationship WORKS-ON. Now suppose each such work assignment is managed using some MACHINERY. The relationship MANAGES must connect MACHINERY to the WORKS-ON relationship, not to a single entity.

 EMPLOYEE ---<WORKS-ON>--- PROJECT
              |  (aggregated as one unit)
          <MANAGES>
              |
          MACHINERY

Here the WORKS-ON relationship set is aggregated into a single abstract entity, which then participates in the MANAGES relationship with MACHINERY. (Note: in SQL/relational algebra, aggregation also refers to summary functions like SUM, AVG, COUNT, MIN, MAX over groups of rows.)

Answer 6

ODMG Object Model: The ODMG (Object Data Management Group) model is the standard data model for object-oriented databases (OODBMS). It provides a common framework so that object databases from different vendors are portable. Its main components are:

Objects and Literals – an object has a unique OID (object identifier), state and behaviour; a literal has only a value and no OID.
Type / class with attributes, relationships (with inverse, maintaining referential integrity), and methods (operations).
Built-in collection types: Set, Bag, List, Array, Dictionary.
Support for inheritance (extends / :), encapsulation and a class extent and keys.
Standard languages: ODL (schema), OQL (query) and language bindings (C++, Java, Smalltalk).

Object Definition Language (ODL): ODL is the schema-definition language of the ODMG standard, analogous to the DDL of relational databases. It is used to specify the interface (structure) of object types — their attributes, relationships (with inverses), and operation signatures — independent of any programming language, so the same schema can be mapped to different language bindings.

class Employee (extent employees key eid) {
  attribute int    eid;
  attribute string name;
  relationship Department worksIn inverse Department::staff;
  float computeSalary();
};

Answer 7

Query processing translates a high-level (SQL) query into an executable, optimized sequence of operations and runs it. The main steps are:

Scanning, parsing and validating – the query string is tokenized (scanned), checked for correct syntax (parsed), and semantically validated against the system catalog to confirm that the referenced relations and attributes exist and are used correctly.
Translation into relational algebra – the validated query is converted into an internal relational-algebra expression, typically represented as a query tree (or query graph).
Query optimization – the optimizer generates equivalent execution plans and selects the most efficient one using heuristic (rule-based) rules and/or cost-based estimation from catalog statistics, producing an execution plan with specific access methods.
Code generation / execution – the query-evaluation engine executes the chosen plan and returns the result to the user.

SQL query → Parse & validate → Relational algebra (query tree)
          → Optimizer → Execution plan → Execution → Result

Answer 8

Why query optimization is essential: For any SQL query there are many equivalent execution strategies (different join orders, access paths, and operation sequences) whose costs can differ by orders of magnitude. Optimization is essential to:

minimize response time and the consumption of resources (disk I/O, CPU, memory, network);
choose the cheapest among many equivalent plans, since a poor plan can be thousands of times slower;
keep the system scalable as data volume grows;
relieve the user, who writes what data is needed (declarative SQL) but not how to fetch it efficiently.

Heuristic optimization: Heuristic (rule-based) optimization improves the relational-algebra query tree by applying transformation rules based on general guidelines rather than computing actual costs. The central heuristic is "apply selections and projections as early as possible" (push them down the tree) to shrink intermediate results. Other rules include converting Cartesian-product-plus-selection into joins, reordering joins to do the most restrictive first, and cascading/combining selections and projections. It is fast and avoids the overhead of full cost estimation, though it may miss the truly optimal plan.

Answer 9

Fragmentation is the process of dividing a relation in a distributed database into smaller logical pieces called fragments, which can be stored at different sites. It must satisfy three correctness conditions: completeness (every data item appears in some fragment), reconstruction (the original relation can be rebuilt from its fragments), and disjointness (no overlap, except the key in vertical fragmentation). The three types are horizontal, vertical and mixed (hybrid).

Horizontal fragmentation divides a relation by rows (tuples), selecting subsets of tuples using a selection predicate ( $\sigma$ ). Each fragment contains all attributes but only the tuples satisfying its condition. The original relation is reconstructed by taking the UNION of all horizontal fragments.

Example: Given EMP(eid, name, branch, salary) for a company with two branches:

EMP_1 = \sigma_{branch = 'Kathmandu'}(EMP)

EMP_2 = \sigma_{branch = 'Pokhara'}(EMP)

EMP1 is stored at the Kathmandu site and EMP2 at the Pokhara site, placing each branch's data where it is most used. Reconstruction: $EMP = EMP_1 \cup EMP_2$ .

Answer 10

NoSQL (Not Only SQL) systems are non-relational databases designed for large-scale, distributed storage of structured, semi-structured and unstructured data. Their main characteristics are:

Schema-less / flexible schema – records need not follow a fixed schema; fields can vary per record, easing evolution.
Horizontal scalability (scale-out) – data is partitioned (sharded) and distributed across many commodity servers rather than scaling up a single machine.
High availability and fault tolerance – data is replicated across nodes so the system keeps running despite node failures.
BASE rather than ACID – favours Basically Available, Soft state, Eventual consistency over strict ACID transactions, following the CAP theorem trade-off (often choosing Availability + Partition tolerance over strong Consistency).
Distributed, no/limited joins – queries avoid expensive cross-node joins; access is typically by key.
Variety of data models – key-value (Redis), document (MongoDB), column-family (Cassandra/HBase), and graph (Neo4j) stores.
Designed for Big Data and high write/read throughput.

Answer 11

Active database: An active database is a database system that can react automatically to events by executing predefined actions, without explicit user or application intervention. It extends a conventional (passive) DBMS with active rules that monitor the database and respond to changes, enforcing business logic and integrity reactively. Active rules follow the ECA (Event–Condition–Action) model: when an Event occurs, if a Condition holds, then an Action is executed.

Triggers: A trigger is the most common form of active rule — a named block of procedural code that is automatically executed (fired) by the DBMS in response to a specified data-modification event (INSERT, UPDATE, or DELETE) on a table. A trigger specifies the event, an optional condition (WHEN), the timing (BEFORE/AFTER) and granularity (FOR EACH ROW/statement), and the action to perform. Triggers are used to enforce complex integrity constraints, maintain derived/audit data, and automate business rules.

CREATE TRIGGER chk_salary
AFTER UPDATE ON Employee
FOR EACH ROW
WHEN (NEW.salary > 1000000)
BEGIN
   INSERT INTO Audit_Log VALUES (NEW.eid, 'High salary', SYSDATE);
END;

Answer 12

(a) Big Data

Big Data refers to datasets so large, fast-growing, and varied that traditional database tools cannot capture, store, manage or analyze them efficiently. It is characterized by the "V"s:

Volume – enormous quantities (TB to PB and beyond).
Velocity – data generated and processed at high speed (e.g., streaming, real-time).
Variety – structured, semi-structured and unstructured data (text, images, logs, video).
Veracity – uncertainty/quality of data; and Value – the useful insight extracted.

Big Data is handled by distributed frameworks such as Hadoop (HDFS + MapReduce), Spark, and NoSQL stores, enabling parallel processing across clusters of commodity machines.

(b) Information Retrieval

Information Retrieval (IR) is the science of finding relevant unstructured information (mainly text documents) from a large collection in response to a user's query. Unlike database querying, which returns exact matches over structured data, IR ranks documents by relevance to a (often keyword) query.

Key concepts:

Indexing using an inverted index mapping terms → documents.
Ranking models such as the Boolean, Vector Space Model (using TF-IDF weights and cosine similarity), and probabilistic models.
Evaluation by precision and recall.

Web search engines (Google, Bing) are the most prominent IR systems.

Level	BSc CSIT (TU)
Stream	Science
Subject	Advanced Database (BSc CSIT, CSC461)
Year	2077 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2077 Nepal

Section A: Long Answer Questions

Data Warehouse

Architecture

OLAP (Online Analytical Processing)

Star Schema

OLTP vs OLAP

Query Processing

Steps in Query Processing and Optimization

Heuristic-Based Optimization

Example

Distributed DBMS (DDBMS)

1. Data Fragmentation

2. Replication

3. Data Allocation

Example

Section B: Short Answer Questions

(a) Big Data

(b) Information Retrieval

Frequently asked questions