BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2077 Nepal
This is the official BSc CSIT (TU) (Science stream) Advanced Database (BSc CSIT, CSC461) question paper for 2077, as set in the regular annual examination. It carries 60 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Advanced Database (BSc CSIT, CSC461) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) exam or solving previous years' question papers, this 2077 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt any TWO questions.
What is a data warehouse? Explain its architecture, the concept of OLAP and the star schema, and differentiate between OLTP and OLAP systems.
Data Warehouse
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data that supports management decision making. It consolidates historical data from multiple operational sources into a single repository optimized for analysis and reporting rather than transaction processing.
Key properties (Inmon's definition):
- Subject-oriented – organized around major subjects (customer, product, sales) rather than applications.
- Integrated – data from heterogeneous sources is cleaned and made consistent.
- Time-variant – stores historical snapshots; every record carries a time element.
- Non-volatile – data is loaded and read, but not updated/deleted by end users.
Architecture
A typical three-tier architecture:
- Bottom tier (Data sources + Staging/ETL + Warehouse server) – operational databases and external files. An ETL process (Extract → Transform → Load) cleans, integrates and loads data into the warehouse and its metadata repository.
- Middle tier (OLAP server) – a ROLAP (relational) or MOLAP (multidimensional) engine that presents the data as data cubes for fast multidimensional queries.
- Top tier (Front-end tools) – query, reporting, analysis and data-mining tools used by analysts.
Data marts (departmental subsets) are often built on top of the central warehouse.
OLAP (Online Analytical Processing)
OLAP enables interactive, multidimensional analysis of summarized data using a data cube with dimensions (e.g., time, location, product) and measures (e.g., sales). Core OLAP operations:
- Roll-up – aggregate to a higher level (city → country).
- Drill-down – go to finer detail (country → city).
- Slice – select one dimension value to get a sub-cube.
- Dice – select values on several dimensions.
- Pivot (rotate) – reorient the cube for a different view.
Star Schema
The star schema is the most common dimensional model. It has one central fact table (containing measures and foreign keys) surrounded by several dimension tables, forming a star shape.
+-----------+
| Time dim |
+-----------+
|
+--------+ +-----------+ +-----------+
|Product |---| SALES |---| Location |
| dim | | (fact) | | dim |
+--------+ +-----------+ +-----------+
|
+-----------+
| Customer |
+-----------+
The fact table (SALES) holds measures such as amount, quantity plus foreign keys to each dimension. Dimensions are denormalized for fast joins. (When dimensions are normalized into sub-dimensions it becomes a snowflake schema.)
OLTP vs OLAP
| Feature | OLTP | OLAP |
|---|---|---|
| Purpose | Day-to-day transaction processing | Decision support / analysis |
| Data | Current, detailed | Historical, summarized |
| Operations | Short read/write (INSERT, UPDATE, DELETE) | Complex read-mostly queries |
| Schema | Normalized (3NF) | Dimensional (star/snowflake) |
| Users | Clerks, online customers (many) | Analysts, managers (few) |
| DB size | MB–GB | GB–TB |
| Access | Many short transactions | Fewer, long, ad-hoc queries |
| Design goal | High throughput, data integrity | Fast query response, aggregation |
What is query processing? Explain the different steps involved in query processing and optimization with the help of a block diagram. Discuss how heuristic-based optimization transforms a relational algebra query into an efficient query plan.
Query Processing
Query processing is the set of activities a DBMS performs to translate a high-level query (e.g., SQL) into an efficient sequence of low-level operations that retrieve the required data, execute them, and return the result.
Steps in Query Processing and Optimization
High-level (SQL) query
|
v
+--------------------+
| 1. Scanning, | --> syntax & semantic check
| Parsing, | against the system catalog
| Validating |
+--------------------+
|
v
+--------------------+
| 2. Translation to | --> initial relational
| relational | algebra query tree
| algebra |
+--------------------+
|
v
+--------------------+ <--- catalog statistics,
| 3. Query Optimizer | access paths, cost model
+--------------------+
|
v
Execution plan (optimized)
|
v
+--------------------+
| 4. Code generator /|
| runtime engine |
+--------------------+
|
v
Query result
- Parsing and translation – the query is scanned, parsed for syntax, and validated (do the relations/attributes exist?). It is then translated into an internal relational algebra expression, usually represented as a query tree.
- Optimization – the optimizer generates alternative equivalent execution plans and chooses the cheapest. Two approaches: heuristic (rule-based) optimization and cost-based optimization (uses catalog statistics to estimate cost).
- Evaluation/Execution – the chosen plan (with specific access methods for each operation) is executed by the query-evaluation engine, which returns the result.
Heuristic-Based Optimization
Heuristic optimization applies a set of transformation rules to the relational-algebra query tree to obtain an equivalent but more efficient tree, without computing actual costs. The main heuristic is "perform selections and projections as early as possible" to reduce intermediate result sizes. Typical steps:
- Push down selections (σ) – move selection operations down the tree, closest to the base relations, so fewer tuples flow upward.
- Push down projections (π) – move projections down to eliminate unneeded attributes early.
- Replace Cartesian product + selection by a join (⋈) –
σ_θ(R × S)becomesR ⋈_θ S. - Reorder joins – execute the most restrictive (smallest-result) joins first.
- Combine/cascade selections and projections to avoid repeated passes.
Example
σ_dept='CS'(EMP ⋈ DEPT)
is transformed to push the selection onto EMP before the join:
(σ_dept='CS'(EMP)) ⋈ DEPT
so the join processes far fewer tuples — producing an efficient query plan.
What is a distributed database management system (DDBMS)? Explain data fragmentation, replication and allocation techniques used in distributed databases with suitable examples.
Distributed DBMS (DDBMS)
A distributed database is a single logical database whose data is physically stored across multiple computers (sites) connected by a network. A DDBMS is the software that manages this distributed database and makes the distribution transparent to users — they see one logical database even though it is fragmented and spread over several sites.
Goals: improved reliability/availability, local autonomy, better performance through parallelism, and easier expansion.
1. Data Fragmentation
Fragmentation divides a relation into smaller pieces (fragments) that can be stored at different sites. Correctness rules: completeness (every tuple/attribute appears in some fragment), reconstruction (original relation can be rebuilt), and disjointness.
- Horizontal fragmentation – splits a relation by rows using selection predicates.
Example:
EMP1 = σ_branch='KTM'(EMP),EMP2 = σ_branch='PKR'(EMP). Reconstruct with UNION. - Vertical fragmentation – splits by columns using projection; each fragment keeps the primary key.
Example:
EMP_A = π_(eid,name)(EMP),EMP_B = π_(eid,salary)(EMP). Reconstruct with JOIN oneid. - Mixed (hybrid) fragmentation – combination of horizontal and vertical fragmentation.
2. Replication
Replication stores copies of a fragment/relation at more than one site.
- Full replication – every site has a complete copy (fast reads/high availability, costly updates).
- Partial replication – only selected fragments are replicated where they are most needed.
- No replication – each fragment stored at exactly one site.
Trade-off: replication improves availability and read performance but increases the cost and complexity of keeping copies consistent during updates.
3. Data Allocation
Allocation decides which fragment is stored at which site.
- Centralized – single database at one site.
- Partitioned (non-replicated) – each fragment assigned to exactly one site.
- Replicated – fragments allocated with copies at multiple sites.
Allocation is driven by an analysis of query/update frequencies at each site so that data resides close to where it is most used, minimizing communication cost.
Example
A bank with branches in Kathmandu and Pokhara horizontally fragments ACCOUNT by branch, stores each fragment at its own branch site (allocation), and keeps a replica of a small BRANCH table at both sites (replication) for fast local lookups.
Section B: Short Answer Questions
Attempt any EIGHT questions.
Why is hashing important to store data in databases? What is primary file organization?
Importance of hashing: Hashing computes the disk address (bucket) of a record directly from its hash-key value using a hash function h(key), allowing a record to be located in (ideally) a single block access, i.e., average retrieval. This is far faster than scanning or even index traversal for equality searches, supports very large files efficiently, and spreads records evenly to avoid hotspots. (Limitation: it does not help with range queries.)
Primary file organization: It is the way records of a file are physically arranged on disk and the method used to access them. Common primary organizations are:
- Heap (unordered) – records placed in insertion order.
- Sequential/Ordered – records sorted on an ordering field.
- Hashed – record placement determined by a hash function on the hash field.
It determines how each record is stored and retrieved, in contrast to a secondary organization (e.g., a secondary index) that provides an alternative access path to the same data.
Explain aggregation with a suitable example.
Aggregation is an abstraction in the Entity-Relationship model that treats a relationship (together with its participating entities) as a higher-level entity, so that this relationship can itself participate in another relationship. It is used when we need to model a relationship among relationships — something a plain ER diagram cannot express directly.
Example: Consider an EMPLOYEE working on a PROJECT — modeled by the relationship WORKS-ON. Now suppose each such work assignment is managed using some MACHINERY. The relationship MANAGES must connect MACHINERY to the WORKS-ON relationship, not to a single entity.
EMPLOYEE ---<WORKS-ON>--- PROJECT
| (aggregated as one unit)
<MANAGES>
|
MACHINERY
Here the WORKS-ON relationship set is aggregated into a single abstract entity, which then participates in the MANAGES relationship with MACHINERY. (Note: in SQL/relational algebra, aggregation also refers to summary functions like SUM, AVG, COUNT, MIN, MAX over groups of rows.)
What is the ODMG object model? What is Object Definition Language (ODL)?
ODMG Object Model: The ODMG (Object Data Management Group) model is the standard data model for object-oriented databases (OODBMS). It provides a common framework so that object databases from different vendors are portable. Its main components are:
- Objects and Literals – an object has a unique OID (object identifier), state and behaviour; a literal has only a value and no OID.
- Type / class with attributes, relationships (with inverse, maintaining referential integrity), and methods (operations).
- Built-in collection types: Set, Bag, List, Array, Dictionary.
- Support for inheritance (
extends/:), encapsulation and a class extent and keys. - Standard languages: ODL (schema), OQL (query) and language bindings (C++, Java, Smalltalk).
Object Definition Language (ODL): ODL is the schema-definition language of the ODMG standard, analogous to the DDL of relational databases. It is used to specify the interface (structure) of object types — their attributes, relationships (with inverses), and operation signatures — independent of any programming language, so the same schema can be mapped to different language bindings.
class Employee (extent employees key eid) {
attribute int eid;
attribute string name;
relationship Department worksIn inverse Department::staff;
float computeSalary();
};
Explain the different steps in query processing.
Query processing translates a high-level (SQL) query into an executable, optimized sequence of operations and runs it. The main steps are:
- Scanning, parsing and validating – the query string is tokenized (scanned), checked for correct syntax (parsed), and semantically validated against the system catalog to confirm that the referenced relations and attributes exist and are used correctly.
- Translation into relational algebra – the validated query is converted into an internal relational-algebra expression, typically represented as a query tree (or query graph).
- Query optimization – the optimizer generates equivalent execution plans and selects the most efficient one using heuristic (rule-based) rules and/or cost-based estimation from catalog statistics, producing an execution plan with specific access methods.
- Code generation / execution – the query-evaluation engine executes the chosen plan and returns the result to the user.
SQL query → Parse & validate → Relational algebra (query tree)
→ Optimizer → Execution plan → Execution → Result
Why is query optimization essential in databases? What is heuristic optimization?
Why query optimization is essential: For any SQL query there are many equivalent execution strategies (different join orders, access paths, and operation sequences) whose costs can differ by orders of magnitude. Optimization is essential to:
- minimize response time and the consumption of resources (disk I/O, CPU, memory, network);
- choose the cheapest among many equivalent plans, since a poor plan can be thousands of times slower;
- keep the system scalable as data volume grows;
- relieve the user, who writes what data is needed (declarative SQL) but not how to fetch it efficiently.
Heuristic optimization: Heuristic (rule-based) optimization improves the relational-algebra query tree by applying transformation rules based on general guidelines rather than computing actual costs. The central heuristic is "apply selections and projections as early as possible" (push them down the tree) to shrink intermediate results. Other rules include converting Cartesian-product-plus-selection into joins, reordering joins to do the most restrictive first, and cascading/combining selections and projections. It is fast and avoids the overhead of full cost estimation, though it may miss the truly optimal plan.
Define fragmentation. Explain horizontal fragmentation with an example.
Fragmentation is the process of dividing a relation in a distributed database into smaller logical pieces called fragments, which can be stored at different sites. It must satisfy three correctness conditions: completeness (every data item appears in some fragment), reconstruction (the original relation can be rebuilt from its fragments), and disjointness (no overlap, except the key in vertical fragmentation). The three types are horizontal, vertical and mixed (hybrid).
Horizontal fragmentation divides a relation by rows (tuples), selecting subsets of tuples using a selection predicate (). Each fragment contains all attributes but only the tuples satisfying its condition. The original relation is reconstructed by taking the UNION of all horizontal fragments.
Example: Given EMP(eid, name, branch, salary) for a company with two branches:
EMP1 is stored at the Kathmandu site and EMP2 at the Pokhara site, placing each branch's data where it is most used. Reconstruction: .
What are the characteristics of NoSQL systems? Explain.
NoSQL (Not Only SQL) systems are non-relational databases designed for large-scale, distributed storage of structured, semi-structured and unstructured data. Their main characteristics are:
- Schema-less / flexible schema – records need not follow a fixed schema; fields can vary per record, easing evolution.
- Horizontal scalability (scale-out) – data is partitioned (sharded) and distributed across many commodity servers rather than scaling up a single machine.
- High availability and fault tolerance – data is replicated across nodes so the system keeps running despite node failures.
- BASE rather than ACID – favours Basically Available, Soft state, Eventual consistency over strict ACID transactions, following the CAP theorem trade-off (often choosing Availability + Partition tolerance over strong Consistency).
- Distributed, no/limited joins – queries avoid expensive cross-node joins; access is typically by key.
- Variety of data models – key-value (Redis), document (MongoDB), column-family (Cassandra/HBase), and graph (Neo4j) stores.
- Designed for Big Data and high write/read throughput.
What is the concept of an active database? What are triggers?
Active database: An active database is a database system that can react automatically to events by executing predefined actions, without explicit user or application intervention. It extends a conventional (passive) DBMS with active rules that monitor the database and respond to changes, enforcing business logic and integrity reactively. Active rules follow the ECA (Event–Condition–Action) model: when an Event occurs, if a Condition holds, then an Action is executed.
Triggers: A trigger is the most common form of active rule — a named block of procedural code that is automatically executed (fired) by the DBMS in response to a specified data-modification event (INSERT, UPDATE, or DELETE) on a table. A trigger specifies the event, an optional condition (WHEN), the timing (BEFORE/AFTER) and granularity (FOR EACH ROW/statement), and the action to perform. Triggers are used to enforce complex integrity constraints, maintain derived/audit data, and automate business rules.
CREATE TRIGGER chk_salary
AFTER UPDATE ON Employee
FOR EACH ROW
WHEN (NEW.salary > 1000000)
BEGIN
INSERT INTO Audit_Log VALUES (NEW.eid, 'High salary', SYSDATE);
END;
Write short notes on: (a) Big Data (b) Information Retrieval.
(a) Big Data
Big Data refers to datasets so large, fast-growing, and varied that traditional database tools cannot capture, store, manage or analyze them efficiently. It is characterized by the "V"s:
- Volume – enormous quantities (TB to PB and beyond).
- Velocity – data generated and processed at high speed (e.g., streaming, real-time).
- Variety – structured, semi-structured and unstructured data (text, images, logs, video).
- Veracity – uncertainty/quality of data; and Value – the useful insight extracted.
Big Data is handled by distributed frameworks such as Hadoop (HDFS + MapReduce), Spark, and NoSQL stores, enabling parallel processing across clusters of commodity machines.
(b) Information Retrieval
Information Retrieval (IR) is the science of finding relevant unstructured information (mainly text documents) from a large collection in response to a user's query. Unlike database querying, which returns exact matches over structured data, IR ranks documents by relevance to a (often keyword) query.
Key concepts:
- Indexing using an inverted index mapping terms → documents.
- Ranking models such as the Boolean, Vector Space Model (using TF-IDF weights and cosine similarity), and probabilistic models.
- Evaluation by precision and recall.
Web search engines (Google, Bing) are the most prominent IR systems.
Frequently asked questions
- Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2077?
- The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2077 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Advanced Database (BSc CSIT, CSC461) 2077 paper come with solutions?
- Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2077 paper?
- The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2077 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?
- Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.