BSc CSIT (TU) Science Advanced Database (BSc CSIT, CSC461) Question Paper 2074 Nepal
This is the official BSc CSIT (TU) (Science stream) Advanced Database (BSc CSIT, CSC461) question paper for 2074, as set in the regular annual examination. It carries 60 full marks and a time allowance of 180 minutes, across 12 questions. On Kekkei you can attempt this Advanced Database (BSc CSIT, CSC461) past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks — completely free. Whether you are revising for your BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) exam or solving previous years' question papers, this 2074 paper is a great way to practise under real exam conditions.
Section A: Long Answer Questions
Attempt any TWO questions.
What is query processing? Explain the different steps involved in query processing and optimization with the help of a block diagram. Discuss how heuristic-based optimization transforms a relational algebra query into an efficient query plan.
Query Processing
Query processing is the set of activities a DBMS performs to translate a high-level query (e.g., SQL) into an efficient sequence of low-level operations that retrieve data from the database, execute them, and return the result.
Steps in Query Processing and Optimization
The overall flow can be shown as the following block diagram (described in words):
High-level SQL query
|
v
[Scanner / Parser / Validator] --> checks syntax, attributes, relations
|
v
Query in internal form (parse tree)
|
v
[Query Optimizer] <---- Statistics / Catalog / Cost model
|
v
Execution Plan (annotated relational algebra tree)
|
v
[Code Generator]
|
v
[Runtime / Query Execution Engine]
|
v
Result of query
- Parsing and translation (Scanner, Parser, Validator): The query is scanned for tokens, parsed for correct syntax, and validated against the system catalog (relation and attribute names exist, types match). It is translated into an internal representation, usually a relational-algebra expression (query tree).
- Optimization: The optimizer generates several equivalent evaluation plans and selects the one with the least estimated cost (disk I/O, CPU, etc.) using catalog statistics and a cost model. It chooses access methods (index vs. full scan), join orders, and join algorithms.
- Code/plan generation: The chosen plan is turned into executable code (an execution plan with iterators).
- Execution: The query-evaluation engine runs the plan and returns the result.
Heuristic-Based Optimization
Heuristic optimization applies transformation rules to the relational-algebra tree to convert it into an equivalent but cheaper form, without a full cost analysis. The main heuristic is to reduce the size of intermediate results early. Typical rules applied in order:
- Push selections () down the tree (perform them as early as possible) so that fewer tuples flow upward — selection is the most beneficial operation to do first.
- Push projections () down so that only needed attributes are carried forward, reducing tuple width.
- Convert Cartesian products followed by selection into joins ().
- Reorder joins so the most restrictive (smallest intermediate result) joins execute first.
- Combine cascading selections and use associativity/commutativity of joins.
Example: A query is improved by pushing onto the Student relation before the join, so the join processes far fewer tuples. The result is an efficient query plan that is then executed.
What is a distributed database management system (DDBMS)? Explain data fragmentation, replication and allocation techniques used in distributed databases with suitable examples.
Distributed Database Management System (DDBMS)
A distributed database is a collection of logically interrelated databases distributed over a computer network. A DDBMS is the software that manages this distributed database and makes the distribution transparent to users — they see a single logical database even though the data physically resides on multiple sites. It supports location, fragmentation, and replication transparency, distributed query processing, and distributed transaction management.
Data Fragmentation
Fragmentation splits a global relation into smaller pieces (fragments) stored at different sites. Conditions: completeness (every tuple/attribute appears in some fragment), reconstruction (the original relation can be rebuilt), and disjointness.
- Horizontal fragmentation: divides a relation by rows (tuples) using a selection predicate.
Example:
EMPsplit as stored at the Sales site and at the HR site. Reconstruction: . - Vertical fragmentation: divides a relation by columns (attributes), repeating the primary key in each fragment. Example: and . Reconstruction: .
- Mixed (hybrid) fragmentation: combines horizontal and vertical fragmentation.
Data Replication
Replication stores copies of a fragment (or whole relation) at multiple sites.
- Full replication: a copy of the whole database at every site — best read performance and availability, but costly updates.
- Partial replication: only some fragments are replicated at selected sites.
- No replication: each fragment stored at exactly one site.
Advantages: improved availability and faster local reads. Disadvantage: updates must be propagated to all copies to keep them consistent.
Data Allocation
Allocation decides which fragment/replica is stored at which site, based on access patterns and cost.
- Non-redundant allocation: each fragment assigned to exactly one site.
- Redundant allocation: fragments allocated with replicas to optimize availability and local access. The goal is to place data close to where it is most frequently used to minimize communication cost.
Summary: Fragmentation decides how to split data, replication decides how many copies, and allocation decides where to place them.
Explain the object-oriented database model. Discuss the ODMG object model along with the Object Definition Language (ODL) and Object Query Language (OQL) with examples.
Object-Oriented Database (OODB) Model
An object-oriented database combines database capabilities (persistence, concurrency, recovery, querying) with the object-oriented programming paradigm. Data is represented as objects, each with a unique object identifier (OID), a state (attribute values), and behavior (methods). Key OO concepts supported: encapsulation, inheritance, polymorphism, complex objects, and class/type hierarchies. OODBs avoid the impedance mismatch between programming languages and relational tables.
ODMG Object Model
The ODMG (Object Data Management Group) standard defines a common model so OODB products are portable. Its main components are:
- Objects and Literals: an object has an OID and mutable state; a literal has only a value (no OID).
- Types: built from interfaces (behavior only) and classes (state + behavior, instantiable).
- Atomic, structured and collection types (set, bag, list, array, dictionary).
- Relationships (with inverse traversal paths) and methods.
- Supports inheritance through
:(subtyping).
Object Definition Language (ODL)
ODL is the DDL of the ODMG model — used to define object schemas (classes, attributes, relationships, methods) independent of any programming language.
class Student (extent students key sid) {
attribute int sid;
attribute string name;
relationship Set<Course> enrolled
inverse Course::students;
float gpa();
};
class Course (extent courses key cid) {
attribute string cid;
attribute string title;
relationship Set<Student> students
inverse Student::enrolled;
};
Object Query Language (OQL)
OQL is the declarative query language of ODMG, SQL-like but supporting objects, path expressions, method calls, and collections.
-- Names of students with GPA above 3.5
select s.name
from s in students
where s.gpa() > 3.5;
-- Path expression: titles of courses a student is enrolled in
select c.title
from s in students, c in s.enrolled
where s.name = "Ram";
OQL queries return collections of objects or literals and can be embedded in host languages such as C++ or Java.
Section B: Short Answer Questions
Attempt any EIGHT questions.
Why is hashing important to store data in databases? What is primary file organization?
Importance of hashing: Hashing applies a hash function to a record's search key to compute the address (bucket) of the disk block where the record is stored. This allows records to be located in (ideally) O(1) time / a single disk access without scanning the file or traversing an index, making equality searches, insertions, and deletions very fast. It is especially useful for equality selections on the key. (Its limitation: it does not efficiently support range queries.)
Primary file organization: It determines the physical placement/order of records in the data file on disk, i.e., how records are physically arranged and accessed. Common primary organizations are:
- Heap (unordered) file — records placed in insertion order.
- Sequential (ordered) file — records sorted on an ordering field.
- Hash file — record placement determined by a hash function on the hash key.
- B+-tree / clustered organization.
It differs from a secondary organization (a secondary index), which only provides an extra access path without changing the physical storage of records.
Explain aggregation with a suitable example.
Aggregation is an abstraction in the (Enhanced) Entity-Relationship model used to treat a relationship (along with its participating entities) as a single higher-level entity, so that this relationship can itself participate in another relationship. It overcomes the ER limitation that a relationship cannot directly be connected to another relationship.
Example: Consider entities Employee and Project linked by a relationship WorksOn. Now suppose we want to record which Manager monitors a particular employee-on-a-project assignment. A manager monitors the WorksOn relationship instance, not the employee or project alone. Using aggregation, the WorksOn relationship is abstracted into a single composite object, and a new relationship Monitors is drawn between Manager and this aggregated WorksOn object.
In a diagram this is shown by enclosing the Employee–WorksOn–Project relationship inside a dashed box, and connecting Manager to that box via Monitors. This captures the semantics that monitoring applies to the assignment as a whole.
What is the ODMG object model? What is Object Definition Language (ODL)?
ODMG Object Model: The standard object model defined by the Object Data Management Group (ODMG) to provide a portable, common data model for object-oriented and object databases. Its main features:
- Basic building blocks are objects (have a unique OID and mutable state) and literals (have only a value, no OID).
- Types are described by interfaces (behavior only) and classes (state + behavior, instantiable).
- Supports atomic, structured, and collection types (set, bag, list, array, dictionary).
- Supports inheritance, relationships with inverses, attributes, and methods (operations).
- Provides three sub-languages: ODL (schema definition), OQL (querying), and language bindings (C++, Java, Smalltalk).
Object Definition Language (ODL): ODL is the data-definition language of the ODMG standard, used to specify the object schema — defining classes, their attributes, relationships (with inverse), and method signatures — in a programming-language-independent way.
class Employee (extent employees key eid) {
attribute int eid;
attribute string name;
relationship Department dept inverse Department::staff;
};
Explain the different steps in query processing.
Steps in Query Processing
Query processing translates a high-level query into a result through the following main steps:
- Parsing and translation: The query is scanned and parsed to check syntactic correctness, validated against the system catalog (relations/attributes exist, types match), and translated into an internal relational-algebra expression / query tree.
- Optimization: The optimizer generates equivalent execution plans and selects the one with the lowest estimated cost (disk I/O, CPU), choosing access paths (index vs. scan), join orders, and join algorithms using catalog statistics.
- Code/plan generation: The selected plan is converted into an executable query-execution plan (a tree of physical operators/iterators).
- Execution (evaluation): The runtime engine executes the plan against the stored data and returns the result to the user.
In short: SQL query → parse & validate → relational algebra → optimize → execution plan → execute → result.
Why is query optimization essential in databases? What is heuristic optimization?
Why query optimization is essential: A single SQL query can be evaluated by many different but equivalent execution plans, and their costs (disk I/O, CPU time, memory, response time) can differ by orders of magnitude. Query optimization is essential because it:
- selects the most efficient execution plan, minimizing resource usage and response time;
- relieves the user from writing low-level, efficient code (declarative SQL just states what, not how);
- improves throughput and scalability, especially for large relations and complex joins where a poor join order or a full scan instead of an index scan can be extremely costly.
Heuristic optimization: It is a rule-based optimization approach that applies a set of transformation (heuristic) rules to the relational-algebra query tree to obtain an equivalent but cheaper plan, without exhaustively computing the cost of every alternative. The core heuristic is to reduce the size of intermediate results early, mainly by:
- pushing selections () down the tree to filter tuples as early as possible;
- pushing projections () down to drop unneeded columns early;
- converting Cartesian product + selection into joins and choosing a good join order.
This is faster to compute than full cost-based optimization but may not always yield the absolute optimal plan.
Define fragmentation. Explain horizontal fragmentation with an example.
Fragmentation is the process in a distributed database of dividing a global relation into smaller pieces called fragments, which are stored at different sites of the network. It must satisfy completeness (every data item appears in some fragment), reconstruction (the original relation can be rebuilt from its fragments), and disjointness. Types: horizontal, vertical, and mixed (hybrid).
Horizontal fragmentation divides a relation by rows (tuples), where each fragment is the set of tuples satisfying a selection predicate. Each fragment is defined as , and the original relation is reconstructed by the union of all fragments: .
Example: Given a relation EMPLOYEE(eid, name, branch) for a bank with branches in Kathmandu and Pokhara:
- — stored at the Kathmandu site.
- — stored at the Pokhara site.
Each site holds only the employees relevant to it (improving local access), and reconstructs the full relation.
What are the characteristics of NoSQL systems? Explain.
Characteristics of NoSQL Systems
NoSQL ("Not Only SQL") systems are non-relational databases designed for large-scale, distributed, and high-availability data management, especially for big data and web applications. Their key characteristics:
- Schema-less / flexible schema: Data need not conform to a fixed predefined schema; records can have different structures (good for semi-structured data like JSON documents).
- Horizontal scalability: Designed to scale out across many commodity servers (sharding) rather than scaling up a single machine, handling huge data volumes and high throughput.
- Distribution and replication: Data is partitioned and replicated across nodes for fault tolerance and high availability.
- BASE instead of ACID: Favor Basically Available, Soft state, Eventual consistency — relaxing strict consistency for availability and performance (per the CAP theorem, NoSQL systems often prioritize Availability and Partition tolerance over strong Consistency).
- Variety of data models: include key-value stores (Redis), document stores (MongoDB), column-family stores (Cassandra, HBase), and graph databases (Neo4j).
- High performance for simple read/write operations and no (or limited) support for complex joins and full SQL.
These features make NoSQL suitable for big-data, real-time, and web-scale applications where flexibility and scalability matter more than rigid relational structure.
What is the concept of an active database? What are triggers?
Active database: An active database is a database system that can react automatically to events (changes/conditions) by executing predefined actions without explicit user request. It extends a conventional (passive) DBMS with reactive behavior, typically specified using ECA (Event–Condition–Action) rules:
- Event — an occurrence such as an INSERT, UPDATE, or DELETE.
- Condition — a predicate checked when the event fires.
- Action — the operation executed if the condition is true.
Active databases are used to enforce integrity constraints, business rules, automatic derived-data maintenance, alerting, and logging/auditing.
Triggers: A trigger is the most common mechanism that implements ECA rules. It is a named procedural block stored in the database that is automatically executed ("fired") in response to a specified data-modification event on a table.
CREATE TRIGGER check_salary
BEFORE INSERT ON Employee
FOR EACH ROW
WHEN (NEW.salary < 0)
BEGIN
RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be negative');
END;
Here the event is INSERT, the condition is salary < 0, and the action raises an error — enforcing a business rule automatically.
Write short notes on: (a) Big Data (b) Information Retrieval.
Short Notes
(a) Big Data
Big Data refers to datasets that are too large, fast, and varied to be captured, stored, managed, and processed efficiently by traditional database tools. It is characterized by the "V"s:
- Volume — enormous amounts of data (terabytes to petabytes and beyond).
- Velocity — high speed of data generation and the need for real-time/near-real-time processing.
- Variety — structured, semi-structured, and unstructured data (text, images, logs, sensor data, social media).
- Veracity — uncertainty/quality of data, and Value — usefulness extracted from it.
Big Data is typically processed using distributed frameworks such as Hadoop (HDFS + MapReduce), Spark, and NoSQL databases that scale horizontally across clusters of commodity machines.
(b) Information Retrieval
Information Retrieval (IR) is the area concerned with finding relevant documents/information (usually unstructured text) from a large collection in response to a user's query, as opposed to retrieving exact structured records like a DBMS.
- A typical IR system indexes documents (e.g., using an inverted index), processes the query, ranks documents by relevance, and returns the most relevant results.
- Common models: Boolean, Vector Space Model (using TF–IDF weighting and cosine similarity), and Probabilistic models.
- Effectiveness is measured by precision (fraction of retrieved that are relevant) and recall (fraction of relevant that are retrieved).
- Web search engines (e.g., Google) are the most familiar large-scale IR systems.
Frequently asked questions
- Where can I find the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) question paper 2074?
- The full BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2074 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Advanced Database (BSc CSIT, CSC461) 2074 paper come with solutions?
- Yes. Every question on this Advanced Database (BSc CSIT, CSC461) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2074 paper?
- The BSc CSIT (TU) Advanced Database (BSc CSIT, CSC461) 2074 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
- Is practising this Advanced Database (BSc CSIT, CSC461) past paper free?
- Yes — reading and attempting this Advanced Database (BSc CSIT, CSC461) past paper on Kekkei is completely free.