BSc CSIT (TU) Science Data Warehousing and Data Mining (BSc CSIT, CSC410) Question Paper 2081 Nepal

Q: Where can I find the BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) question paper 2081?

The full BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) 2081 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Data Warehousing and Data Mining (BSc CSIT, CSC410) 2081 paper come with solutions?

Yes. Every question on this Data Warehousing and Data Mining (BSc CSIT, CSC410) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) 2081 paper?

The BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) 2081 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Data Warehousing and Data Mining (BSc CSIT, CSC410) past paper free?

Yes — reading and attempting this Data Warehousing and Data Mining (BSc CSIT, CSC410) past paper on Kekkei is completely free.

Question

1Long answer10 marks

What is data mining? Explain the knowledge discovery in databases (KDD) process with a diagram, and discuss major data mining functionalities.

data-miningkdd

Answer 1

Data Mining

Data mining is the process of discovering interesting, previously unknown, valid, and potentially useful patterns, correlations, and knowledge from large volumes of data stored in databases, data warehouses, or other repositories. It is also called Knowledge Discovery from Data (KDD) and combines techniques from statistics, machine learning, and database systems.

Knowledge Discovery in Databases (KDD) Process

KDD is the overall process of converting raw data into useful knowledge. Data mining is one core step within it. The major steps are:

Data Cleaning – remove noise and inconsistent data.
Data Integration – combine data from multiple heterogeneous sources.
Data Selection – retrieve data relevant to the analysis task from the database.
Data Transformation – transform/consolidate data into forms suitable for mining (e.g., normalization, aggregation).
Data Mining – apply intelligent methods to extract patterns.
Pattern Evaluation – identify the truly interesting patterns using interestingness measures.
Knowledge Presentation – use visualization and knowledge-representation techniques to present mined knowledge to the user.

Diagram (described): A left-to-right pipeline of stacked boxes:

Databases / Data Warehouse → [Cleaning + Integration] → Data Warehouse → [Selection + Transformation] → Task-relevant Data → [Data Mining] → Patterns → [Evaluation + Presentation] → Knowledge

The first four steps are preprocessing (the bulk of the effort), data mining is the core step, and the last two deliver knowledge to the user.

Major Data Mining Functionalities

Characterization & Discrimination – summarize general features of a class (characterization) or compare features of contrasting classes (discrimination).
Association & Correlation Analysis – discover frequent patterns and association rules (e.g., market-basket analysis: bread ⇒ butter).
Classification – build a model to assign data to predefined categorical class labels (e.g., decision tree, Naïve Bayes).
Prediction (Regression) – predict missing or unavailable numeric values.
Cluster Analysis – group similar objects into clusters without predefined labels (unsupervised).
Outlier / Anomaly Analysis – detect objects that deviate from the general behaviour (e.g., fraud detection).
Evolution Analysis – model trends for data that changes over time.

Patterns are judged using interestingness measures: validity, novelty, usefulness, and understandability.

Answer 2

Association Rule Mining

Association rule mining discovers interesting relationships (co-occurrences) among items in large transactional datasets. A rule has the form $X \Rightarrow Y$ where $X, Y$ are itemsets and $X \cap Y = \emptyset$ .

Two key measures:

Support: $\text{supp}(X \Rightarrow Y) = P(X \cup Y)$ — fraction of transactions containing both $X$ and $Y$ .
Confidence: $\text{conf}(X \Rightarrow Y) = \dfrac{\text{supp}(X \cup Y)}{\text{supp}(X)} = P(Y\mid X)$ .

A strong rule satisfies both minimum support and minimum confidence.

Apriori Algorithm

Apriori uses the downward-closure (anti-monotone) property: every subset of a frequent itemset must also be frequent. It works level-wise, generating candidate $k$ -itemsets ( $C_k$ ) from frequent $(k-1)$ -itemsets and pruning those with infrequent subsets.

Worked Example

Consider 4 transactions with min_support = 50% (i.e., 2 transactions) and min_confidence = 70%:

TID	Items
T1	A, B, C
T2	A, C
T3	A, D
T4	B, C

Step 1 – $C_1$ → $L_1$ (count each item):

Item	Count	Frequent?
A	3	✓
B	2	✓
C	3	✓
D	1	✗ (pruned)

$L_1 = \{A, B, C\}$

Step 2 – $C_2$ → $L_2$ :

Itemset	Count	Frequent?
{A,B}	1	✗
{A,C}	2	✓
{B,C}	2	✓

$L_2 = \{ \{A,C\}, \{B,C\} \}$

Step 3 – $C_3$ : Candidate {A,B,C} requires subset {A,B} which is not frequent → pruned. No frequent 3-itemsets, so we stop.

Generating Strong Rules from frequent itemsets

From {A,C} (support = 2):

$A \Rightarrow C$ : conf $= 2/3 = 66.7\%$ → rejected
$C \Rightarrow A$ : conf $= 2/3 = 66.7\%$ → rejected

From {B,C} (support = 2):

$B \Rightarrow C$ : conf $= 2/2 = 100\%$ → strong rule ✓
$C \Rightarrow B$ : conf $= 2/3 = 66.7\%$ → rejected

Result: The only strong association rule is $B \Rightarrow C$ (support 50%, confidence 100%), meaning every customer who buys B also buys C.

Answer 3

Classification using a Decision Tree

Classification builds a model that assigns data objects to predefined categorical class labels. A decision tree is a flowchart-like tree where each internal node tests an attribute, each branch is an outcome of the test, and each leaf node holds a class label. To classify a new instance, we traverse from the root to a leaf following the attribute tests.

ID3 Algorithm

ID3 builds the tree top-down, selecting at each node the attribute with the highest information gain (greatest reduction in entropy).

Entropy(S) = -\sum_{i} p_i \log_2 p_i \qquad Gain(S,A) = Entropy(S) - \sum_{v \in A} \frac{|S_v|}{|S|}\,Entropy(S_v)

Worked Example (PlayTennis-style dataset)

Outlook	Temp	PlayTennis
Sunny	Hot	No
Sunny	Cool	No
Overcast	Hot	Yes
Overcast	Cool	Yes
Rain	Cool	Yes
Rain	Hot	No

Class distribution: 3 Yes, 3 No → $Entropy(S) = -\tfrac{3}{6}\log_2\tfrac{3}{6} - \tfrac{3}{6}\log_2\tfrac{3}{6} = 1.0$ .

Gain(Outlook):

Sunny (2 rows): both No → entropy 0
Overcast (2 rows): both Yes → entropy 0
Rain (2 rows): 1 Yes, 1 No → entropy 1.0

Gain(Outlook) = 1.0 - \left(\tfrac{2}{6}(0) + \tfrac{2}{6}(0) + \tfrac{2}{6}(1.0)\right) = 1.0 - 0.333 = 0.667

Gain(Temp): Hot {No,Yes,No}=entropy 0.918; Cool {No,Yes,Yes}=entropy 0.918 → $Gain = 1.0 - 0.918 = 0.082$ .

Since $Gain(Outlook) > Gain(Temp)$ , Outlook is the root.

Resulting Tree

          Outlook
         /   |    \
    Sunny Overcast Rain
      |       |      |
     No      Yes   Temp?
                   /    \
                Cool    Hot
                 |        |
                Yes       No

Sunny and Overcast branches are pure (leaf = No / Yes). The Rain branch is split further on Temp (Cool→Yes, Hot→No).

Prediction for a new instance

New instance: (Outlook = Rain, Temp = Cool). Traverse: Outlook = Rain → test Temp → Temp = Cool → PlayTennis = Yes.

Thus the model predicts the class label Yes.

Answer 4

Data Warehouse

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that supports management decision-making (Bill Inmon's definition). It consolidates historical data from multiple operational sources into a central repository optimized for query, analysis, and reporting (OLAP) rather than transaction processing (OLTP).

Key Features

Subject-oriented – organized around major subjects (e.g., customer, product, sales) rather than around applications.
Integrated – data from heterogeneous sources is cleaned and made consistent (naming, units, encoding).
Time-variant – stores historical data with a time dimension, enabling trend analysis over long periods.
Non-volatile – data is loaded and read but not updated/deleted in real time; once entered it is stable.
Supports OLAP & decision support – optimized for complex analytical queries, aggregation, and summarization.

Answer 5

MOLAP vs ROLAP vs HOLAP

These are the three OLAP server architectures that differ in how multidimensional data is stored.

Feature	MOLAP	ROLAP	HOLAP
Storage	Multidimensional arrays/cubes	Relational tables (star/snowflake)	Hybrid: cubes + relational
Data volume	Limited (sparse cube blow-up)	Very large, scalable	Large
Query speed	Fastest (pre-aggregated)	Slower (SQL joins/aggregation)	Fast for summary, scalable for detail
Pre-computation	High (precomputed aggregates)	Computed on the fly	Aggregates in cube, detail in tables
Storage cost	High (dense storage)	Lower	Moderate

Summary:

MOLAP (Multidimensional OLAP): stores data in optimized multidimensional cubes → fast queries but poor scalability for huge/sparse data.
ROLAP (Relational OLAP): stores data in relational DBMS, generating SQL dynamically → highly scalable but slower queries.
HOLAP (Hybrid OLAP): combines both — summary/aggregated data in a MOLAP cube for speed, detailed data in relational tables for scalability, giving a balance of performance and capacity.

Answer 6

Slice and Dice OLAP Operations

OLAP operations let analysts explore a multidimensional data cube. Slice and Dice select sub-portions of the cube.

Slice

The slice operation selects a single value for one dimension, producing a sub-cube of lower dimensionality (a 2-D plane from a 3-D cube).

Example: From a cube with dimensions (Time, Item, Location), slice on Time = "Q1" returns a 2-D matrix of Item × Location sales for the first quarter only.

Dice

The dice operation selects a sub-cube by specifying a range/set of values on two or more dimensions, keeping the same dimensionality.

Example: dice for (Time ∈ {Q1, Q2}) AND (Item ∈ {Phone, Laptop}) AND (Location ∈ {Kathmandu, Pokhara}) returns a smaller cube restricted to those values.

Difference: Slice fixes one dimension to a single value (reduces dimensionality); Dice imposes conditions on multiple dimensions (keeps dimensionality but reduces size). Related operations include roll-up, drill-down, and pivot.

Answer 7

Limitations of the Apriori Algorithm

Multiple database scans – Apriori scans the entire transaction database once for every level $k$ to count candidate supports. For long frequent patterns this means many costly I/O passes.
Huge number of candidate itemsets – candidate generation can explode combinatorially; e.g., from $10^4$ frequent 1-itemsets it may generate ~ $10^7$ candidate 2-itemsets.
High computational and memory cost – counting support for a massive candidate set and holding them in memory is expensive.
Poor performance on dense/large datasets – with low minimum support, the number of frequent itemsets and candidates grows rapidly, degrading performance.
Repeated work – it does not reuse counting effort across passes efficiently.

These limitations motivated improved methods such as the FP-Growth algorithm, which mines frequent patterns using a compact FP-tree without candidate generation and needs only two database scans.

Answer 8

DBSCAN Algorithm

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups together points that are densely packed and marks low-density points as noise. It can find arbitrarily shaped clusters and does not require the number of clusters in advance.

It uses two parameters:

Eps ( $\varepsilon$ ) – radius of the neighbourhood around a point.
MinPts – minimum number of points required within the $\varepsilon$ -neighbourhood to form a dense region.

Point Types

Core point: a point that has at least MinPts points (including itself) within its $\varepsilon$ -neighbourhood. It lies in the interior of a cluster.
Border point: has fewer than MinPts neighbours, but falls within the $\varepsilon$ -neighbourhood of a core point. It lies on the edge of a cluster.
Noise point (outlier): neither a core point nor a border point — it does not belong to any cluster.

Algorithm Steps

1. For each unvisited point p, find its ε-neighbourhood.
2. If |neighbourhood| ≥ MinPts → p is a core point; start a new cluster
   and expand it by recursively adding all density-reachable points.
3. Otherwise mark p as noise (it may later become a border point
   if found within a core point's neighbourhood).
4. Repeat until all points are visited.

Advantages: discovers clusters of arbitrary shape, robust to noise/outliers, no need to specify cluster count. Drawback: sensitive to the choice of $\varepsilon$ and MinPts, and struggles with clusters of varying density.

Answer 9

Normalization in Data Preprocessing

Normalization is a data-transformation technique that scales attribute values into a small, specified range (commonly $[0,1]$ or $[-1,1]$ ). It prevents attributes with large ranges (e.g., income) from dominating attributes with small ranges (e.g., age) in distance-based algorithms such as k-NN, k-means, and neural networks, ensuring all features contribute fairly.

Min-Max Normalization

Min-max normalization performs a linear transformation that maps the original value $v$ of attribute $A$ to a new value $v'$ in the target range $[new\_min_A,\; new\_max_A]$ :

v' = \frac{v - min_A}{max_A - min_A}\,(new\_max_A - new\_min_A) + new\_min_A

For the common target range $[0,1]$ this simplifies to $v' = \dfrac{v - min_A}{max_A - min_A}$ .

Example: For income with $min_A = 12{,}000$ , $max_A = 98{,}000$ , mapping a value of $73{,}600$ to range $[0,1]$ :

v' = \frac{73600 - 12000}{98000 - 12000} = \frac{61600}{86000} = 0.716

Advantage: preserves the relationships among original data values. Drawback: sensitive to outliers and breaks if future data falls outside the original $[min, max]$ range.

Answer 10

Classification vs Prediction

Both are forms of supervised learning that build a model from training data, but they differ in the kind of output produced.

Aspect	Classification	Prediction (Regression)
Output	Discrete / categorical class label	Continuous numeric value
Goal	Assign data to predefined classes	Estimate an unknown numeric quantity
Example	Is a loan applicant risky or safe?	Predict the amount of a customer's purchase
Models	Decision tree, Naïve Bayes, SVM, k-NN	Linear/non-linear regression
Evaluation	Accuracy, precision, recall	Error metrics (MSE, RMSE, MAE)

Summary: Classification predicts categorical (class) labels (e.g., yes/no, high/medium/low), whereas prediction (often called regression) predicts continuous numeric values (e.g., salary, temperature). Both first build a model on labelled training data and then apply it to new, unseen data.

Answer 11

K-Medoids Clustering

K-medoids is a partitioning clustering method that divides $n$ objects into $k$ clusters, where each cluster is represented by its medoid — the most centrally located actual data object in the cluster (unlike k-means, whose centre is a computed mean). Because the representative is a real object, k-medoids is more robust to noise and outliers than k-means.

The most common implementation is PAM (Partitioning Around Medoids).

Algorithm (PAM)

1. Arbitrarily choose k objects as the initial medoids.
2. Assign each remaining object to the cluster of the nearest medoid
   (using a distance measure, e.g., Manhattan/Euclidean).
3. For each medoid m and each non-medoid o:
     - tentatively swap m with o and compute the total cost
       (sum of distances of all objects to their nearest medoid).
     - if the swap reduces total cost, keep it; else undo it.
4. Repeat steps 2–3 until no swap improves the cost (medoids stabilize).

The objective is to minimize the sum of dissimilarities between objects and their assigned medoid:

E = \sum_{i=1}^{k} \sum_{p \in C_i} \text{dist}(p, m_i)

Advantages: robust to outliers, works with arbitrary distance measures. Disadvantage: computationally costlier than k-means ( $O(k(n-k)^2)$ per iteration), so it does not scale well to very large datasets.

Answer 12

Applications of Data Mining

Data mining is widely applied across industries to extract actionable knowledge from large datasets:

Market Basket Analysis & Retail: association rule mining to discover items frequently bought together, enabling product placement, cross-selling, and promotions.
Banking & Finance: credit scoring, loan-default prediction, and fraud detection through anomaly/outlier analysis.
Healthcare & Bioinformatics: disease diagnosis and prediction, identifying effective treatments, and analyzing genetic/DNA sequences.
Telecommunications: customer churn prediction, fraud detection, and network fault analysis.
CRM & Marketing: customer segmentation, targeted advertising, and recommendation systems (e.g., e-commerce, streaming).
Education: predicting student performance and dropout risk (educational data mining).
Web & Social Media Mining: search engines, web-usage analysis, opinion/sentiment mining.
Manufacturing & Science: quality control, fault prediction, and analysis of scientific/sensor data.

In short, data mining supports decision-making wherever large volumes of data must be turned into useful patterns and predictions.

Level	BSc CSIT (TU)
Stream	Science
Subject	Data Warehousing and Data Mining (BSc CSIT, CSC410)
Year	2081 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions