Browse papers
A

Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1long10 marks

What is data mining? Explain the knowledge discovery in databases (KDD) process with a diagram, and discuss major data mining functionalities.

Data Mining

Data mining is the process of discovering interesting, previously unknown, valid, and potentially useful patterns, correlations, and knowledge from large volumes of data stored in databases, data warehouses, or other repositories. It is also called Knowledge Discovery from Data (KDD) and combines techniques from statistics, machine learning, and database systems.

Knowledge Discovery in Databases (KDD) Process

KDD is the overall process of converting raw data into useful knowledge. Data mining is one core step within it. The major steps are:

  1. Data Cleaning – remove noise and inconsistent data.
  2. Data Integration – combine data from multiple heterogeneous sources.
  3. Data Selection – retrieve data relevant to the analysis task from the database.
  4. Data Transformation – transform/consolidate data into forms suitable for mining (e.g., normalization, aggregation).
  5. Data Mining – apply intelligent methods to extract patterns.
  6. Pattern Evaluation – identify the truly interesting patterns using interestingness measures.
  7. Knowledge Presentation – use visualization and knowledge-representation techniques to present mined knowledge to the user.

Diagram (described): A left-to-right pipeline of stacked boxes:

Databases / Data Warehouse → [Cleaning + Integration] → Data Warehouse → [Selection + Transformation] → Task-relevant Data → [Data Mining] → Patterns → [Evaluation + Presentation] → Knowledge

The first four steps are preprocessing (the bulk of the effort), data mining is the core step, and the last two deliver knowledge to the user.

Major Data Mining Functionalities

  • Characterization & Discrimination – summarize general features of a class (characterization) or compare features of contrasting classes (discrimination).
  • Association & Correlation Analysis – discover frequent patterns and association rules (e.g., market-basket analysis: bread ⇒ butter).
  • Classification – build a model to assign data to predefined categorical class labels (e.g., decision tree, Naïve Bayes).
  • Prediction (Regression) – predict missing or unavailable numeric values.
  • Cluster Analysis – group similar objects into clusters without predefined labels (unsupervised).
  • Outlier / Anomaly Analysis – detect objects that deviate from the general behaviour (e.g., fraud detection).
  • Evolution Analysis – model trends for data that changes over time.

Patterns are judged using interestingness measures: validity, novelty, usefulness, and understandability.

data-miningkdd
2long10 marks

Explain association rule mining. Apply the Apriori algorithm on a sample transaction dataset to generate strong association rules using given minimum support and confidence.

Association Rule Mining

Association rule mining discovers interesting relationships (co-occurrences) among items in large transactional datasets. A rule has the form XYX \Rightarrow Y where X,YX, Y are itemsets and XY=X \cap Y = \emptyset.

Two key measures:

  • Support: supp(XY)=P(XY)\text{supp}(X \Rightarrow Y) = P(X \cup Y) — fraction of transactions containing both XX and YY.
  • Confidence: conf(XY)=supp(XY)supp(X)=P(YX)\text{conf}(X \Rightarrow Y) = \dfrac{\text{supp}(X \cup Y)}{\text{supp}(X)} = P(Y\mid X).

A strong rule satisfies both minimum support and minimum confidence.

Apriori Algorithm

Apriori uses the downward-closure (anti-monotone) property: every subset of a frequent itemset must also be frequent. It works level-wise, generating candidate kk-itemsets (CkC_k) from frequent (k1)(k-1)-itemsets and pruning those with infrequent subsets.

Worked Example

Consider 4 transactions with min_support = 50% (i.e., 2 transactions) and min_confidence = 70%:

TIDItems
T1A, B, C
T2A, C
T3A, D
T4B, C

Step 1 – C1C_1L1L_1 (count each item):

ItemCountFrequent?
A3
B2
C3
D1✗ (pruned)

L1={A,B,C}L_1 = \{A, B, C\}

Step 2 – C2C_2L2L_2:

ItemsetCountFrequent?
{A,B}1
{A,C}2
{B,C}2

L2={{A,C},{B,C}}L_2 = \{ \{A,C\}, \{B,C\} \}

Step 3 – C3C_3: Candidate {A,B,C} requires subset {A,B} which is not frequent → pruned. No frequent 3-itemsets, so we stop.

Generating Strong Rules from frequent itemsets

From {A,C} (support = 2):

  • ACA \Rightarrow C: conf =2/3=66.7%= 2/3 = 66.7\% → rejected
  • CAC \Rightarrow A: conf =2/3=66.7%= 2/3 = 66.7\% → rejected

From {B,C} (support = 2):

  • BCB \Rightarrow C: conf =2/2=100%= 2/2 = 100\%strong rule ✓
  • CBC \Rightarrow B: conf =2/3=66.7%= 2/3 = 66.7\% → rejected

Result: The only strong association rule is BCB \Rightarrow C (support 50%, confidence 100%), meaning every customer who buys B also buys C.

association-rulesapriori
3long10 marks

Explain classification using a decision tree. Construct a decision tree using ID3 for the given dataset and predict the class label of a new instance.

Classification using a Decision Tree

Classification builds a model that assigns data objects to predefined categorical class labels. A decision tree is a flowchart-like tree where each internal node tests an attribute, each branch is an outcome of the test, and each leaf node holds a class label. To classify a new instance, we traverse from the root to a leaf following the attribute tests.

ID3 Algorithm

ID3 builds the tree top-down, selecting at each node the attribute with the highest information gain (greatest reduction in entropy).

Entropy(S)=ipilog2piGain(S,A)=Entropy(S)vASvSEntropy(Sv)Entropy(S) = -\sum_{i} p_i \log_2 p_i \qquad Gain(S,A) = Entropy(S) - \sum_{v \in A} \frac{|S_v|}{|S|}\,Entropy(S_v)

Worked Example (PlayTennis-style dataset)

OutlookTempPlayTennis
SunnyHotNo
SunnyCoolNo
OvercastHotYes
OvercastCoolYes
RainCoolYes
RainHotNo

Class distribution: 3 Yes, 3 No → Entropy(S)=36log23636log236=1.0Entropy(S) = -\tfrac{3}{6}\log_2\tfrac{3}{6} - \tfrac{3}{6}\log_2\tfrac{3}{6} = 1.0.

Gain(Outlook):

  • Sunny (2 rows): both No → entropy 0
  • Overcast (2 rows): both Yes → entropy 0
  • Rain (2 rows): 1 Yes, 1 No → entropy 1.0
Gain(Outlook)=1.0(26(0)+26(0)+26(1.0))=1.00.333=0.667Gain(Outlook) = 1.0 - \left(\tfrac{2}{6}(0) + \tfrac{2}{6}(0) + \tfrac{2}{6}(1.0)\right) = 1.0 - 0.333 = 0.667

Gain(Temp): Hot {No,Yes,No}=entropy 0.918; Cool {No,Yes,Yes}=entropy 0.918 → Gain=1.00.918=0.082Gain = 1.0 - 0.918 = 0.082.

Since Gain(Outlook)>Gain(Temp)Gain(Outlook) > Gain(Temp), Outlook is the root.

Resulting Tree

          Outlook
         /   |    \
    Sunny Overcast Rain
      |       |      |
     No      Yes   Temp?
                   /    \
                Cool    Hot
                 |        |
                Yes       No

Sunny and Overcast branches are pure (leaf = No / Yes). The Rain branch is split further on Temp (Cool→Yes, Hot→No).

Prediction for a new instance

New instance: (Outlook = Rain, Temp = Cool). Traverse: Outlook = Rain → test Temp → Temp = Cool → PlayTennis = Yes.

Thus the model predicts the class label Yes.

classificationdecision-treeid3
B

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4short5 marks

Define data warehouse and list its key features.

Data Warehouse

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that supports management decision-making (Bill Inmon's definition). It consolidates historical data from multiple operational sources into a central repository optimized for query, analysis, and reporting (OLAP) rather than transaction processing (OLTP).

Key Features

  • Subject-oriented – organized around major subjects (e.g., customer, product, sales) rather than around applications.
  • Integrated – data from heterogeneous sources is cleaned and made consistent (naming, units, encoding).
  • Time-variant – stores historical data with a time dimension, enabling trend analysis over long periods.
  • Non-volatile – data is loaded and read but not updated/deleted in real time; once entered it is stable.
  • Supports OLAP & decision support – optimized for complex analytical queries, aggregation, and summarization.
data-warehouse
5short5 marks

Compare MOLAP, ROLAP, and HOLAP.

MOLAP vs ROLAP vs HOLAP

These are the three OLAP server architectures that differ in how multidimensional data is stored.

FeatureMOLAPROLAPHOLAP
StorageMultidimensional arrays/cubesRelational tables (star/snowflake)Hybrid: cubes + relational
Data volumeLimited (sparse cube blow-up)Very large, scalableLarge
Query speedFastest (pre-aggregated)Slower (SQL joins/aggregation)Fast for summary, scalable for detail
Pre-computationHigh (precomputed aggregates)Computed on the flyAggregates in cube, detail in tables
Storage costHigh (dense storage)LowerModerate

Summary:

  • MOLAP (Multidimensional OLAP): stores data in optimized multidimensional cubes → fast queries but poor scalability for huge/sparse data.
  • ROLAP (Relational OLAP): stores data in relational DBMS, generating SQL dynamically → highly scalable but slower queries.
  • HOLAP (Hybrid OLAP): combines both — summary/aggregated data in a MOLAP cube for speed, detailed data in relational tables for scalability, giving a balance of performance and capacity.
olap
6short5 marks

Explain the slice and dice OLAP operations.

Slice and Dice OLAP Operations

OLAP operations let analysts explore a multidimensional data cube. Slice and Dice select sub-portions of the cube.

Slice

The slice operation selects a single value for one dimension, producing a sub-cube of lower dimensionality (a 2-D plane from a 3-D cube).

Example: From a cube with dimensions (Time, Item, Location), slice on Time = "Q1" returns a 2-D matrix of Item × Location sales for the first quarter only.

Dice

The dice operation selects a sub-cube by specifying a range/set of values on two or more dimensions, keeping the same dimensionality.

Example: dice for (Time ∈ {Q1, Q2}) AND (Item ∈ {Phone, Laptop}) AND (Location ∈ {Kathmandu, Pokhara}) returns a smaller cube restricted to those values.

Difference: Slice fixes one dimension to a single value (reduces dimensionality); Dice imposes conditions on multiple dimensions (keeps dimensionality but reduces size). Related operations include roll-up, drill-down, and pivot.

olap
7short5 marks

What are the limitations of the Apriori algorithm?

Limitations of the Apriori Algorithm

  1. Multiple database scans – Apriori scans the entire transaction database once for every level kk to count candidate supports. For long frequent patterns this means many costly I/O passes.
  2. Huge number of candidate itemsets – candidate generation can explode combinatorially; e.g., from 10410^4 frequent 1-itemsets it may generate ~10710^7 candidate 2-itemsets.
  3. High computational and memory cost – counting support for a massive candidate set and holding them in memory is expensive.
  4. Poor performance on dense/large datasets – with low minimum support, the number of frequent itemsets and candidates grows rapidly, degrading performance.
  5. Repeated work – it does not reuse counting effort across passes efficiently.

These limitations motivated improved methods such as the FP-Growth algorithm, which mines frequent patterns using a compact FP-tree without candidate generation and needs only two database scans.

apriori
8short5 marks

Explain the DBSCAN algorithm and the concepts of core, border, and noise points.

DBSCAN Algorithm

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups together points that are densely packed and marks low-density points as noise. It can find arbitrarily shaped clusters and does not require the number of clusters in advance.

It uses two parameters:

  • Eps (ε\varepsilon) – radius of the neighbourhood around a point.
  • MinPts – minimum number of points required within the ε\varepsilon-neighbourhood to form a dense region.

Point Types

  • Core point: a point that has at least MinPts points (including itself) within its ε\varepsilon-neighbourhood. It lies in the interior of a cluster.
  • Border point: has fewer than MinPts neighbours, but falls within the ε\varepsilon-neighbourhood of a core point. It lies on the edge of a cluster.
  • Noise point (outlier): neither a core point nor a border point — it does not belong to any cluster.

Algorithm Steps

1. For each unvisited point p, find its ε-neighbourhood.
2. If |neighbourhood| ≥ MinPts → p is a core point; start a new cluster
   and expand it by recursively adding all density-reachable points.
3. Otherwise mark p as noise (it may later become a border point
   if found within a core point's neighbourhood).
4. Repeat until all points are visited.

Advantages: discovers clusters of arbitrary shape, robust to noise/outliers, no need to specify cluster count. Drawback: sensitive to the choice of ε\varepsilon and MinPts, and struggles with clusters of varying density.

clusteringdbscan
9short5 marks

What is normalization in data preprocessing? Explain min-max normalization.

Normalization in Data Preprocessing

Normalization is a data-transformation technique that scales attribute values into a small, specified range (commonly [0,1][0,1] or [1,1][-1,1]). It prevents attributes with large ranges (e.g., income) from dominating attributes with small ranges (e.g., age) in distance-based algorithms such as k-NN, k-means, and neural networks, ensuring all features contribute fairly.

Min-Max Normalization

Min-max normalization performs a linear transformation that maps the original value vv of attribute AA to a new value vv' in the target range [new_minA,  new_maxA][new\_min_A,\; new\_max_A]:

v=vminAmaxAminA(new_maxAnew_minA)+new_minAv' = \frac{v - min_A}{max_A - min_A}\,(new\_max_A - new\_min_A) + new\_min_A

For the common target range [0,1][0,1] this simplifies to v=vminAmaxAminAv' = \dfrac{v - min_A}{max_A - min_A}.

Example: For income with minA=12,000min_A = 12{,}000, maxA=98,000max_A = 98{,}000, mapping a value of 73,60073{,}600 to range [0,1][0,1]:

v=73600120009800012000=6160086000=0.716v' = \frac{73600 - 12000}{98000 - 12000} = \frac{61600}{86000} = 0.716

Advantage: preserves the relationships among original data values. Drawback: sensitive to outliers and breaks if future data falls outside the original [min,max][min, max] range.

preprocessing
10short5 marks

Differentiate between classification and prediction.

Classification vs Prediction

Both are forms of supervised learning that build a model from training data, but they differ in the kind of output produced.

AspectClassificationPrediction (Regression)
OutputDiscrete / categorical class labelContinuous numeric value
GoalAssign data to predefined classesEstimate an unknown numeric quantity
ExampleIs a loan applicant risky or safe?Predict the amount of a customer's purchase
ModelsDecision tree, Naïve Bayes, SVM, k-NNLinear/non-linear regression
EvaluationAccuracy, precision, recallError metrics (MSE, RMSE, MAE)

Summary: Classification predicts categorical (class) labels (e.g., yes/no, high/medium/low), whereas prediction (often called regression) predicts continuous numeric values (e.g., salary, temperature). Both first build a model on labelled training data and then apply it to new, unseen data.

classification
11short5 marks

Explain the K-medoids clustering method.

K-Medoids Clustering

K-medoids is a partitioning clustering method that divides nn objects into kk clusters, where each cluster is represented by its medoid — the most centrally located actual data object in the cluster (unlike k-means, whose centre is a computed mean). Because the representative is a real object, k-medoids is more robust to noise and outliers than k-means.

The most common implementation is PAM (Partitioning Around Medoids).

Algorithm (PAM)

1. Arbitrarily choose k objects as the initial medoids.
2. Assign each remaining object to the cluster of the nearest medoid
   (using a distance measure, e.g., Manhattan/Euclidean).
3. For each medoid m and each non-medoid o:
     - tentatively swap m with o and compute the total cost
       (sum of distances of all objects to their nearest medoid).
     - if the swap reduces total cost, keep it; else undo it.
4. Repeat steps 2–3 until no swap improves the cost (medoids stabilize).

The objective is to minimize the sum of dissimilarities between objects and their assigned medoid:

E=i=1kpCidist(p,mi)E = \sum_{i=1}^{k} \sum_{p \in C_i} \text{dist}(p, m_i)

Advantages: robust to outliers, works with arbitrary distance measures. Disadvantage: computationally costlier than k-means (O(k(nk)2)O(k(n-k)^2) per iteration), so it does not scale well to very large datasets.

clustering
12short5 marks

Write short notes on the applications of data mining.

Applications of Data Mining

Data mining is widely applied across industries to extract actionable knowledge from large datasets:

  • Market Basket Analysis & Retail: association rule mining to discover items frequently bought together, enabling product placement, cross-selling, and promotions.
  • Banking & Finance: credit scoring, loan-default prediction, and fraud detection through anomaly/outlier analysis.
  • Healthcare & Bioinformatics: disease diagnosis and prediction, identifying effective treatments, and analyzing genetic/DNA sequences.
  • Telecommunications: customer churn prediction, fraud detection, and network fault analysis.
  • CRM & Marketing: customer segmentation, targeted advertising, and recommendation systems (e.g., e-commerce, streaming).
  • Education: predicting student performance and dropout risk (educational data mining).
  • Web & Social Media Mining: search engines, web-usage analysis, opinion/sentiment mining.
  • Manufacturing & Science: quality control, fault prediction, and analysis of scientific/sensor data.

In short, data mining supports decision-making wherever large volumes of data must be turned into useful patterns and predictions.

data-mining

Frequently asked questions

Where can I find the BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) question paper 2081?
The full BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) 2081 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Data Warehousing and Data Mining (BSc CSIT, CSC410) 2081 paper come with solutions?
Yes. Every question on this Data Warehousing and Data Mining (BSc CSIT, CSC410) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) 2081 paper?
The BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) 2081 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Data Warehousing and Data Mining (BSc CSIT, CSC410) past paper free?
Yes — reading and attempting this Data Warehousing and Data Mining (BSc CSIT, CSC410) past paper on Kekkei is completely free.