BSc CSIT (TU) Science Data Warehousing and Data Mining (BSc CSIT, CSC410) Question Paper 2078 Nepal

Q: Where can I find the BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) question paper 2078?

The full BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) 2078 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Data Warehousing and Data Mining (BSc CSIT, CSC410) 2078 paper come with solutions?

Yes. Every question on this Data Warehousing and Data Mining (BSc CSIT, CSC410) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) 2078 paper?

The BSc CSIT (TU) Data Warehousing and Data Mining (BSc CSIT, CSC410) 2078 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Data Warehousing and Data Mining (BSc CSIT, CSC410) past paper free?

Yes — reading and attempting this Data Warehousing and Data Mining (BSc CSIT, CSC410) past paper on Kekkei is completely free.

Question

1Long answer10 marks

What are the advantages and disadvantages of MOLAP over ROLAP? Explain the multidimensional data model with the help of a data cube.

olapdata-model

Answer 1

MOLAP vs ROLAP

MOLAP (Multidimensional OLAP) stores aggregated data in optimized multidimensional array structures (data cubes). ROLAP (Relational OLAP) stores data in relational tables (star/snowflake schema) and generates SQL on the fly.

Advantages of MOLAP over ROLAP

Faster query performance — pre-computed and pre-aggregated cube cells give near-instant slice/dice/drill-down response.
Compact storage for dense cubes via array indexing (no need to store sparse keys).
Rich analytical functions (time-series, complex calculations) are natively supported.
Automatic aggregation is handled by the cube engine.

Disadvantages of MOLAP over ROLAP

Poor scalability — cube size explodes with many dimensions (the curse of dimensionality); cannot easily handle very large data volumes.
Sparse cube wastage — empty cells still consume structure overhead.
Limited detail — usually holds summarized data, so drilling to fine-grained records is hard.
Long cube build/load time and the cube must be rebuilt when source data changes.
ROLAP, in contrast, scales to terabytes, leverages mature RDBMS technology, and handles ad-hoc queries on detailed data.

Multidimensional Data Model and the Data Cube

The multidimensional data model views data as a set of measures (numeric facts such as sales, units_sold) organized by several dimensions (descriptive perspectives such as Time, Item, Location).

Dimension: a perspective for analysis, e.g. Time (day, month, quarter, year), Item (name, type, brand), Location (city, region, country).
Fact / Measure: the quantity analyzed, e.g. dollars sold.
Data Cube: an n-dimensional generalization of a 2-D spreadsheet that lets us model and view data along multiple dimensions.

Example data cube (3-D)

A cube with dimensions Time, Item, Location where each cell holds the measure sales amount:

Time \ Item (in City = Kathmandu)	TV	Phone	PC
Q1	605	825	14
Q2	680	952	31
Q3	812	1023	30
Q4	927	1038	38

Stacking such tables for several cities gives the third dimension (Location), producing a cube. Higher-dimensional data is modeled as a series of cuboids; the lattice of cuboids forms the cube.

OLAP operations on the cube

Roll-up: aggregate to a higher level (city → country).
Drill-down: go to finer detail (quarter → month).
Slice: select one dimension value (Time = Q1) giving a sub-cube.
Dice: select a range on several dimensions.
Pivot (rotate): reorient the cube for presentation.

This multidimensional structure is what enables fast, intuitive analytical querying.

Answer 2

The Apriori Algorithm

Apriori is a classic algorithm for mining frequent itemsets and generating association rules from transactional data. It is named after its use of prior knowledge of frequent-itemset properties.

Key principle (Apriori property)

Every non-empty subset of a frequent itemset must also be frequent (anti-monotonicity). Equivalently, if an itemset is infrequent, all its supersets are infrequent and can be pruned.

Algorithm (level-wise search)

Input: transaction database D, minimum support min_sup
1. L1 = {frequent 1-itemsets}                      // scan D, count supports
2. for (k = 2; L(k-1) != empty; k++) {
3.     Ck = apriori_gen(L(k-1))                      // candidate generation
4.        - JOIN: combine pairs of (k-1)-itemsets sharing first k-2 items
5.        - PRUNE: drop any candidate having an infrequent (k-1)-subset
6.     for each transaction t in D:                  // scan database
7.        increment count of candidates in Ck contained in t
8.     Lk = { c in Ck | support(c) >= min_sup }
9. }
10. return L = union of all Lk

After finding frequent itemsets, strong association rules ( $A \Rightarrow B$ ) are generated for those with $\text{confidence} = \frac{\text{support}(A \cup B)}{\text{support}(A)} \ge min\_conf$ .

Limitations of Apriori

Multiple database scans — it scans D once for each level $k$ , which is very costly for large databases.
Huge number of candidate itemsets — e.g. $10^4$ frequent 1-itemsets can generate $\sim 10^7$ candidate 2-itemsets.
High memory and computation to store and test candidates against every transaction.
Performance degrades sharply with low support thresholds and long patterns.

FP-Growth and How It Overcomes These Limitations

FP-Growth (Frequent-Pattern Growth) mines frequent itemsets without candidate generation using a compact tree structure.

Steps

Scan 1 — count item supports, discard infrequent items, sort frequent items by descending support.
Scan 2 — build an FP-tree: insert each transaction's sorted frequent items as a path, sharing common prefixes; a header table links nodes of the same item.
Mining — for each item (bottom-up), build its conditional pattern base and conditional FP-tree, recursively growing frequent patterns.

How it overcomes Apriori's limitations

Only two database scans are needed (versus many) — solves the repeated-scan problem.
No candidate generation — avoids the costly candidate explosion of Apriori.
Compact FP-tree stores the database in main memory by sharing prefixes, reducing space.
A divide-and-conquer strategy on conditional trees makes mining much faster, especially for dense data and long patterns.

Thus FP-Growth is typically an order of magnitude faster than Apriori on large datasets.

Answer 3

Building a Decision Tree with ID3 (Information Gain)

(A standard dataset is assumed since the table was not supplied; the canonical "Play Tennis" dataset is used to demonstrate the full method and calculations.)

Dataset (14 examples)

Outlook	Temp	Humidity	Wind	PlayTennis
Sunny	Hot	High	Weak	No
Sunny	Hot	High	Strong	No
Overcast	Hot	High	Weak	Yes
Rain	Mild	High	Weak	Yes
Rain	Cool	Normal	Weak	Yes
Rain	Cool	Normal	Strong	No
Overcast	Cool	Normal	Strong	Yes
Sunny	Mild	High	Weak	No
Sunny	Cool	Normal	Weak	Yes
Rain	Mild	Normal	Weak	Yes
Sunny	Mild	Normal	Strong	Yes
Overcast	Mild	High	Strong	Yes
Overcast	Hot	Normal	Weak	Yes
Rain	Mild	High	Strong	No

Totals: 9 Yes, 5 No.

Formulas

\text{Entropy}(S) = -\sum_i p_i \log_2 p_i, \qquad \text{Gain}(S,A) = \text{Entropy}(S) - \sum_{v \in A} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)

Step 1 — Entropy of the whole set

\text{Entropy}(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} = 0.940

Step 2 — Gain of each attribute at the root

Outlook: Sunny(2Y,3N), Overcast(4Y,0N), Rain(3Y,2N)

E(Sunny)=0.971, E(Overcast)=0, E(Rain)=0.971

\text{Gain(Outlook)} = 0.940 - \big(\tfrac{5}{14}(0.971)+\tfrac{4}{14}(0)+\tfrac{5}{14}(0.971)\big) = 0.940 - 0.693 = \mathbf{0.247}

Humidity: High(3Y,4N→0.985), Normal(6Y,1N→0.592)

\text{Gain(Humidity)} = 0.940 - \big(\tfrac{7}{14}(0.985)+\tfrac{7}{14}(0.592)\big) = \mathbf{0.151}

Wind: Weak(6Y,2N→0.811), Strong(3Y,3N→1.0)

\text{Gain(Wind)} = 0.940 - \big(\tfrac{8}{14}(0.811)+\tfrac{6}{14}(1.0)\big) = \mathbf{0.048}

Temperature: Gain ≈ 0.029 (computed similarly).

Outlook has the highest gain (0.247) → it becomes the root.

Step 3 — Split on Outlook

Overcast → all 4 are Yes → leaf Yes (pure).
Sunny and Rain are impure → recurse.

Sunny branch (2Y, 3N, E=0.971)

Gain(Humidity)= 0.971 − (3/5·0 + 2/5·0) = 0.971 (High→all No, Normal→all Yes)
Gain(Temp)=0.571, Gain(Wind)=0.020 → Split on Humidity: High → No, Normal → Yes (both pure leaves).

Rain branch (3Y, 2N, E=0.971)

Gain(Wind)= 0.971 − (3/5·0 + 2/5·0) = 0.971 (Weak→all Yes, Strong→all No) → Split on Wind: Weak → Yes, Strong → No.

Final Decision Tree

Outlook?
 ├─ Sunny    → Humidity?
 │              ├─ High   → No
 │              └─ Normal → Yes
 ├─ Overcast → Yes
 └─ Rain     → Wind?
                ├─ Weak   → Yes
                └─ Strong → No

ID3 stops because every leaf is now pure (all examples same class).

Answer 4

Beam Search

Beam search is a heuristic search algorithm that explores a graph/tree by expanding only the most promising nodes at each level, keeping memory bounded.

At each level it generates all successors of the current candidates, scores them with a heuristic, and retains only the best $w$ nodes (where $w$ is the beam width). The rest are discarded.
It is a bounded-width best-first search: a compromise between greedy search ( $w=1$ ) and full breadth-first/best-first search ( $w=\infty$ ).

Characteristics

Width $w$ controls the trade-off: larger $w$ → better solutions but more memory/time; smaller $w$ → faster but may miss the optimum.
Not complete and not optimal — pruning may discard the path to the goal.
Memory-efficient: only $O(w)$ nodes per level are stored.

Uses in data mining / AI

Feature subset selection, rule induction (e.g. CN2), sequence decoding in NLP/machine translation, and large search spaces where exhaustive search is infeasible.

Answer 5

Signed Networks

A signed network is a graph in which each edge carries a sign — positive (+) or negative (−) — representing the nature of the relationship between two nodes, not just its existence.

Positive edge: friendship, trust, agreement, similarity, "like."
Negative edge: enmity, distrust, disagreement, "dislike," antagonism.

Examples: social platforms (friend/foe, trust/distrust on Epinions or Slashdot), product reviews, and protein interaction networks.

Structural balance theory

Signed networks are analyzed using balance theory, governed by rules on triads:

"A friend of my friend is my friend" (+,+,+) — balanced.
"An enemy of my enemy is my friend" (−,−,+) — balanced.
Triads with an odd number of negative edges (e.g. +,+,−) are unbalanced and tend to be unstable.

Relevance to data mining

Sign / link prediction: predict whether an unknown edge is + or −.
Community detection: clusters have dense positive ties internally and negative ties between groups.
Trust propagation and recommendation: spreading trust/distrust to rank or recommend.
Anomaly detection: unbalanced or hostile substructures.

Thus signed networks let data mining capture polarity of relationships, enabling richer analysis than ordinary (unsigned) graphs.

Answer 6

Concept Hierarchy

A concept hierarchy defines a sequence of mappings from a set of low-level (detailed) concepts to higher-level (more general) concepts. It organizes the values of an attribute into multiple levels of abstraction and is used in data warehousing and OLAP for roll-up/drill-down and for attribute-oriented generalization.

Example (Location): street → city → state → country. Example (Time): day → month → quarter → year.

How concept hierarchies are generated

Schema-level (specified by users/experts) — a total or partial ordering of attributes is given explicitly, e.g. street < city < state < country.
Set-grouping hierarchy — explicitly defining groupings of values, e.g. price {0–100} = cheap, {100–500} = moderate, {>500} = expensive.
Automatic generation for numeric data (discretization) using:
- Binning (equal-width / equal-frequency),
- Histogram analysis,
- Clustering,
- Entropy-based discretization,
- Natural partitioning (3-4-5 rule).
Automatic generation for categorical data based on the number of distinct values — the attribute with the most distinct values is placed at the lowest level and the one with the fewest at the highest level (e.g. country has fewer distinct values than street, so country is higher).

Concept hierarchies thus allow data to be viewed and mined at different levels of granularity.

Answer 7

Decision Tree vs Naive Bayes Classifier

Aspect	Decision Tree	Naive Bayes
Type / basis	Discriminative; recursive partitioning using information gain / Gini	Generative; probabilistic, based on Bayes' theorem
Core assumption	None about feature independence	Assumes all features are conditionally independent given the class
Model output	A tree of if-then rules; interpretable	Posterior probabilities $P(C\mid X) \propto P(C)\prod_i P(x_i\mid C)$
Handling of features	Handles numeric & categorical; captures feature interactions	Works well with categorical; needs distribution (e.g. Gaussian) for numeric; ignores interactions
Training speed	Slower (must evaluate splits)	Very fast (just count probabilities)
Data requirement	Needs reasonably large data; can overfit	Works well even with small data
Overfitting	Prone to overfitting (needs pruning)	Less prone; high bias
Interpretability	Highly interpretable (visual rules)	Less intuitive (probabilities)
Robustness	Sensitive to noisy data and small changes	Robust to noise and irrelevant features; sensitive when independence is badly violated

Summary

Use a decision tree when interpretability and feature interactions matter; use Naive Bayes when features are roughly independent, data is small, training must be fast (e.g. text classification / spam filtering).

Answer 8

DBSCAN Clustering Algorithm

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups together points that are densely packed, marking sparse points as noise/outliers. It can find clusters of arbitrary shape and does not require the number of clusters to be specified in advance.

Key parameters

Eps ( $\varepsilon$ ): radius of the neighborhood around a point.
MinPts: minimum number of points required within the Eps-neighborhood to form a dense region.

Point types

Core point: has at least MinPts points (including itself) within its $\varepsilon$ -neighborhood.
Border point: within $\varepsilon$ of a core point but has fewer than MinPts neighbors.
Noise point: neither core nor border.

Key concepts

Directly density-reachable, density-reachable, and density-connected define how points join a cluster from core points.

Algorithm

for each unvisited point p:
    mark p visited
    N = neighbors of p within Eps
    if |N| < MinPts:
        mark p as NOISE          // may later become border
    else:
        create new cluster C; add p
        expand cluster: for each point q in N (and their neighbors),
            if q is unvisited: mark visited, get its neighbors,
               if it is a core point, add its neighbors to N
            if q not in any cluster: add q to C

Advantages

Finds arbitrarily shaped clusters; identifies outliers/noise; no need to predefine k.

Disadvantages

Sensitive to choice of Eps and MinPts; struggles with clusters of varying density and high-dimensional data.

Answer 9

Data Integration

Data integration is the process of combining data from multiple heterogeneous sources (databases, data cubes, flat files) into a coherent, unified store such as a data warehouse. It is a key step in data preprocessing.

Major issues involved

Schema integration & entity identification problem — matching equivalent real-world entities from different sources, e.g. customer_id in one DB vs cust_number in another. Metadata helps resolve such conflicts.
Redundancy — an attribute may be derivable from another, or duplicated across sources. Redundancy can be detected by correlation analysis (Pearson's $r$ for numeric, chi-square $\chi^2$ test for categorical) and covariance.
Tuple duplication — the same record appearing more than once must be detected and removed.
Data value conflicts — the same attribute may have different values/units/representations across sources (e.g. price in NPR vs USD, metric vs imperial, different encodings or scales).
Inconsistent metadata / naming — different names, formats, or meanings (semantic heterogeneity) for the same concept.

Resolving these issues produces clean, non-redundant, consistent integrated data, improving the speed and quality of subsequent mining.

Answer 10

Lift and Conviction in Association Rules

For a rule $A \Rightarrow B$ :

Lift

Lift measures how much more likely $B$ is given $A$ , compared to $B$ 's overall likelihood — i.e. the correlation between $A$ and $B$ .

\text{Lift}(A \Rightarrow B) = \frac{P(A \cup B)}{P(A)\,P(B)} = \frac{\text{Confidence}(A \Rightarrow B)}{P(B)}

Interpretation:

Lift = 1 → $A$ and $B$ are independent (no relationship).
Lift > 1 → positive correlation (presence of $A$ increases likelihood of $B$ ); the rule is interesting.
Lift < 1 → negative correlation ( $A$ discourages $B$ ).

Conviction

Conviction measures how much the rule would be wrong if $A$ and $B$ were independent — i.e. the ratio of expected frequency of $A$ without $B$ to the observed frequency of incorrect predictions.

\text{Conviction}(A \Rightarrow B) = \frac{1 - P(B)}{1 - \text{Confidence}(A \Rightarrow B)} = \frac{P(A)\,P(\lnot B)}{P(A \cup \lnot B)}

Interpretation:

Conviction = 1 → independence.
Conviction > 1 → a meaningful (correct, directional) rule; higher value means stronger implication.
Conviction = ∞ when confidence = 1 (the rule never fails). Unlike lift, conviction is directional (it distinguishes $A\Rightarrow B$ from $B\Rightarrow A$ ).

Answer 11

Data Reduction Techniques

Data reduction obtains a reduced representation of a dataset that is much smaller in volume yet produces (almost) the same analytical/mining results. It reduces storage and speeds up mining without significant loss of quality.

Main techniques

Dimensionality reduction — reduce the number of attributes/variables:
- Wavelet transforms and Principal Component Analysis (PCA) project data onto fewer dimensions.
- Attribute subset selection (e.g. stepwise forward selection, backward elimination, decision-tree induction) removes irrelevant/redundant attributes.
Numerosity reduction — replace data by smaller representations:
- Parametric: fit a model (e.g. regression, log-linear models) and store only parameters.
- Non-parametric: histograms, clustering, sampling (e.g. SRSWOR, SRSWR, stratified sampling), and data cube aggregation.
Data compression — apply encoding/transformations to compress the dataset (lossless or lossy, e.g. wavelet, PCA outputs).
Data cube aggregation — aggregate detailed data to a higher summary level (e.g. daily sales → yearly sales).
Discretization & concept hierarchy generation — replace raw values by interval/conceptual labels (e.g. age → {young, middle-aged, senior}), reducing distinct values.

These techniques cut data size while preserving the integrity needed for accurate mining.

Answer 12

Short Note: Text Mining

Text mining (text data mining / knowledge discovery from text) is the process of extracting useful, non-trivial patterns and knowledge from large collections of unstructured or semi-structured text (documents, emails, web pages, social media, reviews). Since most text is unstructured, it must first be converted into a structured form before mining.

Typical process / steps

Text preprocessing — tokenization, stop-word removal, stemming/lemmatization, case normalization.
Feature extraction / representation — build a vector space / bag-of-words or TF-IDF model; term-document matrix; n-grams or embeddings.
Dimensionality reduction — e.g. LSI/LSA, feature selection.
Mining / analysis — apply data-mining techniques: classification, clustering, association, summarization.

Common applications

Document classification & categorization (e.g. news, email folders).
Spam filtering and sentiment analysis of reviews/social media.
Information retrieval and extraction (named-entity recognition).
Topic detection / document clustering and text summarization.
Search engines and question answering.

Challenges

High dimensionality, ambiguity of natural language (synonymy, polysemy), noise, and multilingual content.

In short, text mining bridges information retrieval, NLP, and data mining to turn raw text into actionable knowledge.

Level	BSc CSIT (TU)
Stream	Science
Subject	Data Warehousing and Data Mining (BSc CSIT, CSC410)
Year	2078 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions

Section A: Long Answer Questions

MOLAP vs ROLAP

Advantages of MOLAP over ROLAP

Disadvantages of MOLAP over ROLAP

Multidimensional Data Model and the Data Cube

Example data cube (3-D)

OLAP operations on the cube

The Apriori Algorithm

Key principle (Apriori property)

Algorithm (level-wise search)

Limitations of Apriori

FP-Growth and How It Overcomes These Limitations

Steps

How it overcomes Apriori's limitations

Building a Decision Tree with ID3 (Information Gain)

Dataset (14 examples)

Formulas

Step 1 — Entropy of the whole set

Step 2 — Gain of each attribute at the root

Step 3 — Split on Outlook

Sunny branch (2Y, 3N, E=0.971)

Rain branch (3Y, 2N, E=0.971)

Final Decision Tree

Section B: Short Answer Questions

Beam Search

Characteristics

Uses in data mining / AI

Signed Networks

Structural balance theory

Relevance to data mining

Concept Hierarchy

How concept hierarchies are generated

Decision Tree vs Naive Bayes Classifier

Summary

DBSCAN Clustering Algorithm

Key parameters

Point types

Key concepts

Algorithm

Advantages

Disadvantages

Data Integration

Major issues involved

Lift and Conviction in Association Rules

Lift

Conviction

Data Reduction Techniques

Main techniques

Short Note: Text Mining

Typical process / steps

Common applications

Challenges

Frequently asked questions