Probability Engine · MDS 501

Fundamentals of Data Science: the questions likely to come

107 analyzed questions from 13 past papers (4 board exams, 2078-2082), grouped by syllabus unit — each with its probability, how often it's been asked, and where to study the answer.

Papers analyzed: 13 (incl. 4 board exams · 2078-2082)

Analyzed questions: 107 (across 6 syllabus units)

Board marks from repeats: 23% (questions asked before)

Units = 80% of marks (study these first)

Model answers for this subject are being written. Every question links to its original paper so you can study from the source meanwhile.

Showing: board exams only (default)

U2 · Q1/9 · 2082 · 6 marks

Data Munging

What is data munging? Explain the different issues in real-world data, along with the steps needed to handle those issues, in detail.

49%

Possible to appear · Appeared in 1 of the last 1 board papers


MODEL ANSWER · U2 · 6 marks

Data Munging

Data munging (also called data wrangling) is the process of transforming and mapping raw, messy data into a clean, structured format suitable for analysis and modeling. It is often cited as taking 60–80% of a data scientist's time, and it includes detecting, correcting, reshaping, and enriching data.

Common issues in real-world data and how to handle them

1. Missing values

Real data often has gaps (empty fields, NaN). Handling: Delete rows/columns if missingness is small; otherwise impute — mean/median for numeric, mode for categorical, or model-based (KNN, regression) imputation. Add a missing-indicator flag if missingness is informative.
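A minimal sketch of median imputation with a missing-indicator flag, using only the Python standard library. The records, the column name, and the helper `impute_median` are hypothetical and exist only for illustration.

```python
from statistics import median

# Hypothetical records; None marks a missing age.
rows = [
    {"name": "A", "age": 25},
    {"name": "B", "age": None},
    {"name": "C", "age": 35},
    {"name": "D", "age": 31},
]

def impute_median(records, column):
    """Fill missing values with the column median and add an indicator flag."""
    observed = [r[column] for r in records if r[column] is not None]
    fill = median(observed)
    result = []
    for r in records:
        was_missing = r[column] is None
        new_row = dict(r)  # copy so the raw data stays untouched
        new_row[column] = fill if was_missing else r[column]
        new_row[column + "_missing"] = was_missing
        result.append(new_row)
    return result

cleaned = impute_median(rows, "age")
```

The flag column keeps the fact that a value was imputed, which matters when missingness itself carries information.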

2. Outliers and noise

Extreme or erroneous values distort statistics and models. Handling: Detect via box plots, the IQR rule (values below $Q_1 - 1.5\,\text{IQR}$ or above $Q_3 + 1.5\,\text{IQR}$), or z-scores ($|z| > 3$). Treat by removal, capping/winsorizing, or transformation (log).
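The IQR rule and capping can be sketched as follows. This is one possible implementation, not a reference one: it assumes the "inclusive" quartile convention of `statistics.quantiles` (other conventions give slightly different fences), and the income values are made up.

```python
from statistics import quantiles

def iqr_bounds(values):
    """Return the (lower, upper) fences of the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def winsorize(values):
    """Cap values that fall outside the IQR fences."""
    low, high = iqr_bounds(values)
    return [min(max(v, low), high) for v in values]

# Hypothetical incomes with one extreme value.
incomes = [10, 12, 13, 15, 16, 18, 20, 22, 200]
low, high = iqr_bounds(incomes)          # (2.5, 30.5)
outliers = [v for v in incomes if v < low or v > high]  # [200]
capped = winsorize(incomes)              # 200 is capped to 30.5
```

Capping keeps the row in the dataset, whereas removal would drop it; which is appropriate depends on whether the extreme value is an error or a genuine observation.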

3. Inconsistent / non-standard formats

Dates, units, capitalization, and encodings differ across sources (e.g., "NP" vs "Nepal", "2082-01-01" vs "01/01/2082"). Handling: Standardize formats, normalize units, trim whitespace, fix casing, and apply consistent categorical labels.
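A small sketch of standardizing labels and date strings. The mapping `COUNTRY_LABELS`, the accepted layouts in `DATE_FORMATS`, and both helper functions are assumptions for illustration. Dates are parsed with Gregorian calendar rules, and slashed dates are assumed to be day-first; real Bikram Sambat dates would need a dedicated calendar library.

```python
from datetime import datetime

# Hypothetical mapping from country variants to one canonical label.
COUNTRY_LABELS = {"np": "Nepal", "nepal": "Nepal"}
# Date layouts this sketch accepts; day-first is assumed for slashed dates.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def standardize_country(raw):
    """Trim whitespace, ignore case, and map known variants to one label."""
    key = raw.strip().lower()
    return COUNTRY_LABELS.get(key, raw.strip())

def standardize_date(raw):
    """Return the date as a YYYY-MM-DD string, trying each known layout."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

print(standardize_country("  NP "))     # Nepal
print(standardize_date("01/01/2082"))   # 2082-01-01
```

Raising on an unrecognized layout, instead of guessing, keeps silent misparses out of the cleaned data.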

4. Duplicate records

Same entity appears multiple times. Handling: Identify with key/fuzzy matching and deduplicate, keeping the most complete or most recent record.
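A sketch of key-based deduplication that keeps the most recent record. It uses exact matching on a normalized key (trimmed, lower-cased), not fuzzy matching, and the customer records and the `deduplicate` helper are hypothetical.

```python
def deduplicate(records, key, order_by):
    """Keep one record per normalized key: the one with the largest order_by."""
    latest = {}
    for record in records:
        k = record[key].strip().lower()  # normalize so trivial variants match
        if k not in latest or record[order_by] > latest[k][order_by]:
            latest[k] = record
    return list(latest.values())

# Hypothetical customer records; ISO date strings compare correctly as text.
customers = [
    {"email": "ram@example.com", "updated": "2024-01-05", "phone": None},
    {"email": "Ram@Example.com ", "updated": "2024-03-10", "phone": "9800000000"},
    {"email": "sita@example.com", "updated": "2024-02-01", "phone": None},
]
unique = deduplicate(customers, key="email", order_by="updated")
```

Here the two spellings of the same email collapse into one record, and the later update (which also has the phone number) is the one that survives.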

5. Structural problems

Data may be in the wrong shape (wide vs long), nested, or split across files. Handling: Reshape (pivot/melt), join/merge tables on keys, parse nested fields.
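Reshaping from wide to long (a "melt") can be sketched in plain Python as below. The `melt` helper and the marks table are hypothetical; data-frame libraries such as pandas provide their own `melt` and `pivot` operations for the same job.

```python
def melt(rows, id_column, value_columns, var_name="variable", value_name="value"):
    """Turn one wide row per entity into one long row per (entity, variable)."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            long_rows.append({
                id_column: row[id_column],
                var_name: column,
                value_name: row[column],
            })
    return long_rows

# Hypothetical wide table: one column per subject.
wide = [
    {"student": "A", "math": 80, "stats": 72},
    {"student": "B", "math": 65, "stats": 90},
]
long = melt(wide, "student", ["math", "stats"],
            var_name="subject", value_name="marks")
# long[0] is {"student": "A", "subject": "math", "marks": 80}
```

The long form has one observation per row, which is the tidy layout most grouping and plotting tools expect.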

6. Incorrect data types

Numbers stored as strings, dates as text. Handling: Cast columns to correct types (numeric, datetime, category).
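A short sketch of casting text fields to proper types. The raw order record and the `to_number` helper are made up for illustration, and the helper assumes a comma is a thousands separator, which is not true in every locale.

```python
from datetime import date

def to_number(text):
    """Parse a numeric string, tolerating thousands separators and spaces."""
    cleaned = text.replace(",", "").strip()
    return float(cleaned) if "." in cleaned else int(cleaned)

# Hypothetical record in which every field arrived as text.
raw = {"price": "1,250.50", "quantity": " 3 ", "ordered_on": "2024-03-10"}
typed = {
    "price": to_number(raw["price"]),                      # float
    "quantity": to_number(raw["quantity"]),                # int
    "ordered_on": date.fromisoformat(raw["ordered_on"]),   # date object
}
```

Once the types are correct, arithmetic and date comparisons behave as expected instead of operating on strings.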

7. Scaling differences

Features on very different scales bias distance-based models. Handling: Apply normalization (min–max to [0,1]) or standardization (z-score).
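Both rescaling methods can be sketched with the standard library. The helpers assume the values are not all identical (otherwise the denominator is zero), the z-score uses the population standard deviation, and the ages are hypothetical.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

ages = [20, 30, 40, 50, 60]
scaled = min_max(ages)         # [0.0, 0.25, 0.5, 0.75, 1.0]
standardized = z_score(ages)   # mean 0, standard deviation 1
```

Min–max scaling is sensitive to outliers because a single extreme value sets the range; standardization is usually the safer choice when extremes are present.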

General workflow

1. Discover: profile the data and its problems.
2. Structure: reshape into tidy format.
3. Clean: fix missing values, outliers, duplicates, types.
4. Enrich: derive new features, merge external data.
5. Validate: verify consistency and ranges.
6. Publish: release the clean dataset for analysis.
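The validation step in the workflow above can be sketched as a set of per-column rules. The `validate` helper, the rules, and the rows are all hypothetical; real projects often use a dedicated validation library instead.

```python
def validate(rows, rules):
    """Return (row_index, column) pairs for every value that breaks its rule."""
    failures = []
    for index, row in enumerate(rows):
        for column, is_valid in rules.items():
            if not is_valid(row.get(column)):
                failures.append((index, column))
    return failures

# Hypothetical rules: one predicate per column.
rules = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "email": lambda v: isinstance(v, str) and "@" in v,
}
rows = [
    {"age": 34, "email": "ram@example.com"},
    {"age": -5, "email": "sita@example.com"},
    {"age": 41, "email": "not-an-email"},
]
problems = validate(rows, rules)  # [(1, "age"), (2, "email")]
```

Running such checks before publishing makes the cleaning steps repeatable and documents what "valid" means for each column.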

Conclusion: Good munging directly determines model quality — garbage in, garbage out — so each issue must be diagnosed during EDA and resolved with a documented, repeatable step.

AI-generated answer · from the 2082 paper


Question Priority · U2 · ranked by appearance likelihood, study top-down

Data Munging

Analyzed next · 49%

★ TOP PICK

What is data munging? Explain the different issues in real-world data, along with the steps needed to handle those issues, in detail.

6 marks · 49%

What do you mean by missing data? How do you tackle it in a data science project? Explain with an example of your own.

3 marks · 30%

You are analyzing a dataset containing information about customer orders for an e-commerce platform. However, upon initial inspection, you notice several data quality issues that may impact the reliability of your analysis.

Describe the common data quality issues that you may have identified in the dataset, providing specific examples for each issue. Explain the potential consequences of these issues on your analysis and propose strategies to address them effectively.

6 marks · 23%

Describe the common data quality issues with tabular data and their mitigation techniques with appropriate examples.

6 marks · 23%

Describe in detail how the quality of data can be assessed during data munging.

List and describe how you would address various kinds of issues during data cleanup while doing data munging.

6 marks · 19%

How is encoding done for categorical variables? Explain.

3 marks · 25%

Discuss the importance of data validation in data science. Name some common methods of data validation.

3 marks · 23%

Describe the common data enrichment techniques often used by data scientists.

3 marks · 23%

What are the commonly used data formats across data science projects?

3 marks · 21%

03 · The mock

Sit a probable paper

A full mock exam built from the most likely questions, mirroring the real paper's structure. Every slot is a real past question.

Most Probable Paper

Mirrors the real structure · 45 marks · based on 5 past papers

Group A

1. Briefly explain the various methods used to handle missing values during data cleanup. [3 marks]
Data Munging · Very likely · from 2080 paper
This question has recurred in 2 of 5 years, including the board exam 1× (2080), and its topic (Data Munging) appears in 100% of years.
2. List and briefly explain three main limitations of Data Science. [3 marks]
Introduction to Data Science · Very likely · from 2079 paper
This question has recurred in 2 of 5 years, including the board exam 1× (2079), and its topic (Introduction to Data Science) appears in 100% of years.
3. What do you mean by backpropagation? Conceptually, how does it differ from forward propagation in a neural network? [3 marks]
Machine Learning · Very likely · from 2080 paper
Asked once (2080), including the board exam 1× (2080), and its topic (Machine Learning) appears in 100% of years.
4. What do you mean by regression and classification? Are they supervised or unsupervised techniques? Justify. [3 marks]
Machine Learning · Very likely · from 2080 paper
Asked once (2080), including the board exam 1× (2080), and its topic (Machine Learning) appears in 100% of years.
5. What are decision trees? What does the impurity of a node mean in the context of a decision tree? [3 marks]
Machine Learning · Very likely · from 2079 paper
Asked once (2079), including the board exam 1× (2079), and its topic (Machine Learning) appears in 100% of years.

Group B

1. Describe, at a high level, the major steps that need to be taken for data cleanup/munging. [6 marks]
Data Munging · Very likely · from 2082 paper
This question has recurred in 3 of 5 years, including the board exam 1× (2082), and its topic (Data Munging) appears in 100% of years.
2. List the major differences between linear and logistic regression, with examples. [6 marks]
Data Analysis Technique · Very likely · from 2079 paper
This question has recurred in 2 of 5 years, including the board exam 1× (2079), and its topic (Data Analysis Technique) appears in 100% of years.
3. Apply map-reduce to the following set of data:

Data, Science, Engineering

Engineering, Data, Analytics

Analytics, Intelligence, Science

OR

What is Hadoop? Explain the different components of Hadoop. [6 marks]
Introduction to Big Data · Very likely · from 2080 paper
This question has recurred in 2 of 5 years, including the board exam 1× (2080), and its topic (Introduction to Big Data) appears in 100% of years.
4. Explain nodes and weights in neural networks. Consider the following neural network and compute its output, using sigmoid as the activation function for all layers. Weights of synaptic links are provided above each link.

Input: $X_1 = 2$, $X_2 = 3$. Feedforward net: input nodes feed hidden nodes 1 and 2, which feed nodes 3 and 4, which feed output node 5. Weights: $X_1 \to$ node 1 = 0.8, $X_1 \to$ node 2 = 0.4, $X_2 \to$ node 1 = 1, $X_2 \to$ node 2 = 0.6, node 1 $\to$ node 3 = 1.2, node 1 $\to$ node 4 = 0.4, node 2 $\to$ node 3 = 0.7, node 2 $\to$ node 4 = 0.5, node 3 $\to$ node 5 = 1.5, node 4 $\to$ node 5 = 0.5.

OR

What is a multi-layer neural network? How is learning done in neural networks? Explain the backpropagation algorithm. [6 marks]
Machine Learning · Very likely · from 2082 paper
Asked once (2082), including the board exam 1× (2082), and its topic (Machine Learning) appears in 100% of years.
5. How does machine learning differ from traditional learning? Explain the various types of machine learning techniques. [6 marks]
Machine Learning · Very likely · from 2081 paper
Asked once (2081), including the board exam 1× (2081), and its topic (Machine Learning) appears in 100% of years.

04 · The receipts

Behind the numbers

The raw evidence the predictions are computed from: marks per unit per year, syllabus weights, trends, and coverage.


The receipt: marks per unit, per year

Each row is a syllabus unit, each column an exam year, and each cell the marks that unit earned that year.

[Heatmap: rows are the six syllabus units (U1 to U6), columns are the exam years 2078 to 2082 plus a Total column; the color scale runs from no marks through few to many.]

#	Syllabus unit	Probability	Appeared	Avg marks	Syllabus weight	Exam vs syllabus	Trend	Questions
1	U2 Data Munging	Very likely (80%)	2079, 2080, 2081, 2082	4.7	17% (8 lecture hrs)	Balanced (exam 19%, syllabus 17%)	Steady	none repeat, 9 total
2	U1 Introduction to Data Science	Very likely (80%)	2079, 2080, 2081, 2082	4.3	21% (10 lecture hrs)	Balanced (exam 19%, syllabus 21%)	Steady	none repeat, 9 total
3	U4 Machine Learning	Very likely (80%)	2079, 2080, 2081, 2082	4.1	17% (8 lecture hrs)	Over-examined (exam 23%, syllabus 17%)	Steady	none repeat, 8 total
4	U3 Data Analysis Technique	Very likely (80%)	2079, 2080, 2081, 2082	5	21% (10 lecture hrs)	Balanced (exam 17%, syllabus 21%)	Steady	none repeat, 6 total
5	U5 Introduction to Big Data	Very likely (80%)	2079, 2080, 2081, 2082	5.4	17% (8 lecture hrs)	Under-examined (exam 9%, syllabus 17%)	Steady	none repeat, 5 total
6	U6 Ethical Issues in Data Science	Very likely (60%)	2080, 2081, 2082	3	8% (4 lecture hrs)	Balanced (exam 12%, syllabus 8%)	Steady	none repeat, 3 total

2078 · 3 sittings: first assessment, first reassessment, second assessment

2079 · 1 sitting: board

2080 · 3 sittings: board, first assessment, second assessment

2081 · 4 sittings: board, first assessment, second assessment

2082 · 2 sittings: board, second assessment

Study smart, not hard

Studying the top 5 of the 6 units in priority order covers ~91% of all observed marks.

Lecture time vs exam marks

Where the exam pays more than the curriculum spends: each unit's share of lecture hours versus its share of exam marks, as a share of the whole course. A unit whose share of marks exceeds its share of lectures is high yield.

U4Machine Learning

17% of lectures → 23% of marks · high yield

U2Data Munging

17% of lectures → 19% of marks

U1Introduction to Data Science

21% of lectures → 19% of marks

U3Data Analysis Technique

21% of lectures → 17% of marks

U6Ethical Issues in Data Science

8% of lectures → 12% of marks

U5Introduction to Big Data

17% of lectures → 9% of marks · low yield
