Fundamentals of Data Science: what's likely to come
Every real question from 13 past papers across 5 years (board + internal assessments each year) mapped to its official syllabus unit. Each prediction shows its receipt: the actual years it appeared.
13 papers across 5 years
This program sits several exams each year: one official board exam plus internal assessments. Every sitting is analysed.
Topic predictions, ordered by what to study first
Every syllabus unit scored by how often it appears, its mark-weight, and its trend. See the exact questions behind each unit in the Explore-by-unit section below.
| # | Syllabus unit | Probability | Appeared | Avg marks | Syllabus weight | Exam vs syllabus | Trend | Questions |
|---|---|---|---|---|---|---|---|---|
| 1 | U4 Machine Learning | Very likely | 100% | 24.6 | 17% (8 lecture hrs) | Balanced (exam 21% · syllabus 17%) | Steady | 3 recurring, 25 total |
| 2 | U2 Data Munging | Very likely | 100% | 22.8 | 17% (8 lecture hrs) | Balanced (exam 19% · syllabus 17%) | Steady | 3 recurring, 23 total |
| 3 | U3 Data Analysis Technique | Very likely | 100% | 22.8 | 21% (10 lecture hrs) | Balanced (exam 19% · syllabus 21%) | Steady | 1 recurring, 23 total |
| 4 | U1 Introduction to Data Science | Very likely | 100% | 22.2 | 21% (10 lecture hrs) | Balanced (exam 19% · syllabus 21%) | Steady | 3 recurring, 21 total |
| 5 | U6 Ethical Issues in Data Science | Very likely | 80% | 17.2 | 8% (4 lecture hrs) | Balanced (exam 12% · syllabus 8%) | Rising | 1 recurring, 14 total |
| 6 | U5 Introduction to Big Data | Very likely | 100% | 10.8 | 17% (8 lecture hrs) | Under-examined (exam 9% · syllabus 17%) | Steady | 2 recurring, 9 total |
Explore by unit: every question, ranked
Pick a syllabus unit and walk its questions from most-important to asked-once. The fastest way to revise one topic end to end.
Define entropy. How is it used with decision tree?
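For reference, the standard definitions behind this question can be written as follows. This is a sketch of the usual textbook form, not a model answer; the symbols $S$ (a set of examples), $p_i$ (the proportion of class $i$ in $S$), $c$ (the number of classes), $A$ (an attribute) and $S_v$ (the subset of $S$ where $A = v$) are notation assumed here.

```latex
% Entropy of a set S with c classes (base-2 logarithm, measured in bits)
H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

% Information gain of splitting S on attribute A
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, H(S_v)
```

A decision-tree learner such as ID3 evaluates the gain of every candidate attribute at a node and splits on the one with the largest gain, that is, the one that reduces entropy the most.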
Describe and explain with examples the various types of machine learning methods (Supervised, Unsupervised, and Reinforcement).
OR
Explain with examples and highlight the relationship between Artificial Intelligence and Machine Learning.
You want to identify global weather patterns that may have been affected by climate change. To do so, you want to use machine learning algorithms to find patterns that would otherwise be imperceptible to a human meteorologist. Discuss which machine learning method (supervised, unsupervised, reinforcement) you would use and why.
The following table represents a dataset of 10 objects with attributes Color, Type, Origin and the "class", whether the customer who bought was satisfied or not.
| S. No | Color | Type | Origin | Satisfied? |
|---|---|---|---|---|
| 1 | Red | Casual | Domestic | Yes |
| 2 | Red | Casual | Domestic | No |
| 3 | Red | Casual | Domestic | Yes |
| 4 | Yellow | Casual | Domestic | No |
| 5 | Yellow | Casual | Imported | Yes |
| 6 | Yellow | Casual | Imported | Yes |
| 7 | Yellow | Formal | Imported | No |
| 8 | Yellow | Formal | Imported | Yes |
| 9 | Yellow | Formal | Domestic | No |
| 10 | Red | Formal | Imported | No |
| 11 | Red | Casual | Imported | Yes |
Use ID3 algorithm to find the attribute with maximum information gain.
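The information-gain step this question asks for can be sketched in Python. This is a minimal sketch under stated assumptions, not a model answer: the rows are copied from the table above, base-2 logarithms are assumed, and the names `ROWS`, `ATTRIBUTES`, `entropy` and `information_gain` are illustrative.

```python
from collections import Counter
from math import log2

# Rows copied from the table above: (Color, Type, Origin, Satisfied?)
ROWS = [
    ("Red", "Casual", "Domestic", "Yes"),
    ("Red", "Casual", "Domestic", "No"),
    ("Red", "Casual", "Domestic", "Yes"),
    ("Yellow", "Casual", "Domestic", "No"),
    ("Yellow", "Casual", "Imported", "Yes"),
    ("Yellow", "Casual", "Imported", "Yes"),
    ("Yellow", "Formal", "Imported", "No"),
    ("Yellow", "Formal", "Imported", "Yes"),
    ("Yellow", "Formal", "Domestic", "No"),
    ("Red", "Formal", "Imported", "No"),
    ("Red", "Casual", "Imported", "Yes"),
]
ATTRIBUTES = ("Color", "Type", "Origin")


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, index):
    """Entropy of the whole set minus the weighted entropy after splitting on one column."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[index] for row in rows}:
        subset = [row[-1] for row in rows if row[index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")
print("Attribute with maximum information gain:", best)
```

On these 11 rows the gain is largest for Type (about 0.15), followed by Origin (about 0.05) and Color (about 0.01), so Type would be chosen as the root under these assumptions.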
OR
What is multi-layer neural network? How is learning done in neural networks? Explain backpropagation algorithm. [1 + 1 + 4]
Explain node and weights in neural networks. Consider following Neural Network and compute its output considering sigmoid as activation function for all layers. Weights of synaptic links are provided above each link.
Input: , . Feedforward net: input nodes feed hidden nodes 1 and 2, which feed nodes 3 and 4, which feed output node 5. Weights: node 1 = 0.8, node 2 = 0.4, node 1 = 1, node 2 = 0.6, node 1 → node 3 = 1.2, node 1 → node 4 = 0.4, node 2 → node 3 = 0.7, node 2 → node 4 = 0.5, node 3 → node 5 = 1.5, node 4 → node 5 = 0.5.
OR
What is multi-layer neural network? How is learning done in neural networks? Explain backpropagation algorithm.
How does machine learning differ from traditional learning? Explain the various types of machine learning techniques.
Consider a dataset representing whether students passed an exam based on three features: Study Hours (Low, Medium, High), Previous Grades (Low, Medium, High), and Tutoring (Yes or No). The target variable is Exam Result (Pass or Fail).
| Study Hours | Previous Grades | Tutoring | Exam Result |
|---|---|---|---|
| Low | Low | Yes | Fail |
| Low | Medium | No | Fail |
| Medium | High | Yes | Pass |
| High | Low | No | Fail |
| Medium | Medium | Yes | Pass |
| High | High | Yes | Pass |
| High | High | No | Pass |
| Low | Low | No | Fail |
Using the ID3 algorithm, calculate the information gain for each feature (Study Hours, Previous Grades, Tutoring) and determine which feature should be chosen as the root node for the decision tree.
OR
Consider a dataset containing the coordinates of 8 points in a two-dimensional space:
- Point 1: (2, 3)
- Point 2: (3, 4)
- Point 3: (3, 5)
- Point 4: (4, 6)
- Point 5: (7, 8)
- Point 6: (8, 7)
- Point 7: (9, 8)
- Point 8: (10, 9)
Apply the K-Means algorithm to cluster these points into 3 clusters.
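A K-Means run on these eight points can be sketched as below. The question does not say how to initialise the centroids, so this sketch assumes Points 1, 4 and 7 as the starting centroids; a different choice can lead to different clusters. The function name `k_means` and its parameters are illustrative, and Euclidean distance is assumed.

```python
from math import dist

# The eight points from the question, in order (Point 1 .. Point 8)
POINTS = [(2, 3), (3, 4), (3, 5), (4, 6), (7, 8), (8, 7), (9, 8), (10, 9)]


def k_means(points, initial_centroids, max_iter=100):
    """Plain K-Means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [tuple(float(c) for c in centroid) for centroid in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed, so the run has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Assumption: start from Points 1, 4 and 7 (not specified in the question)
labels, centroids = k_means(POINTS, [POINTS[0], POINTS[3], POINTS[6]])
print("Cluster index per point:", labels)
print("Final centroids:", centroids)
```

With this initialisation the run settles on the clusters {Point 1, Point 2}, {Point 3, Point 4} and {Points 5 to 8}, with centroids (2.5, 3.5), (3.5, 5.5) and (8.5, 8.0).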
The following table presents a dataset of 11 objects with attributes Color, Type, Origin and the 'class' whether the customer who bought was satisfied or not.
| S. No | Color | Type | Origin | Satisfied? |
|---|---|---|---|---|
| 1 | Red | Casual | Domestic | Yes |
| 2 | Red | Casual | Domestic | No |
| 3 | Red | Casual | Domestic | Yes |
| 4 | Yellow | Casual | Domestic | No |
| 5 | Yellow | Casual | Imported | Yes |
| 6 | Yellow | Casual | Imported | Yes |
| 7 | Yellow | Formal | Imported | No |
| 8 | Yellow | Formal | Imported | Yes |
| 9 | Yellow | Formal | Domestic | No |
| 10 | Red | Formal | Imported | No |
| 11 | Red | Casual | Imported | Yes |
Now classify a new object using Naïve Bayes classifier with the following properties: Color = Red, Origin = Imported and Type = Formal
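The Naïve Bayes calculation for this query can be sketched as follows. This is a minimal sketch, not a model answer: it uses plain relative frequencies with no Laplace smoothing (an assumption, since the question does not mention smoothing), and the names `ROWS`, `COLUMNS` and `naive_bayes_scores` are illustrative.

```python
from fractions import Fraction

# Rows copied from the table above: (Color, Type, Origin, Satisfied?)
ROWS = [
    ("Red", "Casual", "Domestic", "Yes"),
    ("Red", "Casual", "Domestic", "No"),
    ("Red", "Casual", "Domestic", "Yes"),
    ("Yellow", "Casual", "Domestic", "No"),
    ("Yellow", "Casual", "Imported", "Yes"),
    ("Yellow", "Casual", "Imported", "Yes"),
    ("Yellow", "Formal", "Imported", "No"),
    ("Yellow", "Formal", "Imported", "Yes"),
    ("Yellow", "Formal", "Domestic", "No"),
    ("Red", "Formal", "Imported", "No"),
    ("Red", "Casual", "Imported", "Yes"),
]
COLUMNS = {"Color": 0, "Type": 1, "Origin": 2}


def naive_bayes_scores(rows, query):
    """Return prior * product of per-attribute likelihoods for each class (unnormalised)."""
    scores = {}
    for label in sorted({row[-1] for row in rows}):
        in_class = [row for row in rows if row[-1] == label]
        score = Fraction(len(in_class), len(rows))  # class prior P(label)
        for attribute, value in query.items():
            matches = sum(1 for row in in_class if row[COLUMNS[attribute]] == value)
            score *= Fraction(matches, len(in_class))  # P(attribute = value | label)
        scores[label] = score
    return scores


query = {"Color": "Red", "Origin": "Imported", "Type": "Formal"}
scores = naive_bayes_scores(ROWS, query)
prediction = max(scores, key=scores.get)

for label, score in scores.items():
    print(f"{label}: {score} = {float(score):.4f}")
print("Predicted class:", prediction)
```

The unnormalised scores come out as 1/33 for Yes and 12/275 for No, so under these assumptions the new object is classified as not satisfied.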
OR
What is clustering technique? Divide the data points into two clusters using K-Means Clustering technique. [1+5]
Compute the output of following neural network using sigmoid activation function. Weights of synaptic links are provided above each link.
Input: , . Network is a feedforward net with input nodes feeding hidden nodes 1 and 2, which feed nodes 3 and 4, which feed output node 5. Weights: node 1 = 0.8, node 2 = 0.2, node 1 = 0.4, node 2 = 0.6, node 1 → node 3 = 1.2, node 1 → node 4 = 0.4, node 2 → node 3 = 0.7, node 2 → node 4 = 0.5, node 3 → node 5 = 1.5, node 4 → node 5 = 0.5.
What do you mean by support vectors? Explain, with an appropriate example of your own, how support vectors are useful in machine learning.
Explain neural network along with forward propagation and back propagation.
OR
Differentiate between Regression and Classification in supervised learning.
Apply the Naïve Bayes algorithm to decide whether a married female with a salary of 42000 is likely to have the illness, based on the data given below:
| Marital Status | Gender | Income | Illness |
|---|---|---|---|
| Married | Male | 40000 | Yes |
| Unmarried | Male | 35000 | No |
| Married | Male | 60000 | Yes |
| Married | Female | 61000 | Yes |
| Unmarried | Female | 36000 | Yes |
| Married | Female | 47500 | No |
| Unmarried | Female | 32000 | No |
OR
Explain the forward propagation and backward propagation of neural networks.
Elaborate the concept behind Naïve Bayes algorithm for classification task.
Please answer any ONE of the following:
(a) Explain what is Naïve Bayes. Describe its common use cases with examples.
OR
(b) Explain Decision Trees. Describe its common use cases with examples.
Explain how classification is done by SVM classifier.
Discuss on machine learning and its types.
What is deep learning? How is it similar to or different from a neural network?
What do you mean by backpropagation? Conceptually, how does it differ from forward propagation in a neural network?
How is machine learning different from traditional programming? What does a machine actually learn through machine learning? Justify your answer.
What do you mean by deep learning? How is it similar/different to Neural Network?
List the differences between supervised and unsupervised machine learning methods including examples.
Define and briefly describe neural networks. List a few example use-cases where neural networks can be used.
What are decision trees? What does impurity of a node mean in context of decision tree?
What kind of problem can a Decision Tree solve? Explain with an example.
Define Support Vector Machines. Describe briefly how SVMs are used for classification.
Most-asked questions across all years
The questions that come back exam after exam, grouped across years and ranked by how often they're asked. Open one to read its real past answer.
Apply map-reduce to the following set of data:
Data, Science, Engineering
Engineering, Data, Analytics
Analytics, Intelligence, Science
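The question does not name the job to run, so the sketch below assumes the classic word-count task and walks the three lines through the map, shuffle and reduce phases in plain Python. The function names are illustrative, and no Hadoop API is used.

```python
from collections import defaultdict

# The three input records from the question
LINES = [
    "Data, Science, Engineering",
    "Engineering, Data, Analytics",
    "Analytics, Intelligence, Science",
]


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.strip(), 1) for word in line.split(",") if word.strip()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the values collected for one key."""
    return key, sum(values)


mapped = [pair for line in LINES for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))
print(counts)
```

Under the word-count assumption, Analytics, Data, Engineering and Science each end with a count of 2, and Intelligence with a count of 1.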
OR
What is Hadoop? Explain the different components of Hadoop.
What is Hadoop? Explain its components in detail.
Explain the CRISP-DM lifecycle for data mining process.
OR
Explain the TDSP lifecycle for data science.
Define entropy. How is it used with decision tree?
Explain why data ethics is crucial in data science.
Describe, at a high-level, the major steps that need to be taken for data cleanup/munging.
With an example, explain how you would determine the True Negative and False Negative data from research dataset.
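One way to illustrate the counting this question asks about is a small confusion-matrix tally. The labels below are hypothetical (1 = condition present, 0 = absent) and are not taken from any past paper; `confusion_counts` is an illustrative helper name.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives by comparing actual and predicted labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct if the actual label is also positive
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: a miss (FN) if the actual label was positive
            counts["FN" if actual == positive else "TN"] += 1
    return counts


# Hypothetical ground truth and model predictions for eight records
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
```

Here a true negative is a record that is actually 0 and predicted 0, while a false negative is a record that is actually 1 but predicted 0; on this made-up data there are 3 of the former and 2 of the latter.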
Describe and explain with examples the various types of machine learning methods (Supervised, Unsupervised, and Reinforcement).
OR
Explain with examples and highlight the relationship between Artificial Intelligence and Machine Learning.
Lowest priority: asked only once (102)
- U1 · 2082
You are a data scientist. Take a data science project and complete it using the CRISP-DM approach. Explain what should be done in every step. [1+5]
OR
What is data driven decision making and how does data science assist data driven decision making? Explain OSEMN lifecycle for your data science project. [3 + 3]
- U1 · 2082
You are a data scientist. Take a data science project with a title and complete it using the TDSP approach. Explain in detail what should be done in every step.
OR
Explain CRISP-DM with its steps and compare and contrast it with OSEMN framework.
- U2 · 2082
What is data munging? Explain the different issues in real-world data, along with the steps needed for handling those issues, in detail.
- U3 · 2082
What is feature selection? Explain filters and wrappers method for feature selection. [1 + 5]
- U3 · 2082
Explain linear regression, logistic regression and decision trees for fitting model in detail.
- U4 · 2082
The following table represents a dataset of 10 objects with attributes Color, Type, Origin and the "class", whether the customer who bought was satisfied or not.
| S. No | Color | Type | Origin | Satisfied? |
|---|---|---|---|---|
| 1 | Red | Casual | Domestic | Yes |
| 2 | Red | Casual | Domestic | No |
| 3 | Red | Casual | Domestic | Yes |
| 4 | Yellow | Casual | Domestic | No |
| 5 | Yellow | Casual | Imported | Yes |
| 6 | Yellow | Casual | Imported | Yes |
| 7 | Yellow | Formal | Imported | No |
| 8 | Yellow | Formal | Imported | Yes |
| 9 | Yellow | Formal | Domestic | No |
| 10 | Red | Formal | Imported | No |
| 11 | Red | Casual | Imported | Yes |
Use ID3 algorithm to find the attribute with maximum information gain.
OR
What is multi-layer neural network? How is learning done in neural networks? Explain backpropagation algorithm. [1 + 1 + 4]
- U4 · 2082
Explain node and weights in neural networks. Consider following Neural Network and compute its output considering sigmoid as activation function for all layers. Weights of synaptic links are provided above each link.
Input: , . Feedforward net: input nodes feed hidden nodes 1 and 2, which feed nodes 3 and 4, which feed output node 5. Weights: node 1 = 0.8, node 2 = 0.4, node 1 = 1, node 2 = 0.6, node 1 → node 3 = 1.2, node 1 → node 4 = 0.4, node 2 → node 3 = 0.7, node 2 → node 4 = 0.5, node 3 → node 5 = 1.5, node 4 → node 5 = 0.5.
OR
What is multi-layer neural network? How is learning done in neural networks? Explain backpropagation algorithm.
- U6 · 2082
What is cognitive bias? Explain any two cognitive biases. Explain techniques for addressing bias.
- U1 · 2081
Define and explain the TDSP lifecycle in data science.
OR
A leading retail chain in Nepal wants to use data science to enhance its customer experience and optimize inventory management. They have data from customer transactions, online browsing behavior, and social media interactions.
Briefly explain how data science can be applied in the retail industry to improve customer experience and optimize inventory management. Provide specific examples of data science techniques that could be used in this context.
- U1 · 2081
Elaborate on TDSP (Team Data Science Process) as a framework for the data science lifecycle.
OR
Discuss CRISP-DM (Cross-Industry Standard Process for Data Mining) as an agile approach to the data science lifecycle.
- U1 · 2081
What is Data Science? Explain CRISP-DM approach for Data Science. [1+5]
OR
What is Data Science Lifecycle? Explain TDSP approach for Data Science.
- U2 · 2081
Describe the common data quality issues with tabular data and their mitigation techniques with appropriate examples.
Study smart and sit a probable paper
How far a few high-priority topics take you, and a full mock paper built from the most likely questions, mirroring the real exam structure.
Study smart, not hard
Study the units in priority order. Each bar shows the share of total marks you'd have covered by then. The top 5 units alone cover ~80% of marks.
Most Probable Paper
Mirrors the real structure · 45 marks · based on 5 past papers
- 1.[3 marks]
List and explain in short three main limitations of Data Science.
This question has recurred in 2 of 5 years; including the board exam 1× (2079); and its topic (Introduction to Data Science) appears in 100% of years.
- 2.[3 marks]
What do you mean by backpropagation? Conceptually, how does it differ from forward propagation in a neural network?
Asked once (2080); including the board exam 1× (2080); and its topic (Machine Learning) appears in 100% of years.
- 3.[3 marks]
List the differences between supervised and unsupervised machine learning methods including examples.
Asked once (2079); including the board exam 1× (2079); and its topic (Machine Learning) appears in 100% of years.
- 4.[3 marks]
Define and briefly describe neural networks. List a few example use-cases where neural networks can be used.
Asked once (2079); including the board exam 1× (2079); and its topic (Machine Learning) appears in 100% of years.
- 5.[3 marks]
What are decision trees? What does impurity of a node mean in context of decision tree?
Asked once (2079); including the board exam 1× (2079); and its topic (Machine Learning) appears in 100% of years.
- 1.[6 marks]
Apply map-reduce to the following set of data:
Data, Science, Engineering
Engineering, Data, Analytics
Analytics, Intelligence, Science
OR
What is Hadoop? Explain the different components of Hadoop.
This question has recurred in 2 of 5 years; including the board exam 1× (2080); and its topic (Introduction to Big Data) appears in 100% of years.
- 2.[6 marks]
Explain node and weights in neural networks. Consider following Neural Network and compute its output considering sigmoid as activation function for all layers. Weights of synaptic links are provided above each link.
Input: , . Feedforward net: input nodes feed hidden nodes 1 and 2, which feed nodes 3 and 4, which feed output node 5. Weights: node 1 = 0.8, node 2 = 0.4, node 1 = 1, node 2 = 0.6, node 1 → node 3 = 1.2, node 1 → node 4 = 0.4, node 2 → node 3 = 0.7, node 2 → node 4 = 0.5, node 3 → node 5 = 1.5, node 4 → node 5 = 0.5.
OR
What is multi-layer neural network? How is learning done in neural networks? Explain backpropagation algorithm.
Asked once (2082); including the board exam 1× (2082); and its topic (Machine Learning) appears in 100% of years.
- 3.[6 marks]
How does machine learning differ from traditional learning? Explain the various types of machine learning techniques.
Asked once (2081); including the board exam 1× (2081); and its topic (Machine Learning) appears in 100% of years.
- 4.[6 marks]
Elaborate the concept behind Naïve Bayes algorithm for classification task.
Asked once (2080); including the board exam 1× (2080); and its topic (Machine Learning) appears in 100% of years.
- 5.[6 marks]
What is data munging? Explain the different issues in real-world data, along with the steps needed for handling those issues, in detail.
Asked once (2082); including the board exam 1× (2082); and its topic (Data Munging) appears in 100% of years.