BE Computer Engineering (Pokhara University) Machine Learning (PU, CMP 364) Question Paper 2079
This is the official BE Computer Engineering (Pokhara University) Machine Learning (PU, CMP 364) question paper for 2079, as set in the regular annual examination. It carries 100 full marks and a time allowance of 180 minutes, across 12 questions.
Section A: Long Answer Questions
Attempt all / any as specified.
(a) Define supervised learning and clearly distinguish it from unsupervised learning with one suitable example of each. (4)
(b) Consider a simple linear regression model . Derive the closed-form (normal equation) solution for the parameters and by minimizing the mean squared error cost function . (7)
(c) The following data is given for hours studied () and marks obtained (): . Fit the best-fit line using the result derived in part (b) and predict the marks for a student who studies for 6 hours. (3)
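Study aid (not part of the paper): the closed-form fit asked for in parts (b) and (c) can be checked numerically with a short sketch. The names `theta0` and `theta1`, the helper `fit_line`, and the sample data are assumptions made for illustration; the data is hypothetical and is not the data set in the question.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = theta0 + theta1 * x via the normal equations."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1

# Hypothetical data (NOT the values from the question paper).
hours = [1, 2, 3, 4, 5]
marks = [52, 57, 61, 68, 72]

theta0, theta1 = fit_line(hours, marks)
predicted = theta0 + theta1 * 6  # prediction for 6 hours of study
print(theta0, theta1, predicted)
```

For this illustrative data the slope is 5.1 and the intercept 46.7, so the predicted mark at 6 hours is about 77.3.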
(a) Explain the working principle of a decision tree classifier. Define entropy and information gain, and explain how they are used to select the splitting attribute at each node. (6)
(b) A dataset of 14 training examples for the target PlayTennis (Yes/No) contains 9 Yes and 5 No instances. The attribute Outlook has three values: Sunny (2 Yes, 3 No), Overcast (4 Yes, 0 No), and Rain (3 Yes, 2 No). Compute the entropy of the whole dataset and the information gain of the attribute Outlook. (6)
(c) State two advantages and two limitations of decision trees compared to other classifiers. (2)
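Study aid (not part of the paper): the entropy and information gain in part (b) can be verified with a few lines of code. This is a minimal sketch using the class counts given in the question; the helper names `entropy` and `information_gain` are illustrative.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class-count list such as [yes, no]."""
    total = sum(counts)
    # Zero counts contribute nothing (0 * log 0 is taken as 0).
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child partitions."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = [9, 5]                      # 9 Yes, 5 No
outlook = [[2, 3], [4, 0], [3, 2]]   # Sunny, Overcast, Rain as [Yes, No]

h = entropy(parent)
gain = information_gain(parent, outlook)
print(round(h, 3), round(gain, 3))
```

With these counts the dataset entropy is about 0.940 bits and the information gain of Outlook is about 0.247 bits.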
(a) Draw the architecture of a multilayer feedforward neural network with one hidden layer and explain the role of activation functions. Why is a non-linear activation function (such as sigmoid or ReLU) necessary? (6)
(b) Describe the backpropagation algorithm. Derive the weight-update rule for a single output-layer weight using gradient descent on the squared error loss. (5)
(c) Define overfitting in the context of neural networks and explain how L2 regularization (weight decay) and dropout help to reduce it. (3)
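Study aid (not part of the paper): the output-layer update in part (b) can be sketched for a single sigmoid output unit trained on the squared error E = 0.5 * (t - o)^2. The choice of a sigmoid output, the symbols, and all numeric values below are assumptions for illustration only.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def loss(weights, hidden, target):
    """Squared error E = 0.5 * (target - output)^2 for one sigmoid output unit."""
    net = sum(w * h for w, h in zip(weights, hidden))
    return 0.5 * (target - sigmoid(net)) ** 2

def gradient(weights, hidden, target):
    """Analytic dE/dw_j = -(t - o) * o * (1 - o) * h_j, from the chain rule."""
    net = sum(w * h for w, h in zip(weights, hidden))
    out = sigmoid(net)
    delta = (target - out) * out * (1.0 - out)
    return [-delta * h for h in hidden]

# Hypothetical values chosen only for illustration.
hidden = [0.5, 0.9]     # hidden-layer activations feeding the output unit
weights = [0.4, -0.3]   # output-layer weights
target = 1.0
eta = 0.5               # learning rate

grad = gradient(weights, hidden, target)
# Gradient-descent update: w_j <- w_j - eta * dE/dw_j
updated = [w - eta * g for w, g in zip(weights, grad)]
print(loss(weights, hidden, target), loss(updated, hidden, target))
```

One update step lowers the loss on this example, which is the behaviour the derived rule is meant to produce for a suitably small learning rate.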
(a) Explain the k-means clustering algorithm step by step. State the objective function it tries to minimize. (6)
(b) Given the one-dimensional data points and with initial cluster centroids and , perform two iterations of k-means and report the final clusters and centroids. (6)
(c) Discuss two limitations of k-means and briefly explain how the elbow method helps in choosing the value of k. (2)
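Study aid (not part of the paper): the assignment and update steps of k-means can be traced on one-dimensional data with a short sketch. The points and starting centroids below are hypothetical and are not the values set in part (b); `kmeans_1d` is an illustrative helper, and keeping the old centroid for an empty cluster is one common convention rather than the only option.

```python
def kmeans_1d(points, centroids, iterations):
    """Run a fixed number of k-means (Lloyd) iterations on 1-D data."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical data and starting centroids (NOT the values from the question paper).
points = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, centroids = kmeans_1d(points, centroids=[2, 12], iterations=2)
print(clusters, centroids)
```

On this data the point 10 moves from the second cluster to the first in the second iteration, which shows why more than one pass is needed before the assignments settle.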
Section B: Short Answer Questions
Attempt all / any as specified.
A binary classifier produces the following confusion matrix on a test set: True Positives = 40, False Positives = 10, False Negatives = 20, True Negatives = 30. Compute the accuracy, precision, recall, and F1-score of the classifier. Briefly comment on what the precision and recall values indicate about its performance.
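Study aid (not part of the paper): the four metrics follow directly from the confusion-matrix counts given in the question, as this minimal sketch shows.

```python
tp, fp, fn, tn = 40, 10, 20, 30  # counts from the question

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of predicted positives, how many are correct
recall = tp / (tp + fn)      # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, round(recall, 4), round(f1, 4))
```

The counts give accuracy 0.70, precision 0.80, recall of about 0.667, and an F1-score of about 0.727, so the classifier's positive predictions are fairly reliable but it misses a third of the actual positives.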
Differentiate between regression and classification problems with one example each. Also explain why Mean Squared Error (MSE) is an appropriate loss function for regression but not directly suitable for a classification task.
Explain the bias-variance tradeoff with the help of a diagram. Relate the concepts of high bias and high variance to underfitting and overfitting respectively.
Explain the working of the k-Nearest Neighbours (k-NN) algorithm. How does the choice of k affect the bias and variance of the model? What is the effect of feature scaling on k-NN?
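Study aid (not part of the paper): the effect of feature scaling on k-NN can be seen in a small sketch. The data, labels, and helper names are hypothetical; min-max scaling is used here as one possible scaling method and assumes every feature has a non-zero range in the training set.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def min_max_scaler(rows):
    """Return a function that rescales each feature to [0, 1] using the training range."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    def scale(row):
        return [(v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)]
    return scale

# Hypothetical data: feature 1 has a much larger numeric range than feature 2.
train_x = [[1000, 1], [1100, 9]]
train_y = ["A", "B"]
query = [1060, 2]

raw_label = knn_predict(train_x, train_y, query, k=1)

scale = min_max_scaler(train_x)
scaled_label = knn_predict([scale(r) for r in train_x], train_y, scale(query), k=1)
print(raw_label, scaled_label)
```

Without scaling the large-range feature dominates the distance and the query is labelled "B"; after scaling both features contribute comparably and the label becomes "A".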
What is k-fold cross-validation? Explain the procedure with a diagram for and state why it gives a more reliable estimate of model performance than a single train-test split.
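Study aid (not part of the paper): the splitting procedure behind k-fold cross-validation can be sketched as an index generator. The helper `k_fold_indices`, the sample count, and the value k = 5 are assumptions for illustration; the folds here are contiguous and unshuffled, whereas in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Hypothetical setting: 10 samples, 5 folds.
folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample is used for testing exactly once and for training in the remaining folds, and the k scores are then averaged.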
Explain logistic regression for binary classification. Write the sigmoid (logistic) function, state how the decision boundary is obtained, and explain why the cross-entropy loss is preferred over squared error for training it.
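Study aid (not part of the paper): the sigmoid, the decision boundary, and the cross-entropy loss can be illustrated with a minimal sketch. The weights, bias, and inputs are hypothetical, and the helper names are illustrative.

```python
from math import exp, log

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def predict_proba(w, b, x):
    """P(y = 1 | x) for a logistic regression model with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def cross_entropy(y, p):
    """Binary cross-entropy for a true label y in {0, 1} and predicted probability p."""
    return -(y * log(p) + (1 - y) * log(1 - p))

# Hypothetical parameters chosen only for illustration.
w, b = [2.0, -1.0], 0.5

# A point with w.x + b = 0 lies on the decision boundary, so p = 0.5.
p_boundary = predict_proba(w, b, [0.0, 0.5])

# A confident wrong prediction is penalised far more heavily by
# cross-entropy than by squared error, which is bounded by 1.
ce_wrong = cross_entropy(1, 0.01)
se_wrong = (1 - 0.01) ** 2
print(p_boundary, ce_wrong, se_wrong)
```

The boundary is the set of points where the linear score is zero, and the unbounded penalty on confident mistakes is one reason cross-entropy gives stronger training signals than squared error here.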
Compare hierarchical clustering with partitional (k-means) clustering. Briefly explain agglomerative clustering and the role of a dendrogram in deciding the number of clusters.
Write short notes on any two of the following:
(a) Gradient descent and the role of the learning rate
(b) Curse of dimensionality
(c) Support Vector Machines and the concept of margin
(d) Feature engineering
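Study aid (not part of the paper): for option (a), the role of the learning rate can be seen by minimising a simple quadratic. The function f(w) = (w - 3)^2, the step counts, and the two learning rates are assumptions chosen only for illustration.

```python
def gradient_descent(lr, steps, w=0.0):
    """Minimise f(w) = (w - 3)^2 with fixed-step gradient descent."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # derivative of (w - 3)^2
        w -= lr * grad
    return w

w_small = gradient_descent(lr=0.1, steps=50)  # error shrinks by a factor 0.8 per step
w_large = gradient_descent(lr=1.1, steps=50)  # error grows by a factor 1.2 per step
print(w_small, w_large)
```

With the smaller rate the iterate settles near the minimiser w = 3, while the larger rate overshoots on every step and diverges.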