BE Computer Engineering (Pokhara University) Machine Learning (PU, CMP 364) Question Paper 2078
This is the official BE Computer Engineering (Pokhara University) Machine Learning (PU, CMP 364) question paper for 2078, as set in the regular annual examination. It carries 100 full marks and a time allowance of 180 minutes, across 12 questions.
Section A: Long Answer Questions
Attempt all / any as specified.
(a) Define machine learning and distinguish clearly between supervised and unsupervised learning, giving one real-world application of each. (5 marks)
(b) Consider a simple linear regression model $h_\theta(x) = \theta_0 + \theta_1 x$. Derive the least squares cost function and obtain the gradient descent update rules for $\theta_0$ and $\theta_1$. (7 marks)
(c) Explain the effect of choosing a learning rate that is too large versus too small on the convergence of gradient descent. (3 marks)
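The update rules asked for in (b) and the learning-rate behaviour in (c) can be illustrated with a minimal sketch. It assumes the model is written as theta0 + theta1 * x with cost J = (1/2m) * sum of squared errors; these symbols, the toy data, and the three learning rates are illustrative assumptions, not values from the paper.

```python
def cost(theta0, theta1, xs, ys):
    """Least squares cost J = (1 / 2m) * sum((theta0 + theta1 * x - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, lr, steps):
    """Batch gradient descent with simultaneous updates of both parameters."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                                 # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m      # dJ/dtheta1
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1


# Hypothetical toy data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

good = gradient_descent(xs, ys, lr=0.05, steps=2000)   # converges
tiny = gradient_descent(xs, ys, lr=1e-5, steps=2000)   # barely moves
huge = gradient_descent(xs, ys, lr=1.0, steps=20)      # overshoots and diverges

for name, params in [("good", good), ("tiny", tiny), ("huge", huge)]:
    print(name, params, cost(*params, xs, ys))
```

On this data the moderate rate recovers parameters close to (1, 2), the very small rate leaves the cost almost unchanged after the same number of steps, and the large rate makes the cost grow at every step.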
(a) A decision tree learning algorithm uses an attribute-selection measure to decide which feature to split on. Define entropy and information gain, and explain how the ID3 algorithm uses information gain to construct a tree. (6 marks)
(b) Given the following training dataset, compute the information gain of the attribute Weather with respect to the target Play and state which attribute you would select as the root. (6 marks)
| Weather | Temperature | Play |
|---|---|---|
| Sunny | Hot | No |
| Sunny | Mild | No |
| Overcast | Hot | Yes |
| Rainy | Mild | Yes |
| Rainy | Cool | Yes |
| Overcast | Cool | Yes |
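The entropy and information-gain calculation for this table can be checked with a short sketch. The function names are illustrative; the rows are copied from the table above.

```python
from collections import Counter
from math import log2

# (Weather, Temperature, Play) rows copied from the table.
rows = [
    ("Sunny", "Hot", "No"),
    ("Sunny", "Mild", "No"),
    ("Overcast", "Hot", "Yes"),
    ("Rainy", "Mild", "Yes"),
    ("Rainy", "Cool", "Yes"),
    ("Overcast", "Cool", "Yes"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, target_index=2):
    """Entropy of the target minus the weighted entropy after splitting."""
    labels = [r[target_index] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[target_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


gain_weather = information_gain(rows, 0)
gain_temperature = information_gain(rows, 1)
print(round(gain_weather, 3), round(gain_temperature, 3))
```

Every Weather value yields a pure subset here, so its gain equals the full entropy of Play (about 0.918 bits), against about 0.252 bits for Temperature. Under ID3 this makes Weather the root.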
(c) Decision trees are prone to overfitting. Explain two pruning strategies used to control tree complexity. (3 marks)
(a) Draw the architecture of a multilayer feedforward neural network with one hidden layer and explain the role of weights, bias, and activation functions. (5 marks)
(b) Derive the backpropagation weight-update equation for a single output-layer weight using the sigmoid activation function and the squared-error loss. (7 marks)
(c) Why are non-linear activation functions essential in a neural network? What happens if all activations are linear? (3 marks)
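The gradient that the derivation in (b) should arrive at can be sanity-checked numerically. This is a sketch under assumed notation: a single output unit o = sigmoid(w * h + b), hidden activation h, target t, and loss E = 0.5 * (t - o)^2. All names and numeric values are hypothetical.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def loss(w, h, b, t):
    """Squared-error loss E = 0.5 * (t - o)^2 for one output unit."""
    o = sigmoid(w * h + b)
    return 0.5 * (t - o) ** 2


def grad_w(w, h, b, t):
    """Chain rule: dE/dw = (o - t) * o * (1 - o) * h."""
    o = sigmoid(w * h + b)
    return (o - t) * o * (1.0 - o) * h


# Hypothetical values for one training example.
w, h, b, t, lr = 0.4, 0.7, 0.1, 1.0, 0.5

analytic = grad_w(w, h, b, t)
eps = 1e-6
numeric = (loss(w + eps, h, b, t) - loss(w - eps, h, b, t)) / (2 * eps)

w_new = w - lr * analytic  # gradient descent update for the weight
print(analytic, numeric, w_new)
```

The analytic and finite-difference gradients agree closely, and because the output is below the target the update increases the weight.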
(a) Describe the k-means clustering algorithm step by step, clearly stating its objective (distortion) function and its termination condition. (6 marks)
(b) Given the one-dimensional data points {2, 4, 10, 12, 3, 20, 30, 11, 25} and with initial centroids and , perform one full iteration of k-means and report the updated centroids. (6 marks)
(c) Explain how the elbow method helps in choosing an appropriate value of k. (3 marks)
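One assignment-and-update iteration of k-means on the data points from (b) can be sketched as follows. The number of clusters and the initial centroids are not legible in this copy of the paper, so the sketch assumes k = 2 with hypothetical initial centroids 4 and 12; substitute the values actually set in the question.

```python
def kmeans_iteration(points, centroids):
    """Run one k-means iteration on 1-D data: assign, then recompute means."""
    clusters = [[] for _ in centroids]
    for p in points:
        # Assignment step: index of the nearest centroid.
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Update step: each centroid moves to the mean of its cluster
    # (an empty cluster keeps its previous centroid).
    new_centroids = [
        sum(c) / len(c) if c else centroids[i]
        for i, c in enumerate(clusters)
    ]
    return clusters, new_centroids


def distortion(clusters, centroids):
    """Objective: sum of squared distances from points to their centroid."""
    return sum((p - m) ** 2 for c, m in zip(clusters, centroids) for p in c)


points = [2, 4, 10, 12, 3, 20, 30, 11, 25]
initial = [4.0, 12.0]  # hypothetical, not taken from the paper

clusters, updated = kmeans_iteration(points, initial)
print(clusters, updated, distortion(clusters, updated))
```

With these assumed starting centroids the clusters are {2, 4, 3} and {10, 12, 20, 30, 11, 25}, and the updated centroids are 3.0 and 18.0. Repeating the call until the centroids stop changing gives the usual termination condition.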
Section B: Short Answer Questions
Attempt all / any as specified.
Explain the Naive Bayes classifier. State the conditional-independence assumption it makes and discuss one situation where this assumption may fail yet the classifier still performs well.
Differentiate between overfitting and underfitting in terms of the bias–variance trade-off. Explain how L1 (Lasso) and L2 (Ridge) regularization help reduce overfitting, and state how their effects on model parameters differ.
For a binary classifier evaluated on a test set, the confusion matrix is given below.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 40 | 10 |
| Actual Negative | 5 | 45 |
Compute the accuracy, precision, recall, and F1-score. Explain why accuracy alone can be misleading on an imbalanced dataset.
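The four metrics follow directly from the confusion matrix counts. A minimal sketch, with variable names chosen for illustration:

```python
# Counts read from the confusion matrix above.
tp, fn = 40, 10   # actual positives: predicted positive / predicted negative
fp, tn = 5, 45    # actual negatives: predicted positive / predicted negative

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

This gives accuracy 0.85, precision about 0.889, recall 0.80, and F1 about 0.842. On an imbalanced test set, a classifier that always predicts the majority class can score a high accuracy while its recall for the minority class is zero, which is why the other three metrics are reported.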
Explain the k-Nearest Neighbours (k-NN) algorithm. Discuss how the choice of k affects the bias and variance of the model, and why k-NN is called a lazy learner.
Explain logistic regression for binary classification. (a) Write the sigmoid hypothesis function and explain how a decision boundary is obtained. (b) Why is the squared-error cost function not used for logistic regression?
Describe k-fold cross-validation and explain how it gives a more reliable estimate of model performance than a single train/test split. What is the role of a separate validation set versus the test set?
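The way k-fold cross-validation partitions the data can be sketched without any library. The generator below is a hypothetical helper, not a library API. It assumes the samples are already shuffled and yields, for each fold, the training indices and the held-out validation indices.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample is held out exactly once, so averaging the k validation scores uses every sample for evaluation, which is what makes the estimate less sensitive to one particular split. A final test set stays outside this loop entirely.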
Compare partitional (k-means) clustering with hierarchical (agglomerative) clustering. Explain what a dendrogram represents and how it is used to decide the number of clusters.
Write short notes on any two of the following: (a) Support Vector Machine and the concept of margin; (b) The vanishing gradient problem in deep neural networks; (c) Feature scaling and its importance in gradient-based learning.