
Section A: Long Answer Questions

Attempt all / any as specified.

4 questions
Question 1 (long, 15 marks)

(a) Define machine learning and distinguish clearly between supervised and unsupervised learning, giving one real-world application of each. (5 marks)

(b) Consider a simple linear regression model $h_\theta(x) = \theta_0 + \theta_1 x$. Derive the least squares cost function $J(\theta_0, \theta_1)$ and obtain the gradient descent update rules for $\theta_0$ and $\theta_1$. (7 marks)
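For checking a hand derivation of part (b), the following is a minimal numerical sketch. It assumes the common convention $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_i (h_\theta(x_i) - y_i)^2$; the function name, the toy data, and the learning rate `alpha` are illustrative assumptions and are not part of the paper.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent.

    Assumes the cost J = (1 / (2m)) * sum((h(x_i) - y_i) ** 2),
    so the partial derivatives are the mean error and the mean of error * x.
    """
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                               # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m    # dJ/dtheta1
        # Update both parameters simultaneously from the same error vector.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

On this toy data the parameters should approach $\theta_0 = 1$ and $\theta_1 = 2$; varying `alpha` is one way to observe the behaviour asked about in part (c).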

(c) Explain the effect of choosing a learning rate that is too large versus too small on the convergence of gradient descent. (3 marks)

supervised-learning, regression

Question 2 (long, 15 marks)

(a) A decision tree learning algorithm uses an attribute-selection measure to decide which feature to split on. Define entropy and information gain, and explain how the ID3 algorithm uses information gain to construct a tree. (6 marks)

(b) Given the following training dataset, compute the information gain of the attribute Weather with respect to the target Play and state which attribute you would select as the root. (6 marks)

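| Weather  | Temperature | Play |
|----------|-------------|------|
| Sunny    | Hot         | No   |
| Sunny    | Mild        | No   |
| Overcast | Hot         | Yes  |
| Rainy    | Mild        | Yes  |
| Rainy    | Cool        | Yes  |
| Overcast | Cool        | Yes  |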
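A hand computation for part (b) can be cross-checked with a short sketch like the one below. It assumes entropy is measured in bits (base-2 logarithm) and encodes the six rows of the table as dictionaries; the helper names `entropy` and `information_gain` are illustrative, not prescribed by the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="Play"):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# The six training rows from the question.
rows = [
    {"Weather": "Sunny", "Temperature": "Hot", "Play": "No"},
    {"Weather": "Sunny", "Temperature": "Mild", "Play": "No"},
    {"Weather": "Overcast", "Temperature": "Hot", "Play": "Yes"},
    {"Weather": "Rainy", "Temperature": "Mild", "Play": "Yes"},
    {"Weather": "Rainy", "Temperature": "Cool", "Play": "Yes"},
    {"Weather": "Overcast", "Temperature": "Cool", "Play": "Yes"},
]

gain_weather = information_gain(rows, "Weather")          # about 0.918 bits
gain_temperature = information_gain(rows, "Temperature")  # about 0.252 bits
print(gain_weather, gain_temperature)
```

Every Weather value yields a pure subset here, so its gain equals the full entropy of Play, which is why it exceeds the gain of Temperature.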

(c) Decision trees are prone to overfitting. Explain two pruning strategies used to control tree complexity. (3 marks)

decision-trees, classification, model-evaluation

Question 3 (long, 15 marks)

(a) Draw the architecture of a multilayer feedforward neural network with one hidden layer and explain the role of weights, bias, and activation functions. (5 marks)

(b) Derive the backpropagation weight-update equation for a single output-layer weight using the sigmoid activation function and the squared-error loss. (7 marks)

(c) Why are non-linear activation functions essential in a neural network? What happens if all activations are linear? (3 marks)

neural-networks, overfitting-regularization

Question 4 (long, 15 marks)

(a) Describe the k-means clustering algorithm step by step, clearly stating its objective (distortion) function and its termination condition. (6 marks)

(b) Given the one-dimensional data points {2, 4, 10, 12, 3, 20, 30, 11, 25} and $k = 2$ with initial centroids $c_1 = 4$ and $c_2 = 12$, perform one full iteration of k-means and report the updated centroids. (6 marks)
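One iteration as described in part (b), an assignment step followed by a centroid update, can be sketched as follows. The sketch assumes absolute distance in one dimension and initial centroids 4 and 12; the function name `kmeans_step` is illustrative.

```python
def kmeans_step(points, centroids):
    """Run one k-means iteration on 1-D data: assign points, then recompute means."""
    clusters = [[] for _ in centroids]
    for p in points:
        # Assignment step: index of the nearest centroid by absolute distance.
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Update step: each centroid becomes the mean of its cluster
    # (an empty cluster keeps its previous centroid).
    new_centroids = [
        sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
    ]
    return clusters, new_centroids


points = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, new_centroids = kmeans_step(points, [4, 12])
print(clusters)       # [[2, 4, 3], [10, 12, 20, 30, 11, 25]]
print(new_centroids)  # [3.0, 18.0]
```

Calling `kmeans_step` again with the returned centroids would carry out further iterations until the assignments stop changing.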

(c) Explain how the elbow method helps in choosing an appropriate value of $k$. (3 marks)

clustering, model-evaluation

Section B: Short Answer Questions

Attempt all / any as specified.

8 questions
Question 5 (short, 7 marks)

Explain the Naive Bayes classifier. State the conditional-independence assumption it makes and discuss one situation where this assumption may fail yet the classifier still performs well.

classification
Question 6 (short, 7 marks)

Differentiate between overfitting and underfitting in terms of the bias–variance trade-off. Explain how L1 (Lasso) and L2 (Ridge) regularization help reduce overfitting, and state how their effects on model parameters differ.

overfitting-regularization, regression

Question 7 (short, 7 marks)

For a binary classifier evaluated on a test set, the confusion matrix is given below.

|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | 40                 | 10                 |
| Actual Negative | 5                  | 45                 |

Compute the accuracy, precision, recall, and F1-score. Explain why accuracy alone can be misleading on an imbalanced dataset.
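The four metrics follow directly from the cell counts. The sketch below assumes the matrix reads TP = 40, FN = 10, FP = 5, TN = 45 (100 test examples in total); the function name is illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Cell values as assumed above.
accuracy, precision, recall, f1 = classification_metrics(tp=40, fn=10, fp=5, tn=45)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.850 precision=0.889 recall=0.800 f1=0.842
```

Changing the counts to a heavily skewed class ratio is a quick way to see accuracy stay high while recall collapses.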

model-evaluation, classification

Question 8 (short, 6 marks)

Explain the k-Nearest Neighbours (k-NN) algorithm. Discuss how the choice of $k$ affects the bias and variance of the model, and why k-NN is called a lazy learner.

classification, supervised-learning

Question 9 (short, 6 marks)

Explain logistic regression for binary classification. (a) Write the sigmoid hypothesis function and explain how a decision boundary is obtained. (b) Why is the squared-error cost function not used for logistic regression?

regression, classification

Question 10 (short, 6 marks)

Describe k-fold cross-validation and explain how it gives a more reliable estimate of model performance than a single train/test split. What is the role of a separate validation set versus the test set?

model-evaluation, overfitting-regularization

Question 11 (short, 6 marks)

Compare partitional (k-means) clustering with hierarchical (agglomerative) clustering. Explain what a dendrogram represents and how it is used to decide the number of clusters.

clustering
Question 12 (short, 5 marks)

Write short notes on any two of the following: (a) Support Vector Machine and the concept of margin; (b) The vanishing gradient problem in deep neural networks; (c) Feature scaling and its importance in gradient-based learning.

supervised-learning, neural-networks