
Section A: Long Answer Questions

Attempt all / any as specified.

4 questions
Question 1 (long, 14 marks)

(a) Define supervised learning and clearly distinguish it from unsupervised learning with one suitable example of each. (4)

(b) Consider a simple linear regression model $h_\theta(x) = \theta_0 + \theta_1 x$. Derive the closed-form (normal equation) solution for the parameters $\theta_0$ and $\theta_1$ by minimizing the mean squared error cost function $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$. (7)

(c) The following data is given for hours studied ($x$) and marks obtained ($y$): $(1,2), (2,4), (3,5), (4,4), (5,6)$. Fit the best-fit line using the result derived in part (b) and predict the marks for a student who studies for 6 hours. (3)

supervised-learning, regression
Question 2 (long, 14 marks)
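A hand-worked answer to Question 1(c) can be checked numerically. The sketch below assumes the standard closed-form result for simple linear regression (slope equals the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the means). The function name `fit_line` is illustrative and not part of the paper.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = theta0 + theta1 * x."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = s_xy / s_xx
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1


# Data from part (c): hours studied and marks obtained
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 6]

theta0, theta1 = fit_line(hours, marks)
prediction = theta0 + theta1 * 6  # predicted marks for 6 hours of study

print(round(theta0, 3), round(theta1, 3), round(prediction, 3))
```

With this data the fit comes out at roughly $\theta_0 \approx 1.8$ and $\theta_1 \approx 0.8$, giving a prediction of about 6.6 marks for 6 hours.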

(a) Explain the working principle of a decision tree classifier. Define entropy and information gain, and explain how they are used to select the splitting attribute at each node. (6)

(b) A dataset of 14 training examples for the target PlayTennis (Yes/No) contains 9 Yes and 5 No instances. The attribute Outlook has three values: Sunny (2 Yes, 3 No), Overcast (4 Yes, 0 No), and Rain (3 Yes, 2 No). Compute the entropy of the whole dataset and the information gain of the attribute Outlook. (6)

(c) State two advantages and two limitations of decision trees compared to other classifiers. (2)

decision-trees, classification
Question 3 (long, 14 marks)
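The arithmetic in Question 2(b) can be verified with a short script. This is a minimal sketch assuming base-2 entropy and the usual definition of information gain as the parent entropy minus the size-weighted entropy of the child subsets. The helper names `entropy` and `information_gain` are illustrative.

```python
from math import log2


def entropy(counts):
    """Base-2 entropy of a class distribution given as raw counts."""
    total = sum(counts)
    # Zero counts are skipped because 0 * log2(0) is taken as 0
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, child_counts):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted


# Counts are (Yes, No), taken from part (b)
dataset = (9, 5)
outlook = [(2, 3), (4, 0), (3, 2)]  # Sunny, Overcast, Rain

dataset_entropy = entropy(dataset)
outlook_gain = information_gain(dataset, outlook)

print(round(dataset_entropy, 3), round(outlook_gain, 3))
```

For these counts the dataset entropy is about 0.940 bits and the information gain of Outlook is about 0.247 bits.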

(a) Draw the architecture of a multilayer feedforward neural network with one hidden layer and explain the role of activation functions. Why is a non-linear activation function (such as sigmoid or ReLU) necessary? (6)

(b) Describe the backpropagation algorithm. Derive the weight-update rule for a single output-layer weight using gradient descent on the squared error loss. (5)

(c) Define overfitting in the context of neural networks and explain how L2 regularization (weight decay) and dropout help to reduce it. (3)

neural-networks, overfitting-regularization
Question 4 (long, 14 marks)

(a) Explain the k-means clustering algorithm step by step. State the objective function it tries to minimize. (6)

(b) Given the one-dimensional data points $\{2, 4, 10, 12, 3, 20, 30, 11, 25\}$ and $k=2$ with initial cluster centroids $\mu_1 = 2$ and $\mu_2 = 30$, perform two iterations of k-means and report the final clusters and centroids. (6)

(c) Discuss two limitations of k-means and briefly explain how the elbow method helps in choosing the value of $k$. (2)

clustering, unsupervised-learning
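The two iterations asked for in Question 4(b) can be traced with a small script. The sketch below assumes absolute distance in one dimension, assignment to the nearest centroid (ties go to the lower-indexed centroid), and a mean update. The function name `kmeans_1d` is illustrative.

```python
def kmeans_1d(points, centroids, iterations):
    """Run a fixed number of k-means iterations on one-dimensional data."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return clusters, centroids


# Data and initial centroids from part (b)
points = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, centroids = kmeans_1d(points, centroids=[2, 30], iterations=2)

print(clusters)
print(centroids)
```

On this data the assignments do not change between the first and second iteration: the clusters are {2, 3, 4, 10, 11, 12} and {20, 25, 30}, with centroids 7 and 25.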

Section B: Short Answer Questions

Attempt all / any as specified.

8 questions
Question 5 (short, 6 marks)

A binary classifier produces the following confusion matrix on a test set: True Positives = 40, False Positives = 10, False Negatives = 20, True Negatives = 30. Compute the accuracy, precision, recall, and F1-score of the classifier. Briefly comment on what the precision and recall values indicate about its performance.

classification, model-evaluation
Question 6 (short, 6 marks)
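The four metrics in Question 5 follow directly from the confusion-matrix counts. The sketch below uses the standard definitions; the variable names are illustrative.

```python
# Confusion-matrix counts from Question 5
tp, fp, fn, tn = 40, 10, 20, 30

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)  # fraction of predicted positives that are correct
recall = tp / (tp + fn)     # fraction of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
```

These counts give an accuracy of 0.70, precision of 0.80, recall of about 0.667, and an F1-score of about 0.727. Precision is higher than recall, so the classifier is more reliable when it predicts the positive class than it is at finding every positive instance.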

Differentiate between regression and classification problems with one example each. Also explain why Mean Squared Error (MSE) is an appropriate loss function for regression but not directly suitable for a classification task.

regression, model-evaluation
Question 7 (short, 6 marks)

Explain the bias-variance tradeoff with the help of a diagram. Relate the concepts of high bias and high variance to underfitting and overfitting respectively.

overfitting-regularization, model-evaluation
Question 8 (short, 6 marks)

Explain the working of the k-Nearest Neighbours (k-NN) algorithm. How does the choice of $k$ affect the bias and variance of the model? What is the effect of feature scaling on k-NN?

classification
Question 9 (short, 6 marks)

What is k-fold cross-validation? Explain the procedure with a diagram for $k=5$ and state why it gives a more reliable estimate of model performance than a single train-test split.

model-evaluation
Question 10 (short, 6 marks)

Explain logistic regression for binary classification. Write the sigmoid (logistic) function, state how the decision boundary is obtained, and explain why the cross-entropy loss is preferred over squared error for training it.

classification, regression
Question 11 (short, 6 marks)

Compare hierarchical clustering with partitional (k-means) clustering. Briefly explain agglomerative clustering and the role of a dendrogram in deciding the number of clusters.

clustering, unsupervised-learning
Question 12 (short, 4 marks)

Write short notes on any two of the following:

(a) Gradient descent and the role of the learning rate

(b) Curse of dimensionality

(c) Support Vector Machines and the concept of margin

(d) Feature engineering

supervised-learning, overfitting-regularization