Master in Data Science (SMS, TU) Statistical Computing with R Question Paper 2078 (Set Second Assessment 2078 (Open Book), p15-17) Nepal
This is the official Master in Data Science (SMS, TU) Statistical Computing with R question paper for the Second Assessment 2078 (Open Book), p15-17. It carries 45 full marks and allows 120 minutes for 10 questions. On Kekkei you can attempt this past paper online with a timer, get instant AI feedback and step-by-step solutions, and track the topics where you lose marks, all free of charge. Whether you are revising for the Statistical Computing with R exam or working through previous years' question papers, this 2078 paper lets you practise under real exam conditions.
| Level | Master in Data Science (SMS, TU) |
|---|---|
| Subject | Statistical Computing with R |
| Year | 2078 BS |
| Exam session | Second Assessment · Set Second Assessment 2078 (Open Book), p15-17 |
| Full marks | 45 |
| Time allowed | 120 minutes |
| Questions | 10, all with step-by-step solutions |
Group A
Compare the following methods used in supervised learning models:
a) Ordinary least squares b) Gradient descent c) Maximum likelihood
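As a starting point for the comparison, the sketch below fits the same straight line three ways in base R. The simulated data, the seed, the learning rate and the iteration count are all illustrative assumptions, not part of the paper.

```r
# Simulated data; the seed and true coefficients are illustrative assumptions.
set.seed(15)
n <- 200
x <- runif(n, 0, 1)
y <- 2 + 3 * x + rnorm(n, sd = 0.5)
X <- cbind(1, x)                        # design matrix with an intercept column

# a) Ordinary least squares: closed-form solution of the normal equations
beta_ols <- drop(solve(crossprod(X), crossprod(X, y)))

# b) Gradient descent on the mean squared error
beta_gd <- c(0, 0)
rate <- 0.5                             # learning rate (assumed; needs tuning in general)
for (i in 1:5000) {
  grad <- -2 / n * crossprod(X, y - X %*% beta_gd)
  beta_gd <- beta_gd - rate * drop(grad)
}

# c) Maximum likelihood under Gaussian errors, maximised numerically;
#    par[3] is the log of the error standard deviation
neg_loglik <- function(par) {
  mu <- drop(X %*% par[1:2])
  -sum(dnorm(y, mean = mu, sd = exp(par[3]), log = TRUE))
}
beta_ml <- optim(c(0, 0, 0), neg_loglik, method = "BFGS")$par[1:2]

round(rbind(ols = beta_ols, gd = beta_gd, ml = beta_ml), 3)
```

With Gaussian errors the three estimates should agree closely: OLS solves the problem in one step, gradient descent approaches the same minimum iteratively, and maximum likelihood reaches it through the likelihood function.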
Explain the advantages and limitations of these concepts used in supervised learning:
a) Validation b) Cross-validation c) Cross-validation with repetitions/replications
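The three resampling schemes can be contrasted on a toy regression problem. The following is a minimal base R sketch; the data, the seed, the 70:30 split, the choice of 5 folds and the 10 repetitions are assumptions made for illustration.

```r
set.seed(15)                             # assumed seed
n <- 100
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 1 + 0.5 * dat$x + rnorm(n)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# a) Validation: one random 70:30 hold-out split
idx <- sample(n, size = round(0.7 * n))
fit <- lm(y ~ x, data = dat[idx, ])
rmse_holdout <- rmse(dat$y[-idx], predict(fit, dat[-idx, ]))

# b) k-fold cross-validation: every observation is held out exactly once
cv_rmse <- function(data, k = 5) {
  fold <- sample(rep(1:k, length.out = nrow(data)))
  errs <- sapply(1:k, function(j) {
    fit <- lm(y ~ x, data = data[fold != j, ])
    rmse(data$y[fold == j], predict(fit, data[fold == j, ]))
  })
  mean(errs)
}
rmse_cv <- cv_rmse(dat, k = 5)

# c) Repeated cross-validation: repeat b) with fresh random fold assignments
rmse_repeated <- replicate(10, cv_rmse(dat, k = 5))

c(holdout = rmse_holdout, cv = rmse_cv, repeated_cv = mean(rmse_repeated))
```

The spread of `rmse_repeated` gives a rough sense of how much a single cross-validation estimate depends on the particular fold assignment.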
Explain when these models are used in supervised learning:
a) Log-transformed models b) Polynomial models c) Neural network models
Differentiate between supervised classification (decision tree) models that use:
a) Bagging b) Improved bagging c) Boosting
Explain unsupervised association rules mining with focus on:
a) Method b) Use/example c) Limitations
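To make the method concrete, the usual rule measures (support, confidence and lift) can be computed by hand. The sketch below uses a made-up set of five market baskets, which is an illustrative assumption and not data from the paper.

```r
# Toy transactions (illustrative only)
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "milk"),
  c("bread", "butter"),
  c("bread", "butter", "milk")
)

# Share of transactions that contain every item in `items`
support <- function(items) {
  mean(sapply(baskets, function(b) all(items %in% b)))
}

# Measures for the rule {bread} => {butter}
sup  <- support(c("bread", "butter"))    # 3 of 5 baskets
conf <- sup / support("bread")           # (3/5) / (4/5)
lift <- conf / support("butter")         # confidence / (4/5)

c(support = sup, confidence = conf, lift = lift)
```

A lift below 1, as in this toy example, suggests the two items appear together slightly less often than independence would predict.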
Group B
Do the following in R Studio with R script so that it can be knitted as PDF for review/grading:
a) Prepare a dataset with 50 random observations and two variables: miles per gallon (mpg), drawn at random from the range 10 to 50, and transmission (am), a random binary variable (0 = automatic, 1 = manual). Use your class roll number as the random seed so that the result can be replicated.
b) Perform a goodness-of-fit test on the mpg variable to check whether it follows a normal distribution.
c) Perform a goodness-of-fit test on the mpg variable to check whether its variances are equal across the categories of the am variable.
d) Perform the most appropriate independent-samples t-test based on the goodness-of-fit results and interpret it carefully.
e) Can you use the independent-samples t-test for this data? Why?
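One possible outline of parts a) to d) in base R is sketched below. The seed 15 is a placeholder for the roll number, "random range" is read as a uniform draw, and the choice of the Shapiro-Wilk test and the F test is an assumption; other tests could be justified.

```r
set.seed(15)                              # placeholder; use your class roll number
n <- 50
dat <- data.frame(
  mpg = runif(n, min = 10, max = 50),     # "random range" read as uniform (assumption)
  am  = rbinom(n, size = 1, prob = 0.5)   # 0 = automatic, 1 = manual
)

# b) Normality of mpg (Shapiro-Wilk test)
normality <- shapiro.test(dat$mpg)

# c) Equality of mpg variances across am groups (F test, which assumes normality)
variances <- var.test(mpg ~ am, data = dat)

# d) Student's t-test if the variances look equal, Welch's t-test otherwise
equal_var <- variances$p.value > 0.05
ttest <- t.test(mpg ~ am, data = dat, var.equal = equal_var)

normality$p.value
variances$p.value
ttest
```

Because `runif()` produces uniform rather than normal values, the normality test may well reject, which is relevant to part e); `wilcox.test()` is one nonparametric alternative that could be discussed there.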
Do the following in R Studio using R script so that it can be knitted as PDF for review/grading:
a) Prepare a dataset with 100 random observations and four variables: miles per gallon (mpg), drawn at random from the range 10 to 50; transmission (am), a random binary variable (0 = automatic, 1 = manual); weight (wt), drawn at random from the range 1 to 10; and horsepower (hp), drawn at random from the range 125 to 400. Use your exam roll number as the random seed so that the result can be replicated.
b) Divide this data into train and test datasets with a 70:30 random split, using your roll number as the random seed.
c) Fit a supervised linear regression model and a KNN regression model on the train data with mpg as the dependent variable and all other variables as independent variables.
d) Get the summary of the models and their fit indices, and select the best model.
e) Predict the mpg variable in the test data with the best model, get the fit indices and interpret the results carefully.
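The workflow in parts a) to e) might be outlined as follows using only base R. The seed 15 is a placeholder, `knn_predict()` is a hypothetical hand-written helper (packages such as caret are commonly used for KNN regression instead), k = 5 is an arbitrary choice, and RMSE is only one of several possible fit indices.

```r
set.seed(15)                               # placeholder; use your exam roll number
n <- 100
dat <- data.frame(
  mpg = runif(n, 10, 50),
  am  = rbinom(n, 1, 0.5),
  wt  = runif(n, 1, 10),
  hp  = runif(n, 125, 400)
)

# b) 70:30 random split
set.seed(15)
train_id <- sample(n, size = round(0.7 * n))
train <- dat[train_id, ]
test  <- dat[-train_id, ]

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# c) Linear regression on all predictors
fit_lm <- lm(mpg ~ ., data = train)

# c) KNN regression by hand: mean mpg of the k nearest training rows
knn_predict <- function(train, new, k = 5) {
  vars <- c("am", "wt", "hp")
  mu <- colMeans(train[vars])
  s  <- apply(train[vars], 2, sd)
  tr <- scale(train[vars], center = mu, scale = s)  # standardise with train statistics
  nw <- scale(new[vars],   center = mu, scale = s)
  apply(nw, 1, function(row) {
    d <- sqrt(colSums((t(tr) - row)^2))             # Euclidean distance to each train row
    mean(train$mpg[order(d)[1:k]])
  })
}

# d) Fit indices on the train data
summary(fit_lm)$r.squared
train_rmse <- c(lm  = rmse(train$mpg, predict(fit_lm, train)),
                knn = rmse(train$mpg, knn_predict(train, train)))

# e) Fit indices on the test data
test_rmse <- c(lm  = rmse(test$mpg, predict(fit_lm, test)),
               knn = rmse(test$mpg, knn_predict(train, test)))
train_rmse
test_rmse
```

Since mpg is generated independently of the other variables, neither model should be expected to predict it well, which is worth stating in the interpretation.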
Do the following in R Studio with R script so that it can be knitted as PDF for review/grading:
a) Prepare a dataset with 150 observations and four variables: transmission (am), a random binary variable (0 = automatic, 1 = manual); miles per gallon (mpg), drawn at random from the range 10 to 50; weight (wt), drawn at random from the range 1 to 10; and horsepower (hp), drawn at random from the range 125 to 400. Use your exam roll number as the random seed so that the result can be replicated.
b) Divide this data into train and test datasets with an 80:20 random split, using your roll number as the random seed.
c) Fit a supervised binary logistic regression model and a naïve Bayes classification model on the train data with am as the dependent variable and mpg, wt and hp as independent variables, and select the best model.
d) Predict the transmission variable in the test data using the best model and interpret the result carefully.
e) Get the confusion matrix, sensitivity and specificity of the predicted model and interpret them carefully.
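A rough base R outline of this task is given below. The seed 15 is a placeholder, `nb_predict()` and `metrics()` are hypothetical helpers written for this sketch (a Gaussian naive Bayes is assumed; `e1071::naiveBayes()` is a common packaged alternative), and the 0.5 cut-off and the choice of 1 = manual as the positive class are assumptions.

```r
set.seed(15)                               # placeholder; use your exam roll number
n <- 150
dat <- data.frame(
  am  = rbinom(n, 1, 0.5),
  mpg = runif(n, 10, 50),
  wt  = runif(n, 1, 10),
  hp  = runif(n, 125, 400)
)

# b) 80:20 random split
set.seed(15)
train_id <- sample(n, size = round(0.8 * n))
train <- dat[train_id, ]
test  <- dat[-train_id, ]

# c) Binary logistic regression, classified at a 0.5 probability cut-off
fit_glm  <- glm(am ~ mpg + wt + hp, data = train, family = binomial)
pred_glm <- as.integer(predict(fit_glm, test, type = "response") > 0.5)

# c) Gaussian naive Bayes by hand: log prior plus summed per-variable log densities
nb_predict <- function(train, new, vars = c("mpg", "wt", "hp")) {
  score <- sapply(0:1, function(cl) {
    rows <- train[train$am == cl, vars]
    loglik <- sapply(vars, function(v)
      dnorm(new[[v]], mean(rows[[v]]), sd(rows[[v]]), log = TRUE))
    log(mean(train$am == cl)) + rowSums(loglik)
  })
  as.integer(score[, 2] > score[, 1])
}
pred_nb <- nb_predict(train, test)

# e) Confusion matrix, accuracy, sensitivity and specificity (positive class: 1)
metrics <- function(actual, pred) {
  cm <- table(actual = factor(actual, 0:1), predicted = factor(pred, 0:1))
  list(confusion   = cm,
       accuracy    = sum(diag(cm)) / sum(cm),
       sensitivity = cm["1", "1"] / sum(cm["1", ]),
       specificity = cm["0", "0"] / sum(cm["0", ]))
}
metrics(test$am, pred_glm)
metrics(test$am, pred_nb)
```

Because am is generated independently of the predictors, accuracy close to 50% would not be surprising for either model.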
Do as follows using "USArrests" dataset in R Studio with R script so that it can be knitted as PDF for review/grading:
a) Check the head and the structure of the dataset and interpret them carefully.
b) Use all the variables of this dataset in a Principal Component Analysis (PCA) to create a "criminality score" for the US states.
c) Interpret the results of the PCA model carefully.
d) Get the scree plot and explain how many components must be retained.
e) Get the biplot of the fitted model and interpret it carefully.
f) Improve the fitted model with VARIMAX rotation and interpret the results carefully.
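These steps might look roughly as follows with `prcomp()` and `varimax()` from base R. Standardising the variables, using the first component as the "criminality score", orienting its sign by the Murder loading, and rotating only the first two components are all assumptions of this sketch.

```r
data("USArrests")

# a) First rows and structure
head(USArrests)
str(USArrests)

# b) PCA on standardised variables (the variables are on different scales)
pca <- prcomp(USArrests, scale. = TRUE)
summary(pca)
pca$rotation

# One possible "criminality score": the first principal component.
# The sign of a component is arbitrary, so orient it to rise with Murder.
score <- pca$x[, 1]
if (pca$rotation["Murder", 1] < 0) score <- -score
head(sort(score, decreasing = TRUE))

# d) Scree plot and proportion of variance explained
screeplot(pca, type = "lines", main = "Scree plot")
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# e) Biplot of the first two components
biplot(pca, scale = 0)

# f) Varimax rotation of the loadings of the first two components
loadings2 <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])
rotated <- varimax(loadings2)
rotated$loadings
```

After rotation the loadings are usually easier to label, but the rotated components no longer maximise variance one after another, which is worth mentioning when interpreting part f).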
OR
Do as follows using "USArrests" dataset in R Studio with R script so that it can be knitted as PDF for review/grading:
a) Get the dissimilarity distances as a "state.dissimilarity" object.
b) Fit a classical multidimensional scaling model using the "state.dissimilarity" object.
c) Get the summary of the model and interpret it carefully.
d) Get the plot of the model and interpret it carefully.
e) Compare this model with the first two components from the principal component analysis model of this data.
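A possible outline using `dist()` and `cmdscale()` is sketched below. Standardising the variables before computing Euclidean distances and keeping two dimensions are assumptions of this sketch.

```r
data("USArrests")

# a) Euclidean dissimilarities between states on standardised variables
state.dissimilarity <- dist(scale(USArrests))

# b) Classical (metric) multidimensional scaling in two dimensions
mds <- cmdscale(state.dissimilarity, k = 2, eig = TRUE)

# c) cmdscale() returns a list rather than a model object with its own
#    summary() method; the goodness-of-fit and coordinates are the usual
#    things to inspect
mds$GOF
head(mds$points)

# d) Plot the two-dimensional configuration with state labels
plot(mds$points, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(mds$points, labels = rownames(USArrests), cex = 0.6)

# e) With Euclidean distances, classical MDS coordinates match PCA scores
#    up to the sign of each axis
pca <- prcomp(USArrests, scale. = TRUE)
abs(cor(mds$points[, 1], pca$x[, 1]))
abs(cor(mds$points[, 2], pca$x[, 2]))
```

The two absolute correlations should be essentially 1, which is the basis for the comparison asked for in part e).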
Do as follows in the R Studio with R Script so that it can be knitted as PDF for review/grading:
a) Define a 2-column matrix "x" with 50 normally distributed random observations in each column. Set the random seed to your roll number for replication.
b) Define a "dist" object from the x matrix to compute the 50x50 inter-observation Euclidean distance matrix.
c) Fit a hierarchical clustering model using the "dist" object and single linkage, and get the dendrogram.
d) Fit a hierarchical clustering model using the "dist" object and complete linkage, and get the dendrogram.
e) Show the number of clusters (k) to retain for the data using ablines in the dendrograms of these models.
f) Get the best value of the number of clusters to form (k) using the best model fitted above.
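The steps above can be sketched with `dist()`, `hclust()` and `cutree()`. The seed 15 is a placeholder, k = 2 is only an example value to be replaced after inspecting the dendrograms, and `cut_height()` is a hypothetical helper written for this sketch.

```r
set.seed(15)                                # placeholder; use your roll number
x <- matrix(rnorm(50 * 2), ncol = 2)

# b) Euclidean distances; the paper calls this object "dist", `d` is used
#    here to avoid reusing the name of the dist() function
d <- dist(x)
dim(as.matrix(d))                           # 50 x 50

# c), d) Single and complete linkage
hc_single   <- hclust(d, method = "single")
hc_complete <- hclust(d, method = "complete")

# Height halfway between the last merge that leaves k clusters and the next one
cut_height <- function(hc, k) mean(rev(hc$height)[c(k - 1, k)])

# e) Dendrograms with a horizontal line marking the cut for k clusters
k <- 2                                      # example value, not a recommendation
plot(hc_single, main = "Single linkage")
abline(h = cut_height(hc_single, k), col = "red", lty = 2)
plot(hc_complete, main = "Complete linkage")
abline(h = cut_height(hc_complete, k), col = "red", lty = 2)

# f) Cluster sizes for the chosen k
table(cutree(hc_single, k = k))
table(cutree(hc_complete, k = k))
```

Single linkage tends to chain observations and peel off small clusters, so comparing the two cluster-size tables is one way to argue which model is more useful here.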
OR
Do as follows in the R Studio with R Script so that it can be knitted as PDF for review/grading:
a) Define a 2-column matrix "x" with 50 normally distributed random observations in each column. Set the random seed to your roll number for replication.
b) In a new column of this matrix, label the first 25 observations of "x" as "1" and the next 25 observations as "2".
c) Fit a k-means clustering model to the "x" matrix data with k = 2 and nstart = 20.
d) Fit a k-means clustering model to the "x" matrix data with k = 3 and nstart = 20.
e) Plot the clusters formed by k = 2 and k = 3, compare the results and interpret them carefully.
f) Visualize the clusters for the best k value and interpret it carefully.
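A minimal sketch with `kmeans()` follows. The seed 15 is a placeholder, and clustering on the two numeric columns only, leaving out the label column, is one reading of the task rather than something the paper states.

```r
set.seed(15)                                # placeholder; use your roll number
x <- matrix(rnorm(50 * 2), ncol = 2)

# b) Group label as a third column
x <- cbind(x, group = rep(1:2, each = 25))

# c), d) k-means on the two numeric columns only (assumption: the label
#        column is kept for comparison, not used for clustering)
km2 <- kmeans(x[, 1:2], centers = 2, nstart = 20)
km3 <- kmeans(x[, 1:2], centers = 3, nstart = 20)

# e) Side-by-side plots of the two solutions, with cluster centres starred
op <- par(mfrow = c(1, 2))
plot(x[, 1:2], col = km2$cluster, pch = 19, main = "k = 2", xlab = "", ylab = "")
points(km2$centers, pch = 8, cex = 2)
plot(x[, 1:2], col = km3$cluster, pch = 19, main = "k = 3", xlab = "", ylab = "")
points(km3$centers, pch = 8, cex = 2)
par(op)

# Total within-cluster sum of squares, and agreement of k = 2 with the labels
c(k2 = km2$tot.withinss, k3 = km3$tot.withinss)
table(group = x[, "group"], cluster = km2$cluster)
```

Both groups are drawn from the same distribution, so the k = 2 clusters should not be expected to recover the assigned labels; the cross-table makes that easy to check.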
Frequently asked questions
- Where can I find the Master in Data Science (SMS, TU) Statistical Computing with R question paper 2078?
- The full Master in Data Science (SMS, TU) Statistical Computing with R 2078 (Second Assessment) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
- Does the Statistical Computing with R 2078 paper come with solutions?
- Yes. Every question on this Statistical Computing with R past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
- How many marks is the Master in Data Science (SMS, TU) Statistical Computing with R 2078 paper?
- The Master in Data Science (SMS, TU) Statistical Computing with R 2078 paper carries 45 full marks and is meant to be completed in 120 minutes, across 10 questions.
- Is practising this Statistical Computing with R past paper free?
- Yes. Reading and attempting this Statistical Computing with R past paper on Kekkei is completely free.