Level: Master in Data Science (SMS, TU)
Subject: Statistical Computing with R
Year: 2082 BS
Exam session: Second Assessment 2082 (set pages 13-14)
Full marks: 45
Time allowed: 120 minutes
Questions: 10, all with step-by-step solutions
Group A

5 questions · 3 marks each
1. Short answer (3 marks)

Describe the following concepts with examples:
a) Wilkinson's approach to the grammar of graphics
b) The ggplot2 approach to the grammar of graphics

Tags: grammar-of-graphics
2. Short answer (3 marks)

Discuss the following concepts with clear examples:
a) BLUE in linear regression
b) LINE in linear regression

Tags: linear-regression, blue, line
3. Short answer (3 marks)

Explain the following concepts with clear examples:
a) Spearman's rank correlation
b) Mann-Whitney U test

Tags: spearman, mann-whitney
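Both concepts in this question can be illustrated with base R. The following is only a minimal sketch on simulated data: the seed 123, the sample sizes and the variable names are assumptions made for illustration, not part of the question paper.

```r
# Hypothetical illustration data (seed and sizes are arbitrary assumptions).
set.seed(123)
x <- rnorm(30)
y <- x + rnorm(30, sd = 0.5)

# a) Spearman's rank correlation: the Pearson correlation of the ranks.
rho          <- cor(x, y, method = "spearman")
rho_by_ranks <- cor(rank(x), rank(y))
sp_test      <- cor.test(x, y, method = "spearman")

# b) Mann-Whitney U test (R calls it the Wilcoxon rank-sum test):
#    compares two independent groups without assuming normality.
group_a <- rnorm(20, mean = 0)
group_b <- rnorm(20, mean = 1)
mw_test <- wilcox.test(group_a, group_b)

print(sp_test)
print(mw_test)
```

In both tests a small p-value is read as evidence against the null hypothesis (no monotonic association for Spearman, identical distributions for Mann-Whitney).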
4. Short answer (3 marks)

Explain supervised learning models for classification and regression with reference to:
a) Multinomial logistic regression
b) Random forest model

Tags: classification, random-forest
5. Short answer (3 marks)

Discuss supervised linear regression models with a focus on:
a) Support vector machine
b) Multilayer perceptron

Tags: svm, mlp
Group B

5 questions · 6 marks each
6. Long answer (6 marks)

Do the following in RStudio using the ggplot2 and dplyr packages and knit the output as a PDF file:
a) Create a dataset with the following variables: age (20-59 years), height (110-190 centimeters), weight (40-90 kg), with 150 random cases of each variable. Your roll number must be used to set the random seed.
b) Compute the body mass index (BMI) variable as: BMI = (weight in kg) / (height in meters)^2
c) Create body mass index categories: <18, 18-24, 25-30, 30+ and label them as "underweight", "normal", "overweight" and "obese" respectively, using the dplyr package
d) Show the percentage distribution of the labelled BMI variable with a pie chart using the ggplot2 package

Tags: ggplot2, dplyr, bmi
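One possible way to approach this question is sketched below. Several things are assumptions rather than part of the paper: the seed 25 is a placeholder for a roll number, the variables are drawn uniformly from whole numbers, and the category boundaries (which overlap at 30 and leave a gap between 24 and 25 as printed) are read as <18, 18 to <25, 25 to <30, and 30 or more.

```r
library(dplyr)
library(ggplot2)

set.seed(25)  # assumption: placeholder roll number, replace with your own

n_cases <- 150
df <- data.frame(
  age    = sample(20:59,   n_cases, replace = TRUE),  # years
  height = sample(110:190, n_cases, replace = TRUE),  # centimeters
  weight = sample(40:90,   n_cases, replace = TRUE)   # kilograms
)

bmi_levels <- c("underweight", "normal", "overweight", "obese")

df <- df %>%
  mutate(
    bmi = weight / (height / 100)^2,  # height converted from cm to m
    bmi_cat = case_when(              # assumed reading of the boundaries
      bmi < 18 ~ "underweight",
      bmi < 25 ~ "normal",
      bmi < 30 ~ "overweight",
      TRUE     ~ "obese"
    ),
    bmi_cat = factor(bmi_cat, levels = bmi_levels)
  )

# Percentage distribution of the labelled BMI variable.
bmi_pct <- df %>%
  count(bmi_cat, name = "cases") %>%
  mutate(pct = 100 * cases / sum(cases))

# A pie chart in ggplot2 is a stacked bar drawn in polar coordinates.
p <- ggplot(bmi_pct, aes(x = "", y = pct, fill = bmi_cat)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = sprintf("%.1f%%", pct)),
            position = position_stack(vjust = 0.5)) +
  labs(title = "Percentage distribution of BMI categories",
       fill = "BMI category") +
  theme_void()

print(p)
```

To knit to PDF, the same code would go in an R Markdown chunk with `output: pdf_document`, which needs a LaTeX installation.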
7. Long answer (6 marks)

Do the following in RStudio using R's built-in "airquality" dataset, with an R script knitted to PDF output:
a) Perform a test of normality on the "Temp" variable for each category of the "Month" variable and interpret all the results carefully
b) Perform a test of equality of variance on the "Temp" variable across the categories of the "Month" variable and interpret the result carefully
c) Compare Temp by Month using the best statistical test for this data and interpret the result carefully
d) Perform the post-hoc test and interpret the results carefully

Tags: airquality, normality, anova
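The four parts of this question form a decision chain, which could be sketched in base R roughly as follows. The choice of Shapiro-Wilk and Bartlett tests, the 0.05 threshold and the Bonferroni adjustment are assumptions for this sketch; other choices (for example Levene's test from the car package) are equally defensible.

```r
aq <- airquality
aq$Month <- factor(aq$Month)

# a) Shapiro-Wilk normality test of Temp within each month.
normality_p <- tapply(aq$Temp, aq$Month,
                      function(v) shapiro.test(v)$p.value)
print(round(normality_p, 4))

# b) Bartlett's test of equal variances across months.
bt <- bartlett.test(Temp ~ Month, data = aq)
print(bt)

# c) and d) Pick the comparison and post-hoc test from the results above.
alpha <- 0.05  # assumed significance level
all_normal <- all(normality_p > alpha)
equal_var  <- bt$p.value > alpha

if (all_normal && equal_var) {
  # Classical one-way ANOVA with Tukey HSD post-hoc comparisons.
  fit <- aov(Temp ~ Month, data = aq)
  print(summary(fit))
  print(TukeyHSD(fit))
} else if (all_normal) {
  # Welch ANOVA with unpooled pairwise t-tests.
  print(oneway.test(Temp ~ Month, data = aq, var.equal = FALSE))
  print(pairwise.t.test(aq$Temp, aq$Month, pool.sd = FALSE,
                        p.adjust.method = "bonferroni"))
} else {
  # Kruskal-Wallis test with pairwise Wilcoxon rank-sum tests.
  print(kruskal.test(Temp ~ Month, data = aq))
  print(pairwise.wilcox.test(aq$Temp, aq$Month,
                             p.adjust.method = "bonferroni",
                             exact = FALSE))
}
```

Writing the branches out makes the interpretation explicit: which test counts as "best" depends on whether the normality and equal-variance assumptions hold for this data.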
8. Long answer (6 marks)

Do the following in RStudio, with an R script knitted to PDF output:
a) Create a dataset with 200 random cases, one random binary (1 and 0) variable and four random non-binary (categorical and continuous) variables, with your roll number as the random seed
b) Divide it into train and test datasets with a 70:30 random split
c) Fit supervised logistic regression and decision tree classification models on the train data, with the binary variable as the dependent variable and the other four variables as independent variables
d) Predict the dependent variable in the test dataset with both models and interpret the results carefully

Tags: logistic-regression, decision-tree, train-test
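A minimal sketch of this workflow is given below, using `glm()` for the logistic regression and the rpart package for the decision tree. The seed 25 (a placeholder roll number), the variable names, the distributions used to simulate the data and the 0.5 classification cut-off are all assumptions for illustration.

```r
library(rpart)

set.seed(25)  # assumption: placeholder roll number
n <- 200
dat <- data.frame(
  y  = factor(rbinom(n, 1, 0.5)),                  # binary outcome
  x1 = rnorm(n, mean = 50, sd = 10),               # continuous
  x2 = runif(n, min = 0, max = 100),               # continuous
  x3 = factor(sample(c("low", "mid", "high"), n, replace = TRUE)),
  x4 = factor(sample(c("A", "B"), n, replace = TRUE))
)

# 70:30 random split into train and test sets.
idx   <- sample(seq_len(n), size = round(0.7 * n))
train <- dat[idx, ]
test  <- dat[-idx, ]

# Fit both models on the training data.
logit_fit <- glm(y ~ ., data = train, family = binomial)
tree_fit  <- rpart(y ~ ., data = train, method = "class")

# Predict the binary variable in the test data.
p_logit    <- predict(logit_fit, newdata = test, type = "response")
pred_logit <- factor(ifelse(p_logit > 0.5, "1", "0"),
                     levels = levels(dat$y))
pred_tree  <- predict(tree_fit, newdata = test, type = "class")

# Confusion matrices and accuracy for interpretation.
cm_logit  <- table(Predicted = pred_logit, Actual = test$y)
cm_tree   <- table(Predicted = pred_tree,  Actual = test$y)
acc_logit <- sum(diag(cm_logit)) / sum(cm_logit)
acc_tree  <- sum(diag(cm_tree))  / sum(cm_tree)

print(cm_logit); print(acc_logit)
print(cm_tree);  print(acc_tree)
```

Because the predictors here are simulated independently of the outcome, accuracy near 50% is the expected result and is worth stating in the interpretation.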
9. Long answer (6 marks)

Do the following using the built-in "USArrests" dataset, with an R script knitted to PDF output:
a) Create a "criminality scale" from the four variables of this dataset using Principal Component Analysis
b) Compute the eigenvalues and interpret the PCA result carefully using Kaiser's criterion
c) Show the scree plot and decide on the number of components to retain, with careful interpretation
d) Revise the criminality scale using VARIMAX rotation and interpret the result carefully
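The PCA steps above could be sketched in base R as follows. This is one reading of the task, not a model answer: the variables are standardized, the "criminality scale" is taken to be the score on the first component, and two components are carried into the varimax rotation (an assumption, since `varimax()` needs at least two columns).

```r
# a) PCA on the standardized variables (correlation-matrix PCA).
pca <- prcomp(USArrests, scale. = TRUE)
criminality <- pca$x[, 1]  # assumed scale: first-component scores

# b) Eigenvalues and Kaiser's criterion (retain eigenvalues above 1).
eig    <- pca$sdev^2
prop   <- eig / sum(eig)
kaiser <- sum(eig > 1)
print(round(rbind(eigenvalue = eig, proportion = prop), 3))
print(kaiser)

# c) Scree plot with the Kaiser cut-off as a reference line.
screeplot(pca, type = "lines", main = "Scree plot: USArrests")
abline(h = 1, lty = 2)

# d) Varimax rotation of the loadings of the first k components.
k <- 2  # assumption for this sketch
loadings <- pca$rotation[, 1:k] %*% diag(pca$sdev[1:k])
colnames(loadings) <- paste0("PC", 1:k)
rot <- varimax(loadings)
print(rot$loadings)
```

The rotated loadings show which variables group together on each rotated component, which is the basis for revising the scale.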

OR

Do the following in RStudio with an R script, using the given dataset of 10 US cities:

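| City | Atlanta | Chicago | Denver | Houston | Los Angeles | Miami | New York | San Francisco | Seattle | Washington D.C |
|---|---|---|---|---|---|---|---|---|---|---|
| Atlanta | 0 | 587 | 1212 | 701 | 1936 | 604 | 748 | 2139 | 2182 | 543 |
| Chicago | 587 | 0 | 920 | 940 | 1745 | 1188 | 713 | 1858 | 1737 | 597 |
| Denver | 1212 | 920 | 0 | 879 | 831 | 1726 | 1631 | 949 | 1021 | 1494 |
| Houston | 701 | 940 | 879 | 0 | 1374 | 968 | 1420 | 1645 | 1891 | 1220 |
| Los Angeles | 1936 | 1745 | 831 | 1374 | 0 | 2339 | 2451 | 347 | 959 | 2300 |
| Miami | 604 | 1188 | 1726 | 968 | 2339 | 0 | 1092 | 2594 | 2734 | 923 |
| New York | 748 | 713 | 1631 | 1420 | 2451 | 1092 | 0 | 2571 | 2408 | 205 |
| San Francisco | 2139 | 1858 | 949 | 1645 | 347 | 2594 | 2571 | 0 | 678 | 2442 |
| Seattle | 2182 | 1737 | 1021 | 1891 | 959 | 2734 | 2408 | 678 | 0 | 2329 |
| Washington D.C | 543 | 597 | 1494 | 1220 | 2300 | 923 | 205 | 2442 | 2329 | 0 |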

a) Get the dissimilarity distance as a city.dissimilarity object
b) Fit a classical multidimensional scaling model using the city.dissimilarity object
c) Get the summary of the model and interpret it carefully
d) Get the bi-plot of the model and interpret it carefully

Tags: pca, mds, usarrests
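For the multidimensional scaling alternative, a sketch using base R's `cmdscale()` might look like the following. The distance values are those given in the question; the choice of two dimensions and the reading of "bi-plot" as a labelled two-dimensional plot of the fitted coordinates are assumptions. `cmdscale()` returns a plain list, so the eigenvalues and goodness-of-fit are printed directly in place of a model summary.

```r
cities <- c("Atlanta", "Chicago", "Denver", "Houston", "Los Angeles",
            "Miami", "New York", "San Francisco", "Seattle",
            "Washington D.C")

# Symmetric distance matrix from the question.
d <- matrix(c(
     0,  587, 1212,  701, 1936,  604,  748, 2139, 2182,  543,
   587,    0,  920,  940, 1745, 1188,  713, 1858, 1737,  597,
  1212,  920,    0,  879,  831, 1726, 1631,  949, 1021, 1494,
   701,  940,  879,    0, 1374,  968, 1420, 1645, 1891, 1220,
  1936, 1745,  831, 1374,    0, 2339, 2451,  347,  959, 2300,
   604, 1188, 1726,  968, 2339,    0, 1092, 2594, 2734,  923,
   748,  713, 1631, 1420, 2451, 1092,    0, 2571, 2408,  205,
  2139, 1858,  949, 1645,  347, 2594, 2571,    0,  678, 2442,
  2182, 1737, 1021, 1891,  959, 2734, 2408,  678,    0, 2329,
   543,  597, 1494, 1220, 2300,  923,  205, 2442, 2329,    0
), nrow = 10, byrow = TRUE, dimnames = list(cities, cities))

# a) Dissimilarity object.
city.dissimilarity <- as.dist(d)

# b) Classical (metric) multidimensional scaling in two dimensions.
mds <- cmdscale(city.dissimilarity, k = 2, eig = TRUE)

# c) Eigenvalues and goodness-of-fit.
print(round(mds$eig, 1))
print(mds$GOF)

# d) Two-dimensional map of the fitted coordinates.
plot(mds$points, type = "n", asp = 1,
     xlab = "Dimension 1", ylab = "Dimension 2",
     main = "Classical MDS of 10 US cities")
text(mds$points, labels = rownames(mds$points))
```

The orientation of an MDS map is arbitrary, so the plot may appear mirrored or rotated relative to a geographic map; only the relative positions carry meaning.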
10. Long answer (6 marks)

Do the following in RStudio, with an R script knitted to PDF output:
a) Create a dataset with 200 random cases and five random variables, with your roll number as the random seed
b) Fit a hierarchical clustering model using single linkage and get the dendrogram for this model
c) Fit a hierarchical clustering model using complete linkage and get the dendrogram for this model
d) Fit a hierarchical clustering model using average linkage and get the dendrogram for this model
e) Find the best hierarchical clustering model for this data and determine the number of clusters for it
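The hierarchical clustering steps above could be sketched as follows. The paper does not say how to judge the "best" model or how to find the number of clusters, so two common heuristics are assumed here: the cophenetic correlation to compare linkages, and the largest gap among the top merge heights to pick the number of clusters. The seed 25 is a placeholder roll number and the normal distribution is an arbitrary choice.

```r
set.seed(25)  # assumption: placeholder roll number

# a) 200 random cases on five random variables.
dat <- as.data.frame(matrix(rnorm(200 * 5), ncol = 5,
                            dimnames = list(NULL, paste0("V", 1:5))))
d <- dist(scale(dat))  # Euclidean distances on standardized data

# b) to d) Fit one model per linkage and draw the dendrograms.
methods <- c("single", "complete", "average")
fits <- lapply(methods, function(m) hclust(d, method = m))
names(fits) <- methods

op <- par(mfrow = c(1, 3))
for (m in methods) {
  plot(fits[[m]], labels = FALSE, main = paste(m, "linkage"),
       xlab = "", sub = "")
}
par(op)

# e) Assumed criterion: the linkage whose dendrogram best preserves
#    the original distances (highest cophenetic correlation).
coph <- sapply(fits, function(f) cor(d, cophenetic(f)))
best <- names(which.max(coph))
print(round(coph, 3))

# Assumed heuristic: cut where the gap between successive merge
# heights is largest, looking only at the top 10 merges.
h    <- rev(fits[[best]]$height)[1:10]
gaps <- h[-length(h)] - h[-1]
k    <- which.max(gaps) + 1
clusters <- cutree(fits[[best]], k = k)
print(table(clusters))
```

With purely random data there is no real cluster structure, so the dendrograms and the chosen k should be interpreted cautiously.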

OR

Use the four variables of the "USArrests" dataset in RStudio and do the following, with an R script knitted to PDF output:
a) Fit a k-means clustering model to the data with k=2, plot it in a single graph and interpret it carefully
b) Add cluster centers to the plot of the clusters formed with k=2 and interpret it carefully
c) Fit a k-means clustering model to the data with k=3, plot it in a single graph and interpret it carefully
d) Add cluster centers to the plot of the clusters formed with k=3 and interpret it carefully

Tags: hierarchical-clustering, kmeans
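For the k-means alternative, one way to show four variables in a single graph is to project the data onto the first two principal components, as in the sketch below. That projection, the standardization of the variables, the seed 25 and `nstart = 25` are all assumptions for illustration; the helper `plot_km()` is hypothetical and defined only for this example.

```r
x <- scale(USArrests)  # standardize the four variables

set.seed(25)  # assumption: any fixed seed for reproducibility
km2 <- kmeans(x, centers = 2, nstart = 25)
km3 <- kmeans(x, centers = 3, nstart = 25)

# Project onto the first two principal components for plotting.
pc <- prcomp(x)

# Hypothetical helper: clusters as colored points, centers as stars.
plot_km <- function(km, title) {
  plot(pc$x[, 1:2], col = km$cluster, pch = 19, main = title)
  centers_pc <- predict(pc, newdata = km$centers)[, 1:2, drop = FALSE]
  points(centers_pc, col = seq_len(nrow(km$centers)),
         pch = 8, cex = 2.5, lwd = 2)
  invisible(centers_pc)
}

# a) and b) k = 2 with cluster centers.
c2 <- plot_km(km2, "k-means on USArrests, k = 2")

# c) and d) k = 3 with cluster centers.
c3 <- plot_km(km3, "k-means on USArrests, k = 3")

print(km2$size); print(km3$size)
print(c(k2 = km2$tot.withinss, k3 = km3$tot.withinss))
```

Comparing the total within-cluster sum of squares for k=2 and k=3, together with how well separated the groups look in the plot, supports the interpretation asked for in each part.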
