Level: Master in Data Science (SMS, TU)
Subject: Statistical Computing with R
Year: 2079 BS
Exam session: Board
Full marks: 45
Time allowed: 120 minutes
Questions: 10, all with step-by-step solutions

Group A

5 questions · 3 marks each
1. Short answer (3 marks)

Describe the following concepts with focus on R software:

a) Loops b) Function c) Pipe
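
The three concepts in this question can be illustrated with a few lines of base R. This is only a minimal sketch: the function name `cv` is a hypothetical example, and the native pipe `|>` assumes R 4.1 or later (the magrittr pipe `%>%` is the package-based alternative).

```r
# a) Loop: fill a vector with the squares of 1..5
squares <- numeric(5)
for (i in seq_along(squares)) {
  squares[i] <- i^2
}

# b) Function: a user-defined function (hypothetical name `cv`)
#    returning the coefficient of variation of a numeric vector
cv <- function(x) {
  sd(x) / mean(x)
}
cv_squares <- cv(squares)

# c) Pipe: the native pipe passes the left-hand value as the
#    first argument of the function on the right-hand side
total <- squares |> sum()
```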

Tags: r-loops, r-functions, r-pipe
2. Short answer (3 marks)

Explain the following concepts with examples focusing on R software:

a) Big data b) Data wrangling c) Tidy data

Tags: big-data, data-wrangling, tidy-data
3. Short answer (3 marks)

Explain the following concepts with examples focusing on R software:

a) Measures of central tendency b) Measures of dispersion c) Measures of relative position
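
As a hedged illustration of the three groups of measures, the sketch below uses the built-in `mtcars` data. The choice of `mtcars$mpg` as the example variable and the object names are assumptions made for illustration only.

```r
# Example variable: miles per gallon from the built-in mtcars data
x <- mtcars$mpg

# a) Central tendency
centre <- c(mean = mean(x), median = median(x))

# b) Dispersion
spread <- c(variance = var(x), sd = sd(x), iqr = IQR(x),
            range = diff(range(x)))

# c) Relative position: quartiles and standardized (z) scores
quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))
z_scores  <- (x - mean(x)) / sd(x)
```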

Tags: central-tendency, dispersion, relative-position
4. Short answer (3 marks)

Explain the following concepts with examples focusing on R software:

a) Correlation b) Parametric tests c) Non-parametric tests
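
One possible set of examples, again on the built-in `mtcars` data, is sketched below. The variable pairs and the particular tests (Welch t-test as the parametric example, Wilcoxon rank-sum as its non-parametric counterpart) are illustrative assumptions, not the only valid choices.

```r
# a) Correlation: Pearson (linear) and Spearman (rank-based)
r_pearson  <- cor(mtcars$mpg, mtcars$wt)
r_spearman <- cor(mtcars$mpg, mtcars$wt, method = "spearman")

# b) Parametric test: Welch two-sample t-test of mpg by transmission
t_res <- t.test(mpg ~ am, data = mtcars)

# c) Non-parametric test: Wilcoxon rank-sum test on the same groups
#    (exact = FALSE because the data contain ties)
w_res <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)
```

The parametric test assumes approximately normal data within each group; the rank-based test drops that assumption at some cost in power when normality does hold.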

Tags: correlation, parametric-tests, non-parametric-tests
5. Short answer (3 marks)

Compare the following models with a focus on R software:

a) Naïve Bayes and Support Vector Machine b) Decision Tree and Random Forest c) Feed-forward and feed-backward neural network

Tags: naive-bayes, decision-tree, neural-network

Group B

5 questions · 6 marks each
6. Long answer (6 marks)

Do the following in RStudio using an R script so that it can be knitted as PDF:

Prepare a column vector of the miles per gallon (mpg) variable with 500 random values between 10 and 50; do not forget to use your exam roll number as the random seed so that the result can be replicated. Then:
a) Plot a histogram of this "mpg" variable and interpret it carefully
b) Refine the histogram by filling the bars with "blue" color and changing the number of bins to 8
c) Add a vertical line (abline) at the arithmetic mean of the mpg variable
d) Plot a Q-Q plot of the mpg variable, add a red normal Q-Q line to it and interpret it carefully
e) Plot a density plot of the mpg variable without the border, fill it with yellow color and interpret it
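
The steps of this option could be sketched in base R as follows. The seed `123` is a placeholder assumption standing in for the exam roll number, the values are assumed to be uniformly distributed, and the object names are illustrative.

```r
set.seed(123)  # assumption: replace 123 with your exam roll number

# 500 uniform random values between 10 and 50
mpg <- runif(500, min = 10, max = 50)
mpg_col <- matrix(mpg, ncol = 1)  # the same values as a column vector

# Default histogram
hist(mpg, main = "Histogram of mpg", xlab = "mpg")

# Blue bars and exactly 8 bins (9 equally spaced break points)
hist(mpg, breaks = seq(10, 50, length.out = 9), col = "blue",
     main = "Histogram of mpg (8 bins)", xlab = "mpg")

# Vertical line at the arithmetic mean
abline(v = mean(mpg), lwd = 2, lty = 2)

# Normal Q-Q plot with a red reference line
qqnorm(mpg)
qqline(mpg, col = "red")

# Density plot filled in yellow, with no border drawn
d <- density(mpg)
plot(d, type = "n", main = "Density of mpg")
polygon(d, col = "yellow", border = NA)
```

Because the values are drawn from a uniform distribution, the histogram should look roughly flat, and the Q-Q points should bend away from the red line at both ends, which is what lighter-than-normal tails look like.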

OR

Use the "ggplot2" package and do as follows in RStudio:

a) Define the first layer of the ggplot object with the diamonds data, carat as x-axis and price as y-axis
b) Add a layer with geometric aesthetic as "point", statistics and position as "identity"
c) Add layers with the scales of the y and x variables as continuous
d) Add a layer with the coordinate system as Cartesian
e) Add a layer with an appropriate title and interpret the resulting graph carefully

Tags: r-histogram, ggplot2, data-visualization
7. Long answer (6 marks)

Do the following in RStudio using an R script so that it can be knitted as PDF:

a) Prepare a dataset with 100 random observations and two variables: miles per gallon (mpg) with random values between 10 and 50, and transmission gears (gear) as a random categorical variable (3 = three gears, 4 = four gears, 5 = five gears); do not forget to use your class roll number as the random seed so that the result can be replicated
b) Perform a goodness-of-fit test on the miles per gallon (mpg) variable to check whether it follows a normal distribution
c) Perform a goodness-of-fit test on the miles per gallon (mpg) variable to check whether its variances are equal across the categories of the gear variable
d) Perform the best one-way analysis of variance test based on the goodness-of-fit results, with justification
e) Can you use this test for this data? Interpret the result carefully, if applicable.
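
A possible workflow for this question is sketched below in base R. Several choices here are assumptions rather than requirements of the question: the seed `123` stands in for the roll number, Shapiro-Wilk is used for normality, Bartlett's test for equality of variances, and 0.05 as the significance level. The object names are illustrative.

```r
set.seed(123)  # assumption: replace 123 with your class roll number
n <- 100
cars <- data.frame(
  mpg  = runif(n, min = 10, max = 50),
  gear = factor(sample(c(3, 4, 5), size = n, replace = TRUE),
                levels = c(3, 4, 5))
)

# b) Normality of mpg: Shapiro-Wilk test
norm_test <- shapiro.test(cars$mpg)

# c) Equality of variances across gear groups: Bartlett's test
#    (fligner.test() is a base-R alternative that is less
#    sensitive to non-normal data)
var_test <- bartlett.test(mpg ~ gear, data = cars)

# d) Pick the one-way test that matches the results above
alpha <- 0.05
if (norm_test$p.value < alpha) {
  # Normality rejected: rank-based Kruskal-Wallis test
  anova_res <- kruskal.test(mpg ~ gear, data = cars)
} else {
  # Normality retained: classical ANOVA if variances look equal,
  # Welch's ANOVA otherwise
  anova_res <- oneway.test(mpg ~ gear, data = cars,
                           var.equal = var_test$p.value >= alpha)
}
anova_res
```

Since mpg and gear are generated independently, a non-significant group difference is the expected outcome, and any significant result would be a chance finding.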

Tags: goodness-of-fit, anova, r-statistics
8. Long answer (6 marks)

Do the following in RStudio using an R script so that it can be knitted as PDF:

a) Prepare a dataset with 200 random observations and four variables: miles per gallon (mpg) with random values between 10 and 50; transmission (am) as a random binary variable (0 = automatic, 1 = manual); weight (wt) with random values between 1 and 10; and horse power (hp) with random values between 125 and 400; do not forget to use your exam roll number as the random seed so that the result can be replicated
b) Divide this data into train and test datasets with a 70:30 random split, using your exam roll number as the random seed for replication
c) Fit a supervised linear regression model on the train data
d) Explain the model fit and BLUE coefficients for the fitted model
e) Predict the mpg variable in the test data, get fit indices and interpret them carefully
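
The data preparation, split, fit and prediction steps might look like the following sketch. It assumes seed `123` in place of the roll number, uniform random values, mpg as the response with all three other variables as predictors, and RMSE, MAE and squared correlation as the fit indices. All object names are illustrative.

```r
set.seed(123)  # assumption: replace 123 with your exam roll number
n <- 200
cars <- data.frame(
  mpg = runif(n, min = 10, max = 50),
  am  = sample(0:1, size = n, replace = TRUE),
  wt  = runif(n, min = 1, max = 10),
  hp  = runif(n, min = 125, max = 400)
)

# 70:30 random split, re-seeded so the split itself is reproducible
set.seed(123)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- cars[train_idx, ]
test  <- cars[-train_idx, ]

# Linear regression with mpg as the response
fit <- lm(mpg ~ am + wt + hp, data = train)
summary(fit)  # coefficients, R-squared, F-statistic

# Predict on the test data and compute fit indices
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
mae  <- mean(abs(test$mpg - pred))
r2   <- cor(test$mpg, pred)^2
```

Because all variables are generated independently, the model is expected to explain almost none of the variation in mpg, so a low R-squared and a large RMSE are the honest interpretation rather than a sign of a coding error.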

Tags: linear-regression, train-test-split, r-modeling
9. Long answer (6 marks)

Do the following in RStudio using an R script so that it can be knitted as PDF:

a) Prepare a dataset with four random variables and 300 observations: miles per gallon (mpg) with random values between 10 and 50; transmission (am) as a random binary variable (0 = automatic, 1 = manual); weight (wt) with random values between 1 and 10; and horse power (hp) with random values between 125 and 400; do not forget to use your exam roll number as the random seed so that the result can be replicated
b) Divide this data into train and test datasets with an 80:20 random split, using your exam roll number as the random seed for replication
c) Fit a supervised logistic regression model on the train data with transmission (am) as the dependent variable and miles per gallon (mpg), horse power (hp) and weight (wt) as independent variables
d) Predict the transmission variable in the test data and interpret the predicted result carefully
e) Get the confusion matrix, sensitivity and specificity of the predicted model and interpret them carefully
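
A base-R sketch of these steps follows. The seed `123`, the 0.5 classification cut-off and the choice of "manual" (1) as the positive class are assumptions made for illustration, as are the object names. Packages such as caret offer ready-made confusion-matrix helpers, but the manual calculation below avoids any dependency.

```r
set.seed(123)  # assumption: replace 123 with your exam roll number
n <- 300
cars <- data.frame(
  mpg = runif(n, min = 10, max = 50),
  am  = sample(0:1, size = n, replace = TRUE),
  wt  = runif(n, min = 1, max = 10),
  hp  = runif(n, min = 125, max = 400)
)

# 80:20 random split
set.seed(123)
train_idx <- sample(seq_len(n), size = round(0.8 * n))
train <- cars[train_idx, ]
test  <- cars[-train_idx, ]

# Logistic regression: am as the binary response
fit <- glm(am ~ mpg + hp + wt, data = train, family = binomial)

# Predicted probabilities, then classes at the assumed 0.5 cut-off
prob   <- predict(fit, newdata = test, type = "response")
pred   <- factor(ifelse(prob > 0.5, 1, 0), levels = c(0, 1))
actual <- factor(test$am, levels = c(0, 1))

# Confusion matrix with predictions in rows and truth in columns
cm <- table(Predicted = pred, Actual = actual)

# Sensitivity and specificity, treating 1 (manual) as positive
sensitivity <- cm["1", "1"] / sum(cm[, "1"])
specificity <- cm["0", "0"] / sum(cm[, "0"])
accuracy    <- sum(diag(cm)) / sum(cm)
```

With a randomly generated response that is unrelated to the predictors, accuracy close to 50% is the expected result.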

Tags: logistic-regression, confusion-matrix, r-modeling
10. Long answer (6 marks)

Do as follows using the "mtcars" dataset in RStudio using an R script so that it can be knitted as PDF:

a) Check the head and the structure of the dataset
b) Create a "cars scale" using the Principal Component Analysis (PCA) model based on nine numerical variables with centering and scaling of the variables
c) Based on the PCA summary result, how many components must be extracted? Why?
d) Get the bi-plot of the fitted model and interpret it carefully
e) Improve the fitted model with the VARIMAX process and interpret the results carefully
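
One way to approach this option in base R is sketched below. The question does not say which nine variables to use; the sketch assumes the two binary columns `vs` and `am` are dropped. The eigenvalue-greater-than-one (Kaiser) rule for choosing the number of components is likewise an assumption, and a scree plot is a common alternative. Object names are illustrative.

```r
# a) Head and structure
head(mtcars)
str(mtcars)

# b) PCA on nine variables (assumption: drop the binary vs and am)
cars9 <- mtcars[, setdiff(names(mtcars), c("vs", "am"))]
pca <- prcomp(cars9, center = TRUE, scale. = TRUE)

# c) Variance explained; keep components with eigenvalue > 1
summary(pca)
eigenvalues <- pca$sdev^2
n_keep <- sum(eigenvalues > 1)

# d) Bi-plot of the first two components
biplot(pca, scale = 0)

# e) Varimax rotation of the retained loadings
#    (eigenvectors scaled by the component standard deviations)
keep <- seq_len(n_keep)
loadings_kept <- pca$rotation[, keep, drop = FALSE] %*%
  diag(pca$sdev[keep], nrow = n_keep)
rotated <- varimax(loadings_kept)
rotated$loadings
```

After rotation each variable should load mainly on one component, which usually makes the components easier to name than the unrotated solution.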

OR

Do as follows using the "USArrests" dataset in RStudio using an R script so that it can be knitted as PDF:

a) Get the dissimilarity distance as a state.dissimilarity object
b) Fit a classical multidimensional scaling model using the state.dissimilarity object
c) Get the summary of the model and interpret it carefully
d) Get the plot of the model and interpret it carefully
e) Compare this model with the first two components from principal component analysis on this data
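
The steps of this option can be sketched with `dist()`, `cmdscale()` and `prcomp()` from base R. The sketch assumes Euclidean distances on standardized variables and a two-dimensional solution; neither is specified in the question. Since `cmdscale()` returns a plain list, its eigenvalues and goodness-of-fit values stand in for a model summary here.

```r
# a) Euclidean distances between states on standardized variables
state.dissimilarity <- dist(scale(USArrests))

# b) Classical (metric) multidimensional scaling in two dimensions
mds <- cmdscale(state.dissimilarity, k = 2, eig = TRUE)

# c) Summary information: leading eigenvalues and goodness of fit
head(mds$eig)
mds$GOF

# d) Plot the two-dimensional configuration with state labels
plot(mds$points, type = "n", xlab = "Dimension 1", ylab = "Dimension 2",
     main = "Classical MDS of USArrests")
text(mds$points, labels = rownames(USArrests), cex = 0.7)

# e) Compare with the first two principal components
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
max_diff <- max(abs(abs(mds$points) - abs(pca$x[, 1:2])))
max_diff
```

With Euclidean distances on the same standardized data, classical MDS coordinates coincide with the principal component scores up to the sign of each axis, so `max_diff` should be numerically zero.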

Tags: pca, multidimensional-scaling, r-modeling
