Probability Engine · MDS 503

Statistical Computing with R: what's likely to come

Every real question from 14 past papers across 5 years (board exams and internal assessments) is mapped to its official syllabus unit. Each prediction shows its receipt: the actual years it appeared.

Papers analyzed: 14 (2078-2082 · 5 yrs, multiple sittings)
Syllabus units: 6 (from the official course)
Very likely units: 5 (high-probability topics)
Units = 80% of marks: 4 (study these first)

14 papers across 5 years

This program sits several exams each year: one official board exam plus internal assessments. Every sitting is analysed.

2078 · 3 sittings: first assessment, first reassessment, second assessment
2079 · 1 sitting: board
2080 · 4 sittings: board, first reassessment, first assessment, second assessment
2081 · 4 sittings: board, first assessment, first reassessment, second assessment
2082 · 2 sittings: board, second assessment
01 The ranking

Topic predictions, ordered by what to study first

Every syllabus unit scored by how often it appears, its mark-weight, and its trend. See the exact questions behind each unit in the Explore-by-unit section below.

| # | Unit | Syllabus unit | Probability | Appeared | Avg marks | Syllabus weight | Exam vs syllabus | Trend | Questions |
|---|------|---------------|-------------|----------|-----------|-----------------|------------------|-------|-----------|
| 1 | U3 | R Software for Data Summary and Visualization | Very likely | 100% | 35.4 | 21% (10 lecture hrs) | Over-examined (exam 28% · syllabus 21%) | Steady | 3 recurring, 30 total |
| 2 | U4 | R Software for Supervised Learning | Very likely | 100% | 33.6 | 21% (10 lecture hrs) | Over-examined (exam 27% · syllabus 21%) | Steady | 1 recurring, 37 total |
| 3 | U2 | R Software for Data Manipulation | Very likely | 100% | 21.6 | 12% (6 lecture hrs) | Balanced (exam 17% · syllabus 12%) | Steady | 4 recurring, 23 total |
| 4 | U5 | R Software for Unsupervised Learning | Very likely | 100% | 16.2 | 17% (8 lecture hrs) | Balanced (exam 13% · syllabus 17%) | Steady | none repeat, 15 total |
| 5 | U1 | R Software for Basic Programming | Very likely | 80% | 24 | 17% (8 lecture hrs) | Balanced (exam 15% · syllabus 17%) | Fading | 5 recurring, 18 total |
| 6 | U6 | R Software for Communication | Occasional | 0% | 0 | 12% (6 lecture hrs) | Under-examined (exam 0% · syllabus 12%) | Steady | None |
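The page does not state how the "exam" share in the ranking is computed. One reading that happens to reproduce the published figures is to treat each unit's expected marks as appearance rate times average marks, then normalize across units. The sketch below works under that assumption; the 5-point gap used to separate "Balanced" from over- or under-examined is also an assumption chosen to match the labels shown, not a documented rule.

```python
# Figures copied from the ranking table:
# (unit, appearance rate, average marks, syllabus weight in %).
UNITS = [
    ("U3", 1.00, 35.4, 21),
    ("U4", 1.00, 33.6, 21),
    ("U2", 1.00, 21.6, 12),
    ("U5", 1.00, 16.2, 17),
    ("U1", 0.80, 24.0, 17),
    ("U6", 0.00, 0.0, 12),
]

# Assumption: a gap of more than 5 percentage points marks a unit as
# over- or under-examined; anything inside that band is "Balanced".
GAP_THRESHOLD = 5


def exam_shares(units):
    """Return {unit: exam share in %}, assuming expected marks = rate * avg marks."""
    expected = {name: rate * marks for name, rate, marks, _ in units}
    total = sum(expected.values())
    return {name: 100 * value / total for name, value in expected.items()}


def label(exam_pct, syllabus_pct, threshold=GAP_THRESHOLD):
    """Classify a unit by the gap between its exam share and syllabus weight."""
    gap = exam_pct - syllabus_pct
    if gap > threshold:
        return "Over-examined"
    if gap < -threshold:
        return "Under-examined"
    return "Balanced"


shares = exam_shares(UNITS)
for name, _, _, syllabus in UNITS:
    exam = round(shares[name])
    print(f"{name}: exam {exam}% · syllabus {syllabus}% -> {label(exam, syllabus)}")
```

Under these assumptions the rounded shares come out as 28, 27, 17, 13, 15 and 0 percent, matching the table.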
02 Drill down

Explore by unit: every question, ranked

Pick a syllabus unit and walk its questions from most-important to asked-once. The fastest way to revise one topic end to end.

U3 · R Software for Data Summary and Visualization
30 questions · 36 appearances · 35.4 avg marks
03 High yield

Most-asked questions across all years

The questions that come back exam after exam, grouped across years and ranked by how often they're asked. Open one to read its real past answer.

Lowest priority: asked only once (110)

  • U2

    Load the "igraph" package in R studio and do the basic SNA with R script to knit PDF output: a) Define g1 as graph object with ("R", "S", "S", "T", "T", "R", "R", "T", "U", "S") as its elements b) Plot g1 with node color as green, node size as 30, link color as red and link size as 5 and interpret it c) Get degree of g1 and interpret them carefully d) Get closeness of g1 and interpret them carefully

    OR

    Do the following in R Studio using "airquality" dataset with R script to knit PDF output: a) Replace missing values of "Ozone" variable with its median and save it as "corrected Ozone" b) Get the histogram of the "corrected Ozone" variable using base R plot and interpret it carefully c) Get the boxplot of "corrected Ozone" variable using base R plot and interpret it carefully d) Get the appropriate summary measures of "corrected Ozone" variable with justification

    2082
  • U3

    Do as follows in R Studio with R script to knit PDF output: a) Open R and then go to Help and "Manuals in PDF" and open "An Introduction to R" file b) Import this pdf file in R studio using "pdftools" package c) Perform pre-processing and create 'corpus' afterwards using "tm" package d) Find the most frequent terms and create histogram of the most frequent terms

    OR

    Do the following in R Studio using "mtcars" dataset with R script to knit PDF output: a) Get the bar plot of the "mpg" variable using ggplot2 package and interpret it carefully b) Get the boxplot of "mpg" variable using ggplot2 package and interpret it carefully c) Get scatterplot of "mpg" and "wt" variable using ggplot2 package and interpret it carefully d) Get appropriate correlation coefficient for "mpg" and "wt" and interpret it carefully

    2082
  • U3

    Do the following in R Studio using ggplot2 and dplyr packages and knit output as PDF file: a) Create a dataset with following variables: age (20-59 years), height (110 – 190 centimeters), weight (40-90 kg) with random 150 cases of each variable. Your roll number must be used to set the random seed. b) Compute body mass index (BMI) variable as: BMI = [(weight in kg) / (height in meter squared)] c) Create body mass index categories: <18, 18-24, 25-30, 30+ and label them as "underweight", "normal", "overweight" and "obese" respectively using dplyr package d) Show the percentage distribution of labelled BMI variable with pie chart using ggplot2 package

    2082
  • U4

    Do the following in R Studio using "airquality" dataset with R markdown to knit PDF output: a) Perform Shapiro-Wilk test on "Wind" variable and check normality of this variable b) Perform Bartlett test on "Wind" variable by "Month" variable and check equality of variance c) Fit 1-way ANOVA to compare "Wind" variable by "Month" variable and interpret the result carefully d) Fit the TukeyHSD post-hoc test with 95% confidence interval and interpret the result carefully

    2082
  • U4

    Do the following in R Studio using "mtcars" dataset with R markdown to knit PDF output: a) Divide the data into train and test datasets with 70:30 random splits and your roll number as random seed b) Fit a supervised linear regression model and KNN regression model on train data with "mpg" as dependent variable and all other variables as independent variables c) Predict the miles per gallon variable in the test data using these models and get values for "wt=6000 lbs" d) Compare the fit indices (R-square, MSE, RMSE) of the predicted models and choose the best model

    2082
  • U4

    Do the following in R Studio using "airquality" dataset of R with R script to knit PDF output: a) Perform test of normality of "Temp" variable by each category of "Month" variable and interpret all the results carefully b) Perform test of equality of variance on "Temp" variable by each category of Month variable and interpret the result carefully c) Compare Temp by Month using the best statistical test for this data and interpret the result carefully d) Perform the post-hoc test and interpret the results carefully

    2082
  • U4

    Do the following in R Studio with R script to knit PDF output: a) Create a dataset with 200 random cases, 1 random binary (1 and 0) variable and four random non-binary (categorical and continuous) variables with your roll number as random seed b) Divide it into train and test datasets with 70:30 random splits c) Fit a supervised logistic regression and decision tree classification models on train data with binary variable as dependent variable and all other four variables as independent variable d) Predict the released variable in the test datasets of both the models and interpret the results carefully

    2082
  • U5

    Use the "USArrests" data and do as follows in the R Studio with R markdown to knit PDF output: a) Fit a hierarchical clustering model using single linkage and get the dendrogram for this model b) Fit a hierarchical clustering model using complete linkage and get the dendrogram for this model c) Fit a hierarchical clustering model using average linkage and get the dendrogram for this model d) Show the number of clusters (k) to retain for the data using ablines in the dendrogram of the best model

    2082
  • U5

    Do as follows using in-built "USArrests" dataset with R script to knit PDF output: a) Create a "criminality scale" of four variables of this dataset using the Principal Component Analysis b) Compute the eigenvalues and interpret the PCA result carefully using Kaiser's criteria c) Show the Scree plot and decide on the number of components to retain with careful interpretation d) Revise the criminality scale using VARIMAX rotation and interpret the result carefully

    OR

    Do as follows using given dataset of 10 US cities in R studio with R script:

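    | City | Atlanta | Chicago | Denver | Houston | Los Angeles | Miami | New York | San Francisco | Seattle | Washington D.C |
    |------|---------|---------|--------|---------|-------------|-------|----------|---------------|---------|----------------|
    | Atlanta | 0 | 587 | 1212 | 701 | 1936 | 604 | 748 | 2139 | 2182 | 543 |
    | Chicago | 587 | 0 | 920 | 940 | 1745 | 1188 | 713 | 1858 | 1737 | 597 |
    | Denver | 1212 | 920 | 0 | 879 | 831 | 1726 | 1631 | 949 | 1021 | 1494 |
    | Houston | 701 | 940 | 879 | 0 | 1374 | 968 | 1420 | 1645 | 1891 | 1220 |
    | Los Angeles | 1936 | 1745 | 831 | 1374 | 0 | 2339 | 2451 | 347 | 959 | 2300 |
    | Miami | 604 | 1188 | 1726 | 968 | 2339 | 0 | 1092 | 2594 | 2734 | 923 |
    | New York | 748 | 713 | 1631 | 1420 | 2451 | 1092 | 0 | 2571 | 2408 | 205 |
    | San Francisco | 2139 | 1858 | 949 | 1645 | 347 | 2594 | 2571 | 0 | 678 | 2442 |
    | Seattle | 2182 | 1737 | 1021 | 1891 | 959 | 2734 | 2408 | 678 | 0 | 2329 |
    | Washington D.C | 543 | 597 | 1494 | 1220 | 2300 | 923 | 205 | 2442 | 2329 | 0 |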

    a) Get dissimilarity distance as city.dissimilarity object b) Fit a classical multidimensional model using the city.dissimilarity object c) Get the summary of the model and interpret it carefully d) Get the bi-plot of the model and interpret it carefully

    2082
  • U5

    Do the following in R Studio with R script to knit PDF output: a) Create a dataset with 200 random cases and five random variables with your roll number as random seed b) Fit a hierarchical clustering model using single linkage and get the dendrogram for this model c) Fit a hierarchical clustering model using complete linkage and get the dendrogram for this model d) Fit a hierarchical clustering model using average linkage and get the dendrogram for this model e) Find the best hierarchical clustering model for this data and locate the number of clusters for it

    OR

    Use the four variables of "USArrests" data file into R Studio and do as follows with R script to knit PDF output: a) Fit a k-means clustering model in the data with k=2, plot it in the single graph and interpret it carefully b) Add cluster centers for the plot of clusters formed with k=2 and interpret it carefully c) Fit a k-means clustering model in the data with k=3, plot it in the single graph and interpret it carefully d) Add cluster centers for the plot of clusters formed with k=3 and interpret it carefully

    2082
  • U1

    Open the R or R studio software and do the following with R script to produce HTML output: a) Define integers from 1 to 15 using three different coding approaches in R b) Define these five numbers: 1.1, 2.2, 3.3, 4.4 and 5.5 and save it as column vector N c) Add, subtract, multiply and divide vector R from vector N and interpret the results carefully d) Define a list using "This" "is" "my" "first" "programming" "in" "R" and save it as L

    2081
  • U1

    Import the "airquality" data as "aq" object in R studio and do as follows in R script for HTML output: a) Check the structure of the aq dataset and explain class of each variable b) Explain how to handle missing value of two variables of aq dataset c) Get the summary of all the variables and interpret them carefully d) Get summary statistics of "temp" variables by "Month" categories and interpret it carefully

    2081
04 The strategy

Study smart and sit a probable paper

How far a few high-priority topics take you, and a full mock paper built from the most likely questions, mirroring the real exam structure.

Study smart, not hard

Study the units in priority order. Each bar shows the cumulative share of total marks you'd have covered by then. The top 4 units alone cover about 85% of marks, which is where the running total first passes 80%.

1. R Software for Data Summary and Visualization: 28%
2. + R Software for Supervised Learning: 55%
3. + R Software for Data Manipulation: 72%
4. + R Software for Unsupervised Learning: 85%
← study up to here for ~80% of marks
5. + R Software for Basic Programming: 100%
6. + R Software for Communication: 100%
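The running totals above can be reproduced with a short sketch. It assumes, as one plausible reading of the ranking table rather than a documented method, that each unit's expected marks are its appearance rate times its average marks; the 80% target mirrors the cut-off marker in the list.

```python
from itertools import accumulate

# Units in priority order with assumed expected marks
# (appearance rate * average marks, figures taken from the ranking table).
EXPECTED_MARKS = [
    ("R Software for Data Summary and Visualization", 1.00 * 35.4),
    ("R Software for Supervised Learning", 1.00 * 33.6),
    ("R Software for Data Manipulation", 1.00 * 21.6),
    ("R Software for Unsupervised Learning", 1.00 * 16.2),
    ("R Software for Basic Programming", 0.80 * 24.0),
    ("R Software for Communication", 0.00 * 0.0),
]

TARGET = 80  # percent of marks to cover before stopping


def cumulative_coverage(expected):
    """Return [(unit, cumulative share of marks in %)] in the given order."""
    total = sum(marks for _, marks in expected)
    running = accumulate(marks for _, marks in expected)
    return [(name, 100 * r / total) for (name, _), r in zip(expected, running)]


coverage = cumulative_coverage(EXPECTED_MARKS)

# Number of units needed before the running total first reaches the target.
units_needed = next(
    i for i, (_, pct) in enumerate(coverage, start=1) if pct >= TARGET
)

for name, pct in coverage:
    print(f"{name}: {round(pct)}%")
print(f"Units needed for {TARGET}% of marks: {units_needed}")
```

With these inputs the cumulative shares round to 28, 55, 72, 85, 100 and 100 percent, and the fourth unit is the first to pass the 80% target.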

Most Probable Paper

Mirrors the real structure · 45 marks · based on 5 past papers

Group A
  1.

    Explain the following concept with examples: a) Grammar of graphics – Wilkinson's approach b) Five number summary

    [3 marks]
    R Software for Data Summary and Visualization · Very likely · from 2082 paper

    Asked once, in the 2082 board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.

  2.

    Explain the following concepts with focus on R software:

    a) Boxplot with five number summaries with example b) Boxplot with outliers with example

    [3 marks]
    R Software for Data Summary and Visualization · Very likely · from 2081 paper

    Asked once, in the 2081 board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.

  3.

    Explain the following with examples for R:

    a) Reference range based on mean b) Reference range based on median c) Outliers and extreme values

    [3 marks]
    R Software for Data Summary and Visualization · Very likely · from 2080 paper

    Appeared twice in 2080, once in the board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.

  4.

    Explain the following concepts with focus on R software:

    a) Test of normality b) Parametric tests c) Residual analysis

    [3 marks]
    R Software for Data Summary and Visualization · Very likely · from 2080 paper

    Asked once, in the 2080 board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.

  5.

    Explain the following concept with examples focusing on R software:

    a) Measures of central tendency b) Measures of dispersion c) Measures of relative position

    [3 marks]
    R Software for Data Summary and Visualization · Very likely · from 2079 paper

    Asked once, in the 2079 board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.

Group B
  1.

    Load the “igraph” package in R studio and do the basic SNA as follows with R scripts to knit HTML output: a) Define g as graph object with (1,2,3,4) as its elements b) Plot the g and interpret it carefully c) Define g1 as graph object with (“Sita”, “Ram”, “Rita”, “Gita”, “Gita”, “Sita”, “Sita”, “Gita”, “Anita”, “Rita”, “Ram”, “Sita”) as its elements d) Plot g1 with node color as green, node size as 20, link color as red and link size as 10 and interpret it e) Get degree, closeness and betweenness of g1 and interpret them carefully.

    OR

    Do as follows in the R console and then in R Studio with R script to knit HTML output: a) Open R console and then go to Help and Manuals (in PDF) and open “An Introduction to R” file b) Save this file in the working directory and import this pdf file in R studio using “pdftools” package c) Perform pre-processing and create ‘corpus’ using “tm” package d) Find the most frequent terms and create histogram of the most frequent terms e) Create word cloud of the corpus, color it using rainbow or R Color Brewer package f) Perform topic modelling and interpret the result carefully

    [6 marks]
    R Software for Data Summary and Visualization · Very likely · from 2081 paper

    Recurred in 3 of 5 years, so far only in internal assessments and not in the board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.

  2.

    Do as follows in R Studio with R script to knit PDF output: a) Open R and then go to Help and "Manuals in PDF" and open "An Introduction to R" file b) Import this pdf file in R studio using "pdftools" package c) Perform pre-processing and create 'corpus' afterwards using "tm" package d) Find the most frequent terms and create histogram of the most frequent terms

    OR

    Do the following in R Studio using "mtcars" dataset with R script to knit PDF output: a) Get the bar plot of the "mpg" variable using ggplot2 package and interpret it carefully b) Get the boxplot of "mpg" variable using ggplot2 package and interpret it carefully c) Get scatterplot of "mpg" and "wt" variable using ggplot2 package and interpret it carefully d) Get appropriate correlation coefficient for "mpg" and "wt" and interpret it carefully

    [6 marks]
    R Software for Data Summary and Visualization · Very likely · from 2082 paper

    Asked once, in the 2082 board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.

  3.

    Load the "igraph" package in R studio and do the basic SNA as follows with R script and HTML output:

    a) Define "g1" as graph object with ("R", "S", "S", "T", "T", "R", "R", "T", "U", "S") as its elements b) Plot "g1" with node color as green, node size as 30, link color as red and link size as 5 and interpret it c) Get degree, closeness and betweenness of g1 and interpret them carefully d) Get hub and communities of this data and interpret them carefully

    OR

    Do the following in R Studio using "airquality" dataset with R script:

    a) Replace missing values of "Ozone" variable with median of this variable as corrected Ozone b) Get the histogram of the corrected Ozone variable using base R plot and interpret it carefully c) Get the boxplot of Wind variable using base R plot and interpret it carefully d) Get the Wind variable outliers using median and interquartile range and compare them with boxplot outlier values with justification

    [6 marks]
    R Software for Data Summary and Visualization · Very likely · from 2081 paper

    Asked once, in the 2081 board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.

  4.

    Do as follows in R Studio with R script and HTML output:

    a) Open R and go to Help and Manuals in PDF and open "An Introduction to R" file b) Import this pdf file in R using "pdftools" package c) Perform pre-processing and create 'corpus' afterwards d) Find the most frequent terms, create its bar diagram and interpret carefully

    OR

    Do the following in R Studio using "airquality" dataset with R script:

    a) Get the boxplot of Temp variable using ggplot2 package and interpret it carefully b) Create class intervals of Temp variable using dplyr package and show it as frequency distribution c) Get pie chart of Temp variable class intervals using ggplot2 package and interpret it carefully d) Get scatter plot of corrected Temp and Wind variables using ggplot2 package and interpret it carefully

    [6 marks]
    R Software for Data Summary and Visualization · Very likely · from 2081 paper

    Asked once, in the 2081 board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.

  5.

    Do the following in R Studio with R script:

    a) Create a dataset with following variables: age (18-99 years), sex (male/female), educational levels (No education/Primary/Secondary/Beyond secondary), socio-economic status (Low, Middle, High) and body mass index (14 – 38) with 150 random cases of each variable. Your exam roll number must be used to set the random seed. b) Show a sub-divided bar diagram of body mass index variable by sex and socio-economic variables separately with interpretations. c) Show multiple bar diagram of age variable with sex and educational level variables and interpret it carefully. d) Show boxplots of age and body mass index variable separately and interpret the results carefully. e) Create histogram of age and body mass index variable separately and interpret the results carefully.

    [6 marks]
    R Software for Data Summary and Visualization · Very likely · from 2080 paper

    Asked once, in the 2080 board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.

Topics are the official MDS 503 syllabus units. Predictions are data-driven probabilities computed from 14 past papers (2078-2082) by mapping each real question to its syllabus unit. They indicate what has historically been likely, not guaranteed questions. Always study the full syllabus.