Statistical Computing with R: what's likely to come
Every real question from 14 past papers across 5 years (board + internal assessments each year) mapped to its official syllabus unit. Each prediction shows its receipt: the actual years it appeared.
14 papers across 5 years
This program sits several exams each year: one official board exam plus internal assessments. Every sitting is analysed.
Topic predictions, ordered by what to study first
Every syllabus unit scored by how often it appears, its mark-weight, and its trend. See the exact questions behind each unit in the Explore-by-unit section below.
| # | Syllabus unit | Probability | Appeared | Avg marks | Syllabus weight | Exam vs syllabus | Trend | Questions |
|---|---|---|---|---|---|---|---|---|
| 1 | U3: R Software for Data Summary and Visualization | Very likely | 100% | 35.4 | 21% (10 lecture hrs) | Over-examined (exam 28% · syllabus 21%) | Steady | 3 recurring, 30 total |
| 2 | U4: R Software for Supervised Learning | Very likely | 100% | 33.6 | 21% (10 lecture hrs) | Over-examined (exam 27% · syllabus 21%) | Steady | 1 recurring, 37 total |
| 3 | U2: R Software for Data Manipulation | Very likely | 100% | 21.6 | 12% (6 lecture hrs) | Balanced (exam 17% · syllabus 12%) | Steady | 4 recurring, 23 total |
| 4 | U5: R Software for Unsupervised Learning | Very likely | 100% | 16.2 | 17% (8 lecture hrs) | Balanced (exam 13% · syllabus 17%) | Steady | None repeat, 15 total |
| 5 | U1: R Software for Basic Programming | Very likely | 80% | 24 | 17% (8 lecture hrs) | Balanced (exam 15% · syllabus 17%) | Fading | 5 recurring, 18 total |
| 6 | U6: R Software for Communication | Occasional | 0% | 0 | 12% (6 lecture hrs) | Under-examined (exam 0% · syllabus 12%) | Steady | None |
Explore by unit: every question, ranked
Pick a syllabus unit and walk its questions from most-important to asked-once. The fastest way to revise one topic end to end.
Load the “igraph” package in R studio and do the basic SNA as follows with R scripts to knit HTML output: a) Define g as graph object with (1,2,3,4) as its elements b) Plot the g and interpret it carefully c) Define g1 as graph object with (“Sita”, “Ram”, “Rita”, “Gita”, “Gita”, “Sita”, “Sita”, “Gita”, “Anita”, “Rita”, “Ram”, “Sita”) as its elements d) Plot g1 with node color as green, node size as 20, link color as red and link size as 10 and interpret it e) Get degree, closeness and betweenness of g1 and interpret them carefully.
OR
Do as follows in R console and then in R Studio with R script to knit HTML outputs: a) Open R console and then go to Help and Manuals (in PDF) and open “An Introduction to R” file b) Save this file in the working directory and import this pdf file in R studio using “pdftools” package c) Perform pre-processing and create ‘corpus’ using “tm” package d) Find the most frequent terms and create histogram of the most frequent terms e) Create word cloud of the corpus, color it using rainbow or RColorBrewer package f) Perform topic modelling and interpret the result carefully
Explain social network analysis and describe its use in a real-life situation with: a) Nodes b) Links c) Attributes
Explain the following with examples in R:
a) Reference range based on mean b) Reference range based on median c) Outliers and extreme values
Do as follows in R Studio and do as follows with R script to knit PDF output: a) Open R and then go to Help and "Manuals in PDF" and open "An Introduction to R" file b) Import this pdf file in R studio using "pdftools" package c) Perform pre-processing and create 'corpus' afterwards using "tm" package d) Find the most frequent terms and create histogram of the most frequent terms
OR
Do the following in R Studio using "mtcars" dataset with R script to knit PDF output: a) Get the bar plot of the "mpg" variable using ggplot2 package and interpret it carefully b) Get the boxplot of "mpg" variable using ggplot2 package and interpret it carefully c) Get scatterplot of "mpg" and "wt" variable using ggplot2 package and interpret it carefully d) Get appropriate correlation coefficient for "mpg" and "wt" and interpret it carefully
Do the following in R Studio using ggplot2 and dplyr packages and knit output as PDF file: a) Create a dataset with following variables: age (20-59 years), height (110 – 190 centimeters), weight (40-90 kg) with random 150 cases of each variable. Your roll number must be used to set the random seed. b) Compute body mass index (BMI) variable as: BMI = [(weight in kg) / (height in meter squared)] c) Create body mass index categories: <18, 18-24, 25-30, 30+ and label them as "underweight", "normal", "overweight" and "obese" respectively using dplyr package d) Show the percentage distribution of labelled BMI variable with pie chart using ggplot2 package
Load the "igraph" package in R studio and do the basic SNA as follows with R script and HTML output:
a) Define "g1" as graph object with ("R", "S", "S", "T", "T", "R", "R", "T", "U", "S") as its elements b) Plot "g1" with node color as green, node size as 30, link color as red and link size as 5 and interpret it c) Get degree, closeness and betweenness of g1 and interpret them carefully d) Get hub and communities of this data and interpret them carefully
OR
Do the following in R Studio using "airquality" dataset with R script:
a) Replace missing values of "Ozone" variable with median of this variable as corrected Ozone b) Get the histogram of the corrected Ozone variable using base R plot and interpret it carefully c) Get the boxplot of Wind variable using base R plot and interpret it carefully d) Get the Wind variable outliers using median and interquartile range and compare them with boxplot outlier values with justification
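A minimal base R sketch of the "airquality" task above, offered as one possible approach rather than a model answer. The column name `Ozone_corrected` and the 1.5 × IQR fence are assumptions; the question does not fix either.

```r
# Work on a copy so the built-in dataset stays untouched
aq <- airquality

# a) Replace missing Ozone values with the median of the observed values
#    (Ozone_corrected is an assumed column name)
ozone_median <- median(aq$Ozone, na.rm = TRUE)
aq$Ozone_corrected <- ifelse(is.na(aq$Ozone), ozone_median, aq$Ozone)

# b) Histogram of the corrected variable with base R
hist(aq$Ozone_corrected, main = "Corrected Ozone", xlab = "Ozone")

# c) Boxplot of Wind; boxplot() invisibly returns the statistics it drew
bp <- boxplot(aq$Wind, main = "Wind", ylab = "Wind")

# d) Outliers by the median/IQR rule (assumed fences: Q1 - 1.5*IQR, Q3 + 1.5*IQR)
q <- quantile(aq$Wind, probs = c(0.25, 0.75))
iqr <- IQR(aq$Wind)
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
wind_outliers <- aq$Wind[aq$Wind < lower | aq$Wind > upper]

# Compare the manually flagged values with the ones the boxplot flagged
wind_outliers
bp$out
```

`boxplot()` builds its fences from hinges (`fivenum()`), which can differ slightly from the default `quantile()` quartiles for some sample sizes, so the comparison in part d) is worth justifying rather than assuming.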
Do as follows in R Studio and do as follows with R script and HTML outputs:
a) Open R and go to Help and Manuals in PDF and open "An Introduction to R" file b) Import this pdf file in R using "pdftools" package c) Perform pre-processing and create 'corpus' afterwards d) Find the most frequent terms, create its bar diagram and interpret carefully
OR
Do the following in R Studio using "airquality" dataset with R script:
a) Get the boxplot of Temp variable using ggplot2 package and interpret it carefully b) Create class intervals of Temp variable using dplyr package and show it as frequency distribution c) Get pie chart of Temp variable class intervals using ggplot2 package and interpret it carefully d) Get scatter plot of corrected Temp and Wind variables using ggplot2 package and interpret it carefully
Use the “mtcars” dataset with the tidyverse package and do as follows with R script to knit HTML output: a) Plot histogram of mpg variable and interpret it carefully b) Refine the histogram by filling the bars with “blue” color and changing number of bins to 10 c) Add a vertical abline at mean of the mpg variable d) Plot Q-Q plot of mpg variable, add normal Q-Q line of red color on it and interpret it carefully e) Plot density plot of mpg variable without the border, fill it with yellow color and interpret it
OR
Use the “airquality” dataset of R to do the following using base R to knit HTML output with R script: a) Create line plot of “Temp” with “Day” as the row index and interpret it carefully b) Create bar plot of “Temp” variable after defining class intervals systematically c) Create histogram of “Temp” variable and compare it with the bar plot of “Temp” variable d) Plot Normal Q-Q plot of “Temp” variable and interpret it carefully e) Create a scatter plot of “Temp” and “Wind” variables and interpret it carefully
Use the "aq" datafile in R studio and do as follows with R script to produce HTML outputs: a) Create bar plot of "Month" variable and interpret it carefully b) Create histogram of "Temp" variable and interpret it carefully c) Create line plot of "Temp" and "Day" variables and interpret it carefully d) Create scatterplot of "Ozone" and "Solar.R" variables and interpret it carefully
OR
Use the "mpg" dataset of tidyverse package and do as follows with R script to knit HTML output: a) Plot histogram of hwy variable and interpret it carefully b) Add a vertical abline at mean and standard deviation of hwy variable and interpret it carefully c) Locate mode of hwy variable graphically and interpret it carefully d) Locate median of hwy variable graphically and interpret it carefully
Do the following in R Studio using ggplot2 package with R script to knit PDF output:
a) Create a dataset with following variables: age (10-99 years), sex (male/female), educational levels (No education/Primary/Secondary/Beyond secondary), socio-economic status (Low, Middle, High) and body mass index (14 – 38) with random 200 cases of each variable. Your roll number must be used to set the random seed.
b) Create scatter plot of age and body mass index variables using ggplot2 package and interpret the result carefully.
c) Create classes of body mass index variable as: <18, 18-24, 25-30, 30+ and show it as pie chart using ggplot2 package and interpret it carefully
d) Create histogram of age variable with bin size of 15 using the ggplot2 package and interpret it carefully
Do the following in R Studio with R script:
a) Create a dataset with following variables: age (18-99 years), sex (male/female), educational levels (No education/Primary/Secondary/Beyond secondary), socio-economic status (Low, Middle, High) and body mass index (14 – 38) with 150 random cases of each variable. Your exam roll number must be used to set the random seed. b) Show a sub-divided bar diagram of body mass index variable by sex and socio-economic variables separately with interpretations. c) Show multiple bar diagram of age variable with sex and educational level variables and interpret it carefully. d) Show boxplots of age and body mass index variable separately and interpret the results carefully. e) Create histogram of age and body mass index variable separately and interpret the results carefully.
Use the cleaned "AQ" file in R studio and do as follows with R Scripts and HTML outputs:
a) Get reference range of "Ozone" variable using mean and standard deviation b) Plot histogram of "Ozone" variable and show the outliers of "Ozone" with reference range limits c) Get reference range of "Ozone" variable using median and inter-quartile range d) Plot boxplot of "Ozone" variable and show the outliers of "Ozone" with reference range limits e) Write a summary of the results obtained from the histogram and boxplot
OR
Do as follows in R Studio and do as follows with R script and HTML outputs:
a) Open R and then go to Help and Manuals in PDF and open "An Introduction to R" file b) Import this pdf file in R using "pdftools" package c) Perform pre-processing and create 'corpus' afterwards d) Find the most frequent terms and create histogram of the most frequent terms e) Create word cloud of the corpus, color it using rainbow or RColorBrewer package f) Perform topic modelling and interpret the result carefully
Use the cleaned "AQ" object to do following in R Studio with R script to knit HTML output:
a) Create line plot of "Temp" with "Day" as the row index and interpret it carefully b) Create bar plot of "Temp" variable after defining class intervals systematically c) Create histogram of "Temp" variable and compare it with the bar plot of "Temp" variable d) Plot Normal Q-Q plot of "Temp" variable and interpret it carefully e) Create a scatter plot of "Temp" and "Wind" variables and interpret it carefully
Do the following in R Studio with tidyverse package using R Script to knit HTML output:
a) Define a tibble having country, year, cases and population variables with 10 random data each b) Transform this tibble to long format and interpret it carefully in terms of tidy data format c) Transform the cases variable as log of cases (LnCase) and population variable as log of population (LnPop) d) Create scatter plots of 1. Cases and population, 2. LnCase and population, 3. Cases and LnPop and 4. LnCase and LnPop in a single graph window and interpret it carefully
OR
Use the cleaned "AQ" file in R studio and do as follows with R Scripts and HTML outputs:
a) Get reference range of "Temp" variable using mean and standard deviation b) Plot histogram of "Temp" variable and show the outliers of "Temp" with reference range limits c) Get reference range of "Temp" variable using median and inter-quartile range d) Plot box plot of "Temp" variable and show the outliers of "Temp" with reference range limits e) Which measure of central tendency and dispersion should be used for this variable? Why?
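A hedged base R sketch of the two reference ranges asked for above. It assumes the cleaned "AQ" data behaves like the built-in `airquality` dataset, takes mean ± 2 SD as the mean-based range, and takes Q1 − 1.5 × IQR to Q3 + 1.5 × IQR as the median-based range; the question itself does not state these multipliers.

```r
# Assumed stand-in for the cleaned "AQ" data
temp <- airquality$Temp

# a) Reference range from mean and standard deviation (assumed: mean +/- 2 SD)
temp_mean <- mean(temp)
temp_sd <- sd(temp)
range_mean <- c(lower = temp_mean - 2 * temp_sd,
                upper = temp_mean + 2 * temp_sd)

# b) Histogram with the mean-based limits drawn as dashed red lines
hist(temp, main = "Temp with mean-based limits", xlab = "Temp")
abline(v = range_mean, col = "red", lty = 2)
out_mean <- temp[temp < range_mean[["lower"]] | temp > range_mean[["upper"]]]

# c) Reference range from median and IQR (assumed: 1.5 * IQR fences)
q <- quantile(temp, probs = c(0.25, 0.75))
range_median <- c(lower = q[[1]] - 1.5 * IQR(temp),
                  upper = q[[2]] + 1.5 * IQR(temp))

# d) Horizontal boxplot with the median-based limits
boxplot(temp, horizontal = TRUE, main = "Temp with median-based limits")
abline(v = range_median, col = "red", lty = 2)
out_median <- temp[temp < range_median[["lower"]] | temp > range_median[["upper"]]]

range_mean
range_median
```

For part e), comparing `out_mean` with `out_median` alongside the shape of the histogram is one way to argue which pair of measures suits the variable.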
Do the following in R Studio using ggplot2 package with R script:
(a) Create a dataset with following variables: age (18-99 years), sex (male/female), educational levels (No education/Primary/Secondary/Beyond secondary), socio-economic status (Low, Middle, High) and body mass index (14 – 38) with random 100 cases of each variable. Your roll number must be used to set the random seed. (b) Create a line chart of age variable using ggplot2 package and interpret the result carefully (c) Create scatter plot of age and body mass index variables using ggplot2 package and interpret the result carefully. (d) Create classes of body mass index variable as: <18, 18-24, 25-30, 30+ and show it as pie chart using ggplot2 package and interpret it carefully (e) Create classes of age variable as <15, 15-59 and 60+ and show it as bar diagram using the ggplot2 package and interpret it carefully
Do the following in R Studio with R script so that it can be knitted as PDF:
Prepare a column vector of miles per gallon (mpg) variable with 500 random values in the range 10 to 50; do not forget to use your exam roll number as random seed to replicate the result. Then: a) Plot histogram of this "mpg" variable and interpret it carefully b) Refine the histogram by filling the bars with "blue" color and changing number of bins to 8 c) Add a vertical abline at the arithmetic mean of the mpg variable d) Plot Q-Q plot of mpg variable, add normal Q-Q line of red color on it and interpret it carefully e) Plot density plot of mpg variable without the border, fill it with yellow color and interpret it
OR
Use the "ggplot2" package and do as follows in R studio:
a) Define first layer of the ggplot object with diamond data, carat as x-axis and price as y-axis b) Add layer with geometric aesthetic as "point", statistics and position as "identity" c) Add layers with scale of y and x variables as continuous d) Add layer with coordinate system as Cartesian e) Add layer with appropriate title and interpret the resulting graph carefully
Import the "pollution.csv" file into R studio and do as follows with R script:
a) Check the structure of the data and explain class of each variable b) Change the attributes of "particulate matter", "date time" and "value" variables c) Get the summary of all the variables and replace the outliers as missing value d) Get summary statistics of "value" variables by "particulate matter" variable categories e) Write a summary of the results obtained in the earlier steps with interpretation and conclusion
Use the "pollution.csv" file imported and cleaned in R studio and do as follows with R script:
a) Create bar plot of "particulate matter" variable b) Create histogram of "value" variable c) Create line plot of "date time" and "value" variables d) Create histogram of "value" variable by particulate matter categories e) Write a summary of the results obtained in the earlier steps with interpretation and conclusion
Load the "termDocMatrix.rdata" file into R studio and do as follows with R script:
a) Define the term document matrix data object as matrix and store it as "m" object b) Define the frequencies of the terms using "rowSums" function and get the term frequencies c) Create a histogram of the term frequencies using ggplot2 package d) Create a histogram of the terms with 10 or more frequencies using ggplot2 package e) Create word cloud of term frequencies using wordcloud package and interpret it carefully
OR
Load the "rdmTweets.rdata" file in R studio and do as follows with "tm" and "twitteR" packages:
a) Convert twitter list as data frame and assign it as "df" object b) Create corpus using the "text" column of the data frame c) Perform pre-processing to clean the corpus for text mining d) Create term document matrix using the cleaned corpus e) Find the most frequent terms using the term document matrix f) Find the co-occurrence of the term "r" with filter of 0.1 and above.
Do the following with R script in R Studio:
a) Define a column vector X with numbers between 1 and 30 b) Define another column vector Y with cubes of X c) Combine the two column vectors in a new data frame called DF d) Get plot X and Y variables and decide which type of relationship is seen e) Get the appropriate correlation coefficient for this plot and interpret it carefully
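One way to approach the task above in base R, as a sketch rather than a model answer. Plain vectors stand in for "column vectors" here, and choosing Spearman's rank correlation for a monotonic but non-linear relationship is a judgment the answer should justify.

```r
# a) and b) X holds 1..30, Y holds the cubes of X
X <- 1:30
Y <- X^3

# c) Combine both vectors into a data frame
DF <- data.frame(X = X, Y = Y)

# d) Scatter plot: the points curve upward, so the relationship is
#    monotonic but not linear
plot(DF$X, DF$Y, xlab = "X", ylab = "Y = X^3", main = "Y against X")

# e) Spearman's rank correlation suits a monotonic, non-linear relationship;
#    Pearson's r is computed only for comparison
rho <- cor(DF$X, DF$Y, method = "spearman")
r <- cor(DF$X, DF$Y, method = "pearson")

rho
r
```

Because Y increases strictly with X, the ranks of the two variables agree exactly, so Spearman's coefficient reaches its maximum while Pearson's stays below it.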
Use the "mtcars" dataset of R and do as follows:
a) Plot histogram of mpg variable and interpret it carefully b) Refine the histogram by filling the bars with "blue" color and changing number of bins to 10 c) Add a vertical abline at mean of the mpg variable d) Plot Q-Q plot of mpg variable, add normal Q-Q line of red color on it and interpret it carefully e) Plot density plot of mpg variable without the border, fill it with yellow color and interpret it
OR
Use the "ggplot2" package and do as follows in R studio:
a) Define first layer with diamond data, carat as x-axis and price as y-axis b) Add layer with geometric aesthetic as "point", statistics and position as "identity" c) Add layers with scale of y and x variables as continuous d) Add layer with coordinate system as Cartesian e) Add layer with appropriate title and interpret the resulting graph carefully
Explain the following concept with examples: a) Grammar of graphics – Wilkinson's approach b) Five number summary
Describe the following concepts with examples: a) Wilkinson's approach of grammar of graphics b) ggplot2 approach of grammar of graphics
Explain the following concepts with clear examples: a) Spearman's rank correlation b) Mann-Whitney U test
Explain the following concepts with focus on R software:
a) Boxplot with five number summaries with example b) Boxplot with outliers with example
Describe data visualization with focus on:
a) Concept of grammar of graphics with Wilkinson's approach
b) Layers in grammar of graphics with ggplot package's approach
c) Statistical transformations in grammar of graphics
Explain the following in R with example:
a) Nodes and edges b) Diameter c) Edge density
Explain the following concepts with focus on R software:
a) Test of normality b) Parametric tests c) Residual analysis
Explain the following concept with examples focusing on R software:
a) Measures of central tendency b) Measures of dispersion c) Measures of relative position
Explain the following concepts with examples focusing on R software:
a) Correlation b) Parametric tests c) Non-parametric tests
Most-asked questions across all years
The questions that come back exam after exam, grouped across years and ranked by how often they're asked. Open one to read its real past answer.
Load the “igraph” package in R studio and do the basic SNA as follows with R scripts to knit HTML output: a) Define g as graph object with (1,2,3,4) as its elements b) Plot the g and interpret it carefully c) Define g1 as graph object with (“Sita”, “Ram”, “Rita”, “Gita”, “Gita”, “Sita”, “Sita”, “Gita”, “Anita”, “Rita”, “Ram”, “Sita”) as its elements d) Plot g1 with node color as green, node size as 20, link color as red and link size as 10 and interpret it e) Get degree, closeness and betweenness of g1 and interpret them carefully.
OR
Do as follows in R console and then in R Studio with R script to knit HTML outputs: a) Open R console and then go to Help and Manuals (in PDF) and open “An Introduction to R” file b) Save this file in the working directory and import this pdf file in R studio using “pdftools” package c) Perform pre-processing and create ‘corpus’ using “tm” package d) Find the most frequent terms and create histogram of the most frequent terms e) Create word cloud of the corpus, color it using rainbow or RColorBrewer package f) Perform topic modelling and interpret the result carefully
Create a function and do as follows: a) Define a function: “roll” of a fair “die” twice with random sampling with replacement as true b) Get the first roll and interpret the result c) Get the second roll and interpret the result d) Get the third roll and interpret the result e) Write a summary of the results obtained in the earlier steps with conclusion
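A short base R sketch of the die-rolling function described above. The seed value is hypothetical, and returning both faces instead of their sum is an assumption about what the question intends.

```r
# a) roll(): throw a fair six-sided die twice, sampling with replacement
roll <- function() {
  die <- 1:6
  sample(die, size = 2, replace = TRUE)
}

# Hypothetical seed so the three rolls can be reproduced
set.seed(2082)

# b) to d) Call the function three times; each call returns two faces
first <- roll()
second <- roll()
third <- roll()

first
second
third

# The total of a roll, if the interpretation needs it
sum(first)
```

Sampling with replacement matters here: without it the two faces in a roll could never be equal, which a real pair of throws allows.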
Describe data mining in data science with focus and examples on: a) Descriptive mining b) Predictive mining c) Prescriptive mining
Explain how you can do sub-setting with codes in R software: a) Define the 6x5 matrix and select last two rows b) Select third and fifth row with second and fourth column c) Add 3 new rows in this matrix
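The three sub-setting steps above could be sketched in base R as follows; the matrix contents and the values of the new rows are illustrative assumptions.

```r
# a) A 6 x 5 matrix filled column by column with 1..30 (illustrative values)
m <- matrix(1:30, nrow = 6, ncol = 5)

# Select the last two rows, whatever the number of rows is
last_two <- m[(nrow(m) - 1):nrow(m), ]

# b) Rows 3 and 5 combined with columns 2 and 4
sub <- m[c(3, 5), c(2, 4)]

# c) Add 3 new rows with rbind(); the new block must also have 5 columns
new_rows <- matrix(101:115, nrow = 3, ncol = 5)
m2 <- rbind(m, new_rows)

last_two
sub
dim(m2)
```

Indexing with `nrow(m)` instead of hard-coded row numbers keeps the "last two rows" selection correct if the matrix changes size.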
Explain different types of pipe operators with R codes and examples: a) Compound assignment operator b) Tee operator c) Exposition operator
Describe how can you import following types of data into the R software with simple examples/codes: a) a text file saved in the local computer b) a table embedded in any webpage c) json file with web API
Explain how to work efficiently with "big data" in R software in relation to the: a) Subsetting with base R and dplyr packages b) ff, ffbase and ffbase2 packages c) data.table package
Explain social network analysis and describe its use in a real-life situation with: a) Nodes b) Links c) Attributes
Lowest priority: asked only once (110)
- U2 · 2082
Load the "igraph" package in R studio and do the basic SNA with R script to knit PDF output: a) Define g1 as graph object with ("R", "S", "S", "T", "T", "R", "R", "T", "U", "S") as its elements b) Plot g1 with node color as green, node size as 30, link color as red and link size as 5 and interpret it c) Get degree of g1 and interpret them carefully d) Get closeness of g1 and interpret them carefully
OR
Do the following in R Studio using "airquality" dataset with R script to knit PDF output: a) Replace missing values of "Ozone" variable with its median and save it as "corrected Ozone" b) Get the histogram of the "corrected Ozone" variable using base R plot and interpret it carefully c) Get the boxplot of "corrected Ozone" variable using base R plot and interpret it carefully d) Get the appropriate summary measures of "corrected Ozone" variable with justification
- U3 · 2082
Do as follows in R Studio and do as follows with R script to knit PDF output: a) Open R and then go to Help and "Manuals in PDF" and open "An Introduction to R" file b) Import this pdf file in R studio using "pdftools" package c) Perform pre-processing and create 'corpus' afterwards using "tm" package d) Find the most frequent terms and create histogram of the most frequent terms
OR
Do the following in R Studio using "mtcars" dataset with R script to knit PDF output: a) Get the bar plot of the "mpg" variable using ggplot2 package and interpret it carefully b) Get the boxplot of "mpg" variable using ggplot2 package and interpret it carefully c) Get scatterplot of "mpg" and "wt" variable using ggplot2 package and interpret it carefully d) Get appropriate correlation coefficient for "mpg" and "wt" and interpret it carefully
- U3 · 2082
Do the following in R Studio using ggplot2 and dplyr packages and knit output as PDF file: a) Create a dataset with following variables: age (20-59 years), height (110 – 190 centimeters), weight (40-90 kg) with random 150 cases of each variable. Your roll number must be used to set the random seed. b) Compute body mass index (BMI) variable as: BMI = [(weight in kg) / (height in meter squared)] c) Create body mass index categories: <18, 18-24, 25-30, 30+ and label them as "underweight", "normal", "overweight" and "obese" respectively using dplyr package d) Show the percentage distribution of labelled BMI variable with pie chart using ggplot2 package
- U4 · 2082
Do the following in R Studio using "airquality" dataset with R markdown to knit PDF output: a) Perform Shapiro-Wilk test on "Wind" variable and check normality of this variable b) Perform Bartlett test on "Wind" variable by "Month" variable and check equality of variance c) Fit 1-way ANOVA to compare "Wind" variable by "Month" variable and interpret the result carefully d) Fit the TukeyHSD post-hoc test with 95% confidence interval and interpret the result carefully
- U4 · 2082
Do the following in R Studio using "mtcars" dataset with R markdown to knit PDF output: a) Divide the data into train and test datasets with 70:30 random splits and your roll number as random seed b) Fit a supervised linear regression model and KNN regression model on train data with "mpg" as dependent variable and all other variables as independent variable c) Predict the miles per gallon variable in the test data using these models and get values for "wt=6000 lbs" d) Compare the fit indices (R-square, MSE, RMSE) of the predicted models and choose the best model
- U4 · 2082
Do the following in R Studio using "airquality" dataset of R with R script to knit PDF output: a) Perform test of normality of "Temp" variable by each category of "Month" variable and interpret all the results carefully b) Perform test of equality of variance on "Temp" variable by each category of Month variable and interpret the result carefully c) Compare Temp by Month using the best statistical test for this data and interpret the result carefully d) Perform the post-hoc test and interpret the results carefully
- U4 · 2082
Do the following in R Studio with R script to knit PDF output: a) Create a dataset with 200 random cases, 1 random binary (1 and 0) variable and four random non-binary (categorical and continuous) variables with your roll number as random seed b) Divide it into train and test datasets with 70:30 random splits c) Fit a supervised logistic regression and decision tree classification models on train data with binary variable as dependent variable and all other four variables as independent variable d) Predict the released variable in the test datasets of both the models and interpret the results carefully
- U5 · 2082
Use the "USArrests" data and do as follows in the R Studio with R markdown to knit PDF output: a) Fit a hierarchical clustering model using single linkage and get the dendrogram for this model b) Fit a hierarchical clustering model using complete linkage and get the dendrogram for this model c) Fit a hierarchical clustering model using average linkage and get the dendrogram for this model d) Show the number of clusters (k) to retain for the data using ablines in the dendrogram of the best model
- U5 · 2082
Do as follows using in-built "USArrests" dataset with R script to knit PDF output: a) Create a "criminality scale" of four variables of this dataset using the Principal Component Analysis b) Compute the eigenvalues and interpret the PCA result carefully using Kaiser's criteria c) Show the Scree plot and decide on the number of components to retain with careful interpretation d) Revise the criminality scale using VARIMAX rotation and interpret the result carefully
OR
Do as follows using given dataset of 10 US cities in R studio with R script:
| City | Atlanta | Chicago | Denver | Houston | Los Angeles | Miami | New York | San Francisco | Seattle | Washington D.C |
|---|---|---|---|---|---|---|---|---|---|---|
| Atlanta | 0 | 587 | 1212 | 701 | 1936 | 604 | 748 | 2139 | 2182 | 543 |
| Chicago | 587 | 0 | 920 | 940 | 1745 | 1188 | 713 | 1858 | 1737 | 597 |
| Denver | 1212 | 920 | 0 | 879 | 831 | 1726 | 1631 | 949 | 1021 | 1494 |
| Houston | 701 | 940 | 879 | 0 | 1374 | 968 | 1420 | 1645 | 1891 | 1220 |
| Los Angeles | 1936 | 1745 | 831 | 1374 | 0 | 2339 | 2451 | 347 | 959 | 2300 |
| Miami | 604 | 1188 | 1726 | 968 | 2339 | 0 | 1092 | 2594 | 2734 | 923 |
| New York | 748 | 713 | 1631 | 1420 | 2451 | 1092 | 0 | 2571 | 2408 | 205 |
| San Francisco | 2139 | 1858 | 949 | 1645 | 347 | 2594 | 2571 | 0 | 678 | 2442 |
| Seattle | 2182 | 1737 | 1021 | 1891 | 959 | 2734 | 2408 | 678 | 0 | 2329 |
| Washington D.C | 543 | 597 | 1494 | 1220 | 2300 | 923 | 205 | 2442 | 2329 | 0 |

a) Get dissimilarity distance as city.dissimilarity object b) Fit a classical multidimensional model using the city.dissimilarity object c) Get the summary of the model and interpret it carefully d) Get the bi-plot of the model and interpret it carefully
- U5 · 2082
Do the following in R Studio with R script to knit PDF output: a) Create a dataset with 200 random cases and five random variables with your roll number as random seed b) Fit a hierarchical clustering model using single linkage and get the dendrogram for this model c) Fit a hierarchical clustering model using complete linkage and get the dendrogram for this model d) Fit a hierarchical clustering model using average linkage and get the dendrogram for this model e) Find the best hierarchical clustering model for this data and locate the number of clusters for it
OR
Use the four variables of "USArrests" data file into R Studio and do as follows with R script to knit PDF output: a) Fit a k-means clustering model in the data with k=2, plot it in the single graph and interpret it carefully b) Add cluster centers for the plot of clusters formed with k=2 and interpret it carefully c) Fit a k-means clustering model in the data with k=3, plot it in the single graph and interpret it carefully d) Add cluster centers for the plot of clusters formed with k=3 and interpret it carefully
- U1 · 2081
Open the R or R studio software and do the followings with R script to produce HTML output: a) Define integers from 1 to 15 using three different coding approaches in R b) Define these five numbers: 1.1, 2.2, 3.3, 4.4 and 5.5 and save it as column vector N c) Add, subtract, multiply and divide vector R from vector N and interpret the results carefully d) Define a list using "This" "is" "my" "first" "programming" "in" "R" and save it as L
- U1 · 2081
Import the "airquality" data as "aq" object in R studio and do as follows in R script for HTML output: a) Check the structure of the aq dataset and explain class of each variable b) Explain how to handle missing value of two variables of aq dataset c) Get the summary of all the variables and interpret them carefully d) Get summary statistics of "temp" variables by "Month" categories and interpret it carefully
Study smart and sit a probable paper
How far a few high-priority topics take you, and a full mock paper built from the most likely questions, mirroring the real exam structure.
Study smart, not hard
Study the units in priority order. Each bar shows the share of total marks you'd have covered by then. The top 4 units alone cover ~80% of marks.
Most Probable Paper
Mirrors the real structure · 45 marks · based on 5 past papers
- 1. [3 marks]
Explain the following concept with examples: a) Grammar of graphics – Wilkinson's approach b) Five number summary
Asked once (2082); including the board exam 1× (2082); and its topic (R Software for Data Summary and Visualization) appears in 100% of years.
- 2. [3 marks]
Explain the following concepts with focus on R software:
a) Boxplot with five number summaries with example b) Boxplot with outliers with example
Asked once (2081); including the board exam 1× (2081); and its topic (R Software for Data Summary and Visualization) appears in 100% of years.
- 3. [3 marks]
Explain the following with examples in R:
a) Reference range based on mean b) Reference range based on median c) Outliers and extreme values
This question appeared 2× (same year); including the board exam 1× (2080); and its topic (R Software for Data Summary and Visualization) appears in 100% of years.
- 4. [3 marks]
Explain the following concepts with focus on R software:
a) Test of normality b) Parametric tests c) Residual analysis
Asked once (2080); including the board exam 1× (2080); and its topic (R Software for Data Summary and Visualization) appears in 100% of years.
- 5. [3 marks]
Explain the following concept with examples focusing on R software:
a) Measures of central tendency b) Measures of dispersion c) Measures of relative position
Asked once (2079); including the board exam 1× (2079); and its topic (R Software for Data Summary and Visualization) appears in 100% of years.
- 1.[6 marks]
Load the “igraph” package in RStudio and do the basic SNA as follows with R scripts to knit HTML output: a) Define g as graph object with (1,2,3,4) as its elements b) Plot the g and interpret it carefully c) Define g1 as graph object with (“Sita”, “Ram”, “Rita”, “Gita”, “Gita”, “Sita”, “Sita”, “Gita”, “Anita”, “Rita”, “Ram”, “Sita”) as its elements d) Plot g1 with node color as green, node size as 20, link color as red and link size as 10 and interpret it e) Get degree, closeness and betweenness of g1 and interpret them carefully.
OR
Do as follows in the R console and then in RStudio with an R script to knit HTML output: a) Open the R console, go to Help and Manuals (in PDF), and open the “An Introduction to R” file b) Save this file in the working directory and import this PDF file in RStudio using the “pdftools” package c) Perform pre-processing and create a ‘corpus’ using the “tm” package d) Find the most frequent terms and create a histogram of the most frequent terms e) Create a word cloud of the corpus and color it using rainbow or the RColorBrewer package f) Perform topic modelling and interpret the result carefully
This question has recurred in 3 of 5 years, so far only in internal assessments and not in the board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.
- 2.[6 marks]
Do as follows in RStudio with an R script to knit PDF output: a) Open R and then go to Help and "Manuals in PDF" and open "An Introduction to R" file b) Import this pdf file in RStudio using "pdftools" package c) Perform pre-processing and create 'corpus' afterwards using "tm" package d) Find the most frequent terms and create histogram of the most frequent terms
OR
Do the following in RStudio using "mtcars" dataset with R script to knit PDF output: a) Get the bar plot of the "mpg" variable using ggplot2 package and interpret it carefully b) Get the boxplot of "mpg" variable using ggplot2 package and interpret it carefully c) Get scatterplot of "mpg" and "wt" variable using ggplot2 package and interpret it carefully d) Get appropriate correlation coefficient for "mpg" and "wt" and interpret it carefully
Asked once (2082), in the board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.
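For part d) of the mtcars alternative, "appropriate" is usually read as choosing between Pearson and Spearman. The sketch below shows one way to make that choice, under the assumption that a Shapiro-Wilk test at the 5% level decides it. The scatterplot uses base graphics as a stand-in so the example runs without extra packages, although the question itself asks for ggplot2.

```r
# Check normality of both variables (assumed decision rule: p > 0.05)
mpg_normal <- shapiro.test(mtcars$mpg)$p.value > 0.05
wt_normal  <- shapiro.test(mtcars$wt)$p.value > 0.05

# Pearson if both look normal, otherwise the rank-based Spearman coefficient
method <- if (mpg_normal && wt_normal) "pearson" else "spearman"

# exact = FALSE only matters for Spearman, where ties rule out exact p-values
ct <- cor.test(mtcars$mpg, mtcars$wt, method = method, exact = FALSE)
print(method)
print(ct)

# Base-graphics scatterplot of the same pair (stand-in for ggplot2)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg vs wt (mtcars)")
abline(lm(mpg ~ wt, data = mtcars), lty = 2)
```

The sign and size of the estimate, together with the p-value, are what the "interpret it carefully" part of the question is looking for.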
- 3.[6 marks]
Load the "igraph" package in RStudio and do the basic SNA as follows with R script and HTML output:
a) Define "g1" as graph object with ("R", "S", "S", "T", "T", "R", "R", "T", "U", "S") as its elements b) Plot "g1" with node color as green, node size as 30, link color as red and link size as 5 and interpret it c) Get degree, closeness and betweenness of g1 and interpret them carefully d) Get hub and communities of this data and interpret them carefully
OR
Do the following in RStudio using "airquality" dataset with R script:
a) Replace missing values of "Ozone" variable with median of this variable as corrected Ozone b) Get the histogram of the corrected Ozone variable using base R plot and interpret it carefully c) Get the boxplot of Wind variable using base R plot and interpret it carefully d) Get the Wind variable outliers using median and interquartile range and compare them with boxplot outlier values with justification
Asked once (2081), in the board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.
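A possible approach to the airquality alternative is sketched below, using base R only. The column name Ozone_corrected and the 1.5 × IQR fences are assumptions made for this sketch. The fence rule is chosen because it matches the default used by boxplot().

```r
aq <- airquality

# a) Replace missing Ozone values with the median of the observed values
ozone_median <- median(aq$Ozone, na.rm = TRUE)
aq$Ozone_corrected <- ifelse(is.na(aq$Ozone), ozone_median, aq$Ozone)

# b) Histogram of the corrected variable
hist(aq$Ozone_corrected, main = "Corrected Ozone", xlab = "Ozone (ppb)")

# c) Boxplot of Wind; the returned object keeps the plotted outliers
bp <- boxplot(aq$Wind, main = "Wind", ylab = "Wind (mph)")

# d) Outliers from quartiles and IQR (assumed rule: 1.5 * IQR fences)
q <- quantile(aq$Wind, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
fences <- c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr)
iqr_outliers <- aq$Wind[aq$Wind < fences[1] | aq$Wind > fences[2]]

# Compare the two sets of outliers
print(sort(iqr_outliers))
print(sort(bp$out))
```

boxplot() works from hinges rather than quantile()'s default quartiles, so the two sets of outliers can differ slightly on other data. That difference is the natural point to address in the justification.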
- 4.[6 marks]
Do as follows in RStudio with an R script and HTML output:
a) Open R and go to Help and Manuals in PDF and open "An Introduction to R" file b) Import this pdf file in R using "pdftools" package c) Perform pre-processing and create 'corpus' afterwards d) Find the most frequent terms, create its bar diagram and interpret carefully
OR
Do the following in RStudio using "airquality" dataset with R script:
a) Get the boxplot of Temp variable using ggplot2 package and interpret it carefully b) Create class intervals of Temp variable using dplyr package and show it as frequency distribution c) Get pie chart of Temp variable class intervals using ggplot2 package and interpret it carefully d) Get scatter plot of corrected Temp and Wind variables using ggplot2 package and interpret it carefully
Asked once (2081), in the board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.
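Parts b) and c) of the airquality alternative come down to binning Temp and counting the bins. The sketch below shows the idea with base R's cut(), table() and pie() as stand-ins, so it runs without extra packages. The question itself asks for dplyr and ggplot2, and the 10-degree class width is an assumption.

```r
aq <- airquality

# b) Class intervals of Temp: [50,60), [60,70), ..., [90,100)
aq$Temp_class <- cut(aq$Temp, breaks = seq(50, 100, by = 10), right = FALSE)

# Frequency distribution with percentages
freq <- table(aq$Temp_class)
freq_table <- data.frame(
  class     = names(freq),
  frequency = as.vector(freq),
  percent   = round(100 * as.vector(freq) / sum(freq), 1)
)
print(freq_table)

# c) Pie chart of the class intervals (base graphics stand-in)
pie(freq, main = "Temp class intervals (airquality)")
```

With dplyr, the same table would typically come from mutate() with cut() followed by count(). In ggplot2, a pie chart is usually a single stacked bar drawn with coord_polar().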
- 5.[6 marks]
Do the following in RStudio with R script:
a) Create a dataset with following variables: age (18-99 years), sex (male/female), educational levels (No education/Primary/Secondary/Beyond secondary), socio-economic status (Low, Middle, High) and body mass index (14 – 38) with 150 random cases of each variable. Your exam roll number must be used to set the random seed. b) Show a sub-divided bar diagram of body mass index variable by sex and socio-economic variables separately with interpretations. c) Show multiple bar diagram of age variable with sex and educational level variables and interpret it carefully. d) Show boxplots of age and body mass index variable separately and interpret the results carefully. e) Create histogram of age and body mass index variable separately and interpret the results carefully.
Asked once (2080), in the board exam; its topic (R Software for Data Summary and Visualization) appears in 100% of years.
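A minimal sketch of part a) and of the plot types named in the other parts, under several assumptions: the seed 12345 is a hypothetical roll number, every variable is drawn uniformly at random, and BMI is binned into illustrative groups so that it can be shown as a sub-divided (stacked) bar diagram. The column names are made up for this sketch.

```r
set.seed(12345)  # hypothetical roll number; replace with your own
n <- 150
edu_levels <- c("No education", "Primary", "Secondary", "Beyond secondary")
ses_levels <- c("Low", "Middle", "High")

# a) Simulated dataset with 150 cases per variable
dat <- data.frame(
  age       = sample(18:99, n, replace = TRUE),
  sex       = factor(sample(c("male", "female"), n, replace = TRUE)),
  education = factor(sample(edu_levels, n, replace = TRUE), levels = edu_levels),
  ses       = factor(sample(ses_levels, n, replace = TRUE), levels = ses_levels),
  bmi       = round(runif(n, min = 14, max = 38), 1)
)

# Assumption: BMI is binned so it can be shown in a bar diagram
dat$bmi_group <- cut(dat$bmi, breaks = c(14, 18.5, 25, 30, 38),
                     include.lowest = TRUE)

# b) Sub-divided (stacked) bar diagram: BMI groups split by sex
barplot(table(dat$sex, dat$bmi_group), legend.text = TRUE,
        main = "BMI group by sex", xlab = "BMI group", ylab = "Count")

# c) Multiple (side-by-side) bar diagram: sex within educational level
barplot(table(dat$sex, dat$education), beside = TRUE, legend.text = TRUE,
        main = "Sex by educational level", ylab = "Count")

# d) and e) Boxplot and histogram, shown for one variable each
boxplot(dat$age, main = "Age", ylab = "Years")
hist(dat$bmi, main = "Body mass index", xlab = "BMI")
```

Because the values are simulated independently, the plots should show no real association between variables, and that is worth saying in the interpretation.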