
Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
Question 1 (long, 10 marks)

Define dispersion. Explain different measures of dispersion. Compute the standard deviation and coefficient of variation for a given frequency distribution and comment on consistency.

Dispersion

Dispersion (or variation) measures the extent to which individual observations in a data set are scattered about a central value (mean/median). A small dispersion means the data are clustered closely around the average; a large dispersion means they are spread out. Dispersion describes the consistency, reliability, and homogeneity of a series.

Measures of Dispersion

1. Absolute measures (expressed in the same units as the data):

  • Range $= L - S$ (largest value minus smallest value). Simplest to compute, but unstable because it depends only on the two extreme values.
  • Quartile Deviation (Semi-IQR) $= \dfrac{Q_3 - Q_1}{2}$. Based on the middle 50% of the data.
  • Mean Deviation $= \dfrac{\sum f|x-\bar{x}|}{N}$. Average of the absolute deviations from the mean (or median).
  • Standard Deviation $\sigma = \sqrt{\dfrac{\sum f(x-\bar{x})^2}{N}}$. The most important and widely used measure.

2. Relative measures (unit-free, used to compare two series):

  • Coefficient of Range $= \dfrac{L-S}{L+S}$
  • Coefficient of Quartile Deviation $= \dfrac{Q_3-Q_1}{Q_3+Q_1}$
  • Coefficient of Variation $C.V. = \dfrac{\sigma}{\bar{x}}\times 100\%$
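
As a rough illustration, the measures listed above can be computed for a small ungrouped data set. The sketch below uses the five values 12, 15, 20, 8, 25 that appear in a later question of this paper, with every frequency equal to 1. Quartile conventions differ between textbooks; this sketch assumes the $(n+1)p$ position rule, which is what Python's `statistics.quantiles` applies with `method="exclusive"`.

```python
import statistics

data = [12, 15, 20, 8, 25]

# Range and its relative counterpart.
largest, smallest = max(data), min(data)
data_range = largest - smallest
coeff_range = data_range / (largest + smallest)

# Quartiles by the (n + 1)p position rule (an assumed convention).
q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
quartile_dev = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

# Mean deviation about the mean, standard deviation, and C.V.
mean = statistics.fmean(data)
mean_dev = sum(abs(x - mean) for x in data) / len(data)
sigma = statistics.pstdev(data)  # population SD, divisor N
cv = sigma / mean * 100

print(data_range, quartile_dev, mean_dev, round(sigma, 2), round(cv, 2))
```

A different quartile rule would change `q1` and `q3`, and therefore the quartile deviation, without affecting the other measures.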

Worked Computation (illustrative frequency distribution)

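| Class | $f$ | Mid $x$ | $fx$ | $fx^2$ |
|---|---|---|---|---|
| 0–10 | 5 | 5 | 25 | 125 |
| 10–20 | 8 | 15 | 120 | 1800 |
| 20–30 | 15 | 25 | 375 | 9375 |
| 30–40 | 7 | 35 | 245 | 8575 |
| 40–50 | 5 | 45 | 225 | 10125 |
| Total | 40 | | 990 | 30000 |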

Mean: $\bar{x} = \dfrac{\sum fx}{N} = \dfrac{990}{40} = 24.75$

Standard deviation:

$$\sigma = \sqrt{\frac{\sum fx^2}{N} - \left(\frac{\sum fx}{N}\right)^2} = \sqrt{\frac{30000}{40} - (24.75)^2} = \sqrt{750 - 612.56} = \sqrt{137.44} \approx 11.72$$

Coefficient of variation:

$$C.V. = \frac{\sigma}{\bar{x}}\times 100 = \frac{11.72}{24.75}\times 100 \approx 47.35\%$$

Comment on Consistency

The series whose C.V. is smaller is more consistent / uniform / stable, while a higher C.V. indicates greater variability. (When comparing two distributions, compute C.V. for each and conclude that the one with the lower C.V. is more consistent.) Here, a C.V. of about $47\%$ indicates a fairly high degree of variability in the data.
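
The worked computation can be checked with a few lines of Python. This is a minimal sketch that assumes the class midpoints and frequencies of the illustrative table above; the variable names are chosen here for illustration. Because it carries the unrounded $\sigma$ forward, the C.V. it prints may differ from a hand calculation in the second decimal place.

```python
import math

# Class midpoints and frequencies from the illustrative table.
midpoints = [5, 15, 25, 35, 45]
freqs = [5, 8, 15, 7, 5]

n = sum(freqs)                                            # N
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))      # sum of f*x
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints)) # sum of f*x^2

mean = sum_fx / n
sigma = math.sqrt(sum_fx2 / n - mean ** 2)  # population SD (divisor N)
cv = sigma / mean * 100                     # coefficient of variation, in %

print(round(mean, 2), round(sigma, 2), round(cv, 2))
```

To compare two series for consistency, the same computation would be run once per series and the smaller `cv` taken as the more consistent one.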

Question 2 (long, 10 marks)

Explain the method of least squares. Fit a straight line of regression of Y on X to the given data and estimate the value of Y for a given X.

Method of Least Squares

The method of least squares fits a line (or curve) to data so that the sum of squares of the vertical deviations of the observed points from the fitted line is a minimum. For the line of regression of $Y$ on $X$, written as

$$Y = a + bX,$$

we minimise $S = \sum (Y_i - a - bX_i)^2$. Setting $\partial S/\partial a = 0$ and $\partial S/\partial b = 0$ gives the two normal equations:

$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
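
The step from $S$ to the normal equations can be written out explicitly. A short derivation sketch, using only the symbols already defined above:

```latex
\begin{aligned}
\frac{\partial S}{\partial a} &= -2\sum (Y_i - a - bX_i) = 0
  &&\Longrightarrow\quad \sum Y = na + b\sum X, \\
\frac{\partial S}{\partial b} &= -2\sum X_i\,(Y_i - a - bX_i) = 0
  &&\Longrightarrow\quad \sum XY = a\sum X + b\sum X^2 .
\end{aligned}
```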

Solving these gives:

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}, \qquad a = \bar{Y} - b\bar{X}$$

Here $b = b_{YX}$ is the regression coefficient of $Y$ on $X$, the average change in $Y$ per unit change in $X$.

Worked Example (fitting $Y$ on $X$)

For data $X: 1,2,3,4,5$ and $Y: 2,4,5,4,6$:

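| $X$ | $Y$ | $XY$ | $X^2$ |
|---|---|---|---|
| 1 | 2 | 2 | 1 |
| 2 | 4 | 8 | 4 |
| 3 | 5 | 15 | 9 |
| 4 | 4 | 16 | 16 |
| 5 | 6 | 30 | 25 |
| 15 | 21 | 71 | 55 |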

$$n=5,\ \sum X=15,\ \sum Y=21,\ \sum XY=71,\ \sum X^2=55$$

$$b = \frac{5(71) - (15)(21)}{5(55) - (15)^2} = \frac{355 - 315}{275 - 225} = \frac{40}{50} = 0.8$$

$$\bar{X} = 3,\quad \bar{Y} = 4.2,\qquad a = 4.2 - 0.8(3) = 1.8$$

Regression line: $Y = 1.8 + 0.8X$

Estimating Y for a given X

For example, at $X = 6$:

$$\hat{Y} = 1.8 + 0.8(6) = 1.8 + 4.8 = 6.6$$

So the estimated value of $Y$ when $X=6$ is 6.6. (Substitute the actual $X$ asked in the question into the fitted line to obtain the estimate.)
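
The fit and the estimate can be reproduced with a short Python sketch. The function name `fit_line` is hypothetical and chosen for illustration; it simply evaluates the closed-form solution of the normal equations given above.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the solved normal equations, then intercept from the means.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])
y_hat = a + b * 6  # estimate of Y at X = 6
print(round(a, 2), round(b, 2), round(y_hat, 2))
```

Replacing the two lists and the value 6 with the data and the $X$ given in the question gives the corresponding estimate.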

Question 3 (long, 10 marks)

Define a random variable. Explain probability mass function and probability density function. Find the mean and variance of a given probability distribution.

Random Variable

A random variable is a real-valued function $X$ that assigns a numerical value to each outcome (sample point) of a random experiment, i.e. $X: S \to \mathbb{R}$. It is discrete if it takes a finite or countably infinite set of values (e.g. number of heads in tosses) and continuous if it can take any value in an interval (e.g. height, time).

Probability Mass Function (PMF)

For a discrete random variable $X$, the PMF is

$$p(x) = P(X = x)$$

satisfying (i) $p(x) \ge 0$ for all $x$, and (ii) $\sum_x p(x) = 1$. It gives the probability that $X$ equals each specific value.

Probability Density Function (PDF)

For a continuous random variable $X$, the PDF $f(x)$ satisfies (i) $f(x) \ge 0$, (ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$, and the probability over an interval is

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

For a continuous variable $P(X=x)=0$; only interval probabilities are meaningful.

Mean and Variance (worked example)

Let $X$ have the distribution:

| $x$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| $p(x)$ | 0.1 | 0.3 | 0.4 | 0.2 |

Mean (Expectation):

$$E(X) = \sum x\,p(x) = 0(0.1) + 1(0.3) + 2(0.4) + 3(0.2) = 0.3 + 0.8 + 0.6 = 1.7$$

$E(X^2)$:

$$E(X^2) = \sum x^2 p(x) = 0 + 1(0.3) + 4(0.4) + 9(0.2) = 0.3 + 1.6 + 1.8 = 3.7$$

Variance:

$$Var(X) = E(X^2) - [E(X)]^2 = 3.7 - (1.7)^2 = 3.7 - 2.89 = 0.81$$

Standard deviation $= \sqrt{0.81} = 0.9$.

For a continuous distribution the same idea is used with integrals: $E(X)=\int x f(x)\,dx$ and $Var(X)=\int x^2 f(x)\,dx - [E(X)]^2$.
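
For the discrete case, the arithmetic above can be checked with a brief Python sketch. It assumes the four-point distribution of the worked example and first confirms that the probabilities sum to 1.

```python
import math

xs = [0, 1, 2, 3]
ps = [0.1, 0.3, 0.4, 0.2]

# A valid PMF must be non-negative and sum to 1.
assert all(p >= 0 for p in ps)
assert math.isclose(sum(ps), 1.0)

mean = sum(x * p for x, p in zip(xs, ps))      # E(X)
ex2 = sum(x * x * p for x, p in zip(xs, ps))   # E(X^2)
variance = ex2 - mean ** 2                     # Var(X) = E(X^2) - [E(X)]^2
std_dev = math.sqrt(variance)

print(round(mean, 2), round(ex2, 2), round(variance, 2), round(std_dev, 2))
```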


Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
Question 4 (short, 5 marks)

Define dispersion and its measures.

Dispersion is the degree to which the values of a data set are scattered or spread about a central (average) value. It indicates the consistency and homogeneity of the data.

Its measures are of two kinds:

  • Absolute measures (same units as data): Range, Quartile Deviation, Mean Deviation, Standard Deviation.
  • Relative measures (unit-free, for comparison): Coefficient of Range, Coefficient of Quartile Deviation, Coefficient of Mean Deviation, and Coefficient of Variation $\left(\dfrac{\sigma}{\bar{x}}\times 100\%\right)$.

A smaller dispersion means more consistent/uniform data; a larger dispersion means more variability.

Question 5 (short, 5 marks)

What is the range of a data set? Find the range of 12, 15, 20, 8, 25.

The range is the simplest absolute measure of dispersion, defined as the difference between the largest ($L$) and smallest ($S$) values in a data set:

$$\text{Range} = L - S.$$

For the data $12, 15, 20, 8, 25$: largest $L = 25$, smallest $S = 8$.

$$\text{Range} = 25 - 8 = \mathbf{17}.$$
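
The same result follows directly from Python's built-in `max` and `min`, shown here as a minimal check:

```python
data = [12, 15, 20, 8, 25]

largest, smallest = max(data), min(data)
data_range = largest - smallest  # Range = L - S

print(largest, smallest, data_range)
```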

Question 6 (short, 5 marks)

Define regression coefficients.

Regression coefficients are the slopes of the two lines of regression; each measures the average rate of change of one variable per unit change in the other.

  • Regression coefficient of $Y$ on $X$: $b_{YX} = r\dfrac{\sigma_y}{\sigma_x} = \dfrac{\text{Cov}(x,y)}{\sigma_x^2}$, the change in $Y$ per unit change in $X$.
  • Regression coefficient of $X$ on $Y$: $b_{XY} = r\dfrac{\sigma_x}{\sigma_y} = \dfrac{\text{Cov}(x,y)}{\sigma_y^2}$, the change in $X$ per unit change in $Y$.

Important properties:

  • $r = \pm\sqrt{b_{YX}\cdot b_{XY}}$, so the correlation coefficient is the geometric mean of the two regression coefficients, taken with their common sign.
  • Both have the same sign as $r$.
  • Their product cannot exceed 1: $b_{YX}\cdot b_{XY} = r^2 \le 1$.
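
These properties can be checked numerically. The sketch below assumes the data $X: 1,2,3,4,5$ and $Y: 2,4,5,4,6$ from the worked regression example earlier in this paper, and uses population (divisor $n$) variances and covariance.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
n = len(xs)

mean_x, mean_y = sum(xs) / n, sum(ys) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n

b_yx = cov_xy / var_x                   # regression coefficient of Y on X
b_xy = cov_xy / var_y                   # regression coefficient of X on Y
r = cov_xy / math.sqrt(var_x * var_y)   # correlation coefficient

print(round(b_yx, 4), round(b_xy, 4), round(r, 4))
print(math.isclose(r * r, b_yx * b_xy))  # product equals r squared
```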

Question 7 (short, 5 marks)

State the multiplication theorem of probability.

The multiplication theorem of probability gives the probability of the joint occurrence of two events.

For dependent events $A$ and $B$ (general form):

$$P(A \cap B) = P(A)\cdot P(B\mid A) = P(B)\cdot P(A\mid B),$$

where $P(B\mid A)$ is the conditional probability of $B$ given that $A$ has occurred.

For independent events, $P(B\mid A) = P(B)$, so the theorem reduces to:

$$P(A \cap B) = P(A)\cdot P(B).$$

This extends to $n$ events: $P(A_1\cap A_2\cap\dots\cap A_n) = P(A_1)P(A_2\mid A_1)\cdots P(A_n\mid A_1\cap\dots\cap A_{n-1})$.
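
A small illustration, not part of the original question: assuming a standard 52-card deck, the dependent form gives the probability of drawing two aces without replacement, and the independent form gives the probability of two heads in two fair coin tosses. Exact fractions avoid rounding.

```python
from fractions import Fraction

# Dependent events: two aces drawn without replacement (assumed example).
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace and one card removed
p_both_aces = p_first_ace * p_second_ace_given_first

# Independent events: heads on each of two fair coin tosses.
p_head = Fraction(1, 2)
p_two_heads = p_head * p_head

print(p_both_aces, p_two_heads)
```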

Question 8 (short, 5 marks)

What is a random variable?

A random variable is a real-valued function $X$ that assigns a numerical value to every outcome of a random experiment, i.e. $X: S \to \mathbb{R}$ where $S$ is the sample space.

It is classified as:

  • Discrete random variable: takes a finite or countably infinite number of distinct values (e.g. number of heads when a coin is tossed three times: $0,1,2,3$).
  • Continuous random variable: can assume any value within an interval (e.g. height, weight, time).

Example: Tossing two coins, let $X$ = number of heads. Then $X$ takes values $0, 1, 2$ with probabilities $\tfrac14, \tfrac12, \tfrac14$.
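
The two-coin example can be reproduced by listing the sample space and applying $X$ to each outcome. This is a minimal sketch that assumes fair coins, so that all four outcomes are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space S for two tosses: HH, HT, TH, TT.
sample_space = list(product("HT", repeat=2))

# X maps each outcome to its number of heads.
counts = Counter(outcome.count("H") for outcome in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```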

Question 9 (short, 5 marks)

Define positive and negative correlation.

Correlation describes the direction and strength of the linear relationship between two variables.

  • Positive correlation: both variables move in the same direction; as one increases, the other also increases (and as one decreases, the other decreases). The correlation coefficient $r$ lies in $(0, +1]$. Example: height and weight; income and expenditure.

  • Negative correlation: the two variables move in opposite directions; as one increases, the other decreases. Here $r$ lies in $[-1, 0)$. Example: price and quantity demanded; speed and time taken for a fixed distance.

$r = +1$ is perfect positive and $r = -1$ is perfect negative correlation.
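
The sign of $r$ can be seen on small data sets. In the sketch below the income and expenditure figures are hypothetical, made up only to illustrate a positive relationship; the negative case uses speed against time taken for an assumed fixed distance of 120 km. The helper `pearson_r` is a name chosen here, implementing the usual product-moment formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical figures, chosen only to illustrate the sign of r.
income = [10, 20, 30, 40, 50]
expenditure = [8, 15, 21, 30, 36]

speed = [20, 30, 40, 60]                  # km/h
time_taken = [120 / v for v in speed]     # hours for an assumed 120 km trip

r_positive = pearson_r(income, expenditure)
r_negative = pearson_r(speed, time_taken)
print(round(r_positive, 3), round(r_negative, 3))
```

The speed and time pair is negatively but not perfectly correlated, because time is inversely proportional to speed rather than a linear function of it.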

Question 10 (short, 5 marks)

What is the median of 7, 3, 9, 5, 11?

The median is the middle value of a data set when the observations are arranged in ascending (or descending) order.

Arrange the data $7, 3, 9, 5, 11$ in ascending order:

$$3,\ 5,\ 7,\ 9,\ 11.$$

There are $n = 5$ (odd) observations, so the median is the $\left(\dfrac{n+1}{2}\right)^{\text{th}} = 3^{\text{rd}}$ value.

$$\text{Median} = \mathbf{7}.$$
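
The same steps, sorting and then picking the middle position, can be sketched in Python and compared with the standard library's `statistics.median`:

```python
import statistics

data = [7, 3, 9, 5, 11]

ordered = sorted(data)
n = len(ordered)
middle_position = (n + 1) // 2          # 1-based position for odd n
median_by_hand = ordered[middle_position - 1]

print(ordered, middle_position, median_by_hand, statistics.median(data))
```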

Question 11 (short, 5 marks)

Define the probability density function.

The probability density function (PDF) is the function $f(x)$ that describes the distribution of a continuous random variable $X$. It satisfies:

  1. $f(x) \ge 0$ for all $x$ (non-negative),
  2. $\displaystyle\int_{-\infty}^{\infty} f(x)\,dx = 1$ (total area under the curve equals 1),
  3. The probability that $X$ lies in an interval is the area under the curve: $P(a \le X \le b) = \displaystyle\int_a^b f(x)\,dx$.

Note that for a continuous variable $P(X = x) = 0$; the PDF gives probabilities only over intervals, not at single points. The PDF is the derivative of the cumulative distribution function: $f(x) = \dfrac{d}{dx}F(x)$.
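
As an illustration, the three conditions can be checked numerically for a simple density. The density $f(x) = 2x$ on $[0, 1]$ is an assumed example, not taken from the question, and `integrate` is a hypothetical helper implementing the midpoint rule.

```python
def f(x):
    """Assumed example density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(func, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (i + 0.5) * h) for i in range(steps)) * h

total_area = integrate(f, 0, 1)        # condition 2: should be close to 1
p_half_to_one = integrate(f, 0.5, 1)   # P(0.5 <= X <= 1), exact value 0.75

print(round(total_area, 6), round(p_half_to_one, 6))
```

The exact value $0.75$ follows from $F(x) = x^2$, since $F(1) - F(0.5) = 1 - 0.25$.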

Question 12 (short, 5 marks)

What is meant by a scatter diagram?

A scatter diagram (scatter plot) is a graphical method of representing bivariate data, in which each pair of observations $(x_i, y_i)$ is plotted as a point on the $XY$-plane (with $X$ on the horizontal axis and $Y$ on the vertical axis).

It gives a quick visual idea of the nature, direction, and degree of correlation between the two variables:

  • If points cluster around an upward-sloping line → positive correlation.
  • If points cluster around a downward-sloping line → negative correlation.
  • If points lie exactly on a line → perfect correlation ($r = \pm 1$).
  • If points are scattered randomly with no pattern → little or no correlation.

It is simple to draw and is usually the first step before computing a numerical correlation coefficient.


Frequently asked questions

How many marks is the BSc CSIT (TU) Statistics I (BSc CSIT, STA164) 2075 paper?
The BSc CSIT (TU) Statistics I (BSc CSIT, STA164) 2075 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.