Section A: Long Answer Questions

Attempt any TWO questions.

3 questions · 10 marks each

Question 1 (long, 10 marks)

Explain the measures of central tendency. Compute mean, median and mode for the given grouped frequency distribution and establish the empirical relationship among them.

Measures of Central Tendency

A measure of central tendency is a single value that represents the centre or typical value of a data set. The three principal measures are:

  • Mean ($\bar{x}$): the arithmetic average of all observations.
  • Median ($M$): the middle value when data are arranged in order.
  • Mode ($Z$): the value that occurs most frequently.

Formulae for Grouped Data

Mean: $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$, where $x_i$ is the mid-value of each class.

Median: $M = L + \dfrac{\frac{N}{2} - C}{f}\times h$, where $L$ = lower boundary of the median class, $N = \sum f$, $C$ = cumulative frequency before the median class, $f$ = frequency of the median class, $h$ = class width.

Mode: $Z = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2}\times h$, where $L$ = lower boundary of the modal class, $f_1$ = frequency of the modal class, $f_0$ and $f_2$ are the frequencies of the preceding and following classes, and $h$ = class width.

Worked Example

Consider the distribution:

| Class | 0–10 | 10–20 | 20–30 | 30–40 | 40–50 |
|-------|------|-------|-------|-------|-------|
| $f$   | 5    | 8     | 15    | 9     | 3     |

Here $N = \sum f = 40$.

Mean: mid-values $x = 5, 15, 25, 35, 45$.

$$\sum f x = 25 + 120 + 375 + 315 + 135 = 970,\quad \bar{x} = \frac{970}{40} = 24.25.$$

Median: $N/2 = 20$. Cumulative frequencies: $5, 13, 28, 37, 40$. The median class is 20–30 ($L = 20$, $C = 13$, $f = 15$, $h = 10$).

$$M = 20 + \frac{20 - 13}{15}\times 10 = 20 + 4.67 = 24.67.$$

Mode: modal class is 20–30 ($f_1 = 15$, $f_0 = 8$, $f_2 = 9$, $L = 20$, $h = 10$).

$$Z = 20 + \frac{15 - 8}{2(15) - 8 - 9}\times 10 = 20 + \frac{7}{13}\times 10 = 20 + 5.38 = 25.38.$$

Empirical Relationship

For a moderately skewed (asymmetrical) distribution:

$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}.$$

Check: $3(24.67) - 2(24.25) = 74.01 - 48.50 = 25.51 \approx 25.38$, which agrees closely with the computed mode. Under this relation the median lies between the mean and the mode, about one-third of the way from the mean to the mode.
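
The worked example can be re-checked with a short Python sketch. It assumes equal class widths and the class boundaries and frequencies from the table above; the variable names are illustrative only.

```python
# Grouped data from the worked example: lower class boundaries and frequencies.
lower = [0, 10, 20, 30, 40]
freq = [5, 8, 15, 9, 3]
h = 10  # common class width (assumed equal for every class)

N = sum(freq)
mids = [lo + h / 2 for lo in lower]

# Mean: sum(f * x) / sum(f), using class mid-values.
mean = sum(f * x for f, x in zip(freq, mids)) / N

# Median: find the first class whose cumulative frequency reaches N/2.
cum = 0  # cumulative frequency before the current class
for i, f in enumerate(freq):
    if cum + f >= N / 2:
        median = lower[i] + (N / 2 - cum) / f * h
        break
    cum += f

# Mode: the modal class is the class with the largest frequency.
m = freq.index(max(freq))
f1 = freq[m]
f0 = freq[m - 1] if m > 0 else 0
f2 = freq[m + 1] if m < len(freq) - 1 else 0
mode = lower[m] + (f1 - f0) / (2 * f1 - f0 - f2) * h

# Empirical estimate of the mode from the median and the mean.
empirical_mode = 3 * median - 2 * mean

print(round(mean, 2), round(median, 2), round(mode, 2), round(empirical_mode, 2))
# prints: 24.25 24.67 25.38 25.5
```

Without intermediate rounding the empirical estimate is 25.50 rather than 25.51; the small difference comes from rounding the median to 24.67 in the hand calculation.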

Question 2 (long, 10 marks)

Define the Poisson distribution. State its properties and applications. Given the mean number of accidents per day is 2, find the probability of 0, 1 and at least 2 accidents on a given day.

Poisson Distribution

A discrete random variable $X$ follows a Poisson distribution with parameter $\lambda > 0$ if its probability mass function is

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!},\quad x = 0, 1, 2, \dots$$

where $\lambda$ is the mean (average) number of occurrences. The distribution models the number of independent rare events occurring in a fixed interval of time or space when the events occur at a constant average rate.

Properties

  1. It is a discrete distribution defined for $x = 0, 1, 2, \dots$, with no upper bound.
  2. Mean = Variance = $\lambda$ (a characteristic property).
  3. It has a single parameter $\lambda$.
  4. It is the limiting form of the binomial distribution when $n \to \infty$, $p \to 0$, with $np = \lambda$ finite.
  5. It is positively skewed; skewness $= 1/\sqrt{\lambda}$ and kurtosis $= 3 + 1/\lambda$, both decreasing as $\lambda$ increases.
  6. The sum of independent Poisson variates is also Poisson (additive property).

Applications

  • Number of accidents, telephone calls, or customer arrivals per unit time.
  • Number of printing errors per page or defects per unit length.
  • Number of radioactive decay emissions; arrivals in queueing theory.

Numerical Solution

Given mean $\lambda = 2$, so $e^{-2} \approx 0.1353$.

$$P(X = 0) = \frac{e^{-2}\,2^{0}}{0!} = e^{-2} = 0.1353.$$

$$P(X = 1) = \frac{e^{-2}\,2^{1}}{1!} = 2e^{-2} = 0.2707.$$

$$P(X \geq 2) = 1 - P(X=0) - P(X=1) = 1 - 0.1353 - 0.2707 = 0.5940.$$

Results: $P(0) = 0.1353$, $P(1) = 0.2707$, $P(\geq 2) = 0.5940$.
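
As a quick cross-check, the three probabilities can be evaluated directly from the pmf. This is a minimal sketch using only the Python standard library; `poisson_pmf` is a helper name introduced here for illustration.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2  # mean number of accidents per day, from the question
p0 = poisson_pmf(0, lam)
p1 = poisson_pmf(1, lam)
p_at_least_2 = 1 - p0 - p1  # complement of "0 or 1 accidents"

print(round(p0, 4), round(p1, 4), round(p_at_least_2, 4))
# prints: 0.1353 0.2707 0.594
```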

Question 3 (long, 10 marks)

Define regression. Derive the two regression equations (Y on X and X on Y). Explain the properties of the regression coefficients.

Regression

Regression is a statistical method that estimates the average relationship between two (or more) variables and is used to predict the value of a dependent variable from a known value of an independent variable. The line that gives the best estimate is the line of regression, fitted by the principle of least squares (minimising the sum of squared deviations of observed points from the line).

Derivation of the Regression Line of Y on X

Let the line be $Y = a + bX$. By least squares we minimise $\sum (Y - a - bX)^2$. Setting the partial derivatives with respect to $a$ and $b$ to zero gives the normal equations:

$$\sum Y = na + b\sum X,$$

$$\sum XY = a\sum X + b\sum X^2.$$

Solving for $b$ gives the regression coefficient of $Y$ on $X$:

$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{\text{Cov}(X,Y)}{\sigma_x^2} = r\frac{\sigma_y}{\sigma_x}.$$

Dividing the first normal equation by $n$ gives $\bar{Y} = a + b\bar{X}$, so the line passes through $(\bar{X}, \bar{Y})$. The line of regression of Y on X is therefore

$$Y - \bar{Y} = b_{yx}(X - \bar{X}).$$

Derivation of the Regression Line of X on Y

By symmetry, taking $X = a' + b'Y$ and minimising $\sum (X - a' - b'Y)^2$ gives

$$b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2} = \frac{\text{Cov}(X,Y)}{\sigma_y^2} = r\frac{\sigma_x}{\sigma_y}.$$

The line of regression of X on Y is

$$X - \bar{X} = b_{xy}(Y - \bar{Y}).$$

Properties of the Regression Coefficients

  1. The correlation coefficient is the geometric mean of the two regression coefficients: $r = \pm\sqrt{b_{yx}\cdot b_{xy}}$, taking the common sign of the coefficients.
  2. Both regression coefficients have the same sign, which is also the sign of $r$.
  3. If one regression coefficient is numerically greater than 1, the other must be numerically less than 1 (since their product is $r^2 \leq 1$).
  4. The arithmetic mean of the two coefficients is numerically greater than or equal to $r$: $\left|\frac{b_{yx} + b_{xy}}{2}\right| \geq |r|$.
  5. Regression coefficients are independent of the change of origin but not of scale.
  6. The two regression lines intersect at the means $(\bar{X}, \bar{Y})$.
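
The formulas and properties above can be checked numerically. The sketch below uses a small made-up data set (the values of `X` and `Y` are hypothetical, chosen only for illustration) and computes both regression coefficients from the sums in the normal equations.

```python
import math

# Hypothetical paired observations, not taken from the question paper.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sxx = sum(x * x for x in X)
syy = sum(y * y for y in Y)

# Regression coefficients from the least-squares formulas.
b_yx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b_xy = (n * sxy - sx * sy) / (n * syy - sy ** 2)

# Pearson correlation coefficient computed independently.
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x_bar, y_bar = sx / n, sy / n

def y_on_x(x):
    """Estimate Y from X using the line of regression of Y on X."""
    return y_bar + b_yx * (x - x_bar)

def x_on_y(y):
    """Estimate X from Y using the line of regression of X on Y."""
    return x_bar + b_xy * (y - y_bar)

print(b_yx, b_xy, round(r, 4))
# prints: 0.6 1.0 0.7746
```

For this data the product $b_{yx} b_{xy} = 0.6$ equals $r^2$, the mean of the coefficients (0.8) exceeds $r$, and both lines return the means at $(\bar{X}, \bar{Y}) = (3, 4)$.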

Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions · 5 marks each

Question 4 (short, 5 marks)

Define harmonic mean.

The harmonic mean (HM) of $n$ observations is the reciprocal of the arithmetic mean of the reciprocals of the values:

$$\text{HM} = \frac{n}{\sum_{i=1}^{n}\frac{1}{x_i}}.$$

For a frequency distribution, $\text{HM} = \dfrac{N}{\sum (f_i/x_i)}$, where $N = \sum f_i$. It is the appropriate average for rates and ratios (e.g. average speed over equal distances, price per unit) and gives more weight to smaller values.
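
A minimal sketch of the average-speed case, assuming a hypothetical trip that covers two equal distances at 40 km/h and 60 km/h. It applies the formula directly and compares the result with `statistics.harmonic_mean` from the Python standard library.

```python
from statistics import harmonic_mean

# Hypothetical speeds (km/h) over two equal distances.
speeds = [40, 60]

# Direct application of HM = n / sum(1 / x_i).
hm = len(speeds) / sum(1 / s for s in speeds)

# Arithmetic mean, for comparison.
am = sum(speeds) / len(speeds)

print(round(hm, 2), round(harmonic_mean(speeds), 2), am)
# prints: 48.0 48.0 50.0
```

The harmonic mean (48 km/h) is below the arithmetic mean (50 km/h), reflecting the extra weight given to the smaller value.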

Question 5 (short, 5 marks)

State the empirical relation between mean, median and mode.

For a moderately skewed (asymmetrical) distribution, the empirical relationship among the three measures of central tendency is

$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}.$$

Equivalently, $\text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median})$. Under this relation the median lies between the mean and the mode, dividing the distance from the mean to the mode in the ratio $1:2$. For a perfectly symmetrical distribution, Mean = Median = Mode.

Question 6 (short, 5 marks)

What are the properties of the Poisson distribution?

Properties of the Poisson Distribution

  1. It is a discrete distribution with pmf $P(X=x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$, $x = 0,1,2,\dots$
  2. Mean = Variance = $\lambda$.
  3. It has only one parameter, $\lambda$ (= $np$ in the binomial limit).
  4. It is the limiting case of the binomial distribution when $n \to \infty$, $p \to 0$ with $np = \lambda$ fixed.
  5. It is positively skewed ($\beta_1 = 1/\lambda$); the skewness decreases as $\lambda$ increases, approaching normality for large $\lambda$.
  6. Additive property: the sum of independent Poisson variates with parameters $\lambda_1, \lambda_2$ is Poisson with parameter $\lambda_1 + \lambda_2$.
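
Properties 2 and 6 can be illustrated numerically. The sketch below sums the pmf over a truncated range (the cut-off of 60 terms and the parameter values are assumptions made for illustration; the neglected tail is negligible for small $\lambda$).

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.0
xs = range(60)  # truncation point chosen for illustration
probs = [poisson_pmf(x, lam) for x in xs]

# Mean and variance computed from the pmf; both should be close to lam.
mean = sum(x * p for x, p in zip(xs, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))

# Additive property: P(X1 + X2 = k) for independent Poisson(1) and Poisson(2)
# equals the Poisson(3) probability at k.
k = 4
convolved = sum(poisson_pmf(j, 1.0) * poisson_pmf(k - j, 2.0) for j in range(k + 1))
direct = poisson_pmf(k, 3.0)

print(round(mean, 6), round(variance, 6), round(convolved, 6), round(direct, 6))
```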

Question 7 (short, 5 marks)

Define the line of regression.

A line of regression is the straight line that gives the best estimate (in the least-squares sense) of one variable for a given value of the other, by minimising the sum of squared deviations of the observed points from the line. For two variables $X$ and $Y$ there are two such lines:

  • Regression of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$, with $b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$, used to estimate $Y$ from $X$.
  • Regression of X on Y: $X - \bar{X} = b_{xy}(Y - \bar{Y})$, with $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$, used to estimate $X$ from $Y$.

Both lines pass through the point of means $(\bar{X}, \bar{Y})$ and coincide only when $r = \pm 1$.

Question 8 (short, 5 marks)

What is conditional probability?

Conditional probability is the probability of an event $A$ occurring given that another event $B$ has already occurred (with $P(B) > 0$). It is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

It restricts the sample space to those outcomes in which $B$ occurs. From this, the multiplication rule follows: $P(A \cap B) = P(B)\,P(A\mid B)$. If $A$ and $B$ are independent, then $P(A\mid B) = P(A)$.

Example: drawing two cards without replacement, $P(\text{2nd is King} \mid \text{1st is King}) = \dfrac{3}{51}$.
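
The card example can be verified by brute-force enumeration of the restricted sample space. This is a sketch using only the standard library; the deck encoding (rank and suit labels) is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["S", "H", "D", "C"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# All ordered pairs of distinct cards: two draws without replacement.
draws = list(permutations(deck, 2))

# Restrict the sample space to outcomes where B (first card is a King) occurs.
first_king = [d for d in draws if d[0][0] == "K"]

# Within that restricted space, count outcomes where A (second is a King) occurs.
both_kings = [d for d in first_king if d[1][0] == "K"]

p = Fraction(len(both_kings), len(first_king))
print(p)
# prints: 1/17
```

The result 1/17 is 3/51 in lowest terms, matching the value obtained from the definition.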

Question 9 (short, 5 marks)

Define mean deviation.

Mean deviation (MD) is a measure of dispersion equal to the arithmetic mean of the absolute deviations of the observations from a central value (usually the mean or median).

About the mean: $\text{MD}_{\bar{x}} = \dfrac{\sum |x_i - \bar{x}|}{n}$ (for grouped data, $\dfrac{\sum f_i |x_i - \bar{x}|}{N}$).

About the median: $\text{MD}_{M} = \dfrac{\sum |x_i - M|}{n}$.

Absolute values are used so that positive and negative deviations do not cancel. Mean deviation is least when taken about the median. The coefficient of mean deviation = $\dfrac{\text{MD}}{\text{central value}}$.
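
A small sketch of both versions for ungrouped data. The data set is hypothetical and chosen so that the mean and median differ; `mean_deviation` is a helper name introduced here.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Arithmetic mean of the absolute deviations from the given centre."""
    return sum(abs(v - centre) for v in values) / len(values)

# Hypothetical observations with one large value.
data = [1, 2, 3, 4, 10]

md_mean = mean_deviation(data, mean(data))        # centre = 4
md_median = mean_deviation(data, median(data))    # centre = 3

# Coefficient of mean deviation about the median.
coeff_median = md_median / median(data)

print(md_mean, md_median)
# prints: 2.4 2.2
```

Here the mean deviation about the median (2.2) is smaller than the mean deviation about the mean (2.4), consistent with the minimal property stated above.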

Question 10 (short, 5 marks)

What is a probability mass function?

A probability mass function (pmf) gives the probability that a discrete random variable $X$ takes each of its possible values. If $X$ takes values $x_1, x_2, \dots$, the pmf is

$$p(x_i) = P(X = x_i),$$

and it must satisfy two conditions:

  1. $p(x_i) \geq 0$ for all $i$ (non-negativity), and
  2. $\sum_i p(x_i) = 1$ (total probability is one).

Example: for a fair die, $p(x) = \tfrac{1}{6}$ for $x = 1,2,\dots,6$. The pmf is the discrete counterpart of the probability density function used for continuous variables.
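
The two conditions can be checked mechanically for the fair-die example. The sketch below stores the pmf as a dictionary of exact fractions; this representation is one possible choice, not a prescribed one.

```python
from fractions import Fraction

# pmf of a fair six-sided die: p(x) = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Condition 1: every probability is non-negative.
non_negative = all(p >= 0 for p in pmf.values())

# Condition 2: the probabilities sum to one.
total = sum(pmf.values())

# Probability of an event is the sum of the pmf over its outcomes.
p_even = sum(p for x, p in pmf.items() if x % 2 == 0)

print(non_negative, total, p_even)
# prints: True 1 1/2
```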

Question 11 (short, 5 marks)

State the properties of a good measure of central tendency.

Properties of a Good Measure of Central Tendency

According to Yule and Kendall, an ideal average should:

  1. Be rigidly (clearly) defined by a mathematical formula.
  2. Be based on all the observations of the data.
  3. Be easy to understand and simple to compute.
  4. Be least affected by sampling fluctuations (sampling stability).
  5. Be suitable for further algebraic / mathematical treatment.
  6. Not be unduly affected by extreme values (outliers).

The arithmetic mean satisfies most of these except the last (it is affected by extreme values).

Question 12 (short, 5 marks)

Find the mode of 2, 3, 3, 5, 7, 3, 8.

The mode is the value that occurs most frequently. Tallying the data $2, 3, 3, 5, 7, 3, 8$ by frequency:

  • $3$ occurs three times,
  • every other value occurs once.

Since $3$ has the highest frequency, the mode = 3.
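
The same tally can be produced with `collections.Counter` from the Python standard library, shown here as a quick check of the answer.

```python
from collections import Counter

data = [2, 3, 3, 5, 7, 3, 8]

# Count how often each value occurs.
counts = Counter(data)

# most_common(1) returns the (value, frequency) pair with the highest count.
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)
# prints: 3 3
```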


Frequently asked questions

Where can I find the BSc CSIT (TU) Statistics I (BSc CSIT, STA164) question paper 2078?
The full BSc CSIT (TU) Statistics I (BSc CSIT, STA164) 2078 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Statistics I (BSc CSIT, STA164) 2078 paper come with solutions?
Yes. Every question on this Statistics I (BSc CSIT, STA164) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Statistics I (BSc CSIT, STA164) 2078 paper?
The BSc CSIT (TU) Statistics I (BSc CSIT, STA164) 2078 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Statistics I (BSc CSIT, STA164) past paper free?
Yes — reading and attempting this Statistics I (BSc CSIT, STA164) past paper on Kekkei is completely free.