Section A: Long Answer Questions

Attempt any TWO questions.

3 questions·10 marks each
1 · long · 10 marks

Define probability and the different approaches to probability (classical, empirical, axiomatic). State and prove the addition theorem of probability for two events.

Probability and Its Approaches

Probability is a numerical measure of the likelihood that a particular event will occur; it lies between $0$ and $1$ inclusive. If $P(A)=0$ the event is impossible, and if $P(A)=1$ the event is certain.

(a) Classical (Mathematical / a priori) Approach

If a random experiment has $n$ equally likely, mutually exclusive and exhaustive outcomes, and $m$ of them are favourable to event $A$, then

$$P(A)=\frac{m}{n}=\frac{\text{number of favourable cases}}{\text{total number of cases}}.$$

It requires outcomes to be equally likely (e.g. a fair coin or die).

(b) Empirical (Statistical / a posteriori / relative-frequency) Approach

If an experiment is repeated $N$ times under identical conditions and event $A$ occurs $f$ times, then

$$P(A)=\lim_{N\to\infty}\frac{f}{N}.$$

It is used when outcomes are not equally likely and is based on observation/experiment.
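The relative-frequency idea can be illustrated with a short Python sketch. It assumes a simulated fair die in place of a real experiment; the event (rolling a six), the fixed seed and the helper name `relative_frequency` are illustrative choices, not part of the question.

```python
import random

def relative_frequency(trials, seed=0):
    """Estimate P(six) as f / N from simulated rolls of a fair die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return hits / trials

# f / N should settle near 1/6 (about 0.1667) as N grows
for n in (100, 10_000, 100_000):
    print(n, round(relative_frequency(n), 4))
```

For small $N$ the ratio can be far from $1/6$; it is only the long-run behaviour that the empirical definition relies on.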

(c) Axiomatic Approach (Kolmogorov)

For a sample space $S$, a probability $P$ assigns to every event $A$ a real number $P(A)$ satisfying:

  1. Non-negativity: $P(A)\ge 0$.
  2. Certainty: $P(S)=1$.
  3. Additivity: for mutually exclusive events $A_1,A_2,\dots$, $P(A_1\cup A_2\cup\cdots)=\sum_i P(A_i)$.

Addition Theorem of Probability (for two events)

Statement: For any two events $A$ and $B$ of a sample space,

$$P(A\cup B)=P(A)+P(B)-P(A\cap B).$$

Proof: From the Venn diagram, $A\cup B$ can be written as the union of two mutually exclusive parts:

$$A\cup B=A\cup(B\cap\bar A),$$

where $A$ and $B\cap\bar A$ are disjoint. By the additivity axiom,

$$P(A\cup B)=P(A)+P(B\cap\bar A).\quad(1)$$

Also, $B$ can be split into two disjoint parts $A\cap B$ and $B\cap\bar A$:

$$B=(A\cap B)\cup(B\cap\bar A)\;\Rightarrow\;P(B)=P(A\cap B)+P(B\cap\bar A),$$

so

$$P(B\cap\bar A)=P(B)-P(A\cap B).\quad(2)$$

Substituting (2) in (1):

$$\boxed{P(A\cup B)=P(A)+P(B)-P(A\cap B).}$$

Corollary: If $A$ and $B$ are mutually exclusive, $P(A\cap B)=0$, so $P(A\cup B)=P(A)+P(B)$.
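As a quick numerical check of the theorem, the following Python sketch uses the classical definition on one roll of a fair die. The two events chosen here (an even number, a number greater than 3) are assumptions made only for illustration.

```python
from fractions import Fraction

S = set(range(1, 7))                 # sample space of one fair die
A = {x for x in S if x % 2 == 0}     # event A: an even number
B = {x for x in S if x > 3}          # event B: a number greater than 3

def prob(event):
    """Classical probability: favourable cases / total cases."""
    return Fraction(len(event), len(S))

lhs = prob(A | B)                        # P(A ∪ B) counted directly
rhs = prob(A) + prob(B) - prob(A & B)    # right-hand side of the theorem
print(lhs, rhs)
```

Both sides come out as $2/3$: adding $P(A)$ and $P(B)$ counts the outcomes $4$ and $6$ twice, and subtracting $P(A\cap B)$ removes the double count.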

Topic: probability

2 · long · 10 marks

Define the normal distribution. State its properties and explain the standard normal variate. Solve a problem involving the area under the normal curve.

Normal Distribution

Definition

A continuous random variable $X$ is said to follow a normal distribution with mean $\mu$ and variance $\sigma^2$, written $X\sim N(\mu,\sigma^2)$, if its probability density function is

$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2},\qquad -\infty<x<\infty,\;\sigma>0.$$

Properties

  1. The curve is bell-shaped and symmetrical about $x=\mu$.
  2. Mean = Median = Mode = $\mu$.
  3. It is unimodal; the maximum ordinate is $\frac{1}{\sigma\sqrt{2\pi}}$ at $x=\mu$.
  4. The curve is asymptotic to the $x$-axis on both sides; the total area under it is $1$.
  5. It is symmetric, so skewness $=0$, and kurtosis $\beta_2=3$ (mesokurtic).
  6. Points of inflexion occur at $x=\mu\pm\sigma$.
  7. Empirical rule: about $68.27\%$, $95.45\%$ and $99.73\%$ of values lie within $\mu\pm\sigma$, $\mu\pm2\sigma$ and $\mu\pm3\sigma$ respectively.
  8. Quartiles: $Q_1=\mu-0.6745\sigma$, $Q_3=\mu+0.6745\sigma$; mean deviation $=0.7979\sigma$.
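The numerical constants in properties 7 and 8 can be reproduced with Python's standard library. This is only a sketch for checking the figures; it works in units of $\sigma$ on the standard normal curve, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

q3 = Z.inv_cdf(0.75)                 # upper quartile, in units of sigma
mean_dev = math.sqrt(2 / math.pi)    # mean deviation, in units of sigma

print({k: round(v, 4) for k, v in within.items()})
print(round(q3, 4), round(mean_dev, 4))
```

By symmetry the lower quartile is the negative of `q3`, which is why $Q_1$ and $Q_3$ sit at equal distances on either side of $\mu$.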

Standard Normal Variate

The variate

$$Z=\frac{X-\mu}{\sigma}$$

follows the standard normal distribution $N(0,1)$ with mean $0$ and variance $1$, and density $\phi(z)=\frac{1}{\sqrt{2\pi}}e^{-z^2/2}$. Areas under it are tabulated, allowing any normal probability to be evaluated.

Worked Problem

The weights of a large group of students are normally distributed with mean $\mu=50$ kg and standard deviation $\sigma=5$ kg. Find $P(45<X<60)$.

Convert to $Z$:

$$z_1=\frac{45-50}{5}=-1,\qquad z_2=\frac{60-50}{5}=2.$$

From standard normal tables, $P(0<Z<1)=0.3413$ and $P(0<Z<2)=0.4772$. By symmetry, $P(-1<Z<0)=P(0<Z<1)$, so

$$P(-1<Z<2)=P(-1<Z<0)+P(0<Z<2)=0.3413+0.4772=0.8185.$$

Hence $P(45<X<60)=0.8185$, i.e. about $81.85\%$ of students weigh between 45 kg and 60 kg.
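The table lookup can be cross-checked in Python. This sketch uses `statistics.NormalDist` from the standard library in place of printed tables; the variable names are illustrative.

```python
from statistics import NormalDist

weights = NormalDist(mu=50, sigma=5)    # X ~ N(50, 5^2), weights in kg

# P(45 < X < 60) as a difference of cumulative probabilities
p = weights.cdf(60) - weights.cdf(45)
print(round(p, 4))
```

The computed value agrees with the table answer to three decimal places; any difference in the fourth decimal comes from the rounding of the four-figure table entries.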

Topic: normal-distribution

3 · long · 10 marks

Define rank correlation. Compute Spearman's rank correlation coefficient for the given data, including the case of tied ranks.

Rank Correlation

Definition

When the actual numerical values of two variables are not available (or attributes can only be ranked, e.g. beauty, intelligence), the degree of association between the two sets of ranks is measured by Spearman's rank correlation coefficient:

$$\rho=1-\frac{6\sum d_i^2}{n(n^2-1)},$$

where $d_i$ is the difference between the two ranks of the $i$-th individual and $n$ is the number of individuals. It lies between $-1$ and $+1$.

Tied Ranks

When two or more items share equal values, each is assigned the average of the ranks they would otherwise occupy. For every group of $m$ tied observations, a correction factor $\dfrac{m(m^2-1)}{12}$ is added to $\sum d^2$:

$$\rho=1-\frac{6\left[\sum d^2+\sum\frac{m(m^2-1)}{12}\right]}{n(n^2-1)}.$$

Worked Example (with a tie)

Marks of 5 students in two subjects:

| X | 35 | 40 | 40 | 50 | 30 |
|---|---|---|---|---|---|
| Y | 22 | 25 | 28 | 30 | 20 |

Ranks of X: $30\to1$, $35\to2$; the two values of $40$ occupy ranks 3 and 4, so each gets $\frac{3+4}{2}=3.5$; $50\to5$. Ranks of Y: $20\to1$, $22\to2$, $25\to3$, $28\to4$, $30\to5$.

| $X$ | $R_X$ | $Y$ | $R_Y$ | $d=R_X-R_Y$ | $d^2$ |
|---|---|---|---|---|---|
| 35 | 2 | 22 | 2 | 0 | 0 |
| 40 | 3.5 | 25 | 3 | 0.5 | 0.25 |
| 40 | 3.5 | 28 | 4 | -0.5 | 0.25 |
| 50 | 5 | 30 | 5 | 0 | 0 |
| 30 | 1 | 20 | 1 | 0 | 0 |

$\sum d^2=0.5$. There is one tie of $m=2$ in X, so the correction is $\frac{2(2^2-1)}{12}=0.5$.

$$\rho=1-\frac{6(0.5+0.5)}{5(25-1)}=1-\frac{6}{120}=1-0.05=0.95.$$

Hence $\rho=+0.95$, a very high positive rank correlation.
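The same calculation can be sketched in Python. The helper names `average_ranks` and `spearman_with_ties` are hypothetical, and the code simply applies the tie-corrected formula stated above (ranking from the smallest value upwards); it is not a library routine.

```python
def average_ranks(values):
    """Rank from smallest (1) upwards; tied values share their mean rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # lowest rank the value occupies
        count = ordered.count(v)               # size of the tied group
        ranks.append(first + (count - 1) / 2)  # mean of first..first+count-1
    return ranks

def spearman_with_ties(x, y):
    """Spearman's rho with the m(m^2 - 1)/12 correction for each tied group."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = 0.0
    for series in (x, y):
        for v in set(series):
            m = series.count(v)
            if m > 1:
                correction += m * (m * m - 1) / 12
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

x = [35, 40, 40, 50, 30]
y = [22, 25, 28, 30, 20]
rho = spearman_with_ties(x, y)
print(average_ranks(x), round(rho, 2))
```

With the marks from the worked example, the ranks of X come out as 2, 3.5, 3.5, 5, 1 and the coefficient as 0.95, matching the hand calculation.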

Topics: correlation, rank
Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions·5 marks each
4 · short · 5 marks

Define the axiomatic approach to probability.

The axiomatic approach to probability, due to A. N. Kolmogorov, defines probability as a real-valued set function $P$ on the events of a sample space $S$ satisfying three axioms:

  1. Non-negativity: $P(A)\ge 0$ for every event $A$.
  2. Normalization (certainty): $P(S)=1$.
  3. Additivity: if $A_1,A_2,\dots$ are pairwise mutually exclusive events, then $P\!\left(\bigcup_i A_i\right)=\sum_i P(A_i)$.

All other probability results (e.g. $P(\bar A)=1-P(A)$, the addition theorem) are derived from these axioms. It is the most general and rigorous definition and applies even when outcomes are not equally likely.

Topic: probability

5 · short · 5 marks

State the properties of the normal distribution.

Properties of the Normal Distribution $X\sim N(\mu,\sigma^2)$:

  1. The curve is bell-shaped and symmetrical about the mean $x=\mu$.
  2. Mean = Median = Mode = $\mu$; it is unimodal.
  3. Total area under the curve is $1$, and the curve is asymptotic to the $x$-axis on both sides.
  4. Skewness $=0$ and kurtosis $\beta_2=3$ (mesokurtic).
  5. Points of inflexion at $x=\mu\pm\sigma$.
  6. Empirical rule: $68.27\%$, $95.45\%$ and $99.73\%$ of observations lie within $\mu\pm\sigma$, $\mu\pm2\sigma$ and $\mu\pm3\sigma$ respectively.
  7. Quartile deviation $=0.6745\sigma$ and mean deviation $=0.7979\sigma$.
  8. Any linear function of a normal variable is also normal.

Topic: normal-distribution

6 · short · 5 marks

What is rank correlation?

Rank correlation is a measure of the degree of association (agreement) between two sets of ranks rather than actual numerical values. It is used when the data are qualitative attributes that can only be ranked (e.g. beauty, honesty, intelligence) or when ranks are easier to assign than exact measurements.

It is measured by Spearman's rank correlation coefficient:

$$\rho=1-\frac{6\sum d_i^2}{n(n^2-1)},$$

where $d_i$ is the difference between the ranks of the $i$-th pair and $n$ is the number of pairs. Its value lies between $-1$ (perfect disagreement) and $+1$ (perfect agreement), with $0$ indicating no association.

Topic: correlation

7 · short · 5 marks

Define a standard normal variate.

A standard normal variate $Z$ is a normal variable that has been standardised to have mean $0$ and variance $1$ (so its standard deviation is also $1$). If $X\sim N(\mu,\sigma^2)$, then

$$Z=\frac{X-\mu}{\sigma}\sim N(0,1).$$

Its probability density function is

$$\phi(z)=\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2},\qquad -\infty<z<\infty.$$

Standardising allows the areas (probabilities) of any normal distribution to be read from a single standard normal table.

Topic: normal-distribution

8 · short · 5 marks

What is the difference between correlation and regression?

Correlation vs. Regression

| Basis | Correlation | Regression |
|---|---|---|
| Meaning | Measures the degree and direction of the linear relationship between two variables | Describes the average functional relationship used to estimate or predict one variable from another |
| Symmetry | Symmetric: $r_{xy}=r_{yx}$ | Not symmetric: the regression of $Y$ on $X$ differs from that of $X$ on $Y$ |
| Cause–effect | Does not imply cause and effect | Studies the dependence of a dependent variable on an independent variable |
| Output | A single coefficient $r$, $-1\le r\le 1$ | An equation $Y=a+bX$ with coefficients $a,b$ |
| Use | To know whether and how strongly variables move together | To predict the value of one variable for a given value of the other |
| Units | Independent of the units of measurement | Coefficient $b$ depends on the units |

In short, correlation tells us whether and how strongly two variables are related, while regression lets us estimate one variable from the other.
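The symmetry contrast can be seen on a small data set. The five $(x, y)$ pairs below are made up purely for illustration, and the sketch uses the usual least-squares slopes $b_{yx}=S_{xy}/S_{xx}$ and $b_{xy}=S_{xy}/S_{yy}$.

```python
import math

x = [1, 2, 3, 4, 5]   # made-up illustrative data
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)   # one coefficient, same for (x, y) and (y, x)
b_yx = sxy / sxx                 # slope of the regression of Y on X
b_xy = sxy / syy                 # slope of the regression of X on Y
a_yx = my - b_yx * mx            # intercept of the regression of Y on X

print(round(r, 4), round(b_yx, 2), round(b_xy, 2), round(a_yx, 2))
```

Swapping the roles of the variables leaves `r` unchanged but gives a different slope, and the two slopes are linked to the correlation by $r^2=b_{yx}\,b_{xy}$.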

Topic: regression

9 · short · 5 marks

Define quartile deviation.

Quartile deviation (also called the semi-interquartile range) is a measure of dispersion equal to half the difference between the third quartile $Q_3$ and the first quartile $Q_1$:

$$\text{Q.D.}=\frac{Q_3-Q_1}{2}.$$

It measures the spread of the middle $50\%$ of the data and is unaffected by extreme values. The corresponding relative measure is the coefficient of quartile deviation:

$$\text{Coeff. of Q.D.}=\frac{Q_3-Q_1}{Q_3+Q_1}.$$
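A minimal Python sketch of both formulas follows. The quartile values $Q_1=10$ and $Q_3=30$ are hypothetical, and the function names are illustrative.

```python
def quartile_deviation(q1, q3):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    return (q3 - q1) / 2

def coeff_quartile_deviation(q1, q3):
    """Relative measure: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)

q1, q3 = 10, 30   # hypothetical quartiles
print(quartile_deviation(q1, q3), coeff_quartile_deviation(q1, q3))
```

The first value carries the units of the data, while the coefficient is a pure number and can be compared across data sets.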

Topic: dispersion

10 · short · 5 marks

State the addition theorem of probability.

Addition Theorem of Probability: For any two events $A$ and $B$ of a sample space,

$$P(A\cup B)=P(A)+P(B)-P(A\cap B).$$

It gives the probability that at least one of the two events occurs.

  • If $A$ and $B$ are mutually exclusive ($A\cap B=\varnothing$), then $P(A\cap B)=0$ and $P(A\cup B)=P(A)+P(B)$.
  • For three events: $P(A\cup B\cup C)=P(A)+P(B)+P(C)-P(A\cap B)-P(B\cap C)-P(A\cap C)+P(A\cap B\cap C)$.

Topic: probability

11 · short · 5 marks

What is the coefficient of skewness?

The coefficient of skewness is a relative (unit-free) measure of the degree and direction of asymmetry of a distribution. A symmetrical distribution has coefficient $0$; a positive value indicates a right (positive) skew and a negative value a left (negative) skew.

Karl Pearson's coefficient:

$$S_k=\frac{\text{Mean}-\text{Mode}}{\sigma}\qquad\text{or}\qquad S_k=\frac{3(\text{Mean}-\text{Median})}{\sigma},$$

usually lying between $-3$ and $+3$.

Bowley's (quartile) coefficient:

$$S_k=\frac{Q_3+Q_1-2Q_2}{Q_3-Q_1},$$

which lies between $-1$ and $+1$.
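Both coefficients are simple to evaluate once the summary measures are known. In the Python sketch below, the summary values passed in are hypothetical and the function names are illustrative; Pearson's coefficient is taken in its median form.

```python
def pearson_skewness(mean, median, sd):
    """Karl Pearson's coefficient, median form: 3(Mean - Median) / sd."""
    return 3 * (mean - median) / sd

def bowley_skewness(q1, q2, q3):
    """Bowley's quartile coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Hypothetical summary measures of a right-skewed distribution
sk_pearson = pearson_skewness(mean=20, median=18, sd=6)
sk_bowley = bowley_skewness(q1=10, q2=15, q3=30)
print(sk_pearson, sk_bowley)
```

Both results are positive here because the mean exceeds the median and $Q_3$ lies further from the median than $Q_1$ does, which is what a right skew looks like in these measures.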

Topic: skewness

12 · short · 5 marks

Find the variance of 2, 4, 6, 8, 10.

Number of observations $n=5$.

Mean:

$$\bar x=\frac{2+4+6+8+10}{5}=\frac{30}{5}=6.$$

Deviations and squares:

| $x$ | $x-\bar x$ | $(x-\bar x)^2$ |
|---|---|---|
| 2 | -4 | 16 |
| 4 | -2 | 4 |
| 6 | 0 | 0 |
| 8 | 2 | 4 |
| 10 | 4 | 16 |

$$\sum (x-\bar x)^2 = 16+4+0+4+16 = 40.$$

Variance:

$$\sigma^2=\frac{\sum (x-\bar x)^2}{n}=\frac{40}{5}=8.$$

Hence the variance is $\sigma^2=8$ (and the standard deviation is $\sigma=\sqrt{8}\approx2.83$).
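The result can be confirmed with Python's `statistics` module. The solution above divides by $n$, which corresponds to `pvariance` (population variance); `variance`, which divides by $n-1$, is shown only to make the distinction visible.

```python
from statistics import pstdev, pvariance, variance

data = [2, 4, 6, 8, 10]

pop_var = pvariance(data)     # divides by n, as in the solution above
pop_sd = pstdev(data)         # square root of the population variance
sample_var = variance(data)   # divides by n - 1 instead

print(pop_var, round(pop_sd, 2), sample_var)
```

The population variance is 8 and the standard deviation about 2.83, as found by hand; the $n-1$ version gives $40/4=10$, so it matters which divisor a question intends.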

Topic: dispersion

Frequently asked questions

Where can I find the BSc CSIT (TU) Statistics I (BSc CSIT, STA164) question paper 2079?
The full BSc CSIT (TU) Statistics I (BSc CSIT, STA164) 2079 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Statistics I (BSc CSIT, STA164) 2079 paper come with solutions?
Yes. Every question on this Statistics I (BSc CSIT, STA164) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Statistics I (BSc CSIT, STA164) 2079 paper?
The BSc CSIT (TU) Statistics I (BSc CSIT, STA164) 2079 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.
Is practising this Statistics I (BSc CSIT, STA164) past paper free?
Yes — reading and attempting this Statistics I (BSc CSIT, STA164) past paper on Kekkei is completely free.