
Section A: Long Answer Questions

Attempt any TWO questions.

3 questions · 10 marks each
Question 1 (long answer, 10 marks)

What is hypothesis testing? Explain the procedure of testing of hypothesis including null and alternative hypotheses, level of significance, types of errors and the critical region.

Hypothesis Testing

Hypothesis testing is a statistical procedure used to decide, on the basis of sample data, whether to accept or reject a claim (hypothesis) made about a population parameter. It provides an objective rule for choosing between two competing statements about the population.

1. Null and Alternative Hypotheses

  • Null hypothesis ($H_0$): A statement of no difference or no effect, assumed true until the sample evidence contradicts it, e.g. $H_0: \mu = \mu_0$.
  • Alternative hypothesis ($H_1$): The statement accepted if $H_0$ is rejected. It may be:
    • Two-tailed: $H_1: \mu \ne \mu_0$
    • One-tailed: $H_1: \mu > \mu_0$ (right-tailed) or $H_1: \mu < \mu_0$ (left-tailed)

2. Level of Significance ($\alpha$)

The maximum probability of rejecting $H_0$ when it is actually true. Common values are $\alpha = 0.05$ (5%) and $\alpha = 0.01$ (1%). It fixes the size of the critical region.

3. Types of Errors

| Decision | $H_0$ true | $H_0$ false |
|---|---|---|
| Reject $H_0$ | Type I error (probability $\alpha$) | Correct decision (power $= 1-\beta$) |
| Do not reject $H_0$ | Correct decision | Type II error (probability $\beta$) |

  • Type I error: Rejecting a true $H_0$; probability $= \alpha$.
  • Type II error: Accepting (failing to reject) a false $H_0$; probability $= \beta$.

4. Critical Region

The critical (rejection) region is the set of values of the test statistic for which $H_0$ is rejected. The probability that the statistic falls in this region when $H_0$ is true equals $\alpha$. The complementary region is the acceptance region, and the boundary value is the critical value (e.g. $\pm 1.96$ for a two-tailed $z$-test at 5%).

5. General Procedure

  1. Set up $H_0$ and $H_1$.
  2. Choose the level of significance $\alpha$.
  3. Select an appropriate test statistic ($z$, $t$, $\chi^2$, $F$) and find its sampling distribution under $H_0$.
  4. Determine the critical region / critical value.
  5. Compute the test statistic from the sample data.
  6. Decision: If the computed value falls in the critical region, reject $H_0$; otherwise do not reject $H_0$. State the conclusion in context.
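
As a minimal sketch of steps 4 and 6 for a two-tailed $z$-test, the Python below derives the critical value from $\alpha$ instead of reading it from a table. It assumes Python 3.8+ for `statistics.NormalDist`; the function name `two_tailed_z_decision` and the computed value $z = 2.1$ are hypothetical.

```python
from statistics import NormalDist


def two_tailed_z_decision(z_cal, alpha=0.05):
    """Steps 4 and 6 of the procedure for H1: mu != mu0 (two-tailed z-test).

    Returns the critical value z_{alpha/2} and whether H0 is rejected.
    """
    # Step 4: critical value leaving alpha/2 in each tail of N(0, 1)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 6: reject H0 if the computed statistic lies in the critical region
    reject = abs(z_cal) > z_crit
    return z_crit, reject


# Steps 1-3 and 5 assumed done: H0: mu = mu0, alpha = 0.05, computed z = 2.1 (hypothetical)
crit, reject = two_tailed_z_decision(2.1, alpha=0.05)
print(f"critical value = ±{crit:.2f}, reject H0: {reject}")
```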
Question 2 (long answer, 10 marks)

Define correlation and regression. Explain the method of fitting two regression lines and the relationship between correlation coefficient and regression coefficients.

Correlation and Regression

Correlation measures the degree and direction of the linear relationship between two variables $X$ and $Y$. It is dimensionless and lies between $-1$ and $+1$.

Regression measures the average functional dependence of one variable on another and is used for prediction. It gives the equation of the line that best estimates one variable from the other.

Fitting the Two Regression Lines (Least Squares)

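The two lines are obtained by the method of least squares, i.e. by minimising the sum of squared deviations of the observed points from the line: vertical deviations (in $Y$) for the line of $Y$ on $X$, and horizontal deviations (in $X$) for the line of $X$ on $Y$. Solving the resulting normal equations gives: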

(a) Regression of $Y$ on $X$ (predict $Y$ from $X$):

$$Y - \bar{Y} = b_{yx}(X - \bar{X}), \qquad b_{yx} = \frac{\operatorname{Cov}(X,Y)}{\sigma_x^2} = r\frac{\sigma_y}{\sigma_x}$$

(b) Regression of $X$ on $Y$ (predict $X$ from $Y$):

$$X - \bar{X} = b_{xy}(Y - \bar{Y}), \qquad b_{xy} = \frac{\operatorname{Cov}(X,Y)}{\sigma_y^2} = r\frac{\sigma_x}{\sigma_y}$$

Both lines pass through the mean point $(\bar{X}, \bar{Y})$, where they intersect.

Relationship between $r$ and the Regression Coefficients

The correlation coefficient is the geometric mean of the two regression coefficients:

$$r = \pm\sqrt{b_{yx}\cdot b_{xy}}$$

Key points:

  • The sign of $r$ equals the (common) sign of $b_{yx}$ and $b_{xy}$.
  • Since $r^2 = b_{yx}b_{xy} \le 1$, both coefficients cannot exceed 1 in magnitude simultaneously.
  • If $r = 0$, both regression coefficients are 0 and the lines are perpendicular ($Y = \bar{Y}$ and $X = \bar{X}$); if $r = \pm 1$ the two lines coincide.
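
The least-squares formulas can be checked numerically. This Python sketch uses a small hypothetical data set (not from the question) and divide-by-$n$ moments; the helper `regression_lines` is an illustrative name.

```python
import math


def regression_lines(x, y):
    """Return (b_yx, b_xy, r) using population (divide-by-n) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    b_yx = cov / var_x                  # slope of the line of Y on X
    b_xy = cov / var_y                  # slope of the line of X on Y
    r = cov / math.sqrt(var_x * var_y)  # Karl Pearson's r
    return b_yx, b_xy, r


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b_yx, b_xy, r = regression_lines(x, y)
mx, my = sum(x) / len(x), sum(y) / len(y)
print(f"Y on X: Y - {my} = {b_yx:.3f}(X - {mx})")
print(f"X on Y: X - {mx} = {b_xy:.3f}(Y - {my})")
print(f"r = {r:.4f}, sqrt(b_yx * b_xy) = {math.sqrt(b_yx * b_xy):.4f}")
```

Both printed lines pass through $(\bar{X}, \bar{Y}) = (3, 4)$, and the last line confirms $r^2 = b_{yx} b_{xy}$ for this data.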
Question 3 (long answer, 10 marks)

What is sampling? Explain different methods of probability and non-probability sampling with their merits and demerits.

Sampling

Sampling is the process of selecting a representative subset (sample) from a population so that conclusions about the whole population can be drawn from the sample, saving time, cost and effort.

A. Probability (Random) Sampling

Every unit has a known, non-zero chance of selection.

  1. Simple Random Sampling — every unit has an equal chance (lottery / random numbers).
    • Merits: unbiased, simple, easy to analyse.
    • Demerits: needs complete sampling frame; may miss small subgroups.
  2. Stratified Random Sampling — population divided into homogeneous strata, samples drawn from each.
    • Merits: high precision, ensures representation of all groups.
    • Demerits: requires prior knowledge of strata; complex.
  3. Systematic Sampling — select every $k^{\text{th}}$ unit after a random start, where $k = N/n$.
    • Merits: simple, quick, spreads sample evenly.
    • Demerits: biased if there is periodicity in the list.
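  4. Cluster / Multistage Sampling — population divided into clusters and whole clusters selected at random; in multistage sampling, units are further sub-sampled within the selected clusters.
    • Merits: economical for wide geographic areas; a full list of units is needed only for the selected clusters.
    • Demerits: higher sampling error than a simple random sample of the same size, especially when units within a cluster are similar to one another (homogeneous within clusters).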
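
A rough Python illustration of the first three probability methods follows. It assumes a hypothetical frame of 100 numbered units, two made-up strata and $N/n$ an integer; cluster sampling is not shown.

```python
import random

population = list(range(1, 101))   # hypothetical sampling frame: unit IDs 1..100
N, n = len(population), 10
rng = random.Random(42)            # fixed seed so the draw is reproducible

# 1. Simple random sampling without replacement
srs = rng.sample(population, n)

# 2. Stratified sampling with proportional allocation (two hypothetical strata)
strata = {"A": population[:60], "B": population[60:]}
stratified = []
for units in strata.values():
    n_h = round(n * len(units) / N)      # 6 units from A, 4 from B
    stratified.extend(rng.sample(units, n_h))

# 3. Systematic sampling: random start among the first k units, then every k-th unit
k = N // n                               # k = N/n = 10
start = rng.randrange(k)                 # index 0 .. k-1
systematic = population[start::k]

print("SRS:       ", sorted(srs))
print("Stratified:", sorted(stratified))
print("Systematic:", systematic)
```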

B. Non-Probability Sampling

Selection probability is unknown; relies on judgement/convenience.

  1. Convenience Sampling — units that are easiest to reach.
    • Merits: fast, cheap. Demerits: highly biased, not generalisable.
  2. Judgement (Purposive) Sampling — expert chooses typical units.
    • Merits: useful for small specialised studies. Demerits: subjective, biased.
  3. Quota Sampling — fixed quotas filled by interviewer choice.
    • Merits: quick, ensures group representation. Demerits: selection bias.
  4. Snowball Sampling — existing subjects recruit others.
    • Merits: good for hidden/rare populations. Demerits: strong bias, no frame.

Conclusion: Probability methods allow estimation of sampling error and valid inference; non-probability methods are cheaper and faster but cannot reliably generalise to the population.


Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions · 5 marks each
Question 4 (short answer, 5 marks)

Define mathematical expectation. State and prove its properties.

Mathematical Expectation

The mathematical expectation (mean) of a random variable $X$ is its long-run average value, i.e. the values of $X$ weighted by their probabilities:

$$E(X) = \sum_i x_i\, p(x_i) \quad\text{(discrete)}, \qquad E(X) = \int_{-\infty}^{\infty} x\, f(x)\, dx \quad\text{(continuous)}$$

Properties (with proof)

1. Expectation of a constant. $E(c) = c$. Proof: $E(c) = \sum c\, p(x) = c\sum p(x) = c\cdot 1 = c$.

2. Constant multiplier. $E(cX) = c\,E(X)$. Proof: $E(cX) = \sum cx\,p(x) = c\sum x\,p(x) = c\,E(X)$.

3. Linearity. $E(aX + b) = a\,E(X) + b$. Proof: $E(aX+b) = \sum (ax+b)\,p(x) = a\sum x\,p(x) + b\sum p(x) = a\,E(X)+b$.

4. Addition theorem. $E(X + Y) = E(X) + E(Y)$ (always, whether or not $X$ and $Y$ are independent). Proof: writing $p_X(x) = \sum_y p(x,y)$ and $p_Y(y) = \sum_x p(x,y)$ for the marginal distributions,

$$E(X+Y) = \sum_x\sum_y (x+y)\,p(x,y) = \sum_x x\sum_y p(x,y) + \sum_y y\sum_x p(x,y) = \sum_x x\,p_X(x) + \sum_y y\,p_Y(y) = E(X)+E(Y).$$

5. Multiplication theorem. If $X$ and $Y$ are independent, $E(XY) = E(X)\,E(Y)$. Proof: for independent variables $p(x,y) = p_X(x)\,p_Y(y)$, so

$$E(XY) = \sum_x\sum_y xy\,p_X(x)\,p_Y(y) = \Big(\sum_x x\,p_X(x)\Big)\Big(\sum_y y\,p_Y(y)\Big) = E(X)\,E(Y).$$
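
The properties can be verified exactly with rational arithmetic. The sketch below assumes a fair six-sided die as the example distribution (and two independent dice for property 5); `expect` is a hypothetical helper, not a library function.

```python
from fractions import Fraction

# Hypothetical example: X = score on a fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expect(g, pmf):
    """E[g(X)] = sum of g(x) * p(x) over the support of X."""
    return sum(g(x) * p for x, p in pmf.items())


E_X = expect(lambda x: x, pmf)
print("E(X) =", E_X)                                    # 7/2
print("E(5) =", expect(lambda x: 5, pmf))               # property 1: 5
print("E(2X + 3) =", expect(lambda x: 2 * x + 3, pmf))  # property 3: 2*(7/2) + 3 = 10

# Property 5: for independent X, Y (two fair dice) the joint pmf is the product
joint = {(x, y): px * py for x, px in pmf.items() for y, py in pmf.items()}
E_XY = sum(x * y * p for (x, y), p in joint.items())
print("E(XY) =", E_XY, "and E(X)E(Y) =", E_X * E_X)    # both 49/4
```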

Question 5 (short answer, 5 marks)

Explain the t-test for testing the significance of the difference between two sample means.

t-Test for Difference of Two Sample Means

Used for small independent samples ($n_1, n_2 < 30$) from two normal populations with unknown but equal variances.

Hypotheses

$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \ne \mu_2\ (\text{or one-sided})$$

Test Statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where $S$ is the pooled standard deviation, obtained from the pooled variance

$$S^2 = \frac{\sum (x_1-\bar{x}_1)^2 + \sum (x_2-\bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$

with $s_1^2, s_2^2$ the unbiased sample variances.

Degrees of Freedom

$$\text{d.f.} = n_1 + n_2 - 2$$

Decision Rule

Compare $|t_{\text{cal}}|$ with the two-tailed table value $t_{\text{tab}} = t_{\alpha/2,\,n_1+n_2-2}$ (the value leaving $\alpha/2$ in each tail):

  • If $|t_{\text{cal}}| \le t_{\text{tab}}$: do not reject $H_0$ (means not significantly different).
  • If $|t_{\text{cal}}| > t_{\text{tab}}$: reject $H_0$ (significant difference).

Assumptions: samples independent, drawn from normal populations, with equal population variances.
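
A minimal Python sketch of the pooled $t$ statistic follows, using made-up data chosen so that the arithmetic is easy to check by hand; `pooled_t` is a hypothetical name.

```python
import math
from statistics import mean


def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance; returns (t, d.f.)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    ss1 = sum((x - m1) ** 2 for x in sample1)   # sum of squared deviations, sample 1
    ss2 = sum((x - m2) ** 2 for x in sample2)   # sum of squared deviations, sample 2
    df = n1 + n2 - 2
    s_pooled = math.sqrt((ss1 + ss2) / df)      # S = square root of the pooled variance
    t = (m1 - m2) / (s_pooled * math.sqrt(1 / n1 + 1 / n2))
    return t, df


t, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])   # hypothetical samples
print(f"t = {t:.3f} with {df} d.f.")
```

For these data $t = -2.0$ with 8 d.f.; since $|t| = 2.0$ is below the 5% two-tailed table value $t_{0.025,8} = 2.306$, $H_0$ would not be rejected.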

Question 6 (short answer, 5 marks)

Explain the z-test for a large sample test of a single mean with an example.

z-Test for a Single Mean (Large Sample)

Used when $n \ge 30$ (or the population $\sigma$ is known) to test whether the sample mean differs significantly from a stated population mean $\mu_0$.

Hypotheses

$$H_0: \mu = \mu_0 \qquad H_1: \mu \ne \mu_0$$

Test Statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

If $\sigma$ is unknown, the sample s.d. $s$ is used in its place. The critical value at 5% (two-tailed) is $\pm 1.96$.

Decision Rule

Reject $H_0$ if $|z| > 1.96$ (5%); otherwise do not reject.

Example

A sample of $n = 100$ bulbs has mean life $\bar{x} = 1570$ hr, with $\sigma = 120$ hr. Test whether the mean life is $1600$ hr.

  • $H_0: \mu = 1600$, $H_1: \mu \ne 1600$.
  • $z = \dfrac{1570 - 1600}{120/\sqrt{100}} = \dfrac{-30}{12} = -2.5$.
  • $|z| = 2.5 > 1.96$, so reject $H_0$.

Conclusion: The mean life differs significantly from 1600 hours at the 5% level.
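
The bulb example can be reproduced in a few lines of Python (3.8+ for `statistics.NormalDist`). The two-tailed $p$-value is an extra quantity not asked for in the question; it leads to the same decision.

```python
import math
from statistics import NormalDist

n, x_bar, sigma, mu0 = 100, 1570, 120, 1600    # data from the bulb example

se = sigma / math.sqrt(n)                      # standard error = 12
z = (x_bar - mu0) / se                         # -2.5
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
z_crit = NormalDist().inv_cdf(0.975)           # about 1.96

print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if abs(z) > z_crit else "Do not reject H0")
```

The computed $p$-value is about 0.012, below $\alpha = 0.05$.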

Question 7 (short answer, 5 marks)

Define Karl Pearson's coefficient of correlation and state its properties.

Karl Pearson's Coefficient of Correlation

It is a numerical measure of the degree and direction of the linear relationship between two variables $X$ and $Y$, defined as the ratio of their covariance to the product of their standard deviations:

$$r = \frac{\operatorname{Cov}(X,Y)}{\sigma_x\,\sigma_y} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2}\,\sqrt{\sum (y-\bar{y})^2}}$$

or equivalently

$$r = \frac{n\sum xy - \sum x\sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}.$$

Properties

  1. Range: $-1 \le r \le +1$.
  2. Sign: $r > 0$ → positive (direct) correlation; $r < 0$ → negative (inverse); $r = 0$ → no linear correlation.
  3. Perfect correlation: $r = \pm 1$ means all points lie exactly on a straight line.
  4. Independent of origin and scale: $r$ is unchanged by the transformations $u = (x-a)/h$, $v = (y-b)/k$ with $h, k > 0$ (if $h$ and $k$ have opposite signs, only the sign of $r$ changes).
  5. Symmetry: $r_{xy} = r_{yx}$.
  6. Geometric mean of regression coefficients: $r = \pm\sqrt{b_{yx}\,b_{xy}}$.
  7. It is a pure number (unit-free).
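
A short Python sketch of the computational formula follows, using a hypothetical data set; it also checks property 4 with $a = 3$, $h = 2$, $b = 4$, $k = 1$ and property 5. The helper `pearson_r` is illustrative only.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's r via the computational (raw-sum) formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return num / den


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Property 4: change of origin and scale with h, k > 0 leaves r unchanged
u = [(a - 3) / 2 for a in x]
v = [(b - 4) / 1 for b in y]
print(f"r_xy = {r:.4f}, r_uv = {pearson_r(u, v):.4f}, r_yx = {pearson_r(y, x):.4f}")
```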
Question 8 (short answer, 5 marks)

What are regression coefficients? State their properties.

Regression Coefficients

The regression coefficient is the slope of a regression line — it gives the average change in the dependent variable per unit change in the independent variable.

  • Regression of $Y$ on $X$: $b_{yx} = r\dfrac{\sigma_y}{\sigma_x} = \dfrac{\operatorname{Cov}(X,Y)}{\sigma_x^2}$
  • Regression of $X$ on $Y$: $b_{xy} = r\dfrac{\sigma_x}{\sigma_y} = \dfrac{\operatorname{Cov}(X,Y)}{\sigma_y^2}$

Properties

  1. Correlation as geometric mean: $r = \pm\sqrt{b_{yx}\,b_{xy}}$.
  2. Same sign: both $b_{yx}$ and $b_{xy}$ have the same sign, which is also the sign of $r$.
  3. Product condition: $b_{yx}\,b_{xy} = r^2 \le 1$, so both coefficients cannot exceed unity in magnitude at the same time (if $|b_{yx}| > 1$, then $|b_{xy}| < 1$, and vice versa).
  4. Independent of origin but not of scale: shifting the origin of $X$ or $Y$ leaves the coefficients unchanged, whereas changing the scale changes them.
  5. Arithmetic mean property: the arithmetic mean of the two regression coefficients is at least $r$ in absolute value, i.e. $\left|\dfrac{b_{yx}+b_{xy}}{2}\right| \ge |r|$ (for positive coefficients, simply $\dfrac{b_{yx}+b_{xy}}{2} \ge r$).
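
To illustrate properties 1, 2, 4 and 5, this Python sketch computes both coefficients for a hypothetical data set, then shifts and rescales $X$. The helper `reg_coeffs` is an assumed name.

```python
import math


def reg_coeffs(x, y):
    """Return (b_yx, b_xy) = (Sxy/Sxx, Sxy/Syy) from deviation sums."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / syy


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b_yx, b_xy = reg_coeffs(x, y)                  # 0.6 and 1.0

# Property 4: shifting the origin of X leaves b_yx unchanged ...
b_shift, _ = reg_coeffs([a - 100 for a in x], y)
# ... but multiplying every X value by 10 (a change of scale) divides b_yx by 10
b_scaled, _ = reg_coeffs([a * 10 for a in x], y)

# Property 1, with the sign taken from the coefficients (property 2)
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(b_yx, b_shift, b_scaled)
print("AM of coefficients:", (b_yx + b_xy) / 2, ">= r =", round(r, 4))
```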
Question 9 (short answer, 5 marks)

Explain the concept of sampling distribution and standard error.

Sampling Distribution and Standard Error

Sampling Distribution

If all possible samples of a fixed size $n$ are drawn from a population and a statistic (e.g. the sample mean $\bar{x}$, proportion $p$, or variance $s^2$) is computed for each, the probability distribution of that statistic over all such samples is called its sampling distribution.

Example: the sampling distribution of $\bar{x}$ has mean $E(\bar{x}) = \mu$ and, by the Central Limit Theorem, is approximately normal for large $n$.

Standard Error (S.E.)

The standard error is the standard deviation of the sampling distribution of a statistic. It measures the precision / variability of the estimate:

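$$\text{S.E.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}, \qquad \text{S.E.}(p) = \sqrt{\frac{PQ}{n}}$$

where $P$ is the population proportion and $Q = 1 - P$. These forms hold for sampling with replacement or from a large population.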

Uses of Standard Error

  • It measures the reliability of a sample estimate (smaller S.E. = more precise).
  • It forms the denominator of test statistics, e.g. $z = (\bar{x}-\mu)/\text{S.E.}$
  • It is used to construct confidence intervals.
  • S.E. decreases as the sample size $n$ increases (it is proportional to $1/\sqrt{n}$).
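
The claims $E(\bar{x}) = \mu$ and $\text{S.E.}(\bar{x}) = \sigma/\sqrt{n}$ can be checked by brute force on a tiny hypothetical population. The sketch enumerates every sample of size 2 drawn with replacement, the setting in which $\operatorname{Var}(\bar{x}) = \sigma^2/n$ holds exactly, and uses exact fractions.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6]   # hypothetical population, N = 3
n = 2                    # sample size, drawn with replacement
N = len(population)

mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N   # population variance

# Sampling distribution of x-bar: the mean of every one of the N**n possible samples
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

print("E(x-bar) =", mean_of_means, "  mu =", mu)
print("Var(x-bar) =", var_of_means, "  sigma^2/n =", sigma2 / n)
```

The square root of the second line, $\sqrt{4/3}$, is the standard error of $\bar{x}$ for this population.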
Question 10 (short answer, 5 marks)

Explain how to construct a confidence interval for a population mean.

Confidence Interval for a Population Mean

A confidence interval (CI) is a range of values, computed from a sample, that is likely to contain the unknown population mean $\mu$, with a stated probability called the confidence level $(1-\alpha)$, e.g. 95%.

Case 1: Large Sample ($n \ge 30$) or $\sigma$ known

Using the normal distribution:

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

For 95% confidence, $z_{\alpha/2} = 1.96$; for 99%, $z_{\alpha/2} = 2.58$. If $\sigma$ is unknown, replace it with the sample s.d. $s$.

Case 2: Small Sample ($n < 30$), $\sigma$ unknown

Using the $t$-distribution with $n-1$ degrees of freedom:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

Steps

  1. Compute the sample mean $\bar{x}$ and the standard error $\sigma/\sqrt{n}$ (or $s/\sqrt{n}$).
  2. Choose the confidence level and find $z_{\alpha/2}$ or $t_{\alpha/2,\,n-1}$.
  3. Compute the margin of error $E = (\text{critical value})\times \text{S.E.}$
  4. The interval is $(\bar{x} - E,\ \bar{x} + E)$.

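Interpretation: if the sampling were repeated many times, about 95% of the intervals constructed in this way would contain the true mean $\mu$; informally, we are 95% confident that $\mu$ lies within the computed interval. A higher confidence level or a smaller $n$ gives a wider interval.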
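
A minimal Python sketch of the steps follows, reusing the bulb figures from Question 6 for Case 1. For Case 2 it assumes the critical value is read from a $t$ table (here $t_{0.025,9} = 2.262$ for a hypothetical sample with $n = 10$, $s = 5$, $\bar{x} = 50$), since Python's standard library has no $t$ quantile function; `mean_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist


def mean_ci(x_bar, sd, n, level=0.95, t_crit=None):
    """Confidence interval x_bar ± (critical value) * sd / sqrt(n).

    If t_crit is given (read from a t table with n - 1 d.f. for the same
    level), it is used instead of the normal critical value z_{alpha/2}.
    """
    alpha = 1 - level
    crit = t_crit if t_crit is not None else NormalDist().inv_cdf(1 - alpha / 2)
    margin = crit * sd / math.sqrt(n)          # margin of error E
    return x_bar - margin, x_bar + margin


# Case 1: large sample (bulb data, n = 100, sigma = 120)
print(mean_ci(1570, 120, 100))

# Case 2: small sample, hypothetical n = 10, s = 5, t_{0.025, 9} = 2.262 from tables
print(mean_ci(50, 5, 10, t_crit=2.262))
```

Case 1 gives roughly $(1546.48,\ 1593.52)$ hr.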

Question 11 (short answer, 5 marks)

Explain the F-test for the equality of two population variances.

F-Test for Equality of Two Population Variances

Used to test whether two independent normal populations have equal variances, based on their sample variances.

Hypotheses

$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_1^2 \ne \sigma_2^2$$

Test Statistic

$$F = \frac{s_1^2}{s_2^2}$$

where the samples are labelled so that the larger variance is in the numerator ($s_1^2 \ge s_2^2$, hence $F \ge 1$), and the unbiased sample variances are

$$s_1^2 = \frac{\sum (x_1-\bar{x}_1)^2}{n_1-1}, \qquad s_2^2 = \frac{\sum (x_2-\bar{x}_2)^2}{n_2-1}$$

Degrees of Freedom

$\nu_1 = n_1 - 1$ (numerator), $\nu_2 = n_2 - 1$ (denominator).

Decision Rule

Compare $F_{\text{cal}}$ with the upper-tail table value $F_{\text{tab}} = F_{\alpha}(\nu_1, \nu_2)$ (for a strictly two-sided test at level $\alpha$, use the upper $\alpha/2$ point):

  • If $F_{\text{cal}} \le F_{\text{tab}}$: do not reject $H_0$ (no significant difference between the variances).
  • If $F_{\text{cal}} > F_{\text{tab}}$: reject $H_0$ (the variances differ significantly).

Assumptions: both samples are independent and drawn from normal populations. (The F-test is also the basis of ANOVA.)
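
A Python sketch of the statistic follows, assuming two small hypothetical samples; `statistics.variance` uses the divisor $n-1$, matching $s_1^2$ and $s_2^2$ above. The helper `f_test_statistic` is illustrative and returns the degrees of freedom in the order (numerator, denominator).

```python
from statistics import variance


def f_test_statistic(sample1, sample2):
    """F = larger unbiased variance / smaller one, with matching d.f."""
    v1, v2 = variance(sample1), variance(sample2)   # divisor n - 1
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    # Swap so the larger variance (and its d.f.) goes in the numerator
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


F, nu1, nu2 = f_test_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])   # hypothetical samples
print(f"F = {F:.2f} with ({nu1}, {nu2}) d.f.")
```

Here $F = 5.60$ with $(5, 4)$ d.f.; since the upper 5% table value $F_{0.05}(5, 4) = 6.26$ exceeds it, $H_0$ would not be rejected for these illustrative data.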

Question 12 (short answer, 5 marks)

Define index numbers and explain Laspeyres' and Paasche's price index methods.

Index Numbers

An index number is a statistical measure that expresses the relative change in a variable (price, quantity, value) of a group of related items between a current period and a fixed base period. It is usually expressed as a percentage, with the base-period value taken as $100$.

Notation: $p_0, q_0$ = base-period price and quantity; $p_1, q_1$ = current-period price and quantity.

Laspeyres' Price Index (base-year weights)

Uses base-year quantities $q_0$ as weights:

$$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

Advantage: requires only base-year quantities, easy to update. Limitation: tends to over-estimate the rise in prices (ignores consumers switching away from costlier goods).

Paasche's Price Index (current-year weights)

Uses current-year quantities $q_1$ as weights:

$$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

Advantage: reflects current consumption pattern. Limitation: needs fresh current-year quantities each period; tends to under-estimate the price rise.

Note: Fisher's Ideal Index is the geometric mean of the two: $P_{01}^{F} = \sqrt{P_{01}^{L}\times P_{01}^{P}}$.
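
A worked Python sketch with two hypothetical commodities follows. The data are made up so that consumers buy less of the item whose price rose faster, which is why the Laspeyres index comes out above the Paasche index here.

```python
import math

# Hypothetical two-commodity data (prices and quantities)
p0 = [10, 20]   # base-period prices
q0 = [5, 4]     # base-period quantities
p1 = [12, 30]   # current-period prices (+20% and +50%)
q1 = [6, 3]     # current-period quantities (shift away from item 2)


def weighted_sum(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


laspeyres = weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100   # base-year weights
paasche = weighted_sum(p1, q1) / weighted_sum(p0, q1) * 100     # current-year weights
fisher = math.sqrt(laspeyres * paasche)                         # geometric mean of the two

print(f"Laspeyres = {laspeyres:.2f}, Paasche = {paasche:.2f}, Fisher = {fisher:.2f}")
```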


Frequently asked questions

Where can I find the BSc CSIT (TU) Statistics II (BSc CSIT, STA210) question paper 2080?
The full BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2080 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
Does the Statistics II (BSc CSIT, STA210) 2080 paper come with solutions?
Yes. Every question on this Statistics II (BSc CSIT, STA210) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.
How many marks is the BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2080 paper?
The BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2080 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.