
Section A: Long Answer Questions

Attempt any TWO questions.

3 questions · 10 marks each
Question 1 · long · 10 marks

What is sampling? Explain different methods of probability and non-probability sampling with their merits and demerits.

Sampling

Sampling is the statistical process of selecting a subset (a sample) of individuals or items from a larger group (the population) in order to estimate characteristics of the whole population. It is used because studying the entire population (a census) is often costly, time-consuming, or practically impossible.

Sampling methods are broadly divided into probability and non-probability sampling.

A. Probability Sampling

Every unit of the population has a known, non-zero chance of selection. Results can be generalized and sampling error can be estimated.

1. Simple Random Sampling

Every unit has an equal chance of selection (lottery method or random numbers).

  • Merits: Unbiased; easy to analyse; sampling error measurable.
  • Demerits: Needs a complete sampling frame; may not represent small subgroups; expensive for widely scattered populations.

2. Stratified Random Sampling

Population divided into homogeneous strata, then random samples drawn from each.

  • Merits: Greater precision; ensures representation of every subgroup.
  • Demerits: Requires prior knowledge of strata; complex; faulty stratification reduces efficiency.

3. Systematic Sampling

Every $k^{th}$ unit is selected after a random start, where $k = N/n$.

  • Merits: Simple, quick, evenly spread over the frame.
  • Demerits: Biased if the list has a hidden periodic pattern.
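
The selection rule for systematic sampling can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `systematic_sample`, the frame of 100 numbered units, and the use of integer division for $k$ when $N$ is not a multiple of $n$ are all assumptions made for the example.

```python
import random

def systematic_sample(units, n, rng=None):
    """Select every k-th unit after a random start, with k = N // n."""
    rng = rng or random.Random()
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = N // n                   # sampling interval (integer part of N/n)
    start = rng.randrange(k)     # random start among the first k units
    return [units[start + i * k] for i in range(n)]

population = list(range(1, 101))     # hypothetical frame of N = 100 units
sample = systematic_sample(population, 10, random.Random(0))
print(sample)
```

Only the starting point is random; every later unit is fixed by the interval, which is why a periodic pattern in the list can bias the sample.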

4. Cluster / Multistage Sampling

Population divided into clusters; some clusters are selected and all (or sampled) units within them studied.

  • Merits: Economical; no full frame of units needed; suited to geographically spread populations.
  • Demerits: Higher sampling error than simple random or stratified sampling of the same size, because units within a cluster tend to be alike.

B. Non-Probability Sampling

Units are selected on a non-random basis; probability of selection is unknown, so sampling error cannot be measured.

1. Convenience Sampling

Units chosen because they are easy to reach.

  • Merits: Fast and cheap.
  • Demerits: Highly biased, not generalizable.

2. Judgement (Purposive) Sampling

Expert chooses units believed to be representative.

  • Merits: Useful for small/specialized studies.
  • Demerits: Subjective; depends on investigator's judgement.

3. Quota Sampling

Units selected to fill fixed quotas for sub-groups.

  • Merits: Quick; ensures representation of groups.
  • Demerits: Selection within quota is biased.

4. Snowball Sampling

Existing respondents recruit further respondents.

  • Merits: Good for hidden/rare populations.
  • Demerits: Strong selection bias.

Conclusion

Probability sampling is preferred when accuracy and generalization are required, whereas non-probability sampling is used when speed, cost, or accessibility dominate.

Question 2 · long · 10 marks

What is analysis of variance (ANOVA)? Explain the procedure of one-way ANOVA with the construction of the ANOVA table.

Analysis of Variance (ANOVA)

ANOVA, developed by R. A. Fisher, is a technique used to test the equality of means of three or more populations simultaneously by partitioning the total variation in the data into components attributable to different sources. It compares the variance between groups with the variance within groups using the $F$-statistic.

Assumptions: observations are independent, drawn from normal populations, and the populations have equal variances (homogeneity).

One-Way ANOVA Procedure

A single factor with $k$ treatments (groups) and $N$ total observations is studied.

Step 1: Hypotheses

$$H_0: \mu_1 = \mu_2 = \dots = \mu_k \quad\text{vs}\quad H_1: \text{at least one mean differs}$$

Step 2: Grand total and correction factor

$$T = \sum x_{ij}, \qquad CF = \frac{T^2}{N}$$

Step 3: Sum of squares

$$SST = \sum x_{ij}^2 - CF \quad (\text{Total})$$

$$SSB = \sum_{j} \frac{T_j^2}{n_j} - CF \quad (\text{Between treatments})$$

$$SSE = SST - SSB \quad (\text{Within / Error})$$

where $T_j$ is the total of the $j^{th}$ group having $n_j$ observations.

Step 4: Degrees of freedom

Between $= k-1$, Error $= N-k$, Total $= N-1$.

Step 5: Mean squares and F-ratio

$$MSB = \frac{SSB}{k-1}, \qquad MSE = \frac{SSE}{N-k}, \qquad F = \frac{MSB}{MSE}$$

ANOVA Table

| Source of Variation | SS | d.f. | Mean Square | F-ratio |
|---|---|---|---|---|
| Between treatments | $SSB$ | $k-1$ | $MSB = SSB/(k-1)$ | $MSB/MSE$ |
| Within (Error) | $SSE$ | $N-k$ | $MSE = SSE/(N-k)$ | |
| Total | $SST$ | $N-1$ | | |

Step 6: Decision

Compare the calculated $F$ with the table value $F_{\alpha,(k-1,N-k)}$. If $F_{cal} > F_{tab}$, reject $H_0$ and conclude that the treatment means differ significantly.
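
The computational steps can be traced on a small data set. The sketch below is only an illustration under stated assumptions: the function name `one_way_anova` and the three groups of made-up observations are hypothetical, and the critical value still has to be read from an $F$ table because the Python standard library does not provide one.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA quantities using the correction-factor method."""
    k = len(groups)                                  # number of treatments
    N = sum(len(g) for g in groups)                  # total observations
    T = sum(sum(g) for g in groups)                  # grand total
    cf = T ** 2 / N                                  # correction factor
    sst = sum(x * x for g in groups for x in g) - cf
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    sse = sst - ssb
    msb = ssb / (k - 1)
    mse = sse / (N - k)
    return {
        "SSB": ssb, "SSE": sse, "SST": sst,
        "df": (k - 1, N - k, N - 1),
        "MSB": msb, "MSE": mse, "F": msb / mse,
    }

# Hypothetical data: three treatments with three observations each.
data = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]
table = one_way_anova(data)
print(table)
```

For these numbers $T = 45$, $CF = 225$, $SST = 30$, $SSB = 24$ and $SSE = 6$, so $MSB = 12$, $MSE = 1$ and $F = 12$ with $(2, 6)$ degrees of freedom.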

Question 3 · long · 10 marks

Explain the theory of estimation. Differentiate between point estimation and interval estimation and explain the properties of a good estimator.

Theory of Estimation

Estimation is the branch of statistical inference concerned with using sample data to assign numerical values (estimates) to the unknown parameters of a population (e.g., mean $\mu$, variance $\sigma^2$, proportion $P$). A sample statistic used for this purpose is called an estimator, and a particular numerical value it takes is an estimate.

Estimation is of two types: point estimation and interval estimation.

Point Estimation vs Interval Estimation

| Basis | Point Estimation | Interval Estimation |
|---|---|---|
| Result | A single value as the estimate of the parameter | A range (interval) within which the parameter is expected to lie |
| Example | $\bar{x}$ estimates $\mu$ (e.g., $\hat{\mu} = \bar{x} = 50$) | $\bar{x} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ (e.g., $48 < \mu < 52$) |
| Probability statement | No measure of reliability attached | Attached with a confidence level (e.g., 95%) |
| Error | Probability of being exactly correct is essentially zero | Accounts for sampling error via the confidence coefficient |
| Information given | Less informative | More informative and realistic |

In interval estimation the interval is called a confidence interval and the probability $(1-\alpha)$ that it contains the parameter is the confidence coefficient.

Properties of a Good Estimator

  1. Unbiasedness: The expected value of the estimator equals the parameter, $E(\hat{\theta}) = \theta$. (e.g., $E(\bar{x}) = \mu$.)
  2. Consistency: As the sample size $n \to \infty$, the estimator converges to the true parameter value.
  3. Efficiency: Among unbiased estimators, the one with the smallest variance is the most efficient.
  4. Sufficiency: A sufficient estimator uses all the information in the sample relevant to the parameter, leaving nothing more to be gained from the data.

An ideal estimator is unbiased, consistent, efficient, and sufficient.
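
Unbiasedness can be checked exactly on a tiny population by listing every possible sample. The sketch below assumes a made-up population of four values and ordered samples of size 2 drawn with replacement; it compares the sample mean, the sample variance with divisor $n-1$, and the variance with divisor $n$ against the population values.

```python
from itertools import product
from statistics import fmean, pvariance, variance

population = [2, 4, 6, 8]                 # hypothetical population
mu = fmean(population)                    # population mean
sigma2 = pvariance(population)            # population variance (divisor N)

# All 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

mean_of_means = fmean([fmean(s) for s in samples])
mean_of_s2 = fmean([variance(s) for s in samples])        # divisor n - 1
mean_of_biased = fmean([pvariance(s) for s in samples])   # divisor n

print(mu, mean_of_means)        # the sample mean is unbiased for mu
print(sigma2, mean_of_s2)       # divisor n - 1 is unbiased for sigma^2
print(mean_of_biased)           # divisor n underestimates sigma^2 on average
```

Here the average of the sample means equals $\mu = 5$ and the average of the $n-1$ variances equals $\sigma^2 = 5$, while the divisor-$n$ version averages only $2.5$.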


Section B: Short Answer Questions

Attempt any EIGHT questions.

9 questions · 5 marks each
Question 4 · short · 5 marks

Define Karl Pearson's coefficient of correlation and state its properties.

Karl Pearson's Coefficient of Correlation

Karl Pearson's coefficient of correlation, denoted $r$, measures the degree and direction of the linear relationship between two quantitative variables $X$ and $Y$. It is defined as the ratio of the covariance of the variables to the product of their standard deviations:

$$r = \frac{\text{Cov}(X,Y)}{\sigma_X \,\sigma_Y} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2}\,\sqrt{\sum (y-\bar{y})^2}}$$

Properties

  1. Range: $r$ always lies between $-1$ and $+1$, i.e. $-1 \le r \le +1$.
  2. Direction: $r > 0$ indicates positive correlation, $r < 0$ negative correlation, and $r = 0$ no linear correlation.
  3. Unit-free: $r$ is a pure number, independent of the units of measurement.
  4. Independent of change of origin and scale: correlation is unaffected by adding/subtracting a constant or multiplying/dividing by a positive constant.
  5. Symmetric: $r_{xy} = r_{yx}$.
  6. It is the geometric mean of the two regression coefficients: $r = \pm\sqrt{b_{xy}\cdot b_{yx}}$, taking the sign of the regression coefficients.
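
The definition and two of the properties can be verified numerically. This is a minimal sketch with made-up paired data; the function name `pearson_r` is an assumption for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from sums of deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sqrt(sxx) * sqrt(syy))

x = [1, 2, 3, 4, 5]          # hypothetical observations
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                              # symmetry
r_rescaled = pearson_r([10 * a + 3 for a in x], y)       # new origin and scale

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))
```

For these data $\sum(x-\bar{x})(y-\bar{y}) = 6$, $\sum(x-\bar{x})^2 = 10$ and $\sum(y-\bar{y})^2 = 6$, so $r = 6/\sqrt{60} \approx 0.7746$, and the value is unchanged by swapping the variables or by rescaling $x$.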

Question 5 · short · 5 marks

What are regression coefficients? State their properties.

Regression Coefficients

In a linear regression between two variables $X$ and $Y$, the regression coefficient is the slope of the regression line and measures the average change in the dependent variable for a unit change in the independent variable.

  • Regression coefficient of $Y$ on $X$:

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sum(x-\bar{x})^2}$$

  • Regression coefficient of $X$ on $Y$:

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sum(y-\bar{y})^2}$$

Properties

  1. The correlation coefficient is the geometric mean of the two regression coefficients: $r = \pm\sqrt{b_{yx}\cdot b_{xy}}$.
  2. Both regression coefficients have the same sign, which is also the sign of $r$.
  3. The product of the two regression coefficients cannot exceed 1: $b_{yx}\cdot b_{xy} = r^2 \le 1$.
  4. If one regression coefficient is greater than 1, the other must be less than 1.
  5. Regression coefficients are independent of change of origin but not of change of scale.
  6. The arithmetic mean of the two regression coefficients is greater than or equal to $r$ (when $r>0$).
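
Several of these properties can be checked on a small example. The sketch below uses made-up paired data and a hypothetical helper `regression_coefficients`; it is an illustration, not a general proof.

```python
from math import copysign, sqrt

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) from sums of deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy

x = [1, 2, 3, 4, 5]          # hypothetical observations
y = [2, 4, 5, 4, 5]

b_yx, b_xy = regression_coefficients(x, y)
r = copysign(sqrt(b_yx * b_xy), b_yx)     # geometric mean with the common sign

# Shifting the origin leaves b_yx unchanged; changing the scale of x does not.
b_yx_shifted, _ = regression_coefficients([a + 100 for a in x], y)
b_yx_scaled, _ = regression_coefficients([10 * a for a in x], y)

print(b_yx, b_xy, round(r, 4))
print(b_yx_shifted, b_yx_scaled)
```

Here $b_{yx} = 0.6$ and $b_{xy} = 1.0$, so their product is $r^2 = 0.6$, their arithmetic mean $0.8$ exceeds $r \approx 0.7746$, and multiplying every $x$ by 10 reduces $b_{yx}$ to $0.06$.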

Question 6 · short · 5 marks

Explain the concept of sampling distribution and standard error.

Sampling Distribution

If all possible samples of a fixed size $n$ are drawn from a population and a statistic (such as the mean $\bar{x}$, proportion, or variance) is computed for each sample, the probability distribution of that statistic over all such samples is called the sampling distribution of the statistic.

For example, the sampling distribution of the mean $\bar{x}$ describes how sample means vary from sample to sample. By the Central Limit Theorem, for large $n$ this distribution is approximately normal with

$$E(\bar{x}) = \mu, \qquad \text{Var}(\bar{x}) = \frac{\sigma^2}{n}.$$

Standard Error (S.E.)

The standard error is the standard deviation of the sampling distribution of a statistic. It measures the variability of the statistic due to sampling and is a key indicator of the precision/reliability of an estimate.

  • S.E. of the mean: $\text{S.E.}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$
  • S.E. of a proportion: $\text{S.E.}(p) = \sqrt{\dfrac{PQ}{n}}$

Uses of S.E.: it is used to construct confidence intervals, to test hypotheses (test statistic = (estimate − parameter)/S.E.), and to judge accuracy. A smaller standard error (larger $n$) indicates a more reliable estimate.
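
A sampling distribution can be built by brute force when the population is tiny. The sketch below assumes a hypothetical population consisting of the six faces of a die and all ordered samples of size 2 drawn with replacement, then compares the standard deviation of the sample means with $\sigma/\sqrt{n}$.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [1, 2, 3, 4, 5, 6]     # hypothetical population
n = 2                               # sample size

# Sample mean of every possible ordered sample (with replacement).
means = [fmean(s) for s in product(population, repeat=n)]

distribution = Counter(means)               # sample mean -> number of samples
se_enumerated = pstdev(means)               # s.d. of the sampling distribution
se_formula = pstdev(population) / sqrt(n)   # sigma / sqrt(n)

for value in sorted(distribution):
    print(value, distribution[value])
print(se_enumerated, se_formula)
```

The two standard errors agree, and the mean of the 36 sample means equals the population mean of 3.5.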

Question 7 · short · 5 marks

Explain how to construct a confidence interval for a population mean.

Confidence Interval for a Population Mean

A confidence interval (CI) is a range of values, computed from a sample, that is expected to contain the unknown population mean $\mu$ with a stated probability $(1-\alpha)$, called the confidence level (e.g., 95%).

Case 1: Population variance $\sigma^2$ known (or large sample, $n \ge 30$)

Use the standard normal ($Z$) distribution:

$$\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean and $Z_{\alpha/2}$ is the critical value (e.g., 1.96 for 95%, 2.58 for 99%). For a large sample with $\sigma$ unknown, the sample standard deviation $s$ is used in its place.

Case 2: Population variance unknown and small sample ($n < 30$)

Replace $\sigma$ by the sample standard deviation $s$ and use the $t$-distribution with $(n-1)$ degrees of freedom:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

Steps

  1. Compute the sample mean $\bar{x}$ (and $s$ if needed).
  2. Fix the confidence level and obtain the critical value $Z_{\alpha/2}$ or $t_{\alpha/2,\,n-1}$.
  3. Compute the standard error $\sigma/\sqrt{n}$ (or $s/\sqrt{n}$).
  4. Compute the margin of error $E = (\text{critical value}) \times (\text{S.E.})$.
  5. The interval is $(\bar{x} - E,\ \bar{x} + E)$.

Interpretation: A 95% CI means that if the sampling were repeated many times, about 95% of such intervals would contain the true mean $\mu$.
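
The steps for the known-variance case can be sketched with the Python standard library, which provides the normal quantile through `statistics.NormalDist`. The function name `z_interval` and the input values are assumptions for illustration; for the small-sample case the $t$ critical value would have to be supplied from a table, since the standard library has no $t$ distribution.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)    # critical value Z_{alpha/2}
    se = sigma / sqrt(n)                       # standard error of the mean
    margin = z * se                            # margin of error E
    return xbar - margin, xbar + margin

# Hypothetical inputs: sample mean 50, sigma 10, sample size 100.
lower, upper = z_interval(50, 10, 100, level=0.95)
print(round(lower, 2), round(upper, 2))
```

With these inputs the standard error is 1 and the critical value is about 1.96, giving an interval of roughly (48.04, 51.96).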

Question 8 · short · 5 marks

Explain the F-test for the equality of two population variances.

F-test for Equality of Two Population Variances

The F-test is used to test whether two normal populations have equal variances, based on the ratio of two independent sample variances. It is the basis for ANOVA and for testing the homogeneity assumption.

Hypotheses

$$H_0: \sigma_1^2 = \sigma_2^2 \qquad \text{vs} \qquad H_1: \sigma_1^2 \neq \sigma_2^2$$

Test Statistic

Given two independent samples of sizes $n_1$ and $n_2$ with unbiased sample variances

$$s_1^2 = \frac{\sum(x_1-\bar{x}_1)^2}{n_1-1}, \qquad s_2^2 = \frac{\sum(x_2-\bar{x}_2)^2}{n_2-1},$$

the statistic is

$$F = \frac{s_1^2}{s_2^2}, \quad \text{with } s_1^2 > s_2^2 \ (\text{larger variance in numerator}),$$

which follows the $F$-distribution with $(n_1-1, n_2-1)$ degrees of freedom.

Decision Rule

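Compare $F_{cal}$ with the table value $F_{tab}$ for $(n_1-1,\,n_2-1)$ degrees of freedom. For the two-sided alternative stated above, with the larger variance in the numerator, the table value is $F_{\alpha/2,(n_1-1,\,n_2-1)}$; the value $F_{\alpha,(n_1-1,\,n_2-1)}$ applies to the one-sided alternative $\sigma_1^2 > \sigma_2^2$.

  • If $F_{cal} > F_{tab}$, reject $H_0$ (variances differ significantly).
  • If $F_{cal} \le F_{tab}$, do not reject $H_0$ (no significant difference).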

Assumptions: both samples are random, independent, and drawn from normal populations.
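
The test statistic and its degrees of freedom can be computed as follows. This is a sketch under stated assumptions: the function name `f_statistic` and both samples are made up, and the critical value must still be looked up in an $F$ table because the Python standard library does not include the $F$-distribution.

```python
from statistics import variance

def f_statistic(sample_a, sample_b):
    """Return (F, (df_numerator, df_denominator)) with the larger variance on top."""
    var_a = variance(sample_a)      # unbiased sample variance, divisor n - 1
    var_b = variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, (len(sample_a) - 1, len(sample_b) - 1)
    return var_b / var_a, (len(sample_b) - 1, len(sample_a) - 1)

# Hypothetical samples.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10, 12]

F, df = f_statistic(a, b)
print(F, df)
```

Here the sample variances are 2.5 and 14, so $F = 14/2.5 = 5.6$ with $(5, 4)$ degrees of freedom; note that the degrees of freedom follow the sample whose variance is placed in the numerator.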

Question 9 · short · 5 marks

Define index numbers and explain Laspeyres' and Paasche's price index methods.

Index Numbers

An index number is a statistical measure that expresses the relative change in the level of a variable (or group of variables) such as price, quantity, or value, over time or between places, with respect to a fixed base period (taken as 100). They are often called economic barometers.

A price index measures the relative change in the prices of a basket of commodities between the base year (prices $p_0$, quantities $q_0$) and the current year (prices $p_1$, quantities $q_1$).

Laspeyres' Price Index

Uses base-year quantities ($q_0$) as weights:

$$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

  • Merit: Requires only base-year weights, so easy to compute over time.
  • Demerit: Ignores changes in consumption pattern; tends to overestimate the rise in prices.

Paasche's Price Index

Uses current-year quantities ($q_1$) as weights:

$$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

  • Merit: Reflects current consumption pattern.
  • Demerit: Current-year weights must be collected every period (costly); tends to underestimate the rise in prices.

Note: Fisher's ideal index is the geometric mean of the two, $P^F = \sqrt{P^L \times P^P}$.
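
The three formulas can be compared on a small basket. The sketch below uses a made-up basket of two commodities in which consumers buy less of the item whose price rose most; the function names are assumptions for the example.

```python
from math import sqrt

def weighted_sum(prices, quantities):
    """Sum of price x quantity over the basket."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres(p0, q0, p1):
    return weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100    # base-year weights

def paasche(p0, p1, q1):
    return weighted_sum(p1, q1) / weighted_sum(p0, q1) * 100    # current-year weights

# Hypothetical basket of two commodities.
p0, q0 = [10, 20], [10, 5]      # base-year prices and quantities
p1, q1 = [15, 22], [8, 6]       # current-year prices and quantities

L = laspeyres(p0, q0, p1)
P = paasche(p0, p1, q1)
fisher = sqrt(L * P)
print(L, P, round(fisher, 2))
```

For this basket $\sum p_1 q_0 = 260$ and $\sum p_0 q_0 = 200$ give a Laspeyres index of 130, while $\sum p_1 q_1 = 252$ and $\sum p_0 q_1 = 200$ give a Paasche index of 126; Fisher's index lies between them at about 127.98.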

Question 10 · short · 5 marks

State and explain the addition and multiplication theorems of probability with examples.

Addition and Multiplication Theorems of Probability

Addition Theorem

The addition theorem gives the probability of the union of events (occurrence of at least one event).

For any two events $A$ and $B$:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

If $A$ and $B$ are mutually exclusive ($A \cap B = \varnothing$):

$$P(A \cup B) = P(A) + P(B)$$

Example: Drawing one card from a deck of 52, $P(\text{King or Queen}) = \frac{4}{52} + \frac{4}{52} = \frac{8}{52} = \frac{2}{13}$ (mutually exclusive).

Multiplication Theorem

The multiplication theorem gives the probability of the joint occurrence (intersection) of events.

For any two events:

$$P(A \cap B) = P(A)\cdot P(B\mid A) = P(B)\cdot P(A\mid B)$$

If $A$ and $B$ are independent:

$$P(A \cap B) = P(A)\cdot P(B)$$

Example: Tossing two fair coins, $P(\text{both heads}) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$ (independent events).

Summary: the addition theorem deals with "OR" (union) of events, while the multiplication theorem deals with "AND" (intersection) of events.
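
Both theorems can be checked by enumerating equally likely outcomes with exact fractions. This is a minimal sketch; the helper `prob` and the extra "King or heart" event (an example of events that are not mutually exclusive) are assumptions added for illustration.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Probability of an event over equally likely outcomes."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))            # 52 (rank, suit) pairs

king = lambda card: card[0] == "K"
queen = lambda card: card[0] == "Q"
heart = lambda card: card[1] == "hearts"

# Addition theorem, mutually exclusive events.
p_king_or_queen = prob(lambda c: king(c) or queen(c), deck)
# Addition theorem, overlapping events: subtract P(King and heart).
p_king_or_heart = prob(king, deck) + prob(heart, deck) - prob(lambda c: king(c) and heart(c), deck)

# Multiplication theorem, independent events: two fair coins.
coins = list(product("HT", repeat=2))
p_both_heads = prob(lambda t: t == ("H", "H"), coins)

print(p_king_or_queen, p_king_or_heart, p_both_heads)
```

The enumeration gives $2/13$ for King or Queen, $4/13$ for King or heart, and $1/4$ for two heads.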

Question 11 · short · 5 marks

Explain the Poisson distribution with its mean and variance and state its applications.

Poisson Distribution

The Poisson distribution is a discrete probability distribution that models the number of occurrences of a rare event in a fixed interval of time, space, or area, when the events occur independently and at a constant average rate $\lambda$.

A random variable $X$ follows a Poisson distribution if its probability mass function is

$$P(X = x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \dots; \ \lambda > 0,$$

where $\lambda$ is the average number of occurrences (the parameter) and $e \approx 2.718$.

It is obtained as a limiting case of the binomial distribution when $n \to \infty$, $p \to 0$, with $np = \lambda$ finite.

Mean and Variance

A distinctive property is that the mean equals the variance:

$$\text{Mean} = \lambda, \qquad \text{Variance} = \lambda.$$

Applications

  1. Number of telephone calls received at an exchange per minute.
  2. Number of printing/typing errors per page of a book.
  3. Number of accidents at a junction per day.
  4. Number of defective items in a large batch (rare defects).
  5. Number of customers arriving at a counter in a given time (queueing theory).
  6. Number of radioactive particle emissions per unit time.
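
The equality of mean and variance can be confirmed numerically from the probability mass function. The sketch below assumes a rate of $\lambda = 2$ and truncates the infinite sum at $x = 59$, beyond which the remaining probability is negligible; the function name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.0                                    # assumed average rate
xs = range(60)                               # truncation of x = 0, 1, 2, ...
probs = [poisson_pmf(x, lam) for x in xs]

total = sum(probs)
mean = sum(x * p for x, p in zip(xs, probs))
var = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))

print(round(probs[0], 4))        # P(X = 0) = e^{-2}
print(round(total, 6), round(mean, 6), round(var, 6))
```

The probabilities sum to 1 and both the mean and the variance come out equal to $\lambda = 2$, up to rounding.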

Question 12 · short · 5 marks

Define a random variable. Differentiate between discrete and continuous random variables with examples.

Random Variable

A random variable is a real-valued function that assigns a numerical value to each outcome of a random experiment (each point in the sample space). It is usually denoted by capital letters $X, Y, Z$.

Example: In tossing two coins, if $X$ = number of heads, then $X$ takes values $0, 1, 2$.

Random variables are of two types: discrete and continuous.

Discrete vs Continuous Random Variables

| Basis | Discrete Random Variable | Continuous Random Variable |
|---|---|---|
| Values | Takes only countable (isolated) values | Takes any value within an interval (uncountable) |
| Distribution function | Probability mass function $p(x) = P(X=x)$ | Probability density function $f(x)$ |
| Probability of a point | $P(X=x)$ can be positive | $P(X=x) = 0$; only $P(a \le X \le b)$ meaningful |
| Total probability | $\sum_x p(x) = 1$ | $\int_{-\infty}^{\infty} f(x)\,dx = 1$ |
| Example | Number of heads in coin tosses; number of defective items | Height, weight, temperature, time taken |

Examples:

  • Discrete: number of children in a family $(0, 1, 2, \dots)$.
  • Continuous: the exact height of a student (e.g., any value such as 165.3 cm).
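
The contrast between the two types can be made concrete in code. The sketch below builds the probability mass function of $X$ = number of heads in two coin tosses by enumeration, and, purely as an assumption for illustration, models a student's height as normally distributed with mean 165 cm and standard deviation 6 cm to show that a continuous variable gets its probability from an interval.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import NormalDist

# Discrete: X = number of heads when two fair coins are tossed.
outcomes = list(product("HT", repeat=2))               # sample space, 4 points
counts = Counter(outcome.count("H") for outcome in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(pmf)                                             # p(0), p(1), p(2)

# Continuous: height modelled as Normal(165, 6), an assumed distribution.
height = NormalDist(mu=165, sigma=6)
p_interval = height.cdf(170) - height.cdf(160)         # P(160 <= X <= 170)
p_point = height.cdf(165.3) - height.cdf(165.3)        # P(X = 165.3) is zero
print(round(p_interval, 4), p_point)
```

The discrete probabilities $1/4$, $1/2$, $1/4$ sum to 1, while for the continuous variable any single value has probability zero and only the interval carries positive probability.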

Frequently asked questions

Where can I find the BSc CSIT (TU) Statistics II (BSc CSIT, STA210) question paper 2077?
The full BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2077 (regular) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.
How many marks is the BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2077 paper?
The BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2077 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.