BSc CSIT (TU) Science Statistics II (BSc CSIT, STA210) Question Paper 2080 Nepal

Q: Where can I find the BSc CSIT (TU) Statistics II (BSc CSIT, STA210) question paper 2080?

The full BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2080 (Regular (annual)) question paper is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Statistics II (BSc CSIT, STA210) 2080 paper come with solutions?

Yes. Every question on this Statistics II (BSc CSIT, STA210) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2080 paper?

The BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2080 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Statistics II (BSc CSIT, STA210) past paper free?

Yes — reading and attempting this Statistics II (BSc CSIT, STA210) past paper on Kekkei is completely free.

Question

1. Long answer (10 marks)

What is hypothesis testing? Explain the procedure of testing of hypothesis including null and alternative hypotheses, level of significance, types of errors and the critical region.


Answer 1

Hypothesis Testing

Hypothesis testing is a statistical procedure used to decide, on the basis of sample data, whether to accept or reject a claim (hypothesis) made about a population parameter. It provides an objective rule for choosing between two competing statements about the population.

1. Null and Alternative Hypotheses

Null hypothesis ( $H_0$ ): A statement of no difference or no effect, assumed true until evidence contradicts it. e.g. $H_0: \mu = \mu_0$ .
Alternative hypothesis ( $H_1$ ): The statement accepted if $H_0$ is rejected. It may be:
- Two-tailed: $H_1: \mu \ne \mu_0$
- One-tailed: $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$

2. Level of Significance ( $\alpha$ )

The maximum probability of rejecting $H_0$ when it is actually true. Common values are $\alpha = 0.05$ (5%) and $\alpha = 0.01$ (1%). It fixes the size of the critical region.

3. Types of Errors

Decision	$H_0$ True	$H_0$ False
Reject $H_0$	Type I error ( $\alpha$ )	Correct decision (power $=1-\beta$ )
Accept $H_0$	Correct decision	Type II error ( $\beta$ )

Type I error: Rejecting a true $H_0$ ; probability $=\alpha$ .
Type II error: Accepting a false $H_0$ ; probability $=\beta$ .

4. Critical Region

The critical (rejection) region is the set of values of the test statistic for which $H_0$ is rejected. Its area equals $\alpha$ . The complementary region is the acceptance region. The boundary value is the critical value (e.g. $\pm 1.96$ for a two-tailed $z$ -test at 5%).

5. General Procedure

1. Set up $H_0$ and $H_1$ .
2. Choose the level of significance $\alpha$ .
3. Select an appropriate test statistic (z, t, $\chi^2$ , F) and find its sampling distribution under $H_0$ .
4. Determine the critical region / critical value.
5. Compute the test statistic from sample data.
6. Decision: If the computed value falls in the critical region, reject $H_0$ ; otherwise do not reject $H_0$ . State the conclusion in context.
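The link between $\alpha$, the critical value and the two error types can be checked numerically. The sketch below assumes a two-tailed z-test with known $\sigma$; the function name `error_probabilities` and the figures passed to it are hypothetical illustrations, not part of the paper.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(mu0, mu1, sigma, n, alpha=0.05):
    """Return (critical value, beta, power) for a two-tailed z-test of
    H0: mu = mu0 when the true mean is actually mu1 (sigma known)."""
    std_normal = NormalDist()
    # Critical value: alpha/2 of the area lies in each tail under H0.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Shift of the test statistic's distribution when the true mean is mu1.
    delta = (mu1 - mu0) / (sigma / sqrt(n))
    # Type II error: the statistic falls in the acceptance region although H0 is false.
    beta = std_normal.cdf(z_crit - delta) - std_normal.cdf(-z_crit - delta)
    return z_crit, beta, 1 - beta

# Hypothetical figures: H0 mean 50, true mean 52, sigma 6, n 36.
z_crit, beta, power = error_probabilities(mu0=50, mu1=52, sigma=6, n=36)
print(f"critical value = {z_crit:.2f}, beta = {beta:.3f}, power = {power:.3f}")
```

Lowering `alpha` pushes the critical value outward, which shrinks the Type I error probability but raises `beta` for the same sample size.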

Answer 2

Correlation and Regression

Correlation measures the degree and direction of the linear relationship between two variables $X$ and $Y$ . It is dimensionless and lies between $-1$ and $+1$ .

Regression measures the average functional dependence of one variable on another and is used for prediction. It gives the equation of the line that best estimates one variable from the other.

Fitting the Two Regression Lines (Least Squares)

The two lines are obtained by minimising the sum of squared deviations.

(a) Regression of $Y$ on $X$ (predict $Y$ from $X$ ):

$$Y - \bar{Y} = b_{yx}(X - \bar{X}), \qquad b_{yx} = \frac{\operatorname{Cov}(X,Y)}{\sigma_x^2} = r\frac{\sigma_y}{\sigma_x}$$

(b) Regression of $X$ on $Y$ (predict $X$ from $Y$ ):

$$X - \bar{X} = b_{xy}(Y - \bar{Y}), \qquad b_{xy} = \frac{\operatorname{Cov}(X,Y)}{\sigma_y^2} = r\frac{\sigma_x}{\sigma_y}$$

Both lines pass through the mean point $(\bar{X}, \bar{Y})$ , where they intersect.

Relationship between $r$ and the Regression Coefficients

The correlation coefficient is the geometric mean of the two regression coefficients:

$$r = \pm\sqrt{b_{yx}\cdot b_{xy}}$$

Key points:

The sign of $r$ equals the (common) sign of $b_{yx}$ and $b_{xy}$ .
Since $r^2 = b_{yx}b_{xy} \le 1$ , both coefficients cannot exceed 1 in magnitude simultaneously.
If $r = 0$ , both regression coefficients are 0 and the lines are perpendicular; if $r = \pm 1$ the two lines coincide.
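As a rough numerical check of the formulas above, the following sketch fits both regression lines to a small made-up data set and confirms that $r^2 = b_{yx}b_{xy}$. The function name `regression_summary` and the data are assumptions chosen only for illustration.

```python
from math import sqrt

def regression_summary(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy, r) from paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sxy / sxx              # slope of the line of Y on X
    b_xy = sxy / syy              # slope of the line of X on Y
    r = sxy / sqrt(sxx * syy)     # Pearson correlation coefficient
    return mean_x, mean_y, b_yx, b_xy, r

xs = [1, 2, 3, 4, 5]              # hypothetical data
ys = [2, 4, 5, 4, 5]
mean_x, mean_y, b_yx, b_xy, r = regression_summary(xs, ys)
print(f"Y on X: Y - {mean_y} = {b_yx:.2f} (X - {mean_x})")
print(f"X on Y: X - {mean_x} = {b_xy:.2f} (Y - {mean_y})")
print(f"r = {r:.4f}, b_yx * b_xy = {b_yx * b_xy:.4f}, r^2 = {r * r:.4f}")
```

Both slopes share the sign of the cross-product sum `sxy`, which is why $r$ takes their common sign.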

Answer 3

Sampling

Sampling is the process of selecting a representative subset (sample) from a population so that conclusions about the whole population can be drawn from the sample, saving time, cost and effort.

A. Probability (Random) Sampling

Every unit has a known, non-zero chance of selection.

Simple Random Sampling — every unit has an equal chance (lottery / random numbers).
- Merits: unbiased, simple, easy to analyse.
- Demerits: needs complete sampling frame; may miss small subgroups.
Stratified Random Sampling — population divided into homogeneous strata, samples drawn from each.
- Merits: high precision, ensures representation of all groups.
- Demerits: requires prior knowledge of strata; complex.
Systematic Sampling — select every $k^{th}$ unit after a random start ( $k = N/n$ ).
- Merits: simple, quick, spreads sample evenly.
- Demerits: biased if there is periodicity in the list.
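Cluster / Multistage Sampling — population divided into clusters; whole clusters are selected (cluster sampling) or units are sub-sampled within the selected clusters (multistage sampling).
- Merits: economical for wide geographic areas, no full frame needed.
- Demerits: higher sampling error when units within a cluster are alike (clusters internally homogeneous), so that a few clusters do not represent the population.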

B. Non-Probability Sampling

Selection probability is unknown; relies on judgement/convenience.

Convenience Sampling — units that are easiest to reach.
- Merits: fast, cheap. Demerits: highly biased, not generalisable.
Judgement (Purposive) Sampling — expert chooses typical units.
- Merits: useful for small specialised studies. Demerits: subjective, biased.
Quota Sampling — fixed quotas filled by interviewer choice.
- Merits: quick, ensures group representation. Demerits: selection bias.
Snowball Sampling — existing subjects recruit others.
- Merits: good for hidden/rare populations. Demerits: strong bias, no frame.

Conclusion: Probability methods allow estimation of sampling error and valid inference; non-probability methods are cheaper and faster but cannot reliably generalise to the population.
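Three of the probability methods above can be sketched with Python's standard `random` module. The helper names, the population of 100 numbered units, the two strata and the seed are all hypothetical choices made for illustration.

```python
import random

def simple_random_sample(units, n, rng):
    """Draw n distinct units, each subset of size n being equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Take every k-th unit after a random start, with k = N // n."""
    k = len(units) // n
    start = rng.randrange(k)          # random start among the first k units
    return units[start::k][:n]

def stratified_sample(strata, fraction, rng):
    """Draw the same sampling fraction independently from each stratum."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

rng = random.Random(2080)             # fixed seed so the run is repeatable
population = list(range(1, 101))      # units numbered 1..100
strata = {"first_half": population[:50], "second_half": population[50:]}

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 0.1, rng)
print(srs, sys_sample, strat, sep="\n")
```

Note that the systematic sample is always evenly spaced, while the stratified sample is guaranteed five units from each half; the simple random sample has neither guarantee.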

Answer 4

Mathematical Expectation

The mathematical expectation (mean) of a random variable $X$ is the long-run average value, weighted by probabilities.

$$E(X) = \sum_i x_i\, p(x_i) \quad\text{(discrete)}, \qquad E(X) = \int_{-\infty}^{\infty} x\, f(x)\, dx \quad\text{(continuous)}$$

Properties (with proof)

1. Expectation of a constant. $E(c) = c$ . Proof: $E(c) = \sum c\, p(x) = c\sum p(x) = c\cdot 1 = c.$

2. Constant multiplier. $E(cX) = c\,E(X)$ . Proof: $E(cX) = \sum cx\,p(x) = c\sum x\,p(x) = c\,E(X).$

3. Linearity (addition). $E(aX + b) = a\,E(X) + b$ . Proof: $E(aX+b) = \sum (ax+b)p(x) = a\sum x p(x) + b\sum p(x) = aE(X)+b.$

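4. Addition theorem. $E(X + Y) = E(X) + E(Y)$ (always, whether or not $X$ and $Y$ are independent). Proof: $E(X+Y) = \sum_x\sum_y (x+y)p(x,y) = \sum_x x\sum_y p(x,y) + \sum_y y\sum_x p(x,y) = \sum_x x\,p(x) + \sum_y y\,p(y) = E(X)+E(Y),$ since summing the joint probability over one variable gives the marginal probability of the other.

5. Multiplication theorem. If $X$ and $Y$ are independent, $E(XY) = E(X)\,E(Y)$ . Proof: Under independence $p(x,y)=p(x)p(y)$ , so $E(XY)=\sum_x\sum_y xy\,p(x)p(y)=\big(\sum_x x\,p(x)\big)\big(\sum_y y\,p(y)\big)=E(X)E(Y).$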
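The properties can be verified exactly on a small case. The sketch below uses two independent fair dice, an example chosen here for illustration; the helper names `expect_one` and `expect_two` are hypothetical, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)        # outcomes of one fair die
P = Fraction(1, 6)         # probability of each face

def expect_one(g):
    """E[g(X)] for a single fair die."""
    return sum(g(x) * P for x in FACES)

def expect_two(g):
    """E[g(X, Y)] for two independent dice, using p(x, y) = p(x) p(y)."""
    return sum(g(x, y) * P * P for x, y in product(FACES, repeat=2))

e_x = expect_one(lambda x: x)
e_linear = expect_one(lambda x: 3 * x + 2)        # E(aX + b) with a = 3, b = 2
e_sum = expect_two(lambda x, y: x + y)            # E(X + Y)
e_product = expect_two(lambda x, y: x * y)        # E(XY)

print(e_x, e_linear, e_sum, e_product)
print(e_linear == 3 * e_x + 2, e_sum == e_x + e_x, e_product == e_x * e_x)
```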

Answer 5

t-Test for Difference of Two Sample Means

Used for small independent samples ( $n_1, n_2 < 30$ ) from two normal populations with unknown but equal variances.

Hypotheses

$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \ne \mu_2\ (\text{or one-sided})$$

Test Statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where $S$ is the pooled standard deviation, obtained from the pooled variance

$$S^2 = \frac{\sum (x_1-\bar{x}_1)^2 + \sum (x_2-\bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$

Degrees of Freedom

$$\text{d.f.} = n_1 + n_2 - 2$$

Decision Rule

Compare $|t_{\text{cal}}|$ with the table value $t_{\alpha/2,\,n_1+n_2-2}$ for a two-tailed test (or $t_{\alpha,\,n_1+n_2-2}$ for a one-tailed test):

If $|t_{\text{cal}}| \le t_{\text{tab}}$ → do not reject $H_0$ (means not significantly different).
If $|t_{\text{cal}}| > t_{\text{tab}}$ → reject $H_0$ (significant difference).

Assumptions: samples independent, drawn from normal populations, with equal population variances.
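A minimal sketch of the calculation, assuming two small made-up samples: the function name `pooled_t` and the data are hypothetical, and the table value is read from a standard t-table rather than computed.

```python
from math import sqrt

def pooled_t(sample1, sample2):
    """Return (t, d.f.) for the equal-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sums of squared deviations about each sample's own mean.
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df                 # S^2
    t = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

sample1 = [12, 14, 16, 18, 20]    # hypothetical data
sample2 = [10, 11, 13, 14, 17]
t_cal, df = pooled_t(sample1, sample2)
t_tab = 2.306                     # two-tailed 5% value for 8 d.f., from a standard t-table
decision = "reject H0" if abs(t_cal) > t_tab else "do not reject H0"
print(f"t = {t_cal:.3f}, d.f. = {df}, {decision}")
```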

Answer 6

z-Test for a Single Mean (Large Sample)

Used when $n \ge 30$ (or population $\sigma$ known) to test whether the sample mean differs from a stated population mean $\mu$ .

Hypotheses

$$H_0: \mu = \mu_0 \qquad H_1: \mu \ne \mu_0$$

Test Statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

If $\sigma$ is unknown, the sample s.d. $s$ is used. Critical value at 5% (two-tailed) is $\pm 1.96$ .

Decision Rule

Reject $H_0$ if $|z| > 1.96$ (5%); otherwise do not reject.

Example

A sample of $n = 100$ bulbs has mean life $\bar{x} = 1570$ hr with $\sigma = 120$ hr. Test whether the mean life is $1600$ hr.

$H_0: \mu = 1600$ , $H_1: \mu \ne 1600$ .
$z = \dfrac{1570 - 1600}{120/\sqrt{100}} = \dfrac{-30}{12} = -2.5.$
$|z| = 2.5 > 1.96$ → reject $H_0$ .

Conclusion: The mean life differs significantly from 1600 hours at the 5% level.
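The bulb example can be reproduced in a few lines of Python. The two-tailed p-value is an addition not asked for in the answer; it is computed with `statistics.NormalDist` from the standard library as a cross-check on the critical-value decision.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the bulb example above.
n, xbar, sigma, mu0 = 100, 1570, 120, 1600

standard_error = sigma / sqrt(n)
z = (xbar - mu0) / standard_error
# Two-tailed p-value: probability of a statistic at least this extreme under H0.
p_value = 2 * NormalDist().cdf(-abs(z))
reject = abs(z) > 1.96

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0 at 5%: {reject}")
```

A p-value below 0.05 leads to the same decision as $|z| > 1.96$.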

Answer 7

Karl Pearson's Coefficient of Correlation

It is a numerical measure of the degree and direction of the linear relationship between two variables $X$ and $Y$ , defined as the ratio of their covariance to the product of their standard deviations:

$$r = \frac{\operatorname{Cov}(X,Y)}{\sigma_x\,\sigma_y} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2}\,\sqrt{\sum (y-\bar{y})^2}}$$

or equivalently $\;r = \dfrac{n\sum xy - \sum x\sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}.$

Properties

Range: $-1 \le r \le +1$ .
Sign: $r > 0$ → positive (direct) correlation; $r < 0$ → negative (inverse); $r = 0$ → no linear correlation.
Perfect correlation: $r = \pm 1$ means all points lie exactly on a straight line.
Independent of origin and scale: $r$ is unchanged by linear transformations $u = (x-a)/h$ , $v=(y-b)/k$ with $h, k > 0$ (if $h$ and $k$ have opposite signs, only the sign of $r$ changes).
Symmetry: $r_{xy} = r_{yx}$ .
Geometric mean of regression coefficients: $r = \pm\sqrt{b_{yx}\,b_{xy}}$ .
It is a pure number (unit-free).
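Several of these properties can be checked with the computational formula for $r$. This is a sketch on made-up data; the function name `pearson_r`, the data and the constants used for the change of origin and scale are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums (the computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

xs = [10, 20, 30, 40, 50]         # hypothetical data
ys = [12, 25, 33, 41, 59]

r_xy = pearson_r(xs, ys)
r_yx = pearson_r(ys, xs)                          # symmetry
us = [(x - 30) / 10 for x in xs]                  # change of origin and scale, h = 10
vs = [(y - 33) / 2 for y in ys]                   # change of origin and scale, k = 2
r_uv = pearson_r(us, vs)
r_line = pearson_r(xs, [2 * x + 1 for x in xs])   # points exactly on a rising line

print(f"r_xy = {r_xy:.4f}, r_yx = {r_yx:.4f}, r_uv = {r_uv:.4f}, r_line = {r_line:.4f}")
```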

Answer 8

Regression Coefficients

The regression coefficient is the slope of a regression line — it gives the average change in the dependent variable per unit change in the independent variable.

Regression of $Y$ on $X$ : $\;b_{yx} = r\dfrac{\sigma_y}{\sigma_x} = \dfrac{\operatorname{Cov}(X,Y)}{\sigma_x^2}$
Regression of $X$ on $Y$ : $\;b_{xy} = r\dfrac{\sigma_x}{\sigma_y} = \dfrac{\operatorname{Cov}(X,Y)}{\sigma_y^2}$

Properties

Correlation as geometric mean: $r = \pm\sqrt{b_{yx}\,b_{xy}}$ .
Same sign: both $b_{yx}$ and $b_{xy}$ have the same sign, which is also the sign of $r$ .
Product condition: $b_{yx}\,b_{xy} = r^2 \le 1$ , so both coefficients cannot exceed unity at the same time (if one $>1$ , the other $<1$ ).
Independent of origin but not of scale: changing scale changes the coefficients.
Arithmetic mean property: when $r > 0$ , the arithmetic mean of the two regression coefficients is $\ge r$ , i.e. $\dfrac{b_{yx}+b_{xy}}{2} \ge r$ ; in general $\left|\dfrac{b_{yx}+b_{xy}}{2}\right| \ge |r|$ .

Answer 9

Sampling Distribution and Standard Error

Sampling Distribution

If all possible samples of a fixed size $n$ are drawn from a population and a statistic (e.g. the sample mean $\bar{x}$ , proportion $p$ , or variance $s^2$ ) is computed for each, the probability distribution of that statistic over all such samples is called its sampling distribution.

Example: the sampling distribution of $\bar{x}$ has mean $E(\bar{x}) = \mu$ and, by the Central Limit Theorem, is approximately normal for large $n$ .

Standard Error (S.E.)

The standard error is the standard deviation of the sampling distribution of a statistic. It measures the precision / variability of the estimate:

$$\text{S.E.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}, \qquad \text{S.E.}(p) = \sqrt{\frac{PQ}{n}}$$

where $P$ is the population proportion and $Q = 1 - P$ .

Uses of Standard Error

It measures the reliability of a sample estimate (smaller S.E. = more precise).
It forms the denominator of test statistics ( $z = (\bar{x}-\mu)/\text{S.E.}$ ).
It is used to construct confidence intervals.
S.E. decreases as the sample size $n$ increases (proportional to $1/\sqrt{n}$ ).
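For a very small population the sampling distribution of the mean can be enumerated in full. The sketch below assumes sampling with replacement (so that $\text{S.E.}(\bar{x}) = \sigma/\sqrt{n}$ holds exactly) and a hypothetical population of five values.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10]     # hypothetical population
n = 2                             # sample size

# Every possible ordered sample of size n drawn with replacement.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)
sigma = pstdev(population)                  # population standard deviation
se_theory = sigma / sqrt(n)                 # sigma / sqrt(n)
se_observed = pstdev(sample_means)          # s.d. of the sampling distribution

print(f"number of samples = {len(sample_means)}")
print(f"mean of sample means = {mean(sample_means)}, population mean = {mu}")
print(f"S.E. observed = {se_observed:.4f}, sigma/sqrt(n) = {se_theory:.4f}")
```

Raising `n` makes both standard-error figures fall in step, in line with the $1/\sqrt{n}$ rule.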

Answer 10

Confidence Interval for a Population Mean

A confidence interval (CI) is a range of values, computed from a sample, that is likely to contain the unknown population mean $\mu$ with a stated probability called the confidence level $(1-\alpha)$ , e.g. 95%.

Case 1: Large Sample ( $n \ge 30$ ) or $\sigma$ known

Using the normal distribution:

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

For 95% confidence, $z_{\alpha/2} = 1.96$ ; for 99%, $z_{\alpha/2} = 2.58$ . If $\sigma$ is unknown, replace it with the sample s.d. $s$ .

Case 2: Small Sample ( $n < 30$ ), $\sigma$ unknown

Using the $t$ -distribution with $n-1$ degrees of freedom:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

Steps

Compute sample mean $\bar{x}$ and standard error $\sigma/\sqrt{n}$ (or $s/\sqrt{n}$ ).
Choose confidence level and find $z_{\alpha/2}$ or $t_{\alpha/2}$ .
Compute the margin of error $E = (\text{critical value})\times \text{S.E.}$
The interval is $(\bar{x} - E,\ \bar{x} + E)$ .

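Interpretation: We are 95% confident that the population mean lies within this interval, in the sense that about 95% of intervals constructed this way from repeated samples would contain $\mu$ . A higher confidence level or a smaller $n$ gives a wider interval.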
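The large-sample steps can be sketched as follows. The function name `z_interval` is hypothetical, and the inputs ( $\bar{x} = 1570$ , $\sigma = 120$ , $n = 100$ ) are illustrative figures; the critical value comes from `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sd, n, confidence=0.95):
    """Large-sample confidence interval for the mean: xbar +/- z * sd / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    margin = z * sd / sqrt(n)                    # margin of error E
    return xbar - margin, xbar + margin

low95, high95 = z_interval(xbar=1570, sd=120, n=100)
low99, high99 = z_interval(xbar=1570, sd=120, n=100, confidence=0.99)
print(f"95% CI: ({low95:.2f}, {high95:.2f})")
print(f"99% CI: ({low99:.2f}, {high99:.2f})")
```

The 99% interval is wider than the 95% interval because its critical value is larger while the standard error is unchanged.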

Answer 11

F-Test for Equality of Two Population Variances

Used to test whether two independent normal populations have equal variances, based on their sample variances.

Hypotheses

$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_1^2 \ne \sigma_2^2$$

Test Statistic

$$F = \frac{s_1^2}{s_2^2} \quad (\text{larger variance in the numerator, so } F \ge 1)$$

where the unbiased sample variances are

$$s_1^2 = \frac{\sum (x_1-\bar{x}_1)^2}{n_1-1}, \qquad s_2^2 = \frac{\sum (x_2-\bar{x}_2)^2}{n_2-1}$$

Degrees of Freedom

$\nu_1 = n_1 - 1$ (numerator), $\nu_2 = n_2 - 1$ (denominator).

Decision Rule

Compare $F_{\text{cal}}$ with the table value $F_{\alpha\,(\nu_1,\nu_2)}$ :

If $F_{\text{cal}} \le F_{\text{tab}}$ → do not reject $H_0$ (variances not significantly different).
If $F_{\text{cal}} > F_{\text{tab}}$ → reject $H_0$ (variances differ significantly).

Assumptions: both samples are independent and drawn from normal populations. (The F-test is also the basis of ANOVA.)
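A short sketch of the computation, assuming two made-up samples: the function name `f_statistic` and the data are hypothetical. It places the larger sample variance in the numerator and returns the degrees of freedom in the matching order; the table value still has to be looked up.

```python
from statistics import variance

def f_statistic(sample1, sample2):
    """Return (F, numerator d.f., denominator d.f.) with the larger variance on top."""
    v1 = variance(sample1)        # unbiased sample variance, divisor n - 1
    v2 = variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    # Swap roles so that F >= 1; the degrees of freedom swap with the variances.
    return v2 / v1, len(sample2) - 1, len(sample1) - 1

sample1 = [10, 11, 13, 14, 17]            # hypothetical data
sample2 = [12, 14, 16, 18, 20, 22]
f_cal, df_num, df_den = f_statistic(sample1, sample2)
print(f"F = {f_cal:.3f} with ({df_num}, {df_den}) d.f.")
```

Here the second sample has the larger variance, so its degrees of freedom ( $n_2 - 1 = 5$ ) become the numerator degrees of freedom.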

Answer 12

Index Numbers

An index number is a statistical measure that expresses the relative change in a variable (price, quantity, value) of a group of related items between a current period and a fixed base period, usually expressed as a percentage. Base period value $= 100$ .

Notation: $p_0, q_0$ = base-period price and quantity; $p_1, q_1$ = current-period price and quantity.

Laspeyres' Price Index (base-year weights)

Uses base-year quantities $q_0$ as weights:

$$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

Advantage: requires only base-year quantities, easy to update. Limitation: tends to over-estimate the rise in prices (ignores consumers switching away from costlier goods).

Paasche's Price Index (current-year weights)

Uses current-year quantities $q_1$ as weights:

$$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

Advantage: reflects current consumption pattern. Limitation: needs fresh current-year quantities each period; tends to under-estimate the price rise.

Note: Fisher's Ideal Index is the geometric mean of the two: $P_F = \sqrt{P^{L}\times P^{P}}$ .
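The three formulas can be compared on a small basket. The prices and quantities below are invented for illustration, and the function name `price_indices` is hypothetical.

```python
from math import sqrt

def price_indices(p0, q0, p1, q1):
    """Return (Laspeyres, Paasche, Fisher) price indices, base period = 100."""
    def weighted_total(prices, quantities):
        return sum(p * q for p, q in zip(prices, quantities))

    laspeyres = weighted_total(p1, q0) / weighted_total(p0, q0) * 100  # base-year weights
    paasche = weighted_total(p1, q1) / weighted_total(p0, q1) * 100    # current-year weights
    fisher = sqrt(laspeyres * paasche)                                 # geometric mean
    return laspeyres, paasche, fisher

# Hypothetical basket of three goods.
p0, q0 = [10, 20, 5], [4, 2, 10]      # base-period prices and quantities
p1, q1 = [12, 25, 6], [5, 1, 9]       # current-period prices and quantities

laspeyres, paasche, fisher = price_indices(p0, q0, p1, q1)
print(f"Laspeyres = {laspeyres:.2f}, Paasche = {paasche:.2f}, Fisher = {fisher:.2f}")
```

In this basket buyers cut back most on the good whose price rose fastest, so the Paasche index comes out below the Laspeyres index and Fisher's index lies between them.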

Level	BSc CSIT (TU)
Stream	Science
Subject	Statistics II (BSc CSIT, STA210)
Year	2080 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions
