BSc CSIT (TU) Science Statistics II (BSc CSIT, STA210) Question Paper 2075 Nepal

Q: Where can I find the BSc CSIT (TU) Statistics II (BSc CSIT, STA210) question paper 2075?

The full BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2075 question paper (Regular, annual session) is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the Statistics II (BSc CSIT, STA210) 2075 paper come with solutions?

Yes. Every question on this Statistics II (BSc CSIT, STA210) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2075 paper?

The BSc CSIT (TU) Statistics II (BSc CSIT, STA210) 2075 paper carries 60 full marks and is meant to be completed in 180 minutes, across 12 questions.

Q: Is practising this Statistics II (BSc CSIT, STA210) past paper free?

Yes — reading and attempting this Statistics II (BSc CSIT, STA210) past paper on Kekkei is completely free.

Question 1 (Long answer, 10 marks)

What is hypothesis testing? Explain the procedure of testing of hypothesis including null and alternative hypotheses, level of significance, types of errors and the critical region.


Answer 1

Hypothesis Testing

Hypothesis testing is a statistical procedure that uses sample data to decide whether a claim (assumption) about a population parameter is supported by the evidence. It provides a rule for accepting or rejecting the claim with a controlled probability of error.

Procedure of Testing a Hypothesis

Step 1: Set up the Null and Alternative Hypotheses

Null hypothesis ( $H_0$ ): A statement of no difference / no effect that is assumed true until evidence contradicts it, e.g. $H_0:\mu = \mu_0$ .
Alternative hypothesis ( $H_1$ ): The claim accepted if $H_0$ is rejected. It may be:
- Two-tailed: $H_1:\mu \neq \mu_0$
- One-tailed: $H_1:\mu > \mu_0$ or $H_1:\mu < \mu_0$

Step 2: Choose the Level of Significance ( $\alpha$ )

The level of significance is the maximum probability of rejecting $H_0$ when it is actually true. Common values are $\alpha = 0.05$ (5%) or $0.01$ (1%). It is fixed before collecting data.

Step 3: Identify the Test Statistic

Select an appropriate statistic (e.g. $z$ , $t$ , $\chi^2$ , $F$ ) whose sampling distribution under $H_0$ is known, e.g.

Z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}.

Step 4: Determine the Critical Region (Rejection Region)

The critical region is the set of values of the test statistic for which $H_0$ is rejected. Its boundary is the critical value, fixed so that $P(\text{test statistic} \in \text{critical region} \mid H_0) = \alpha$ .

Step 5: Compute the value of the test statistic from the sample and compare it with the critical value.

Step 6: Decision — Reject $H_0$ if the computed value falls in the critical region; otherwise do not reject $H_0$ . State the conclusion in the context of the problem.

Types of Errors

Decision	$H_0$ True	$H_0$ False
Reject $H_0$	Type I error ( $\alpha$ )	Correct decision (power $=1-\beta$ )
Accept $H_0$	Correct decision $(1-\alpha)$	Type II error ( $\beta$ )

Type I error: Rejecting a true $H_0$ ; probability $= \alpha$ .
Type II error: Accepting a false $H_0$ ; probability $= \beta$ .

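For a fixed sample size $n$, reducing one error tends to increase the other; increasing the sample size makes it possible to reduce both at once (for example, lowering $\beta$ while $\alpha$ stays fixed).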
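To make the two error probabilities concrete, the sketch below simulates many two-tailed $z$-tests. It is an illustration under stated assumptions: the population values ($\mu_0 = 50$, $\sigma = 10$, $n = 25$) and the alternative mean 54 are arbitrary choices, not part of the question. With $H_0$ true the rejection rate should sit near $\alpha$; with $H_0$ false it estimates the power $1-\beta$.

```python
import math
import random
from statistics import NormalDist


def simulated_rejection_rate(mu0=50.0, true_mu=50.0, sigma=10.0, n=25,
                             alpha=0.05, trials=10_000, seed=1):
    """Fraction of simulated samples in which a two-tailed z-test rejects H0: mu = mu0.

    If true_mu == mu0 the result estimates the Type I error probability (alpha);
    otherwise it estimates the power 1 - beta at that true mean.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)                      # seeded so results are reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # boundary of the critical region
    se = sigma / math.sqrt(n)                      # standard error of the sample mean
    rejections = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        if abs((xbar - mu0) / se) > z_crit:        # statistic falls in the critical region
            rejections += 1
    return rejections / trials


type_i = simulated_rejection_rate()                # H0 true: roughly 0.05 expected
power = simulated_rejection_rate(true_mu=54.0)     # H0 false: estimates 1 - beta
print(type_i, power, 1 - power)                    # last value estimates beta
```

Raising `n` in the second call while keeping `alpha` fixed increases the power, which is the sense in which a larger sample reduces $\beta$ without increasing $\alpha$.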

Answer 2

Correlation and Regression

Correlation measures the degree and direction of the linear association between two variables $X$ and $Y$ . It tells whether the variables tend to move together (positive), in opposite directions (negative), or show no linear relationship. Karl Pearson's coefficient is

r = \frac{\operatorname{Cov}(X,Y)}{\sigma_x \sigma_y} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}}, \quad -1 \le r \le 1.

Regression is the statistical method of estimating (predicting) the value of one (dependent) variable from the known value of another (independent) variable by fitting an average mathematical relationship between them.

Fitting the Two Regression Lines

There are two regression lines because either variable may be treated as dependent.

(a) Regression line of $Y$ on $X$ (used to estimate $Y$ from $X$ ):

Y - \bar{Y} = b_{yx}(X - \bar{X}), \qquad b_{yx} = \frac{\operatorname{Cov}(X,Y)}{\sigma_x^2} = r\frac{\sigma_y}{\sigma_x}.

The coefficient $b_{yx}$ is obtained by minimising $\sum (Y - \hat{Y})^2$ (least squares).

(b) Regression line of $X$ on $Y$ (used to estimate $X$ from $Y$ ):

X - \bar{X} = b_{xy}(Y - \bar{Y}), \qquad b_{xy} = \frac{\operatorname{Cov}(X,Y)}{\sigma_y^2} = r\frac{\sigma_x}{\sigma_y}.

Both lines pass through the mean point $(\bar{X},\bar{Y})$ .

Relationship between $r$ and the Regression Coefficients

Multiplying the two coefficients:

b_{yx}\cdot b_{xy} = \left(r\frac{\sigma_y}{\sigma_x}\right)\left(r\frac{\sigma_x}{\sigma_y}\right) = r^2.

Hence the correlation coefficient is the geometric mean of the two regression coefficients:

r = \pm\sqrt{b_{yx}\,b_{xy}}.

Key points: $r$ takes the same sign as the regression coefficients; since $r^2 \le 1$ , both coefficients cannot numerically exceed 1 simultaneously; if one coefficient is numerically greater than 1, the other must be numerically less than 1.
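The least-squares formulas above can be checked numerically. The following is a minimal sketch on a small hypothetical data set (the five $(x, y)$ pairs are invented for illustration); it computes both regression coefficients from deviation sums and confirms that $b_{yx}\,b_{xy} = r^2$.

```python
import math


def regression_lines(x, y):
    """Return (mean_x, mean_y, b_yx, b_xy, r) for paired data.

    The common divisor n cancels in each ratio, so plain deviation sums are used.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    b_yx = sxy / sxx                 # slope of Y on X: Cov(X, Y) / Var(X)
    b_xy = sxy / syy                 # slope of X on Y: Cov(X, Y) / Var(Y)
    r = sxy / math.sqrt(sxx * syy)   # Karl Pearson's r
    return mx, my, b_yx, b_xy, r


x = [1, 2, 3, 4, 5]                  # hypothetical paired data
y = [2, 4, 5, 4, 5]
mx, my, b_yx, b_xy, r = regression_lines(x, y)

# Y on X: Y - my = b_yx (X - mx);  X on Y: X - mx = b_xy (Y - my)
print(f"Y = {my - b_yx * mx:.2f} + {b_yx:.2f} X")
print(f"X = {mx - b_xy * my:.2f} + {b_xy:.2f} Y")
print(b_yx * b_xy, r ** 2)           # equal up to floating-point rounding
```

For these data the lines are $Y = 2.2 + 0.6X$ and $X = Y - 1$; both pass through $(\bar{X}, \bar{Y}) = (3, 4)$, and here $b_{xy} = 1$ while $b_{yx} = 0.6 < 1$.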

Answer 3

Sampling

Sampling is the process of selecting a representative subset (sample) from a population in order to draw inferences about the whole population, saving time, cost and effort compared with a complete census.

Sampling methods are broadly classified as probability and non-probability sampling.

A. Probability Sampling

Every unit has a known, non-zero chance of selection; results can be generalized with measurable error.

Simple Random Sampling — every unit has an equal chance of selection (lottery / random numbers).
- Merits: unbiased, simple, sampling error measurable.
- Demerits: needs complete frame; units may be geographically scattered.
Systematic Sampling — select every $k^{th}$ unit after a random start ( $k = N/n$ ).
- Merits: simple, fast, evenly spread.
- Demerits: biased if there is a hidden periodicity in the list.
Stratified Sampling — divide the population into homogeneous strata and sample from each.
- Merits: high precision, ensures representation of subgroups.
- Demerits: requires prior knowledge of strata; complex.
Cluster / Multi-stage Sampling — divide into clusters, randomly select whole clusters (or sample within them in stages).
- Merits: cheap, no full frame needed, good for wide areas.
- Demerits: larger sampling error than other probability methods.
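The four probability-sampling schemes can be sketched with Python's `random` module. This is an illustrative sketch only: the function names, the numbered frame of units, and the strata/cluster dictionaries are hypothetical, and every scheme draws without replacement.

```python
import random


def simple_random_sample(population, n, rng):
    # Every unit has the same chance; draws are without replacement.
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    k = len(population) // n           # sampling interval k = N / n (integer part)
    start = rng.randrange(k)           # random start within the first interval
    return population[start::k][:n]    # every k-th unit from the start, capped at n


def stratified_sample(strata, sizes, rng):
    # strata: {stratum name: list of units}; sizes: {stratum name: sample size}
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}


def cluster_sample(clusters, m, rng):
    # Choose m whole clusters at random and keep every unit inside them.
    chosen = rng.sample(list(clusters), m)
    return [unit for c in chosen for unit in clusters[c]]


rng = random.Random(42)                # fixed seed for a reproducible illustration
population = list(range(1, 101))       # hypothetical frame of N = 100 numbered units
print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
```

Note how `systematic_sample` needs only one random number, while `simple_random_sample` needs the complete list; this mirrors the merits and demerits listed above.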

B. Non-Probability Sampling

Selection probability is unknown; based on judgement or convenience; generalization is limited.

Convenience Sampling — units easiest to reach are selected. Merit: quick, cheap. Demerit: highly biased, not representative.
Judgement (Purposive) Sampling — expert selects typical units. Merit: useful for small specialised studies. Demerit: subjective, bias-prone.
Quota Sampling — fixed quotas filled for each category. Merit: convenient, ensures group coverage. Demerit: selection within quota is biased.
Snowball Sampling — existing subjects refer further subjects. Merit: reaches hidden/rare populations. Demerit: strong selection bias.

Summary: Probability sampling gives unbiased, measurable results but is costlier; non-probability sampling is cheaper and faster but prone to bias.

Answer 4

Mathematical Expectation

For a random variable $X$ , the mathematical expectation (expected value / mean) is

E(X) = \sum_x x\,p(x) \quad (\text{discrete}), \qquad E(X)=\int_{-\infty}^{\infty} x f(x)\,dx \quad (\text{continuous}),

provided the sum/integral converges absolutely. It is the long-run average value of $X$ .

Properties (with proofs)

1. Expectation of a constant: $E(c) = c$ . Proof: $E(c)=\sum c\,p(x)=c\sum p(x)=c\cdot 1 = c.$

2. Linearity (constant multiple): $E(aX)=aE(X)$ . Proof: $E(aX)=\sum ax\,p(x)=a\sum x\,p(x)=aE(X).$

3. Addition of a constant: $E(aX+b)=aE(X)+b$ . Proof: $E(aX+b)=\sum (ax+b)p(x)=a\sum x p(x)+b\sum p(x)=aE(X)+b.$

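4. Addition theorem: $E(X+Y)=E(X)+E(Y)$ (holds whether or not $X$ and $Y$ are independent). Proof: $E(X+Y)=\sum_x\sum_y (x+y)p(x,y)=\sum_x x\sum_y p(x,y)+\sum_y y\sum_x p(x,y)=\sum_x x\,p(x)+\sum_y y\,p(y)=E(X)+E(Y),$ since summing the joint probabilities over one variable gives the marginal distribution of the other.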

5. Multiplication theorem (independence): If $X$ and $Y$ are independent, $E(XY)=E(X)E(Y)$ . Proof: For independence $p(x,y)=p(x)p(y)$ , so $E(XY)=\sum_x\sum_y xy\,p(x)p(y)=\big(\sum_x x p(x)\big)\big(\sum_y y p(y)\big)=E(X)E(Y).$
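The five properties can be verified exactly for a concrete pair of discrete variables. The sketch below assumes, purely for illustration, that $X$ is a fair die and $Y$ an independent fair coin scored 0/1; `Fraction` keeps the arithmetic exact so each identity holds with `==`.

```python
from fractions import Fraction
from itertools import product


def expect(dist, g=lambda v: v):
    """E[g(V)] for a discrete distribution given as {value: probability}."""
    return sum(g(v) * p for v, p in dist.items())


die = {v: Fraction(1, 6) for v in range(1, 7)}   # X: fair die (illustrative)
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # Y: fair coin, independent of X

# Joint distribution under independence: p(x, y) = p(x) p(y)
joint = {(x, y): px * py for (x, px), (y, py) in product(die.items(), coin.items())}

EX, EY = expect(die), expect(coin)
a, b = 3, 2
E_const = expect(die, lambda x: 5)                      # property 1: E(c) = c
E_linear = expect(die, lambda x: a * x + b)             # property 3: E(aX + b)
E_sum = expect(joint, lambda xy: xy[0] + xy[1])         # property 4: E(X + Y)
E_prod = expect(joint, lambda xy: xy[0] * xy[1])        # property 5: E(XY)
print(EX, EY, E_linear, E_sum, E_prod)                  # 7/2 1/2 25/2 4 7/4
```

Here $E(aX+b) = 3 \cdot \tfrac{7}{2} + 2 = \tfrac{25}{2}$, $E(X+Y) = \tfrac{7}{2} + \tfrac{1}{2} = 4$ and $E(XY) = \tfrac{7}{2}\cdot\tfrac{1}{2} = \tfrac{7}{4}$, matching the theorems.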

Answer 5

t-test for the Difference between Two Sample Means

Used for small independent samples ( $n_1, n_2 < 30$ ) drawn from normal populations having a common but unknown variance, to test whether their means differ.

Hypotheses: $H_0:\mu_1=\mu_2$ vs $H_1:\mu_1\neq\mu_2$ (or one-tailed).

Test statistic:

t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}},

where the pooled estimate of the common variance is

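S^2 = \frac{\sum (x_{1}-\bar{x}_1)^2 + \sum (x_{2}-\bar{x}_2)^2}{n_1+n_2-2} = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1+n_2-2}.

Here $s_1^2$ and $s_2^2$ are the sample variances computed with divisors $n_1$ and $n_2$ , i.e. $s_i^2 = \sum (x_i-\bar{x}_i)^2/n_i$ .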

Degrees of freedom: $df = n_1 + n_2 - 2$ .

Decision rule: Compare $|t|_{cal}$ with the table value $t_{\alpha, df}$ .

If $|t|_{cal} \le t_{tab}$ : do not reject $H_0$ (means not significantly different).
If $|t|_{cal} > t_{tab}$ : reject $H_0$ (means significantly different).

Assumptions: populations normal, samples independent and random, equal (homogeneous) population variances.
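The computation of the pooled $t$ statistic can be sketched as follows. The two small samples are hypothetical, and the function only computes $t$ and its degrees of freedom; the critical value still comes from the $t$ table, since Python's standard library has no $t$ quantile function.

```python
import math


def pooled_t(x1, x2):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((v - m1) ** 2 for v in x1)      # sum of squared deviations, sample 1
    ss2 = sum((v - m2) ** 2 for v in x2)      # sum of squared deviations, sample 2
    df = n1 + n2 - 2
    s = math.sqrt((ss1 + ss2) / df)           # pooled estimate S of the common s.d.
    t = (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, df


x1 = [12, 14, 10, 8, 16]                      # hypothetical sample 1 (mean 12)
x2 = [9, 11, 7, 9, 9]                         # hypothetical sample 2 (mean 9)
t, df = pooled_t(x1, x2)
print(round(t, 3), df)                        # 1.936 8
```

With 8 degrees of freedom the two-tailed 5% table value is 2.306, so $|t|_{cal} = 1.936 < 2.306$ and $H_0$ would not be rejected for these data.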

Answer 6

z-test for a Single Mean (Large Sample)

When the sample is large ( $n \ge 30$ ), the sampling distribution of the mean is approximately normal, so the $z$-test is used to check whether the sample mean differs significantly from a hypothesised population mean $\mu_0$ .

Hypotheses: $H_0:\mu=\mu_0$ vs $H_1:\mu\neq\mu_0$ .

Test statistic:

Z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}},

where $\sigma$ is the population standard deviation (the sample s.d. $s$ is used if $\sigma$ is unknown, valid for large $n$ ).

Decision rule (two-tailed): Reject $H_0$ if $|Z| > 1.96$ at the 5% level ( $|Z| > 2.58$ at 1%). For a one-tailed test use $1.645$ at 5% ( $2.33$ at 1%).

Example

A machine is set to fill packets of mean weight $\mu_0 = 500$ g. A sample of $n = 64$ packets gives $\bar{x}=496$ g with $\sigma = 16$ g. Test at 5%.

Z = \frac{496-500}{16/\sqrt{64}} = \frac{-4}{16/8} = \frac{-4}{2} = -2.0.

Since $|Z| = 2.0 > 1.96$ , we reject $H_0$ : the mean filling weight differs significantly from 500 g.
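The same calculation can be reproduced with Python's `statistics.NormalDist`, which also supplies the exact critical value instead of the rounded 1.96. As an extra not used in the original solution, the sketch prints the two-tailed p-value, which leads to the same decision because it is below 0.05.

```python
import math
from statistics import NormalDist

mu0, xbar, sigma, n, alpha = 500, 496, 16, 64, 0.05   # values from the example

z = (xbar - mu0) / (sigma / math.sqrt(n))             # -4 / 2 = -2.0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)          # about 1.96 (two-tailed, 5%)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-tailed p-value
reject = abs(z) > z_crit
print(round(z, 2), round(z_crit, 2), round(p_value, 4), reject)  # -2.0 1.96 0.0455 True
```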

Answer 7

Karl Pearson's Coefficient of Correlation

It is a numerical measure of the degree and direction of the linear relationship between two variables $X$ and $Y$ , defined as the ratio of their covariance to the product of their standard deviations:

r = \frac{\operatorname{Cov}(X,Y)}{\sigma_x \sigma_y} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2}\,\sqrt{\sum (y-\bar{y})^2}}.

Properties

Range: $-1 \le r \le +1$ . $r=+1$ perfect positive, $r=-1$ perfect negative, $r=0$ no linear correlation.
Independent of change of origin and scale: $r$ is unchanged when the variables are transformed to $u=(x-a)/h$ , $v=(y-b)/k$ with $h, k > 0$ (if $h$ and $k$ have opposite signs, only the sign of $r$ changes).
Symmetric: $r_{xy} = r_{yx}$ .
It is the geometric mean of the two regression coefficients: $r = \pm\sqrt{b_{yx}\,b_{xy}}$ .
It is a pure (dimensionless) number, independent of the units of measurement.
Measures only linear association; $r=0$ does not imply the variables are unrelated (could be non-linear).
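Several of these properties are easy to confirm numerically. In the sketch below the data values and the shifts/scales $a=15, h=2, b=50, k=5$ are hypothetical; it checks invariance under a change of origin and positive scale, the sign flip under a negative scale, symmetry, and a perfect parabola for which $r = 0$ despite an exact (non-linear) relationship.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's correlation coefficient from deviation sums."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [10, 12, 15, 19, 24]                       # hypothetical data
y = [40, 41, 48, 60, 62]
r = pearson_r(x, y)

u = [(xi - 15) / 2 for xi in x]                # u = (x - a) / h with h > 0
v = [(yi - 50) / 5 for yi in y]                # v = (y - b) / k with k > 0
r_uv = pearson_r(u, v)                         # same as r
r_flip = pearson_r(u, [-vi for vi in v])       # scale factors of opposite sign: -r

# Exact but non-linear relation y = x^2 on symmetric x values gives r = 0
r_parabola = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(round(r, 4), round(r_uv, 4), round(r_flip, 4), r_parabola)
```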

Answer 8

Regression Coefficients

The regression coefficient is the slope of a regression line; it measures the average change in the dependent variable for a unit change in the independent variable.

Regression coefficient of $Y$ on $X$ : $\displaystyle b_{yx} = \frac{\operatorname{Cov}(X,Y)}{\sigma_x^2} = r\frac{\sigma_y}{\sigma_x}.$
Regression coefficient of $X$ on $Y$ : $\displaystyle b_{xy} = \frac{\operatorname{Cov}(X,Y)}{\sigma_y^2} = r\frac{\sigma_x}{\sigma_y}.$

Properties

Geometric mean relation with $r$ : $r = \pm\sqrt{b_{yx}\,b_{xy}}$ , i.e. $b_{yx}\,b_{xy}=r^2$ .
Same sign: both regression coefficients (and $r$ ) have the same sign; if both are positive $r$ is positive, if both negative $r$ is negative.
Both cannot exceed unity: since $r^2 \le 1$ , the product $b_{yx}b_{xy}\le 1$ , so if one coefficient is numerically greater than 1 the other must be numerically less than 1.
Independent of change of origin but not of scale.
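Arithmetic mean of the two coefficients is at least $r$ when $r > 0$ : $\tfrac{1}{2}(b_{yx}+b_{xy}) \ge r$ . In general $\left|\tfrac{1}{2}(b_{yx}+b_{xy})\right| \ge |r|$ , because $\tfrac{1}{2}(b_{yx}+b_{xy}) = \tfrac{r}{2}\left(\tfrac{\sigma_y}{\sigma_x}+\tfrac{\sigma_x}{\sigma_y}\right)$ and $\tfrac{\sigma_y}{\sigma_x}+\tfrac{\sigma_x}{\sigma_y} \ge 2$ .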
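The properties above can be checked on a small hypothetical data set. The sketch computes both coefficients, then repeats the computation after shifting the origin (coefficients unchanged) and after rescaling to $u = x/h$, $v = y/k$, where $b_{vu} = \tfrac{h}{k} b_{yx}$ and $b_{uv} = \tfrac{k}{h} b_{xy}$. The data and the values $h = 2$, $k = 5$ are illustrative assumptions.

```python
import math


def reg_coeffs(x, y):
    """Return (b_yx, b_xy) from deviation sums."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / syy


x = [2, 4, 6, 8, 10]                           # hypothetical data
y = [5, 9, 10, 15, 16]
b_yx, b_xy = reg_coeffs(x, y)
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)  # r = +/- sqrt(b_yx * b_xy), sign of b's

# Change of origin only: both coefficients are unchanged
b_yx_o, b_xy_o = reg_coeffs([v - 6 for v in x], [v - 11 for v in y])

# Change of scale u = x / h, v = y / k: coefficients change by h/k and k/h
h, k = 2, 5
b_vu, b_uv = reg_coeffs([v / h for v in x], [v / k for v in y])

print(b_yx, round(b_xy, 4), round(r, 4), round((b_yx + b_xy) / 2, 4))
```

For these data $b_{yx} = 1.4 > 1$ while $b_{xy} \approx 0.683 < 1$, and their arithmetic mean (about 1.04) exceeds $r$ (about 0.98).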

Answer 9

Sampling Distribution and Standard Error

Sampling distribution: If all possible samples of a fixed size $n$ are drawn from a population and a statistic (e.g. the mean $\bar{x}$ , proportion $p$ ) is computed for each sample, the probability distribution of that statistic over all such samples is called its sampling distribution. For example, the sampling distribution of the mean has its own mean $E(\bar{x})=\mu$ and variance $\sigma^2/n$ , and by the Central Limit Theorem it is approximately normal for large $n$ .

Standard error (S.E.): The standard deviation of the sampling distribution of a statistic is called its standard error. It measures the variability of the statistic due to sampling and is used to set confidence limits and construct test statistics. For the sample mean:

\text{S.E.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}.

For a sample proportion: $\text{S.E.}(p) = \sqrt{\dfrac{PQ}{n}}$ , where $P$ is the population proportion and $Q = 1-P$ .

Importance / uses:

A smaller S.E. means the estimate is more reliable; S.E. decreases as $n$ increases.
It provides the denominator of test statistics, e.g. $Z = (\bar{x}-\mu)/\text{S.E.}(\bar{x})$ .
It is used to construct confidence intervals and to judge the precision of an estimate.
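The idea of a sampling distribution can be illustrated by simulation: draw many samples of size $n$, record each sample mean, and compare the standard deviation of those means with $\sigma/\sqrt{n}$. The normal population with $\mu = 50$, $\sigma = 10$ and the choices $n = 25$, 5000 repetitions are assumptions made only for this sketch.

```python
import math
import random
import statistics

rng = random.Random(7)                       # seeded for reproducibility
mu, sigma, n, reps = 50.0, 10.0, 25, 5000    # illustrative population and design

# Draw `reps` samples of size n and record each sample mean
means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

empirical_se = statistics.stdev(means)       # s.d. of the simulated sampling distribution
theoretical_se = sigma / math.sqrt(n)        # sigma / sqrt(n) = 2.0
print(round(statistics.fmean(means), 3), round(empirical_se, 3), theoretical_se)
```

Repeating the run with a larger `n` shrinks both values, which is the sense in which the S.E. decreases as the sample size grows.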

Answer 10

Confidence Interval for a Population Mean

A confidence interval gives a range of values, computed from sample data, within which the unknown population mean $\mu$ is expected to lie with a stated probability (confidence level $1-\alpha$ , e.g. 95%).

Construction

Point estimate: the sample mean $\bar{x}$ .

General form: $\bar{x} \pm (\text{critical value}) \times \text{S.E.}(\bar{x})$ .

Case 1 — Large sample ( $n\ge 30$ ) or known $\sigma$ (use $Z$ ):

\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}.

For 95% confidence, $z_{\alpha/2}=1.96$ ; for 99%, $z_{\alpha/2}=2.58$ .

Case 2 — Small sample ( $n<30$ ), $\sigma$ unknown (use $t$ ):

\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}, \qquad df = n-1.

Example: $n=100,\ \bar{x}=50,\ \sigma=10$ . 95% interval $= 50 \pm 1.96\times\frac{10}{\sqrt{100}} = 50 \pm 1.96 = (48.04,\ 51.96)$ .

Interpretation: We are 95% confident that the true population mean lies between the lower and upper limits; i.e. in repeated sampling, 95% of such intervals would contain $\mu$ .
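The large-sample interval from the example can be sketched in Python using `statistics.NormalDist` for $z_{\alpha/2}$. This covers only Case 1; the small-sample Case 2 needs a $t$ quantile, which the standard library does not provide (SciPy's `scipy.stats.t.ppf` is one option).

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Large-sample (or known-sigma) confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)                    # critical value x S.E.
    return xbar - margin, xbar + margin


lo, hi = z_interval(50, 10, 100)             # n = 100, xbar = 50, sigma = 10
print(f"({lo:.2f}, {hi:.2f})")               # (48.04, 51.96)
```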

Answer 11

F-test for Equality of Two Population Variances

Used to test whether two independent normal populations have equal variances (also the basis of ANOVA and testing the validity of pooling variances in the t-test).

Hypotheses: $H_0:\sigma_1^2=\sigma_2^2$ vs $H_1:\sigma_1^2\neq\sigma_2^2$ .

Test statistic: the ratio of the two sample variances, with the larger variance in the numerator:

F = \frac{s_1^2}{s_2^2}\quad (s_1^2 > s_2^2),

where the unbiased sample variances are

s_1^2 = \frac{\sum (x_1-\bar{x}_1)^2}{n_1-1}, \qquad s_2^2 = \frac{\sum (x_2-\bar{x}_2)^2}{n_2-1}.

Degrees of freedom: $(\nu_1, \nu_2) = (n_1-1,\ n_2-1)$ , corresponding to numerator and denominator.

Decision rule: Compare $F_{cal}$ with the table value $F_{\alpha}(\nu_1,\nu_2)$ .

If $F_{cal} \le F_{tab}$ : do not reject $H_0$ — variances are homogeneous.
If $F_{cal} > F_{tab}$ : reject $H_0$ — variances differ significantly.

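Assumptions: populations normally distributed; samples independent and random. Because the larger variance is placed in the numerator, $F \ge 1$ and only the upper-tail table value is needed (strictly, for the two-sided $H_1$ the upper $\alpha/2$ point gives a test of exact size $\alpha$ ).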
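A minimal sketch of the $F$ statistic: the function computes the unbiased variances, places the larger one in the numerator, and returns the matching degrees of freedom. The sample data are hypothetical, and the critical value must still be read from an $F$ table.

```python
def f_statistic(x1, x2):
    """Variance-ratio F with the larger unbiased variance on top; returns (F, (nu1, nu2))."""
    def unbiased_var(xs):
        m = sum(xs) / len(xs)
        return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)   # divisor n - 1

    v1, v2 = unbiased_var(x1), unbiased_var(x2)
    n1, n2 = len(x1), len(x2)
    if v1 >= v2:
        return v1 / v2, (n1 - 1, n2 - 1)
    return v2 / v1, (n2 - 1, n1 - 1)        # swap so numerator df matches the larger variance


x1 = [12, 14, 10, 8, 16]                     # hypothetical sample 1 (variance 10)
x2 = [9, 11, 7, 9, 9]                        # hypothetical sample 2 (variance 2)
F, dfs = f_statistic(x1, x2)
print(F, dfs)                                # 5.0 (4, 4)
```

The 5% upper-tail table value $F_{0.05}(4, 4)$ is about 6.39, so $F_{cal} = 5.0$ would not lead to rejecting $H_0$ for these data.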

Answer 12

Index Numbers

An index number is a statistical measure that shows the relative change in the value of a variable (or a group of variables such as prices, quantities or values) over time or place, with respect to a chosen base period taken as 100. It is a specialised average used to study, for example, changes in the cost of living or price level.

Let $p_0, q_0$ be base-year price and quantity and $p_1, q_1$ the current-year price and quantity.

Laspeyres' Price Index (base-year weights)

Uses base-year quantities ( $q_0$ ) as weights:

P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100.

It measures the cost of buying base-year quantities at current prices versus base prices. It tends to overstate the price rise (ignores substitution away from costlier goods).

Paasche's Price Index (current-year weights)

Uses current-year quantities ( $q_1$ ) as weights:

P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100.

It tends to understate the price rise. (Fisher's ideal index is the geometric mean of the two: $P^F=\sqrt{P^L \times P^P}$ .)

Key difference: Laspeyres uses base-period weights (data needed only once), whereas Paasche uses current-period weights (must be recollected each period).
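The three formulas can be computed side by side. The prices and quantities for three commodities below are invented for illustration; the functions simply evaluate the weighted-aggregate ratios defined above.

```python
import math


def laspeyres(p0, p1, q0):
    """Price index with base-year quantities q0 as weights (base = 100)."""
    return 100 * sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0))


def paasche(p0, p1, q1):
    """Price index with current-year quantities q1 as weights (base = 100)."""
    return 100 * sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1))


def fisher(p0, p1, q0, q1):
    """Fisher's ideal index: geometric mean of Laspeyres and Paasche."""
    return math.sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


# Hypothetical data for three commodities
p0, q0 = [10, 8, 5], [4, 6, 10]     # base-year prices and quantities
p1, q1 = [12, 10, 6], [3, 5, 12]    # current-year prices and quantities

L = laspeyres(p0, p1, q0)           # 100 * 168 / 138
P = paasche(p0, p1, q1)             # 100 * 158 / 130
F = fisher(p0, p1, q0, q1)
print(round(L, 2), round(P, 2), round(F, 2))
```

Here Laspeyres is about 121.74 and Paasche about 121.54, with Fisher's index lying between them, as a geometric mean of two numbers always does.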

Level	BSc CSIT (TU)
Stream	Science
Subject	Statistics II (BSc CSIT, STA210)
Year	2075 BS
Exam session	Regular (annual)
Full marks	60
Time allowed	180 minutes
Questions	12, all with step-by-step solutions


Section A: Long Answer Questions

Hypothesis Testing

Procedure of Testing a Hypothesis

Types of Errors

Correlation and Regression

Fitting the Two Regression Lines

Relationship between $r$ and the Regression Coefficients

Sampling

A. Probability Sampling

B. Non-Probability Sampling

Section B: Short Answer Questions

Mathematical Expectation

Properties (with proofs)

t-test for the Difference between Two Sample Means

z-test for a Single Mean (Large Sample)

Example

Karl Pearson's Coefficient of Correlation

Properties

Regression Coefficients

Properties

Sampling Distribution and Standard Error

Confidence Interval for a Population Mean

Construction

F-test for Equality of Two Population Variances

Index Numbers

Laspeyres' Price Index (base-year weights)

Paasche's Price Index (current-year weights)

Frequently asked questions
