BSc CSIT (TU) Science B.Sc. II Year Statistics (STA201) (Model) Question Paper 2075 Nepal

Q: Where can I find the BSc CSIT (TU) B.Sc. II Year Statistics (STA201) (Model) question paper 2075?

The full BSc CSIT (TU) B.Sc. II Year Statistics (STA201) model question paper for 2075 is available free on Kekkei. You can read every question online and attempt the paper under timed exam conditions.

Q: Does the B.Sc. II Year Statistics (STA201) (Model) 2075 paper come with solutions?

Yes. Every question on this B.Sc. II Year Statistics (STA201) (Model) past paper includes a step-by-step solution, plus instant AI feedback when you attempt it on Kekkei.

Q: How many marks is the BSc CSIT (TU) B.Sc. II Year Statistics (STA201) (Model) 2075 paper?

The BSc CSIT (TU) B.Sc. II Year Statistics (STA201) (Model) 2075 paper carries 100 full marks and is meant to be completed in 180 minutes, across 26 questions.

Q: Is practising this B.Sc. II Year Statistics (STA201) (Model) past paper free?

Yes — reading and attempting this B.Sc. II Year Statistics (STA201) (Model) past paper on Kekkei is completely free.

Question 1 (long answer, 10 marks)

Define Negative binomial distribution. Derive its mean and variance.


Answer 1

Definition. The negative binomial distribution gives the probability that the $r^{th}$ success occurs on the $(r+k)^{th}$ trial in a sequence of independent Bernoulli trials with success probability $p$ (failure $q = 1-p$ ). If $X$ is the number of failures before the $r^{th}$ success, its p.m.f. is

P(X = k) = \binom{k+r-1}{k} p^{r} q^{k}, \quad k = 0, 1, 2, \dots

Mean and variance (via the m.g.f.). The moment generating function is

M_X(t) = \left(\frac{p}{1 - q e^{t}}\right)^{r}.

Differentiating and evaluating at $t = 0$ :

E(X) = M_X'(0) = \frac{rq}{p}.

Using $E(X^2) = M_X''(0)$ ,

\operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{rq}{p^{2}}.

Thus Mean $= \dfrac{rq}{p}$ and Variance $= \dfrac{rq}{p^{2}}$ . Note that variance $>$ mean (over-dispersion), since $1/p > 1$ .
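The closed forms above can be checked numerically by summing the p.m.f. directly. The sketch below is only an illustration: the values $r = 3$ and $p = 0.4$, the function names, and the truncation point of the infinite series are assumptions, not part of the question.

```python
from math import comb

def neg_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): probability of k failures before the r-th success."""
    q = 1.0 - p
    return comb(k + r - 1, k) * p**r * q**k

def neg_binomial_moments(r: int, p: float, terms: int = 500):
    """Approximate mean and variance by truncating the series at `terms`."""
    probs = [neg_binomial_pmf(k, r, p) for k in range(terms)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    second = sum(k * k * pk for k, pk in enumerate(probs))
    return mean, second - mean**2

r, p = 3, 0.4            # illustrative values, not taken from the paper
q = 1 - p
mean, var = neg_binomial_moments(r, p)
print(mean, r * q / p)       # series sum vs. rq/p
print(var, r * q / p**2)     # series sum vs. rq/p^2
```

The truncated sums agree with $rq/p$ and $rq/p^2$ up to floating-point error because the neglected tail is negligible for these parameter values.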

Answer 2

Gamma distribution. A continuous random variable $X$ follows a gamma distribution with shape parameter $\alpha > 0$ (and scale 1) if its p.d.f. is

f(x) = \frac{1}{\Gamma(\alpha)} e^{-x} x^{\alpha-1}, \quad x > 0.

Its mean is $E(X) = \alpha$ and variance $\operatorname{Var}(X) = \alpha$ .

Normal limit as $\alpha \to \infty$ . Standardise $X$ by

Z = \frac{X - \alpha}{\sqrt{\alpha}}.

The m.g.f. of $X$ is $M_X(t) = (1-t)^{-\alpha}$ for $t < 1$ . Then

M_Z(t) = e^{-\sqrt{\alpha}\,t}\, M_X\!\left(\frac{t}{\sqrt{\alpha}}\right) = e^{-\sqrt{\alpha}\,t}\left(1 - \frac{t}{\sqrt{\alpha}}\right)^{-\alpha}.

Taking logarithms and expanding $\log(1 - t/\sqrt{\alpha})$ in a Taylor series:

\log M_Z(t) = -\sqrt{\alpha}\,t - \alpha\log\!\left(1 - \frac{t}{\sqrt{\alpha}}\right) = -\sqrt{\alpha}\,t + \alpha\left(\frac{t}{\sqrt{\alpha}} + \frac{t^{2}}{2\alpha} + \frac{t^{3}}{3\alpha^{3/2}} + \dots\right).

This simplifies to

\log M_Z(t) = \frac{t^{2}}{2} + \frac{t^{3}}{3\sqrt{\alpha}} + \dots \xrightarrow{\alpha \to \infty} \frac{t^{2}}{2}.

Hence $M_Z(t) \to e^{t^{2}/2}$, which is the m.g.f. of the standard normal distribution. Therefore, for large $\alpha$, $X$ is approximately distributed as $N(\alpha, \alpha)$, that is, normal with mean $\alpha$ and variance $\alpha$.
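The limit can also be seen numerically. The following sketch evaluates $\log M_Z(t)$ from the exact expression for a few values of $\alpha$; the function name and the chosen values of $t$ and $\alpha$ are illustrative assumptions.

```python
from math import log1p, sqrt

def log_mgf_standardised_gamma(t: float, alpha: float) -> float:
    """log M_Z(t) for Z = (X - alpha)/sqrt(alpha), X ~ Gamma(alpha, scale 1).

    Valid only for t < sqrt(alpha).
    """
    s = sqrt(alpha)
    # log1p(-t/s) computes log(1 - t/s) accurately when t/s is small.
    return -s * t - alpha * log1p(-t / s)

t = 1.0
for alpha in (10, 1_000, 100_000):
    # Values decrease towards t^2/2 = 0.5 as alpha grows.
    print(alpha, log_mgf_standardised_gamma(t, alpha))
```

For $t = 1$ the printed values approach $t^2/2 = 0.5$, with the gap shrinking roughly like $t^3/(3\sqrt{\alpha})$, as the expansion suggests.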

Answer 3

Bivariate random variable. Let $(X, Y)$ be a pair of jointly distributed random variables.

Joint distribution function (c.d.f.):

F(x, y) = P(X \le x, Y \le y), \quad -\infty < x, y < \infty.

Joint probability density function (continuous case): If $F(x,y)$ is differentiable, the joint p.d.f. is

f(x, y) = \frac{\partial^{2} F(x, y)}{\partial x\, \partial y}, \quad \text{with } f(x,y) \ge 0 \text{ and } \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1.

Equivalently, $F(x,y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u,v)\,dv\,du.$

Properties of the bivariate distribution function $F(x,y)$ :

$0 \le F(x, y) \le 1$ .
$F$ is monotonically non-decreasing in each argument.
$F(-\infty, y) = F(x, -\infty) = F(-\infty, -\infty) = 0$ .
$F(+\infty, +\infty) = 1$ .
The marginal distribution functions are $F_X(x) = F(x, +\infty)$ and $F_Y(y) = F(+\infty, y)$ .
For $a_1 < a_2$ and $b_1 < b_2$ :

P(a_1 < X \le a_2, b_1 < Y \le b_2) = F(a_2, b_2) - F(a_1, b_2) - F(a_2, b_1) + F(a_1, b_1) \ge 0.

$F(x,y)$ is right-continuous in each variable.

Answer 4

Derivation of Student's $t$ -distribution. Let $Z \sim N(0,1)$ and $\chi^{2}$ be a chi-square variate with $n$ degrees of freedom, independent of $Z$ . Define

t = \frac{Z}{\sqrt{\chi^{2}/n}}.

Using the joint density of $Z$ and $\chi^{2}$ and the transformation, integrating out the chi-square variable gives the p.d.f. of $t$ :

f(t) = \frac{1}{\sqrt{n\pi}}\, \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{t^{2}}{n}\right)^{-\frac{n+1}{2}}, \quad -\infty < t < \infty.

This is the Student's $t$ -distribution with $n$ degrees of freedom.

Relation between $F$ and $t$ when $m = 1$ . The $F$ statistic with $(m, n)$ degrees of freedom is

F = \frac{\chi_m^{2}/m}{\chi_n^{2}/n}.

For $m = 1$ , $\chi_1^{2} = Z^{2}$ where $Z \sim N(0,1)$ . Hence

F(1, n) = \frac{Z^{2}/1}{\chi_n^{2}/n} = \left(\frac{Z}{\sqrt{\chi_n^{2}/n}}\right)^{2} = t^{2}.

Therefore $F(1, n) = t_n^{2}$; that is, the square of a $t$-variate with $n$ d.f. follows an $F$ distribution with $(1, n)$ d.f. Since $t$ can be negative, the converse relation holds for the absolute value: $|t_n| = \sqrt{F(1, n)}$.

Answer 5

Method of maximum likelihood estimation (MLE). Let $x_1, x_2, \dots, x_n$ be a random sample from a population with density $f(x; \theta)$ . The likelihood function is

L(\theta) = \prod_{i=1}^{n} f(x_i; \theta).

The maximum likelihood estimate $\hat{\theta}$ is the value of $\theta$ that maximises $L(\theta)$ (equivalently $\log L(\theta)$ ). It is obtained by solving

\frac{\partial \log L}{\partial \theta} = 0, \qquad \frac{\partial^{2} \log L}{\partial \theta^{2}} < 0 \;\text{(maximum condition)}.

Properties of MLEs:

Consistency — MLEs are consistent: $\hat{\theta} \to \theta$ in probability as $n \to \infty$ .
Asymptotic normality — for large $n$ , $\hat{\theta}$ is approximately normally distributed with mean $\theta$ and variance equal to the Cramer-Rao lower bound.
Asymptotic efficiency — MLEs attain the minimum possible variance asymptotically.
Sufficiency — if a sufficient statistic exists, the MLE is a function of it.
Invariance — if $\hat{\theta}$ is the MLE of $\theta$ , then $g(\hat{\theta})$ is the MLE of $g(\theta)$ .
Not always unbiased — MLEs may be biased for small samples, though the bias vanishes as $n \to \infty$ .

Answer 6

Parametric vs non-parametric tests:

Basis	Parametric test	Non-parametric test
Assumptions	Assume a specific population distribution (usually normal)	Distribution-free; no assumption about population form
Data type	Require interval/ratio (quantitative) data	Suitable for nominal/ordinal data
Parameters	Test hypotheses about parameters ( $\mu, \sigma^2$ )	Do not involve population parameters directly
Power	More powerful when assumptions hold	Less powerful but more robust
Examples	$t$ -test, $F$ -test, $z$ -test	Run test, sign test, Mann-Whitney U, Chi-square

One-sample run test (test of randomness). A run is a sequence of identical symbols bounded by different symbols (or boundaries). The run test checks whether a sequence of two types of outcomes occurs in a random order.

Procedure:

Arrange the observations in the order obtained and classify each into one of two categories (e.g. above/below the median, denoted $+$ and $-$ ).
Let $n_1$ = number of $+$ symbols, $n_2$ = number of $-$ symbols, and $R$ = total number of runs.
Hypotheses: $H_0$ : the sequence is random; $H_1$ : the sequence is not random.
For large samples, under $H_0$ , $R$ is approximately normal with

E(R) = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad \operatorname{Var}(R) = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^{2}(n_1 + n_2 - 1)}.

Compute the test statistic $Z = \dfrac{R - E(R)}{\sqrt{\operatorname{Var}(R)}}$ and compare with the critical value (e.g. $\pm 1.96$ at 5%). Reject $H_0$ if $|Z|$ exceeds the critical value. For small samples, use the run-test tables.

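Example: Suppose the sequence of defective (D) and non-defective (N) items is: N N D N D D N N D N. Here $n_1 = 6$ (N), $n_2 = 4$ (D), and the runs are NN | D | N | DD | NN | D | N, giving $R = 7$. Then $E(R) = \frac{2(6)(4)}{10} + 1 = 5.8$ and $\operatorname{Var}(R) = \frac{2(6)(4)\,(48 - 10)}{10^{2} \times 9} = \frac{1824}{900} \approx 2.027$, so $Z = \dfrac{7 - 5.8}{\sqrt{2.027}} \approx 0.84$. Since $|Z| = 0.84 < 1.96$, we do not reject $H_0$ at the 5% level: the data give no evidence that the order is non-random. (With only 10 observations the normal approximation is rough, so in practice the run-test tables would be consulted.)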
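The large-sample procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming the sequence is given as a string of two symbols; the function name `runs_test` and its argument names are hypothetical.

```python
from math import sqrt

def runs_test(sequence: str, plus: str):
    """Return (n1, n2, R, E(R), Var(R), Z) for a two-symbol sequence.

    `plus` is the symbol counted as n1; every other position counts as n2.
    """
    n1 = sequence.count(plus)
    n2 = len(sequence) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("both categories must be present")
    # A new run starts wherever a symbol differs from its predecessor.
    runs = 1 + sum(a != b for a, b in zip(sequence, sequence[1:]))
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n**2 * (n - 1))
    z = (runs - mean) / sqrt(var)
    return n1, n2, runs, mean, var, z

n1, n2, runs, mean, var, z = runs_test("NNDNDDNNDN", plus="N")
print(n1, n2, runs)                  # 6 4 7
print(round(mean, 1), round(z, 2))   # 5.8 0.84
```

Comparing `abs(z)` with 1.96 reproduces the 5% two-sided decision rule stated in the procedure.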

Answer 7

The negative exponential distribution has p.d.f.

f(x) = \theta e^{-\theta x}, \quad x > 0, \; \theta > 0.

Moment generating function:

M_X(t) = E(e^{tX}) = \int_0^{\infty} e^{tx}\,\theta e^{-\theta x}\,dx = \theta \int_0^{\infty} e^{-(\theta - t)x}\,dx = \frac{\theta}{\theta - t}, \quad t < \theta.

Mean: $E(X) = M_X'(0) = \dfrac{\theta}{(\theta - t)^2}\Big|_{t=0} = \dfrac{1}{\theta}.$

Second moment: $E(X^2) = M_X''(0) = \dfrac{2\theta}{(\theta - t)^3}\Big|_{t=0} = \dfrac{2}{\theta^{2}}.$

Variance: $\operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \dfrac{2}{\theta^{2}} - \dfrac{1}{\theta^{2}} = \dfrac{1}{\theta^{2}}.$

Answer 8

For a beta distribution of the first kind with parameters $m = 3$ and $n = 4$ (the values substituted below):

Mean: $\dfrac{m}{m+n} = \dfrac{3}{3+4} = \dfrac{3}{7} \approx 0.4286.$
Mode: $\dfrac{m-1}{m+n-2} = \dfrac{3-1}{3+4-2} = \dfrac{2}{5} = 0.40$ (valid since $m, n > 1$ ).
Variance: $\dfrac{mn}{(m+n)^{2}(m+n+1)} = \dfrac{3 \times 4}{(7)^{2}\,(8)} = \dfrac{12}{49 \times 8} = \dfrac{12}{392} = \dfrac{3}{98} \approx 0.0306.$
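The arithmetic can be verified exactly with rational numbers. The helper below is a small sketch; its name is an assumption, and it only covers integer parameters with $m, n > 1$ so that the mode formula applies.

```python
from fractions import Fraction

def beta_first_kind_summary(m: int, n: int):
    """Mean, mode and variance of a beta distribution of the first kind."""
    if m <= 1 or n <= 1:
        raise ValueError("mode formula requires m > 1 and n > 1")
    mean = Fraction(m, m + n)
    mode = Fraction(m - 1, m + n - 2)
    variance = Fraction(m * n, (m + n) ** 2 * (m + n + 1))
    return mean, mode, variance

mean, mode, variance = beta_first_kind_summary(3, 4)
print(mean, mode, variance)   # 3/7 2/5 3/98
```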

Answer 9

Let $U = X_1 X_2$ where $X_1, X_2 \sim U(0,1)$ are independent, so the joint density is $f(x_1, x_2) = 1$ on the unit square.

For $0 < u < 1$ , the c.d.f. is

G(u) = P(X_1 X_2 \le u) = \int_0^1 P\!\left(X_2 \le \frac{u}{x_1}\right) dx_1.

For $x_1 \le u$ , $u/x_1 \ge 1$ so the inner probability is $1$ ; for $x_1 > u$ it equals $u/x_1$ . Hence

G(u) = \int_0^{u} 1\, dx_1 + \int_u^{1} \frac{u}{x_1}\, dx_1 = u + u\big[\ln x_1\big]_u^1 = u - u\ln u.

Differentiating gives the p.d.f. of $U$ :

g(u) = \frac{d}{du}(u - u\ln u) = 1 - (\ln u + 1) = -\ln u, \quad 0 < u < 1.

Thus $X_1 X_2$ has p.d.f. $g(u) = -\ln u$ for $0 < u < 1$ (and $0$ otherwise).
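As a sanity check, integrating the derived density $-\ln u$ numerically should recover the c.d.f. $u - u\ln u$ and a total probability of 1. The sketch below uses a plain midpoint rule; the function names and step count are illustrative assumptions.

```python
from math import log

def product_uniform_pdf(u: float) -> float:
    """Density of U = X1 * X2 for independent U(0,1) variables, 0 < u < 1."""
    return -log(u)

def product_uniform_cdf(u: float) -> float:
    """Closed-form c.d.f. G(u) = u - u ln u, 0 < u <= 1."""
    return u - u * log(u)

def midpoint_integral(f, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint rule; midpoints never touch the singular endpoint u = 0."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

print(midpoint_integral(product_uniform_pdf, 0.0, 0.5), product_uniform_cdf(0.5))
print(midpoint_integral(product_uniform_pdf, 0.0, 1.0))   # close to 1
```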

Answer 10

Marginal of $X$ (the joint density used is $f(x, y) = 4xy$ on $0 \le x, y \le 1$):

f_X(x) = \int_0^1 4xy\, dy = 4x\left[\frac{y^2}{2}\right]_0^1 = 2x, \quad 0 \le x \le 1.

Marginal of $Y$ :

f_Y(y) = \int_0^1 4xy\, dx = 4y\left[\frac{x^2}{2}\right]_0^1 = 2y, \quad 0 \le y \le 1.

Check independence:

f_X(x)\, f_Y(y) = (2x)(2y) = 4xy = f(x, y).

Since the joint density factorises into the product of the marginal densities for all $(x, y)$ , the random variables $X$ and $Y$ are independent.

Answer 11

Likelihood function. For a random sample $x_1, x_2, \dots, x_n$ from a population with density $f(x; \theta)$ , the likelihood function is the joint density viewed as a function of the parameter $\theta$ :

L(\theta \mid x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i; \theta).

Properties:

Non-negativity and integration: $L(\theta) \ge 0$ , and as a density in the sample it integrates to 1 over the sample space:

\int \cdots \int L(\theta)\, dx_1 \cdots dx_n = 1.

Score has zero expectation. Differentiating the identity $\int L\, dx = 1$ with respect to $\theta$ (under regularity conditions allowing interchange of integral and derivative):

\int \frac{\partial L}{\partial \theta}\, dx = 0 \;\Rightarrow\; \int \frac{\partial \log L}{\partial \theta}\, L\, dx = 0 \;\Rightarrow\; E\!\left(\frac{\partial \log L}{\partial \theta}\right) = 0.

Information identity. Differentiating again gives

E\!\left[\left(\frac{\partial \log L}{\partial \theta}\right)^{2}\right] = -E\!\left(\frac{\partial^{2} \log L}{\partial \theta^{2}}\right) = I(\theta),

the Fisher information, which is non-negative because it is the expectation of a square.

These properties form the basis for maximum likelihood estimation and the Cramer-Rao bound.

Answer 12

For a random sample $x_1, \dots, x_n$ from $N(\mu, \sigma^2)$ (with $\sigma^2$ known), the log-likelihood is

\log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.

Differentiating with respect to $\mu$ :

\frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = \frac{n}{\sigma^2}(\bar{x} - \mu).

This is of the form $\dfrac{\partial \log L}{\partial \mu} = A(\mu)\,(T - \mu)$ with $T = \bar{x}$ and $A(\mu) = n/\sigma^2$ . By the Cramer-Rao theory, an MVB estimator exists because the score factorises in this form, and the MVB estimator of $\mu$ is the sample mean $\bar{x}$ .

Minimum variance bound (Cramer-Rao lower bound):

I(\mu) = -E\!\left(\frac{\partial^{2}\log L}{\partial \mu^{2}}\right) = \frac{n}{\sigma^2}, \qquad \text{MVB} = \frac{1}{I(\mu)} = \frac{\sigma^{2}}{n}.

Thus the MVB estimator of $\mu$ is $\bar{x}$ with minimum variance $\dfrac{\sigma^{2}}{n}$ .

Answer 13

Interval estimation. Interval estimation specifies a random interval $(T_1, T_2)$ , computed from the sample, that contains the unknown parameter $\theta$ with a stated probability (confidence level) $1 - \alpha$ :

P(T_1 < \theta < T_2) = 1 - \alpha.

The interval $(T_1, T_2)$ is called a confidence interval and $1-\alpha$ the confidence coefficient.

Proof that $T^2$ is biased for $\theta^2$ . Since $T$ is unbiased for $\theta$ , $E(T) = \theta$ . By the definition of variance,

\operatorname{Var}(T) = E(T^2) - [E(T)]^2 \;\Rightarrow\; E(T^2) = \operatorname{Var}(T) + \theta^{2}.

Unless $\operatorname{Var}(T) = 0$ , we have

E(T^2) = \theta^{2} + \operatorname{Var}(T) \ne \theta^{2}.

Hence $T^2$ overestimates $\theta^2$ by an amount equal to $\operatorname{Var}(T) > 0$ , so $T^2$ is a biased (positively biased) estimator of $\theta^2$ .

Answer 14

Statement (Neyman-Pearson lemma). To test a simple null hypothesis $H_0: \theta = \theta_0$ against a simple alternative $H_1: \theta = \theta_1$, the most powerful (MP) critical region $w$ of size $\alpha$ is given by

\frac{L_1}{L_0} = \frac{L(x \mid \theta_1)}{L(x \mid \theta_0)} \ge k \;\text{inside } w, \qquad \frac{L_1}{L_0} < k \;\text{outside } w,

where $k > 0$ is chosen so that $P(x \in w \mid H_0) = \alpha$ .

Proof. Let $w$ be the region defined above and $w^{*}$ be any other region of size $\alpha$ , so $\int_w L_0\,dx = \int_{w^*} L_0\,dx = \alpha$ . The power of $w$ is $\int_w L_1\,dx$ . Consider

\int_w L_1\,dx - \int_{w^*} L_1\,dx = \int_{w \setminus w^*} L_1\,dx - \int_{w^* \setminus w} L_1\,dx.

Inside $w$ , $L_1 \ge k L_0$ ; outside $w$ , $L_1 < k L_0$ . Therefore

\int_{w\setminus w^*} L_1\,dx \ge k\!\int_{w\setminus w^*} L_0\,dx, \qquad \int_{w^*\setminus w} L_1\,dx \le k\!\int_{w^*\setminus w} L_0\,dx.

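Both regions have size $\alpha$, so removing the common part $w \cap w^*$ from each gives $\int_{w\setminus w^*} L_0\,dx = \int_{w^*\setminus w} L_0\,dx$. Hence the difference in power is $\ge k\big(\int_{w\setminus w^*} L_0\,dx - \int_{w^*\setminus w} L_0\,dx\big) = 0.$ Thus the power of $w$ is at least that of $w^*$, proving $w$ is most powerful.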

Applications:

Construction of most powerful tests for simple hypotheses.
Basis for likelihood ratio tests and uniformly most powerful (UMP) tests.
Used to derive optimal critical regions for normal, binomial, Poisson, etc.

Answer 15

Let the critical region be $w: x \ge 1$ and the density be $f(x, \theta) = \theta e^{-\theta x}$ , $x \ge 0$ .

Type I error (rejecting $H_0$ when $H_0: \theta = 2$ is true):

\alpha = P(x \ge 1 \mid \theta = 2) = \int_1^{\infty} 2 e^{-2x}\, dx = \big[-e^{-2x}\big]_1^{\infty} = e^{-2} \approx 0.1353.

Type II error (accepting $H_0$ when $H_1: \theta = 1$ is true), i.e. $x < 1$ under $\theta = 1$ :

\beta = P(x < 1 \mid \theta = 1) = \int_0^{1} e^{-x}\, dx = \big[-e^{-x}\big]_0^{1} = 1 - e^{-1} \approx 0.6321.

Thus the Type I error $\alpha = e^{-2} \approx 0.135$ and the Type II error $\beta = 1 - e^{-1} \approx 0.632$ .
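Both error probabilities are tail areas of an exponential distribution, so they are easy to confirm in code. A minimal sketch, assuming the helper name `exponential_tail`:

```python
from math import exp

def exponential_tail(x: float, theta: float) -> float:
    """P(X >= x) for the density theta * exp(-theta * x), x >= 0."""
    return exp(-theta * x)

alpha = exponential_tail(1.0, theta=2.0)          # reject H0 although theta = 2
beta = 1.0 - exponential_tail(1.0, theta=1.0)     # accept H0 although theta = 1
print(round(alpha, 4), round(beta, 4))            # 0.1353 0.6321
print(round(1.0 - beta, 4))                       # power of the test: 0.3679
```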

Answer 16

Step 1 — Rank all observations (combined, smallest = rank 1):

Value	45	51	53	64	70	75	78	82	110
Rank	1	2	3	4	5	6	7	8	9
Group	T	U	U	T	U	T	T	T	U

( $T$ = trained, $U$ = untrained.)

Step 2 — Rank sums:

Trained ( $n_1 = 5$ ): ranks 1, 4, 6, 7, 8 → $R_1 = 26$ .
Untrained ( $n_2 = 4$ ): ranks 2, 3, 5, 9 → $R_2 = 19$ .

Step 3 — Compute U statistics:

U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1 = 20 + 15 - 26 = 9.

U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2 = 20 + 10 - 19 = 11.

Check: $U_1 + U_2 = 20 = n_1 n_2$ . Take $U = \min(U_1, U_2) = 9$ .

Step 4 — Decision. For $n_1 = 5$ , $n_2 = 4$ at the 5% level (two-tailed), the critical value of $U$ is $1$ . Since the calculated $U = 9 > 1$ , we do not reject $H_0$ .

Conclusion: There is no significant difference between the average number of trials of trained and untrained rats.
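Steps 1 to 3 can be reproduced with a short script. This is a sketch that assumes there are no tied observations (true for these data); the function name is hypothetical, and the critical value still has to be read from the Mann-Whitney table.

```python
def mann_whitney_u(sample1, sample2):
    """Rank sums and U statistics; assumes no tied values."""
    combined = sorted(sample1 + sample2)
    if len(set(combined)) != len(combined):
        raise ValueError("ties present: average ranks would be needed")
    rank = {value: i + 1 for i, value in enumerate(combined)}
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank[v] for v in sample1)
    r2 = sum(rank[v] for v in sample2)
    u1 = n1 * n2 + n1 * (n1 + 1) // 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) // 2 - r2
    return r1, r2, u1, u2

trained = [78, 64, 75, 45, 82]
untrained = [110, 70, 53, 51]
r1, r2, u1, u2 = mann_whitney_u(trained, untrained)
print(r1, r2, u1, u2, min(u1, u2))   # 26 19 9 11 9
```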

Answer 17

If $p$ is the population proportion and $q = 1 - p$ , then for a random sample of size $n$ the sample proportion $\hat{p}$ has standard error

S.E.(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}.

When $p$ is unknown it is estimated by $\hat{p}$ , giving $S.E.(\hat{p}) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}.$

Answer 18

One-tailed test: The critical (rejection) region lies entirely in one tail of the sampling distribution. It is used when the alternative hypothesis is directional, e.g. $H_1: \mu > \mu_0$ (right-tailed) or $H_1: \mu < \mu_0$ (left-tailed).

Two-tailed test: The critical region is split between both tails of the distribution. It is used when the alternative is non-directional, e.g. $H_1: \mu \ne \mu_0$ , so that significantly large or small values lead to rejection of $H_0$ .

Answer 19

A two-dimensional (bivariate) random variable assigns a pair of real numbers to each outcome of a random experiment.

Example: Select a student at random and record $(X, Y)$ where $X$ = the student's height and $Y$ = the student's weight. Each outcome gives an ordered pair $(x, y)$ , so $(X, Y)$ is a two-dimensional random variable. (Another example: tossing two dice and recording the pair of numbers $(X, Y)$ that appear.)

Answer 20

Given $\sigma = 2.4$ kg, $n = 10$ , $\bar{x} = 31.4$ kg. Since $\sigma$ is known, use the $z$ value $1.96$ for 95% confidence.

Standard error: $\dfrac{\sigma}{\sqrt{n}} = \dfrac{2.4}{\sqrt{10}} = \dfrac{2.4}{3.162} = 0.759$ kg.

95% confidence limits:

\bar{x} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}} = 31.4 \pm 1.96 \times 0.759 = 31.4 \pm 1.488.

Thus the limits are $(29.91, 32.89)$ kg approximately, i.e. the 95% confidence interval for the population mean is $29.91 \text{ kg} < \mu < 32.89 \text{ kg}$ .
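The same limits follow from a two-line calculation. The function below is a sketch for the known-$\sigma$ case only; its name and the default $z = 1.96$ are assumptions made for illustration.

```python
from math import sqrt

def z_confidence_interval(mean: float, sigma: float, n: int, z: float = 1.96):
    """Confidence limits for a population mean when sigma is known."""
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width

lower, upper = z_confidence_interval(mean=31.4, sigma=2.4, n=10)
print(round(lower, 2), round(upper, 2))   # 29.91 32.89
```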

Answer 21

Four main features of the $F$ -distribution curve:

It is a continuous distribution defined only for non-negative values ( $F \ge 0$ ).
It is positively skewed (skewed to the right), the skewness decreasing as the degrees of freedom increase.
Its shape depends on two parameters — the numerator and denominator degrees of freedom $(m, n)$ .
The total area under the curve is 1, and it is unimodal; as both degrees of freedom become large it approaches the normal curve.

Answer 22

For a hypergeometric distribution with population size $N$ , $M$ successes in the population, and sample size $n$ :

Mean: $E(X) = \dfrac{nM}{N} = np$ , where $p = M/N$ .

Variance: $\operatorname{Var}(X) = n\,\dfrac{M}{N}\left(1 - \dfrac{M}{N}\right)\dfrac{N - n}{N - 1} = npq\left(\dfrac{N - n}{N - 1}\right),$

where $q = 1 - p$ and $\dfrac{N-n}{N-1}$ is the finite population correction factor.
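These formulas can be checked exactly by summing over the hypergeometric p.m.f. In the sketch below the values $N = 20$, $M = 8$, $n = 5$ and the function name are illustrative assumptions, not data from the paper.

```python
from fractions import Fraction
from math import comb

def hypergeometric_moments(N: int, M: int, n: int):
    """Exact mean and variance obtained by summing over the p.m.f."""
    total = comb(N, n)
    mean = Fraction(0)
    second = Fraction(0)
    # Support: max(0, n - (N - M)) <= k <= min(n, M).
    for k in range(max(0, n - (N - M)), min(n, M) + 1):
        pk = Fraction(comb(M, k) * comb(N - M, n - k), total)
        mean += k * pk
        second += k * k * pk
    return mean, second - mean**2

N, M, n = 20, 8, 5        # illustrative values
mean, var = hypergeometric_moments(N, M, n)
p = Fraction(M, N)
print(mean, n * p)                                       # 2 2
print(var, n * p * (1 - p) * Fraction(N - n, N - 1))     # 18/19 18/19
```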

Answer 23

For the chi-square distribution with $n$ degrees of freedom, the central moments $\mu_r$ satisfy the recurrence relation

\mu_{r+1} = 2r\left(\mu_r + n\,\mu_{r-1}\right), \quad r \ge 1,

with $\mu_0 = 1$ and $\mu_1 = 0$ . Using this, $\mu_2 = 2n$ , $\mu_3 = 8n$ , and $\mu_4 = 48n + 12n^2$ . (The raw moments satisfy $\mu_r' = n(n+2)(n+4)\cdots(n+2r-2)$ .)
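The recurrence is straightforward to iterate. The sketch below generates the central moments for a given number of degrees of freedom; the function name and the choice $n = 3$ are illustrative assumptions.

```python
def chi_square_central_moments(n: int, up_to: int):
    """Return [mu_0, ..., mu_up_to] using mu_{r+1} = 2r(mu_r + n * mu_{r-1})."""
    moments = [1, 0]   # mu_0 = 1, mu_1 = 0
    for r in range(1, up_to):
        moments.append(2 * r * (moments[r] + n * moments[r - 1]))
    return moments[: up_to + 1]

print(chi_square_central_moments(3, 4))   # [1, 0, 6, 24, 252]
```

For $n = 3$ this agrees with the closed forms $\mu_2 = 2n = 6$, $\mu_3 = 8n = 24$ and $\mu_4 = 48n + 12n^2 = 252$.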

Answer 24

Cramer-Rao Inequality. If $T$ is an unbiased estimator of a parameter $\theta$ based on a random sample, then under regularity conditions the variance of $T$ cannot be smaller than the reciprocal of the Fisher information:

\operatorname{Var}(T) \ge \frac{1}{I(\theta)} = \frac{1}{E\!\left[\left(\dfrac{\partial \log L}{\partial \theta}\right)^{2}\right]} = \frac{1}{-E\!\left(\dfrac{\partial^{2} \log L}{\partial \theta^{2}}\right)}.

The right-hand side is the minimum variance bound (MVB). An estimator attaining this bound is the most efficient (MVB) estimator.

Answer 25

The four characteristics (properties) of a good estimator are:

Unbiasedness — $E(\hat{\theta}) = \theta$ ; on average the estimator equals the parameter.
Consistency — $\hat{\theta} \to \theta$ in probability as the sample size $n \to \infty$ .
Efficiency — it has the smallest variance among all unbiased estimators (minimum variance).
Sufficiency — it utilises all the information in the sample relevant to the parameter.

Answer 26

The Cauchy distribution does not possess a moment generating function, because the defining integral $\int_{-\infty}^{\infty} e^{tx} f(x)\,dx$ diverges for every $t \ne 0$ (its moments, including the mean, do not exist).

Instead, the Cauchy distribution is characterised by its characteristic function. For the standard Cauchy distribution with p.d.f. $f(x) = \dfrac{1}{\pi(1 + x^2)}$ ,

\phi(t) = E(e^{itX}) = e^{-|t|}.

For a general Cauchy with location $\mu$ and scale $\lambda$ : $\phi(t) = e^{i\mu t - \lambda |t|}.$

Data for Question 16 (number of trials per rat):

Group	Trials
Trained rats	78, 64, 75, 45, 82
Untrained rats	110, 70, 53, 51
