Overview

Basic Concepts

  • Frequency and Probability
  • Conditional Probability
  • Independence

Random Variable

  • Discrete Random Variable
  • Continuous Random Variable

Multidimensional Random Variables

  • 2-dimensional Random Variables
  • Marginal Distribution
  • Conditional Distribution

Expectation and Variance

  • Expectation
  • Variance
  • Covariance and Correlation Coefficient

Law of Large Numbers and Central Limit Theorem

Sampling Distribution

Estimation Theory

Hypothesis Testing

Basic Concepts

Frequency and Probability

Frequency

  1. Non-negativity: $0\leq f_n(A)\leq 1$
  2. Normalization: $f_n(S)=1$
  3. Additivity: $f_n(\bigcup_i A_i)=\sum_i f_n(A_i)$, where the $A_i$ are pairwise disjoint

Probability

  1. Non-negativity: $P(A)\geq 0$
  2. Normalization: $P(S)=1$
  3. Additivity: $P(\bigcup_i A_i)=\sum_i P(A_i)$, where the $A_i$ are pairwise disjoint

Conditional Probability

Definition

$P(B|A)=\frac{P(AB)}{P(A)}$, for $P(A)>0$

  1. Non-negativity
  2. Normalization
  3. Additivity

Multiplication Theorem

$P(AB)=P(B|A)P(A)$ for $P(A)>0$

Law of Total Probability

$P(A)=\sum_i P(A|B_i)P(B_i)$, where $\bigcup_i B_i=S$ and the $B_i$ are pairwise disjoint.

Bayes' Formula

$P(B|A)=\frac{P(B)P(A|B)}{P(A)}$

Posterior: P(B|A)

Likelihood: P(A|B)

Prior: P(B)

Evidence: P(A)
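
A quick numerical sketch of this formula (the prevalence, sensitivity, and false-positive numbers below are hypothetical, chosen only for illustration):

```python
# Bayes' formula: posterior = prior * likelihood / evidence.
p_b = 0.01              # prior P(B): e.g. a patient has a disease
p_a_given_b = 0.95      # likelihood P(A|B): test positive given disease
p_a_given_not_b = 0.05  # P(A|B complement): false-positive rate

# Evidence P(A) from the law of total probability
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Posterior P(B|A)
p_b_given_a = p_b * p_a_given_b / p_a
print(f"P(B|A) = {p_b_given_a:.4f}")   # ≈ 0.1610
```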

Independence

$A$ and $B$ are mutually independent if $P(AB)=P(A)P(B)$.

Random Variable

Discrete Random Variable

0-1 Distribution

$P\{X=k\}=p^k(1-p)^{1-k},\ k=0,1\ (0<p<1)$

Bernoulli Experiment and Binomial Distribution

Bernoulli Experiment

  • Experiment E has only 2 possible outcomes: $A$ and $\bar A$
  • $P(A)=p,\ P(\bar A)=1-p$

Binomial Distribution

  • n-time Bernoulli experiment (n independent trials)

    The probability that event $A$ occurs exactly $k$ times in $n$ trials is $C_n^kp^k(1-p)^{n-k}$

  • $X\sim B(n,p)$

Poisson Distribution

$P\{X=k\}=\frac{\lambda^ke^{-\lambda}}{k!},\ k=0,1,2,\cdots$

$X\sim\pi(\lambda)$

Poisson Theorem

The Poisson distribution is the limit of the binomial distribution: if $np_n\rightarrow\lambda$, then

$\lim_{n\rightarrow\infty}C_n^kp_n^k(1-p_n)^{n-k}=\frac{\lambda^ke^{-\lambda}}{k!}$
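
A small numerical check of this limit (parameter values are illustrative): with $np_n=\lambda$ held fixed, the binomial pmf approaches the Poisson pmf as $n$ grows.

```python
# Binomial(n, λ/n) pmf vs. Poisson(λ) pmf at a fixed k.
from scipy.stats import binom, poisson

lam, k = 3.0, 2
for n in (10, 100, 1000, 10000):
    p_n = lam / n
    print(n, binom.pmf(k, n, p_n), poisson.pmf(k, lam))
```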

Continuous Random Variable

Cumulative Distribution Function

$F(x)=P\{X\leq x\},\ x\in\mathbb{R}$

Distribution functions are non-decreasing functions.

$F(-\infty)=0,\ F(+\infty)=1$

Probability Density Function

$f(x)=\frac{dF(x)}{dx}\geq 0$

$F(x)=\int_{-\infty}^xf(u)\,du$

$P\{x_1<X\leq x_2\}=F(x_2)-F(x_1)=\int_{x_1}^{x_2}f(x)\,dx$

Uniform Distribution

$f(x)=\frac{1}{b-a},\ a<x<b$

$X\sim U(a,b)$

Exponential Distribution

$f(x)=\frac{1}{\theta}e^{-\frac{x}{\theta}},\ x>0$

Normal Distribution

aka Gaussian Distribution

$f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

$X\sim N(\mu,\sigma^2)$

Symmetry: the density is symmetric about $x=\mu$

Maximum: $f(\mu)=\frac{1}{\sqrt{2\pi}\sigma}$

Standard Normal Distribution: $\mu=0,\ \sigma=1$

3σ rule: $P\{|X-\mu|<3\sigma\}\approx 0.9973$
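
A quick check of these probabilities with the standard normal CDF (the $k=3$ case is the 3σ rule above):

```python
# P{|X - μ| < kσ} for X ~ N(μ, σ²); μ and σ cancel after standardization.
from scipy.stats import norm

for k in (1, 2, 3):
    prob = norm.cdf(k) - norm.cdf(-k)
    print(f"P(|X - mu| < {k} sigma) = {prob:.4f}")
# ≈ 0.6827, 0.9545, 0.9973
```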

Multidimensional Random Variables

2-dimensional Random Variables

Cumulative Distribution Function: $F(x,y)=P\{(X\leq x)\cap(Y\leq y)\}=P\{X\leq x,Y\leq y\}$

aka joint distribution.

For discrete random variables (the possible values $(x_i,y_j)$ may be finite or countably infinite):

  • $F(x,y)=\sum_{x_i\leq x}\sum_{y_j\leq y}p_{ij}$

For continuous random variables:

  • $F(x,y)=\int_{-\infty}^y\int_{-\infty}^xf(u,v)\,du\,dv$
  • $f(x,y)$ is the joint probability density
    • $f(x,y)\geq 0$
    • $F(\infty,\infty)=\int_{-\infty}^\infty\int_{-\infty}^\infty f(u,v)\,du\,dv=1$
    • if $f(x,y)$ is continuous at the point $(x,y)$, then $\frac{\partial^2 F(x,y)}{\partial x\,\partial y}=f(x,y)$

n-dimensional random variables

$F(x_1,x_2,\cdots,x_n)=P\{X_1\leq x_1,X_2\leq x_2,\cdots,X_n\leq x_n\}$

Marginal Distribution

For continuous random variables $X$, $Y$:

  • Marginal distribution functions
    • $F_X(x)=F(x,\infty)=\int_{-\infty}^x[\int_{-\infty}^\infty f(u,v)\,dv]\,du$
    • $F_Y(y)=F(\infty,y)=\int_{-\infty}^y[\int_{-\infty}^\infty f(u,v)\,du]\,dv$
  • Marginal probability densities
    • $f_X(x)=\int_{-\infty}^\infty f(x,y)\,dy$
    • $f_Y(y)=\int_{-\infty}^\infty f(x,y)\,dx$

Conditional Distribution

$P\{X=x_i,Y=y_j\}=p_{ij}$

Marginal distributions:

$P\{X=x_i\}=p_{i\cdot}=\sum_{j=1}^\infty p_{ij}$

$P\{Y=y_j\}=p_{\cdot j}=\sum_{i=1}^\infty p_{ij}$

$P\{X=x_i|Y=y_j\}=\frac{P\{X=x_i,Y=y_j\}}{P\{Y=y_j\}}=\frac{p_{ij}}{p_{\cdot j}}$, for $p_{\cdot j}>0$

Conditional Probability Density

$f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}$, for $f_Y(y)>0$

Expectation and Variance

Expectation

For a discrete random variable, $E(X)=\sum_{k=1}^\infty x_kp_k$.

For a continuous random variable, $E(X)=\int_{-\infty}^\infty xf(x)\,dx$

aka mathematical expectation or mean

If $Y=g(X)$, then $E(Y)=E[g(X)]=\sum_{k=1}^\infty g(x_k)p_k$ (discrete)

$E(Y)=E[g(X)]=\int_{-\infty}^\infty g(x)f(x)\,dx$ (continuous)

Multivariable

Given $Z=g(X,Y)$:

$E(Z)=E[g(X,Y)]=\iint g(x,y)f(x,y)\,dx\,dy$ (continuous)

$E(Z)=E[g(X,Y)]=\sum_i\sum_j g(x_i,y_j)p_{ij}$ (discrete)

Properties of Expectation

Given a constant $C$ and random variables $X$, $Y$:

  1. $E(CX)=CE(X)$
  2. $E(C)=C$
  3. $E(X+Y)=E(X)+E(Y)$
  4. If $X$ and $Y$ are mutually independent, $E(XY)=E(X)E(Y)$

Variance

Definition

$D(X)=Var(X)=E\{[X-E(X)]^2\}$

$=\int_{-\infty}^\infty (x-\mu)^2f(x)\,dx$ (continuous)

Standard deviation: $\sigma(X)=\sqrt{D(X)}$

Standardized random variable: $X^*=\frac{X-\mu}{\sigma}$

Properties of Variance

Given a constant $C$ and random variables $X$, $Y$:

  1. $D(C)=0$

  2. $D(CX)=C^2D(X)$

  3. $D(X+Y)=D(X)+D(Y)+2E\{[X-E(X)][Y-E(Y)]\}$

    When $X$ and $Y$ are mutually independent, $D(X+Y)=D(X)+D(Y)$

  4. $D(X)=0\Leftrightarrow P\{X=E(X)\}=1$

Chebyshev’s Inequality

Given a random variable $X$ with $E(X)=\mu$ and $D(X)=\sigma^2$:

$\forall\epsilon>0$, $P\{|X-\mu|\geq\epsilon\}\leq\frac{\sigma^2}{\epsilon^2}$
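
A simulation sketch of the inequality for an exponential distribution (distribution and parameter chosen only for illustration): the empirical tail frequency should stay below the bound $\sigma^2/\epsilon^2$.

```python
# Chebyshev's inequality: P{|X - μ| ≥ ε} ≤ σ²/ε², checked empirically.
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                          # X ~ Exp(θ): μ = θ, σ² = θ²
x = rng.exponential(theta, 100_000)
mu, var = theta, theta**2

for eps in (2.0, 3.0, 4.0):
    empirical = np.mean(np.abs(x - mu) >= eps)
    bound = var / eps**2
    print(f"eps={eps}: empirical={empirical:.4f} <= bound={bound:.4f}")
```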

Covariance and Correlation Coefficient

Covariance

  • $Cov(X,Y)=E\{[X-E(X)][Y-E(Y)]\}$

    $=E(XY)-E(X)E(Y)$

  • $Cov(X,Y)=Cov(Y,X)$

  • $Cov(X,X)=D(X)$

Correlation coefficient

  • $\rho_{XY}=\frac{Cov(X,Y)}{\sqrt{D(X)D(Y)}}$

  • $|\rho_{XY}|\leq 1$

  • The closer $|\rho_{XY}|$ is to 1, the stronger the linear correlation between $X$ and $Y$;

    the closer $|\rho_{XY}|$ is to 0, the weaker the linear correlation between $X$ and $Y$
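
A minimal numerical illustration with synthetic, strongly linearly related data (assumed only for this example):

```python
# Sample covariance and correlation coefficient.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 10_000)
y = 2.0 * x + rng.normal(0.0, 0.5, 10_000)   # strong linear dependence on x

cov_xy = np.cov(x, y)[0, 1]        # Cov(X, Y)
rho = np.corrcoef(x, y)[0, 1]      # ρ_XY, close to 1 here
print(f"Cov(X, Y) ≈ {cov_xy:.3f}, rho ≈ {rho:.3f}")
```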

Law of Large Numbers and Central Limit Theorem

LLN (Law of Large Numbers)

Weak Law

$\bar{X}_n\rightarrow\mu$ (in probability) as $n\rightarrow\infty$,

i.e. $\forall\epsilon>0$, $\lim_{n\rightarrow\infty}\Pr(|\bar{X}_n-\mu|<\epsilon)=1$

Strong Law

$\Pr(\lim_{n\rightarrow\infty}\bar X_n=\mu)=1$
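
A simulation sketch of the weak law with Uniform(0, 1) samples (so $\mu=0.5$); one simulated sample mean per $n$:

```python
# Law of large numbers: the sample mean approaches μ = 0.5 as n grows.
import numpy as np

rng = np.random.default_rng(2)
for n in (10, 100, 10_000, 1_000_000):
    x_bar = rng.uniform(0.0, 1.0, n).mean()
    print(f"n={n:>8}: sample mean = {x_bar:.4f}")
```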

CLT (Central Limit Theorem)

Independent and Identically Distributed

Given identically distributed random variables $X_1,X_2,\cdots,X_n$ which are mutually independent, with $E(X_k)=\mu$ and $D(X_k)=\sigma^2>0$.

Standardized random variable of $\sum_k X_k$: $Y_n=\frac{\sum_k X_k-E(\sum_k X_k)}{\sqrt{D(\sum_k X_k)}}=\frac{\sum_k X_k-n\mu}{\sqrt n\sigma}$

Distribution function of $Y_n$: $\lim_{n\rightarrow\infty}F_n(x)=\Phi(x)$, which means $Y_n$ approximately obeys the standard normal distribution when $n$ is very large.
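
A simulation sketch with exponential summands (choice of distribution is illustrative): probabilities for the standardized sum match the standard normal CDF.

```python
# CLT: Y_n = (ΣX_k - nμ)/(√n σ) is approximately N(0, 1) for large n.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
theta, n, reps = 2.0, 100, 20_000     # Exp(θ): μ = θ, σ = θ
sums = rng.exponential(theta, (reps, n)).sum(axis=1)
y_n = (sums - n * theta) / (np.sqrt(n) * theta)

# Compare an empirical probability with the standard normal value
print(np.mean(y_n <= 1.0), norm.cdf(1.0))   # both ≈ 0.84
```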

Lyapunov’s Theorem

Given a set of random variables $X_1,X_2,\cdots,X_n$ which are mutually independent.

$E(X_k)=\mu_k$, $D(X_k)=\sigma_k^2>0$

Let $B_n^2=\sum_k\sigma_k^2$

If $\exists\delta>0$ such that $\frac{1}{B_n^{2+\delta}}\sum_kE\{|X_k-\mu_k|^{2+\delta}\}\rightarrow 0$,

then the standardized random variable $Z_n=\frac{\sum_k X_k-\sum_k\mu_k}{B_n}$ of $\sum_k X_k$ approximately obeys $N(0,1)$.

$\sum_k X_k=B_nZ_n+\sum_k\mu_k$ approximately obeys $N(\sum_k\mu_k,B_n^2)$.

De Moivre-Laplace Theorem

Given $\eta_n\sim B(n,p)$,

$\forall x$, $\lim_{n\rightarrow\infty}P\{\frac{\eta_n-np}{\sqrt{np(1-p)}}\leq x\}=\Phi(x)$

i.e. the limit of the binomial distribution is the normal distribution.

Sampling Distribution

Sample mean $\bar X=\frac{1}{n}\sum_iX_i$

Sample variance $S^2=\frac{1}{n-1}\sum_i(X_i-\bar X)^2=\frac{1}{n-1}(\sum_iX_i^2-n\bar X^2)$
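
A minimal sketch of these two statistics on a simulated sample (note `ddof=1` gives the $\frac{1}{n-1}$ form of $S^2$):

```python
# Sample mean and unbiased sample variance, plus the algebraic identity above.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(10.0, 3.0, 50)        # a sample of size n = 50

x_bar = x.mean()
s2 = x.var(ddof=1)                   # S² = Σ(X_i - X̄)² / (n - 1)
s2_alt = (np.sum(x**2) - len(x) * x_bar**2) / (len(x) - 1)
print(x_bar, s2, s2_alt)             # the two S² formulas agree
```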

Empirical Distribution Function

Let $S(x)$ be the number of samples among $X_1,X_2,\cdots,X_n$ that do not exceed $x$.

Empirical distribution function: $F_n(x)=\frac{1}{n}S(x)$

Chi-squared Distribution

$\chi^2$ distribution: given $X_1,X_2,\cdots,X_n\sim N(0,1)$ (independent).

Let $Q=\sum_iX_i^2$; then $Q$ is distributed according to the $\chi^2$ distribution with $n$ degrees of freedom: $Q\sim\chi^2(n)$.

Properties of Chi-squared Distribution

  • Additivity: if $\chi_1^2\sim\chi^2(n_1)$ and $\chi_2^2\sim\chi^2(n_2)$ are independent, then

    $\chi_1^2+\chi_2^2\sim\chi^2(n_1+n_2)$

  • $E(\chi^2)=n$, $D(\chi^2)=2n$

t Distribution

$X\sim N(0,1)$, $Y\sim\chi^2(n)$, and they are mutually independent.

Let $t=\frac{X}{\sqrt{Y/n}}$; then $t$ is distributed according to the t distribution with $n$ degrees of freedom: $t\sim t(n)$

aka Student's t distribution

F Distribution

$U\sim\chi^2(n_1)$, $V\sim\chi^2(n_2)$, and they are mutually independent.

Let $F=\frac{U/n_1}{V/n_2}$; then $F$ is distributed according to the F distribution with $(n_1,n_2)$ degrees of freedom: $F\sim F(n_1,n_2)$

Distribution of Sample Mean and Sample Variance

For any distribution with finite mean and variance:

$E(\bar X)=\mu$, $D(\bar X)=\sigma^2/n$

Some Theorems

Samples $X_1,X_2,\cdots,X_n$ come from $N(\mu,\sigma^2)$:

  1. $\bar X\sim N(\mu,\sigma^2/n)$
  2. $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2(n-1)$
  3. $\bar X$ and $S^2$ are mutually independent
  4. $\frac{\bar X-\mu}{S/\sqrt n}\sim t(n-1)$
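
A simulation sketch of items 1 and 2 (parameter values are illustrative): the sample means have variance $\sigma^2/n$, and $(n-1)S^2/\sigma^2$ has the mean of a $\chi^2(n-1)$ variable.

```python
# Check X̄ ~ N(μ, σ²/n) and (n-1)S²/σ² ~ χ²(n-1) by simulation.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
mu, sigma, n, reps = 5.0, 2.0, 10, 50_000
samples = rng.normal(mu, sigma, (reps, n))

x_bar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

print(x_bar.mean(), x_bar.var())     # ≈ μ = 5 and σ²/n = 0.4
q = (n - 1) * s2 / sigma**2
print(q.mean(), chi2(n - 1).mean())  # both ≈ n - 1 = 9
```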

Samples $X_1,X_2,\cdots,X_{n_1}$ and $Y_1,Y_2,\cdots,Y_{n_2}$ come from $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$ respectively.

$S_1^2=\frac{\sum_i(X_i-\bar X)^2}{n_1-1}$, $S_2^2=\frac{\sum_i(Y_i-\bar Y)^2}{n_2-1}$

  1. $\frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2}\sim F(n_1-1,n_2-1)$

  2. When $\sigma_1^2=\sigma_2^2=\sigma^2$,

    $\frac{(\bar X-\bar Y)-(\mu_1-\mu_2)}{S_w\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\sim t(n_1+n_2-2)$

    $S_w^2=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}$

Estimation Theory

Point Estimation

$X_1,X_2,\cdots,X_n$ is a sample from $X$

$x_1,x_2,\cdots,x_n$ is the corresponding sample value

Estimator: $\hat\theta(X_1,X_2,\cdots,X_n)$

Estimate (value): $\hat\theta(x_1,x_2,\cdots,x_n)$

Method of Moments

Random variable X.

Probability density function: $f(x;\theta_1,\cdots,\theta_k)$, where $\theta_1,\cdots,\theta_k$ are the parameters to be estimated; $X_1,\cdots,X_n$ are samples from $X$.

Suppose the first $k$ moments of the distribution of $X$ exist:

$\mu_l=E(X^l)=\int_{-\infty}^\infty x^lf(x;\theta_1,\cdots,\theta_k)\,dx$ (continuous)

$\mu_l=E(X^l)=\sum_x x^lp(x;\theta_1,\cdots,\theta_k)$ (discrete)

$l=1,2,\cdots,k$

The sample moment $A_l=\frac{1}{n}\sum_iX_i^l$ converges in probability to $\mu_l$.

Set the sample moments equal to the corresponding distribution moments and solve for the parameters; the solutions are the moment estimators.
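
A minimal sketch for the exponential distribution with parameter $\theta$ (chosen for illustration): its first moment is $\mu_1=\theta$, so equating $A_1=\bar X$ to $\mu_1$ gives the moment estimator $\hat\theta=\bar X$.

```python
# Method of moments for Exp(θ): μ₁ = E(X) = θ, hence θ̂ = A₁ = X̄.
import numpy as np

rng = np.random.default_rng(6)
theta_true = 3.0
x = rng.exponential(theta_true, 5_000)

theta_hat = x.mean()                 # first sample moment A₁
print(f"moment estimate of theta: {theta_hat:.3f}")   # ≈ 3.0
```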

Maximum Likelihood Estimation

Random variable $X$ with $P\{X=x\}=p(x;\theta)$

$\theta\in\Theta$ is the parameter to be estimated.

Suppose $X_1,\cdots,X_n$ are samples from $X$, whose joint distribution is $\prod_ip(x_i;\theta)$.

Let $x_1,\cdots,x_n$ be a set of observed values of $X_1,\cdots,X_n$.

Likelihood function of the sample $X_1,\cdots,X_n$:

$L(\theta)=\Pr\{X_1=x_1,\cdots,X_n=x_n\}=L(x_1,\cdots,x_n;\theta)=\prod_ip(x_i;\theta)$

The maximum likelihood estimate $\hat\theta(x_1,\cdots,x_n)$ satisfies:

$L(x_1,\cdots,x_n;\hat\theta)=\max_\theta L(x_1,\cdots,x_n;\theta)$

For a continuous random variable, the density $f(x_i;\theta)$ replaces $p(x_i;\theta)$ in $L(\theta)$; to maximize,

let $\frac{dL(\theta)}{d\theta}=0$ or $\frac{d\ln L(\theta)}{d\theta}=0$ (log-likelihood)

For multiple parameters $\theta_1,\theta_2,\cdots,\theta_k$:

use the likelihood equations: $\frac{\partial L}{\partial\theta_i}=0,\ i=1,2,\cdots,k$

or the log-likelihood equations: $\frac{\partial\ln L}{\partial\theta_i}=0,\ i=1,2,\cdots,k$
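
A sketch of numerical maximum likelihood for a Poisson sample (distribution and parameter chosen for illustration); the closed-form MLE is $\hat\lambda=\bar x$, and the numerical optimum should agree with it.

```python
# MLE for Poisson(λ): maximize ln L(λ) = Σ ln p(x_i; λ) numerically.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(7)
x = rng.poisson(4.0, 2_000)

def neg_log_likelihood(lam):
    return -np.sum(poisson.logpmf(x, lam))

res = minimize_scalar(neg_log_likelihood, bounds=(0.01, 20.0), method="bounded")
print(res.x, x.mean())               # numerical MLE ≈ sample mean
```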

Criteria for Estimator

Bias of an Estimator

The estimator $\hat\theta(X_1,\cdots,X_n)$ has expectation $E(\hat\theta)$.

If $\forall\theta\in\Theta,\ E(\hat\theta)=\theta$, then $\hat\theta$ is an unbiased estimator of $\theta$.

e.g. $S^2=\frac{\sum_i(X_i-\bar X)^2}{n-1}$ is an unbiased estimator of $\sigma^2$, whereas $\frac{\sum_i(X_i-\bar X)^2}{n}$ is biased.

Effectiveness

Among unbiased estimators, the one with smaller variance is more effective.

Consistent Estimator

$\forall\theta\in\Theta,\ \forall\epsilon>0,\ \lim_{n\rightarrow\infty}P\{|\hat\theta-\theta|<\epsilon\}=1$

Confidence Interval (CI)

The distribution of $X$ is $F(x;\theta)$, $\theta\in\Theta$.

Given a value $\alpha\in(0,1)$; $X_1,\cdots,X_n$ are samples from $X$.

Find two statistics $\underline\theta$ and $\overline\theta$ such that:

  • $P\{\underline\theta(X_1,\cdots,X_n)<\theta<\overline\theta(X_1,\cdots,X_n)\}\geq 1-\alpha$

Confidence interval: $(\underline\theta,\overline\theta)$

Confidence level: $1-\alpha$

Confidence lower and upper bounds: $\underline\theta$ and $\overline\theta$

Confidence Interval for Normal Distribution

Single Normal Distribution

If $\sigma^2$ is known, a confidence interval with confidence level $1-\alpha$ for $\mu$ follows from:

$\frac{\overline X-\mu}{\sigma/\sqrt n}\sim N(0,1)$

$P\{|\frac{\overline X-\mu}{\sigma/\sqrt n}|<z_{\alpha/2}\}=1-\alpha$

$(\overline X-\frac{\sigma}{\sqrt n}z_{\alpha/2},\ \overline X+\frac{\sigma}{\sqrt n}z_{\alpha/2})$

If $\sigma$ is unknown, replace it with $S$; then:

$\frac{\overline X-\mu}{S/\sqrt n}\sim t(n-1)$

The confidence interval with confidence level $1-\alpha$ for $\mu$ is:

$(\overline X-\frac{S}{\sqrt n}t_{\alpha/2}(n-1),\ \overline X+\frac{S}{\sqrt n}t_{\alpha/2}(n-1))$
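
A minimal sketch of the unknown-σ (t) interval on a simulated sample (parameter values are illustrative):

```python
# t-based confidence interval for μ: X̄ ± t_{α/2}(n-1) · S/√n.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(8)
x = rng.normal(5.0, 2.0, 25)         # sample with true μ = 5
n, alpha = len(x), 0.05

x_bar, s = x.mean(), x.std(ddof=1)
half_width = t.ppf(1 - alpha / 2, n - 1) * s / np.sqrt(n)
print(f"95% CI for mu: ({x_bar - half_width:.3f}, {x_bar + half_width:.3f})")
```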

Two Normal Distributions

$X\sim N(\mu_1,\sigma_1^2)$ and $Y\sim N(\mu_2,\sigma_2^2)$

  1. If $\sigma_1,\sigma_2$ are known, $\overline X\sim N(\mu_1,\sigma_1^2/n_1)$, $\overline Y\sim N(\mu_2,\sigma_2^2/n_2)$,

    then $\overline X-\overline Y\sim N(\mu_1-\mu_2,\sigma_1^2/n_1+\sigma_2^2/n_2)$, or

    $\frac{(\overline X-\overline Y)-(\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}\sim N(0,1)$

    Confidence interval with confidence level $1-\alpha$ for $\mu_1-\mu_2$:

    $(\overline X-\overline Y-z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}},\ \overline X-\overline Y+z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}})$

  2. If $\sigma_1=\sigma_2=\sigma$ but unknown,

    $\frac{(\overline X-\overline Y)-(\mu_1-\mu_2)}{S_w\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\sim t(n_1+n_2-2)$

    $S_w^2=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}$

    Confidence interval with confidence level $1-\alpha$ for $\mu_1-\mu_2$:

    $(\overline X-\overline Y-t_{\alpha/2}(n_1+n_2-2)S_w\sqrt{\frac{1}{n_1}+\frac{1}{n_2}},\ \overline X-\overline Y+t_{\alpha/2}(n_1+n_2-2)S_w\sqrt{\frac{1}{n_1}+\frac{1}{n_2}})$

  3. If $\mu_1,\mu_2$ are unknown,

    $\frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2}\sim F(n_1-1,n_2-1)$

    Confidence interval with confidence level $1-\alpha$ for $\sigma_1^2/\sigma_2^2$:

    $(\frac{S_1^2}{S_2^2}\frac{1}{F_{\alpha/2}(n_1-1,n_2-1)},\ \frac{S_1^2}{S_2^2}\frac{1}{F_{1-\alpha/2}(n_1-1,n_2-1)})$

Confidence Interval for 0-1 Distribution

$X$ follows the 0-1 distribution with parameter $p$; $X_1,X_2,\cdots,X_n$ are samples of $X$, so $\sum_iX_i\sim B(n,p)$.

$\mu=p,\ \sigma^2=p(1-p)$

By the De Moivre-Laplace theorem, approximately $\frac{\sum_iX_i-np}{\sqrt{np(1-p)}}=\frac{n\overline X-np}{\sqrt{np(1-p)}}\sim N(0,1)$

Confidence interval with confidence level $1-\alpha$ for $p$:

$P\{|\frac{n\overline X-np}{\sqrt{np(1-p)}}|<z_{\alpha/2}\}\approx 1-\alpha$

$\rightarrow (n+z_{\alpha/2}^2)p^2-(2n\overline X+z_{\alpha/2}^2)p+n\overline X^2<0$

$\rightarrow (\frac{1}{2a}(-b-\sqrt{b^2-4ac}),\ \frac{1}{2a}(-b+\sqrt{b^2-4ac}))$

$a=n+z_{\alpha/2}^2,\ b=-(2n\overline X+z_{\alpha/2}^2),\ c=n\overline X^2$
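
A sketch that plugs illustrative numbers into this quadratic to get the interval for $p$ (the sample size and sample mean below are assumed values):

```python
# Confidence interval for p by solving (n + z²)p² - (2nX̄ + z²)p + nX̄² < 0.
import numpy as np
from scipy.stats import norm

n, x_bar, alpha = 200, 0.35, 0.05    # illustrative n and sample mean
z = norm.ppf(1 - alpha / 2)

a = n + z**2
b = -(2 * n * x_bar + z**2)
c = n * x_bar**2

disc = np.sqrt(b**2 - 4 * a * c)
lower, upper = (-b - disc) / (2 * a), (-b + disc) / (2 * a)
print(f"{1 - alpha:.0%} CI for p: ({lower:.3f}, {upper:.3f})")
```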

Hypothesis Testing

(I doubt a 25-minute APS interview would ever get this far...)