Overview
Basic Concepts
- Frequency and Probability
- Conditional Probability
- Independence
Random Variable
- Discrete Random Variable
- Continuous Random Variable
Multidimensional Random Variables
- 2-dimensional Random Variables
- Marginal Distribution
- Conditional Distribution
Expectation and Variance
- Expectation
- Variance
- Covariance and Correlation Coefficient
Law of Large Numbers and Central Limit Theorem
Sampling Distribution
Estimation Theory
Hypothesis Testing
Basic Concepts
Frequency and Probability
Frequency
- Non-negativity: 0 ≤ f_n(A) ≤ 1
- Normalization: f_n(S) = 1, where S is the sample space
- Additivity: f_n(⋃ A_i) = ∑ f_n(A_i), where the A_i are pairwise disjoint
Probability
- Non-negativity: P(A)≥0
- Normalization: P(S) = 1
- Additivity: P(⋃ A_i) = ∑ P(A_i), where the A_i are pairwise disjoint
Conditional Probability
Definition
P(B∣A) = P(AB) / P(A)
- Non-negativity
- Normalization
- Additivity
Multiplication Theorem
P(AB)=P(B∣A)P(A) for P(A)>0
Total Probability Theorem
P(A) = ∑ P(A∣B_i) P(B_i), where ⋃ B_i is the whole sample space and the B_i are pairwise disjoint.
Bayes' Theorem
P(B∣A) = P(A∣B) P(B) / P(A)
Posterior: P(B|A)
Likelihood: P(A|B)
Prior: P(B)
Evidence: P(A)
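A minimal numeric sketch of how the prior, likelihood, and evidence combine; the disease and test-accuracy numbers below are hypothetical, chosen only for illustration:

```python
# Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)
# Hypothetical example: B = "has disease", A = "test is positive".
prior = 0.01          # P(B): assumed base rate of the disease
likelihood = 0.95     # P(A|B): assumed true-positive rate of the test
false_positive = 0.05 # P(A|~B): assumed false-positive rate

# Evidence P(A) via the total probability theorem
evidence = likelihood * prior + false_positive * (1 - prior)

posterior = likelihood * prior / evidence  # P(B|A)
print(f"P(disease | positive test) = {posterior:.3f}")  # ~0.161
```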
Independence
A, B are mutually independent if P(AB)=P(A)P(B).
Random Variable
Discrete Random Variable
0-1 Distribution
P{X = k} = p^k (1−p)^(1−k), k = 0, 1 (0 < p < 1)
Bernoulli Experiment and Binomial Distribution
Bernoulli Experiment
- Experiment E has only 2 possible outcomes: A and Ā
- P(A) = p, P(Ā) = 1 − p
Binomial Distribution
n independent repetitions of a Bernoulli experiment
The probability that event A occurs exactly k times in n trials is C_n^k p^k (1−p)^(n−k).
X∼B(n,p)
Poisson Distribution
P{X = k} = λ^k e^(−λ) / k!, k = 0, 1, 2, ⋯
X∼π(λ)
Poisson Theorem
The binomial distribution converges to the Poisson distribution when n → ∞ and np_n → λ:
lim_{n→∞} C_n^k p_n^k (1−p_n)^(n−k) = λ^k e^(−λ) / k!
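A quick numerical check of the Poisson theorem; the values of n and p below are arbitrary, chosen so that λ = np stays moderate, and scipy.stats supplies the two pmfs:

```python
# Compare Binomial(n, p) with Poisson(lambda = n*p) for large n and small p.
from scipy.stats import binom, poisson

n, p = 1000, 0.003      # assumed values; lam = n*p = 3
lam = n * p
for k in range(6):
    print(k, round(binom.pmf(k, n, p), 5), round(poisson.pmf(k, lam), 5))
# The two columns are nearly identical, illustrating the Poisson theorem.
```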
Continuous Random Variable
Cumulative Distribution Function
F(x)=P{X≤x},x∈R
Distribution functions are non-decreasing functions.
F(−∞)=0,F(+∞)=1
Probability Density Function
f(x) = dF(x)/dx ≥ 0
F(x) = ∫_{−∞}^{x} f(u) du
P{x_1 < X ≤ x_2} = F(x_2) − F(x_1) = ∫_{x_1}^{x_2} f(x) dx
Uniform Distribution
f(x) = 1/(b−a), a < x < b
X ∼ U(a, b)
Exponential Distribution
f(x) = (1/θ) e^(−x/θ), x > 0 (θ > 0)
Normal Distribution
aka Gaussian Distribution
f(x) = 1/(√(2π) σ) · e^(−(x−μ)² / (2σ²))
X∼N(μ,σ2)
Symmetry: x=μ
Maximum: f(μ) = 1/(√(2π) σ)
Standard Normal Distribution: μ=0,σ=1
The 3σ rule: X falls within μ±σ, μ±2σ, and μ±3σ with probability about 68.3%, 95.4%, and 99.7% respectively.
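A short sketch confirming the 3σ rule with scipy.stats.norm; μ and σ below are arbitrary, and the probabilities do not depend on them:

```python
# Check the 3-sigma rule numerically for N(mu, sigma^2).
from scipy.stats import norm

mu, sigma = 0.0, 1.0  # any values work; the result is the same after standardizing
for k in (1, 2, 3):
    prob = norm.cdf(mu + k * sigma, mu, sigma) - norm.cdf(mu - k * sigma, mu, sigma)
    print(f"P(|X - mu| < {k}*sigma) = {prob:.4f}")
# Prints roughly 0.6827, 0.9545, 0.9973
```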
Multidimensional Random Variables
2-dimensional Random Variables
Cumulative Distribution Function: F(x, y) = P{(X ≤ x) ∩ (Y ≤ y)} = P{X ≤ x, Y ≤ y}
aka the joint distribution function.
For a discrete pair (X, Y) (taking finitely or countably many values):
- F(x, y) = ∑_{x_i ≤ x} ∑_{y_j ≤ y} p_ij
For a continuous pair (X, Y):
- F(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(u, v) du dv
- f(x, y) is the joint probability density
- f(x, y) ≥ 0
- F(∞, ∞) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(u, v) du dv = 1
- if f(x, y) is continuous at the point (x, y), then ∂²F(x, y) / ∂x∂y = f(x, y)
n-dimensional random variables
F(x_1, x_2, ⋯, x_n) = P{X_1 ≤ x_1, X_2 ≤ x_2, ⋯, X_n ≤ x_n}
Marginal Distribution
For a continuous pair (X, Y):
- Marginal distribution functions
  - F_X(x) = F(x, ∞) = ∫_{−∞}^{x} [∫_{−∞}^{∞} f(u, v) dv] du
  - F_Y(y) = F(∞, y) = ∫_{−∞}^{y} [∫_{−∞}^{∞} f(u, v) du] dv
- Marginal probability densities
  - f_X(x) = ∫_{−∞}^{∞} f(x, y) dy
  - f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx
Conditional Distribution
For a discrete pair (X, Y) with joint distribution P{X = x_i, Y = y_j} = p_ij:
marginal distributions
P{X = x_i} = p_{i·} = ∑_{j=1}^{∞} p_ij
P{Y = y_j} = p_{·j} = ∑_{i=1}^{∞} p_ij
conditional distribution (for p_{·j} > 0): P{X = x_i ∣ Y = y_j} = P{X = x_i, Y = y_j} / P{Y = y_j} = p_ij / p_{·j}
Conditional Probability Density
f_{X∣Y}(x∣y) = f(x, y) / f_Y(y)
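A small sketch with a made-up 2×2 joint probability table, showing how the marginal and conditional distributions fall out of p_ij:

```python
import numpy as np

# Hypothetical joint pmf p_ij of a discrete pair (X, Y); rows index x_i, columns y_j.
p = np.array([[0.10, 0.20],
              [0.30, 0.40]])

p_x = p.sum(axis=1)   # marginal P{X = x_i} = sum_j p_ij
p_y = p.sum(axis=0)   # marginal P{Y = y_j} = sum_i p_ij
print("marginal of X:", p_x)   # [0.3 0.7]
print("marginal of Y:", p_y)   # [0.4 0.6]

# Conditional distribution P{X = x_i | Y = y_j} = p_ij / p_.j
# (one column of the table, renormalized by its marginal)
cond_x_given_y0 = p[:, 0] / p_y[0]
print("P(X = x_i | Y = y_0):", cond_x_given_y0)  # [0.25 0.75]
```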
Expectation and Variance
Expectation
For a discrete random variable, E(X) = ∑_{k=1}^{∞} x_k p_k.
For a continuous random variable, E(X) = ∫_{−∞}^{∞} x f(x) dx.
also known as the mathematical expectation or mean
If Y = g(X), then E(Y) = E[g(X)] = ∑_{k=1}^{∞} g(x_k) p_k (discrete)
E(Y) = E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx (continuous)
Multivariable
Given Z = g(X, Y):
E(Z) = E[g(X, Y)] = ∬ g(x, y) f(x, y) dx dy (continuous)
E(Z) = E[g(X, Y)] = ∑_i ∑_j g(x_i, y_j) p_ij (discrete)
Properties of Expectation
Given a constant C and random variables X, Y:
- E(CX)=CE(X)
- E(C)=C
- E(X+Y)=E(X)+E(Y)
- if X, Y are mutually independent, E(XY)=E(X)E(Y)
Variance
Definition
D(X)=Var(X)=E{[X−E(X)]2}
= ∫ (x − μ)² f(x) dx (continuous)
standard deviation σ(X) = √D(X)
standardized random variable X* = (X − μ) / σ
Properties of Variance
Given a constant C and random variables X, Y:
D(C)=0
D(CX)=C2D(X)
D(X+Y)=D(X)+D(Y)+2E{(X−E(X))(Y−E(Y))}
when X, Y are mutually independent, D(X+Y)=D(X)+D(Y)
D(X)=0⇔P{X=E(X)}=1
Chebyshev’s Inequality
Given a random variable X with E(X) = μ and D(X) = σ²:
∀ϵ > 0, P{∣X − μ∣ ≥ ϵ} ≤ σ² / ϵ²
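A simulation sketch comparing the actual tail probability with the Chebyshev bound; the exponential test distribution and its parameters are assumptions chosen for illustration:

```python
# Empirically compare P(|X - mu| >= eps) with the Chebyshev bound sigma^2 / eps^2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # assumed test distribution: mu = 2, sigma^2 = 4
mu, var = 2.0, 4.0

for eps in (2.0, 4.0, 6.0):
    empirical = np.mean(np.abs(x - mu) >= eps)
    bound = var / eps**2
    print(f"eps={eps}: empirical={empirical:.4f} <= bound={min(bound, 1.0):.4f}")
```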
Covariance and Correlation Coefficient
Covariance
Cov(X,Y)=E{[X−E(X)][Y−E(Y)]}
=E(XY)−E(X)E(Y)
Cov(X,Y)=Cov(Y,X)
Cov(X,X)=D(X)
Correlation coefficient
ρ_XY = Cov(X, Y) / √(D(X) D(Y))
∣ρXY∣≤1
The closer ∣ρ_XY∣ is to 1, the stronger the linear correlation between X and Y;
the closer ∣ρ_XY∣ is to 0, the weaker the linear correlation between X and Y.
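A short sketch estimating Cov(X, Y) and ρ_XY from simulated data with a strong (but not exact) linear relationship; the coefficients are arbitrary:

```python
# Sample covariance and correlation coefficient for two linearly related variables.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(scale=0.5, size=10_000)  # strongly, but not perfectly, linear in x

print("Cov(X, Y) ~", np.cov(x, y)[0, 1])      # close to 2.0
print("rho_XY   ~", np.corrcoef(x, y)[0, 1])  # close to 1 (strong linear correlation)
```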
Law of Large Numbers and Central Limit Theorem
LLN (Law of Large Numbers)
Weak Law
X̄_n → μ in probability as n → ∞,
i.e. ∀ϵ > 0, lim_{n→∞} Pr(∣X̄_n − μ∣ < ϵ) = 1
Strong Law
Pr(lim_{n→∞} X̄_n = μ) = 1
CLT (Central Limit Theorem)
Independent and Identically Distributed
Given independent, identically distributed random variables X_1, X_2, ⋯, X_n with E(X_k) = μ and D(X_k) = σ² > 0.
Standardized sum of ∑X_k: Y_n = (∑X_k − E(∑X_k)) / √(D(∑X_k)) = (∑X_k − nμ) / (√n σ)
Distribution function of Y_n: lim_{n→∞} F_n(x) = Φ(x), i.e. Y_n approximately follows the standard normal distribution when n is large.
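A simulation sketch of this i.i.d. CLT using Uniform(0, 1) summands (an arbitrary choice); the empirical distribution of the standardized sum Y_n is compared with Φ:

```python
# CLT sketch: standardized sums of i.i.d. Uniform(0,1) variables are roughly N(0,1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, trials = 50, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)              # mean and std of Uniform(0, 1)

sums = rng.uniform(size=(trials, n)).sum(axis=1)
y_n = (sums - n * mu) / (np.sqrt(n) * sigma)  # standardized sum Y_n

for x in (-1.0, 0.0, 1.0):
    print(f"P(Y_n <= {x}): empirical={np.mean(y_n <= x):.4f}, Phi(x)={norm.cdf(x):.4f}")
```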
Lyapunov’s Theorem
Given mutually independent random variables X_1, X_2, ⋯, X_n with
E(X_k) = μ_k, D(X_k) = σ_k² > 0.
Let B_n² = ∑ σ_k².
If ∃δ > 0 such that (1/B_n^(2+δ)) ∑_k E{∣X_k − μ_k∣^(2+δ)} → 0 as n → ∞,
then the standardized random variable Z_n = (∑X_k − ∑_k μ_k) / B_n approximately follows N(0, 1),
and ∑X_k = B_n Z_n + ∑_k μ_k approximately follows N(∑_k μ_k, B_n²).
De Moivre-Laplace Theorem
Given η_n ∼ B(n, p),
∀x, lim_{n→∞} P{(η_n − np) / √(np(1−p)) ≤ x} = Φ(x),
i.e. the binomial distribution converges to the normal distribution for large n.
Sampling Distribution
sample mean X̄ = (1/n) ∑_i X_i
sample variance S² = (1/(n−1)) ∑_i (X_i − X̄)² = (1/(n−1)) (∑_i X_i² − n X̄²)
Empirical Distribution Function
Let S(x) be the number of samples among X_1, X_2, ⋯, X_n that are no greater than x.
Empirical distribution function: F_n(x) = S(x) / n
Chi-squared Distribution
χ2 distribution: given X1,X2,⋯,Xn∼N(0,1) (independent).
Let Q = ∑_i X_i²; then Q is distributed according to the χ² distribution with n degrees of freedom: Q ∼ χ²(n).
Properties of Chi-squared Distribution
Additivity: if χ_1² ∼ χ²(n_1) and χ_2² ∼ χ²(n_2) are mutually independent, then
χ_1² + χ_2² ∼ χ²(n_1 + n_2)
E(χ2)=n, D(χ2)=2n
t Distribution
X∼N(0,1), Y∼χ2(n), they are mutually independent.
Let z = X / √(Y/n); then z is distributed according to the t distribution with n degrees of freedom: z ∼ t(n)
also known as Student's t distribution
F Distribution
U∼χ2(n1), V∼χ2(n2), they are mutually independent.
Let F = (U/n_1) / (V/n_2); then F is distributed according to the F distribution with (n_1, n_2) degrees of freedom: F ∼ F(n_1, n_2)
Distribution of Sample Mean and Sample Variance
For any distribution whose mean μ and variance σ² exist:
E(X̄) = μ, D(X̄) = σ²/n
Some Theorems
Samples X_1, X_2, ⋯, X_n come from N(μ, σ²):
- X̄ ∼ N(μ, σ²/n)
- (n−1)S² / σ² ∼ χ²(n−1)
- X̄ and S² are mutually independent
- (X̄ − μ) / (S/√n) ∼ t(n−1)
Samples X_1, X_2, ⋯, X_{n_1} and Y_1, Y_2, ⋯, Y_{n_2} come from N(μ_1, σ_1²) and N(μ_2, σ_2²) respectively.
S_1² = (1/(n_1−1)) ∑_i (X_i − X̄)², S_2² = (1/(n_2−1)) ∑_i (Y_i − Ȳ)²
(S_1²/S_2²) / (σ_1²/σ_2²) ∼ F(n_1−1, n_2−1)
When σ_1² = σ_2² = σ²,
((X̄ − Ȳ) − (μ_1 − μ_2)) / (S_w √(1/n_1 + 1/n_2)) ∼ t(n_1 + n_2 − 2)
S_w² = ((n_1−1)S_1² + (n_2−1)S_2²) / (n_1 + n_2 − 2)
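A simulation sketch of the single-sample theorems above; μ, σ, n, and the number of repetitions are arbitrary choices:

```python
# Simulation sketch for the single-sample theorems: X_bar ~ N(mu, sigma^2/n),
# (n-1)S^2/sigma^2 ~ chi^2(n-1), and (X_bar - mu)/(S/sqrt(n)) ~ t(n-1).
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)          # sample variance S^2

q = (n - 1) * s2 / sigma**2               # should behave like chi^2(n-1)
t = (xbar - mu) / np.sqrt(s2 / n)         # should behave like t(n-1)

print("Var(X_bar) ~", xbar.var(), "expected", sigma**2 / n)
print("E(q), D(q) ~", q.mean(), q.var(), "expected", n - 1, 2 * (n - 1))
print("Var(t) ~", t.var(), "expected", (n - 1) / (n - 3))  # variance of t(9) is 9/7
```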
Estimation Theory
Point Estimation
X_1, X_2, ⋯, X_n is a sample of X
x_1, x_2, ⋯, x_n is the corresponding sample value
Estimator: θ̂(X_1, X_2, ⋯, X_n)
Estimate value: θ̂(x_1, x_2, ⋯, x_n)
Method of Moments
Random variable X.
Probability density function f(x; θ_1, ⋯, θ_k), where θ_1, ⋯, θ_k are the parameters to be estimated; X_1, ⋯, X_n are samples from X.
Suppose the first k moments of the distribution of X exist:
μ_l = E(X^l) = ∫ x^l f(x; θ_1, ⋯, θ_k) dx (continuous)
μ_l = E(X^l) = ∑_x x^l p(x; θ_1, ⋯, θ_k) (discrete)
l = 1, 2, ⋯, k
The sample moment A_l = (1/n) ∑_i X_i^l converges in probability to μ_l.
Set the sample moments equal to the corresponding distribution moments and solve; the solutions are the moment estimators of the parameters.
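A minimal method-of-moments sketch for the exponential density f(x) = (1/θ) e^(−x/θ), whose first moment is θ; the true θ is an assumed value:

```python
# Method of moments: set the first sample moment A_1 equal to mu_1 = E(X) = theta.
import numpy as np

rng = np.random.default_rng(4)
true_theta = 3.0
x = rng.exponential(scale=true_theta, size=10_000)

theta_hat = x.mean()   # A_1 = mu_1 = theta, so the moment estimator is the sample mean
print("moment estimate of theta:", theta_hat)  # close to 3.0
```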
Maximum Likelihood Estimation
Random variable X, P{X=x}=p(x;θ)
θ ∈ Θ is the parameter to be estimated.
Suppose X_1, ⋯, X_n are samples from X, with joint distribution ∏_i p(x_i; θ).
Let x_1, ⋯, x_n be an observed value of X_1, ⋯, X_n.
The likelihood function of the sample X_1, ⋯, X_n is:
L(θ) = Pr{X_1 = x_1, ⋯, X_n = x_n} = L(x_1, ⋯, x_n; θ) = ∏_i p(x_i; θ)
The maximum likelihood estimate θ̂(x_1, ⋯, x_n) satisfies:
L(x_1, ⋯, x_n; θ̂) = max_{θ∈Θ} L(x_1, ⋯, x_n; θ)
For a continuous random variable, use the density instead: L(θ) = ∏_i f(x_i; θ).
To maximize, solve dL(θ)/dθ = 0 or d ln L(θ)/dθ = 0 (log-likelihood).
For multiple parameters θ_1, θ_2, ⋯, θ_k:
use the likelihood equations ∂L/∂θ_i = 0, i = 1, 2, ⋯, k
or the log-likelihood equations ∂ ln L/∂θ_i = 0, i = 1, 2, ⋯, k
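A small MLE sketch, assuming Poisson-distributed data: the log-likelihood equation gives λ̂ = x̄ in closed form, and a numerical maximization should agree with it:

```python
# MLE sketch: for Poisson(lambda), d(ln L)/d(lambda) = 0 gives lambda_hat = x_bar.
# Check the closed form against a numerical maximization of the log-likelihood.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(5)
x = rng.poisson(lam=4.2, size=2_000)   # assumed data-generating value of lambda

def neg_log_lik(lam):
    return -poisson.logpmf(x, lam).sum()

res = minimize_scalar(neg_log_lik, bounds=(0.1, 20.0), method="bounded")

print("closed-form MLE (sample mean):", x.mean())
print("numerical MLE:                ", res.x)   # should agree closely
```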
Criteria for Estimator
Bias of an Estimator
The estimator θ̂(X_1, ⋯, X_n) has expectation E(θ̂).
If ∀θ ∈ Θ, E(θ̂) = θ, then θ̂ is an unbiased estimator of θ.
e.g. S² = (1/(n−1)) ∑ (X_i − X̄)² is an unbiased estimator of σ², whereas (1/n) ∑ (X_i − X̄)² is biased.
Effectiveness
Among unbiased estimators, the one with smaller variance is the more effective (efficient) estimator.
Consistent Estimator
∀θ ∈ Θ, ∀ϵ > 0, lim_{n→∞} P{∣θ̂ − θ∣ < ϵ} = 1
Confidence Interval (CI)
Distribution of X is F(x;θ),θ∈Θ.
Given a value α ∈ (0, 1), and samples X_1, ⋯, X_n from X,
find two statistics θ̲ and θ̄ such that:
- P{θ̲(X_1, ⋯, X_n) < θ < θ̄(X_1, ⋯, X_n)} ≥ 1 − α
Confidence interval: (θ̲, θ̄)
Confidence level: 1 − α
Confidence lower and upper bounds: θ̲ and θ̄
Confidence Interval for Normal Distribution
Single Normal Distribution
Confidence interval with confidence level 1−α for μ (σ known):
(X̄ − μ) / (σ/√n) ∼ N(0, 1)
P{∣(X̄ − μ) / (σ/√n)∣ < z_{α/2}} = 1 − α
(X̄ − (σ/√n) z_{α/2}, X̄ + (σ/√n) z_{α/2})
If σ is unknown, replace it with S; then:
(X̄ − μ) / (S/√n) ∼ t(n−1)
Confidence interval with confidence level 1−α for μ is:
(X̄ − (S/√n) t_{α/2}(n−1), X̄ + (S/√n) t_{α/2}(n−1))
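A sketch of the t-based interval above, assuming a small sample drawn from an arbitrary normal distribution; note that t_{α/2}(n−1) is the upper α/2 quantile, i.e. scipy's t.ppf(1 − α/2, n−1):

```python
# t-based confidence interval for mu when sigma is unknown:
# (X_bar - S/sqrt(n) * t_{alpha/2}(n-1), X_bar + S/sqrt(n) * t_{alpha/2}(n-1))
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(6)
x = rng.normal(loc=10.0, scale=3.0, size=25)   # assumed sample from N(10, 9)
n, alpha = len(x), 0.05

xbar, s = x.mean(), x.std(ddof=1)
half_width = s / np.sqrt(n) * t.ppf(1 - alpha / 2, df=n - 1)
print(f"{1 - alpha:.0%} CI for mu: ({xbar - half_width:.3f}, {xbar + half_width:.3f})")
```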
Double Normal Distributions
X∼N(μ1,σ12) and Y∼N(μ2,σ22)
If σ_1, σ_2 are known: X̄ ∼ N(μ_1, σ_1²/n_1), Ȳ ∼ N(μ_2, σ_2²/n_2),
then X̄ − Ȳ ∼ N(μ_1 − μ_2, σ_1²/n_1 + σ_2²/n_2), or
((X̄ − Ȳ) − (μ_1 − μ_2)) / √(σ_1²/n_1 + σ_2²/n_2) ∼ N(0, 1)
Confidence interval with confidence level 1−α for μ_1 − μ_2:
(X̄ − Ȳ − z_{α/2} √(σ_1²/n_1 + σ_2²/n_2), X̄ − Ȳ + z_{α/2} √(σ_1²/n_1 + σ_2²/n_2))
If σ_1 = σ_2 = σ but σ is unknown,
((X̄ − Ȳ) − (μ_1 − μ_2)) / (S_w √(1/n_1 + 1/n_2)) ∼ t(n_1 + n_2 − 2)
S_w² = ((n_1−1)S_1² + (n_2−1)S_2²) / (n_1 + n_2 − 2)
Confidence interval with confidence level 1−α for μ_1 − μ_2:
(X̄ − Ȳ − t_{α/2}(n_1+n_2−2) S_w √(1/n_1 + 1/n_2), X̄ − Ȳ + t_{α/2}(n_1+n_2−2) S_w √(1/n_1 + 1/n_2))
If μ_1, μ_2 are unknown,
(S_1²/S_2²) / (σ_1²/σ_2²) ∼ F(n_1−1, n_2−1)
Confidence interval with confidence level 1−α for σ_1²/σ_2²:
((S_1²/S_2²) · 1/F_{α/2}(n_1−1, n_2−1), (S_1²/S_2²) · 1/F_{1−α/2}(n_1−1, n_2−1))
Confidence Interval for 0-1 Distribution
X follows the 0-1 distribution with parameter p, and X_1, X_2, ⋯, X_n are samples of X (so ∑X_i ∼ B(n, p)).
μ = p, σ² = p(1−p)
(∑X_i − np) / √(np(1−p)) = (nX̄ − np) / √(np(1−p)) approximately follows N(0, 1) for large n (central limit theorem)
Confidence interval with confidence level 1−α for p:
P{∣(nX̄ − np) / √(np(1−p))∣ < z_{α/2}} ≈ 1 − α
→ (n + z_{α/2}²) p² − (2nX̄ + z_{α/2}²) p + nX̄² < 0
→ ((−b − √(b² − 4ac)) / (2a), (−b + √(b² − 4ac)) / (2a))
a = n + z_{α/2}², b = −(2nX̄ + z_{α/2}²), c = nX̄²
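A sketch computing this interval from the quadratic's roots, with a made-up 0-1 sample; z_{α/2} is the upper α/2 quantile of N(0, 1):

```python
# Interval for p obtained from the quadratic inequality above
# (a Wilson-type interval based on the normal approximation).
import numpy as np
from scipy.stats import norm

x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1])  # assumed 0-1 sample
n, xbar, alpha = len(x), x.mean(), 0.05
z = norm.ppf(1 - alpha / 2)                  # z_{alpha/2}

a = n + z**2
b = -(2 * n * xbar + z**2)
c = n * xbar**2
disc = np.sqrt(b**2 - 4 * a * c)
print("CI for p:", ((-b - disc) / (2 * a), (-b + disc) / (2 * a)))
```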
Hypothesis Testing
I doubt a 25-minute APS interview will ever get this far…