I am beginning a new project for 24 Fall: FAIRNESS IN AI/ML THROUGH THE LENS OF SOCIAL CHOICE.

The basic outline follows the tutorial Fairness in AI/ML via Social Choice by Evi Micha & Nisarg Shah at EC 2024.

The first paper, which I read last semester, is Fairness in Federated Learning via Core-Stability; it was one of the possible topics.

Brief notes are here.

⚠️We found that a central claim of this paper may not make sense!!!

I will try to understand some of the mathematical proofs this semester.

Core-Stability

Revisit the definition of core-stability: a predictor $\theta\in P$ is called core-stable if there exist no other $\theta'\in P$ and no subset $S$ of agents s.t. $\frac{|S|}{n}u_i(\theta')\ge u_i(\theta)$ for all $i\in S$, with at least one strict inequality.
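To make the definition concrete, here is a minimal brute-force checker over a finite set of candidate predictors; the utility matrix `U` is a hypothetical example of mine, not from the paper:

```python
from itertools import combinations

def is_core_stable(U, p):
    """U[d][i] = utility of agent i under predictor d; test predictor p.

    Core-stable: no alternative d and coalition S such that
    (|S|/n) * u_i(d) >= u_i(p) for all i in S, with one strict inequality.
    """
    n = len(U[p])
    for d in range(len(U)):
        for k in range(1, n + 1):
            for S in combinations(range(n), k):
                scaled = {i: k / n * U[d][i] for i in S}
                if all(scaled[i] >= U[p][i] for i in S) and \
                        any(scaled[i] > U[p][i] for i in S):
                    return False  # coalition S would deviate to d
    return True

# Hypothetical example with two predictors and two agents:
U = [[1.0, 1.0], [2.0, 0.5]]
print(is_core_stable(U, 0), is_core_stable(U, 1))  # True True
```

In this tiny set, no coalition gains enough by deviating, so both predictors pass the check.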

⚠️⚠️⚠️What is WRONG with the claim

The claim: setting $S=[n]$ in the definition, there is no predictor $\theta'\in P$ with $\sum_{i\in[n]}\frac{u_i(\theta')}{u_i(\theta)}>n$.

The problem is, according to the definition, when we plug in $S=[n]$, there exists no $\theta'\in P$ s.t.

$$u_i(\theta')\ge u_i(\theta)\ \forall i\in S\iff\frac{u_i(\theta')}{u_i(\theta)}\ge1\ \forall i\in S$$

with at least one strict inequality.

But can these inequalities be directly summed up to conclude $\sum_i\frac{u_i(\theta')}{u_i(\theta)}\ge n$??? Ruling out the per-agent inequalities does not rule out the sum exceeding $n$.

A counterexample:

Assume $\theta$ is core-stable and $n=3$, with

$$u_1(\theta)=10,\quad u_2(\theta)=11,\quad u_3(\theta)=12$$

According to the definition (we plug in $S=[3]$), there exists no $\theta'\in P$ s.t.

$$u_1(\theta')\ge10,\quad u_2(\theta')\ge11,\quad u_3(\theta')\ge12$$

with at least one strict inequality.

But the $S=[3]$ condition still permits a $\theta'\in P$ with

$$u_1(\theta')=9.9<10,\quad u_2(\theta')=10000\ge11,\quad u_3(\theta')=10000\ge12;$$

summing them up, we have

$$\frac{u_1(\theta')}{u_1(\theta)}+\frac{u_2(\theta')}{u_2(\theta)}+\frac{u_3(\theta')}{u_3(\theta)}>3=n,$$

so the $S=[n]$ case of the definition alone does not imply the summed claim.
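A quick numeric check of this counterexample (a sketch of mine; it only tests the $S=[n]$ instance of the definition):

```python
# Utilities from the counterexample above.
u_theta = [10, 11, 12]           # utilities under theta
u_prime = [9.9, 10000, 10000]    # utilities under theta'

# theta' does not weakly dominate theta (agent 1 is worse off),
# so the S = [n] instance of the definition does not exclude it...
print(all(up >= ut for up, ut in zip(u_prime, u_theta)))   # False

# ...yet the ratio sum still far exceeds n = 3:
print(sum(up / ut for up, ut in zip(u_prime, u_theta)))    # ~1743.4
```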

The scaling factor $|S|/n$ suggests some kind of proportionality here. Basically, core-stability means no coalition of agents can gain much by deviating together.

Core means “no deviating subgroup”

Core-stability is a very strong notion; core-stable solutions exist only in special cases.

Advantages

Fairer than FedAvg. In the example below, $M$ is a very large integer.

|  | Agent 1 | Agent 2 | Agent 3 |
| --- | --- | --- | --- |
| $\theta_1$ | $a$ | $a$ | $M\cdot a$ |
| $\theta_2$ | $10a$ | $10a$ | $0.9M\cdot a$ |

For large $M$, FedAvg prefers $\theta_1$ (Agent 3's huge utility dominates the average), which is unfair to Agents 1 and 2. (They have an incentive to form a coalition and deviate to $\theta_2$.)
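A quick computation on this table (with toy values $a=1$ and $M=10^6$, my own choice) confirms that the average-utility objective picks $\theta_1$ while the Nash-social-welfare-style log-product picks $\theta_2$:

```python
import math

a, M = 1.0, 1e6  # hypothetical values; M is "very large"
utils = {
    "theta1": [a, a, M * a],
    "theta2": [10 * a, 10 * a, 0.9 * M * a],
}

for name, u in utils.items():
    avg = sum(u) / len(u)                # FedAvg-style average utility
    log_nsw = sum(math.log(x) for x in u)  # log Nash social welfare
    print(name, f"avg={avg:.1f}", f"log-NSW={log_nsw:.3f}")

# avg is higher for theta1 (dominated by Agent 3's huge utility);
# log-NSW is higher for theta2, which is far better for Agents 1 and 2.
```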

Robustness to poor local data quality of agents.

❗Scaling the utility of any single agent does not change the core-stable allocation.

Core-Stability in FL

A core-stable predictor (CoreFed) exists for linear regression and logistic regression (classification).

An approximately core-stable predictor exists for deep neural networks.

Existence

🤔Core-stability exists in FL if:

  1. Every utility function $u_i$ is continuous.

  2. Any conical combination of the utility functions is maximized over a convex set:

    $\forall\langle\alpha_1,\cdots,\alpha_n\rangle\in\mathbb R_{\ge0}^n$, the set $C=\arg\max_{\theta\in P}\sum_{i\in[n]}\alpha_iu_i(\theta)$ is convex.

🤔Important theorem (Kakutani's Fixed Point Theorem): a set-valued function $\phi:D\to2^D$ admits a fixed point, i.e. $\exists d\in D$ s.t. $d\in\phi(d)$, if

  1. $D$ is non-empty, compact, and convex.

Compact: the set is closed (it contains all its boundary points) and bounded (it fits inside a ball of some finite radius).

Convex: a set $S$ is convex if, for any two points $x$ and $y$ in the set, the line segment connecting $x$ and $y$ also lies entirely within the set. Mathematically, for any $x,y\in S$ and any $t\in[0,1]$, the point $tx+(1-t)y\in S$.

  2. $\forall d\in D$, $\phi(d)$ is non-empty, compact, and convex.
  3. $\phi(\cdot)$ has a closed graph, i.e., for all sequences $(d_i)_{i\in\mathbb N}$ converging to $d^*$ and $(e_i)_{i\in\mathbb N}$ converging to $e^*$, with $d_i\in D$ and $e_i\in\phi(d_i)$, we have $e^*\in\phi(d^*)$.

Define Core-Stable Feasible Predictors

$\forall\theta\in P$, define $\phi(\theta)=\arg\max_{d\in P}\sum_{i\in[n]}\frac{u_i(d)}{u_i(\theta)}$.
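As a toy illustration (hypothetical 1-D utilities of my own, with $P$ discretized to a grid), we can search for fixed points of $\phi$; by Lemma 4.1 below, any such fixed point is core-stable:

```python
import numpy as np

# Two agents with positive, concave (here linear) utilities over
# predictors theta in P = [0, 1], discretized to a grid.
grid = np.linspace(0.0, 1.0, 1001)
U = np.stack([0.5 + grid, 1.5 - grid], axis=1)   # U[d, i] = u_i(theta_d)

fixed = []
for t in range(len(grid)):
    scores = (U / U[t]).sum(axis=1)       # sum_i u_i(d) / u_i(theta_t), all d
    if scores[t] >= scores.max() - 1e-9:  # theta_t itself attains the max,
        fixed.append(grid[t])             # i.e. theta_t is in phi(theta_t)
print(fixed)  # -> [0.5]
```

Here only $\theta=0.5$ survives, and indeed no coalition can gain the required $n/|S|$ factor by deviating from it.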

Lemma 4.1: Let $\theta\in P$ be s.t. $\theta\in\phi(\theta)$ (i.e., $\theta$ is a fixed point). Then $\theta$ is a core-stable predictor.

Proof by contradiction: since $\theta\in\phi(\theta)$, every $d\in P$ satisfies $\sum_{i\in[n]}\frac{u_i(d)}{u_i(\theta)}\le n$; but a blocking coalition $S$ with deviation $\theta'$ would force $\sum_{i\in[n]}\frac{u_i(\theta')}{u_i(\theta)}>n$.

By Takutani’s Theorem, it suffices to show that ϕ()\phi(\cdot) has a closed graph, to ensure that it admits a fixed point.

Lemma 4.2: $\phi(\cdot)$ has a closed graph.

Hard proof.

🤔Theorem 1. In FL, where utilities are continuous and any conical combination of utilities is maximized over a convex set, a core-stable predictor exists.

⚠️If we change $d\in P$ to $d\in\mathcal B(\theta,r)$, and the two conditions still hold within a radius $r$ of every point, then the predictor is locally core-stable. The conditions tend to hold in DNNs for small $r$.

Computation of Core-Stable Predictor

There is an efficient way to compute a core-stable predictor.

It is the optimum of a convex program:

$$\begin{aligned}&\text{maximize }\ \mathcal L(\theta)=\sum_{i\in[n]}\log(u_i(\theta))\ \text{ or }\ \prod_{i\in[n]}u_i(\theta)\\&\text{subject to }\ \theta\in P\end{aligned}$$

The objective reminds me of Nash Social Welfare😎

Each agent's utility is concave, so the log objective $\mathcal L$ is concave; the program is a concave maximization subject to a convex constraint, i.e., a convex program.
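A minimal sketch of this program with hypothetical concave utilities (1-D quadratics of my own choosing), solved with SciPy; by Theorem 2 below, the maximizer is core-stable:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical concave, positive utilities u_i(theta) = c - (theta - a_i)^2.
a = np.array([0.0, 1.0, 2.0])   # each agent's preferred predictor
c = 10.0                        # offset keeping every u_i positive on P

def neg_log_obj(theta):
    u = c - (theta[0] - a) ** 2     # concave utilities, all positive on P
    return -np.sum(np.log(u))       # minimize the negative of L(theta)

res = minimize(neg_log_obj, x0=[0.5], bounds=[(-1.0, 3.0)])  # P = [-1, 3]
print(res.x)  # ~[1.0], the maximizer of sum_i log u_i(theta)
```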

🤔Theorem 2. If $u_i$ is concave for all $i$, then any predictor $\theta^*$ that maximizes the convex program is core-stable.

Proof by contradiction. Since $\theta^*$ maximizes the concave objective $\sum_{i\in[n]}\log(u_i(\theta))$, first-order optimality together with concavity of each $u_i$ gives, for all $\theta'\in P$,

$$\sum_{i\in[n]}\frac{u_i(\theta')}{u_i(\theta^*)}\le n.$$

If $\theta^*$ is not core-stable, there must exist $\theta'$ and a subset $S$ s.t.

$$u_i(\theta')\ge\frac{n}{|S|}u_i(\theta^*)$$

for all $i\in S$, with at least one strict inequality.

Thus, using positivity of the utilities,

$$\sum_{i\in[n]}\frac{u_i(\theta')}{u_i(\theta^*)}\ge\sum_{i\in S}\frac{u_i(\theta')}{u_i(\theta^*)}>\frac{n}{|S|}\cdot|S|=n,$$

contradicting the inequality above.

$\theta^*$ must also be Pareto optimal (take $S=[n]$ in the definition).

CoreFed

$n$ finite samples $\{(x_i,y_i)\}_{i\in[n]}$ drawn from a distribution $\mathcal P$.

Empirical distribution $\mathcal P_n$.

Stochastic Gradient Descent.
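Since the notes stop here, a hedged sketch of what training toward this objective could look like in FL: ascend $\sum_{i\in[n]}\log(u_i(\theta))$, which weights each agent's gradient by $1/u_i(\theta)$, so low-utility agents get more say. This is my own toy reading (full-batch gradient ascent on synthetic linear-regression agents, with utility $u_i = B - \text{loss}_i$ for a constant $B$), not the paper's exact CoreFed algorithm, which uses SGD:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy federated linear regression: agent i holds its own local data.
true_w = np.array([1.0, -2.0])
agents = []
for i in range(3):
    X = rng.normal(size=(20, 2))
    y = X @ true_w + i + rng.normal(scale=0.1, size=20)  # agent-specific shift
    agents.append((X, y))

B = 100.0  # offset keeping u_i(w) = B - loss_i(w) positive (toy choice)

def loss_and_grad(X, y, w):
    r = X @ w - y
    return r @ r / len(y), 2 * X.T @ r / len(y)

# Full-batch gradient ascent on sum_i log u_i(w):
# grad log u_i = -grad(loss_i) / u_i, so low-utility agents weigh more.
w = np.zeros(2)
for _ in range(500):
    g = np.zeros(2)
    for X, y in agents:
        loss_i, g_i = loss_and_grad(X, y, w)
        g -= g_i / (B - loss_i)
    w += 0.1 * g
print(w)  # learned compromise predictor
```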

To be continued~