This semester, I will do a project again with Iannis.
I took some time to review the tutorial Fairness in AI/ML via Social Choice by Evi Micha & Nisarg Shah.
Social Choice Theory: Aggregating individual preferences into "good" collective decisions
Fairness in Social Choice
很熟悉了,可以快速掠过。
这里考虑的是 DIVISIBLE ITEMS.
EF, PROP
Fair division: Agents \(N\), divisible resources \(M\), utility function \(u_i\).
Non-negative utilities (good), non-positive utilities (chores).
Proportionality, envy-freeness.
The Core
A core is a subset of agents, among which they can redistribute their entitlements.
But redistributed entitlements shouldn't be a Pareto improvement.
2 types of the core
- Resource-scaling version
No group of agents \(S\) should be able to find any allocation \(B\) of their proportionally entitled share of the resources that is Pareto improvement. $$ \not\exists S\subseteq N,\frac{|S|}{|N|}\cdot M\to B:u_i(B_i)\gt u_i(A_i)\forall i\in S $$
- Utility-scaling version
No group of agents \(S\) should be able to find any allocation \(B\) of the resources that is a Pareto improvement even after proportional utility-scaling. $$ \not\exists S\subseteq N,M\to B:\frac{|S|}{|N|}\cdot u_i(B_i)\gt u_i(A_i)\forall i\in S $$
- Two versions are equivalent for some setting.
- They are different when utilities are not linear additive.
- If there is no scalable resource, there is no way to define resource-scaling version. But it is more appealing.
Nash Social Welfare
NSW is geometric mean of agent utilities. $$ NSW(A)=\left(\prod_{i\in N}u_i(A_i)\right)^{1/|N|} $$ There are some other welfare functions:
- Utilitarian social welfare: \(USW(A)=\frac{1}{|N|}\cdot\sum_{i\in N}u_i(A_i)\)
为什么要起这么怪的名字,叫 average social welfare 不好嘛!
- Egalitarian social welfare: \(ESW(A)=\min_{i\in N}u_i(A_i)\)
明明可以叫 least social welfare 的……
Theorem [Varian '74]: Any allocation maximizing NSW is envy-free in the core.
Theorem [Orlin '10]: An allocation maximizing the NSW can be computed in strongly polynomial time.
Application of Core: Committee Selection
There are set of voters \(N\), set of candidates \(M\).
Each agent \(i\) approves a subset of candidates \(A_i\subseteq M\).
-
The utility for each voter is defined as the number of intersection of committee member and its approval. $$ u_i(W)=|W\cap A_i| $$
-
The goal is to find \(W\subseteq M\) with \(|W|\le k\) with given \(k\).
\(W\) is in the core if there is no \(S\subseteq N\) and \(T\subseteq M\) with \(|T|\le\frac{|S|}{|N|}\cdot k\) (Here the candidates are resources, so its resource-scaling version) s.t. \(u_i(T)\gt u_i(W)\forall i\in S\).
Advantages
It's broadly defined
- It often depends only on the definition of AGENTS and PREFERENCES.
- Applicable to any setting as long as you define these two pieces of information.
Notions such as the core achieve group fairness to all possible groups.
- No need to pre-specify the groups.
- scale automatically with group size and cohesiveness, without additional parameters.
Envy-Freeness in ML
Classification
The model
- Population of individuals given by a distribution \(D\) over \(X\).
individual \(i\) represent data point \(x_i\in X\).
- Classifier \(f:X\to Y\) maps every individual to a classification outcome
Types of classification:
- Hard binary
- Hard multi-class
- Soft binary
- Soft multi-class
Objective: minimize loss function
Utility function should be somehow related to loss function.
How to measure fairness in ML?
Individual Fairness
Similar individuals should be treated similarly
Classifier \(f\) is individual fair if $$ \forall x,y\in N,D(f(x),f(y))\le d(x,y) $$
Envy-Freeness
Equal individuals shouldn't envy each other.
Classifier \(f\) is envy-free if $$ \forall x,y\in N,u_x(f(x))\ge u_x(f(y)) $$ Envy-freeness is too strong for deterministic classifiers
- loss of optimal deterministic EF classifier \(\ge1\).
- Loss of optimal randomized EF classifier \(\le1/\gamma\).
一直没看懂这一部分在讲个啥
Preference-Informed IF
Similar individuals shouldn't envy each other too much.
Classifier is PIIF if $$ \forall x,y\in N,\exists z\in Y,D(z,f(y))\le d(x,y)\land u_x(f(x))\ge u_x(z). $$
Approximate EF
Equal individuals shouldn't envy each other too much.
Classifier \(f\) is approximately EF if \(\forall x,y\in N,u_x(f(x))\ge u_x(f(y))-\epsilon\).
OR
Classifier \(f\) is approximately EF if \(\forall x,y\in N,u_x(f(x))\ge u_x(f(y))-d(x,y)\)
Average Group Envy-Freeness
Equal groups shouldn't envy-each other too much on average.
Classifier \(f\) is approximately average group EF w.r.t. groups \(G_1,G_2\) if $$ \mathbb E_{x\sim G_1,y\sim G_2}[u_x(f(y))-u_x(f(x))]\le 0 $$ Applicable for decision-making with limited resources
- e.g. deciding on loan or bail applications
- envy can not be prevented
Can be imposed for several pairs of groups simultaneously
Recommendations
Model
- Individuals represented by data points in set \(X\)
- A set items \(Y\)
- A set of contexts \(C\)
Recommendation policy \(\pi\)
- \(\pi_x(y|c)\) is probability of recommending item \(y\) to user \(x\) given a context \(c\).
Utility function: \(u_x(\pi_x)=\mathbb E_{c\sim C_m,y\sim\pi_x(\cdot|c)}[v_x(y|c)]\).
Envy-freeness: \(\forall x,x'\in X,u_x(\pi_x)\ge u_x(\pi_{x'})-\epsilon\)
Two-Sided Fairness
Info
为什么是 two-sided 呢?因为可以给用户定义效用函数,也可以给要推荐的商品定义效用函数。
Many-to-many matching
- each user is recommended \(k\) products
- each product may be recommended to a different number of users
Relevance of products to users given by \(V:X\times Y\to\mathbb R\)
Recommendation policy \(\pi\)
- Each user \(x\) is recommended \(\pi_x\subseteq Y\) with \(|\pi_x|=k\)
- Let \(Y_x^*\) be the top-k products for user \(x\) by relevance
Utilities
- Utility to user \(x\) given by \(u_x(\pi_x)=\frac{\sum_{y\in\pi_x}V(x,y)}{\sum_{y\in\pi_x^*}V(x,y)}\)
- Utility to product \(y\) given by \(E_y(\pi)\), the number of users \(y\) is exposed to
Two-sided fairness
-
Fairness for users: EF1 $$ \forall x,x'\in X,\exists y\in\pi_{x'}:u_x(\pi_x)\ge u_x(\pi_{x'}\diagdown{y}) $$
-
Fairness for products: minimum exposure \(\overline E\) $$ \forall y\in Y,E_y(\pi)\ge\overline E $$
Theorem: There exists an efficient algorithm that achieves EF1 among all users and the minimum exposure guarantee among at least \(m-k\) products.
Non-Centroid Clustering
Goal: Partition the agents into \(k\) clusters, i.e., \(C=(C_1,\cdots,C_k)\)
Model
- Set of agents \(N\) partitioned into \(C=(C_1,\cdots,C_k)\).
- Cluster containing agent \(i\) denoted by \(C(i)\).
- Distance metric \(d:N\times N\to\mathbb R_{\ge 0}\).
\(\alpha\)-EF: For each \(i\in N\) and \(j\in[k]\) with \(i\not\in C_j\): either \(C(i)=i\) or $$ \frac{1}{|C(i)|-1}\sum_{i'\in C(i)}d(i,i')\le\frac{\alpha}{|C_j|}\sum_{i'\in C_j}d(i,i') $$ Theorem:
A 1-envy-free clustering does not always exist, but an \(O(1)\)-envy-free clustering always exist and can be computed efficiently.
Another Setting
Model
- Set of agents \(N\) partitioned into \(C=(C_1,\cdots,C_k)\).
- Cluster containing agent \(i\) denoted by \(C(i)\).
- Binary costs \(d:N\times N\to\{0,1\}\)
Theorem: There exists a balanced clustering (satisfying \(|C(i)|=|C(j)|\pm1\forall i,j\)) s.t. for all \(i\in N\) and \(j\in[k]\), \(\sum_{i'\in C(i)}d(i,i')\le\sum_{i'\in C_j}d(i,i')+\tilde O(\sqrt{n/k})\).
- If we divide by the size of the clusters, then the additive term becomes \(o(1)\).
Nash Social Welfare in ML
Multi-Armed Bandits
多臂赌博机,遗憾最小化算法
\(k\) arms, \(\mu_1,\mu_2,\cdots,\mu_k\), \(\mu^*=\arg\max_{j\in[k]}\mu_j\).
Exploration vs Exploitation
Regret \(R_T=T_{\mu^*}-\sum_{t=1}^T\mu(t)\)
Multi-Agent Multi-Armed Bandits
[Hossain, M, S, 2021]
Agents \(N\)
Each arm \(j\in[k]\) for each agent \(i\in N\) have different data distribution \(\mu_{ij}\).
For each agent \(\mu_i^*=\arg\max_{j\in[k]}\mu_{ij}\).
What is a fair policy?
Distribution \(p=[p_1,\cdots,p_k]\) gives expected reward \(\sum_{j=1}^kp_j\cdot\mu_{ij}\) to agent \(i\)
Maximizing welfare functions
- Utilitarian welfare \(\sum_{i=1}^n\sum_{j=1}^kp_j\cdot\mu_{ij}\)
- Egalitarian welfare \(\min_{i\in N}\sum_{j=1}^kp_j\cdot\mu_{ij}\)
- Nash welfare \(\prod_{i=1}^n\sum_{j=1}^kp_j\cdot\mu_{ij}\)
Regret: \(R_T=NSW(p^*,\mu)-\sum_{t=1}^TNSW(p(t),\mu)\)
Theorem: A variation of UCB achieves the optimal \(\Theta(\sqrt T)\) regret.
Fair Exploration
Agent \(t\) arrives at time \(t\)
We chose distribution \(P_t\), which gives the agent utility \(E_{j\sim P_t}[\mu_j]\)
Regret: \(R_T=\mu^*-(\prod_{t=1}^TE_{j\sim P_t}[\mu_j])^{1/T}\)
Theorem: A variation of UCB achieves near-optimal regret in terms of \(T\).
[Baek, Farias, 2021]
Agent \(t\) arrives at time \(t\) and belongs to group \(g_t\), where \(g_t=g\) with probability \(p_g\)
Policy: choose arm \(a_t\) at time \(t\)
Regret for group \(g:u_T^g(\pi)=\sum_{t\in[T]:g_t=g}(\mu^*-\mu_{a_t})\)
Utility to group \(g:u^g(\pi)=R_T^g(\pi^0)-R_T^g(\pi)\), where \(\pi^0\) is the policy minimizing the overall regret
Nash social welfare: \(NSW(\pi)=\prod_gu^g(\pi)\)
Theorem: A version of UCB exactly optimizes NSW.
Core in ML
Federated Learning
[Chaudhury, Li, Kang, Li, Mehta, 2022]
FL: Choose \(f_\theta:\mathbb R^d\to\mathbb R\) from \(F=\{f_\theta:\theta\in P\subseteq\mathbb R^d\}\)
Utility of each agent: \(u_i(\theta)=M-\mathbb E_{(x,y)\sim D_i}[\ell(f_\theta(x),y)]\)
Goal: Choose \(\theta\) that is fair for all agents
Core: A parameter vector \(\theta\in P\) is in the core if for all \(\theta'\in P\) and \(S\subseteq N\), it holds $$ u_i(\theta)\ge\frac{|S|}{|N|}u_i(\theta')\forall i\in S, $$ with at least one strict inequality.
Pareto Optimality
Proportionality
Theorem: When the agents' utilities are continuous and the set of maximizer of any conical combination of the agents' utilities is convex, a parameter vector \(\theta\in P\) in the core always exists.
Theorem: When the agents' utilities are concave, then the parameter vector \(\theta\in P\) that maximizes the NEW is in the core:
$$
\text{maximize }\prod_{i\in N}u_i(\theta)\text{ subject to }\theta\in P\
\text{maximize }\sum_{i\in N}\log(u_i(\theta))\text{ subject to }\theta\in P
$$
Clustering
In ML
Goal: Analyze datasets to summarize their characteristics
Objects in the same group are similar
In Economics
Goal: Allocate a set of facilities that serve a set of agents. (e.g. hospital)
Centroid Clustering
Input
- Set \(N\) of \(n\) data points
- Set \(M\) of \(m\) feasible cluster centers
- \(\forall i,j\in N\cup M\): we have \(d(i,j)\) which forms a Metric Space
- \(d(i,i)=0\forall i\in N\cup M\)
- \(d(i,j)=d(j,i)\forall i,j\in N\cup M\)
- \(d(i,j)\le d(i,k)+d(k,j)\forall i,j,k\in N\cup M\)
Output
- A set \(C\subseteq M\) of \(k\) centers
- Each data points is assigned to its closest cluster center \(C(i)=\arg\min_{c\in C}d(i,c)\)
Some Objective Functions
\(k\)-median: minimizes the sum of the distances
\(k\)-means: minimizes the sum of the square of the distances
\(k\)-center: minimizes the maximum distance
Fairness in Clustering
Fair clustering through proportional entitlement
- Every group of \(n/k\) agents deserves its own cluster center
Definition in Committee Selection: \(W\) is the core if
-
For all \(S\subseteq N\) and \(T\subseteq M\)
-
If \(|S|\ge|T|\cdot n/k\), then \(|A_i\cap W|\ge|A_i\cap T|\) for some \(i\in S\)
"If a group can afford \(T\), then \(T\) should not be a strict Pareto improvement for the group"
Definition in Clustering: \(C\) is in the core if
-
For all \(S\subseteq N\) and \(y\subseteq M\)
-
If \(|S|\ge n/k\), then \(d(i,C(i))\le d(i,y)\) for some \(i\in S\)
"If a group can afford a center \(y\), then \(y\) should not be a strict Pareto improvement for the group"
Core in Centroid Clustering
A clustering solution in the core does not always exist
Approximation core
\(\alpha\)-core: A solution \(C\) is in the \(\alpha\)-core with \(\alpha\ge1\) if there is no group of points \(S\subseteq N\) with \(|S|\ge n/k\) and \(y\in M\) s.t.
$$
\forall i\in S,\alpha\cdot d(i,y)\lt d(i,C(i))
$$
Theorem. There exists an algorithm Greedy Capture that returns a clustering solution in the \((1+\sqrt2)\)-core under any metric space. For arbitrary metric spaces and \(\alpha\lt 2\), a clustering solution in the \(\alpha\)-core is not guaranteed to exist.
Theorem.
- For \(L_2\), Greedy Capture returns a solution in the \(2\)-core
- For \(L_1\) and \(L_\infty\), GC does not always return a \((\lt1+\sqrt2)\)-core
- For \(L_2\) and \(\alpha<1.154\), a clustering solution in the \(\alpha\)-core is not guaranteed to exist
- For \(L_1\) and \(L_\infty\), and \(\alpha\lt1.4\), a clustering solution in the \(\alpha\)-core is not guaranteed to exist
Core in Non-Centroid Clustering
Non-Centroid Clustering
- Partition the individuals into \(k\) clusters
- loss for each cluster: for \(S\subseteq N\) and \(i\in S,\ell_i(S)\ge0\).
\(\alpha\)-core: A solution \(C\) is in the \(\alpha\)-core with \(\alpha\ge1\), if there is no group of points \(S\subseteq N\) with \(|S|\ge n/k\) s.t. $$ \forall i\in S,\ell_i(S)\lt\ell_i(C(i)) $$ Average loss: for each \(S\subseteq N,\ell_i(S)=\frac{1}{|S|}\sum_{i'\in S}d(i,i')\)
Theorem
- Greedy Capture returns a clustering solution in the \((n/k)\)-core under any metric space
- For arbitrary metric space and \(\alpha\lt1.366\), a clustering solution in the \(\alpha\)-core is not guaranteed to exist
Open question: Does a clustering solution in the \(O(1)\)-core always exist?