This semester, I will be doing a project with Iannis again.

I took some time to review the tutorial Fairness in AI/ML via Social Choice by Evi Micha & Nisarg Shah.

Social Choice Theory: Aggregating individual preferences into “good” collective decisions

Fairness in Social Choice

I'm already very familiar with this part, so it can be skimmed quickly.

⚠️ Note that this part considers DIVISIBLE items.

EF, PROP

Fair division: agents $N$, divisible resources $M$, utility functions $u_i$.

Non-negative utilities (goods), non-positive utilities (chores).

Fairness notions: proportionality (every agent receives at least a $1/|N|$ fraction of its value for everything) and envy-freeness (no agent prefers another agent's bundle to its own); a quick sketch of both checks follows.
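For additive utilities, both notions are easy to check directly. A minimal sketch, where an allocation is a matrix of fractions of each divisible good (the utilities and the allocation below are invented):

```python
import numpy as np

def is_prop(U, A):
    """U[i][g]: agent i's value for all of good g; A[i][g]: fraction of g that i gets."""
    n = len(U)
    received = (U * A).sum(axis=1)              # each agent's utility for its own bundle
    return bool(np.all(received >= U.sum(axis=1) / n - 1e-9))

def is_ef(U, A):
    values = U @ A.T                            # values[i][j] = u_i(A_j)
    return bool(np.all(np.diag(values)[:, None] >= values - 1e-9))

U = np.array([[2.0, 0.0],                       # two agents, two divisible goods
              [1.0, 1.0]])
A = np.array([[1.0, 0.0],                       # agent 0 takes good 0 entirely
              [0.0, 1.0]])                      # agent 1 takes good 1 entirely
print(is_prop(U, A), is_ef(U, A))               # True True
```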

The Core

The core is defined through coalitions: any subset of agents may redistribute their entitlements among themselves.

An allocation is in the core if no such redistribution yields a Pareto improvement for the coalition.

Two versions of the core

  1. Resource-scaling version

    No group of agents $S$ should be able to find an allocation $B$ of their proportionally entitled share of the resources that is a Pareto improvement.

    $$\nexists\, S\subseteq N,\ \frac{|S|}{|N|}\cdot M\to B:\ u_i(B_i)>u_i(A_i)\ \forall i\in S$$

  2. Utility-scaling version

    No group of agents $S$ should be able to find any allocation $B$ of the resources that is a Pareto improvement even after proportional utility-scaling.

    $$\nexists\, S\subseteq N,\ M\to B:\ \frac{|S|}{|N|}\cdot u_i(B_i)>u_i(A_i)\ \forall i\in S$$

  • The two versions are equivalent in some settings.
  • They differ when utilities are not linear/additive.
  • When there is no scalable resource, the resource-scaling version cannot even be defined; where it can be defined, though, it is the more appealing notion.

Nash Social Welfare

NSW is the geometric mean of the agents' utilities.

$$\mathrm{NSW}(A)=\left(\prod_{i\in N}u_i(A_i)\right)^{1/|N|}$$

There are some other welfare functions:

  • Utilitarian social welfare: $\mathrm{USW}(A)=\frac{1}{|N|}\sum_{i\in N}u_i(A_i)$

    Why such a weird name? 'Average social welfare' would have been so much better!

  • Egalitarian social welfare: $\mathrm{ESW}(A)=\min_{i\in N}u_i(A_i)$

    And this one could clearly have been called 'least social welfare'…
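To make the three welfare functions concrete, here is a minimal NumPy sketch (the utility values are made up):

```python
import numpy as np

def usw(u):  # utilitarian: the average utility
    return np.mean(u)

def esw(u):  # egalitarian: the worst-off agent's utility
    return np.min(u)

def nsw(u):  # Nash: the geometric mean of utilities
    return np.exp(np.mean(np.log(u)))

u = np.array([4.0, 1.0, 2.0])      # hypothetical agent utilities
print(usw(u), esw(u), nsw(u))      # 2.333..., 1.0, 2.0
```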

🤔Theorem [Varian '74]: Any allocation maximizing NSW is envy-free and in the core.

🤔Theorem [Orlin '10]: An allocation maximizing the NSW can be computed in strongly polynomial time.

Application of Core: Committee Selection

There is a set of voters $N$ and a set of candidates $M$.

Each voter $i$ approves a subset of candidates $A_i\subseteq M$.

  • The utility of each voter is the number of committee members it approves:

    $$u_i(W)=|W\cap A_i|$$

  • The goal is to find $W\subseteq M$ with $|W|\le k$ for a given $k$.

$W$ is in the core if there are no $S\subseteq N$ and $T\subseteq M$ with $|T|\le\frac{|S|}{|N|}\cdot k$ (here the candidates are the resource, so this is the resource-scaling version) such that $u_i(T)>u_i(W)$ for all $i\in S$.
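For tiny instances, this definition can be checked by brute force over all coalitions $S$ and all committees $T$ they can afford; a sketch (the approval sets are invented, and the enumeration is exponential, for illustration only):

```python
from itertools import chain, combinations

def coalitions(n):
    return chain.from_iterable(combinations(range(n), r) for r in range(1, n + 1))

def in_core(W, approvals, m, k):
    """Resource-scaling core check for approval-based committee selection."""
    n = len(approvals)
    for S in coalitions(n):
        budget = int(len(S) * k / n)            # committee size the coalition can afford
        for size in range(1, budget + 1):
            for T in combinations(range(m), size):
                # T blocks W if EVERY voter in S strictly gains by switching to T
                if all(len(set(T) & approvals[i]) > len(set(W) & approvals[i]) for i in S):
                    return False
    return True

approvals = [{0, 1}, {0, 1}, {2, 3}, {2, 3}]    # hypothetical approval sets
print(in_core({0, 2}, approvals, m=4, k=2))     # True: no coalition can block
```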

Advantages

It’s broadly defined

  • It often depends only on the definition of AGENTS and PREFERENCES.
  • Applicable to any setting as long as you define these two pieces of information.

Notions such as the core achieve group fairness to all possible groups.

  • No need to pre-specify the groups.
  • They scale automatically with group size and cohesiveness, without additional parameters.

Envy-Freeness in ML

Classification

The model

  • Population of individuals given by a distribution $D$ over $X$.

    Individual $i$ is represented by a data point $x_i\in X$.

  • A classifier $f:X\to Y$ maps every individual to a classification outcome.

Types of classification:

  • Hard binary
  • Hard multi-class
  • Soft binary
  • Soft multi-class

Objective: minimize a loss function

The utility function should somehow be related to the loss function (e.g., utility as negative loss).

How to measure fairness in ML?

Individual Fairness

[Dwork, Hardt, Pitassi, Reingold, Zemel, 2012]

Similar individuals should be treated similarly

Classifier $f$ is individually fair if

$$\forall x,y\in N,\quad D(f(x),f(y))\le d(x,y)$$

Envy-Freeness

[Balcan, Dick, Noothigattu, Procaccia, 2019]

Equal individuals shouldn’t envy each other.

Classifier $f$ is envy-free if

$$\forall x,y\in N,\quad u_x(f(x))\ge u_x(f(y))$$

Envy-freeness is too strong for deterministic classifiers

  • The loss of the optimal deterministic EF classifier is $\ge1$.
  • The loss of the optimal randomized EF classifier is $\le1/\gamma$.

I never quite figured out what this part is getting at.

Preference-Informed IF

[Kim, Korolova, Rothblum, Yona, 2019]

Similar individuals shouldn’t envy each other too much.

Classifier $f$ is PIIF if

$$\forall x,y\in N,\ \exists z\in Y:\ D(z,f(y))\le d(x,y)\ \text{ and }\ u_x(f(x))\ge u_x(z).$$

Approximate EF

Equal individuals shouldn’t envy each other too much.

Classifier $f$ is approximately EF if $\forall x,y\in N,\ u_x(f(x))\ge u_x(f(y))-\epsilon$.

OR

Classifier $f$ is approximately EF if $\forall x,y\in N,\ u_x(f(x))\ge u_x(f(y))-d(x,y)$
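For a finite sample of individuals, these pairwise conditions can be checked directly. A small sketch, assuming a hypothetical utility matrix `U` with `U[x][y]` $=u_x(f(y))$:

```python
import numpy as np

def is_approx_ef(U, eps=0.0):
    """EF up to eps: everyone values their own outcome at least as much as anyone else's."""
    own = np.diag(U)                              # u_x(f(x)) for every x
    return bool(np.all(own[:, None] >= U - eps))

U = np.array([[0.9, 0.5],
              [0.4, 0.6]])                        # toy two-individual example
print(is_approx_ef(U))                            # True: no one envies the other

V = np.array([[0.2, 0.9],
              [0.4, 0.6]])                        # individual 0 prefers 1's outcome
print(is_approx_ef(V), is_approx_ef(V, eps=0.8))  # False True
```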

Average Group Envy-Freeness

Equal groups shouldn’t envy-each other too much on average.

Classifier $f$ is approximately average group EF w.r.t. groups $G_1,G_2$ if

$$\mathbb{E}_{x\sim G_1,\,y\sim G_2}[u_x(f(y))-u_x(f(x))]\le 0$$

Applicable for decision-making with limited resources

  • e.g. deciding on loan or bail applications
  • envy cannot be prevented

Can be imposed for several pairs of groups simultaneously
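Using the same kind of hypothetical utility matrix as above, the group condition can be estimated by sampling pairs across the two groups:

```python
import numpy as np

def avg_group_envy(U, G1, G2, samples=10_000, seed=0):
    """Monte Carlo estimate of E_{x~G1, y~G2}[u_x(f(y)) - u_x(f(x))]."""
    rng = np.random.default_rng(seed)
    xs = rng.choice(G1, size=samples)
    ys = rng.choice(G2, size=samples)
    return float(np.mean(U[xs, ys] - U[xs, xs]))

U = np.array([[0.9, 0.5, 0.7],
              [0.4, 0.6, 0.3],
              [0.2, 0.1, 0.8]])                    # toy utilities u_x(f(y))
print(avg_group_envy(U, G1=[0, 1], G2=[2]) <= 0)   # approximate average group EF holds
```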

Recommendations

[Do, Corbett-Davies, Atif, Usunier, 2023]

Model

  • Individuals represented by data points in a set $X$
  • A set of items $Y$
  • A set of contexts $C$

Recommendation policy $\pi$

  • $\pi_x(y\mid c)$ is the probability of recommending item $y$ to user $x$ given a context $c$.

Utility function: $u_x(\pi_x)=\mathbb{E}_{c\sim C,\,y\sim\pi_x(\cdot\mid c)}[v_x(y\mid c)]$.

Envy-freeness: $\forall x,x'\in X,\ u_x(\pi_x)\ge u_x(\pi_{x'})-\epsilon$

Two-Sided Fairness

Why "two-sided"? Because we can define a utility function for the users, and also one for the products being recommended.

[Biswas, Patro, Ganguly, Gummadi, Chakraborty, 2023]

Many-to-many matching

  • each user is recommended $k$ products
  • each product may be recommended to a different number of users

Relevance of products to users is given by $V:X\times Y\to\mathbb{R}$

Recommendation policy $\pi$

  • Each user $x$ is recommended a set $\pi_x\subseteq Y$ with $|\pi_x|=k$
  • Let $Y_x^*$ be the top-$k$ products for user $x$ by relevance

Utilities

  • Utility to user $x$: $u_x(\pi_x)=\frac{\sum_{y\in\pi_x}V(x,y)}{\sum_{y\in Y_x^*}V(x,y)}$
  • Utility to product $y$: $E_y(\pi)$, the number of users $y$ is exposed to

Two-sided fairness

  • Fairness for users: EF1

    $$\forall x,x'\in X,\ \exists y\in\pi_{x'}:\ u_x(\pi_x)\ge u_x(\pi_{x'}\setminus\{y\})$$

  • Fairness for products: minimum exposure $\overline E$

    $$\forall y\in Y,\ E_y(\pi)\ge\overline E$$

Theorem: There exists an efficient algorithm that achieves EF1 among all users and the minimum-exposure guarantee among at least $m-k$ products.
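A sketch of verifying both guarantees for a fixed recommendation, under the utility definitions above (the relevance matrix and recommendation lists are made up; this checks a solution, it is not the paper's algorithm):

```python
import numpy as np

def user_utility(V, x, items, k):
    top_k = np.sort(V[x])[::-1][:k].sum()        # value of x's top-k set Y_x^*
    return V[x, list(items)].sum() / top_k

def ef1_holds(V, rec):
    """rec[x]: the k items recommended to user x."""
    k = len(rec[0])
    for x in range(len(rec)):
        for xp in range(len(rec)):
            if x == xp:
                continue
            # EF1: dropping SOME item from the other user's list removes the envy
            if not any(user_utility(V, x, rec[x], k) >=
                       user_utility(V, x, [y for y in rec[xp] if y != drop], k)
                       for drop in rec[xp]):
                return False
    return True

def min_exposure(rec, num_items):
    exposure = np.zeros(num_items)
    for items in rec:
        exposure[list(items)] += 1               # E_y: how many users see product y
    return exposure.min()

V = np.array([[1.0, 0.8, 0.1],
              [0.2, 0.9, 0.7]])                  # toy relevance scores
rec = [[0, 1], [1, 2]]                           # k = 2
print(ef1_holds(V, rec), min_exposure(rec, num_items=3))  # True 1.0
```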

Non-Centroid Clustering

[Ahmadi, Awasthi, Khuller, Kleindessner, Morgenstern, Sukprasert, Vakilian, 2022]

[Aamand, Chen, Liu, Silwal, Sukprasert, Vakilian, Zhang, 2023]

Goal: Partition the agents into $k$ clusters, i.e., $C=(C_1,\dots,C_k)$

Model

  • Set of agents $N$ partitioned into $C=(C_1,\dots,C_k)$.
  • The cluster containing agent $i$ is denoted $C(i)$.
  • Distance metric $d:N\times N\to\mathbb{R}_{\ge 0}$.

$\alpha$-EF: For each $i\in N$ and $j\in[k]$ with $i\notin C_j$: either $C(i)=\{i\}$ or

$$\frac{1}{|C(i)|-1}\sum_{i'\in C(i)}d(i,i')\le\frac{\alpha}{|C_j|}\sum_{i'\in C_j}d(i,i')$$
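Checking $\alpha$-EF for a given clustering is a direct loop over agents and clusters; a sketch with an invented distance matrix:

```python
import numpy as np

def alpha_ef(D, clusters, alpha=1.0):
    """D: symmetric distance matrix; clusters: list of lists of agent indices."""
    where = {i: c for c, members in enumerate(clusters) for i in members}
    for i in where:
        own = clusters[where[i]]
        if len(own) == 1:
            continue                               # singleton clusters are exempt
        my_avg = D[i, own].sum() / (len(own) - 1)  # average distance within own cluster
        for j, other in enumerate(clusters):
            if j != where[i] and my_avg > alpha * D[i, other].sum() / len(other):
                return False                       # agent i envies cluster j
    return True

D = np.array([[0, 1, 5, 5],
              [1, 0, 5, 5],
              [5, 5, 0, 1],
              [5, 5, 1, 0]], dtype=float)
print(alpha_ef(D, clusters=[[0, 1], [2, 3]]))      # True: everyone prefers their own cluster
```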

Theorem:

A $1$-envy-free clustering does not always exist, but an $O(1)$-envy-free clustering always exists and can be computed efficiently.

Another Setting

[Li, M, Nikolov, S, 2023]

Model

  • Set of agents $N$ partitioned into $C=(C_1,\dots,C_k)$.
  • The cluster containing agent $i$ is denoted $C(i)$.
  • Binary costs $d:N\times N\to\{0,1\}$

Theorem: There exists a balanced clustering (satisfying $|C(i)|=|C(j)|\pm1\ \forall i,j$) s.t. for all $i\in N$ and $j\in[k]$, $\sum_{i'\in C(i)}d(i,i')\le\sum_{i'\in C_j}d(i,i')+\tilde O(\sqrt{n/k})$.

  • If we divide by the size of the clusters, then the additive term becomes $o(1)$.

Nash Social Welfare in ML

Multi-Armed Bandits

Multi-armed bandits and regret-minimization algorithms.

$k$ arms with means $\mu_1,\mu_2,\dots,\mu_k$, and $\mu^*=\max_{j\in[k]}\mu_j$.

Exploration vs Exploitation

Regret: $R_T=T\cdot\mu^*-\sum_{t=1}^T\mu(t)$, where $\mu(t)$ is the mean of the arm pulled at time $t$.

Multi-Agent Multi-Armed Bandits

[Hossain, M, S, 2021]

Agents $N$

Each arm $j\in[k]$ has a different reward distribution for each agent $i\in N$, with mean $\mu_{ij}$.

For each agent $i$, $\mu_i^*=\max_{j\in[k]}\mu_{ij}$.

❓What is a fair policy?

A distribution $p=[p_1,\dots,p_k]$ over arms gives expected reward $\sum_{j=1}^k p_j\cdot\mu_{ij}$ to agent $i$

Maximizing welfare functions

  1. Utilitarian welfare: $\sum_{i=1}^n\sum_{j=1}^k p_j\cdot\mu_{ij}$
  2. Egalitarian welfare: $\min_{i\in N}\sum_{j=1}^k p_j\cdot\mu_{ij}$
  3. Nash welfare: $\prod_{i=1}^n\sum_{j=1}^k p_j\cdot\mu_{ij}$
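When the means are known, the NSW-maximizing $p^*$ can be found by maximizing the concave log-NSW over the probability simplex. A sketch using scipy (the $\mu$ matrix is made up):

```python
import numpy as np
from scipy.optimize import minimize

mu = np.array([[0.9, 0.1],            # agent 0 prefers arm 0
               [0.1, 0.9]])           # agent 1 prefers arm 1
n, k = mu.shape

def neg_log_nsw(p):
    return -np.sum(np.log(mu @ p))    # log NSW = sum_i log(sum_j p_j * mu_ij)

res = minimize(neg_log_nsw, x0=np.full(k, 1 / k),
               bounds=[(1e-9, 1)] * k,
               constraints={"type": "eq", "fun": lambda p: p.sum() - 1})
print(res.x)                          # ~[0.5, 0.5]: Nash welfare splits time fairly
```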

Regret: $R_T=T\cdot\mathrm{NSW}(p^*,\mu)-\sum_{t=1}^T\mathrm{NSW}(p(t),\mu)$

🤔Theorem: A variation of UCB achieves the optimal $\Theta(\sqrt T)$ regret.

Fair Exploration

[Barman, Khan, Maiti, Sawarni, 2023]

Agent $t$ arrives at time $t$

We choose a distribution $P_t$ over arms, which gives the agent utility $\mathbb{E}_{j\sim P_t}[\mu_j]$

Regret: $R_T=\mu^*-\left(\prod_{t=1}^T\mathbb{E}_{j\sim P_t}[\mu_j]\right)^{1/T}$

Theorem: A variation of UCB achieves near-optimal regret in terms of $T$.

[Baek, Farias, 2021]

Agent $t$ arrives at time $t$ and belongs to group $g_t$, where $g_t=g$ with probability $p_g$

Policy: choose arm $a_t$ at time $t$

Regret for group $g$: $R_T^g(\pi)=\sum_{t\in[T]:g_t=g}(\mu^*-\mu_{a_t})$

Utility to group $g$: $u^g(\pi)=R_T^g(\pi^0)-R_T^g(\pi)$, where $\pi^0$ is the policy minimizing the overall regret

Nash social welfare: $\mathrm{NSW}(\pi)=\prod_g u^g(\pi)$

Theorem: A version of UCB exactly optimizes NSW.

Core in ML

Federated Learning

[Chaudhury, Li, Kang, Li, Mehta, 2022]

FL: Choose $f_\theta:\mathbb{R}^d\to\mathbb{R}$ from $F=\{f_\theta:\theta\in P\subseteq\mathbb{R}^d\}$

Utility of each agent: $u_i(\theta)=M-\mathbb{E}_{(x,y)\sim D_i}[\ell(f_\theta(x),y)]$

Goal: Choose a $\theta$ that is fair for all agents

Core: A parameter vector $\theta\in P$ is in the core if there do not exist $\theta'\in P$ and $S\subseteq N$ such that

$$\frac{|S|}{|N|}\cdot u_i(\theta')\ge u_i(\theta)\quad\forall i\in S,$$

with at least one strict inequality.

The core implies both Pareto optimality (take $S=N$) and proportionality (take $S=\{i\}$).

🤔Theorem: When the agents' utilities are continuous and the set of maximizers of any conical combination of the agents' utilities is convex, a parameter vector $\theta\in P$ in the core always exists.

🤔Theorem: When the agents' utilities are concave, the parameter vector $\theta\in P$ that maximizes the NSW is in the core:

$$\text{maximize }\prod_{i\in N}u_i(\theta)\ \text{ subject to }\theta\in P\quad\equiv\quad\text{maximize }\sum_{i\in N}\log(u_i(\theta))\ \text{ subject to }\theta\in P$$
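A minimal sketch of the log-sum formulation with plain gradient ascent, using toy concave utilities $u_i(\theta)=M-\|\theta-\theta_i\|^2$ (the agents' ideal points, $M$, and the step size are all invented for illustration):

```python
import numpy as np

theta_ideal = np.array([[0.0, 0.0],   # each agent's ideal parameter vector
                        [2.0, 0.0],
                        [0.0, 2.0]])
M = 100.0                             # cap keeping all utilities positive

def utilities(theta):
    return M - np.sum((theta - theta_ideal) ** 2, axis=1)

def grad_log_nsw(theta):
    u = utilities(theta)
    # d/dθ Σ_i log u_i(θ) = Σ_i (-2 (θ - θ_i)) / u_i(θ)
    return np.sum(-2 * (theta - theta_ideal) / u[:, None], axis=0)

theta = np.zeros(2)
for _ in range(2000):
    theta += 0.1 * grad_log_nsw(theta)
print(theta, utilities(theta))        # converges near a fair compromise point
```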

Clustering

In ML

Goal: Analyze datasets to summarize their characteristics

Objects in the same group are similar

In Economics

Goal: Locate a set of facilities (e.g., hospitals) that serve a set of agents.

Centroid Clustering

Input

  • Set $N$ of $n$ data points
  • Set $M$ of $m$ feasible cluster centers
  • $\forall i,j\in N\cup M$: we have a distance $d(i,j)$, which forms a metric space
    • $d(i,i)=0\ \forall i\in N\cup M$
    • $d(i,j)=d(j,i)\ \forall i,j\in N\cup M$
    • $d(i,j)\le d(i,k)+d(k,j)\ \forall i,j,k\in N\cup M$

Output

  • A set $C\subseteq M$ of $k$ centers
  • Each data point is assigned to its closest cluster center: $C(i)=\arg\min_{c\in C}d(i,c)$

Some Objective Functions

$k$-median: minimize the sum of the distances

$k$-means: minimize the sum of the squared distances

$k$-center: minimize the maximum distance
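The three objectives side by side, in a short sketch (points and centers are arbitrary toy data):

```python
import numpy as np

def objectives(points, centers):
    # distance from each point to the closest chosen center
    d = np.min(np.linalg.norm(points[:, None] - centers[None], axis=2), axis=1)
    return {"k-median": d.sum(), "k-means": (d ** 2).sum(), "k-center": d.max()}

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
print(objectives(points, centers))    # {'k-median': 2.0, 'k-means': 2.0, 'k-center': 1.0}
```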

Fairness in Clustering

[Chen, Fain, Lyu, Munagala, 2019]

Fair clustering through proportional entitlement

  • Every group of $n/k$ agents deserves its own cluster center

Definition in committee selection: $W$ is in the core if

  • For all $S\subseteq N$ and $T\subseteq M$

  • If $|S|\ge|T|\cdot n/k$, then $|A_i\cap W|\ge|A_i\cap T|$ for some $i\in S$

    “If a group can afford TT, then TT should not be a strict Pareto improvement for the group”

Definition in clustering: $C$ is in the core if

  • For all $S\subseteq N$ and $y\in M$

  • If $|S|\ge n/k$, then $d(i,C(i))\le d(i,y)$ for some $i\in S$

    “If a group can afford a center yy, then yy should not be a strict Pareto improvement for the group”

Core in Centroid Clustering

[Chen, Fain, Lyu, Munagala, 2019]

A clustering solution in the core does not always exist

Approximate core

$\alpha$-core: A solution $C$ is in the $\alpha$-core with $\alpha\ge1$ if there is no group of points $S\subseteq N$ with $|S|\ge n/k$ and $y\in M$ s.t.

$$\forall i\in S,\ \alpha\cdot d(i,y)<d(i,C(i))$$

🤔Theorem. There exists an algorithm, Greedy Capture, that returns a clustering solution in the $(1+\sqrt2)$-core under any metric space. For arbitrary metric spaces and $\alpha<2$, a clustering solution in the $\alpha$-core is not guaranteed to exist.
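A simplified sketch of the ball-growing idea behind Greedy Capture, in my own paraphrase (not the paper's exact pseudocode): grow balls around all candidate centers simultaneously, open a center once its ball captures $\lceil n/k\rceil$ unassigned points, and send leftovers to the nearest opened center.

```python
import numpy as np

def greedy_capture(D, k):
    """D[i][c]: distance from data point i to candidate center c."""
    n, m = D.shape
    threshold = int(np.ceil(n / k))
    unassigned, opened = set(range(n)), []
    assignment = np.full(n, -1)
    while len(unassigned) >= threshold:
        # smallest radius at which some center captures `threshold` unassigned points
        best_c, best_r = None, np.inf
        for c in range(m):
            r = np.sort([D[i, c] for i in unassigned])[threshold - 1]
            if r < best_r:
                best_c, best_r = c, r
        opened.append(best_c)
        for i in sorted(unassigned, key=lambda i: D[i, best_c])[:threshold]:
            assignment[i] = best_c
            unassigned.remove(i)
    for i in unassigned:                      # leftovers join the nearest opened center
        assignment[i] = min(opened, key=lambda c: D[i, c])
    return opened, assignment

D = np.array([[0.1, 5.0], [0.2, 5.0], [5.0, 0.1], [5.0, 0.3]])
print(greedy_capture(D, k=2))                 # opens both centers and splits the points
```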

🤔Theorem.

  • For $L_2$, Greedy Capture returns a solution in the $2$-core
  • For $L_1$ and $L_\infty$, Greedy Capture does not always return a $(<1+\sqrt2)$-core solution
  • For $L_2$ and $\alpha<1.154$, a clustering solution in the $\alpha$-core is not guaranteed to exist
  • For $L_1$ and $L_\infty$, and $\alpha<1.4$, a clustering solution in the $\alpha$-core is not guaranteed to exist

Core in Non-Centroid Clustering

Non-Centroid Clustering

  • Partition the individuals into $k$ clusters
  • Loss of each individual in its cluster: for $S\subseteq N$ and $i\in S$, $\ell_i(S)\ge0$.

$\alpha$-core: A solution $C$ is in the $\alpha$-core with $\alpha\ge1$ if there is no group of points $S\subseteq N$ with $|S|\ge n/k$ s.t.

$$\forall i\in S,\ \ell_i(S)<\ell_i(C(i))$$

Average loss: for each $S\subseteq N$, $\ell_i(S)=\frac{1}{|S|}\sum_{i'\in S}d(i,i')$

🤔Theorem

  • Greedy Capture returns a clustering solution in the $(n/k)$-core under any metric space
  • For arbitrary metric spaces and $\alpha<1.366$, a clustering solution in the $\alpha$-core is not guaranteed to exist

Open question: Does a clustering solution in the $O(1)$-core always exist?