Read the paper Proportionally Fair Clustering by Chen et al.

Abstract

$n$ points, $k$ centers.

Proportionality: any $n/k$ points are entitled to form their own cluster if there is another center that is closer for all of those $n/k$ points.

Clustering with no justified complaints from any subset of agents.

Tradeoff between proportional solutions and the $k$-means objective.

Introduction

In centroid clustering, we want to partition data into $k$ clusters by choosing $k$ centers and then matching each point to one of the centers.

Each data point represents an individual, and we wish the clustering to be fair. These agents prefer to be clustered accurately.

A clustering is fair if it respects the entitlements of groups of agents: a subset of agents is entitled to choose a center for themselves if they constitute a sufficiently large fraction of the population relative to the total number of clusters. No prior knowledge about which points should be protected is required.

Preliminaries

$\mathcal N$: data points, with $|\mathcal N|=n$.

$\mathcal M$: feasible cluster centers, with $|\mathcal M|=m$.

For $i,j\in\mathcal M\cup\mathcal N$, we have a distance $d(i,j)$ satisfying the triangle inequality.

Open a set $X\subseteq\mathcal M$ of $|X|=k$ centers, and then match every point in $\mathcal N$ to the closest center in $X$.

For a particular solution $X$ and agent $i\in\mathcal N$, let $D_i(X)=\min_{x\in X}d(i,x)$. A good clustering solution $X$ has small values of $D_i(X)$.

Different measures, different objectives:

  • $k$-median: $\sum_{i\in\mathcal N}D_i(X)$
  • $k$-means: $\sum_{i\in\mathcal N}D_i(X)^2$
  • $k$-center: $\max_{i\in\mathcal N}D_i(X)$
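To make the objectives concrete, a minimal numpy sketch (here `dist` is a hypothetical precomputed $n\times m$ distance matrix, not something from the paper):

```python
import numpy as np

def clustering_costs(dist, X):
    """dist: (n, m) array with dist[i, j] = d(i, j); X: indices of opened centers."""
    D = dist[:, X].min(axis=1)  # D_i(X): distance from each point to its nearest open center
    return {
        "k-median": D.sum(),
        "k-means": (D ** 2).sum(),
        "k-center": D.max(),
    }
```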

Individuals prefer to be closer to their assigned center.

📖Definition 1. Let $X\subseteq\mathcal M$ with $|X|=k$. $S\subseteq\mathcal N$ is a blocking coalition against $X$ if $|S|\ge\lceil\frac{n}{k}\rceil$ and $\exists y\in\mathcal M$ s.t. $\forall i\in S$, $d(i,y)<D_i(X)$. $X\subseteq\mathcal M$ is proportional if there is no blocking coalition against $X$.

Blocking coalition: there exist a candidate center $y$ and a set of data points $S$ at least as large as the average cluster size, such that every point in $S$ is closer to $y$ than to its currently assigned cluster center.

Proportional fairness: a cluster assignment is proportionally fair if it contains no blocking coalition.

Equivalently, $X$ is proportional if for all $S\subseteq\mathcal N$ with $|S|\ge\lceil\frac{n}{k}\rceil$ and all $y\in\mathcal M$, there exists $i\in S$ with $d(i,y)\ge D_i(X)$.

❗The quantification is over all subsets of sufficient size. Do not consider a single $i\in S$ and ignore all of the other points.
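The quantification over all subsets still reduces to a per-center count: for a fixed deviating center $y$, the largest candidate coalition is exactly the set of points strictly closer to $y$ than to their current center. A minimal sketch under the same `dist` convention as above:

```python
import numpy as np

def is_proportional(dist, X, k):
    """Check Definition 1: proportional iff no center y attracts >= ceil(n/k)
    points that would all strictly improve by deviating to y."""
    n, m = dist.shape
    need = int(np.ceil(n / k))
    D = dist[:, X].min(axis=1)  # D_i(X)
    return all(np.count_nonzero(dist[:, y] < D) < need for y in range(m))
```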

Advantages of PROP

  • PROP implies weak Pareto optimality: for any PROP solution $X$, there does not exist another solution $X'$ s.t. $D_i(X')<D_i(X)$ for all $i\in\mathcal N$.
  • Oblivious: does not depend on sensitive attributes or protected subgroups.
  • Robust to outliers.
  • Scale invariant: a multiplicative scaling of all distances does not affect the set of PROP solutions.
  • Approximate PROP solutions can be computed efficiently.
  • Can be checked efficiently: no need to compute the entire pairwise distance matrix.

However, PROP is incompatible with the $k$-center, $k$-means, and $k$-median objectives in the worst case.

📖Definition 2. $X\subseteq\mathcal M$ with $|X|=k$ is $\rho$-approximately proportional (abbr. $\rho$-proportional, or $\rho$-PROP) if for all $S\subseteq\mathcal N$ with $|S|\ge\lceil\frac{n}{k}\rceil$ and all $y\in\mathcal M$, there exists $i\in S$ with $\rho\cdot d(i,y)\ge D_i(X)$.

Results and Outline

Existence of PROP and its approximation

A PROP clustering solution doesn't always exist.

  • A solution better than $2$-PROP doesn't always exist.

A greedy algorithm yields a $(1+\sqrt2)$-PROP solution in the worst case.

PROP as a Constraint

Optimize the $k$-median objective subject to a PROP constraint.

Use LP rounding to find an $O(1)$-PROP solution to the $k$-median problem.

PROP and Randomness

PROP is approximately preserved if we take a uniformly random sample of the data points of size $\tilde O(k^3)$.

For constant $k$, we can check whether a given clustering is proportional, as well as compute approximately PROP solutions, in near-linear time, comparable to the running time of the classic $k$-means heuristic.

Existence and Computation of PROP Solutions

🤔Claim 1. For all $\rho<2$, a $\rho$-PROP solution is not guaranteed to exist.

Consider the instance $\mathcal N=\{a_i\}$, $\mathcal M=\{x_i\}$ with $|\mathcal N|=|\mathcal M|=6$, $k=3$, and the distances below (blank entries denote sufficiently large distances):

| | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ |
|---|---|---|---|---|---|---|
| $a_1$ | 4 | 1 | 2 | | | |
| $a_2$ | 2 | 4 | 1 | | | |
| $a_3$ | 1 | 2 | 4 | | | |
| $a_4$ | | | | 4 | 1 | 2 |
| $a_5$ | | | | 2 | 4 | 1 |
| $a_6$ | | | | 1 | 2 | 4 |

It is easy to check that no solution can be better than $2$-PROP: with $k=3$, one of the two symmetric halves gets at most one center, and in that half a coalition of $\lceil\frac{n}{k}\rceil=2$ points can deviate to an unopened center and each cut its distance exactly in half.

🤔Claim 2. In the case where $\mathcal N=\mathcal M$, for all $\rho<1.5$, a $\rho$-PROP solution is not guaranteed to exist.

A counterexample is given in the paper.

Computing 2.414-PROP Clustering

Let $B(x,\delta)=\{i\in\mathcal N:d(i,x)\le\delta\}$ (the ball of radius $\delta$ around center $x$).

Pseudo Code - Greedy Capture

  1. $\delta\gets0$, $X\gets\emptyset$, $N\gets\mathcal N$
  2. while $N\neq\emptyset$ do
    • Smoothly increase $\delta$
    • while $\exists x\in X$ s.t. $|B(x,\delta)\cap N|\ge1$ do
      • $N\gets N\setminus B(x,\delta)$
    • while $\exists x\in\mathcal M\setminus X$ s.t. $|B(x,\delta)\cap N|\ge\lceil\frac{n}{k}\rceil$ do
      • $X\gets X\cup\{x\}$
      • $N\gets N\setminus B(x,\delta)$
  3. return $X$
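A direct event-based translation (a sketch, not the paper's $\tilde O(mn)$ implementation): smoothly increasing $\delta$ is equivalent to sweeping all (center, point) pairs in increasing order of distance.

```python
import numpy as np

def greedy_capture(dist, k):
    """dist: (n, m) array with dist[i, j] = d(a_i, x_j). Returns opened center indices."""
    n, m = dist.shape
    need = int(np.ceil(n / k))
    unassigned = set(range(n))
    ball = [set() for _ in range(m)]  # unassigned points reached by each closed center's ball
    is_open = [False] * m
    opened = []
    # Growing delta smoothly = processing (distance, center, point) events in order.
    for d, j, i in sorted((dist[i, j], j, i) for i in range(n) for j in range(m)):
        if i not in unassigned:
            continue
        if is_open[j]:
            unassigned.discard(i)  # an open center absorbs points as soon as delta reaches them
        else:
            ball[j].add(i)
            ball[j] &= unassigned  # drop points captured by other centers in the meantime
            if len(ball[j]) >= need:
                is_open[j] = True  # opens at most k times: each opening removes >= n/k points
                opened.append(j)
                unassigned -= ball[j]
        if not unassigned:
            break
    return opened
```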

Greedy Capture runs in $\tilde O(mn)$ time.

The algorithm continuously grows a ball around every center. When the ball around an unopened center has captured $\lceil\frac{n}{k}\rceil$ unassigned points, we open that center and disregard all of the captured points; balls around already-open centers absorb unassigned points as they reach them.

Difference between the $k$-means and $k$-median variants: in $k$-median, balls grow around data points instead of centers.

🎯Theorem 1. The Greedy Capture algorithm yields a $(1+\sqrt2)$-PROP clustering, and there exists an instance for which this bound is tight.

  1. $X$ uses at most $k$ centers, because a center is only opened when $\lceil\frac{n}{k}\rceil$ unmatched points are absorbed by its ball, which can happen at most $k$ times.
  2. Suppose for contradiction that $X$ is not $(1+\sqrt2)$-PROP. …

The example where Greedy Capture yields exactly this bound: $\mathcal N=\{a_1,\cdots,a_6\}$, $\mathcal M=\{x_1,\cdots,x_4\}$, $k=3$, with the distances below (blank entries denote sufficiently large distances):

| | $x_1$ | $x_2$ | $x_3$ | $x_4$ |
|---|---|---|---|---|
| $a_1$ | $1$ | $1+\sqrt2$ | | |
| $a_2$ | $\sqrt2-1$ | $1-\epsilon$ | | |
| $a_3$ | $1+\sqrt2$ | $1-\epsilon$ | | |
| $a_4$ | | | $1$ | $1+\sqrt2$ |
| $a_5$ | | | $\sqrt2-1$ | $1-\epsilon$ |
| $a_6$ | | | $1+\sqrt2$ | $1-\epsilon$ |
  • Greedy Capture will open $x_2$ and $x_4$. The coalition $\{a_1,a_2\}$ can each reduce their distance by a factor approaching $1+\sqrt2$ as $\epsilon\to0$ by deviating to $x_1$.

I strongly suspect that the PROP fairness notion in clustering was thought up after the algorithm.🤐

❓Open Problem: Can we close the gap between $2$ and $1+\sqrt2$ by inventing a new algorithm? That is, is there an algorithm with a better bound than $1+\sqrt2$?

Local Capture Heuristic

The Local Capture heuristic produces more proportional clusterings in practice.

It takes a target value of $\rho$ as a parameter and proceeds by iteratively finding a center that violates $\rho$-proportionality and swapping it for the least-demanded center in the current solution.

Pseudo Code - LCH

Input: $\rho$

  1. Initialize $X$ as a random subset of $k$ centers from $\mathcal M$.
  2. repeat until no changes occur:
    1. for $y\in\mathcal M$ do
      • $S_y\gets\{i\in\mathcal N:\rho\cdot d(i,y)<D_i(X)\}$
      • if $|S_y|\ge\lceil\frac{n}{k}\rceil$ then
        • $x^*\gets\arg\min_{x\in X}|\{i\in\mathcal N:d(i,x)=D_i(X)\}|$
        • $X\gets(X\setminus\{x^*\})\cup\{y\}$
  3. return $X$

LCH runs in $\tilde O(mn^2)$ time. There is no guarantee of convergence: for a given $\rho$, a $\rho$-proportional solution may not even exist.

If LCH terminates, it must return a $\rho$-PROP solution.
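A direct translation of LCH into Python (a sketch; the iteration cap is my addition, precisely because the loop is not guaranteed to terminate):

```python
import numpy as np

def local_capture(dist, k, rho, max_iters=1000, seed=0):
    """dist: (n, m) array with dist[i, j] = d(a_i, x_j). Returns a k-set of centers,
    or None if no rho-PROP solution was reached within max_iters."""
    rng = np.random.default_rng(seed)
    n, m = dist.shape
    need = int(np.ceil(n / k))
    X = list(rng.choice(m, size=k, replace=False))
    for _ in range(max_iters):
        D = dist[:, X].min(axis=1)  # D_i(X)
        violator = next((y for y in range(m)
                         if np.count_nonzero(rho * dist[:, y] < D) >= need), None)
        if violator is None:
            return X  # no center violates rho-proportionality
        # Swap in the violator for the least-demanded center currently in X.
        assigned = np.asarray(X)[dist[:, X].argmin(axis=1)]
        x_star = min(X, key=lambda x: np.count_nonzero(assigned == x))
        X[X.index(x_star)] = violator
    return None
```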

PROP as a Constraint

The problem with LCH and Greedy Capture: they may find a PROP clustering with a poor global objective, even when exact PROP clusterings with good global objectives exist.

A bad example: $k=2$, with two well-separated clusters containing $40\%$ and $60\%$ of the data points respectively. It is possible that Greedy Capture opens only one center, inside the larger cluster. The result is PROP, but poor with respect to the main objective.

So we address this by optimizing the $k$-median objective subject to PROP as a constraint.

Consider $k$-median and $k$-means.

🎯Theorem 2. Suppose there is a $\rho$-PROP clustering with $k$-median objective $c$. Then in time polynomial in $m$ and $n$, we can compute an $O(\rho)$-PROP clustering with $k$-median objective at most $8c$.

In other words, we can compute a constant-approximate PROP clustering with $k$-median objective at most eight times the minimum $k$-median objective of any PROP clustering.

Linear Programming

Take the standard linear programming relaxation of the $k$-median minimization problem, then add a constraint to encode PROP.

$$
\begin{aligned}
\text{Minimize } & \sum_{i\in\mathcal N}\sum_{j\in\mathcal M}d(i,j)\,z_{ij} && (1)\\
\text{subject to } & \sum_{j\in\mathcal M}z_{ij}=1 & \forall i\in\mathcal N\quad & (2)\\
& z_{ij}\le y_j & \forall j\in\mathcal M,\forall i\in\mathcal N\quad & (3)\\
& \sum_{j\in\mathcal M}y_j\le k && (4)\\
& \sum_{j'\in B(j,\gamma R_j)}y_{j'}\ge1 & \forall j\in\mathcal M\quad & (5)\\
& z_{ij},y_j\in[0,1] & \forall j\in\mathcal M,\forall i\in\mathcal N\quad & (6)
\end{aligned}
$$

In constraint (5), $B(j,\gamma R_j)$ is the ball of radius $\gamma R_j$ around $j$, taken over feasible centers: $B(j,\delta)=\{j'\in\mathcal M:d(j,j')\le\delta\}$.

$z_{ij}$ is the indicator variable equal to $1$ if $i\in\mathcal N$ is matched to $j\in\mathcal M$.

$y_j$ is the indicator variable equal to $1$ if $j\in X$, i.e., if we open center $j$ in the clustering.

  • Objective (1) is the $k$-median objective.
  • Constraint (2) requires that every point is matched.
  • Constraint (3) ensures that points can only be matched to opened centers.
  • Constraint (4) allows at most $k$ centers to be opened.
  • Constraint (6) relaxes the indicator variables to real values between $0$ and $1$.

The elaboration of constraint (5): it approximately encodes PROP. Let $R_j$ be the minimum value s.t. $|B(j,R_j)\cap\mathcal N|\ge\lceil\frac{n}{k}\rceil$, i.e., $R_j$ is the distance from $j$ to its $\lceil\frac{n}{k}\rceil$-th nearest point in $\mathcal N$.
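A sketch of the LP with `scipy.optimize.linprog` (dense matrices, so only suitable for small instances; `dist_nm` is a hypothetical point-to-center matrix and `dist_mm` a center-to-center matrix; the rounding of Lemma 2 is not included):

```python
import numpy as np
from scipy.optimize import linprog

def prop_kmedian_lp(dist_nm, dist_mm, k, gamma):
    """Fractional k-median LP (1)-(6) with the PROP constraint (5)."""
    n, m = dist_nm.shape
    need = int(np.ceil(n / k))
    R = np.sort(dist_nm, axis=0)[need - 1]              # R_j: ceil(n/k)-th nearest point to j
    nz = n * m                                          # variables: z (row-major), then y
    c = np.concatenate([dist_nm.ravel(), np.zeros(m)])  # objective (1)
    A_eq = np.zeros((n, nz + m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0                # (2): sum_j z_ij = 1
    rows, b_ub = [], []
    for i in range(n):                                  # (3): z_ij - y_j <= 0
        for j in range(m):
            r = np.zeros(nz + m)
            r[i * m + j], r[nz + j] = 1.0, -1.0
            rows.append(r); b_ub.append(0.0)
    r = np.zeros(nz + m); r[nz:] = 1.0                  # (4): sum_j y_j <= k
    rows.append(r); b_ub.append(float(k))
    for j in range(m):                                  # (5): -sum_{j' in B(j, gamma R_j)} y_j' <= -1
        r = np.zeros(nz + m)
        r[nz:][dist_mm[j] <= gamma * R[j]] = -1.0
        rows.append(r); b_ub.append(-1.0)
    return linprog(c, A_ub=np.vstack(rows), b_ub=np.array(b_ub),
                   A_eq=A_eq, b_eq=np.ones(n), bounds=(0.0, 1.0))  # (6)
```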

🤔Lemma 1. Let $X$ be a clustering and $\gamma\ge1$. If for all $j\in\mathcal M$ there exists some $x\in X$ s.t. $d(j,x)\le\gamma R_j$, then $X$ is $(1+\gamma)$-PROP. Conversely, if $X$ is $\gamma$-PROP, then for all $j\in\mathcal M$ there exists some $x\in X$ s.t. $d(j,x)\le(1+\gamma)R_j$.

Proof by contradiction and the triangle inequality.

So let $X$ be a $\rho$-PROP clustering with $k$-median objective $c$. Setting $\gamma=\rho+1$ in constraint (5), Lemma 1 guarantees that $X$ is feasible for the LP.

Hence the optimal fractional solution has $k$-median objective at most $c$. We then round the fractional solution; a variant of a standard rounding algorithm approximately preserves constraint (5).

🤔Lemma 2. Let $\{y_j\},\{z_{ij}\}$ be a fractional solution to the linear program. There is an integer solution $\{\hat y_j\},\{\hat z_{ij}\}$ that is an $8$-approximation to the objective and opens at most $k$ centers. Furthermore, for all $j\in\mathcal M$, $\sum_{j'\in B(j,27\gamma R_j)}\hat y_{j'}\ge1$.

Lemmas 1 and 2, with $\gamma=1+\rho$, imply that the rounded solution is $(27(1+\rho)+1)$-PROP. Since the $k$-median objective of the fractional solution is at most $c$, the $k$-median objective of the rounded solution is at most $8c$. The factor $27$ can be improved to $13$ in the special case where $\mathcal N=\mathcal M$.

Proof of Lemma 2

The main argument is to show that constraint (5) is still approximately satisfied after rounding.

Given a fractional solution $\{y_j\},\{z_{ij}\}$ to the linear program, let $\bar C_i=\sum_{j\in\mathcal M}d(i,j)z_{ij}$, i.e., the contribution of point $i$ to the $k$-median objective in the fractional optimum.

  • Step 1: Consolidate all demands $t_i$ (initially set to $1$ for every point) to obtain $\{t_i'\}$ s.t. all $i,j$ with $t_i',t_j'>0$ are sufficiently far apart: $d(i,j)>4\max(\bar C_i,\bar C_j)$. Let $\mathcal M'$ be the set of centers with positive demand after this step, i.e., $\mathcal M'=\{j\in\mathcal M:t_j'>0\}$.
  • Step 2a: Consolidate open centers by moving each center not in $\mathcal M'$ to the nearest center in $\mathcal M'$.
  • Step 2b: …

Very complex.

Sampling for Linear-Time Implementations and Auditing

PROP under uniform random sampling.

Draw $|N|$ individuals i.i.d. from the uniform distribution on $\mathcal N$.

This paper shows that PROP is well preserved under random sampling.

PROP Under Random Sampling

For any $X\subseteq\mathcal M$ of size $k$ and center $y\in\mathcal M$, define $R(\mathcal N,X,y)=\{i\in\mathcal N:D_i(X)>\rho\cdot d(i,y)\}$. Note that $X$ is not $\rho$-PROP w.r.t. $\mathcal N$ iff there is some $y\in\mathcal M$ s.t. $\frac{|R(\mathcal N,X,y)|}{|\mathcal N|}\ge\frac{1}{k}$.

A random sample approximately preserves this fraction for all solutions $X$ and deviating centers $y$.

Proof sketch: take a union bound over all possible solutions and deviations; there are only $m\cdot\binom{m}{k}$ such combinations.

🤔🎯Theorem 3. Given $\mathcal N,\mathcal M$ and a parameter $\rho\ge1$, fix parameters $\epsilon,\delta\in[0,1]$. Let $N\subseteq\mathcal N$ of size $\Omega(\frac{k^3}{\epsilon^2}\log\frac{m}{\delta})$ be chosen uniformly at random. Then, with probability at least $1-\delta$, the following holds for all $(X,y)$:

$$\left\vert\frac{\vert R(N,X,y)\vert}{\vert N\vert}-\frac{\vert R(\mathcal N,X,y)\vert}{\vert\mathcal N\vert}\right\vert\le\frac{\epsilon}{k}$$

By now I have PTSD from anything of the form $1-\delta$, $O/\Omega(\frac{1}{\epsilon}\log\frac{1}{\delta})$, Chernoff, or Hoeffding, thank you very much.

$N$ is a random sample of $\mathcal N$. Hoeffding's inequality implies that for any fixed $(X,y)$, a sample of size $|N|=O(\frac{1}{\hat\epsilon^2}\log\frac{1}{\hat\delta})$ is sufficient to achieve

$$\left\vert\frac{\vert R(N,X,y)\vert}{\vert N\vert}-\frac{\vert R(\mathcal N,X,y)\vert}{\vert\mathcal N\vert}\right\vert\le\hat\epsilon$$

with probability at least $1-\hat\delta$. Note that there are $q=m\cdot\binom{m}{k}$ possible choices of $(X,y)$ over which we take the union bound. Setting $\hat\delta=\frac{\delta}{q}$ and $\hat\epsilon=\frac{\epsilon}{k}$ is sufficient for the union bound to yield the theorem.

We say a solution $X$ is $\rho$-PROP to $(1+\epsilon)$-deviations if for all $y\in\mathcal M$ and all $S\subseteq\mathcal N$ with $|S|\ge(1+\epsilon)\frac{n}{k}$, there exists some $i\in S$ s.t. $\rho\cdot d(i,y)\ge D_i(X)$. Note that if $X$ is $\rho$-PROP to $1$-deviations, it is simply $\rho$-PROP.

🤔Corollary 1. Let $N\subseteq\mathcal N$ be a uniform random sample of size $|N|=\Omega(\frac{k^3}{\epsilon^2}\ln\frac{m}{\delta})$. Suppose $X\subseteq\mathcal M$ with $|X|=k$ is $\rho$-PROP w.r.t. $N$. Then with probability at least $1-\delta$, $X$ is $\rho$-PROP to $(1+\epsilon)$-deviations w.r.t. $\mathcal N$.

Linear Time Implementation

We can approximately implement Greedy Capture in nearly linear time, comparable to the running time of the standard $k$-means heuristic.

🤔Corollary 2. Greedy Capture, when run on $\mathcal M$ and a random sample $N\subseteq\mathcal N$ of size $|N|=\tilde\Theta(\frac{k^3}{\epsilon^2})$, provides a solution that is $(1+\sqrt2)$-PROP to $(1+\epsilon)$-deviations with high probability, in $\tilde O(\frac{k^3}{\epsilon^2}m)$ time.
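Combining the sample bound with the earlier `greedy_capture` sketch (the sample size below drops the log factor and constants, so it is only indicative):

```python
import numpy as np

def greedy_capture_sampled(dist, k, eps, seed=0):
    """Run Greedy Capture on a uniform random sample of ~ k^3 / eps^2 points."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    s = min(n, int(np.ceil(k ** 3 / eps ** 2)))
    idx = rng.choice(n, size=s, replace=False)
    return greedy_capture(dist[idx], k)
```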

There is a similar speedup for the Local Capture algorithm.

Efficient Auditing

Audit Problem: decide whether a given solution happens to be proportional.

Given $\mathcal N,\mathcal M$ and $X\subseteq\mathcal M$ with $|X|\le k$, find the minimum value of $\rho$ s.t. $X$ is $\rho$-PROP.

Trivial idea: one can solve the audit problem exactly in $O((k+m)n)$ time by computing, for each $y\in\mathcal M$, the quantity $\rho_y$: the $\lceil\frac{n}{k}\rceil$-th largest value of $\frac{D_i(X)}{d(i,y)}$ over $i\in\mathcal N$. Then return the maximum of $\rho_y$ over $y$. This is essentially quadratic time, which can be worse than the time taken to find the clustering itself.
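The exact audit in code (a numpy sketch of the $\rho_y$ computation; running the same routine on a uniform subsample of the rows gives the $(\epsilon,\delta)$-audit described next):

```python
import numpy as np

def audit_rho(dist, X, k):
    """Minimum rho such that X is rho-PROP. dist: (n, m), X: opened center indices."""
    n, m = dist.shape
    need = int(np.ceil(n / k))
    D = dist[:, X].min(axis=1)                     # D_i(X)
    ratios = D[:, None] / np.maximum(dist, 1e-12)  # D_i(X) / d(i, y), guarding d = 0
    rho_y = np.sort(ratios, axis=0)[n - need]      # ceil(n/k)-th largest ratio per y
    return max(float(rho_y.max()), 1.0)            # rho is at least 1 by definition
```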

$(\epsilon,\delta)$-Audit Problem: find the minimum value of $\rho$ s.t. $X$ is $\rho$-PROP to $(1+\epsilon)$-deviations with probability at least $1-\delta$. This problem can be solved efficiently by using a random sample $N\subseteq\mathcal N$ of points to conduct the audit.

🤔Corollary 3. The $(\epsilon,\delta)$-Audit Problem can be solved in $\tilde O((k+m)\frac{k^3}{\epsilon^2})$ time.

Implementation and Empirical Results

The results depend heavily on the dataset.

Conclusion and Open Problems

Exact PROP solutions may not exist.

Efficient algorithms for computing approximate PROP solutions.

Constrained optimization and sampling for further applications.

Finally, PROP on real data: there is a data-dependent trade-off between PROP and the $k$-means objective.

Open Problems

❓Can we close the approximation factor gap between $2$ and $1+\sqrt2$?

❓Is there a more efficient and easily explainable algorithm for optimizing total cost subject to PROP?

❓Are there other fair solution concepts for clustering?

❓Can the idea of PROP as a group fairness concept be adapted to supervised learning tasks like classification and regression?

Bonus: Trying to break the $1+\sqrt2$ barrier

Look at the example for the $1+\sqrt2$ lower bound when applying Greedy Capture.

Let's look at one half of it:

| | $a_1$ | $a_2$ | $a_3$ |
|---|---|---|---|
| $x_1$ | $1$ | $\sqrt2-1$ | $1+\sqrt2$ |
| $x_2$ | $1+\sqrt2$ | $1-\epsilon$ | $1-\epsilon$ |

The reason this is only $(1+\sqrt2)$-PROP is that $x_2$ clinches $a_2,a_3$ before $x_1$ captures anything.

But if we choose $x_1$ as the cluster center, the clustering becomes exactly PROP!

Now consider a modification of Greedy Capture that might yield a better approximation of PROP.

Pseudo Code for “Advanced” Greedy Capture

Input: $k,\mathcal N,\mathcal M$.

Output: a set $X$ containing at most $k$ cluster centers.

  1. $\Delta_j\gets0$ for all $j\in\mathcal M$
  2. $X\gets\emptyset$, $N\gets\mathcal N$, $C\gets\emptyset$
  3. while $N\neq\emptyset$ do
    1. Let $i\in\mathcal M$, $x\in N\setminus C$ be s.t. $d(i,x)-\Delta_i$ is minimized
    2. $\Delta_i\gets d(i,x)$
    3. $C\gets C\cup\{x\}$
    4. if $i\in X$ then
      • $N\gets N\setminus\{x\}$
    5. else if $i\in\mathcal M\setminus X$ and $|B(i,\Delta_i)\cap C\cap N|=\lceil n/k\rceil$ then
      • $X\gets X\cup\{i\}$
      • $N\gets N\setminus(B(i,\Delta_i)\cap C)$
  4. return $X$

On the $(1+\sqrt2)$-PROP example, this "Advanced" Greedy Capture computes an exactly PROP clustering.

On the $2$-PROP example, this algorithm computes a $2$-PROP clustering.

But this algorithm doesn't work in general. Consider the following case:

The algorithm will open $x_1$ because it always swallows the pair $(i,x)$ with minimum $d(i,x)-\Delta_i$, but points $2,3$ prefer $x_2$.