Fair AI/ML 2 PROP Fair Clustering
Read the paper Proportionally Fair Clustering by Chen et al.
Abstract
points, centers.
Proportionality: any points are entitled to form own cluster if there is another center that is closer in distance for all points.
Clustering with no justified complaints from any subset of agents.
Tradeoff between proportional solutions and -means objective.
Introduction
In centroid clustering, we want to partition data into clusters by choosing centers and then matching points to one of the centers.
Each data point represents a individual. We with the clustering to be fair. These agents prefer to be clustered accurately.
A clustering is fair if it respects the entitlements of groups of agents: a subset of agents is entitled to choose a center for themselves if they constitute a sufficiently large fraction of the population w.r.t. the total number of clusters. No priori knowledge about which points should be protected.
Preliminaries
data points, .
feasible cluster centers, .
For , we have distance satisfying the triangle inequality.
Open a set of centers, and then match all points in to the closest center in .
For a particular solution and agent , let . A good clustering solution will have small values of .
Different measures, different objectives:
- -median:
- -means:
- -center:
Individuals prefer to be closer to their center in terms of distance.
📖Definition 1. Let with . is a blocking coalition against if and s.t. . is proportional if there is no blocking coalition against .
阻塞联盟:存在一个候选中心点 ,存在一组数据点 满足其点数要比簇平均点数多:使得 中任意一点到 的距离,都比这组数据点到原来分配的对应簇中心点的距离近。
比例公平:一组簇分配中不包含阻塞联盟,则该簇分配是比例公平的。
Equivalently, is proportional if with and for all , there exists with .
❗The quantification is over all subset of sufficient size. Do not consider a single and ignore all of the other points.
Advantages of PROP
- PROP implies weak Pareto optimality: For any PROP solution , there does not exist another solution s.t. for all .
- Oblivious, does not depend on sensitive attribute or protected sub-groups.
- Robust to outlier.
- Scale invariant: a multiplicative scaling of all distances does not affect the set of PROP solution.
- Approximately PROP can be efficiently computed.
- Can be efficiently checked: no need to compute entire pairwise distance matrix.
However, PROP is incompatible with -center, -means, -median in the worst case.
📖Definition 2. with is -approximate proportional (abbr. -proportional) if with and , there exists with .
Results and Outline
Existence of PROP and its approximation
PROP clustering solution doesn’t always exist.
- Better than -PROP solution doesn’t always exist.
A greedy algorithm yields -PROP solution in the worst case.
PROP as constraint
To optimize -median objective subject to PROP constraint.
Use the rounding to find an -PROP solution to the -median.
PROP and Randomness
PROP is approximately preserved if we take a random sample of the data points of size .
For constant , we can check if a given clustering s proportional as well as compute approximately PROP solutions in near linear time, comparable to the time for classic -means heuristic.
Existence and Computation of PROP Solutions
🤔Claim 1. For all , a -PROP solution is not guaranteed to exist.
Consider instance with with distance:
x1 x2 x3 x4 x5 x6 a1 4 1 2 ∞ ∞ ∞ a2 2 4 1 ∞ ∞ ∞ a3 1 2 4 ∞ ∞ ∞ a4 ∞ ∞ ∞ 4 1 2 a5 ∞ ∞ ∞ 2 4 1 a6 ∞ ∞ ∞ 1 2 4 It’s easy to check that any solution can not be better than -PROP.
🤔Claim 2. In case where , for all , a -PROP solution is not guaranteed to exist.
A counterexample in the paper.
Computing 2.414-PROP Clustering
Let (The ball of distance about center ).
Pseudo Code - Greedy Capture
- while , do
- Smoothly increase
- while s.t. do
- .
- while s.t. do
- return
Greedy Capture runs in time.
The algorithm continuously grows. When the ball around a center has captured points, we open that center and disregards all of the captured points.
Difference between -means and -median: Balls grow around data points in -median instead of centers.
🎯Theorem 1. Greedy Capture algorithm yields a -PROP clustering, and there exists an instance for which this bound is tight.
- uses at most centers, because it only opens a center when unmatched points are absorbed by the sphere. This can only happen at most times.
- Suppose for contradiction that is not -PROP. …
The example where Greedy Capture yields exactly this bound: .
x1 x2 x3 x4 a1 1 1+√2 ∞ ∞ a2 √2-1 1-ε ∞ ∞ a3 1+√2 1-ε ∞ ∞ a4 ∞ ∞ 1 1+√2 a5 ∞ ∞ √2-1 1-ε a6 ∞ ∞ 1+√2 1-ε
- Greedy Capture will open and . The coalition can each reduce their distance by factor and by deviating to .
I strongly suspect that the PROP fairness notion in clustering is thought after the algorithm.🤐
❓Open Problem: Can we fill the gap between and by inventing new algorithm? Or can we invent a new algorithm that has a better bound than ?
Local Capture Heuristic
Local Capture heuristic produces more proportional clustering.
It takes a target value of as a parameter, and proceeds by iteratively finding a center that violates -fairness and swapping it for the center in the current solution that is least demanded.
Pseudo Code - LCH
Input:
- Initialize as a random subset of centers from .
- repeat until no changes occur:
- for do
- if then
- for do
- return
LCH runs in time. No guarantee of convergence! aka for a given , there may not exist a -proportional solution.
If LCH terminate, it must return a -PROP solution.
PROP as a Constraint
Problem with LCH and Greedy Capture: They may find a PROP clustering with poor global objective, even when exact PROP clustering with good global objectives exist.
A bad example: , two easily defined clusters containing of the data points respectively. It is possible Greedy Capture only open one point in the larger cluster. It is PROP, but not optimized w.r.t. main object.
So we need to solve this problem by optimizing the -median objective subject to PROP as a constraint.
Consider -median and -means.
🎯Theorem 2. There is a -PROP clustering with -median objective . In polynomial time in and , we can compute a -PROP clustering with -median objective at most .
We can compute a constant approximate PROP with -median objective at most eight times the minimum -median objective PROP clustering.
Linear Programming
The standard linear programming relaxation of the -median minimization problem. Then add a constraint to encode PROP.
is the ball with center with radius .
is the indicator variable equal to if is matched to .
is the indicator variable equal to if , i.e., if we want to use the center in the clustering.
- Objective (1) is the -median objective.
- Constraint (2) requires that every point must be matched.
- Constraint (3) ensures that every points must be matched to the only center.
- Constraint (4) allows at most centers to be opened.
- Constraint (6) relax the indicator to be real value between and .
The elaboration of constraint (5): it approximately encodes PROP. Let be the minimum value s.t. . aka is the distance of the farthest point in from .
🤔Lemma 1. Let be a clustering, . If there exists some s.t. , then is -PROP. If is -PROP, then there exists some s.t. .
Proof by contradiction and triangle inequality.
So -PROP clustering with -median objective . In constraint (5) . Lemma 1 guarantees that is feasible for LP.
So the fractional solution has -median objective is at most . Then round the resulting fractional solution. A variation of rounding algorithm preserves constraint (5).
🤔Lemma 2. Let be a fractional solution to the linear program in LP. There is an integer solution that is -approximation to the objective, and that opens centers. Furthermore, for all , .
Lemma 1 + 2 implies result of the rounding -PROP, by setting . Since -median objective of the fractional solution is at most , -median objective of the rounded solution is at most by proof. The factor can be improved to in the special case where .
Proof of Lemma 2
Argument is to show that with constraint (5), after rounding, the results is still approximately satisfied.
Given a fractional solution to the linear program as , let aka contribution of point to the -median objective in the fractional optimum.
- Step 1: Consolidate all demands (for all points it is set to ) to obtain s.t. with , they must be sufficiently far away s.t. . Let be the set of centers with positive demand after this step, i.e., .
- Step 2a: Consolidate open centers by moving each center not in to the nearest center in .
- Step 2b: …
- …
Very complex.
Sampling for Linear-Time Implementations and Auditing
PROP under uniform random sampling.
Draw individuals independent and identically distributed from the uniform distribution on .
This paper shows that PROP is well preserved under random sampling.
PROP Under Random Sampling
For any of size and center , define . Note that is not -PROP w.r.t. iff there is some s.t. .
A random sample approximately preserves this fraction for all solution and deviating centers .
Proof sketch: we take a union bound over all possible solutions and deviations, and there are only such combinations.
🤔🎯Theorem 3. Given and parameter , fix parameters . Let of size be chosen uniformly at random. Then, with probability at least , the following holds for all :
已经对 ,, Chernoff, Hoeffding 这样的东西 PTSD 了谢谢
is a random sample of . Hoeffding’s inequality implies that for any fixed , a sample of size is sufficient to achieve
with probability at least . Note that there are possible choice of over which we take the union bound. Setting is sufficient for the union bound to yield the theorem.
We say solution is -PROP to -deviations if for all and for all where , there exists some s.t. . Note that if is -PROP to -deviation, it is simply -PROP.
🤔Corollary 1. Let be a uniform random sample of size . Suppose with is -PROP w.r.t. . Then with probability at least , is -PROP to -deviations w.r.t. .
Linear Time Implementation
We can approximately implement Greedy Capture in nearly linear time, comparable to the running time of the standard -means heuristic.
🤔Corollary 2. Greedy Capture, when run on and a random sample of size , provides a solution that is -PROP to -deviation with high probability in time.
There are similar speedup for Local Capture algorithm.
Efficient Auditing
Audit Problem: To answer whether a solution produced happens to be proportional.
Given , with , find the minimum value of s.t. is -PROP.
Trivial idea: One can solve the audit problem exactly in time by computing for each , the quantity , the largest value of . Then find the that maximizes . Again, this takes quadratic time, which can be worse than the time taken to find the clustering itself.
-Audit Problem: Find the minimum value of s.t. is -PROP to -deviations with probability at least . This problem can be efficiently solved by using a random sample of points to conduct the audit.
🤔Corollary 3. The -Audit Problem can be solved in time.
Implementation and Empirical Results
Highly depends on which dataset.
Conclusion and Open Problems
The exact PROP may not exist.
Efficient algorithms for computing approximate PROP solutions.
Constrained optimization and sampling for further applications.
Finally, PROP on real data: data dependent trade-off between PROP and -means objective.
Open Problems
❓Can we close the approximate factor gap between and ?
❓Is there a more efficient and easily explainable algorithm for optimizing total cost subject to PROP?
❓Is there any other fair solution concepts for clustering?
❓Can the idea of PROP as a group fairness concept be adapted for supervised learning tasks like classification and regression?
Bonus: Trying to break the barrier
Look at the example for lower bound by applying Greedy Capture.
Let’s look at the half.
a1 | a2 | a3 | |
---|---|---|---|
x1 | |||
x2 |
The reason why this is -PROP is that clinches prior to .
But if we choose as clustering center, the clustering will become PROP!
Now consider some modification on Greedy Capture, making it better approximation for PROP.
Pseudo Code for “Advanced” Greedy Capture
Input: .
Output: a set of containing at most cluster centers.
- While do
- Let s.t. is minimum
- If then
- else if and then
- return
On the -PROP example, this “Advanced” Greedy Capture computes a exactly PROP clustering.
On the -PROP example, this algorithm computes -PROP clustering.
But this algorithm doesn’t work. Consider the following case:
graph TB 1(x1) 2(x2) 3((1)) 4((2)) 5((3)) 1---|1-ε|3 1---|1.3|4 1---|1.3|5 2---|1|3 2---|1|4 2---|1|5
The algorithm will open because it always swallow with minimum , but point prefer .