Fair AI/ML 9 On Fully Justified Representation
A clustering satisfies Fully Justified Representation (FJR), if there doesn’t exist a subset of points , such that
- .
- .
It is proven that FJR is guaranteed to exist. An exact FJR solution can be found in exponential time by brute force algorithm.
The paper also mentioned that we can achieve the approximation of FJR in polynomial time by greedy capture algorithm with subroutine smallest agent ball. (For maximum cost is , average cost is ).
🤔We are trying to push these lower bounds.
Let’s start from some special scenarios.
Metric Space (Arbitrary Distance)
Important quote from original paper:
The most cohesive cluster: Given a set of agents and a threshold , the most cohesive cluster problem asks to find a cluster of size at least such that the maximum loss of any for is minimized, i.e., find .
This can be done in exponential time by brute force. By calling the most cohesive cluster as subroutine, the greedy capture are guaranteed to find the clustering solution satisfying FJR.
❓Can we find FJR solution in POLYNOMIAL time?
Look back at smallest agent ball. It iteratively find the point with minimal RADIUS centered around that point that covers points.
What if we consider DIAMETER instead?
FJR in Polynomial Time
Greedy Capture with Diameter Growth
Diameter is the maximum distance between a pair of points in the same cluster.
Similar to centroid clustering, we regard each point as a cluster center.
Iteratively, do among the remaining point:
- Start from , slightly increase the diameter of the “ball” around each “cluster center”.
- If a ball includes points, secure it as a cluster.
🤔It seems that for the maximum cost, this version output the exact FJR solution in polynomial time. Because the diameter growth version actually finds the most cohesive cluster. Because in this case, the maximum loss of a group of points is minimized.
Pseudo code
Greedy Cohesive Clustering is omitted here. We focus on the subroutine DiameterGrowth
Input: (Remaining) subset of agents , metric , threshold integer .
Output: Cluster .
- If then return ;
- and sorted pairwise distance (in ascending order); // initialize diameter
- while there is no ball including points, do
- next value in ;
- for do
- (all points have distance less than to any point in )
- return the ball including points.
Running Time Estimation
The number of distances is . So sorting of distances takes time.
We have at most times to increase the diameter .
In each increasing round, we iterate through balls. For each ball, the number of comparison is the size of the ball multiply the remaining point.
Obviously the subroutine runs in polynomial time.
The number of calling the subroutine is at most times.
Conclusion: Greedy Capture with Diameter Growth output the clustering in FJR in polynomial time.
Binary Distance
⚠️In this setting, the “distance” may not necessarily submit to triangle inequality.
Maximum Cost
🎯Claim: For maximum cost and binary distance scenario, the FJR is equivalent to the core.
It is not hard to see a clustering in the core implies FJR.
Reverse implication also holds, because in the definition of FJR, the inequality of cost suggests that LHS to be and RHS to be . Thus the FJR-violator should be a clique where every point has cost in their original clusters.
This definition aligns with definition for the core.
❓Can we find FJR (core) solution by polynomial time algorithm in this setting?
Polynomial time means we need to get rid of finding cliques. Some unconfirmed ideas:
- Find the big “stars”, and play with them.
Average Cost
No idea so far.