A clustering CC satisfies Fully Justified Representation (FJR), if there doesn’t exist a subset of points VV, such that

  • Vn/k\vert V\vert\ge\lceil n/k\rceil.
  • iV:costi(V)<minjVcostj(C(j))\forall i\in V:cost_i(V)\lt\min_{j\in V}cost_j(C(j)).

It is proven that FJR is guaranteed to exist. An exact FJR solution can be found in exponential time by brute force algorithm.

The paper also mentioned that we can achieve the approximation of FJR in polynomial time by greedy capture algorithm with subroutine smallest agent ball. (For maximum cost is 22, average cost is 44).

🤔We are trying to push these lower bounds.

Let’s start from some special scenarios.

Metric Space (Arbitrary Distance)

Important quote from original paper:

The most cohesive cluster: Given a set of agents NN and a threshold τ\tau, the most cohesive cluster problem asks to find a cluster SNS\subseteq N of size at least τ\tau such that the maximum loss of any iSi\in S for SS is minimized, i.e., find argminSN:SτmaxiSi(S)\arg\min_{S\subseteq N':|S|\ge\tau}\max_{i\in S}\ell_i(S).

This can be done in exponential time by brute force. By calling the most cohesive cluster as subroutine, the greedy capture are guaranteed to find the clustering solution satisfying FJR.

❓Can we find FJR solution in POLYNOMIAL time?

Look back at smallest agent ball. It iteratively find the point with minimal RADIUS centered around that point that covers τ\tau points.

What if we consider DIAMETER instead?

FJR in Polynomial Time

Greedy Capture with Diameter Growth

Diameter is the maximum distance between a pair of points in the same cluster.

Similar to centroid clustering, we regard each point as a cluster center.

Iteratively, do among the remaining point:

  • Start from 00, slightly increase the diameter of the “ball” around each “cluster center”.
  • If a ball includes n/k\lceil n/k\rceil points, secure it as a cluster.

🤔It seems that for the maximum cost, this version output the exact FJR solution in polynomial time. Because the diameter growth version actually finds the most cohesive cluster. Because in this case, the maximum loss of a group of points is minimized.

Pseudo code

Greedy Cohesive Clustering is omitted here. We focus on the subroutine DiameterGrowth

Input: (Remaining) subset of agents NNN'\subseteq N, metric dd, threshold integer τ\tau.

Output: Cluster SS.

  1. If Nτ|N'|\le\tau then return SNS\gets N';
  2. δ0\delta\gets0 and ListList\gets sorted pairwise distance (in ascending order); // initialize diameter
  3. while there is no ball including τ\tau points, do
    • δ\delta\gets next value in ListList;
    • for iNi\in N' do
      • BalliBalliBall_i\gets Ball_i\cup (all points have distance less than δ\delta to any point in BalliBall_i)
  4. return SS\gets the ball including τ\tau points.

Running Time Estimation

  1. The number of distances is O(n2)O(n^2). So sorting of distances takes O(n2logn)O(n^2\log n) time.

  2. We have at most O(n2)O(n^2) times to increase the diameter δ\delta.

    In each increasing round, we iterate through O(n)O(n) balls. For each ball, the number of comparison is the size of the ball multiply the remaining point.

Obviously the subroutine runs in polynomial time.

The number of calling the subroutine is at most kk times.

Conclusion: Greedy Capture with Diameter Growth output the clustering in FJR in polynomial time.

Binary Distance

⚠️In this setting, the “distance” may not necessarily submit to triangle inequality.

Maximum Cost

🎯Claim: For maximum cost and binary distance scenario, the FJR is equivalent to the core.

It is not hard to see a clustering in the core implies FJR.

Reverse implication also holds, because in the definition of FJR, the inequality of cost suggests that LHS to be 00 and RHS to be 11. Thus the FJR-violator should be a clique where every point has cost 11 in their original clusters.

This definition aligns with definition for the core.

❓Can we find FJR (core) solution by polynomial time algorithm in this setting?

Polynomial time means we need to get rid of finding cliques. Some unconfirmed ideas:

  1. Find the big “stars”, and play with them.

Average Cost

No idea so far.