In this post, we focus on upper and lower bound of core where non-clusterings are.

Let the loss (or cost) function for every data points be maximum distance between any other points and itself:

i(S)=maxjSd(i,j).\ell_i(S)=\max_{j\in S}d(i,j).

We have known that a clustering in the exact core must exist given N,kN,k.

For the approximation of core: The Greedy Capture algorithm computes the upper bound of 22 in polynomial time. And it have the lower bound of 11.

What about the gap between 1 and 2???

There are some open problems and I am trying to answer.

Complexity for Auditing the Core

❓What is the time complexity for verifying whether a given clustering is in the exact core?

TL;DR It is a NP-hard problem.

The analysis of verifying problem:

We need to check if any subset SVS\subseteq V with Sn/k|S|\ge\lceil n/k\rceil can form a cluster such that the maximum distance between any two points in SS is less than each agent’s maximum distance to others in its current cluster. If such a subset exists, then the clustering violates the core property.

Consider a reduction from another NP-complete problem MAXCLIQUEMAXCLIQUE. The problem asks if there is a clique of a specified size in an undirected graph G=(V,E)G=(V,E).

Problem Setup: We have an undirected graph G=(V,E)G = (V, E) with nn vertices. We need to decide if a given clustering is in the core, with each data point’s cost being the maximum distance between itself and all other points in the same cluster.

Constructing the Graph: To reduce MAXCLIQUEMAXCLIQUE to our core-checking problem, we will build a graph with the properties:

  • For every edge (i,j)E(i,j)\in E, the distance d(i,j)d(i, j) is less than
    1. the maximum distance between ii and all other points in ii’s cluster
    2. the maximum distance between jj and all other points in jj’s cluster.

Formulating the Core Condition in Terms of MAXCLIQUEMAXCLIQUE: If a cluster is in the core, no subset SS of data points, with Snk|S| \geq\lceil\frac{n}{k}\rceil, can improve their maximum distance cost by clustering together. If we interpret each cluster as requiring a subset of nodes that are all pairwise connected (thus forming a clique), we can state the core problem as follows:

  • Determine if there exists a subset of nodes of size at least nk\lceil\frac{n}{k}\rceil such that every pair of nodes in it are connected.
  • This requires identifying the largest clique in GG and checking if it can be at least as large as nk\lceil\frac{n}{k}\rceil, which directly maps to finding the maximum clique.

Verifying Core Membership: To verify if a clustering is in the core using this construction, we would:

  • Check if there exists a subset SVS \subseteq V with Snk|S| \geq \lceil\frac{n}{k}\rceil such that all nodes in SS have edges to each other (i.e., a clique).
  • If such a subset SS exists, it would imply that these nodes could deviate by forming their own cluster with a smaller maximum distance, violating the core property.
  • Thus, solving this for all clusters to determine if the clustering as a whole satisfies the core property would require us to solve a clique-finding problem.

Completing the NP-Hardness Reduction: Since finding a maximum clique is NP-hard, and since verifying the core property requires solving the maximum clique problem to check for the presence of sufficiently large cliques in each cluster, the problem of deciding if a clustering is in the core is NP-hard.

Complexity for Computing Core

❓What is the complexity for computing a clustering in the exact core?

I don’t have a clear idea yet.

But it maybe harder than NPNP. Maybe PNPP^{NP}, NPNPNP^{NP} or PSPACEPSPACE?

Complexity for Computing α-Core

❓What is the complexity for computing a clustering in α\alpha-core (1<α<21\lt\alpha\lt2)?

Is there exist polynomial algorithm?