Fair AI/ML 8 Split and Merge Algorithm on Average Cost
硕士论文就跟 Non-Centroid Clustering 杠上了。我和导师都希望能
- 探索 Split and Merge 算法的更多性质
- 扩展 Split and Merge 算法到更一般的情况
仍然只考虑:
binary distance
the distance of each pair of data point is either or .
non-metric
no restriction of triangle inequality
asymmetric distance
Equivalent Setting
Those three conditions is equivalent to APPROVAL VOTING.
- Each data point is a voter.
- Each voter approve a subset of other data points
Cost Definition
The maximum and average costs are therefore defined equivalently and accordingly.
The maximum cost:
The minimum cost:
It’s not hard to prove that Split and Merge algorithm also computes a clustering in the core with BINARY, ASYMMETRIC, NON-METRIC “distance”.
On Average Cost
🎯Theorem (On Average Cost): Split and Merge algorithm computes a clustering in -core.
Assume there exist a subset of agents where each agent would improve more than multiplicative factor .
In Split and Merge algorithm, the average cost of data point is at most .
Therefore, for each subset of , there exist one agent does not improve its cost (from to ). (Consider maximum cost, Split and Merge computes the solution in the core, which means for each subset of , not all agent would be strictly better off.)
Thus, the average cost of this agent is at most . Possible improved average cost is at least , meaning that the improvement can not be greater than factor of .
Can we improve this factor? No idea… 🤔
Following Directions on Average Cost
From the paper, we learn that for symmetric, metric, arbitrary distance, the lower bound of core approximation factor is .
🤔But what if we consider binary distance remove symmetric and metric limitation? Could the core be non-empty in this case?
🤔What about graphs with certain structures? (scale-free, small-world networks…)