Clustering is widely used in data mining. Embedding weak label priors into clustering has been shown to be an effective way to improve its performance. Previous research has mainly focused on a single type of prior. However, in many real scenarios two kinds of weak label prior information, namely pairwise constraints and cluster ratios, are easily obtained or already available. How to incorporate both of them to improve clustering performance is an important but rarely studied problem.
Credit: Jing ZHANG, Ruidong FAN, Hong TAO, Jiacheng JIANG, Chenping HOU.
To address this problem, a research team led by Chenping Hou published new research on 15 June 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposed Constrained Clustering with Weak Label Prior (CWLP), a method that considers compound weak label priors in an integrated framework. Within a unified spectral clustering model, the pairwise constraints are employed as a regularizer in the spectral embedding step, and the label proportions are added as a constraint in the spectral rotation step. In addition to theoretical convergence and computational complexity analyses, the experimental evaluation demonstrates the superiority of the proposed approach.
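As a rough illustration of how pairwise constraints can act as a regularizer on a spectral embedding, the sketch below encodes must-link pairs as the Laplacian L_M of a must-link graph and evaluates the quadratic penalty tr(F^T L_M F), which equals the sum of squared distances between must-linked rows of the embedding F. This is a common textbook construction and an assumption on our part: the release does not give CWLP's exact regularizer or say how cannot-link pairs are treated, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def must_link_laplacian(n: int, pairs: Sequence[Tuple[int, int]]) -> Matrix:
    """Hypothetical helper: Laplacian L = D - W of the must-link graph (W_ij = W_ji = 1)."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def trace_penalty(emb: Matrix, lap: Matrix) -> float:
    """tr(F^T L F) for an n x c embedding F stored row-wise."""
    n, c = len(emb), len(emb[0])
    return sum(
        emb[i][k] * lap[i][j] * emb[j][k]
        for k in range(c)
        for i in range(n)
        for j in range(n)
    )


def pairwise_penalty(emb: Matrix, pairs: Sequence[Tuple[int, int]]) -> float:
    """Same quantity written directly: sum of ||f_i - f_j||^2 over must-link pairs."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(emb[i], emb[j])) for i, j in pairs
    )


# Toy embedding: points 0/1 and 2/3 are close, so linking them costs little.
F = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
must_link = [(0, 1), (2, 3)]
L = must_link_laplacian(len(F), must_link)
print(trace_penalty(F, L), pairwise_penalty(F, must_link))  # both approximately 0.04
print(pairwise_penalty(F, [(0, 2)]))  # 2.0: linking far-apart points is penalized heavily
```

Adding a term like this, multiplied by a trade-off weight, to a spectral clustering objective pulls must-linked points toward each other in the embedding.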
In this research, both pairwise constraint information and cluster ratio information help increase confidence in the clustering result. The team therefore set out to establish a unified model that integrates both kinds of information simultaneously, which can effectively improve clustering performance.
Specifically, the pairwise constraint information is used as a regularization term in the spectral clustering model, and the cluster ratio is added as a constraint on the indicator matrix. To approximate a variant of the embedding matrix more precisely, the team replaces the cluster indicator matrix with a scaled cluster indicator matrix. Instead of fixing an initial similarity matrix in the integrated model, they learn a new similarity matrix that is better suited to deriving the final clustering result. These ideas can help reduce information loss and obtain a globally optimized clustering result. Extensive experiments on ten benchmark data sets clearly validate the effectiveness of the proposed method for constrained clustering with weak label prior.
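One way to picture a cluster-ratio constraint is at the discretization step: given a continuous score matrix (for example, a rotated spectral embedding), assign each point to exactly one cluster so that cluster k receives exactly n_k points. The sketch below, built on our own assumptions, converts ratios to integer sizes, solves the size-constrained assignment exactly as a min-cost flow, and builds the scaled indicator G = Y (Y^T Y)^(-1/2), whose nonzero entries are 1/sqrt(n_k) once the sizes are fixed. The objective (each score divided by sqrt(n_k)) and all function names are hypothetical illustrations, not the paper's algorithm, which the release does not spell out.

```python
import math
from typing import List, Sequence


def ratio_counts(n: int, ratios: Sequence[float]) -> List[int]:
    """Turn cluster ratios into integer sizes that sum to n (largest-remainder rounding)."""
    total = sum(ratios)
    raw = [r / total * n for r in ratios]
    counts = [math.floor(x) for x in raw]
    by_remainder = sorted(range(len(raw)), key=lambda k: raw[k] - counts[k], reverse=True)
    for k in by_remainder[: max(0, n - sum(counts))]:
        counts[k] += 1
    return counts


def ratio_constrained_assign(scores: List[List[float]], counts: Sequence[int]) -> List[int]:
    """Maximise sum_i scores[i][y_i] / sqrt(counts[y_i]) with exactly counts[k] points in
    cluster k, solved as a min-cost flow by successive shortest paths (Bellman-Ford)."""
    n, c = len(scores), len(counts)
    if sum(counts) != n:
        raise ValueError("cluster sizes must sum to the number of points")
    src, sink, num_nodes = 0, n + c + 1, n + c + 2
    graph: List[List[int]] = [[] for _ in range(num_nodes)]
    head: List[int] = []  # edge -> target node; edges e and e ^ 1 form a residual pair
    cap: List[int] = []
    cost: List[float] = []

    def add_edge(u: int, v: int, capacity: int, w: float) -> None:
        for a, b, cp, cs in ((u, v, capacity, w), (v, u, 0, -w)):
            graph[a].append(len(head))
            head.append(b)
            cap.append(cp)
            cost.append(cs)

    for i in range(n):
        add_edge(src, 1 + i, 1, 0.0)
        for k in range(c):
            if counts[k] > 0:
                add_edge(1 + i, 1 + n + k, 1, -scores[i][k] / math.sqrt(counts[k]))
    for k in range(c):
        if counts[k] > 0:
            add_edge(1 + n + k, sink, counts[k], 0.0)

    for _ in range(n):  # each augmentation routes one more point to a cluster
        dist = [math.inf] * num_nodes
        via = [-1] * num_nodes  # edge used to reach each node
        dist[src] = 0.0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[head[e]] - 1e-12:
                        dist[head[e]] = dist[u] + cost[e]
                        via[head[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == math.inf:
            raise ValueError("no feasible assignment")
        node = sink
        while node != src:  # push one unit of flow back along the shortest path
            e = via[node]
            cap[e] -= 1
            cap[e ^ 1] += 1
            node = head[e ^ 1]

    labels = [-1] * n
    for i in range(n):
        for e in graph[1 + i]:
            if 1 + n <= head[e] <= n + c and cap[e] == 0:  # saturated point->cluster edge
                labels[i] = head[e] - 1 - n
    return labels


def scaled_indicator(labels: Sequence[int], counts: Sequence[int]) -> List[List[float]]:
    """G = Y (Y^T Y)^(-1/2): row i holds 1/sqrt(n_k) in its cluster's column, 0 elsewhere."""
    return [[1.0 / math.sqrt(counts[k]) if y == k else 0.0 for k in range(len(counts))]
            for y in labels]


S = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]  # toy scores, e.g. a rotated embedding
sizes = ratio_counts(len(S), [0.5, 0.5])
labels = ratio_constrained_assign(S, sizes)
print(labels)  # [0, 0, 1, 1]; a plain row-wise argmax would give [0, 0, 0, 1]
G = scaled_indicator(labels, sizes)
```

With the sizes known, G has orthonormal columns (G^T G = I), which is one reason a scaled indicator can be matched to an orthonormal spectral embedding more closely than a plain 0/1 indicator.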
In our future work, methods to reduce the computational complexity of the proposed method are worth studying, so that its efficiency can be further improved and it can be applied to large-scale datasets.
DOI: 10.1007/s11704-023-3355-7
Journal: Frontiers of Computer Science
Method of Research: Experimental study
Subject of Research: Not applicable
Article Title: Constrained clustering with weak label prior
Article Publication Date: 15-Jun-2024