Cluster Validity Analysis on Soft Set Based Clustering

Rabiei Mamat, Mustafa Mat Deris, Ahmad Shukri Mohd Noor, Sumazly Sulaiman

Abstract


Data uncertainty is an important issue in categorical data clustering, since the boundaries between the resulting clusters are often arguable. To address this, an algorithm called Maximum Attribute Relative (MAR), based on the attribute relative of soft-set theory, was proposed previously. MAR exploits the data uncertainties in a multi-valued information system by introducing a series of clustering attributes, and the clusters are formed using the selected clustering attributes. However, a clustering algorithm defines clusters that are not known a priori, so the final clusters require validation. In this paper, the validity of the clusters produced by MAR is evaluated on two datasets obtained from the UCI-ML repository and a set of examination results obtained from the Malaysian Ministry of Education. The results show that the clusters produced by MAR have object similarity of up to 99%.
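The abstract does not say which validity measures were used to arrive at the similarity figure. As an illustration only, the sketch below computes purity, one common external validity measure that compares cluster assignments against known class labels. The function name `purity` and the toy labels are assumptions made for this example; they are not taken from the paper or from MAR itself.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Fraction of objects that belong to the majority class of their cluster."""
    if len(cluster_labels) != len(class_labels):
        raise ValueError("label sequences must have the same length")
    if len(cluster_labels) == 0:
        raise ValueError("at least one object is required")

    # Count how often each true class occurs inside each cluster.
    counts_by_cluster = {}
    for cluster, true_class in zip(cluster_labels, class_labels):
        counts_by_cluster.setdefault(cluster, Counter())[true_class] += 1

    # Each cluster contributes the size of its largest class.
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(cluster_labels)


# Hypothetical toy data: 10 objects, 3 clusters, 3 true classes.
clusters = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
classes = ["a", "a", "b", "b", "b", "b", "a", "c", "c", "c"]
score = purity(clusters, classes)
print(f"purity = {score:.2f}")
```

A purity of 1.0 means every cluster contains objects of a single class. Because purity tends to rise as the number of clusters grows, it is usually read alongside other measures rather than on its own.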

Keywords


Attribute Relative; Categorical Data; Data Clustering; Soft-Set

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

ISSN: 2180-1843

eISSN: 2289-8131