Neural Networks 2004

Adaptive topological tree structure for document organisation and visualisation

By Richard Freeman and Hujun Yin

Abstract

The self-organising map (SOM) is finding a growing number of applications in a wide range of fields, such as clustering, pattern recognition and visualisation. It has also been employed in knowledge management and information retrieval. We propose an alternative to existing two-dimensional SOM-based methods for document analysis. The method, termed Adaptive Topological Tree Structure (ATTS), generates a taxonomy of underlying topics from a set of unclassified, unstructured documents. The ATTS consists of a hierarchy of adaptive self-organising chains, each of which is validated independently using a proposed entropy-based Bayesian information criterion. A node meeting the expansion criterion spans a child chain, with reduced vocabulary and increased specialisation. The ATTS creates a topological tree of topics, which can be browsed like a content hierarchy and reflects the connections between related topics at each level. A review of existing neural-network-based methods for document clustering and organisation is also given. Experimental results on real-world datasets using the proposed ATTS method are presented and compared with other approaches. The results demonstrate the advantages of the proposed validation criteria and the efficiency of the ATTS approach for document organisation, visualisation and search. They show that the proposed methods not only improve clustering results but also boost retrieval performance.
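
To make the idea of a self-organising chain more concrete, the following is a minimal sketch of a generic one-dimensional SOM trained on toy term-frequency vectors. It is not the authors' ATTS implementation: it omits the entropy-based Bayesian information criterion, the adaptive growth of the chain and the expansion of nodes into child chains. The function names, parameters (`n_nodes`, `lr0`, `sigma0`), decay schedules and the four-word toy vocabulary are all assumptions made for illustration.

```python
import math
import random


def best_matching_node(weights, x):
    """Return the index of the node whose weight vector is closest to x."""
    dists = [sum((w_k - x_k) ** 2 for w_k, x_k in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_chain(docs, n_nodes=4, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D self-organising chain on equal-length document vectors.

    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(docs[0])
    # Initialise each node's weight vector with random values in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    total_steps = epochs * len(docs)
    step = 0
    for _ in range(epochs):
        order = list(range(len(docs)))
        rng.shuffle(order)
        for idx in order:
            x = docs[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.1)   # neighbourhood width shrinks
            winner = best_matching_node(weights, x)
            for j in range(n_nodes):
                # Gaussian neighbourhood measured along the chain index.
                h = math.exp(-((j - winner) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    weights[j][k] += lr * h * (x[k] - weights[j][k])
            step += 1
    return weights


# Toy term-frequency vectors over a hypothetical 4-word vocabulary.
docs = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
weights = train_chain(docs)
# Each document is assigned to its best-matching node on the chain.
assignments = [best_matching_node(weights, d) for d in docs]
print(assignments)
```

Because neighbouring nodes on the chain are updated together, documents with similar vocabularies tend to land on the same or adjacent nodes, which is the kind of topological ordering that a tree of chains could then exploit at each level.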

Keywords

Self-organizing maps; Document clustering; Information retrieval; Growing network; Unsupervised learning; Text mining; Taxonomy generation

Bibliographic Details

@article{freemnNNet04,
   Author = {Richard Freeman and Hujun Yin},
   Title = {Adaptive topological tree structure for document organisation and visualisation},
   Journal = {Neural Networks},
   Volume = {17},
   Number = {8-9},
   Pages = {1255--1271},
   Year = {2004}
}