Tree view self-organisation of web content
By Richard Freeman, Hujun Yin
Abstract
When browsing a large set of unstructured documents, it is advantageous if the documents have been organised and presented in a way that makes navigation efficient, makes the underlying concepts easy to understand and allows related information to be located quickly. This paper proposes a new method, termed Treeview self-organising maps (Treeview SOMs), for clustering and organising text documents by means of a series of independently and automatically created, hierarchical one-dimensional SOMs. For a set of unstructured text documents, the method generates a topological taxonomy tree that serves both to present and to visualise the collection. The documents are organised in a hierarchy of dynamically generated and automatically validated topics extracted from the document corpus. The results, presented in a labelled tree view, clearly show the underlying contents of the documents and can help users browse the document set more efficiently than previous work using SOMs or hierarchical clustering methods. A brief overview of general document clustering and a review of SOM-based document analysis methods are also provided, together with a comparison among them.
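The abstract does not give the training details, so the following is only a minimal sketch of the general idea of building a tree from a series of independent one-dimensional SOMs. It assumes term-frequency document vectors and a standard online SOM update (Gaussian neighbourhood along the chain of units, linearly decaying learning rate and radius). The names train_som_1d, best_matching_unit and build_tree, the parameters n_nodes, min_size and max_depth, and the rule of labelling each node with its highest-weighted term are all hypothetical; the paper's own topic extraction and automatic validation are not reproduced here.

```python
import math
import random


def best_matching_unit(x, weights):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])))


def train_som_1d(vectors, n_nodes, epochs=50, seed=0):
    """Train a one-dimensional SOM, i.e. a chain of n_nodes units, on dense vectors."""
    rng = random.Random(seed)
    # Initialise every unit as a copy of a randomly chosen input vector.
    weights = [list(rng.choice(vectors)) for _ in range(n_nodes)]
    order = list(range(len(vectors)))
    for epoch in range(epochs):
        frac = epoch / epochs
        alpha = 0.5 * (1.0 - frac)                      # learning rate decays linearly
        sigma = max(0.5, 0.5 * n_nodes * (1.0 - frac))  # neighbourhood radius shrinks
        rng.shuffle(order)
        for i in order:
            x = vectors[i]
            bmu = best_matching_unit(x, weights)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood measured along the 1-D chain of units.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += alpha * h * (x[k] - w[k])
    return weights


def build_tree(doc_ids, vectors, vocab, n_nodes=3, min_size=3,
               depth=0, max_depth=3, seed=0):
    """Recursively split documents with independent 1-D SOMs (hypothetical sketch)."""
    weights = train_som_1d(vectors, n_nodes, seed=seed)
    groups = {}
    for doc, vec in zip(doc_ids, vectors):
        groups.setdefault(best_matching_unit(vec, weights), []).append((doc, vec))
    if len(groups) < 2:
        return []  # this map did not split the documents, so stop expanding
    nodes = []
    for j in sorted(groups):  # siblings keep the order of their units on the chain
        members = groups[j]
        w = weights[j]
        node = {
            # Hypothetical labelling rule: the term with the largest unit weight.
            "label": vocab[max(range(len(w)), key=w.__getitem__)],
            "docs": [doc for doc, _ in members],
            "children": [],
        }
        if len(members) >= min_size and depth + 1 < max_depth:
            node["children"] = build_tree(
                node["docs"], [vec for _, vec in members], vocab,
                n_nodes, min_size, depth + 1, max_depth, seed + 1)
        nodes.append(node)
    return nodes
```

Each level of the resulting nested structure comes from a freshly trained one-dimensional map, so sibling nodes inherit the ordering of their units along the chain, which is one plausible way to keep a tree view topologically ordered. For term vectors drawn from two clearly separated topics, a two-unit map at the top level would be expected to place each topic under its own labelled node.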
Keywords
Self-organising maps; Information retrieval; Browsing facility; Document clustering; Knowledge organisation
Bibliographic Details
@article{freemanNeuro05,
Author = {Freeman, Richard and Yin, Hujun},
Title = {Tree view self-organisation of web content},
Journal = {Neurocomputing},
Volume = {63},
Pages = {415--446},
Year = {2005}
}