Neurocomputing 2005

Tree view self-organisation of web content

By Richard Freeman, Hujun Yin

Abstract

When browsing a large set of unstructured documents, it is advantageous if the documents have been organised and presented in a way that makes navigation efficient, the underlying concepts easy to understand and related information quick to locate. This paper proposes a new method, termed Treeview self-organising maps (Treeview SOMs), for clustering and organising text documents by means of a series of independently and automatically created, hierarchical one-dimensional SOMs. For presentation and visualisation, the method generates a topological taxonomy tree for a set of unstructured text documents. The documents are organised in a hierarchy of dynamically generated and automatically validated topics extracted from the document corpus. The results, presented in a labelled tree view, clearly show the underlying contents of the documents and can help users browse the document set more efficiently than the results of previous work using SOMs or hierarchical clustering methods. A brief overview of general document clustering and a review of SOM-based document analysis methods are also provided, together with a comparison among them.

Keywords

Self-organising maps; Information retrieval; Browsing facility; Document clustering; Knowledge organisation

Bibliographic Details

@article{freemanNeuro05,
   Author = {Freeman, Richard and Yin, Hujun},
   Title = {Tree view self-organisation of web content},
   Journal = {Neurocomputing},
   Volume = {63},
   Pages = {415--446},
   Year = {2005}
}