IDEAL 2009

Web Feed Clustering and Tagging Aggregator Using Topological Tree-Based Self-Organizing Maps

By Richard Freeman

Abstract

With the rapid and dramatic increase in web feeds published by different publishers, providers or websites via Really Simple Syndication (RSS) and Atom, users cannot be expected to scan, select and consume all the content manually. This is leading to information overload for consumers as the amount of content increases. With this growth there is a need to make the content more accessible and allow it to be efficiently searched and explored. This can be partially achieved by structuring and organising the content dynamically into topics or categories. Typical approaches make use of categorisation or clustering; however, these approaches have a number of limitations, such as the inability to represent the connections between topics and being heavily dependent on fixed parameters.

In this paper we apply the topological tree method to dynamically identify categories in a financial and business news feed dataset. The topological tree method is used to automatically organise an aggregation of the financial news feeds into self-discovered topics, and it allows a drill-down into sub-topics. The news feeds, organised using the topological tree method, are compared with those produced by typical web aggregators. The criteria for representing news feeds are discussed, along with the advantages of presenting underlying topics and providing a clear view of the connections between news topics. The topological tree was found to be a superior representation that is well suited for organising financial news content, and it could be applied to categorise and filter news more efficiently for market abuse detection.

Keywords

Feed aggregator, document clustering, RSS, Atom, blog, news feed, weblogs, web clustering, self-organizing maps, topological tree, neural networks, post retrieval clustering, taxonomy generation, enterprise content management, enterprise search, information management, topic maps.

Bibliographic Details

@inproceedings{freemanIdeal09,
   Author = {Freeman, Richard T.},
   Title = {Web Feed Clustering and Tagging Aggregator Using Topological Tree-Based Self-Organizing Maps},
   BookTitle = {Intelligent Data Engineering and Automated Learning-IDEAL 2009. Tenth International Conference, 23-26 September},
   Series = {Lecture Notes in Computer Science Vol.5788},
   Address = {Burgos, Spain},
   Publisher = {Springer-Verlag},
   Pages = {368-375},
   Year = {2009}
}