Online edition of India's National Newspaper
Monday, Apr 22, 2002

About Us
Contact Us
Business
News: Front Page | National | Southern States | Other States | International | Opinion | Business | Sport | Miscellaneous |
Advts:
Classifieds | Employment | Obituary |

Business

News summarising services

THE NET, with its hundreds of news sites, is a rich source of the latest news and analysis. This week NetSpeak looks at a couple of services that automatically collect and summarise news stories from multiple Net sources.

NewsInEssence (NIE): This is a multi-source summarisation service developed by a few researchers at the University of Michigan (http://www.umich.edu/clair). It collects news on a specified topic from different online news sites and automatically presents all the related articles along with a summary. This gives you access to multiple articles on a subject in one place.

Besides presenting its own clusters of related articles on subjects it selects, NIE lets you build your own cluster of articles on subjects of your choice. A cluster can be created either from a search string or from the URL of a news story. That is, if you come across an interesting article on a subject, say `web services', and want links to similar articles, use the URL of that article as a `seed' to create your cluster. Or, if you want links to the latest write-ups on `distributed computing', type the string in the `Query Words' box and start the cluster-building process. Before starting, you can also adjust cluster parameters such as the article sources and the strength of similarity required between articles.

Once the process is initiated, the service's news robot, `NewsTroll', is activated and a browser window with the message `Running NewsTroll' appears. The robot scans all the sources that NIE understands (at present, the supported news sites include BBC, CNN, MSNBC, USA Today and Yahoo). Within a few minutes you are presented with a results table from which you can either read the full text of the related articles or obtain a summary of all the documents. If you provide your e-mail address, the results can also be e-mailed to you. Check out the service at: http://www.newsinessence.com.
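
NIE does not explain here how it decides that two articles are `similar'. A common textbook approach, assumed purely for illustration, is to turn each article into a bag of word counts, compare articles by cosine similarity and keep those above a threshold. The Python sketch below seeds a cluster with one article's text, much as the URL-seed option does; a `Query Words' string could be passed as the seed in the same way. The function names and the 0.3 threshold are hypothetical, not NIE's.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (lower-cased letters only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors, from 0.0 to 1.0."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_cluster(seed_text, candidates, threshold=0.3):
    """Return (title, score) pairs for candidates similar to the seed, best first.

    `threshold` stands in for a "strength of similarity" setting; 0.3 is an
    arbitrary illustrative value (hypothetical, not taken from NIE).
    """
    seed = term_vector(seed_text)
    scored = [(title, cosine(seed, term_vector(body))) for title, body in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical stories; a real service would fetch these from news sites.
    stories = {
        "Story A": "Vendors agree on new web services standards built on SOAP and XML",
        "Story B": "Cricket match report from the stadium",
    }
    print(build_cluster("web services standards such as SOAP and XML", stories))
```

A real service would first fetch candidate stories from its supported sites. Here they are simply passed in as a dictionary mapping titles to text.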

Newsblaster: Newsblaster, from Columbia University, is another attempt to automate the process of news aggregation, summarisation and delivery. The service classifies news articles into categories such as `U.S.', `World', `Finance' and `Science/Technology'. Newsblaster scans online news sources such as ABC News, CNN, Wired, the Washington Post and USA Today, locates the articles related to a subject and produces a summary of their content. Apart from the summary, the service displays links to all the articles from which it generated the summary. If interested, visit: http://www.cs.columbia.edu/nlp/newsblaster/.
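
This column does not describe Newsblaster's own summariser. The following deliberately simple extractive sketch illustrates the basic idea of multi-document summarisation, but it is an assumption and not Newsblaster's method. Each sentence from all the articles on a subject is scored by how often its (non-stopword) words occur across the whole set. The top-scoring sentences are then returned in their original order. The stopword list and the `max_sentences` parameter are illustrative choices.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from any real system).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is",
             "was", "for", "at", "by", "with", "that", "it", "as"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lower-cased words of a sentence, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarise(articles, max_sentences=2):
    """Pick the sentences whose words are most frequent across all articles."""
    sentences = [s for article in articles for s in split_sentences(article)]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, seen = [], set()
    for i in ranked:  # skip verbatim duplicates reported by several sources
        if sentences[i] not in seen:
            seen.add(sentences[i])
            chosen.append(i)
        if len(chosen) == max_sentences:
            break
    return [sentences[i] for i in sorted(chosen)]  # restore original order


if __name__ == "__main__":
    docs = ["The storm hit the coast on Monday. Officials closed schools.",
            "The storm flooded the coast. A local team won its match."]
    print(summarise(docs))
```

Because words shared by several sources score higher, sentences about the common story tend to win over unrelated ones. This is also why such summaries can miss details that only one source reports.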

Though the summaries generated by these services may not always be accurate, they can at least give you some knowledge of the topics.

PDF to text conversion

A few weeks ago, this column featured a couple of programs and services that convert a document file into a PDF file. A number of readers have asked if there is a tool that does the opposite: converts a PDF file to a Word or HTML file. A couple of products that do this are discussed here.

Gemini Solo

Gemini Solo (http://www.iceni.com/soloSet.html) enables you to convert PDF documents into an array of popular formats, including RTF, HTML, JPEG, TIFF and plain text. To try the software, download the demo version from the site. After installing it, load the PDF file that needs to be converted. The program has its own PDF viewer, so it reads the PDF file and displays it. To convert the document, click the `Export' option, select the pages to be included in the new document and use the `Options' button to set details such as the `Text Output' format and the `Image Output' format. For example, if you want the PDF file converted to a Word file, select the `RTF' option in the `Text Output' format box. After entering the required values, click the OK button and provide an output file name. As the program itself notes, the output from the demo version will not be completely accurate and may contain errors.

Advanced PDF to HTML converter: This is a reasonably good program that converts a PDF document into HTML pages. If you are using the evaluation version, the software adds the message `evaluation version' to some of the HTML pages. The program can be downloaded from http://www.intrapdf.com (via the `Get trial package' option).

The above programs are by no means perfect. Readers are invited to send information about better products.

Teoma: A new search engine

Teoma (available at: http://www.teoma.com/) is a new search engine that claims to use better technology than the currently popular search service, Google. Teoma uses what it calls `subject-specific' popularity to rank a page (http://static.wc.teoma.com/docs/teoma/about/searchWithAuthority.html), and the company claims that this technology helps it provide more relevant links. For every search, the service provides three types of results: `Results' contains web pages relevant to the search, `Refine' contains suggestions to narrow down the search, and `Resources' contains sites created by experts on the specified search topic.
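
This article does not give Teoma's exact ranking formula. One well-known technique built on the same idea of ranking pages within a subject community is Jon Kleinberg's HITS algorithm. HITS scores `authorities' (pages that many on-topic pages link to) and `hubs' (pages, such as expert link lists, that link to good authorities). These hubs loosely resemble Teoma's `Resources'. The Python sketch below implements plain HITS on a small, hypothetical link graph and should not be read as Teoma's actual algorithm.

```python
import math


def hits(links, iterations=50):
    """Kleinberg-style hub and authority scores on a small link graph.

    `links` maps each page to the pages it links to. All pages are assumed to
    be already restricted to one subject (e.g. the pages matching one query).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page is a good authority if good hubs on the subject link to it.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # A page is a good hub if it links to good authorities.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    # Hypothetical graph: three expert guides pointing at topic sites.
    graph = {"guide1": ["siteA", "siteB"], "guide2": ["siteA", "siteB"],
             "guide3": ["siteA"], "siteB": ["siteC"]}
    hubs, authorities = hits(graph)
    print(max(authorities, key=authorities.get), max(hubs, key=hubs.get))
```

Unlike a single global popularity score, these scores are computed only over pages on one subject. A site that is popular in general but rarely cited by on-topic pages therefore gains little.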

J. Murali

(The author can be contacted at: murali27@satyam.net.in)



Copyright © 2002, The Hindu. Republication or redissemination of the contents of this screen are expressly prohibited without the written consent of The Hindu