
Bharat Kumar
WHEN you knock on the door of Prabhakar Raghavan's residence, you half expect the young man who answers the bell to say, "Do come in. I will call papa." You expect to see a grey-haired man looking down his nose at you. Researchers who write books such as "Randomized Algorithms", which address computational problems, are supposed to look like that. But all the young man, clad in a T-shirt and shorts, said was, "C'mon in. Good to see you." (Okay, we admit he looked formal for the photo.)
As Chief Technology Officer at Verity Inc, a company that specialises in knowledge-retrieval, he is expected to chart the company's way forward. As its head of marketing, he is simultaneously expected to sell the concept within the company and sell its products outside.
Once the conversation begins, though, his passion for his work shows, and the excitement grows when you broach his favourite subject: knowledge-retrieval in organisations. A day before his vacation ended and he was to return to the US, Raghavan spoke to eWorld. Excerpts from the chat:
What is Verity doing in the enterprise solutions space when search engines and directories are already pervasive on the Web?
Searching for information within an organisation is very different from searching on the Web. Much of the Web's information is in HTML or dynamic HTML pages. Within organisations, that is not the case. WordStar is still a reality in many organisations, while the same company might be using Word 97 at another location. How do you search for information across all these diverse sources? They could differ in format, language or repository.
When two organisations merge, the complexity is even greater. Or if someone in accounts moves to marketing, there are certain documents he can no longer access. How do you manage this? You can't show him all the documents and then say "Access Denied" when he clicks on the summary of one he is not entitled to see. By then, the summary itself has compromised the information. You have to ensure that the user cannot reach such documents even through intelligent querying.
Verity helps pull information out of all these sources at what is commonly called the enterprise portal layer, bring it to a common footing and then extract the relevant information.
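A minimal sketch of the access problem described above, assuming a simple group-based access list on each document. The `Document`, `User`, `can_read` and `search` names and the sample data are hypothetical, and this is not Verity's implementation. What it illustrates is ordering: permissions are checked before a document is matched or summarised, so a user who moves from accounts to marketing never sees even a summary of the accounts material.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical ACL: groups that may read this document


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)


def can_read(user: User, doc: Document) -> bool:
    """A user may read a document if he belongs to at least one allowed group."""
    return bool(user.groups & doc.allowed_groups)


def search(index, user, term, summary_len=40):
    """Return (doc_id, summary) pairs for matching documents.

    The permission check runs before matching, so a summary is never built
    for a document the user is not entitled to see.
    """
    term = term.lower()
    hits = []
    for doc in index:
        if not can_read(user, doc):
            continue  # trimmed up front: not even the summary leaks
        if term in doc.text.lower():
            hits.append((doc.doc_id, doc.text[:summary_len]))
    return hits


# Hypothetical sample data.
index = [
    Document("pay-q3", "Salary revisions for Q3", frozenset({"accounts"})),
    Document("ads-q3", "Q3 campaign plan and salary budget", frozenset({"marketing"})),
]
user = User("Ravi", {"accounts"})
before = search(index, user, "salary")
user.groups = {"marketing"}  # Ravi moves from accounts to marketing
after = search(index, user, "salary")
```

A real system would also have to trim facets, hit counts and spelling suggestions, since each of those can leak the existence of a forbidden document.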
What if pages are generated dynamically based on a user request? You can't search in pages that do not exist. How do you tackle this?
Dynamic pages are a pain here. All you can do is look at the summary of contents that typically exists for each cluster of information, and classify the pages accordingly.
How do you classify material?
A directory service such as Yahoo! typically has people look at several thousand pages a day and classify them. You can also do this by rules-based classification: a user specifies rules once, changing them later only if he needs to, and searches are then conducted automatically. For instance, government intelligence agencies in the US use this method to get to information on their networks. If an employee wants updated information on all the droughts in Kazakhstan in the last five years, the system sends him every new update.
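Rules-based classification with standing queries can be sketched as follows. This is an illustration under stated assumptions, not the system the agencies use: a rule here is just a hypothetical set of terms that must all appear, the category names are made up, and the five-year window from the example is left out for brevity.

```python
import re
from collections import defaultdict

# Hypothetical rules: category -> terms that must all appear in a document.
RULES = {
    "drought/kazakhstan": {"drought", "kazakhstan"},
    "security/network": {"firewall", "intrusion"},
}


def words(text):
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(text, rules=RULES):
    """Return every category whose required terms all occur in the text."""
    present = words(text)
    return [cat for cat, required in rules.items() if required <= present]


class StandingQueries:
    """Rules are written once; each new document is pushed to subscribers."""

    def __init__(self, rules=RULES):
        self.rules = rules
        self.subscribers = defaultdict(list)  # category -> user names
        self.outbox = []  # (user, doc_id) notifications to deliver

    def subscribe(self, user, category):
        self.subscribers[category].append(user)

    def on_new_document(self, doc_id, text):
        for cat in classify(text, self.rules):
            for user in self.subscribers[cat]:
                self.outbox.append((user, doc_id))


feed = StandingQueries()
feed.subscribe("analyst", "drought/kazakhstan")
feed.on_new_document("r1", "Severe drought hits Kazakhstan's wheat belt.")
feed.on_new_document("r2", "Floods in Kazakhstan.")
```

The user's effort is front-loaded into writing the rule; after that, every incoming document is matched without further queries.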
Then there is another method: classification by example documents. This helps companies that don't want to key in rules; they want the software to do it. If you want to cluster together all data relating to glass, windows and panes, for instance, you feed in a glass company's brochure. If a subsequent search throws up Microsoft's Windows documents, you intervene and indicate the correction required. The software thus learns along the way.
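One common way to learn a category from example documents, with a user correcting mistakes, is Rocchio-style relevance feedback. The sketch below uses a simplified variant of that technique as a stand-in, since the interview does not say what Verity's software does. Term weights come from the example brochure; marking a wrongly retrieved Microsoft Windows document as irrelevant subtracts its terms, which lowers the weight of "windows" (to zero in this toy example). `ExampleProfile`, `beta`, `gamma` and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ExampleProfile:
    """Rocchio-style profile: add weight for terms in relevant examples and
    subtract (scaled by gamma) for documents the user marks as wrong."""

    def __init__(self, examples, beta=1.0, gamma=0.5):
        self.beta, self.gamma = beta, gamma
        self.weights = Counter()
        for text in examples:
            self.feedback(text, relevant=True)

    def feedback(self, text, relevant):
        factor = self.beta if relevant else -self.gamma
        for term, count in vectorize(text).items():
            self.weights[term] += factor * count

    def score(self, text):
        # Only terms with positive weight describe the category.
        positive = {t: w for t, w in self.weights.items() if w > 0}
        return cosine(vectorize(text), positive)


brochure = "Float glass for windows and panes. Toughened glass panes."
ms_doc = "Microsoft Windows operating system update for Windows"
profile = ExampleProfile([brochure])
before = profile.score(ms_doc)
profile.feedback(ms_doc, relevant=False)  # the user says "not what I meant"
after = profile.score(ms_doc)
```

Keeping only positive weights when scoring is a design choice for the sketch: negative evidence suppresses misleading terms without turning the profile into a description of what the category is not.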
Finally, there is thematic mapping. It is meant for a customer who has not automated his information and has no clue about rules and the like. This method uses a crawler to go into documents and suggest concepts and rules. It is not fully automatic: it only complements human specification of rules and provides direction. For instance, if it finds a lot of documents that discuss Musharraf and Kashmir together, it clusters them. Far away, in a computational sense, lies another cluster of documents related to Kyoto and global warming.
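Thematic mapping's first steps, grouping documents whose vocabulary overlaps and proposing candidate rule terms for a person to review, might look roughly like this. The sketch swaps in a simple single-pass "leader" clustering with Jaccard similarity; the threshold, stopword list, function names and headlines are all illustrative assumptions, and real systems use far richer document representations.

```python
import re
from collections import Counter

STOPWORDS = {"the", "and", "of", "in", "on", "a", "to", "for"}


def terms(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(docs, threshold=0.2):
    """Single-pass 'leader' clustering over (doc_id, text) pairs.

    A document joins the cluster whose first document (its leader) it most
    resembles, provided the Jaccard similarity reaches the threshold;
    otherwise it becomes the leader of a new cluster.
    """
    clusters = []  # list of (leader_terms, member_ids)
    for doc_id, text in docs:
        t = terms(text)
        best, best_sim = None, 0.0
        for leader, members in clusters:
            sim = jaccard(t, leader)
            if sim > best_sim:
                best, best_sim = members, sim
        if best is not None and best_sim >= threshold:
            best.append(doc_id)
        else:
            clusters.append((t, [doc_id]))
    return [members for _, members in clusters]


def suggest_rule(members, texts, k=2):
    """Propose the k most frequent terms in a cluster as a candidate rule
    for a human to accept, edit or reject."""
    counts = Counter()
    for doc_id in members:
        counts.update(terms(texts[doc_id]))
    return [term for term, _ in counts.most_common(k)]


texts = {
    "d1": "Musharraf speaks on Kashmir",
    "d2": "Kashmir talks: Musharraf meets envoy",
    "d3": "Kyoto protocol and global warming",
    "d4": "Global warming targets under Kyoto",
}
clusters = leader_cluster(texts.items())
```

The output of `suggest_rule` is only a proposal, which matches the point that thematic mapping complements rather than replaces human rule-writing.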
We have heard that social networks are a pet theme of yours. Can you elaborate?
The first step in knowledge-retrieval is search. The second is content organisation, which we have just discussed. The third is social networks. Here's how it works: if two people in a team regularly access documents on network security, the software should suggest that a third person follow a similar path. This is called collaborative filtering. If that third person regularly looks only at documents on operating systems, the software should virtually move him away from the network security cluster towards the operating systems cluster in a geometric space (mathematically referred to as a tensor space). His interests may coincide with those of a team that is geographically far away, but that is the team he is clustered with.
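Collaborative filtering, as described, can be sketched with user-based neighbours: documents opened by similar colleagues are suggested to a third person. The second function is a deliberate simplification of "moving" a user between clusters: instead of a geometric (tensor) space, it just places him in the topic that most of his reading falls under. All names and data are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, access, k=2):
    """User-based collaborative filtering.

    access maps each user to the set of documents he has opened. Unseen
    documents are scored by the summed similarity of the k most similar users
    who opened them, highest score first (ties broken alphabetically).
    """
    mine = access[target]
    neighbours = sorted(
        ((jaccard(mine, docs), user) for user, docs in access.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbours:
        if sim == 0:
            continue
        for doc in access[user] - mine:
            scores[doc] = scores.get(doc, 0.0) + sim
    return sorted(scores, key=lambda d: (-scores[d], d))


def home_cluster(user, access, doc_topic):
    """Place a user in the topic most of his documents belong to."""
    counts = Counter(doc_topic[d] for d in access[user])
    return counts.most_common(1)[0][0] if counts else None


access = {
    "asha": {"fw-guide", "ids-notes", "vpn-howto"},
    "bala": {"fw-guide", "ids-notes", "tls-faq"},
    "chitra": {"fw-guide"},
}
doc_topic = {
    "fw-guide": "network-security", "ids-notes": "network-security",
    "vpn-howto": "network-security", "tls-faq": "network-security",
    "kernel-sched": "operating-systems", "fs-journal": "operating-systems",
    "vm-paging": "operating-systems",
}
suggested = recommend("chitra", access)
first_home = home_cluster("chitra", access, doc_topic)
access["chitra"] |= {"kernel-sched", "fs-journal", "vm-paging"}  # her reading shifts
later_home = home_cluster("chitra", access, doc_topic)
```

Because the cluster is recomputed from behaviour rather than from an org chart, the user ends up grouped with whoever shares his reading, wherever they sit.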
Social networks should also pop up a window with the contact details of experts in the organisation when someone is looking for information in their field.
At IBM, for instance, we were asked to write down our skills and register them, so that knowledge could be shared easily. But that doesn't work: individually, people don't care about this. If software can capture this knowledge automatically...
What about the return on investment from your solutions? Have you been able to measure it?
Siemens is a customer. In a joint study with us, it found that it was saving $125,000 a day, so the investment paid for itself in about two weeks. The payback is measured in weeks rather than in months or years.
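As a back-of-the-envelope check, assume "about two weeks" means roughly 14 days and that the daily saving is constant. The payback period is then the up-front cost divided by the daily saving. The article does not give Siemens' actual cost, so the figure below is only what the quoted numbers imply:

```latex
T_{\text{payback}} = \frac{C}{s}
\quad\Longrightarrow\quad
C \approx s \times T_{\text{payback}}
  = \$125{,}000/\text{day} \times 14\ \text{days}
  = \$1{,}750{,}000
```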
Is there some kind of re-engineering your customers have to do before they can implement your product successfully? Or...
A knowledge-retrieval solution cannot be a shrink-wrapped product. If someone asked me what I thought of a portal in a box, I would say it would fit a shrink-wrapped organisation. There is no such organisation. We deploy services staff and outside consultants to shape the product to each customer's needs.
Each customer is at a different stage of readiness for such solutions.
May we step back a bit and look at the user industry as a whole? As CTO of a small technology company, what is your take on, for instance, e-commerce picking up?
Click-and-mortar is an important phenomenon now. In the early days, everyone thought the new players would rule the roost and that the old guard would give way. But that hasn't happened. Walmart is now one of the big players in the e-commerce business.
You were part of the IBM research team that discovered that the link structure of the Web takes the shape of a bow-tie. How did you arrive at that conclusion, and how can a company benefit from this discovery?
Over a period of time, we sent out crawlers, programs that automatically surf the Web and cull out information, to check which pages were linked to which. We looked at which pages you could start from and which you could end up at. We looked for pages that linked to another page where there was no reverse link. For instance, a company could link to the US Government site because the latter is a customer, but the US Government does not need to link back to the company's site. Then we looked at the reverse links: the number of pages that could reach a given page, as opposed to the number that could be reached from it.
This knowledge could be applied within organisations, to compare how people behave on the Web with how they behave behind a firewall. We are still finding out how this knowledge can help. For instance, you can cull out pages that have excellent information and link to other pages but are not linked to by other pages. You can then bring such a quality page closer to the cluster. This helps in content organisation.
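The crawl analysis above amounts to computing reachability in a directed link graph. The sketch below, on a hypothetical toy graph, finds the largest strongly connected component (the knot of the bow-tie, where every page can reach every other), then the IN side (pages that can reach the core) and the OUT side (pages reachable from it); tendrils and disconnected pieces are lumped together as OTHER. It uses a simple forward/backward-reachability method that is quadratic in the worst case, which is fine for illustration; a Web-scale crawl would need a linear-time algorithm such as Tarjan's.

```python
from collections import deque


def reach(graph, start):
    """All nodes reachable from start by following links (BFS, start included)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Flip every link, so 'reachable from' becomes 'can reach'."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def bow_tie(graph):
    """Split a link graph into CORE (largest strongly connected component),
    IN (can reach the core), OUT (reachable from the core) and OTHER."""
    nodes = set(graph) | {d for dsts in graph.values() for d in dsts}
    if not nodes:
        return {"IN": set(), "CORE": set(), "OUT": set(), "OTHER": set()}
    rev = reverse(graph)
    unassigned, core = set(nodes), set()
    while unassigned:
        n = next(iter(unassigned))
        scc = reach(graph, n) & reach(rev, n)  # n's strongly connected component
        if len(scc) > len(core):
            core = scc
        unassigned -= scc
    seed = next(iter(core))
    out_set = reach(graph, seed) - core
    in_set = reach(rev, seed) - core
    return {"IN": in_set, "CORE": core, "OUT": out_set,
            "OTHER": nodes - core - in_set - out_set}


# Hypothetical toy Web: page -> pages it links to.
web = {
    "startup": ["gov"],  # links to its customer; no link back
    "blog": ["univ"],
    "gov": ["univ"],
    "univ": ["news"],
    "news": ["gov", "archive"],
    "island": ["lonely"],
}
parts = bow_tie(web)
```

Here `startup`, which links into the core but gets no link back, lands in IN: the kind of page the answer describes pulling closer to a cluster.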
Picture: Mr Prabhakar Raghavan
Please e-mail us at eworld@thehindu.co.in if you have queries on computer usage or if you find an interesting way of using the computer.