Microsoft's ancient circa-1997 Indexing Service gets no respect. And that's a shame, because it's a surprisingly decent content indexing engine that supports arbitrary metadata. Sure, there may be better choices, but Indexing Service's saving grace is that it's completely free. It's a default component of Windows 2000, Windows XP Pro, and Windows 2003 Server. And I'll show you how you can programmatically query it from .NET, too.
I have often wondered (and not yet found time to find out) whether the data gathered by the indexing service remains secure or whether it can be used to gain insight into documents you would otherwise have no access to.
One of the reasons I never got into using the custom fields in the Windows Explorer properties dialog (the one on the context menu) is that the MRU list seems to be shared between all user accounts, so comments placed against files can be read by anyone, regardless of permissions, just by opening a combo box list (at least this was the case when I last tried it).* That made me suspect the Indexing Service would also need some testing before I could trust it not to have the same problem, at which point it seemed like too much bother. It's a good example of the sort of obvious, vital question that documentation typically never covers.
*The other reason was that many applications recreate their document files when editing, so anything you put in there has only an odd chance of being retained over time.
I did some performance testing of Index Server using the BBS and MAGAZINE archives of http://www.textfiles.com. That’s 16,164 text files in 358 folders (408 megabytes total). The catalog.wci index folder was 123 megabytes. With that corpus, I got query times of…
"bbs", 7262 rows, ~400 ms
"phreak", 848 rows, ~50 ms
"hack", 2602 rows, ~135 ms
"apple", 1910 rows, ~100 ms
"atwood", 16 rows, ~5 ms
The number of results returned is obviously critical to Index Server performance, which makes it even more of a shame that I couldn’t get paging to work. Luckily, the MaxRecords property works fine to restrict the total number of results.
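As a rough back-of-the-envelope check on that claim, the timings above can be turned into a cost per returned row. This is only arithmetic on the numbers already quoted (sketched in Python for brevity); it does not query Indexing Service, and the `ms_per_row` helper is just an illustrative name.

```python
# (query, rows returned, approximate milliseconds), copied from the timings above
timings = [
    ("bbs", 7262, 400),
    ("phreak", 848, 50),
    ("hack", 2602, 135),
    ("apple", 1910, 100),
    ("atwood", 16, 5),
]


def ms_per_row(rows, ms):
    """Average milliseconds spent per returned row."""
    return ms / rows


for query, rows, ms in timings:
    # Convert to microseconds so the small per-row figures are easier to read.
    micros = ms_per_row(rows, ms) * 1000
    print(f"{query:8s} {rows:5d} rows  {micros:6.1f} microseconds/row")
```

For the four larger result sets this works out to somewhere between 50 and 60 microseconds per row, which is consistent with query time scaling roughly linearly with the number of rows returned. If that holds, capping the result count with MaxRecords should bound the query time more or less proportionally.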
I have an entirely dumb question here, which is how this compares with various desktop-search utilities (for example, the one from that search company with the big G, little o.)
Great post! I have another dumb question: how would you apply this to index a database-driven ASP.NET site, apart from having the site generate static HTML files?
how this compares with various desktop-search utilities
I am totally unclear how Index Server cooperates with the standard Windows Search, if at all. I’m not sure it does. There’s also that Advanced button in the file and folder properties dialog which contains a “For fast searching, allow Indexing Service to index this folder” checkbox.
In general, as a standard desktop search, it’s pretty weak. But as a basic indexing solution for programmers, it’s not bad.
apart from having the site generate static html files?
You got it. Indexing Service can only index files in the filesystem. There would have to be some kind of background or batch process that periodically generates HTML files representing your database and writes them to a series of folders. With META tag properties, this is totally feasible. I have it on good authority that http://www.drugstore.com still indexes its site this way, for example.
I’m sure it’s possible to come up with a fancier solution, but hey, this one is free!
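To make the batch-export idea concrete, here is a minimal sketch of what such a process might look like. It is written in Python for brevity rather than .NET, and everything specific in it is an assumption for illustration: the record fields (`id`, `title`, `body`, `meta`), the META property names, and the `render_page` / `export_records` helpers are all hypothetical. How the records are read from your database is not shown.

```python
import html
from pathlib import Path


def render_page(title, body, meta):
    """Build a small HTML document whose META tags carry the record's metadata."""
    meta_tags = "\n".join(
        f'<meta name="{html.escape(str(name), quote=True)}" '
        f'content="{html.escape(str(value), quote=True)}">'
        for name, value in meta.items()
    )
    return (
        "<html>\n<head>\n"
        f"<title>{html.escape(title)}</title>\n"
        f"{meta_tags}\n"
        "</head>\n"
        f"<body>\n{html.escape(body)}\n</body>\n</html>\n"
    )


def export_records(records, out_dir):
    """Write one HTML file per record into out_dir and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for rec in records:
        # Hypothetical record shape: a dict with id, title, body and meta keys.
        path = out / f"{rec['id']}.html"
        path.write_text(
            render_page(rec["title"], rec["body"], rec["meta"]),
            encoding="utf-8",
        )
        written.append(path)
    return written
```

The output folder would then be added to a catalog like any other directory of files, and the job re-run on whatever schedule keeps the exported pages acceptably fresh.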
Very nice article. I have been wondering if there was a way to index and display MetaData. This is probably the best information on the Indexing Service that I have found from around the internet, especially related to .NET integration.
There are free solutions out there that are a lot easier to deploy. Yes, this search engine ships as part of Windows, but it is a huge pain to configure, and it is opaque and generally confusing.
I have a question that is different from what you have posted here, but I thought you guys could probably answer it.
How do I change the default location of the catalog.wci directory? I have a web server with C and D drives, and the C drive is filling up. The catalog.wci folder is ~1.3 GB and growing, taking up space on the C drive. I want to change the path to the D drive, e.g. d:\inetpub instead of the default c:\inetpub. I have researched this and found that the location is set in the registry under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex
and specifically the key "ISAPIDefaultCatalogDirectory"
but when I look it up in the registry of the machine in question, it just shows "Web".
Any idea how I can change this?
Are there any implications to this change?
The catalog has hooks in the filesystem. It should update more or less automatically as files are added, deleted, or changed in the underlying filesystem.
So the catalog should always be very close to updated with no interaction at all. However, if you copy 50,000 files into a folder in the catalog, it might take a little while to get up to date…
It works programmatically, but I cannot make it work with those .idq files, whatever I try.
Where should those be located?
I put one .idq file in a folder and put the path in the registry where it was suggested, and nothing. I cannot query the parameter.
I repeat: through C# I can do that.
What I really need is to be able to programmatically add catalogs (I know how to do that), remove catalogs (I know how to do that too), add properties (I know how to do that too), and make them cached (HOW IS THAT DONE IN C#?). Please help me.
Does anybody know how to display the phrase around the hit in the page results?
I don’t want to highlight the hits in the document, but to show the title and, instead of the abstract, the phrase around the hit.
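One possible approach, sketched here in Python rather than .NET, is to build the snippet yourself after the query returns: open each result's text and cut out a window around the first match. This is a post-processing sketch, not a feature of Indexing Service; the `snippet_around_hit` helper is hypothetical, and it assumes you can read the document text yourself. It does a plain case-insensitive substring match, so it will not find hits that the index matched through stemming or other word forms.

```python
import re


def snippet_around_hit(text, term, radius=40):
    """Return up to `radius` characters either side of the first match, or None."""
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)
    # Mark truncation so the reader can tell the snippet is an excerpt.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end].strip() + suffix


print(snippet_around_hit("The quick brown fox jumps over the lazy dog", "fox", radius=6))
# prints "...brown fox jumps..."
```

Reading every hit's file to build a snippet adds I/O per result, so it is probably best applied only to the page of results actually being displayed.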
But I have a performance problem when I have to load about 2000 results. The delay I get is about 20 seconds, and the problem seems to be in this line:
Hi, I keep having difficulties implementing this on our Windows 2003 SBS server.
Catalogs are being built from the filesystem, not from web pages…
Is there a standard set of .aspx pages (or similar) that I could upload and, with some changes (pointing to the correct catalog), would actually work? I have tried dozens, and with none of them do I get a result in the browser.
When I query through the default Indexing Service interface, normal results appear…