Getting Started with Indexing Service

Microsoft's ancient circa-1997 Indexing Service gets no respect. And that's a shame, because it's a surprisingly decent content indexing engine that supports arbitrary metadata. Sure, there may be better choices, but Indexing Service's saving grace is that it's completely free. It's a default component of Windows 2000, Windows XP Pro, and Windows 2003 Server. And I'll show you how you can programmatically query it from .NET, too.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2005/12/getting-started-with-indexing-service.html

I have often wondered (and not yet found time to find out) whether the data gathered by the indexing service remains secure or whether it can be used to gain insight into documents you would otherwise have no access to.

One of the reasons why I never got into using the context menu properties dialog custom fields for documents in Windows Explorer is that the MRU list seems to be shared between all user accounts, so comments placed against files can be read by anyone regardless of permissions just by opening a combo box list (at least this was the case when I last tried it).* This then triggered the suspicion that perhaps the indexing service also needed some testing effort before trusting it to make sure that it didn’t have the same problem, at which point it seemed too much bother. A good example of the sort of obvious vital questions that documentation typically never covers.

*The other reason was that many applications recreate their document files when editing, so anything you put in there has only an odd chance of being retained over time.

A few notes.

  1. A shout out to my homey David Truxall, who had the only decent (aka not Server.CreateObject) .NET code sample for querying Index Server:

http://www.dotnetjunkies.com/WebLog/davetrux/archive/2004/03/03/8345.aspx

  2. I did some performance testing of Index Server using the BBS and MAGAZINE archives of http://www.textfiles.com. That’s 16,164 text files in 358 folders (408 megabytes total). The catalog.wci index folder was 123 megabytes. With that corpus, I got query times of…

"bbs", 7262 rows, ~400 ms
"phreak", 848 rows, ~50 ms
"hack", 2602 rows, ~135 ms
"apple", 1910 rows, ~100 ms
"atwood", 16 rows, ~5 ms

The number of results returned is obviously critical to Index Server performance, which makes it even more of a shame that I couldn’t get paging to work. Luckily the MaxRecords property works fine to restrict the total number of results.
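
With server-side paging a dead end here, one workaround is to cap the result set with MaxRecords and page through the capped rows in application code. A minimal sketch in Python, assuming the rows have already been fetched into a list. The helper `page_of` and the sample rows are hypothetical, and nothing here calls Index Server itself:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def page_of(rows: Sequence[T], page: int, page_size: int) -> List[T]:
    """Return the 1-based `page` of `rows`, or an empty list past the end."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return list(rows[start:start + page_size])


# Hypothetical stand-in for rows already fetched with MaxRecords = 25.
rows = [f"file{i:02d}.txt" for i in range(25)]
first = page_of(rows, 1, 10)  # rows 0..9
last = page_of(rows, 3, 10)   # the remaining rows 20..24
```

This only trims what is displayed. The query still pays for every row up to the MaxRecords cap, so the cap itself is what keeps query time down.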

I have an entirely dumb question here, which is how this compares with various desktop-search utilities (for example, the one from that search company with the big G, little o.)

Great post! I have another dumb question: how would you apply this to index a database-driven ASP.NET site, apart from having the site generate static HTML files?

how this compares with various desktop-search utilities

I am totally unclear how Index Server cooperates with the standard Windows Search, if at all. I’m not sure it does. There’s also that Advanced button in the file and folder properties dialog which contains a “For fast searching, allow Indexing Service to index this folder” checkbox.

In general, as a standard desktop search, it’s pretty weak. But as a basic indexing solution for programmers, it’s not bad.

apart from having the site generate static html files?

You got it. Indexing Service can only index files in the filesystem. There would have to be some kind of background or batch process that periodically writes HTML files representing your database into a series of folders. With the META tag properties, this is totally feasible. I have it on good authority that http://www.drugstore.com still indexes its site this way, for example.
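
A rough sketch of what such a batch export could look like, in Python for brevity. It assumes the records have already been read from the database into dictionaries. The names `render_page` and `export_records` and the record keys (`id`, `title`, `body`, `meta`) are made up for illustration:

```python
from html import escape
from pathlib import Path
from typing import Iterable, Mapping


def render_page(title: str, body: str, meta: Mapping[str, str]) -> str:
    """Build a small HTML document whose META tags carry the record's metadata."""
    meta_tags = "\n".join(
        f'    <meta name="{escape(name, quote=True)}" content="{escape(value, quote=True)}">'
        for name, value in meta.items()
    )
    return (
        "<html>\n  <head>\n"
        f"    <title>{escape(title)}</title>\n"
        f"{meta_tags}\n"
        "  </head>\n"
        f"  <body>{escape(body)}</body>\n"
        "</html>\n"
    )


def export_records(records: Iterable[Mapping], out_dir: Path) -> None:
    """Write one HTML file per record into out_dir (a folder the catalog covers)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for rec in records:
        html = render_page(rec["title"], rec["body"], rec["meta"])
        (out_dir / f"{rec['id']}.html").write_text(html, encoding="utf-8")
```

Run on a schedule, something like `export_records(rows, Path("export"))` would keep a folder of pages for the catalog to pick up. Deleting pages for records that no longer exist is left out of this sketch.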

I’m sure it’s possible to come up with a fancier solution, but hey, this one is free!

but it is a huge pain to configure, it is opaque and generally confusing

Well, I’m not so sure about that. Anyone can set up a quick Indexing Service demo app within 10-15 minutes using what I just posted.

It’s kind of clunky, definitely, but it’s not so complicated if you have sample code and guidance (e.g., this post).

Very nice article. I have been wondering if there was a way to index and display MetaData. This is probably the best information on the Indexing Service that I have found from around the internet, especially related to .NET integration.

Very nice job.

There are free solutions out there that are a lot easier to deploy. Yes, this search engine ships as part of Windows, but it is a huge pain to configure, it is opaque and generally confusing.

If you need free-text search in a program, I’d be happier with something simple like SWISH-E (http://swish-e.org/), for example.

I have a question that is different from what you have posted here, but I thought you guys could probably answer it.
How do I change the default location of the catalog.wci directory? I have a web server with C and D drives, and C is filling up. The catalog.wci directory is ~1.3 GB and growing, taking up space on the C drive. I want to change the path to the D drive, e.g. d:\inetpub instead of its default location c:\inetpub. I have researched this and found that the location is set in the registry under: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex

and specifically the key "ISAPIDefaultCatalogDirectory"
but when I look it up in the registry of the machine in question it just shows "Web".
Any idea how I can change this?
Are there any implications to this change?

Thanks
Vijay

How do i change the default location of the catalog.wci directory

I would delete the current index and recreate it. You select the physical location of the catalog in steps 1-3, above.

The catalog has hooks in the filesystem. It should update more or less automatically as files are added, deleted, or changed in the underlying filesystem.

So the catalog should always be very close to updated with no interaction at all. However, if you copy 50,000 files into a folder in the catalog, it might take a little while to get up to date…

Hi, when I delete a file it still shows up as a search result for a few hours. Is there a way to trigger an update on each delete?

Is there some way to set the catalog to incrementally update itself on a schedule (e.g. at 23:00 daily)?

It works programmatically, but I cannot make it work with those .idq files whatever I try.

Where should those be located?

I put one .idq file in a folder and put the path in the registry where it was suggested, and nothing. I cannot query the parameter.

I repeat: through C# I can do that.

What I really need is to be able to programmatically add catalogs (I know how to do that), remove catalogs (I know how to do that too), add properties (I know how to do that too) and make them cached (HOW IS THAT DONE IN C#?). Please help me.

Thank you in advance.

Does anybody know how to display the phrase around the hit in the page results?
I don’t want to highlight the hits in the document, but to show the title and, instead of the abstract, the phrase around the hit.
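
A rough sketch of one approach, assuming your own code can read the text of each hit document after the query returns: find the first occurrence of the search term and cut a window around it. The function name `snippet` and the `radius` parameter are made up for illustration, and this does not use any Indexing Service API:

```python
def snippet(text: str, term: str, radius: int = 40) -> str:
    """Return the text around the first case-insensitive match of term, or '' if absent."""
    # Assumes lowercasing does not change the string length (true for ASCII text).
    pos = text.lower().find(term.lower())
    if pos < 0:
        return ""
    start = max(0, pos - radius)
    end = min(len(text), pos + len(term) + radius)
    # Mark the sides where the document continues beyond the window.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end].strip() + suffix


sample = "The quick brown fox jumps over the lazy dog"
around_fox = snippet(sample, "FOX", radius=6)
```

The window is cut by character count, so it can start or end in the middle of a word. Snapping to the nearest whitespace would be a natural refinement.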

Is there a way I can highlight the search words in the searched document?

First of all, this is a very useful article! Thanks for sharing!

Karina on October 6, 2006 12:46 AM
Is there a way I can highlight the search words in the searched document?

Yes, there is: http://www.nsftools.com/misc/SearchAndHighlight.htm
It works with JavaScript.

Hi, the article is really useful!

But I have a performance problem when I have to load about 2000 results. The delay I get is about 20 seconds, and the problem seems to be in this line:

da.Fill(ds, o, "IndexServerResults");

Can you advise me what to do about it?

Hi, I keep having difficulties implementing this on our Windows 2003 SBS server.
Catalogs are being built from the filesystem, not from web pages…
Is there a standard set of aspx pages (or other) that I can upload and that, with some changes (pointing to the correct catalog), would actually work? I have tried dozens, and none of them gives me a result in the browser.
When I query through the default Indexing Service interface, normal results appear…

Much appreciated thanks in advance…
Regards,