Your free search engine – Microsoft Indexing Server
Posted by Klaus Salchner on Sunday, May 15, 2005 (EST)

Microsoft Indexing Server is a powerful indexing and search engine for your web or file search. This article explains how to set up Indexing Server and how to search its index from right within your application.


IndexingServerSample_VS2003.zip (59.47 KB)
IndexingServerSample_VS_2005_Beta2.zip (72.73 KB)


Introduction

Many web applications let users search all of their content. At first glance this looks like a difficult task. How can you search all the files in your web application and at the same time provide the robust search capabilities most users have come to take for granted? Examples are searching for any combination of words with AND or OR, searching for phrases, or searching for partial word matches. Building such a search engine is no small task. Many custom solutions implement their own search algorithms; they are either cheap but limited in their search capabilities, or they are expensive.

But there is no need to build your own solution or to buy an expensive one. Microsoft provides a solution to this problem: Microsoft Indexing Server. It comes as part of Windows 2000, Windows XP and Windows 2003 and does not require any additional licensing. This article explains the capabilities of the Indexing Server and walks through the code you need to provide search capabilities from right within your application. It is important to understand that the Indexing Server indexes files on the file system. It is not a web spider that walks your web site, finds all linked pages and indexes them; web spiders are used by search engines like Google, MSN and Yahoo to index web pages. But the Indexing Server can be pointed to a local web site. It then looks up the physical path where the files for this web site are located, indexes those files and also stores the virtual path of each file. This way you can still index the files of your local web application and know the URL for each file.


How to install Microsoft Indexing Server?

Windows 2000 comes with MS Indexing Server 2.0; Windows XP and Windows 2003 come with MS Indexing Server 3.0. This article concentrates on MS Indexing Server 3.0, although almost everything applies to the previous version as well. MS Indexing Server is a separate Windows component which needs to be installed. Go to "Add or Remove Programs" in your Control Panel and select "Add/Remove Windows Components". Make sure that the component "Indexing Service" is installed. It is important to install this component after IIS has been installed. IIS has an Indexing Service extension, which only gets installed if the Indexing Service is installed after IIS. If this extension does not show up in IIS, uninstall the "Indexing Service" component and then reinstall it. Afterwards you will see the "Indexing Service" extension in IIS as well. More info on this issue can be found here [^].


How to start the Indexing Service?

You configure the Indexing Server through "Computer Management". In the left side pane, under the entry "Services and Applications", you find an entry called "Indexing Service". Right click on "Indexing Service" and select Start from the popup menu. The first time you do this it also asks whether you want to start the service when the computer starts up. Answer yes, which sets the "Indexing Service" to automatic startup. From here you can also stop, pause and resume the Indexing Service. When you expand the "Indexing Service" entry with the plus sign you see a list of catalogs. A catalog is a group of folders which gets indexed for search. The Indexing Server comes out of the box with two catalogs, System and Web, but you can create custom catalogs as needed.


How is the System catalog used?

The System catalog is used by the Windows file search function. Windows file search opens when you select "Search" from the Windows start menu, or when you open the file explorer and click on the "Search" button. If the Indexing Service is not running or there is no System catalog, it searches the actual file system. When the Indexing Service is running and the System catalog is available, it uses the catalog instead. You can also configure whether the Search function uses the System catalog. At the bottom of the search pane (in the file explorer) select the option "Change Option". Select the option "With Indexing Service" and then select the option "Yes, enable Indexing Service". You can turn the usage of the "Indexing Service" off the same way. Using the Indexing Service makes searching faster because all the files have already been indexed and the system only needs to search the index instead of the file system itself.

Expand the "System" catalog with the plus sign to find out more details about it. Select the Directories entry to see all the folders which are included in or excluded from this catalog. By default this includes the folders "c:\" and "c:\Documents and Settings" but excludes the folders "c:\Documents and Settings\*\Application Data\*" and "c:\Documents and Settings\*\Local Settings\*". This means it excludes the "Application Data" folder with all its subfolders as well as the "Local Settings" folder with all its subfolders for every user profile. You can double click each folder and change it to be included or excluded. You can also add new folders by right clicking on the "Directories" entry in the left side pane and then selecting "New | Directory" from the popup menu. You can enter a local folder or a UNC path to any network folder. For network folders you also need to enter the username and password to use.


How is the Web catalog used?

The Web catalog is only available when IIS has been installed. By default it points to the "Default Web Site" of the local IIS instance. Right click on the "Web" entry in the left side pane and select Properties from the popup menu. Select the Tracking tab to see which web site this catalog is pointing to. You can select any existing web site of your local IIS instance. The Indexing Service finds the physical path of this web site and adds it to the directories to be indexed. It also looks for all virtual folders configured under this web site and adds their physical paths to this catalog as well.

The "Indexing Service" extension ties IIS right into the Indexing Server. In the IIS Manager open the properties dialog box of the web site or of any virtual folder configured under this web site. Under the "Home Directory" or "Virtual Directory" tab you see the option "Index this resource". Unselecting this option and saving the settings removes this physical path from the appropriate catalog in the Indexing Server (this requires that the catalog is started, and it may take a while until the change takes effect). Selecting this option automatically adds the physical folder to the appropriate catalog again. This lets you control right from within the IIS Manager which folders get indexed. You also have the "Index this resource" option for all file system folders shown under a web site or virtual folder. But I have not seen this option make any difference for file folders shown in the IIS Manager.

For this to work on IIS 6, which comes with Windows 2003 (Windows XP ships with IIS 5.1), you need to make sure that the "Indexing Service" web service extension is allowed. Open the IIS Manager and select the "Web Service Extensions" item in the left side pane. The right side shows all extensions, and by default the "Indexing Service" is prohibited. Select it and enable it through the "Allow" button. Also make sure that IIS has been installed before the Indexing Service, as explained earlier.


What other administrative options are available?

You can create new catalogs by right clicking on the "Indexing Service" entry in the left side pane and selecting "New | Catalog" from the popup menu. Enter the name of the catalog and the folder where the catalog files are stored. You then need to stop and restart the Indexing Service itself so that the catalog files for the new catalog get created. You can also stop, pause and start individual catalogs by right clicking on the catalog name and selecting the appropriate option under "All Tasks" in the popup menu. You can add or remove folders through the "Directories" entry under the catalog name. If you want to index a web site, open the properties of the catalog (right click on the catalog name and select Properties from the popup menu) and select the web site to index under the "Tracking" tab.

The Indexing Service can also create an abstract for the indexed files. Select the "Generation" tab of the properties dialog and uncheck the option "Inherit above settings from Service". Then select the option "Generate abstracts" and enter the maximum length of the abstract. You can also set this through the properties of the Indexing Service itself, which then applies to any catalog which inherits the settings from the Indexing Service.

Windows also gives administrators control over which folders or files can be indexed by the Indexing Server. This allows you to protect sensitive files so that they never get included in an index and therefore never show up in a search result. A good example would be financial details about the company. Open the file explorer and navigate to the appropriate folder or file. Bring up the properties of the folder or file, click on the Advanced button, uncheck or check the option "For fast searching, allow Indexing Service to index this file" and then save the settings. If you selected a folder, it asks whether this setting should be applied to all subfolders or just to the selected folder itself. For example, the catalog files themselves all have that setting unchecked, so the Indexing Server never tries to index its own catalog files.

You can also query the catalog through the Computer Management console. Expand a catalog with the plus sign and you will see an entry called "Query the Catalog". This brings up a web page with a simple query form. You can perform simple or advanced searches. Select the option "Standard query (free text)", type in a search term and then click the Search button. This will perform a simple search and display any matches below the search form. Select the "Advanced query" and then type in a complex search term which can include operators like AND and OR. We will look at the actual query language used by Indexing Server later in this article. This is a convenient way for administrators to test the actual catalog.

Indexing Server is very easy to use and very powerful. The only annoyances in production use are that you need to restart the Indexing Service after creating a new catalog and stop it before deleting a catalog. Sure, these are not everyday tasks, but when you manage many catalogs, potentially for many customers in a hosted environment, this gets a bit annoying.


How can I query an Indexing Server catalog from within my code?

You can query an Indexing Server catalog through the standard OLEDB data provider. The connection string tells the OLEDB data provider which provider is used, which in this case is the MSIDXS (Microsoft Indexing Server) provider. You can query a local Indexing Server catalog or a catalog on a remote Indexing Server. When querying a local Indexing Server catalog you also specify the data source in the connection string. Use the name of the local catalog as the data source, for example:

const string ConnectionString = "Provider=MSIDXS; Data Source=\"Web\";";
const string QueryString = "SELECT * FROM WEBINFO;";

You don't specify a data source when you query a catalog on a remote Indexing Server. So you just specify the provider. The name of the remote Indexing Server and the catalog are then specified in the query itself. Here is an example:

const string ConnectionString = "Provider=MSIDXS;";
const string QueryString = "SELECT * FROM EnterpriseMinds.Web..WEBINFO;";

The following sections cover the query language in more detail, but you can already see that the FROM clause has changed. First you specify the name of the remote machine followed by a dot, then the Indexing Server catalog you want to query on that remote machine followed by two dots, and finally the actual view you want to query. The following code snippet shows how to connect to the OLEDB data source, execute the query and return a data reader with the result-set.

using System.Data;
using System.Data.OleDb;

public static IDataReader Query(string ConnectionString, string QueryString)
{
 // get an OLEDB connection object and set the connection string
 OleDbConnection Connection = new OleDbConnection();
 Connection.ConnectionString = ConnectionString;

 // set the query string to execute
 IDbCommand Command = Connection.CreateCommand();
 Command.CommandText = QueryString;
 Command.CommandType = CommandType.Text;

 // open the data connection
 Connection.Open();

 // execute the query and return the data reader; CommandBehavior.CloseConnection
 // makes closing the reader also close the connection object
 return Command.ExecuteReader(CommandBehavior.CloseConnection);
}

The caller provides the connection string and the query string. First we create an OLEDB connection object and set the connection string on it. Next we create an OLEDB command object and set the query string. Finally we open the OLEDB connection, execute the command and return an OLEDB data reader. Closing the data reader also closes the underlying OLEDB connection, provided the reader was created with CommandBehavior.CloseConnection. In the next code snippet we use the above method to provide a simplified LocalQuery method. The caller passes the catalog name and query string and does not need to know the details of the connection string:

// - the OLEDB data provider to use for searching the Indexing Server is MSIDXS;
// - when we connect to a specific indexing catalog then you specify the data
// source part of the connection string and set it to the indexing catalog name
const string ProviderConnectionString = "Provider=MSIDXS;";
const string DataSourceConnectionString = " Data Source=\"{0}\";";

public static IDataReader LocalQuery(string CatalogName, string QueryString)
{
 string ConnectionString = ProviderConnectionString + String.Format(DataSourceConnectionString, CatalogName);

 // perform the query and return result
 return Query(ConnectionString, QueryString);
}

The method assembles the connection string, which includes the catalog name as the data source, and then calls the Query method with the connection string and the query string. The next code snippet uses the Query method again to create a RemoteQuery method. It lets you query any catalog on any reachable remote machine. It again hides the details of the connection string and of how the query string has to be modified:

// - the OLEDB data provider to use for searching the Indexing Server is MSIDXS;
const string ProviderConnectionString = "Provider=MSIDXS;";

// FROM clause and dot-notation character
const string FromClause = " FROM ";
const string DotNotation = ".";

public static IDataReader RemoteQuery(string RemoteMachineName, 
 string CatalogName,
 string QueryString)
{

 // replace the FROM clause with FROM remote_machine.catalog_name..from_clause,
 // for example FROM SCOPE() against the remote machine enterpriseminds and the
 // catalog WEB becomes FROM enterpriseminds.web..SCOPE(); this allows querying
 // remote indexing catalogs
 QueryString = QueryString.Replace(FromClause, 
 FromClause + 
 RemoteMachineName +
 DotNotation + 
 CatalogName + 
 DotNotation + 
 DotNotation);

 // perform the query and return result
 return Query(ProviderConnectionString, QueryString);
}

The method searches for the FROM clause in the query string and replaces it with FROM followed by the remote machine name, a dot, the catalog name and two dots. It uses a connection string without a data source. Finally it calls the Query method with the connection string and the modified query, so it can query the remote Indexing Server catalog. Note that String.Replace is case sensitive and replaces every occurrence, so the query must contain the keyword exactly as " FROM " and only once. As you can see, it is very easy to query local and remote catalogs using the OLEDB data provider.
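Once you hold the data reader, you still have to walk the result-set. The following is a minimal sketch of one way to do that, not the code of the attached sample application. The class name ResultReader and the method ReadAll are made up for illustration; only the IDataReader members are standard ADO.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper (not part of the article's sample): copies selected
// columns of every row into memory and closes the reader afterwards.
public static class ResultReader
{
    public static List<Dictionary<string, object>> ReadAll(IDataReader reader, params string[] columns)
    {
        List<Dictionary<string, object>> rows = new List<Dictionary<string, object>>();

        // disposing the reader closes it once all rows have been read
        using (reader)
        {
            while (reader.Read())
            {
                Dictionary<string, object> row = new Dictionary<string, object>();
                foreach (string column in columns)
                {
                    int ordinal = reader.GetOrdinal(column);

                    // a column without a value for this row comes back as DBNull;
                    // store it as null so callers can test for it easily
                    row[column] = reader.IsDBNull(ordinal) ? null : reader.GetValue(ordinal);
                }
                rows.Add(row);
            }
        }
        return rows;
    }
}
```

Assuming the LocalQuery method shown earlier, a call could look like ResultReader.ReadAll(LocalQuery("Web", "SELECT FileName, Vpath FROM SCOPE()"), "FileName", "Vpath"). The Count property of the returned list then gives you the number of matches.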


Which query language is used by the OLEDB Indexing Server provider?

The MSIDXS provider supports the well-known SQL query language, which makes it very easy to query the Indexing Server. It is query only, so it does not support updates, inserts or deletes. The supported query syntax is slightly adapted for this provider. You can find detailed documentation here [^]. The biggest difference is in the FROM clause, where you specify either an existing view or the SCOPE() function. The Indexing Server comes with the following pre-defined views [^]. The referenced MSDN help page lists which fields are included in which view. The query string above, for example, uses the WEBINFO view. You are allowed to use "SELECT * FROM view", which you are not allowed to do with the SCOPE() function. You can also create your own views with the CREATE VIEW command. Here is the CREATE VIEW command used to define the WEBINFO view:

CREATE VIEW WEBINFO AS
 SELECT Vpath, path, FileName, size, write, attrib, Characterization, DocTitle
 FROM SCOPE()

Instead of a view you can use the SCOPE() function which defines the scope of the select. The SCOPE() function supports the following arguments:

DEEP TRAVERSAL OF - This searches the specified paths and all the folders beneath them. For example DEEP TRAVERSAL OF "/" searches the root folder and all the files and folders underneath it. This is also the default scope used when you only specify SCOPE(), which means everything in the web site is included in the search.

SHALLOW TRAVERSAL OF - This searches the specified paths only, meaning it excludes any subfolder there might be. For example SHALLOW TRAVERSAL OF "/" searches only the root folder of the web site but none of the sub-folders.

For both DEEP TRAVERSAL OF and SHALLOW TRAVERSAL OF you can list as many paths as you want. You can also include each argument multiple times. Here are a few examples to look at:

FROM SCOPE(' SHALLOW TRAVERSAL OF ("/", "/Help") ') - This includes only the root and Help folder in the search. None of the sub-folders are included.

FROM SCOPE(' "/Info", "/Help" ') - DEEP TRAVERSAL OF is the default argument, so this example includes the Info and Help folders and all their sub-folders.

FROM SCOPE(' "/Info" ', ' SHALLOW TRAVERSAL OF "/Help" ') - This example performs a deep traversal of the Info folder, which includes all its sub-folders, and a shallow traversal of the Help folder, which excludes its sub-folders.
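If you assemble FROM clauses like the ones above in code, a small helper keeps the quoting in one place. This is only a sketch under assumptions: ScopeBuilder, Traversal, Element and From are hypothetical names, the output simply mirrors the shape of the examples above, and paths containing quote characters are rejected because the escaping rules are not covered here.

```csharp
using System;

// Hypothetical names for illustration; not part of the MSIDXS provider.
public enum Traversal { Deep, Shallow }

public static class ScopeBuilder
{
    // Builds one SCOPE() argument, e.g. SHALLOW TRAVERSAL OF ("/", "/Help")
    public static string Element(Traversal traversal, params string[] paths)
    {
        if (paths == null || paths.Length == 0)
            throw new ArgumentException("At least one path is required.", "paths");

        string[] quoted = new string[paths.Length];
        for (int i = 0; i < paths.Length; i++)
        {
            // paths sit in double quotes inside a single-quoted argument, so
            // reject quote characters instead of guessing at an escape rule
            if (paths[i].IndexOf('"') >= 0 || paths[i].IndexOf('\'') >= 0)
                throw new ArgumentException("Path must not contain quote characters.", "paths");
            quoted[i] = "\"" + paths[i] + "\"";
        }

        string keyword = traversal == Traversal.Deep ? "DEEP TRAVERSAL OF " : "SHALLOW TRAVERSAL OF ";
        string list = String.Join(", ", quoted);

        // several paths are grouped in parentheses, a single path stands alone
        return keyword + (quoted.Length > 1 ? "(" + list + ")" : list);
    }

    // Wraps each element in single quotes; no elements yields the default FROM SCOPE()
    public static string From(params string[] elements)
    {
        if (elements == null || elements.Length == 0)
            return "FROM SCOPE()";

        string[] arguments = new string[elements.Length];
        for (int i = 0; i < elements.Length; i++)
            arguments[i] = "'" + elements[i] + "'";
        return "FROM SCOPE(" + String.Join(", ", arguments) + ")";
    }
}
```

For example, ScopeBuilder.From(ScopeBuilder.Element(Traversal.Deep, "/Info"), ScopeBuilder.Element(Traversal.Shallow, "/Help")) builds the same scope as the last example above, with the deep traversal spelled out explicitly.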

The WHERE clause supports the same filtering as standard SQL. Please refer to this article [^] for a complete description. The LIKE operator performs pattern matching with wildcard characters. The MATCHES operator performs pattern matching with regular expressions. The FREETEXT operator performs a best match of words and phrases. The CONTAINS operator performs text matching, including proximity searches and stemming. Here are a few examples:

WHERE FileName LIKE '%web%' - Returns all items where the file name contains the text web.

WHERE FileName LIKE '%config' - Returns all items where the file name ends with the text config.

WHERE CONTAINS(FileName, ' "web" OR "default" ') - Returns all the items where the file name includes the word web or default.

WHERE FREETEXT(FileName, ' default web ') - Returns again all the items where the file name includes the word web or default.

WHERE MATCHES(Contents, ' |(product|)|{2,3|} ') - Returns all the items where the contents match the pattern, which is the text product repeated two to three times in succession.

This lets you build very complex queries using the standard SQL language. Building your own search engine which supports all this would be a major undertaking. The following article [^] lists and explains all the fields you can use in your SELECT statements.
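When the search terms come from a user, the WHERE clause is usually assembled in code. The sketch below shows one conservative way to build a CONTAINS predicate like the one in the examples above. SearchQuery, Contains and Select are hypothetical names, and dropping quote characters from the input is an assumption made for safety, not a rule of the provider.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; it only builds query text.
public static class SearchQuery
{
    // Builds CONTAINS(column, '"term1" OP "term2" ...') from raw user terms
    public static string Contains(string column, string op, IEnumerable<string> terms)
    {
        if (op != "AND" && op != "OR")
            throw new ArgumentException("Operator must be AND or OR.", "op");

        List<string> quoted = new List<string>();
        foreach (string term in terms)
        {
            // drop quote characters so user input cannot end the literal early
            string cleaned = term.Replace("'", "").Replace("\"", "").Trim();
            if (cleaned.Length > 0)
                quoted.Add("\"" + cleaned + "\"");
        }
        if (quoted.Count == 0)
            throw new ArgumentException("At least one non-empty term is required.", "terms");

        return "CONTAINS(" + column + ", '" + String.Join(" " + op + " ", quoted.ToArray()) + "')";
    }

    // Complete statement against the default scope; the columns are listed
    // explicitly because SELECT * is not allowed with SCOPE()
    public static string Select(string columns, string whereClause)
    {
        return "SELECT " + columns + " FROM SCOPE() WHERE " + whereClause;
    }
}
```

The resulting string contains the keyword as " FROM " exactly once, so it can be passed to the LocalQuery and RemoteQuery methods shown earlier.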


Which files is Indexing Server capable of indexing?

We have seen so far how to set up Indexing Server to index files, folders and web sites. We also covered how to connect to Indexing Server catalogs and search them programmatically using the OLEDB data provider and the SQL query language. What we have not yet covered is which types of content the Indexing Server can search and how you can extend this. The Indexing Server extracts the content of files using so-called filters. Filters are components which implement the IFilter interface and understand how to read the content of a file. The content of an HTML file is different from that of an RTF file, a Microsoft Office file or a plain text file, so there are different filters for these different file types. The "MSN Desktop Search" uses the same filter plug-ins. The following article [^] lists which filter is able to read which file types. It also lists a number of additional filters and where you can download them. Keep in mind that the article talks about the MSN Desktop Search, but all these filters also apply to the Indexing Server.

All registry settings for the Indexing Server can be found at HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\ContentIndex. There is, for example, a sub key called Catalogs which lists all the catalogs defined on this Indexing Server, along with the settings of each catalog. Under the ContentIndex key you find a number of settings for the Indexing Service; one of them is called DLLsToRegister. It lists all the DLLs the Indexing Server registers when starting up, which includes all the filters used by the Indexing Server. Comparing this with the article which lists all the available filters, you see that the following filters are installed out of the box:

  • query.dll - Filters files with the TXT, ASM, BAT, C, CPP, CXX, CMD, DEF, DIC, H, HPP and XML extensions. These are all read as plain text files.
  • nlhtml.dll - Filters files with the ASCX, ASP, ASPX, CSS, HHC, HTA, HTM, HTML, HHT, HTW, HTX, ODC and STM extensions. These are all files which contain or render HTML content.
  • offfile.dll - Filters files with the DOC, DOT, POT, PPS, PPT, XLB, XLC, XLS and XLT extensions. All these files are MS Office files.
  • mimefilt.dll - Filters files with the EML extension, which are MIME content.
  • mspfilt.dll - Filters files with the TIFF extension. This filter gets installed by MS Office 2003.
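If your application wants to tell users up front whether a file type gets filtered by one of these default filters, the list above can be turned into a simple lookup. This is only an illustrative sketch: DefaultFilters and FilterFor are hypothetical names, and the table is copied from the list above instead of being read from the registry, so it does not know about filters installed later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical lookup table built from the filter list in this article.
public static class DefaultFilters
{
    private static readonly Dictionary<string, string> FilterByExtension = Build();

    private static Dictionary<string, string> Build()
    {
        // extensions are compared without regard to case
        Dictionary<string, string> map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        Add(map, "query.dll", "TXT ASM BAT C CPP CXX CMD DEF DIC H HPP XML");
        Add(map, "nlhtml.dll", "ASCX ASP ASPX CSS HHC HTA HTM HTML HHT HTW HTX ODC STM");
        Add(map, "offfile.dll", "DOC DOT POT PPS PPT XLB XLC XLS XLT");
        Add(map, "mimefilt.dll", "EML");
        Add(map, "mspfilt.dll", "TIFF");
        return map;
    }

    private static void Add(Dictionary<string, string> map, string dll, string extensions)
    {
        foreach (string extension in extensions.Split(' '))
            map["." + extension] = dll;
    }

    // Returns the filter DLL listed for the file's extension, or null when none is listed
    public static string FilterFor(string fileName)
    {
        string dll;
        return FilterByExtension.TryGetValue(Path.GetExtension(fileName), out dll) ? dll : null;
    }
}
```

A null result only means that the extension is not in the list above; another filter may still have been installed for that file type on the machine.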

You can find a complete description of all the Indexing Server registry keys here [^]. If you are interested in writing your own filter for custom file types, follow this link [^]. It provides a complete description of the IFilter interface you need to implement.


Summary

Microsoft Indexing Server is a powerful indexing and search engine for your web or file search. The OLEDB data provider in conjunction with the SQL query language makes it very easy to query the index created by the Indexing Server. With no additional effort you can provide a simple text search as well as a powerful search with partial word, word and phrase matching. This includes stemming as well as language-sensitive searching, so there is no additional effort on your side regardless of whether you index and search English or Asian content. The filter framework lets you extend the Indexing Server to search any custom content you might have. It is also very nice that you can use the same filters for the MSN Desktop Search engine.

The attached sample application lets you search any local or remote Indexing Server catalog. You can enter the SQL query string to execute or select from a set of default query strings. The result-set is shown in a list view, together with the number of returned matches. If you have comments on this article or this topic, please contact me at klaus_salchner@hotmail.com [^]. I want to hear if you learned something new. Contact me if you have questions about this topic or article.


About Klaus Salchner

Klaus Salchner has worked for 14 years in the industry, nine years in Europe and another five years in North America. As a Senior Enterprise Architect with solid experience in enterprise software development, Klaus spends considerable time on performance, scalability, availability, maintainability, globalization/localization and security. The projects he has been involved in are used by more than a million users in 50 countries on three continents.

Klaus calls Vancouver, British Columbia his home at the moment. His next big goal is doing the New York marathon in 2005. Klaus is interested in guest speaking opportunities and in writing for .NET magazines or Web sites. He can be contacted at klaus_salchner@hotmail.com or http://www.enterprise-minds.com.

Enterprise application architecture and design consulting services are available. If you want to hear more about it contact me! Involve me in your projects and I will make a difference.

    All articles are copyrighted by their individual authors unless otherwise specified , everything else Copyright ©2004-2006 DeveloperLand