Microsoft Site Server
The powerful Intranet server, optimized for Windows NT Server, for publishing and finding information easier and faster
Integrating Microsoft Site Server Search with Microsoft Exchange
This paper details indexing Microsoft Exchange public folders with Site Server. It provides conceptual and procedural information to help you deploy and configure Site Server’s advanced search capabilities to work with information in Exchange public folders.
© 1998 Microsoft Corporation. All rights reserved.
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.
Microsoft, the BackOffice logo, Win32, MS-DOS, Windows, IntelliMirror, the Windows start logo, Site Server, and Windows NT are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
Other product or company names mentioned herein may be the trademarks of their respective owners.
Microsoft Corporation • One Microsoft Way • Redmond, WA 98052-6399 • USA
Knowledge Management
Integrating Exchange and Site Server Search: Planning
Determining the Data Structure
Collecting Knowledge
Capacity Planning
How Many Machines?
Disk Space Requirements
Memory Requirements
Requirements for Crawling Exchange Public Folders
Exchange Server
Outlook Web Access
Microsoft Outlook
Setting up Exchange
Setting up the Microsoft Exchange Public Folders
Replicated Folders
Install IE 4
Install Windows NT Option Pack
Additional Components
Installing Site Server
Installation Type
Select Features
Configure User Accounts
Search Service Configuration
Modify Service Logon
Creating a Search Catalog Definition
Creating a New Search Catalog
Name the Catalog
Specify the Crawl Type
Exchange Server Names
Exchange Information
Configure Start Address
Select Host
Complete the Configuration
Catalog Administration
Gatherer Logs
Search Server Administration
Catalog Test
Publishing the Catalog
Custom Solutions
Information is located in many different places throughout a company, such as on file systems, Web servers, databases, e-mail discussion servers, and document management systems. These files, database records, and e-mail messages are generally referred to as documents. The integration of Exchange and the Search component of Site Server makes those types of documents more accessible from one application. You can use Search to build a catalog from these documents. Search finds and gathers the documents and then indexes them in a catalog.
Before you can set up your servers to extract information, you must first determine the type of information you want to make available for searching. Do you want to limit the available information to Exchange public folders, or do you want to provide information from a variety of sources such as file systems, Web servers, databases, or e-mail discussion servers? This paper focuses on how to integrate Exchange and Site Server to provide information stored in public folders, which can include information from replicated newsgroups.
The solution described here uses Exchange public folders to collect and store information. Public folders store e-mail messages, forms, task items, attachments, graphics, sound bites, and many other types of information in one location. You can use one or more of the following methods to get information into a public folder:
· Add the public folder to distribution lists. The public folder acts as an archive of the messages sent to the distribution lists.
· Create a Microsoft Outlook or Web form that adds structured messages to the public folder. The advantage of this method is that you can add custom attributes to the message. Later, users can limit their searches based on those custom attributes.
· Replicate Network News Transfer Protocol (NNTP) newsgroups, including USENET newsgroups, into the folder. This enables you to create an archive of information posted to a newsgroup.
Users can also simply store information in the public folder using Outlook, and you can also use knowledge management to improve access to existing public folders in your organization.
Individual server resource requirements will of course vary according to the following factors:
· The number of services running on the machine (for example, Site Server, SQL Server, and Exchange).
· Whether you choose to implement SQL Server as the database or MS Access, and whether the database will be homed on the same machine as Site Server.
· The number of catalogs on the server and documents within each catalog.
· The number of users to be serviced.
The minimum recommended configuration for Site Server (with SQL Server) is:
· Pentium 166 MHz or greater
· 128 MB of RAM
· 2 GB of hard drive space
Initial hardware recommendations for the Search service are a Pentium Pro 200, 128 MB of RAM, and disk space equivalent to 200% of the indexed content.
Site Server was designed to handle a large volume of user queries. Plan to deploy at least one machine for every 40,000 users or 700,000 documents cataloged.
Add machines in the following circumstances:
· When adding more documents or more than 32 catalogs.
· For fault tolerance.
· When servicing a distributed user base, to conserve bandwidth.
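The sizing guidance above can be turned into a rough calculation. The following Python sketch is illustrative only: the function name is hypothetical, and taking the largest of the per-machine limits is one reading of the guideline, not a Site Server tool or formula. It does not account for fault tolerance or for a distributed user base, both of which may call for additional machines.

```python
import math

# Per-machine limits taken from the guidance above.
USERS_PER_MACHINE = 40_000
DOCUMENTS_PER_MACHINE = 700_000
CATALOGS_PER_MACHINE = 32


def estimate_search_machines(users: int, documents: int, catalogs: int = 1) -> int:
    """Return a rough minimum number of Search machines (hypothetical helper)."""
    if users < 0 or documents < 0 or catalogs < 0:
        raise ValueError("counts must not be negative")
    by_users = math.ceil(users / USERS_PER_MACHINE)
    by_documents = math.ceil(documents / DOCUMENTS_PER_MACHINE)
    by_catalogs = math.ceil(catalogs / CATALOGS_PER_MACHINE)
    # Assumption: the most demanding limit decides; always plan for at least one machine.
    return max(1, by_users, by_documents, by_catalogs)


if __name__ == "__main__":
    print(estimate_search_machines(users=100_000, documents=500_000))
```

Treat the result as a starting point for planning and adjust it after measuring actual query load.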
The minimum disk space recommendation for Site Server is 2 GB. Site Server itself takes up approximately 150 MB of disk space to install but incidental services such as SQL Server will require substantially more.
For the Search service, plan enough disk space to accommodate at least 150% of the content you intend to catalog (200% recommended).
Each message consumes about 1 KB in the index and slightly less (about 900 bytes) in the property store.
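The disk space figures above lend themselves to a quick estimate. The Python sketch below is a planning aid under stated assumptions, not a Site Server utility: the function names are hypothetical, and 1 KB is taken to mean 1,024 bytes.

```python
# Per-message sizes from the guidance above (1 KB assumed to be 1,024 bytes).
INDEX_BYTES_PER_MESSAGE = 1024
PROPERTY_STORE_BYTES_PER_MESSAGE = 900


def estimate_catalog_bytes(message_count: int) -> int:
    """Approximate index plus property store size for a number of messages."""
    if message_count < 0:
        raise ValueError("message_count must not be negative")
    return message_count * (INDEX_BYTES_PER_MESSAGE + PROPERTY_STORE_BYTES_PER_MESSAGE)


def estimate_disk_budget_bytes(content_bytes: int, percent: int = 200) -> int:
    """Disk space to plan for, as a percentage of the content to be cataloged.

    150 is the stated minimum and 200 the recommended value.
    """
    if percent < 150:
        raise ValueError("plan for at least 150% of the content size")
    return content_bytes * percent // 100


if __name__ == "__main__":
    print(estimate_catalog_bytes(1_000_000))         # bytes for one million messages
    print(estimate_disk_budget_bytes(10 * 1024**3))  # budget for 10 GB of content
```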
Memory should be added according to the chart below:
# of Documents in Property Store
> 1 million
The amount of resources will also vary depending on which Site Server services you use. The Personalization and Membership feature and the Analysis feature are not required for Exchange indexing.
Site Server may be installed on the same server as Exchange and SQL Server or on a separate machine. Likewise, it is not necessary that SQL Server be installed on the same computer as Site Server. The following components are required for Exchange Public Folder indexing.
· Exchange Server 5.5
· Outlook Web Access.
· SQL Server 6.5 (SP4) or SQL Server 7.
If SQL Server is not used, Microsoft Access will be automatically installed during Site Server setup.
Site Server may alternatively use MS Access. It is recommended that SQL Server be used in medium or large environments to provide scalability and performance. It is not possible to transfer Site Server data to SQL Server from MS Access once the installation is complete.
· Site Server
· NT 4 (SP3/SP4)
· NT Option Pack
· IIS 4
Outlook Web Access does not have to be on the same computer as Exchange Server; it can be a stand-alone Web server, or it can be the same Web server on which Search is installed.
The Exchange Server used for crawling, the Outlook Web Access server, and the Exchange Server that is connected to the Outlook Web Access server do not have to be the same computer. However, both Exchange Servers must be in the same organization, and must share the Public folders. Specifying different computers for the Exchange Server in the Microsoft Exchange configuration properties and the Exchange Server connected to the OWA server allows you to access messages from one location and display them from another.
Using Search, you can crawl Microsoft Exchange Public folders to include messages and their attachments in your catalogs. When site visitors search these catalogs, they can click a results link and view the message or document. Messages may be viewed using Outlook Web Access or Microsoft Outlook.
The following sections describe the software requirements for crawling Exchange Public folders.
Search is configured to work with one Exchange Server in an Exchange installation for cataloging and searching messages. This Exchange Server must be version 5.0 or higher. If you install Site Server on the same computer as Exchange Server, Exchange Server must be version 5.5 with Service Pack 1 installed.
All Public folders that you want to catalog and search must be hosted (“homed”) on this server or another server in the same site. With some additional limitations, you can search Public folders that are homed in other sites, either by creating replicas in the local site or by accessing the other site over the network via site affinity settings.
You can configure Search to allow site visitors to access Public folder messages using a Web browser through Outlook Web Access. In this case, Outlook Web Access server must be version 5.5 or greater. The Exchange Server connected to the Outlook Web Access server must be version 5.0 or greater.
You can configure Search to allow site visitors to access Public folder messages using Microsoft Outlook. In this case, the site visitor must be running Microsoft Outlook 97 version 8.03 or above and Internet Explorer version 3.0 or above, on Microsoft® Windows 95 or Microsoft® Windows NT 4.0 or above.
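The server-side version requirements described above can be captured as a simple pre-installation check. The following Python sketch is illustrative: the Deployment class and check_requirements function are hypothetical names, versions are modeled as (major, minor) pairs, and the co-location rule is interpreted as "version 5.5 with at least Service Pack 1".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Version = Tuple[int, int]  # (major, minor), for example (5, 5)


@dataclass
class Deployment:
    """Hypothetical description of a planned installation."""
    exchange_version: Version              # Exchange Server that Search crawls
    exchange_service_pack: int             # 0 if no service pack is installed
    site_server_on_exchange_machine: bool  # True if Site Server shares the computer
    owa_version: Optional[Version] = None           # None if OWA is not used
    owa_exchange_version: Optional[Version] = None  # Exchange Server behind OWA


def check_requirements(d: Deployment) -> List[str]:
    """Return a list of unmet requirements; an empty list means none were found."""
    problems = []
    if d.exchange_version < (5, 0):
        problems.append("Crawled Exchange Server must be version 5.0 or higher.")
    if d.site_server_on_exchange_machine and (
        d.exchange_version < (5, 5)
        or (d.exchange_version == (5, 5) and d.exchange_service_pack < 1)
    ):
        problems.append("Co-located Exchange Server must be version 5.5 with SP1.")
    if d.owa_version is not None and d.owa_version < (5, 5):
        problems.append("Outlook Web Access must be version 5.5 or greater.")
    if d.owa_exchange_version is not None and d.owa_exchange_version < (5, 0):
        problems.append("Exchange Server behind OWA must be version 5.0 or greater.")
    return problems


if __name__ == "__main__":
    plan = Deployment((5, 5), 1, True, owa_version=(5, 5), owa_exchange_version=(5, 5))
    print(check_requirements(plan))
```

The client-side requirements for Microsoft Outlook are not covered by this sketch.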
To allow the Site Server search service to crawl Exchange public folders, the Site Server Search service must have administrative privileges on the Exchange Site’s configuration container. You can add the Search administrator to the Exchange site.
For simplicity of administration, it is easiest to use the Exchange Server service account for both services, since the Exchange service account already has administrative privileges at this level. This account will be used for Content Access and the Search service.
You must also set up Microsoft Exchange Public folders to allow site visitors to access messages in them. Your site visitors will only be able to see search results for those Public folders where they have at least read (“Reviewer”) access. The administrator or folder owner must modify the permissions on each Public folder to include users who are allowed to search it by granting Reviewer rights.
In addition to assigning permissions to individual Exchange mailboxes or distribution lists, you can assign Public folder permissions to Default (authenticated users—those who have logged into a mailbox with a user name and password) and Anonymous (unauthenticated users—those who have not). Granting access to Anonymous has no effect on authenticated users; to grant access to all users, grant permission to both Default and Anonymous.
This has implications for how to set up Public folder permissions when using Outlook Web Access. When site visitors click on a search results link that points to an Outlook Web Access server, if the Public folder message in the search results specifies authenticated access only, then the site visitor must log on to Outlook Web Access before viewing the message. If the message can be viewed by unauthenticated users, Outlook Web Access displays the message without prompting the site visitor to log on. For this reason, if you set up a Search site for Public folder content that can be viewed by anyone, you might want to grant read privileges (for example Reviewer rights) to Anonymous. Then, users will not have to log on to read the messages.
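The interaction between the Default and Anonymous entries can be summarized in a small model. The Python sketch below is a deliberate simplification for illustration: the class and function names are hypothetical, read access is reduced to a yes/no flag, and permissions granted to individual mailboxes or distribution lists are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FolderPermissions:
    """Simplified Public folder permissions (hypothetical model)."""
    default_can_read: bool    # Default: authenticated users
    anonymous_can_read: bool  # Anonymous: unauthenticated users


def can_view(perms: FolderPermissions, authenticated: bool) -> bool:
    """Authenticated users are governed by Default, all others by Anonymous."""
    return perms.default_can_read if authenticated else perms.anonymous_can_read


def owa_prompts_for_logon(perms: FolderPermissions) -> bool:
    """Outlook Web Access asks for a logon unless unauthenticated users can read."""
    return not perms.anonymous_can_read


if __name__ == "__main__":
    open_folder = FolderPermissions(default_can_read=True, anonymous_can_read=True)
    print(can_view(open_folder, authenticated=False), owa_prompts_for_logon(open_folder))
```

The model makes one consequence explicit: granting read access to Anonymous alone does not let authenticated users read the folder, so content intended for everyone needs both entries.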
However, if there is secure content in Exchange Public Folders, it is very important to disable anonymous logon on the virtual root containing the search page. Otherwise, the users accessing the search page will not be authenticated and consequently, they will be searching using “anonymous user” privileges.
Microsoft Exchange Public folders can be replicated to multiple servers within an Exchange site, or to servers in other Exchange sites within your organization, thereby allowing users to have fast access to a replica of the folder through a local server. Replicated Public folders can affect Search in several ways.
By default, Search can crawl Public folders “homed” on any server within the same Exchange site as the primary Exchange Server that you configured. Once you specify the initial folder for the start address, if any of the subfolders are located on different servers within the site, automatic connections will be made to those servers as needed. No special configuration is necessary in this case.
Your Exchange site may contain local replicas of Public folders that are “homed” in other sites. Search needs administrative access to a folder to crawl it successfully. If administrative access to the folder is limited to its home site, the administrator of the other Exchange site must do one of the following: disable this setting (in Microsoft Exchange Administrator, click the Folder object, and then on the File menu click Properties; on the General tab, clear the Limit administrative access to home site option), or grant Owner permissions to the Windows NT account under which Search is running.
Internet Explorer 4.01 or greater is required on the server where you will be installing Site Server. If you do not have it installed, Site Server will guide you through Internet Explorer installation during setup.
The Windows NT 4 Option Pack provides several applications that are used by Site Server, including IIS 4.0, Microsoft Management Console and Microsoft Index Server. When you install Site Server from CD, the Site Server setup detects whether these applications have been installed. If they are not present, Site Server will guide you through the NT 4 Option Pack setup.
The NT 4 Option Pack can also be manually installed prior to Site Server installation. When installing the NT 4 Option Pack, make sure that the following installation choices are checked after selecting the custom setup option.
· FrontPage 98 Server Extensions.
· Internet Information Server
· Internet NNTP Service
· SMTP Service
· Microsoft Index Server
Visual InterDev and FrontPage 98 are included with Site Server 3.0 for developers who wish to create dynamic Web pages. These products are not required to run Site Server 3.0. If you intend to use FrontPage 98, it must be installed prior to installing Site Server.
The following installation procedures assume that Microsoft Access will be utilized as the Site Server database.
Start setup.exe from the Site Server 3.0 CD, and select Server Installation to begin installing Site Server.
This “quick” setup example will not be using SQL Server. You will need to make some changes during the installation process. Choose the Complete/Custom option.
Select the features you want. In this case, make sure that Access is selected as the Analysis database.
If you have chosen to use SQL Server as the Site Server database, select the check box labeled SQL Server Database Support in the Select Features dialog.
You will be prompted to configure the user accounts. Select each service and click on the Set User Accounts button. Enter the account and password for each feature account. The same account is recommended for both features.
Setup will stop any Web Server, Indexing, NNTP, and SMTP services.
Review the options that will be installed and click Confirm to continue.
Setup will proceed to copy files, set configuration data, and start the services.
As soon as you complete the basic installation of Site Server 3.0, you should modify the logon account used by the Site Server Search service. Remember that this service will be accessing the Exchange Server and requires administrative access to the configuration container of the site. Changing the service account logon and password is done through the Service Manager applet. By default, the Site Server Search service uses the system account, which has inadequate privileges for the task.
Open the Service Manager applet and find the Site Server Search service.
Click Startup to access the properties page.
By default, the “Log On As” is set to System Account. Select the option labeled “This Account” and enter the Domain\AccountName and password. As mentioned previously, it is probably wise to simply use the Exchange service account when able.
Creating a Search Catalog allows you to leverage the information contained in Exchange public folders. Search Catalogs provide the index and property store that Site Server creates, and that your users can search. A single catalog can host indexes for Exchange Public Folders, as well as any ODBC data source, Web pages and file systems.
Start by creating a Search Catalog based solely on Exchange Public Folders.
Start the Site Server Service Admin (MMC) from the Programs menu.
In the Search folder, navigate to Catalog Build Server and right-click it. Select New Catalog Definition with a Wizard.
Specify a name for the new catalog. The name is used for administrative purposes and may contain up to 39 characters. Note that spaces are not allowed in the catalog name.
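The naming rules above are easy to check before running the wizard. This Python sketch covers only the two rules stated here (at most 39 characters, no spaces); the function name is hypothetical, and the wizard may enforce further restrictions that are not modeled.

```python
MAX_CATALOG_NAME_LENGTH = 39  # limit stated above


def is_valid_catalog_name(name: str) -> bool:
    """Check a proposed catalog name against the two rules stated above."""
    return 0 < len(name) <= MAX_CATALOG_NAME_LENGTH and " " not in name


if __name__ == "__main__":
    for candidate in ("PublicFolder", "Public Folder"):
        print(candidate, is_valid_catalog_name(candidate))
```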
To specify the crawl type, select the Exchange Crawl option.
Enter the NetBIOS names of the Exchange public folder server and the Outlook Web Access server.
Exchange server - Provide the name of the Exchange Server that Search will use to access Public folders. Public folder replication lets you use any Exchange Server in your site.
Outlook Web Access server - Provide the name of the Outlook Web Access server. This server is used if you write links from a search query that require Outlook Web Access. If you are not using Outlook Web Access for search results, you do not have to provide this information.
Provide the Exchange Site and Organization name in the appropriate fields.
Exchange site. Provide the name of the site that contains the Exchange Server.
Exchange organization. Provide the name of the organization that contains the site.
When configuring the Search hosts with Microsoft Exchange properties, it is essential to correctly configure the name of the site that contains the Exchange Server, and the name of the organization that contains the site. If this information is not correctly set, Search will not be able to catalog or search Exchange messages. The Exchange administrator can find this information by running the Microsoft Exchange Administrator program.
To build a catalog that crawls Exchange Public folders, you must specify an Exchange start address, which indicates the Public folder to crawl. You can specify whether to crawl all the subfolders contained in the main folder, or you can specify a folder depth. The start address format reflects the hierarchy of the public folders and starts with exch://. Each folder name is separated by a slash (/). For example, to crawl a folder called Company News, use the start address exch://ExchangeServer/Public Folders/All Public Folders/Company News, where ExchangeServer is the name of the Exchange Server you have configured for Search.
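Because the start address simply mirrors the folder hierarchy, it can be assembled from its parts. The following Python sketch shows the format described above; the function name is hypothetical, and the rejection of folder names containing a slash is an assumption made to keep the path unambiguous.

```python
# Fixed top of the Public folder hierarchy, as shown in the example address above.
ROOT_FOLDERS = ("Public Folders", "All Public Folders")


def build_exchange_start_address(server: str, *folders: str) -> str:
    """Build an exch:// start address for a folder path under All Public Folders."""
    if not server:
        raise ValueError("an Exchange Server name is required")
    for folder in folders:
        # Assumption: a slash inside a folder name would be read as a separator.
        if not folder or "/" in folder:
            raise ValueError(f"unsupported folder name: {folder!r}")
    return "exch://" + "/".join((server, *ROOT_FOLDERS, *folders))


if __name__ == "__main__":
    print(build_exchange_start_address("ExchangeServer", "Company News"))
```

Passing additional folder names produces the address of a nested folder, for example build_exchange_start_address("ExchangeServer", "Company News", "Archive").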
Select the search host where this catalog should be homed. If this is the first Site Server, only one will be shown.
Before starting a build of a catalog with Exchange data, the “Content Access” account for the server (in this example “server1”) must be set to the same account that was used to configure the “Search Service Account”. The data in Exchange Public Folders must be crawled using the same account. This may be accomplished in two ways:
1) Set this account to also be the “Default Content Access Account”.
2) Set the “site access account” of “server1” to be the Exchange administrative account.
To complete the creation of the new catalog, select “Start build now” before clicking Finish.
There are two types of catalog builds. A full build uses the start address of the crawl. An incremental catalog build starts with the catalog from the previous build and updates the catalog with any changes made to the content since the last crawl. Be aware of the volume of documents to be indexed before selecting a full build.
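The difference between the two build types can be illustrated with a toy model. The Python sketch below is conceptual only and does not describe how Site Server implements builds: a catalog is modeled as a mapping from document address to a last-modified stamp, all names are hypothetical, and the removal of deleted documents during an incremental build is an assumption.

```python
from typing import Dict, Tuple

Snapshot = Dict[str, int]  # document address -> last-modified stamp (assumed model)


def full_build(source: Snapshot) -> Tuple[Snapshot, int]:
    """Index every document reachable from the start address."""
    return dict(source), len(source)


def incremental_build(previous: Snapshot, source: Snapshot) -> Tuple[Snapshot, int]:
    """Start from the previous catalog and process only new or changed documents."""
    catalog = dict(previous)
    processed = 0
    for address, stamp in source.items():
        if catalog.get(address) != stamp:
            catalog[address] = stamp
            processed += 1
    # Assumption: documents that no longer exist are dropped from the catalog.
    for address in list(catalog):
        if address not in source:
            del catalog[address]
    return catalog, processed


if __name__ == "__main__":
    old = {"msg1": 1, "msg2": 1}
    new = {"msg1": 1, "msg2": 2, "msg3": 1}
    print(full_build(new)[1], incremental_build(old, new)[1])
```

In this model both builds end with the same catalog, but the incremental build processes only the documents that changed, which is why the volume of content matters most when choosing a full build.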
After creating the new Catalog Definition, we can verify that the catalog and search are functioning properly.
In the scope pane of the Microsoft Management Console under Search, double-click the icon for your server to expand it, double-click Catalog Build Server to expand it, double-click the catalog you just created to expand it, and then click Gatherer Logs.
The View Log Files page appears in the results pane of the Microsoft Management Console.
The most current log file is displayed, as shown below.
You can use two different logs to find out about any problems or errors in the catalog: the NT Event Viewer and the Search gatherer logs.
For general Search events, start with the Event Viewer, which records events such as the time the crawl was started and stopped, and any fatal errors encountered during the crawl. For example, if you build a catalog and then find that it is empty, the Event Viewer can tell you whether the start address was incorrect or inaccessible due to security.
For document-specific events, use the Search gatherer log. Each time Search builds a catalog, it generates a gatherer log for that catalog that records all access errors. For example, if a document no longer exists, the access error for that link is recorded in the log. The gatherer log also records other events, such as the time the crawl was started and stopped. You can view gatherer logs to trace errors, find out which documents were not crawled successfully, and identify network problems.
With the Search Server highlighted, the right window provides a snapshot of the catalog search status. In this case, the new PublicFolder catalog shows a status of enabled. If a configuration or connection problem had prevented the Search service from accessing or indexing the public folders, the status would read “empty”.
The Search Server properties page allows you to specify the default catalog that will be used during a search as well as performance settings. At this point we are interested in setting the default catalog.
Right-click the Search Server, and select Properties to access the Search Server properties page. Using the drop-down box, select a default catalog for the search server. In this case, you can use the catalog you just created.
Useful settings are also available on the Search Server catalog itself. Right-click your catalog under the Search Server, and choose Properties to access this property page.
The General property page allows you to enable or disable this catalog as well as adjust settings for queries. Limiting the number of query results returned can improve otherwise sluggish server performance. The default values are 10,000 milliseconds for the query timeout and a limit of 200 returned query results.
The number of documents contained within the catalog can be viewed on the Statistics page of the search catalog property page.
At this point we have verified that the Catalog Build Server is gathering document properties and that the Search Server is enabled and that documents exist in the catalog.
To test the basic functionality of the search catalog, expand the Search Server and the new catalog, and then select the Search icon. This page provides a straightforward text search on the selected catalog. Enter some text in the field and click the submit button. Results are displayed under the search field.
Clicking on one of the search results should launch Internet Explorer and bring you to the Outlook Web Access login page. Once connected, the message or document will be displayed in Internet Explorer.
Viewing the catalog in action is made easy by the many sample search pages included with Site Server. The sample pages, along with thorough product documentation, provide you with the tools to build powerful Web applications to support indexed public folders.
Start Internet Explorer and navigate to the Site Server samples page by entering http://your-server-name/siteserver/samples. A page similar to the one below will be displayed.
The object of interest here is the Search link. Select it, and when the Search Sample page is displayed, select the Microsoft Exchange option from the left pane. You will be presented with a generic query page for your catalog. Again, enter some text in the Search for field and click Search. Results should be returned and displayed as shown below.
Should multiple catalogs exist, the Catalog field would display a drop down box with a list of available catalogs on this server. Since there is only a single catalog in this example, and it is the default catalog, there is no need to specify a catalog server.
Note the methods of access for the results. Documents may be accessed through either Outlook or Outlook Web Access as described in the query results text. Try viewing your search results using both techniques.
The next step is to test your Search Web pages with your catalogs, and then place the pages in a virtual directory on your Web server so your site visitors can use them.
You can help users target a search and improve the display of results. Your pages can use a variety of other search features, such as logging each query for search reporting and using alternative types of query syntax.
The detailed discussion of solutions development is beyond the scope of this document. Excellent information on developing Site Server solutions can be found at www.microsoft.com/backoffice/siteserver.
 When implementing Site Server only in a search capacity, no supplemental database must be running.
 It is recommended that Exchange Server and Site Server be run on different computers. Further, if you expect to index a large volume of data, then running indexing and searching on different computers is recommended.
 The Domain\AccountName account must be in the administrators group of the Site Server computer. In this example, STREETMARKET1\exservice account should be in the administrators group.