Deploying and Maintaining Microsoft SQL Server 7.0 in an In-Store Environment

by Pradeep Raman

 

Contents

Introduction.............................................................................................................................................................. 4

Infrastructure........................................................................................................................................................... 5

Network Infrastructure..................................................................................................................................... 5

Network Architectures and Protocols.................................................................................................... 5

Windows NT Domain models................................................................................................................. 5

Single Domain Model....................................................................................................................... 5

Master Domain Model..................................................................................................................... 6

Multiple Master Domain Model..................................................................................................... 6

Complete Trust Domain Model...................................................................................................... 6

No Domain (or Workgroup) Model................................................................................................ 6

Domain models with a BDC in each store.................................................................................... 6

Recommended Domain Model...................................................................................................... 7

Hardware Infrastructure.................................................................................................................................. 7

Configuration........................................................................................................................................................... 9

Character Sets and Sort Order................................................................................................................ 9

Master and Tempdb databases.............................................................................................................. 9

Database growth parameters................................................................................................................ 10

Logging..................................................................................................................................................... 10

SQL Server configuration parameters................................................................................................. 10

Implementation...................................................................................................................................................... 12

Deployment............................................................................................................................................................. 13

Remote Access Service.......................................................................................................................... 13

Remote Control Software..................................................................................................................... 13

Software Distribution and Management Software........................................................................... 14

Recommendations................................................................................................................................. 14

Operations............................................................................................................................................................... 16

Remote Administration................................................................................................................................. 16

Jobs.................................................................................................................................................... 16

Operators.......................................................................................................................................... 16

Alerts................................................................................................................................................. 16

Recommended Centralized Administration scenario.............................................................. 17

Backup and Recovery................................................................................................................................... 17

Full Database backup.................................................................................................................... 17

Differential backup........................................................................................................................ 17

Transaction Log backup............................................................................................................... 17

Recommended backup scenario................................................................................................. 18

Recommended recovery scenario............................................................................................... 18

Database Maintenance................................................................................................................................. 18

Polling...................................................................................................................................................................... 20

Snapshot Replication............................................................................................................................. 20

Merge Replication................................................................................................................................... 20

Transactional Replication..................................................................................................................... 21

Replication Implementation Details................................................................................................... 21

Recommended Replication techniques.............................................................................................. 21

Connected stores............................................................................................................................. 22

Disconnected or occasionally connected stores........................................................................ 22

Pros and Cons for Replication as a Polling solution......................................................................... 22

Pros.................................................................................................................................................... 22

Cons.................................................................................................................................................. 22

Bulk-copy utility (BCP)......................................................................................................................... 23

Data Transformation Services (DTS).................................................................................................. 23

Two-Phase Commit................................................................................................................................ 24

Troubleshooting.................................................................................................................................................... 25

Support Infrastructure........................................................................................................................... 25

Automated Alerts.................................................................................................................................... 25

Remote Monitoring................................................................................................................................ 26

Licensing................................................................................................................................................................. 27

Conclusion............................................................................................................................................................... 28

Finding More Information................................................................................................................................. 30

Introduction

In-store systems have traditionally been developed using proprietary, non-relational database platforms.  With the advent of strong competition in every facet of the retail industry, companies are forced to focus on increasing efficiencies within the store since these translate directly into better customer service, greater throughput of customers, greater revenue, and increased availability of data for the corporate merchandising departments.

The database platform plays a critical role in accomplishing efficiencies through the use of technology within the store environment.  The database needs to provide strong OLTP performance in a concurrent environment to handle the transactions during the day and also strong decision support performance for end-of-day and reporting activities.  In addition, the ability to automate and customize the transfer of data to corporate warehouses is mandatory. 

Price/Performance ratios are also an important factor in selecting an in-store database platform.  Organizations need to consider the number of licenses required for supporting all of their stores and the performance required for achieving all of their business goals.  Perhaps the most critical characteristic of an ideal in-store database platform is its ability to support automated and remote administration since each outlet in most organizations will not be staffed with technically oriented employees.  Microsoft SQL Server 7.0 delivers on all of these areas and is a compelling choice for an in-store database platform.

This document is designed to provide a retail organization’s development and implementation teams with the information they need to successfully deploy and maintain Microsoft SQL Server 7.0 in an in-store environment.  General recommendations for the optimal solution are provided wherever applicable and various options are discussed in each section along with their respective pros and cons.  If any of the recommendations conflict with an organization’s specific business needs, they should be overridden for a more appropriate choice.

Infrastructure

Before SQL Server 7.0 is deployed, the network and hardware infrastructure in place within the store should be examined.  This section discusses the various options available for each major category in this topic and provides some general recommendations assuming no conflict with the organization’s specific business or technical needs.  For the purposes of this section and the rest of the document, Windows NT is assumed as the operating system being utilized on the server as well as the clients.

Network Infrastructure

The issues to be discussed in this area include network architecture and protocols, and the Windows NT Domain model.

Network Architectures and Protocols

q  Ethernet is the most popular LAN architecture in use today and is a good choice for an in-store LAN since it provides good performance with low startup and administrative costs.

q  Fast Ethernet implemented on Unshielded Twisted Pair Category 5 cabling is a good solution for significantly increasing in-store performance (to 100 Mbps) at a reasonable cost.

q  The network should be analyzed for points of failure and fault tolerance should be built into the architecture to eliminate single failure points whenever accommodated by the budget.

q  All in-store communication as well as Internet/WAN access should utilize the TCP/IP protocol.  This protocol is supported on most operating platforms and is considered to be a universal standard for internetwork connectivity.  It also allows the in-store PCs to support IIS, HTTP, FTP, SMTP, and streaming media applications.

q  Static IP addressing is preferred over using Dynamic Host Configuration Protocol (or DHCP) for the distribution of IP addresses to devices in a small in-store environment since this significantly reduces the administration workload.  On the other hand, in-store environments with a larger number of in-store devices should prefer using DHCP for distribution of IP addresses over static IP addressing due to ease of administration.  Windows Internet Name Service (WINS) and Domain Name Service (DNS) should be implemented to provide name resolution services.

q  In general, standardize on as few network protocols as possible within the store in order to maximize the communication efficiency. 

Windows NT Domain models

Selecting the proper domain model is critical to the organization’s overall network performance.  There are five different domain models in Windows NT as described below.  This paper discusses the issues involved in implementing Windows NT Server 4.x domains.  The upcoming Windows 2000 Server should scale much higher than the limits specified below in supporting larger domains.  Furthermore, the limits specified in this section represent recommended limits as opposed to hard and fast rules for implementation.

 

Single Domain Model

q  Consists of one domain for the entire organization.  While this centralizes account and resource management thus maximizing ease of administration, it limits the number of accounts to approximately 40,000 (the recommended NT 4.x Domain limit). 

q  Requires the domain controllers to have additional processing power due to the increased workload. 

q  Requires the least administration among all the domain models since there are no trusts to be maintained.

Master Domain Model

q  The user accounts are managed centrally in a master domain while decentralized resource domains manage the resources. The number of accounts is limited to approximately 40,000.

q  The domain controller in the master domain will need additional processing power.

q  Each resource “trusts” the master domain (one-way trust) and therefore administration cost is more than the single domain model, but less than the other domain models.

Multiple Master Domain Model

q  Similar to the Master Domain Model except that multiple master domains are allowed.  Therefore, this model supports a virtually unlimited number of accounts.

q  The domain controllers in the multiple master domains share the workload of managing the accounts.

q  More trusts to manage than the Master Domain Model and hence higher centralized administration costs are incurred.

Complete Trust Domain Model

q  Decentralized account and resource management.  Supports an unlimited number of users.

q  Unlike the other three models, domain specific account policies are supported.

q  Highest administration costs due to the large number of trusts involved.  This is the least desirable model in an enterprise environment and is more suited for a departmental implementation.

q  No performance penalties are involved due to localized authentication for accounts and resources.

No Domain (or Workgroup) Model

q  There are no domains and each store has its own workgroup instead. 

q  Decentralized account and resource management.  Supports an unlimited number of users.

q  Since corporate administrators have to log in to each workgroup individually to apply changes, the administration costs can be high if all the workgroups need the same changes.

q  No performance penalties are involved due to localized authentication for accounts and resources.

Domain models with a BDC in each store

q  In order to avoid the performance penalty of incurring a WAN round-trip for each authentication request, the four domain models listed above require the configuration of the in-store server as a Backup Domain Controller (BDC).

q  Configuring the in-store server as a BDC will add some overhead to it (based on the size of the account database) since it needs to synchronize the security accounts database with the Primary Domain Controller (PDC) on a regular basis across a WAN or dial-up link.

q  Configuring each in-store server as a BDC also does not support disconnected or occasionally connected stores well.  Upon restoration of the link to corporate, the workload on the in-store servers will potentially be large since the accounts database has to be synchronized in addition to sending transactions and receiving lookup table information.  Performance could be affected on slow dial-up links.

Recommended Domain Model

q  The workgroup model (or the no domain model) is the preferred choice for in-store environments that involve only a handful of client devices, primarily due to its low cost and decreased maintenance complexity.

q  This model does incur high administration costs when changes have to be distributed throughout all the stores.  So, it is imperative that the account and resource setup be planned prior to deploying workgroups to the stores so that the number of future changes is minimized.  Generic logins should be used to anticipate future in-store usage since the primary user-based security will be handled by the in-store front-end application.

q  Except for the cases where changes need to be distributed to all the stores, the no domain model incurs extremely low administration costs since there are no trusts to manage and each store operates within its own workgroup.

q  In organizations with a larger number of in-store devices or a dynamic account list, a workgroup model may increase administration costs and hence a domain model is preferred.  In these cases, the Single/Multi Master Domain model provides the centralized account management and decentralized resource management best suited for such an in-store environment.  The master domain would reside in the corporate location while each store would contain a server configured as a backup domain controller.

Hardware Infrastructure

q  Memory and processor speed on the server should be maximized based on the budget.  They also depend on the number and nature of applications running on the in-store server.

q  If the budget accommodates it, implement hardware (or software) RAID Level 1 (disk mirroring) or RAID Level 5 (disk striping with parity) to prevent single point of failure in the disk drives and also maximize the performance of the system.  Mirroring provides better “Write” performance (which is critical for On-line Transaction Processing (or OLTP) applications) whereas disk striping provides better “Read” performance (which is critical for Decision Support Systems (or DSS) applications).  However, striping requires a minimum of three drives to implement versus two for mirroring.  In general, mirroring is the preferred fault tolerance solution for an in-store system.  Although implementing RAID is highly recommended, it is not required for SQL Server and it may not be practical for many retailers since it adds significantly to the per-store costs.

q  Invest in the fastest disk drives supported by the server because they are always the bottleneck in any machine configuration.  The different versions of the SCSI interface standard (SCSI, SCSI-2, FAST SCSI-2, WIDE SCSI-2) provide superior data transfer performance compared to the disk drives using the IDE interface.  FAST SCSI-2 is the most popular choice on modern servers but may have a higher cost associated with it compared to the IDE drives. 

q  For more details on hardware configurations for SQL Server machines, please refer to the Compaq white papers listed in the “Finding More Information” section of this document.

 

Configuration

There are several configuration options that need to be setup on Microsoft SQL Server 7.0 during and after the installation process in order to maximize the database performance.  This section details the configuration options and their respective settings that will maximize performance and minimize on-going administration.

Character Sets and Sort Order

Character sets, also known as code pages, control the way character-based data is stored internally in SQL Server.  Each character set consists of 256 characters (128 standard printable characters followed by 128 language specific letters and symbols).  Sort orders control the collation and presentation of data in response to database queries. A sort order determines how character data is compared and in what sequence it is returned from a query.  Both of these configurations are determined during the installation process.

Alternately, the Unicode standard for character representation handles 65,536 characters and covers virtually every character in use today in the different alphabet sets.  This representation takes up twice the storage as a character but provides support for storage of different alphabets such as the European alphabet or the Japanese Kanji.  Generally, language independence can be provided by the front-end application and the database should continue to store the data in a single character set representation.  The Unicode standard should be utilized if the application involves storage of internationalized data.  If the Unicode standard is selected, a Unicode Collation method must also be provided during the installation process.

q  Character set should be chosen over the Unicode standard as the data representation method to optimize storage and performance provided there are no requirements to store internationalized data.  The Unicode standard should be used if such requirements exist.

q  Standardize on the same character set and sort order throughout all the SQL Server installations in the organization in order to facilitate direct data transfer between servers.

q  The default character set (1252 ISO or ANSI) and sort order (Case-insensitive dictionary sort order) work well and should be used unless they conflict with specific business needs.

q  Do not use character sets to adhere to the language requirements because that should be controlled by the front-end applications and not by the database.

Master and Tempdb databases

The master database is a system database that stores information about all the other databases as well as system configurations and all login accounts.  Tempdb is a system database that is used to provide temporary storage for user transactions.

q  During installation, the size of the master database defaults to 7.5 MB and the size of the tempdb database defaults to 8 MB.  If the number of user databases and login accounts is limited, then the default size for the master database is probably sufficient.  It may be a good idea however to increase the size of master to account for future growth and avoid the expansion penalty (explained in the next topic). 

q  If the application utilizes temporary tables, either by explicitly creating them or by using the order by and group by clauses on large resultsets, then the initial size of Tempdb should be increased appropriately to avoid the expansion penalty (explained in the next topic).  The average space used

by each tempdb operation multiplied by the average number of concurrent users is a good approximation of the ideal tempdb size.  Calculate the space used in a tempdb operation by multiplying the number of rows being loaded into temporary tables by the record size.

q  Given the limited amount of RAM available on the in-store servers and the limited number of users in an in-store environment, it is best to avoid creating tempdb in RAM.  RAM is best utilized by assigning it to SQL Server and hence using it primarily for data and procedure caching.

Database growth parameters

SQL Server 7.0 automatically expands the size of a database as required during operation.  Either a percentage of the existing size or a fixed number of megabytes, as specified by the administrator, controls the amount of expansion.  While this feature significantly reduces on-going maintenance, it does create a performance penalty if the database is frequently being expanded. 

q  Specify a larger initial database size instead of accepting the default (which is 1 MB) when the user database is created.  Approximate the size based on minimum anticipated amount of data storage.

q  Specify a meaningful number of megabytes or percentage to expand by when the initial size is used up instead of accepting the defaults (which are 10% of the existing size or 10MB).  The reasons for the initial size filling up must be analyzed prior to making this decision.

Logging

In-store systems require up-to-the-minute recovery i.e. until the last committed transaction.  This functionality is provided in SQL Server 7.0 by using transaction logging. 

q  The database configuration option “truncate log on checkpoint” should be set to False so that the transactions can be retained in the log file which can then be backed up periodically. 

q  The transaction log should be backed up periodically to facilitate better recovery options.

q  Furthermore, if any non-logged operations such as “SELECT INTO” or “BCP” are performed on the database, differential backups must be done immediately to protect those transactions.  Please refer to the Operations section in this document for details on the preferred backup strategy.

q  The initial size of the transaction log file should be approximated using the number of transactions estimated between each time the log file is backed up.  This will avoid the penalty associated with constantly expanding the log file when the default of 1 MB is accepted.

SQL Server configuration parameters

SQL Server 7.0 allows users to determine the values for a number of configuration parameters that drive its overall performance.  Although, this level of tuning may be required for installations involving very large databases running on high-end servers with large disk subsystems, most in-store installations do not require it.  The default parameter values work well for in-store databases and should be maintained on all the store servers.  Some of these parameters are listed below.  Please refer to the white paper titled “Microsoft SQL Server 7.0 Performance Tuning Guide” at http://msdn.microsoft.com/developer/sqlserver/sql7perftune.htm for additional details on setting configuration parameters for high-end systems.

q  Max Async I/O: The default value of 32 is sufficient for in-store systems.  The overall goal is to control the speed of “checkpointing” the database i.e. writing modified data or log pages from the buffer cache to the disk subsystem and also truncating the inactive portion of the log.

q  Max Worker Threads: The default value of 255 is sufficient for in-store systems.  This controls the number of Windows threads used by SQL Server to service the commands being submitted to the database at a given time.

q  Boost SQL Server Priority on Windows NT: This allows SQL Server to run at a higher priority than other Windows NT processes and should be used only if the machine is dedicated to SQL Server.  The default value is zero, which forces SQL Server to run at the same priority as the other processes.

q  In general, in-store servers should take advantage of the auto tuning and auto-configuration features built in to SQL Server 7.0 and not change the default configuration parameter values.

 

Implementation

This section will outline the database design and implementation issues that have an impact on performance, administration, and the ability to troubleshoot from a remote location.  These are general recommendations and should not serve to override any conflicting business requirement.

q  Implement all data integrity checks on the database in addition to performing the same checks in the front-end code.

q  Reduce the size of transactions as well as the amount of I/O whenever possible.  This will increase performance and concurrency support.

q  Transactions that affect a large number of rows on a table that contains a complex trigger could perform poorly in a concurrent environment due to the length and number of locks being held.  Isolating the logic contained in the trigger into a stored procedure could improve performance.  In general, transactions should be made as compact as possible in order to facilitate good performance in a concurrent environment.

q  Use constraints and Null/Not Null column properties to perform data validation instead of using triggers.

q  SQL Server 7.0 has a new index storage mechanism, which does not significantly add to the overhead involved in maintaining additional indexes.  Therefore, when in doubt, index the column.

q  Use the power of stored procedures to reduce the number of client requests submitted to the server. 

q  Balance performance with scalability in the analysis for selecting the appropriate cursor.  Server side cursors will generally provide better performance for a given query, but they are not as scalable as client-side cursors for supporting a large number of clients.  The appropriate cursor should be selected based on the application needs and scalability requirements.

For additional details on performance tuning for SQL Server 7.0 applications, please refer to the white paper at: http://msdn.microsoft.com/developer/sqlserver/sql7perftune.htm

 

 

Deployment

This section discusses the issues involved in the deployment of SQL Server 7.0 on the in-store PCs.  This includes the RDBMS installation on the in-store server and the client utilities on the in-store clients.  It is assumed that the initial installation and configuration of all PCs will occur in a centralized site using “ghosting” or other disk copying methods.  This section details the options for distributing updates (such as service packs, or bug fixes) to the in-store devices for connected as well as disconnected or occasionally connected stores. 

Costs and ease of administration are the two overriding factors taken into consideration for the recommendations provided in this section.  In addition, the solutions must be applicable for both permanent WAN links as well as standard dial-up links.  The fact that the distribution method selected for SQL Server will also be applicable for other software updates to the same devices is also central to these recommendations. These are general recommendations and should not serve to override any conflicting business requirement.

Remote Access Service

q  Remote Access Service (RAS) is built-in software provided with Windows NT that facilitates dial-in access to the network on which the target NT device resides.  Assuming that the bandwidth of the dial-in connection is sufficient, this is a solid no-cost solution for performing file transfers.

q  Besides support for dial-in access, RAS also provides support for using the Point-to-Point Tunneling Protocol (PPTP) on top of a Point-to-Point Protocol (PPP) connection to the Internet.  This allows for a secure simulation of a local area network connection over an encrypted public link and provides significant cost benefits because the costs are now reduced to a fixed monthly fee to an ISP for Internet access.  This can be significantly cheaper than dialing long distance to/from the store on a regular basis.

q  RAS also provides access in the event of a loss of network connectivity and can be used to provide connectivity in a cost-effective manner to stores located in remote areas where network connectivity may not be feasible.

q  However, this solution involves dialing in to each store server individually and hence increases administration costs if updates need to be distributed throughout a large retail chain.

q  Another option is to develop a custom application that incorporates RAS for disconnected stores and automates the task of connecting to each store and performing the necessary transfers.  The application can be designed to provide detailed session logs for each connection, which can be monitored from a centralized location.  The custom application can also be designed to execute applications, update registry settings, and perform any other programmable actions on the in-store server.  This option can be more efficient than packaged software solutions since the custom application will be designed to satisfy a narrow, specific set of requirements.

Remote Control Software

q  Third party tools such as PCAnywhere provide the ability to take control of target PCs and therefore can be used to obtain access to an in-store machine.  Basically, each keystroke or mouse click on the caller’s PC is transferred and executed on the target PC and each resulting screen on the target PC is transferred back to the caller. 

q  Although both network and dial-in connections are supported, this is a slow speed solution and requires significant bandwidth in order to be used as a deployment mechanism.  This solution is better suited for on-demand scenarios for one machine at a time such as troubleshooting or helpdesk monitoring scenarios.

Software Distribution and Management Software

q  This category of tools provide powerful software distribution and management capabilities for the enterprise.  They allow for centralized and automated administration of tasks such as file transfers, remote program execution, and service pack and patch updates among others.  They obviously cost more than both Remote Control and Remote Access solutions.

q  While administration is easy and automated, it requires staff knowledgeable in the chosen tool to be available at the centralized location.  This certainly adds to the costs compared to the alternatives presented above.

q  Microsoft’s Systems Management Server is a good candidate for desktop management and software distribution in a Windows NT environment.  It can also be used for automated file transfers but it requires the implementation of a domain model instead of the simpler workgroup model.

q  Sterling Commerce’s Connect:Remote is one of the leading software tools in use by large retailers to solve the software distribution and management problem.  It performs well over a WAN connection and includes many sophisticated features such as session logging and remote program execution along with the speed required for supporting an enterprise-wide deployment effort.

q  Using a combination of SMS, for desktop management, and Connect:Remote for other tasks such as polling and remote program execution is also an option chosen by larger retail organizations.

Recommendations

q  Packaged software such as Connect:Remote (or SMS in combination with Connect:Remote) provide a scalable and reliable solution for tasks such as software distribution, desktop management, polling, and automating bi-directional file transfers.  These third party solutions have an advantage over custom RAS solutions in the areas of scalability and reliability across a large enterprise-wide deployment.  Many of Connect:Remote’s sophisticated logging and administration features can be utilized in addition to its scalability and reliability to justify the additional per-store costs.

q  Third party solutions involve higher total costs, but are better suited for scaling in size and complexity.  Besides the software purchasing costs, they also involve additional operational training and administration costs.  These solutions can also be customized by adding simple scripts to perform additional tasks.

q  These third party solutions will support both permanent WAN links as well as standard dial-up links.

q  If the deployment needs are simplistic in size and complexity and if the appropriate development skill sets are available, the custom application solution, incorporating RAS for disconnected stores, is a flexible solution for the lowest cost.  The application must include features such as session logging and the ability to execute programs, update registry settings, and perform multi-threaded or simultaneous operations on the in-store server.  In addition, the application must have a comprehensive user interface for the centralized support personnel to monitor the status of all activities.

q  This solution involves minimal administration and training costs and can be customized to the organization’s specific business needs.  It does however add significantly to the software development and maintenance costs and is limited in terms of scalability.

q  Using a PPTP connection on top of a PPP connection to the Internet should be explored as an alternative to dialing in directly to/from the store.  Based on usage patterns, this can provide significant cost savings.

q  Remote Control tools should be deployed at each store and utilized strictly for troubleshooting and monitoring purposes.  This greatly increases the ability to provide remote administration and support, which is one of the critical requirements for any in-store system.

Operations

This section discusses the operational requirements for SQL Server 7.0 in an in-store environment and including issues such as Remote Administration, Backup and Recovery, and Database Maintenance.

Remote Administration

SQL Server 7.0 provides remote administration capabilities by using a master server to store job definitions and target servers to execute them.  The master server is usually defined at the central administration site while the target servers are located at each of the remote sites.  The target servers periodically connect to the master server to receive new task lists and then disconnect until after execution, when the job status information is relayed back to the master server.  In this manner, jobs are defined once and executed at multiple remote sites with the status of each execution being captured back at the master server.

Jobs, Operators, and Alerts are the primary components of the administration function in SQL Server 7.0.  There are wizards available to define each of these components and they all require the SQL Server Agent Service in order to execute.

Jobs

q  Consist of steps, which can include executable programs, Transact-SQL scripts, Windows NT commands, Active Scripts, or replication agents.

q  Can be categorized to provide for easy filtering and grouping.

q  Allow control of sequence of steps with full error handling between each step.

q  Can be executed locally (standalone jobs) or on multiple remote servers (MultiServer Administration jobs).

q  Are executed by schedules and alerts.

Operators

q  One or more individuals responsible for maintaining the SQL Server databases.

q  Are notified of job status via e-mail, pager, or a “net send” command.

Alerts

q  Consist of one or more SQL Server events and the response to each of those events.

q  Responses to events can occur in the form of operator notifications, job executions, raising an SNMP trap, or forwarding an event to another server.

 

 

Recommended Centralized Administration scenario

q  Configure a server at the corporate site to serve as the master server.  Enlist the server at each store site to serve as target servers.

q  Define primary and secondary operators at the centralized site who will be responsible for responding to alerts.

q  Define the MultiServer Administration jobs (such as backup jobs or database maintenance jobs) on the master server.  Categorize each job appropriately.

q  Define “On Success” and “On Failure” actions for each step in the job.  If an operator needs to be alerted on failure then execute a RAISERERROR WITH LOG statement to write the error message and the severity to the Windows NT event log. 

q  Define alerts to notify operators via e-mail or pager based on the errors written to the Windows NT event log.  Alternatively, alerts can also be defined to process jobs based on certain errors.

q  Schedule the jobs on the respective target servers.  Thus, all maintenance jobs can be automated and managed from a centralized location.

q  This configuration can also be used to support other tasks such as polling and software distribution from corporate to the stores.  In general, all the SQL Server tools can be run from a remote location.  Therefore, with the right security privilege, almost any task performed on the SQL Server host itself can be performed from a remote location.

Backup and Recovery

Providing the ability to recover until the last committed transaction in most situations should be a mandatory requirement for in-store systems.  This can be accomplished in SQL Server 7.0 by using a combination of Full Database Backups, Differential Backups, and Transaction Log backups.  SQL Server 7.0 backups can be performed while the database is live and most Transact-SQL statements are allowed during the backup operation. 

It is a good practice to store backups on an external media such as high-speed tape drives and also use off-site storage for disaster recovery purposes.  A more cost-effective alternative to tape drives is to utilize the hard drives of other in-store devices to store the backup files.  This can provide significant cost savings across all stores and still provide acceptable performance and recovery options

Full Database backup

As the name implies, a snapshot of the entire database is saved.  This can potentially take a long time based on the database size.  Every recovery MUST begin with restoring a full database backup.  Therefore, performing this backup periodically is recommended.

Differential backup

Only the data changes made since the last full database backup are saved.  This operation takes a lot less time than the full database backup and is generally performed more frequently.

Transaction Log backup

All the committed transactions since the last backup (full, differential, or transaction log backup) are saved and then cleared from the log file.  This operation takes very little time and its frequency depends on the tolerance for data loss as dictated by business needs.

Recommended backup scenario

1.       Perform a full database backup every night to a fresh tape (or to a new file on a hard drive).

2.       Perform differential database backups every 2 hours and immediately after any non-logged operation (such as SELECT INTO or BCP).

3.       Perform transaction log backups every 15 minutes.

4.       Backup jobs can be automated and managed from a centralized location.  Refer to the “Recommended Centralized Administration scenario” section detailed above for additional details on automating and managing backup and other maintenance jobs.

Recommended recovery scenario

1.       Backup the current active transaction log from the server’s hard drive in order to achieve up-to-the-minute recovery.  If it is not accessible, then the database can only be restored to the state represented by the last saved transaction log (which should be at most 15 minutes ago).

2.       Restore the last full database backup.

3.       Restore the last differential backup created since the last full database backup.

4.       Restore each transaction log backup created since the last differential backup. 

5.       If available, restore the active transaction log backup created in step 1.

6.       These recovery steps can be implemented as a script and executed by an operator on demand.  Refer to the “Recommended Centralized Administration scenario” section detailed above for additional details on managing jobs.

Database Maintenance

SQL Server 7.0 significantly reduces the need for frequent database maintenance routines and also provides wizards that automate the definition, execution, and monitoring of these tasks.  Although not required, these tasks can be scheduled periodically to provide peace of mind and increase the comfort level of the corporate administrators. In addition, the integrity checks should be performed:

q  When integrity errors appear in the error log.

q  Prior to a major Operating System or SQL Server upgrade.

q  Prior to a major hardware upgrade or repair operation.

q  In general, the following maintenance tasks can be scheduled using the “Database Maintenance Plan Wizard”:

q  Update Statistics used by the query optimizer (with a sampling size of 10%).  This is optional since it is executed automatically by SQL Server 7.0 by default.  However, if desired, it can be performed on a biweekly basis or as frequently as necessary depending on the number of transactions processed each day.  This operation should be very fast for most in-store databases and therefore can be performed even when the database is online.

q  Perform internal data integrity checks.  This can be scheduled to execute once a month during off-peak hours.

q  Perform database integrity checks. This can be scheduled to execute once a month during off-peak hours.

The wizard also provides the ability to create text or HTML reports from executing each task and also store the historical job status information on a remote server, say the master server at corporate.

 

Polling

This section discusses the implementation issues involved in a common in-store task, namely polling.  This is the process of feeding selected data, termed an article, from the stores periodically to the corporate databases as well as receiving articles from corporate that need to be loaded into the in-store database.  Reliability and the efficient use of bandwidth are the most critical factors in designing a polling solution.  In addition, the ability of the solution to integrate into the existing software distribution solution (such as SMS or Connect:Remote) is also explored in this section.  This provides the advantage of simplifying the deployment and support effort and reducing administration costs by using a single tool to perform both software distribution and polling. 

Replication, the Bulk-copy utility, Data Transformation Services and the Two-Phase Commit methodology are explored in this section as possible solutions for polling.   

Replication uses a publish-and-subscribe method for performing the data transfer.  A distribution database is utilized for persisting the data to be replicated.  SQL Server 7.0 supports bi-directional replication to both homogeneous as well as heterogeneous database platforms.  Replication allows for graceful handling of network failures since the data will be held at the distribution database and transferred once the connection is restored.  Thus, replication provides guaranteed delivery in an automated manner.  Furthermore, the SQL Server 7.0 Replication architecture is scalable up to hundreds of publishers (replication data sources) and thousands of subscribers (replication data receivers).  Specifically the following three major types of replication are provided in SQL Server 7.0:

Snapshot Replication

q  A current snapshot of the selected data, or article, is transferred from the publisher to each subscriber.

q  This should be primarily used to synchronize publishers and subscribers if a large number of changes have occurred since the last synchronization.  This replication method should also be limited to synchronizing a few tables at a time and not the entire database.  It is applicable only for read-only subscribers.

q  This method can use up a lot of bandwidth depending on the size of the article being replicated.

q  Snapshot replication provides transactional consistency on a delayed basis and provides a good degree of site autonomy if the subscriber does not update the data.

Merge Replication

q  Both the publisher and the subscriber are allowed to change the data contained in the article and the two versions are synchronized using merge replication.  Administrators control the criteria for resolving conflicts between the two updates. The “winner” of the conflict can be based on a user-specified priority scheme, a “first one wins” solution, or a custom resolution using COM and stored procedures.

q  This replication method is best suited for scenarios where both the subscribers and the publishers of data need to update the articles.

q  The number of conflicts to be resolved and the complexity of the resolution mechanism are the principal factors in determining the performance of this type of replication.

q  Merge replication does not guarantee transaction integrity and provides a high level of site autonomy since all sites can perform updates to the database.

Transactional Replication

q  INSERT, UPDATE, and DELETE statements are captured from the transaction logs and stored in the distribution database to be applied to all the subscribers in the same order.

q  Transactional replication can also be used to replicate stored procedures whereby each execution along with its parameters are forwarded to the subscribers and executed there.  This reduces the volume of commands and improves performance in some cases.

q  This method of replication is best suited for near real-time distribution scenarios and for efficiently replicating data from multiple publishers to a single subscriber.

q  Transactional replication guarantees transactional consistency and requires that the subscribers do not update the database.

Replication Implementation Details

q  The number of network round trips involved in transactional and merge replication is controlled by the CommitBatchSize (default = 100), CommitBatchThreshold (default = 1000), TransactionsPerHistory (default = 100), and PollingInterval (default = 3) parameters. 

q  The Distribution agent issues a query to get all transactions pending distribution.  It then sends these to the subscriber, committing the commands as determined by the CommitBatch parameters.  When all available commands have been processed, it sleeps for the specified PollingInterval, then re-issues the query to get commands pending distribution. 

q  If the subscriber is SQL Server, commands are applied to the subscriber in batches based on command type and memory consumption.  For most data changes to tables, the distribution agent batches up the commands and sends them to the subscriber when the size of the batched commands exceeds 10 * the packet size (by default about 40K) or when it needs to commit the transaction.  If the subscriber is an ODBC datasource, commands are applied individually to the subscriber.

q  The distribution agent uses ODBC to query the publisher and then uses either ODBC or OLEDB to transfer the data to the subscribers.  The data is transferred in batches as described above.

q  For snapshot replication, the distribution agent uses the BCP API to transfer the data to each subscriber.  The amount of data being transferred in this method will directly affect network usage and performance.

     Recommended Replication techniques

q  Use the parameters described in the Replication Implementation Details section above, to control the number of network round trips involved and thus the overall performance.  The default values for determining the batch size may need to be increased based on the number of records being transferred at a given time.  The PollingInterval can also be increased if real-time replication is not a requirement.

 

 

Connected stores

q  Use transactional replication to transfer articles containing individual transactions from each store to the corporate database periodically throughout the day (say once a shift).  The store databases are the publishers of these articles and the corporate database is the sole subscriber.

q  Snapshot replication can be used to initialize the price-updates and other lookup information at each store.  Thereafter, in general, transactional replication can be used to distribute the changes to these types of information on a daily basis.  If the number of changes is anticipated to be unusually large at any given point, snapshot replication could be used to re-initialize each store database and transactional replication could be restored thereafter.  The corporate database is the publisher of these articles and each store database is a subscriber.

Disconnected or occasionally connected stores

q  Use merge replication to transfer all the changes since the last synchronization from the stores (the publishers) to the corporate database (the subscriber).  The frequency of the synchronization will depend on the frequency of the connection to the corporate network.

q  Snapshot replication can be used to initialize the price-updates and other lookup information at each store.  Thereafter, in general, transactional replication can be used to distribute the changes to these types of information each time that a connection is established.  If the number of changes is anticipated to be unusually large at any given point, snapshot replication could be used to re-initialize each store database and transactional replication could be restored thereafter.  The corporate database is the publisher of these articles and each store database is a subscriber.

Pros and Cons for Replication as a Polling solution

Pros

q  Replication provides a scalable and reliable solution for guaranteed delivery of data from one or more sources to one or more destination databases.  It also supports the transfer of data to non-SQL Server databases (although it does not provide the ability to transform the data).

q  The three replication methods detailed above utilize network resources efficiently (as controlled by user-defined parameters) and provide centralized administration of all the data publishers and subscribers in the enterprise through a graphical interface in the SQL Enterprise Manager utility.  Replication can be easily customized (for each store if necessary) to control the frequency of the data transfer as well as the type of replication performed.

q  Replication does not require any custom application development since the ability to ensure that the same data is not transferred more than once as well as the ability to handle network and hardware errors gracefully is built into the solution.  Detailed error handling is also provided without any custom development to facilitate easy troubleshooting and remote administration.

Cons

q  Replication does not integrate into other existing software distribution solutions (such as SMS or Connect:Remote) and requires the use of the Enterprise Manager utility for administration purposes.  This adds complexity to the overall deployment and support effort since two separate solutions and tools have to be implemented to handle software distribution and polling.

q  Replication provides limited flexibility in the selection of the data to be transferred since only horizontal or vertical partitions of database tables can be selected for replication.  The ability to gather data from multiple tables into one rowset, which can then be transferred to the destination, is not provided by replication.  This could increase the volume of data being transferred depending upon the database structure and the replication requirements.

q  Replication is unproven as a polling solution in the retail enterprise environment and hence provides no reference installations in terms of polling. 

Bulk-copy utility (BCP)

q  An alternative solution for polling can be developed using the BCP utility provided by SQL Server 7.0.  This utility allows for the data in specified SQL Server tables to be exported to operating system files as well as for the data in the specified files to be imported into SQL Server tables.  The BCP utility can be executed from the command prompt and scheduled through a batch file. 

q  In addition, the BCP API can be utilized to develop a custom application that performs the necessary data transfers and which can be executed by any standard scheduling system.

q  A data transfer solution built using the BCP utility can be integrated into the existing software distribution solution (such as SMS or Connect:Remote). The custom application will utilize the BCP API calls to export the data from the source database tables into individual files, which can then be transported through the existing software distribution solution to the target server, where another custom application will be executed to import the data.

q  BCP does not provide much flexibility in data selection and forces the creation of an individual file for each database table participating in the transfer.  Furthermore, custom logic will have to be included in the application to ensure that the same data is not transferred more than once and to gracefully handle network or hardware failures during the transfer process.

q  Administration will be simplified since it will be integrated into the existing software distribution solution and therefore not require an additional monitoring tool.  The custom application should provide specific error handling capabilities to facilitate easy troubleshooting and remote administration for the operations staff.

q  File compression techniques can be used to maximize the available bandwidth.

q  This solution involves additional software development and maintenance costs for the custom applications, but provides the benefit of integrating into a proven and reliable existing software distribution solution.

Data Transformation Services (DTS)

q  DTS is another utility provided by SQL Server that can be used to develop custom polling solutions that integrate into the existing software distribution system.  It provides the ability to create a “package” of tasks that can be scheduled and executed to run in sequence.

q  DTS also provides an object model consisting of 32-bit COM objects that can be utilized to develop custom applications in Microsoft Visual Basic, Microsoft Visual J++, or Microsoft Visual C++.

q  DTS provides greater flexibility in the selection of the data by allowing the transfer of the results of a custom Transact-SQL query or stored procedure.  In this manner, all the necessary data can be gathered into a single result set, which can then be exported and transferred to the target server using the infrastructure of the existing software distribution solution. 

q  DTS utilizes a high-speed in-process COM server to transfer OLE DB rowsets directly from the source to the destination database.  It supports data transfer between heterogeneous databases and its extendable COM-based architecture facilitates complex data validations and transformations as the data moves from the source to the destination.

q  The custom application should be designed to integrate into the existing software distribution solution so that an additional monitoring tool is not required.  The custom application will also require additional logic to prevent the same data from being transferred more than once as well as to handle network and hardware failures gracefully.  Error handling should be included to specify the nature and location of the errors to facilitate easy troubleshooting and remote administration.

q  This solution involves additional software development and maintenance costs for the custom applications, but provides the benefit of integrating into a proven and reliable existing software distribution solution.

Two-Phase Commit

q  This polling solution involves the real-time update of both the source and destination databases bound by a transaction i.e. either both databases are updated or neither are updated.

q  This solution for polling provides no support for disconnected stores and will incur a heavy performance penalty at the POS devices if the link to corporate is a slow WAN or dial-up link.

q  It also prevents the POS device from processing transactions if the WAN or dial-up link is down which is usually unacceptable for a retail environment.

 

Troubleshooting

This section discusses the issues involved in troubleshooting and supporting the SQL Server 7.0 installations at the stores and the in-store systems in general.  The ability to provide remote troubleshooting and monitoring support is part of the priority for remote administration in an in-store environment.  This section details issues such as support infrastructure, automated alerts, and remote monitoring support in SQL Server 7.0.

Support Infrastructure

q  As mentioned in the deployment section, RAS can be used to provide the primary dial-in support for the SQL Server database.  A custom application that incorporates RAS is a solid, low-cost solution for performing file-transfers, database operations, and remote program execution.

q  Remote control software installed on an in-store machine provides an additional level of support capability to the corporate helpdesk staff.  However, it does add to the per-store cost for the organization.

q  In-store databases can be monitored from a central location by using the Enterprise Manager utility provided with SQL Server 7.0.  This utility can also be utilized to monitor replication tasks, database maintenance routines, and polling tasks.

Automated Alerts

q  SQL Server 7.0 provides alerts to monitor the messages appearing in the Windows NT event log.  The alerts can be triggered based on severity levels, specific error numbers, or even error messages containing a specific word or phrase.

q  By default, severity levels between 19 and 25 trigger an alert since they represent SQL Server error messages.  In addition, alerts must be set up for handling errors occurring in any of the custom jobs created on the server.  Using the RAISERROR WITH LOG statement within the jobs can trigger custom alerts.

q  Alerts should be set up on the master server which also receives the status of the jobs from each target server.  In this manner, jobs and alerts can be defined centrally and executed in a distributed manner.

q  In response to alerts, administrators can page an operator, send an e-mail, forward the event to another server, or even execute a specific program.  Any combination of these actions must be utilized to provide real-time notification to the administrators of the problems at one or more stores.

q  SQL Server 7.0 provides support for the Simple Network Management Protocol (SNMP) which provides the ability to monitor the status of SQL Server installations over multiple platforms (such as UNIX).  This also allows for integration with network management tools that are SNMP-compliant.  Besides alerts, you can also monitor performance information, access databases, and view server and database configuration parameters on these platforms.

Remote Monitoring

q  SQL Server 7.0 provides utilities to monitor SQL statements, including stored procedures, and current user activity, as well as hardware activity.

q  SQL Server Profiler can be used to monitor all SQL related activity such as execution of stored procedures or batch statements.

q  Windows NT Performance Monitor can be used to monitor hardware and other system activity on the server.

q  These utilities are generally used on an as-needed basis to troubleshoot specific problems at a particular store.

SQL Server 7.0 provides good support for remote monitoring and troubleshooting in addition to all the remote administration capabilities.  Automated alerts should be used to identify problems on a real-time basis and RAS can be used as the primary infrastructure for providing connectivity to the store databases.

Licensing

Microsoft provides a cost-effective and flexible licensing model for SQL Server 7.0 in an in-store environment.  This includes the ability to license limited-use and full-use versions of the product as well as the ability for Independent Software Vendors (ISVs) to embed SQL Server as part of their solution.  Furthermore, the SQL Server 7.0 data engine will be included with the release of Office 2000 and will be licensed similar to the Jet engine.  This allows all development in Office 2000 to be SQL Server compliant and creates a seamless upgrade path to the Enterprise version of SQL Server 7.0. 

SQL Server 7.0 is currently available in three different versions, namely, Standard, Enterprise, and Small Business Server (SBS). In addition, users covered by a per-seat license from any of these three editions can install a Desktop SQL Server installation on their client computer.

q  The performance of the Small Business Server Edition is limited to the throughput typical of 50 concurrent users, although individual SBS installations may be licensed for fewer than 50 users. The performance of the Standard and Enterprise editions is limited only by the hardware and operating systems on which they run.

q  SQL Server Desktop is a fully featured relational database management system (RDBMS), targeted for personal and shared use, that runs on Windows 95, Windows 98, or Windows NT Workstation. SQL Server Desktop shares the same code base as SQL Server 7.0 and is available on the SQL Server 7.0 and SQL Server 7.0 Enterprise Edition CD. SQL Server Desktop may be deployed and used on each device to which a per-seat SQL Server Client Access License has been dedicated.

q  The SQL Server Desktop edition scales effectively over 2 Symmetrical Multi-Processing (SMP) CPUs and has optimizations to minimize memory usage.  Furthermore, the Desktop edition supports full merge and snapshot replication and supports only subscriptions to transaction replications.  It cannot publish transaction replications.

q   In addition, the SQL Server Desktop edition does not include the SQL Server OLAP Services software or the English Query capabilities present in the Standard and Enterprise versions.  Features such as parallel queries, failover clustering, and extended memory addressing are not supported in this edition.

q  The SBS edition runs on the Microsoft BackOffice Small Business Server and has a 10 GB size limitation on a per-database basis.  It scales over 4 SMP CPUs and provides full-text search capabilities.  However, features such as OLAP Services, failover clustering, and extended memory addressing are not supported in this edition.

q  The Standard edition includes SQL Server OLAP Services as well as English Query and contains no performance or load controlling mechanism.  It supports up to four SMP CPUs and 2 GB of RAM. The Standard edition does not support failover clustering or extended memory addressing.

q  The Enterprise edition includes all the features contained in the Standard version and adds failover clustering and support for up to 32 SMP CPUs and 3 GB of RAM.  It runs only on the Microsoft Windows NT, Enterprise Edition platform.

q  All editions of SQL Server 7.0 contain fixes for known year 2000 issues. For the most current year 2000 compliance information on SQL Server 7.0, please visit www.microsoft.com/year2000.

For more information on these versions refer to the SQL Server Books Online.  For the latest licensing and pricing information, refer to the www.microsoft.com/sql site.

Conclusion

Microsoft SQL Server 7.0 is a compelling choice for an in-store database platform primarily due to its built-in capabilities for remote administration, scalable and reliable transaction processing, polling solutions for different scenarios, and its market leading price/performance ratio.  This document detailed the keys to a successful implementation of SQL Server 7.0 in an in-store environment, which can be summarized as follows:

q  Infrastructure issues including choice of network architecture, protocols, and Windows NT Domain models will affect general network performance.  Fast Ethernet running TCP/IP only is the preferred network architecture. The Workgroup or no domain model is a good low-administration solution for an in-store environment.

q  Automated database expansion carries a penalty if overused and hence should be avoided by sizing the databases appropriately during installation.  Initial sizing is also important for system databases such as master, tempdb, and the transaction log files.

q  Avoid the use of complex triggers for multi-row updates since it degrades performance.  Exploit the power of stored procedures and the new index storage mechanism during application development.

q  Analyze the size and complexity of the deployment requirements along with the costs involved in the development and maintenance of custom software in selecting the proper software distribution and management solution.  For small organizations with minimal software distribution needs, custom applications, incorporating RAS for disconnected stores, provides a flexible, low-cost solution.  However, for most organizations, third party solutions such as SMS or Connect:Remote along with some customized scripts provide the most scalable and reliable solution for software distribution and desktop management.  Remote Control software should be used to provide monitoring capabilities for the in-store devices.

q   Set up a dedicated master server at the corporate site and enlist each in-store server as a target server.  Jobs and Alerts can then be defined and monitored centrally while being executed on each target server.

q  Full, Differential, and Transaction Log backups should be used to provide up-to-the-minute recovery for the in-store database. 

q  The Database Maintenance Plan Wizard should be used to schedule periodic integrity checks and update the statistics used by the query optimizer.

q  Examine the software development and maintenance costs involved along with the need to integrate the administration tools for software distribution and polling, in selecting the appropriate solution for polling.

q  SQL Server 7.0 provides three major types of replication for use as a solution to the common in-store task of polling if integration of administration tools is considered a lower priority than reducing software development and maintenance costs.  Transactional replication works best for transferring data on an ongoing basis between the in-store database and the corporate database.  In addition, occasionally connected stores can be supported using merge replication.

q  If custom development is not a restricting factor, the BCP and DTS utilities in SQL Server 7.0 provide the ability to integrate the polling solution with a software distribution solution and thereby reduce the administration costs as well as the deployment and support complexity.

q  The in-store infrastructure along with SQL Server utilities such as SQL Server Profiler, NT Performance Monitor, and the Enterprise Manager should be used to provide full support for centralized helpdesk and troubleshooting operations.

 

Finding More Information

q  Microsoft SQL Server Books Online provides information on SQL Server architecture and database tuning along with complete documentation on command syntax and administration. SQL Server Books Online can be installed from the SQL Server installation media on any SQL Server client or server installation. It is recommended that any machine that will have someone actively working on SQL Server have SQL Server Books Online installed on the hard disk for easy access.

q  For the latest information on Microsoft SQL Server, including other white papers on SQL Server 7.0, visit the Microsoft SQL Server Web site at http://www.microsoft.com/sql/.

q  For the most current year 2000 compliance information on SQL Server 7.0, please visit www.microsoft.com/year2000.

q  For detailed performance tuning information on SQL Server 7.0, refer to the Microsoft white paper titled “Microsoft SQL Server 7.0 Performance Tuning Guide” at http://msdn.microsoft.com/developer/sqlserver/sql7perftune.htm

q  Compaq has updated its RAID white paper, which provides 50 pages of excellent information on database server performance. Note that the 3 pages of Microsoft SQL Server-specific information in this white paper pertain to version 6.5 and are not applicable to SQL Server 7.0. The white paper is titled "Configuring Compaq RAID Technology for Database Servers" and is located at http://www.compaq.com/support/techpubs/whitepapers/ecg0110598.html.

q  A 30-page white paper from Compaq's Windows NT integration team titled "Disk Subsystem Performance and Scalability" is located at http://www.compaq.com/support/techpubs/whitepapers/ecg0250997.html.  It details hardware performance characteristics of Compaq hard drives and physical drive behavior. The information contained in this paper will be applicable to SCSI hard drives available from Compaq or other vendors.