Chapter 20 - Setting Up and Managing Replication

How to set up replication publications - SQL's capabilities to make information available to other servers adds new, distributed capabilities to your systems.
How to set up a SQL Server to receive information that is being replicated - Once information is available to your system, there are many different ways you can incorporate it into your database.
The different ways you can distribute information in your system to provide more meaningful data to your users. - You can divide information to limit access to only certain elements of a database table.

In this chapter, we'll be going over the fundamentals of replication, including how to install and use it and how it might be a good fit for your projects. Replication is an intricate feature of SQL Server. The boundaries of it have not been defined yet and there will be innovative projects that stretch these boundaries still to come.

Replication is a broad term, and to understand how to install and use it, you first need to understand what it is and what it is not. To help with this, we'll first be taking a look at some of the terms that Microsoft is using, how they implement the replication capabilities, and what it means to your system.

Understanding the Basics

Replication with SQL Server provides you with the capability to duplicate information between servers. This capability brings with it the ability to synchronize information sources for multiple domains, even in cases where they are physically separated by distance and possibly poor communications links. Replication, at its most fundamental level, recognizes changes made to information in a database and sends these changes to the remote system or systems for their use.

Microsoft has employed what is called Loose integration in their replication model. You may have heard of the phrase, real-enough time, in systems implementation. Loose integration follows the real-enough time model. This means that information will flow across the system of servers not in real-time and not necessarily in batch mode, but as quickly as it can. The phrase, real-enough time, was first introduced when information was being distributed by e-mail to remote locations. It was certainly not real-time, as no live connection was maintained between sites.

Using e-mail didn't implement a batch approach, as often transactions were often still addressed on a transaction-by-transaction basis. With the e-mail system, you're certain that your transactions are going to make it to the destination when a connection is completed between their system and yours.

SQL Server will synchronize the database tables—often almost immediately in many cases—but not concurrently with the transaction that made the original change that is to be replicated.

Certainly there are pros and cons to this approach. On the positive side, the user of the system will not be waiting for a remote server to respond before the application can continue. This alone can be a big time saver. Another benefit is that the system can manage some fault-tolerance in the queue of things to be done.

If you have a transaction that is going to a remote server and the remote system is down, the transaction can be queued for later processing. The server engine on the local side, the distribution engine, can retry the connection to, and update of, the remote server at regular intervals and make sure the transaction happens as soon as possible.

Another positive side-effect of this approach is a realistic approach to wide area networks that may need to be connected over slow links. Transactions can be handled as quickly as the connection allows, without bogging down applications that may be using the database tables that are to be replicated.

On the negative side, it's possible to have information in two databases that are using this schema that is out of date or different on different sides of the replication equation. With true replication as implemented in SQL Server at this time, this is a situation that you cannot control, but may have to address if this presents a problem. Remember, there are limited situations where this might occur, such as a scheduled connection was unavailable or a server went down. In either case, as soon as the connection was available, the databases would be re-synchronized at the next communication with each other.

If you have multiple databases, tables, or other entities that must be absolutely in sync, you'll need to work with the two-phase commit capabilities of SQL Server. This entails custom development on your part using the C API and working with DB-Library. Be sure to review the documentation provided with SQL Server for more information about this option.

There are three different key physical layer components to the SQL replication model. They are the Subscriber, Publisher, and Distributor. Each of these components must be in place in order to have a functioning replication model.

Is It Distributed or Replicated Information?

There is a distinct implementation difference between distributing your information and replicating it. In many installations, you'll probably find that you have both scenarios. When you decide to implement a system based on multiple SQL Servers, you'll need to be very careful to define what information will reside on which SQL Servers. This sounds obvious, but you need to take it to the next step—beyond just determining functional requirements.

In many systems that are disbursed, the requirement to implement replication comes from the desire to have information available at a central location. This requirement, combined with the information to be centralized having a remote starting location, provides the foundation for a replication scenario.

This is certainly a system that can be automated using the replication capabilities of SQL Server. The primary requirement is that you put a SQL Server at each location and initiate the replication from the remote locations to the central location. Another alternative may be available if you reverse the flow of information shown in Figure 20.1.

Fig. 20.1 - Basic replication model shows information flowing between different points in your system.

Adding a distributed approach to this scenario might be helpful to enhance your application's performance. Distributed systems, or components of systems, lend themselves to situations that include some or all of the following attributes:

The information required at each individual location is different and distinct.
The information from one location does not need to be available, in its most up to date form, at other locations. Somewhat aged information, if it's needed at all, must suffice at the alternate locations.
There is an intelligent agent running at the remote locations, and that agent must be able to respond to basic data request and manipulation requirements, such as copying working sets to a temporary database, and so forth.

A good example of a distributed system is a point-of-sale (POS) system. In a POS system, you'll often have sales information at each individual location, but other locations do not have to have that sales information until it has been posted to a central location, if at all (see fig. 20.2).

Fig. 20.2 - A typical wide area network point-of-sale system lends itself to a mixed distribution- and replication-based system.

In this configuration, you can see that information flow between the store locations is minimal, and, when it does occur, it's likely to come from the corporate headquarters. Systems like this lend themselves to both a distributed information approach and a replicated scenario. First, you'll need to understand that replicated information is read-only at the recipient location. We'll cover more about how to work with this later in the chapter in the troubleshooting section, but you can use the fact that things are read-only to your advantage.

In our POS example, you can use replication to send all inventory information, including pricing, store inventory levels, and other store's sales information, to the different remote locations. Because this information is reasonably static, you can use a scheduled exchange of this information, rather than a continuous update. You'll want to keep this in mind as we walk through setting up replication in the coming sections.

Distributed information can be maintained at the stores. This information will include sales information, customer update information about purchases, credit payments, etc. You can refer to this as distributed because the stores operate on this information separately from the corporate headquarters. They can create, update, delete, and otherwise manage this information on their own.

Using SQL Server to manage distributed information at the remote locations is not a requirement. You can consider using less expensive options if needed. Other databases that use ODBC are key candidates for the remote locations, as they'll be able to connect most easily to SQL Server later to update the main database systems.

In short, you should use a distributed environment to provide a fault-tolerant, independent system. The information produced under these routines, and the routines designed to create the distributed information, may or may not be required at the central location. Replicated systems are ideal for information that doesn't change very frequently at the recipients' location. Keep this in mind as you review the different replication and installation options that we'll be reviewing throughout this chapter.

SQL Server Replication Fundamentals

SQL Server's replication features follow a magazine-type analogy to relate the different roles that are parts of the equation. The model follows an I'll Publish—You Subscribe approach that's easy to follow and implement. In the next sections, we'll walk through setting up the different options for replication on both sides of the equation.

SQL Server 6.5 has added the ability to replicate to ODBC subscribers, such as Access and Oracle. As stated in the SQL Server documentation, the ODBC subscriber must meet the following criteria:

Must be ODBC Level 1-compliant.
Must be 32-bit, thread safe, and for the processor architecture (Intel, PPC, MIPS, Alpha) that the distribution process runs on.
Must be transaction-capable.
Must support the Data Definition Language (DDL).
Cannot be read-only.

Publishing: Providing Information to Other Systems

When you decide you're going to be using replication to provide information to other systems, you, in effect, become a publisher. You'll be publishing your information for other systems to receive. At the highest level, the information you provide to other systems is called a publication. Publications consist of articles, which are the items that will be provided to the other systems. Articles are discrete pieces of information that range from the entire contents of a database to a single row or the result of a query. As you'll see when we review setting up a publication a little later in the chapter, there are several different ways to dissect the information you'll be providing via replication, as follows:

Entire databases
Entire tables
Horizontal partitions of information
Vertical partitions of information
Custom views of information

Each of these has its advantages and disadvantages. Which you use will depend entirely on what types of information you're replicating and how it will be used by the remote system.

Any table you want to replicate must have a unique index defined for it. If you don't have one defined, it will not show up in the list of available tables when you define the publication.

Entire Databases

When you provide entire database articles to the remote system, you'll be asking SQL Server to monitor all activity for that database and update the users of the information with any changes that occur. This is probably the simplest form of replication to administer, as you're telling SQL Server to send everything.

Keep in mind the basic covenant of replicated tables, though, that the information is read-only at the recipient. It may be that this would be too restrictive on your system to implement this type of full-blown replication.

If you're creating a data warehousing application—one in which you simply provide snapshot type information to other locations—this may be just the ticket. In these types of applications, you want to provide a picture of the information at a given time. You can create a database of queriable information, provide it to these remote users, and need not to worry about the information being changed. You can also be assured of the most recent and updated information at these locations, as SQL Server will be monitoring it for you and initiate updates on the schedule you designate.

The replication of tables and specific partitions of tables is often the way to go when you have a modifiable, production environment that you're exporting information to.

Entire Tables

When you replicate entire tables, you specify a table that is monitored and sent out by the replication engine whenever changes are made. This table will be kept up to date based on the export timing and criteria you establish in setting up the publication article.

You can establish more than one article for a given table. This may be helpful if you want to provide all information to a production, administrative server, but only limited information to a reports-only server that is used by non-administrative personnel. Be aware, though, that each publication you establish will require resources on the server to process the replication event. If possible, consider publishing the table once; then create a view or other protected viewing mechanism on the receiving server as it works with the client-side software.

Horizontal and Vertical Partitions of Information, Custom Views

When you publish horizontal partitions, you're using a SELECT statement that will provide all columns of information, but is selective on the rows of information that will be considered. For example, consider the following simple SELECT statement:

Select * from Authors where au_lname like "W%"

This would select only those authors whose last name started with a W. These are the authors that would be included in published information.

On the other hand, it may be that you want to only publish certain columns of information to the remote system. For example, the Authors table in the Pubs database installed by default with SQL Server will include a Contract field indicating whether the author is under contract. If you're replicating a list of authors to a remote location, the users may not need to know whether you've established a contract with the author. In that case, you can select all fields except the Contract field to be included in the replication article.

You can also export a combination of a vertical and horizontal partition, or you can replicate the results of a view. The selective replication capabilities of SQL Server are the power behind the tool that really begins to broaden the appeal of replication.

Subscribing: The Recipients of Information

Once you've set up what information you'll be publishing with the articles and other information covered earlier, you'll need to move to the subscription system and let it know where to find the information that it will be the recipient of. The recipient is called the subscriber in the replication scenario. The subscriber sets up a connection to the distribution serverand receives the information at the intervals you established when you created the article to be replicated.

Subscriptions include the ability to designate where the information will go, which will enable you to control who has access to it by using the standard NT and SQL security capabilities.

The Log Reader Process

There are two silent partners in the replication process that are always running and performing their tasks, but are much less visible when compared to the subscriber and publisher roles. These are the log reader process and the distribution database.

Replication in SQL Server is transaction log-based. As changes are made to articles that are declared for replication, they show up in the transaction log. The log reader process, an automatic background task on the server, detects these changes and logs them to the distribution server's distribution database.

Once a transaction has been processed for replication and put in the distribution database, the replication engine will publish it and make it available to the other servers that need it. You can see the log reader process if you look at the running processes in SQL Server. You'll see the process in the list of active connections.

If there are any synchronization jobs pending, you'll also see those jobs listed in the pending tasks area of the running tasks list.

Distribution Server: The Source of Information

The distribution server and database serve as the go-between for the publication server and the subscription server. When you set up the system—as you'll see when you go through the installation process—you'll have the option to set up a local distribution server or a remote system.

While the distribution database is separate from the other databases in your SQL Server system, it can still be on the same physical server. When you set up a local distribution database on the publication server, SQL Server will use this new database as the mechanism to keep track of the different items that need to be provided to the subscription servers on the network.

The distribution database is also the storage location for the different stored procedures that make up the replication engine. These stored procedures are automatically called when you use replication and are used to automate the different processes that happen under the covers when replication is running.

If you choose to use a remote system as your distribution database, the database is created or referenced on a different server from the publication server. This can be beneficial on a system where you may have high transaction volumes or may otherwise have a server that is bogged down just in the normal processing of transactions or network requests. In these situations, it can be beneficial to have this other system—the distribution database hosting system—manage the replication of information to the subscription servers.

You must have 32M of RAM on the server that will be the distribution system if you combine the distribution and publication operations on a single server. In addition, you may need to install additional memory if a given distribution server is to be used by more than one publication server. It may also be beneficial to implement a multi-processor server in cases where workload on the server is substantial.

You generally won't be working directly with the log reader processes or the distribution database, but it's important to understand their function in order to correctly set up your system.

Different Server Configurations for Replication

When you establish replication, you will save yourself time later if you take some time now to figure out exactly what the best way to lay out your system is. As with database definitions and ERDs discussed in Chapter 2, "Understanding Data Modeling and Database Design" planning early in the development process will really pay off when it comes time to move your system into production.

The configuration options for replication—at least on a physical implementation level—boil down to three different approaches, although there are certainly variations on these that may work well for some situations. The following are the basic three topologies:

Combined single publisher/distribution, single subscribers
Single publisher, distribution system, subscribers
Single publisher, subscriber, publisher, subscribers (data warehousing)

Each of these is briefly described in the upcoming sections.

Combined Single Publisher/Distribution, Single Subscribers

The most basic type of installation is the combined publisher/distribution server and a single subscriber system. This enables you to configure a server where the transactions for your system take place and the information is replicated to the subscribing server. Figure 20.3 shows a simple replication configuration.

Fig. 20.3 - This is a simple system using a single server as the publication and distribution system and a single subscriber.

The figures and examples we're using indicate a one-way flow of information. You can also install replication's publication features on a subscribing server and effectively replicate other information back to the original publication server. To put it another way, because a system can be both a publication and a subscription server, information can flow in both directions. In this case, you just need to establish appropriate articles for publication.

It's likely that many systems that are not extremely high-volume will fall into this category, as it's the most straight-forward to set up and maintain. However, be sure to consider moving the distribution processes if your transaction volume begins to cause the server to become bogged down in processing both application and replication requests.

Single Publisher, Distribution System, Subscribers

By splitting apart the publication and distribution model, you can begin to address specific performance bottlenecks that may be apparent in your system. If you implement a model of this type, you'll be able to add processors, memory, or other resources to systems as they become over-utilized. As you can see in Figure 20.4, it may be helpful to distribute the configuration of your servers.

Fig. 20.4 - Moving the distribution processes enables you to optimize system configurations for optimum performance.

This system topology is also the first step toward a data warehousing system. By copying the information to the intermediary server, if you implement multiple subscription servers using this information, you'll be able to ensure that accurate, timely information is distributed without an overwhelming impact on the initial publication system.

In systems where you are publishing to multiple subscribers, especially in cases where the publications may be going out over a slow or remote link, a separate distribution server should be carefully considered. In each of these situations, more server attention will be required to complete the replication and is best handled by a separate system.

If you do implement this type of scenario, each system—the publication server, the distribution server, and each of the subscribers—is required to obtain separate SQL Server licenses. This cost will need to be part of the analysis when you are considering using a distribution server in this manner.

Single Publisher, Subscriber, Publisher, Subscribers

A big topic in the database world right now is data warehousing. As we mentioned in the last section, replication lends itself well to data warehousing implementations, as it provides you with an automated way to provide read-only access to users of your system. You know the information will be correct and as up-to-date as you need, and because no intervention will be required, SQL Server's replication engine can be set up to update the remote systems at intervals required by your application.

In this scenario, you are publishing the information to an intermediary server that is responsible for secondary replication to subscribers. This is an excellent way to leverage your servers for their distribution to the users of your information. As you can see in Figure 20.5, one approach may be to link servers to further distribute information.

Fig. 20.5 - Re-publishing information leverages your resources and can be helpful in mass distribution or slow-link to multiple subscriber installations.

This type of installation is also a good candidate for a separate distribution server if there are other operations to be required of the initial subscription and re-publication server. It's also important to keep in mind that memory requirements may increase on the re-publication server to work with the increased number of subscribers.

Installing the SQL Server Replication Services

The first step to installing replication is registering the remote server in the Enterprise Manager. There is nothing unique that you'll need to do to accomplish this, but you must have the server pre-defined prior to beginning the installation of the replication and subscription services.

When you define servers in the Enterprise Manager, it's important that you select the correct type of security to be installed at each server. Selecting the Trusted Connection indicates that Integrated Security is in use. If you select the standard logon connection, you'll need to be sure to specify a user that is a valid user on the remote system and one that has sufficient rights to work with the database tables you are replicating. If you do not do this, you'll receive errors when the replication service tries to connect to the remote server.

When you start the replication installation process, one of the first things SQL Server will do is to verify that you've declared enough memory for SQL Server processes. The minimum required memory allocated to SQL Server is 8196Kb or almost 8MB. You set this option in the Server Configuration dialog box, selected from the Server menu's Configurations option. An example of this is seen in Figure 20.6.

Fig. 20.6 - You must have a minimum of nearly 8M allocated to the SQL Server replication features of your system.

It's generally recommended that you allocate as much RAM as possible to SQL Server. A good guideline is that you should allocate all RAM except approximately 24M to the SQL Server process. For example, in a 64M system, you'd want to allocate approximately 40M to SQL Server. The 24M will handle the network operating system and other base operating requirements of the system.

Once you've established the memory configuration, you'll need to restart SQL Server. You can do this from the Enterprise Manager by selecting the server, and then right-clicking it. Select Stop from the menu. Once the server has been stopped, as indicated by the red traffic light, restart it. This will institute the changes to the memory configuration you just made.

The next step starts the process of installing replication as a publishing server on your system. Prior to setting up the distribution database, you'll need to have established both database and log devices. These must have sufficient disk space allocated to support the database that you'll be replicating. These are the devices that you'll be placing the distribution database and logs onto. The recommended minimum size for these items is 30M, but your size will depend significantly on several different factors, as follows:

What is the size of the databases you'll be replicating? Remember, for each subscriber, there is an initial synchronization that must take place. This means that your distribution database should be at least as large as the combined maximum table size for each table that will be replicated. This size represents the size to support a single article. If you will be publishing multiple articles, you will want to figure out a total size based on all articles combined.
What will the traffic on your system be? More transactions mean more staging of information as it's passed along to the subscribing servers. This will impact the database size requirements, and, therefore, the device size requirements.
How many different articles will you be publishing? Try to avoid using several different articles against a single information source. If you have a table that you need to provide to multiple subscribers, it's worth the effort to try to use only one publication/article combination to fulfill their needs. This is better utilization of disk resources, processing resources and general processing bandwidth at the server.

For more information about setting up devices and databases in SQL Server, see Chapter 4, "Creating Devices, Databases, and Transaction Logs"

Select Replication Configuration from the Server menu, then select Install Publishing from the submenu. This will begin the process of setting up the distribution database and other options that govern the server's operation for replication. Figure 20.7 shows the dialog box that you'll use to set up replication options.

Fig. 20.7 - Set the options for the distribution database and processes for replication.

This dialog box enables you to define whether the distribution process will be local or located on a remote system. If you select local, you'll need to indicate where the distribution database will be installed. If you indicate a remote server, you'll need to know the server name and the SQL Server process must be already running on that system.

When you click OK, the new database will be added to the system and you'll be ready to start setting up specific options to enable publishing on your system. If all goes well, you'll receive a prompt, as shown in Figure 20.8, enabling you to go directly to setting up publications on your system.

Fig. 20.8 - Once the distribution database is created, you can go directly to the definition of your publications and subscriptions.

If the installation of the distribution database fails for any reason, you'll need to address the problem before you can continue. A possible reason that you may experience a problem here is if you've registered a server with a user that is nonexistent or has insufficient rights on the remote system. Make sure you create an appropriate user prior to continuing if this is the problem you experience.

Once you've addressed any system problems that are presented, you may be faced with needing to completely uninstall replication before you can continue. For more information, see the Uninstalling Replication section at the end of this chapter.

Enabling Publishing

The first step to creating your own publications is to, in essence, turn on publication services, designate a frequency for the services, and indicate which databases are candidates for replication. You can get to the configuration options, shown in Figure 20.9, from the Yes/No dialog box after defining the distribution database, or you can select the Publishing option from the Server, Replication Configuration menu.

Fig. 20.9 - With the configuration dialog box, you indicate frequency of updates and which databases are available for replication.

For each server you have registered to the Enterprise Manager, you can indicate whether it is allowed to subscribe to this server by selecting or deselecting the Enable option from the Enable Publishing grid. For each server you will allow to subscribe, you can indicate at what frequency you'll be replicating information to the server. The default is to replicate information continuously starting at 12:00 AM and continuing throughout the day.

If you are using a remote link or are only providing snapshot information to the subscription servers, you may want to change this value to a time when the system is generally under less load. This can help decrease the impact on network traffic that will be incurred when the replication process kicks off. Of course, this places your data into a batch-type mode and may not be acceptable at the end user's application.

Selecting or deselecting databases will control which databases you'll have to select from when you define publications. If you have a secure database that you want to control, this is the ultimate protection. By not making the database available to the replication process, you can be assured that it will not be included in publications.

You may notice the working directory option shown in the dialog box just below the designation of the distribution database. This directory is where information is kept and staged as it is sent out to the subscription servers. You can change this directory if needed. When SQL replicates a database, it is using a method somewhat like a BCP to copy the data from one system to another. This directory is where the data files are built.

It may be desirable to indicate a secured location for this directory if you have concerns about other users modifying or reviewing this information when it's staged. The information is not readily discernible, but should be protected nonetheless.

If you want to create a custom distribution schedule or control how often information is sent based on transaction volume, you can click the Distribution Options button. As shown in Figure 20.10, this option also enables you to indicate how long information is maintained at the distribution server once it has been sent to the remote system.

Fig. 20.10 - You can create custom distribution schedules on a server-by-server basis. This can be helpful to take into account time zone changes and other loading factors.

If you do opt to establish a custom distribution schedule, a separate dialog box enables you to change a number of options. As shown in Figure 20.11, you'll be able to set up how often, on a macro level, the replication will take place. This option allows values that range from daily to monthly. As you select this highest level interval, the balance of the options will change to reflect values that enable you further to define how you would like to have the replication carried out.

Fig. 20.11 - Custom timing for replication can be defined for a number of time intervals.

Congratulations—you're now ready to set up the specific publications and articles that you'll be making available to other systems. You've defined the servers that have access to the information, the frequency at which it will be provided, and you've indicated where the information will come from in your system. The next step is to create the publications that will be replicated.

Publishing Databases: How to Replicate Entire Databases

You may recall that you have two different options when it comes to replicating information on your system. You can either publish the entire database or you can select specific information in the database to replicate. In this section, we'll explain setting up replication for an entire database. In the coming sections, we'll go over setting up targeted articles. To begin, go into the replication management options (see fig. 20.12).

Fig. 20.12 - Use the Manage menu to control all aspects of the replication process after you've completed the initial setup.

Remembering which option to use from the menus or the topology dialog box can be confusing. It can help to keep in mind that you don't generally use the options on the Server menu once you've installed and set up the basic configuration for replication on your system. All aspects that deal with the specific publications and subscriptions are controlled from the Manage menu.

Select Manage, Replication, Publications from the menus to begin working with the different publications that are on your system (see fig. 20.13). This will take you to the dialog boxes that enable you to define and configure the publications and articles that you will be offering.

Fig. 20.13 - The initial Manage Publications dialog box shows the databases you've selected to allow for replication.

The dialog box will show each database for which you've enabled replication during the setup phase. Of course, there will not be any publications listed initially because you've not defined them yet. You use the New, Change, and Remove buttons to manage the different replications in your system. The tree display will expand under each database, showing the different publications that are available once they are defined. Under each publication, you'll be able to see the articles that have been created. From this display, you'll be able to manage all active publications.

Selecting the Tables to Be Published

In the first example, we'll be setting up a AllPubs publication that will replicate all eligible tables in the PUBS database. Remember, you must have a primary key defined for each table that you need to replicate. You can use the Alter Table command to institute a primary key if needed.

For information about modifying table definitions, please refer to Chapter 5, "Creating Database Tables and Using Datatypes"

When you select New, you're presented with the dialog box that enables you to set up the publication and indicate the information that should be replicated. Figure 20.14 shows the dialog box that lists the different tables that are available and enables you to set up the specifics of the article.

Fig. 20.14 - Setting up the article to publish an entire database only entails selecting <all tables> for the article definition.

You'll need to first name the article that you're defining. When you do, make sure you do not include spaces or wild-card characters, such as an asterisk, in the name. If you do, you'll be prompted to change the name when you click Add to save the publication.

You may want to create a naming convention for your publications to make it easier to know what information they provide. This is completely optional, but if you're publishing numerous articles, this information will help the subscription systems determine which publications they need to be working with.

Because we're first setting up an article that publishes all eligible tables from the PUBS database, simply selecting the <all tables> option will suffice for the definition. Each table is copied to the Articles in Publication pane, and you can tell that they're included by the shared icon that you're familiar with in Explorer and File Manager.

Establishing Table Synchronization

When you establish a publication, you are guaranteed that the subscribing system's tables will be initialized to contain the same information as the replicated table. To do this, there is an initial synchronization step that is completed. You can control how and when this step occurs by selecting the Synchronization tab from the Edit Publications dialog box shown in Figure 20.15.

Fig. 20.15 - You can defer synchronization to a time that may suit your system requirements better, possibly saving money if remote connections are used.

If you are replicating information between SQL Servers, you'll want to make sure you leave the default copy method, using Native Format data files, selected. This will enable SQL Server to save the information in as optimized a format as possible when it creates the export file used to initialize the subscriber's database tables. The other option, Character format, will save the exported information in a format that you could use if you were importing to the receiving database using other utilities, or if you're using the information produced by the initial synchronization in a third party system.

You can also set up the schedule of times that controls when the synchronization will occur by clicking the Change button. Establishing these times is very similar to the time frames you set up during the initial configuration of the replication services.

Controlling the Recipients of Subscriptions

A key feature and concern regarding replication of databases is security. You've seen when you set up your system for replication that you can determine what subscribing systems will have overall access to your replication services. This presents a possible challenge, though, when you need to provide access to some, but not all, publications that reside on your replication server.

SQL Server enables you to indicate, on a case-by-case basis, what servers will have access to any given publication. By default, when you set up a new subscription, SQL Server will make it available to any server with overall access to the replication system of your server. To change this, as indicated in Figure 20.16, select the Security tab of the Edit Publications dialog box.

Fig. 20.16 - You can easily prevent access to specific publications depending on what servers are requesting the information.

When servers look to your system to select available publications, if their system has not be authorized to use a given publication, it will not show up in the list of available items. You can change this option later to allow or disallow access to a publication by selecting Security from the initial Publications dialog box.

Making Changes to Publications

Once you've set up publications, you can review them from the Manage Publications window. In Figure 20.17, you can see that each database will show its related publications and articles as they've been defined on the server.

Fig. 20.17 - You can use the tree view of the replication system to select a publication or article to review or modify.

If you want to make a change to a publication, simply highlight it and select Change. You'll be presented with the same dialog box that enabled you to define the publication in the first place. You'll be able to modify the tables that are included, how the information is synchronized, and the security aspects of the publication.

Publishing Partial Databases and/or Tables

Now that you understand how to publish entire databases, we'll review how to selectively publish information from your replication server. As you might imagine, publishing select information can be quite complicated, although it will depend a great deal on how you're trying to limit the information being replicated and how you go about defining that information.

Limiting Publications to a Single Table

The first, and most basic, selective publication is limiting the publication to a single table. You may have noticed that each table available for replication is listed when you define a given publication. In our first example, we published the entire database by selecting the <all tables> option at the top of the list of tables. The alternative to this is to select the individual tables that we want to include in the subscription from the Database Tables list. Once you've highlighted the different tables you want to include, click the Add button and they will be copied to the Articles list (see fig. 20.18).

Fig. 20.18 - You can select one or more individual tables to be included in the publication.

If you do nothing more for this publication, the entire table will be monitored and replicated as part of this publication. You have the same options for managing how the publication will be synchronized and what servers will have access to it as you do when setting up a full-database replication.

The alternative to publishing the entire database table is to partition it. You may recall from earlier sections in this chapter on "Publishing Partial Databases or Tables" that you can partition a table in one of two ways: either horizontally or vertically. The next section shows how you can control this functionality with your publication.

Partitioning Information to Be Included in a Publication

The two partitioning options, horizontal and vertical, correspond to looking at views of your information based on selectively retrieving rows (horizontal) or columns (vertical) from your database tables. Of course, you can combine these two techniques to provide a concise view of the information in the table if needed as well.

To establish a partition of information to be used in the article, you click the Edit button on the Edit Publications dialog box's Articles tab. This will call up the Manage Articles dialog box that shows two new tabs, Filters and Scripts (see fig. 20.19).

Fig. 20.19 - You can select which columns you want to be included in the replication process.

In the example, as discussed earlier in this section, we don't want to include the Contract column information in the replicated information that is provided to other sites with this publication. By scrolling down the list of columns, you can select and deselect the different columns that you want to include. When you've completed the selection, you will have vertically partitioned the database if you've deselected any columns because the publication will include only those fields that you allowed.

Normally, insert, delete, and update operations are managed by the SQL Server replication process automatically. This may be a problem if you're inserting information and need to have another process be kicked off by the action of inserting a new row, for example. In these cases, you can click the Advanced button and you can specify the Insert, Update, and Delete scripts that you'd like to run when records are modified with these operations.

In addition, if you need to do so, you can provide a table-creation script that you need to use in initializing tables. This script will be run instead of the standard table script generated by SQL Server based on the source table's definition. This can be helpful if you're defining tables that will receive replicated data, but perhaps not all columns that need to be in a table.

By default, when the table is created, it will be created with columns to match the article's definition. This means that, in our example of not copying the Contract column, the column would not exist at all in the recipients' databases. You can modify the creation script here and make sure all appropriate information and columns are included. Figure 20.20 shows how you can modify these values.

Fig. 20.20 - You can override SQL's default methods that are used to implement changes to the underlying database tables.

Unless you are very comfortable with modifying these types of SQL scripts, you should consider alternative means of making the changes you need. For example, perhaps you could create a series of triggers on the subscriber that would institute changes needed in the storage of the information so that only certain columns were reused in the end user's system. Remember that changes made here will impact all subscribing systems, and you should ensure that this will not cause unforeseen problems on systems that may vary in their implementation of database structures and table layouts.

For more information on triggers, see Chapter 14, "Creating and Managing Triggers"

You also have the option of controlling the synchronization scripts that will be run by SQL Server. Clicking the Advanced button will let you edit what actions will be taken on synchronization of tables between the publication and subscription systems. The dialog box is shown in Figure 20.21.

Fig. 20.21 - You have full control over how indexes and existing tables will be managed during the synchronization process.

If there is a chance that the table may already exist on the user's system, be sure to check the DROP Table First option. If you don't, you may have a database table in an unknown state.

If you want to see or edit the actual script that will be created when the table is synchronized, you can click the Auto-Generate button. You'll be presented with the listing that will be used to create the table when it's initialized.

After you've completed your changes, click OK from the Manage Article dialog box, then Add from the Manage Publications dialog box. The publication will be added to the list of available publications, and you'll be ready to allow other users to start accessing the publications on your system.

You'll notice in Figure 20.22 that the AllPubs publication shows a secure publication, one with additional limits on who can see and use the information. This is indicated by the key that is showing over the book icon next to the publication name. On the NonSecureAuthorsOnly publication, there is no key, indicating that the publication is open to subscription by any subscription server with access and overall replication authority on your server.

Fig. 20.22 - The new publication and articles are added to the list of active publications, ready for a subscriber to begin receiving the replicated information.

In the next section, we'll go over how you set up subscriptions on the recipient side of the equation. Now that you've set up the publications, it's a straightforward process to indicate which publications any given subscription server needs to begin receiving replication updates for.

Enabling Subscribing

Once the publications have been set up on the publication server, it's a simple task to set up the subscription server to begin receiving the information. Before you can subscribe to a given article, you must follow the same installation steps outlined earlier in this chapter. These steps set up the distribution database and prepare your system to begin working with the replicated information. The only difference is that, when it comes to specifying which databases are visible to the replication process, you'll be specifying which ones can be the recipients of information rather than the source of the information.

For installation instructions, see Installing SQL Server's Replication Services previously in this chapter.

When you're ready to initiate subscriptions, select Manage, Replication, Subscriptions from the menu in Enterprise Manager. As shown in Figure 20.23, this brings up a dialog box very similar to the Manage Publications dialog box used to define and create publications.

Fig. 20.23 - Select the publication that you want to subscribe to from the list of available items for the server and database you are interested in.

You'll be able to drill down into any given publication to see what articles are available and whether you're already subscribed to them or not. Once you've found the publication you want to subscribe to, highlight it and select Subscribe. You'll be able to specify where you want the information to go on your local system once it begins arriving from the replication server (see fig. 20.24).

Fig. 20.24 - When you select a publication, you need to indicate where you want to store the incoming information on your server.

You'll notice that there are three different options available to you for synchronization of the database. The last option is to turn off the synchronization altogether. This means that you know for a fact that the database and the concerned tables are in sync with the master system by some other means.

If you are unsure at all about the validity of the information in the database, it's best to allow the system to perform an initial synchronization on the database.

The first option enables you to let the system do the work for you and do the synchronization of information over the standard network connection between this system and the publication server. If you're on a locally attached LAN or a fast connection that is reliable, this is the best option to select and is the default.

If you're on a slower link, it may be that you've opted to bring the update to the system by another means. This includes copying the initial synchronization files from the publication server to tape or disk and then physically bringing them to the subscription system. If you have a slow link and initially populated tables, this may be a good option, as it will avoid the initial rush of information across an already slow connection between the two systems.

Remember, you can schedule the time at which synchronization takes place. Moving it to off-hours can ease network loading significantly.

After you've set up the subscription to the publication server, you'll notice that the display of subscriptions will be updated to show the active subscription. You can see in Figure 20.25 that, although the subscription is active, it has not yet been synchronized with the publication server.

Fig. 20.25 - The list of subscriptions will change to indicate that you're now subscribed to the publication. It also indicates the destination database.

Uninstalling Replication

There may be times where you need to uninstall replication, either on a server-specific basis or on an overall basis. Examples include a server changing physical locations and no longer needing access to sensitive information, or where you've installed several test scenarios and simply want to start over!

The next two sections will show how you can cover both of these types of situations using facilities within SQL Server.

Disallowing Specific Servers

If you find that you need to remove a server from the list of servers that are eligible to receive information from your publication server, you'll need to determine whether you need to restrict only individual publications or if you need to revoke access altogether. If your intent is to remove access completely, you simply remove the server from the list of eligible candidates that can see the publications on this server.

From the Server menu, select Replication, Publishing You'll be able to indicate which servers are able to subscribe to your server. Figure 20.26 shows the server selection dialog box that will appear.

Uninstalling Replication Completely

Carefully consider all options before you opt to remove replication entirely from the system. This operation will shut down all replication to all subscribing servers, and it does so in a somewhat brut-force manner by removing the replication option from the server and deleting the distribution database.

There are three steps for uninstalling replication in SQL Server, as follows:

Discontinue all replication services—all publication and subscription services—between this system and other servers with which it is working.
Turn off the replication option at the server engine level.
Drop the distribution database.

You will need to follow these steps—in order—to restore your system to its state prior to installing replication. Removing connections to other servers is covered in the section immediately preceding this one. If you're removing replication from a subscription server, select Server, Replication, Subscriptions instead of Publications, but the operations of removing references to foreign servers are the same.

The next step requires issuing an interactive SQL command, so you'll need to use ISQL/W or another utility to enter the command. Enter the following stored procedure call to turn off the replication option on your system:

You must be logged in as SA to complete this option.

sp_ServerOption "<servername>", "dist", false

Replace <servername> with the name of your server. In our example, because our server name is PLUTO, the command would be:

sp_ServerOption "pluto", "dist", false

No results are returned from the call.

This will remove the distribution option from your system, disabling replication. The distribution database you established when you first installed replication will still remain in the system. Dropping this database is the final step in removing replication from your system.

From the Enterprise Manger, right-click the table that you specified.

Be sure you're selecting the distribution database you specified, not your databases that contain active, important production information. If you have any doubt about which database to select, you can review the stored procedures for the database, and you'll be able to see the replication stored procedures there.

If you select an incorrect database, you will be permanently dropping production information, and your only recourse will be to restore from a backup.

From the menu, select Drop Database and confirm that you want to drop the database. If all has gone well, you can make a quick check by selecting Server, Replication from the menus. You should have the Install Replication option available once again.

I'm trying to define publications on my server, but when I get to the point of designating the table to replicate, the table I need is not listed. Why not?

Replication with SQL Server requires that you have a primary key defined for any table you need to replicate. Define a key for the table and retry the definition for the publication. The table will then appear in the list, ready for you to select it to be replicated.

I've created a publication, and I've subscribed to it. The table doesn't appear on the subscription machine. What could be wrong?

A number of things can be happening that can cause the replicated table to not yet appear on the subscription system. First, keep in mind that the replication may have been established on a scheduled basis. It may be that you set up the replication to occur, at least for synchronization tasks, during the night or some other off-peak time. Check the publication setup options on the publication server to verify when you expect table synchronization tasks to be performed.

Second, it's possible that enough time hasn't passed since the publication and subscription were set up. By default, synchronization occurs every five minutes. If your table is large, it may take some time to physically transfer the information to the subscription system, or it may be that the initial five minutes has not yet passed.

Third, the connection may be down between the two servers. Check to see that you can otherwise connect to the servers and make sure that they are up, running, and allowing logons. You'll also want to check utilization on the servers. Microsoft recommends 32M of RAM on the publication system. With less RAM, it's possible that you're running into memory constraint issues and that the server is swapping more information than it should have to while performing the operations demanded of it with replication.

I can see the replicated tables on the subscription server, but when my application runs, it generates errors indicating that I don't have sufficient rights to the information. I'm signed into SQL Server as SA, but still get these messages. What could the problem be?

SQL Server's replicated tables are read-only on the subscribing system. If needed, you'll have to create either triggers on the incoming table that insert the replicated information into a working table, or you'll need to modify your system to use the tables differently, allowing for the read-only nature of the replicated tables.

From Here...

Replication is a powerful tool, especially in distributed environments. By putting replication into your systems, you can provide an excellent level of usability for applications, a great security layer for the underlying database tables, and better overall data availability.

For additional information about the topics covered in this chapter, please refer to the following chapters:

Chapter 4, "Creating Devices, Databases, and Transaction Logs" provides additional information about creating the databases that you'll be using in the replication process. It also includes information about the management of the transaction log for your applications.
For information on managing databases and creating databases and tables, see Chapter 5, "Creating Database Tables and Using Datatypes"
In Chapter 10, "Managing and Using Indexes and Keys" information about defining primary keys is provided. You'll need this information to enable replication for your tables.
Chapter 14, "Creating and Managing Triggers" includes information that may help you create subsystems for further distributing information that has been replicated—all automatically.

QUE Home Page

For technical support For our books And software contact support@mcp.com

Table of Contents

19 - SQL Server Security

21 - Communicating with SQL Server