Chapter 20 - Setting Up and Managing Replication
In this chapter, we'll be going over the fundamentals of replication, including how to install and use it and how it might be a good fit for your projects. Replication is an intricate feature of SQL Server. The boundaries of it have not been defined yet and there will be innovative projects that stretch these boundaries still to come.
Replication is a broad term, and to understand how to install and use it, you first need to understand what it is and what it is not. To help with this, we'll first be taking a look at some of the terms that Microsoft is using, how they implement the replication capabilities, and what it means to your system.
Microsoft has employed what is called Loose integration in their replication model. You may have heard of the phrase, real-enough time, in systems implementation. Loose integration follows the real-enough time model. This means that information will flow across the system of servers not in real-time and not necessarily in batch mode, but as quickly as it can. The phrase, real-enough time, was first introduced when information was being distributed by e-mail to remote locations. It was certainly not real-time, as no live connection was maintained between sites.
Using e-mail didn't implement a batch approach, as often transactions were often still addressed on a transaction-by-transaction basis. With the e-mail system, you're certain that your transactions are going to make it to the destination when a connection is completed between their system and yours.
SQL Server will synchronize the database tablesoften almost immediately in many casesbut not concurrently with the transaction that made the original change that is to be replicated.
Certainly there are pros and cons to this approach. On the positive side, the user of the system will not be waiting for a remote server to respond before the application can continue. This alone can be a big time saver. Another benefit is that the system can manage some fault-tolerance in the queue of things to be done.
If you have a transaction that is going to a remote server and the remote system is down, the transaction can be queued for later processing. The server engine on the local side, the distribution engine, can retry the connection to, and update of, the remote server at regular intervals and make sure the transaction happens as soon as possible.
Another positive side-effect of this approach is a realistic approach to wide area networks that may need to be connected over slow links. Transactions can be handled as quickly as the connection allows, without bogging down applications that may be using the database tables that are to be replicated.
On the negative side, it's possible to have information in two databases that are using this schema that is out of date or different on different sides of the replication equation. With true replication as implemented in SQL Server at this time, this is a situation that you cannot control, but may have to address if this presents a problem. Remember, there are limited situations where this might occur, such as a scheduled connection was unavailable or a server went down. In either case, as soon as the connection was available, the databases would be re-synchronized at the next communication with each other.
There are three different key physical layer components to the SQL replication model. They are the Subscriber, Publisher, and Distributor. Each of these components must be in place in order to have a functioning replication model.
In many systems that are disbursed, the requirement to implement replication comes from the desire to have information available at a central location. This requirement, combined with the information to be centralized having a remote starting location, provides the foundation for a replication scenario.
This is certainly a system that can be automated using the replication capabilities of SQL Server. The primary requirement is that you put a SQL Server at each location and initiate the replication from the remote locations to the central location. Another alternative may be available if you reverse the flow of information shown in Figure 20.1.
Fig. 20.1 - Basic replication model shows information flowing between different points in your system.
Adding a distributed approach to this scenario might be helpful to enhance your application's performance. Distributed systems, or components of systems, lend themselves to situations that include some or all of the following attributes:
A good example of a distributed system is a point-of-sale (POS) system. In a POS system, you'll often have sales information at each individual location, but other locations do not have to have that sales information until it has been posted to a central location, if at all (see fig. 20.2).
Fig. 20.2 - A typical wide area network point-of-sale system lends itself to a mixed distribution- and replication-based system.
In this configuration, you can see that information flow between the store locations is minimal, and, when it does occur, it's likely to come from the corporate headquarters. Systems like this lend themselves to both a distributed information approach and a replicated scenario. First, you'll need to understand that replicated information is read-only at the recipient location. We'll cover more about how to work with this later in the chapter in the troubleshooting section, but you can use the fact that things are read-only to your advantage.
In our POS example, you can use replication to send all inventory information, including pricing, store inventory levels, and other store's sales information, to the different remote locations. Because this information is reasonably static, you can use a scheduled exchange of this information, rather than a continuous update. You'll want to keep this in mind as we walk through setting up replication in the coming sections.
Distributed information can be maintained at the stores. This information will include sales information, customer update information about purchases, credit payments, etc. You can refer to this as distributed because the stores operate on this information separately from the corporate headquarters. They can create, update, delete, and otherwise manage this information on their own.
In short, you should use a distributed environment to provide a fault-tolerant, independent system. The information produced under these routines, and the routines designed to create the distributed information, may or may not be required at the central location. Replicated systems are ideal for information that doesn't change very frequently at the recipients' location. Keep this in mind as you review the different replication and installation options that we'll be reviewing throughout this chapter.
Each of these has its advantages and disadvantages. Which you use will depend entirely on what types of information you're replicating and how it will be used by the remote system.
Keep in mind the basic covenant of replicated tables, though, that the information is read-only at the recipient. It may be that this would be too restrictive on your system to implement this type of full-blown replication.
The replication of tables and specific partitions of tables is often the way to go when you have a modifiable, production environment that you're exporting information to.
On the other hand, it may be that you want to only publish certain columns of information to the remote system. For example, the Authors table in the Pubs database installed by default with SQL Server will include a Contract field indicating whether the author is under contract. If you're replicating a list of authors to a remote location, the users may not need to know whether you've established a contract with the author. In that case, you can select all fields except the Contract field to be included in the replication article.
You can also export a combination of a vertical and horizontal partition, or you can replicate the results of a view. The selective replication capabilities of SQL Server are the power behind the tool that really begins to broaden the appeal of replication.
Subscriptions include the ability to designate where the information will go, which will enable you to control who has access to it by using the standard NT and SQL security capabilities.
Replication in SQL Server is transaction log-based. As changes are made to articles that are declared for replication, they show up in the transaction log. The log reader process, an automatic background task on the server, detects these changes and logs them to the distribution server's distribution database.
Once a transaction has been processed for replication and put in the distribution database, the replication engine will publish it and make it available to the other servers that need it. You can see the log reader process if you look at the running processes in SQL Server. You'll see the process in the list of active connections.
While the distribution database is separate from the other databases in your SQL Server system, it can still be on the same physical server. When you set up a local distribution database on the publication server, SQL Server will use this new database as the mechanism to keep track of the different items that need to be provided to the subscription servers on the network.
The distribution database is also the storage location for the different stored procedures that make up the replication engine. These stored procedures are automatically called when you use replication and are used to automate the different processes that happen under the covers when replication is running.
If you choose to use a remote system as your distribution database, the database is created or referenced on a different server from the publication server. This can be beneficial on a system where you may have high transaction volumes or may otherwise have a server that is bogged down just in the normal processing of transactions or network requests. In these situations, it can be beneficial to have this other systemthe distribution database hosting systemmanage the replication of information to the subscription servers.
You generally won't be working directly with the log reader processes or the distribution database, but it's important to understand their function in order to correctly set up your system.
"Understanding Data Modeling and Database Design" planning early in the development process will really pay off when it comes time to move your system into production.
The configuration options for replicationat least on a physical implementation levelboil down to three different approaches, although there are certainly variations on these that may work well for some situations. The following are the basic three topologies:
Each of these is briefly described in the upcoming sections.
Fig. 20.3 - This is a simple system using a single server as the publication and distribution system and a single subscriber.
It's likely that many systems that are not extremely high-volume will fall into this category, as it's the most straight-forward to set up and maintain. However, be sure to consider moving the distribution processes if your transaction volume begins to cause the server to become bogged down in processing both application and replication requests.
Fig. 20.4 - Moving the distribution processes enables you to optimize system configurations for optimum performance.
This system topology is also the first step toward a data warehousing system. By copying the information to the intermediary server, if you implement multiple subscription servers using this information, you'll be able to ensure that accurate, timely information is distributed without an overwhelming impact on the initial publication system.
In systems where you are publishing to multiple subscribers, especially in cases where the publications may be going out over a slow or remote link, a separate distribution server should be carefully considered. In each of these situations, more server attention will be required to complete the replication and is best handled by a separate system.
In this scenario, you are publishing the information to an intermediary server that is responsible for secondary replication to subscribers. This is an excellent way to leverage your servers for their distribution to the users of your information. As you can see in Figure 20.5, one approach may be to link servers to further distribute information.
Fig. 20.5 - Re-publishing information leverages your resources and can be helpful in mass distribution or slow-link to multiple subscriber installations.
This type of installation is also a good candidate for a separate distribution server if there are other operations to be required of the initial subscription and re-publication server. It's also important to keep in mind that memory requirements may increase on the re-publication server to work with the increased number of subscribers.
When you start the replication installation process, one of the first things SQL Server will do is to verify that you've declared enough memory for SQL Server processes. The minimum required memory allocated to SQL Server is 8196Kb or almost 8MB. You set this option in the Server Configuration dialog box, selected from the Server menu's Configurations option. An example of this is seen in Figure 20.6.
Fig. 20.6 - You must have a minimum of nearly 8M allocated to the SQL Server replication features of your system.
The next step starts the process of installing replication as a publishing server on your system. Prior to setting up the distribution database, you'll need to have established both database and log devices. These must have sufficient disk space allocated to support the database that you'll be replicating. These are the devices that you'll be placing the distribution database and logs onto. The recommended minimum size for these items is 30M, but your size will depend significantly on several different factors, as follows:
For more information about setting up devices and databases in SQL Server, see Chapter 4, "Creating Devices, Databases, and Transaction Logs"
Select Replication Configuration from the Server menu, then select Install Publishing from the submenu. This will begin the process of setting up the distribution database and other options that govern the server's operation for replication. Figure 20.7 shows the dialog box that you'll use to set up replication options.
Fig. 20.7 - Set the options for the distribution database and processes for replication.
This dialog box enables you to define whether the distribution process will be local or located on a remote system. If you select local, you'll need to indicate where the distribution database will be installed. If you indicate a remote server, you'll need to know the server name and the SQL Server process must be already running on that system.
When you click OK, the new database will be added to the system and you'll be ready to start setting up specific options to enable publishing on your system. If all goes well, you'll receive a prompt, as shown in Figure 20.8, enabling you to go directly to setting up publications on your system.
Fig. 20.8 - Once the distribution database is created, you can go directly to the definition of your publications and subscriptions.
Once you've addressed any system problems that are presented, you may be faced with needing to completely uninstall replication before you can continue. For more information, see the Uninstalling Replication section at the end of this chapter.
Fig. 20.9 - With the configuration dialog box, you indicate frequency of updates and which databases are available for replication.
For each server you have registered to the Enterprise Manager, you can indicate whether it is allowed to subscribe to this server by selecting or deselecting the Enable option from the Enable Publishing grid. For each server you will allow to subscribe, you can indicate at what frequency you'll be replicating information to the server. The default is to replicate information continuously starting at 12:00 AM and continuing throughout the day.
If you are using a remote link or are only providing snapshot information to the subscription servers, you may want to change this value to a time when the system is generally under less load. This can help decrease the impact on network traffic that will be incurred when the replication process kicks off. Of course, this places your data into a batch-type mode and may not be acceptable at the end user's application.
Selecting or deselecting databases will control which databases you'll have to select from when you define publications. If you have a secure database that you want to control, this is the ultimate protection. By not making the database available to the replication process, you can be assured that it will not be included in publications.
You may notice the working directory option shown in the dialog box just below the designation of the distribution database. This directory is where information is kept and staged as it is sent out to the subscription servers. You can change this directory if needed. When SQL replicates a database, it is using a method somewhat like a BCP to copy the data from one system to another. This directory is where the data files are built.
If you want to create a custom distribution schedule or control how often information is sent based on transaction volume, you can click the Distribution Options button. As shown in Figure 20.10, this option also enables you to indicate how long information is maintained at the distribution server once it has been sent to the remote system.
Fig. 20.10 - You can create custom distribution schedules on a server-by-server basis. This can be helpful to take into account time zone changes and other loading factors.
If you do opt to establish a custom distribution schedule, a separate dialog box enables you to change a number of options. As shown in Figure 20.11, you'll be able to set up how often, on a macro level, the replication will take place. This option allows values that range from daily to monthly. As you select this highest level interval, the balance of the options will change to reflect values that enable you further to define how you would like to have the replication carried out.
Fig. 20.11 - Custom timing for replication can be defined for a number of time intervals.
Congratulationsyou're now ready to set up the specific publications and articles that you'll be making available to other systems. You've defined the servers that have access to the information, the frequency at which it will be provided, and you've indicated where the information will come from in your system. The next step is to create the publications that will be replicated.
Fig. 20.12 - Use the Manage menu to control all aspects of the replication process after you've completed the initial setup.
Select Manage, Replication, Publications from the menus to begin working with the different publications that are on your system (see fig. 20.13). This will take you to the dialog boxes that enable you to define and configure the publications and articles that you will be offering.
Fig. 20.13 - The initial Manage Publications dialog box shows the databases you've selected to allow for replication.
The dialog box will show each database for which you've enabled replication during the setup phase. Of course, there will not be any publications listed initially because you've not defined them yet. You use the New, Change, and Remove buttons to manage the different replications in your system. The tree display will expand under each database, showing the different publications that are available once they are defined. Under each publication, you'll be able to see the articles that have been created. From this display, you'll be able to manage all active publications.
For information about modifying table definitions, please refer to Chapter 5, "Creating Database Tables and Using Datatypes"
When you select New, you're presented with the dialog box that enables you to set up the publication and indicate the information that should be replicated. Figure 20.14 shows the dialog box that lists the different tables that are available and enables you to set up the specifics of the article.
Fig. 20.14 - Setting up the article to publish an entire database only entails selecting <all tables> for the article definition.
You'll need to first name the article that you're defining. When you do, make sure you do not include spaces or wild-card characters, such as an asterisk, in the name. If you do, you'll be prompted to change the name when you click Add to save the publication.
Because we're first setting up an article that publishes all eligible tables from the PUBS database, simply selecting the <all tables> option will suffice for the definition. Each table is copied to the Articles in Publication pane, and you can tell that they're included by the shared icon that you're familiar with in Explorer and File Manager.
Fig. 20.15 - You can defer synchronization to a time that may suit your system requirements better, possibly saving money if remote connections are used.
If you are replicating information between SQL Servers, you'll want to make sure you leave the default copy method, using Native Format data files, selected. This will enable SQL Server to save the information in as optimized a format as possible when it creates the export file used to initialize the subscriber's database tables. The other option, Character format, will save the exported information in a format that you could use if you were importing to the receiving database using other utilities, or if you're using the information produced by the initial synchronization in a third party system.
You can also set up the schedule of times that controls when the synchronization will occur by clicking the Change button. Establishing these times is very similar to the time frames you set up during the initial configuration of the replication services.
SQL Server enables you to indicate, on a case-by-case basis, what servers will have access to any given publication. By default, when you set up a new subscription, SQL Server will make it available to any server with overall access to the replication system of your server. To change this, as indicated in Figure 20.16, select the Security tab of the Edit Publications dialog box.
Fig. 20.16 - You can easily prevent access to specific publications depending on what servers are requesting the information.
When servers look to your system to select available publications, if their system has not be authorized to use a given publication, it will not show up in the list of available items. You can change this option later to allow or disallow access to a publication by selecting Security from the initial Publications dialog box.
Fig. 20.17 - You can use the tree view of the replication system to select a publication or article to review or modify.
If you want to make a change to a publication, simply highlight it and select Change. You'll be presented with the same dialog box that enabled you to define the publication in the first place. You'll be able to modify the tables that are included, how the information is synchronized, and the security aspects of the publication.
Fig. 20.18 - You can select one or more individual tables to be included in the publication.
If you do nothing more for this publication, the entire table will be monitored and replicated as part of this publication. You have the same options for managing how the publication will be synchronized and what servers will have access to it as you do when setting up a full-database replication.
The alternative to publishing the entire database table is to partition it. You may recall from earlier sections in this chapter on "Publishing Partial Databases or Tables" that you can partition a table in one of two ways: either horizontally or vertically. The next section shows how you can control this functionality with your publication.
To establish a partition of information to be used in the article, you click the Edit button on the Edit Publications dialog box's Articles tab. This will call up the Manage Articles dialog box that shows two new tabs, Filters and Scripts (see fig. 20.19).
Fig. 20.19 - You can select which columns you want to be included in the replication process.
In the example, as discussed earlier in this section, we don't want to include the Contract column information in the replicated information that is provided to other sites with this publication. By scrolling down the list of columns, you can select and deselect the different columns that you want to include. When you've completed the selection, you will have vertically partitioned the database if you've deselected any columns because the publication will include only those fields that you allowed.
Normally, insert, delete, and update operations are managed by the SQL Server replication process automatically. This may be a problem if you're inserting information and need to have another process be kicked off by the action of inserting a new row, for example. In these cases, you can click the Advanced button and you can specify the Insert, Update, and Delete scripts that you'd like to run when records are modified with these operations.
In addition, if you need to do so, you can provide a table-creation script that you need to use in initializing tables. This script will be run instead of the standard table script generated by SQL Server based on the source table's definition. This can be helpful if you're defining tables that will receive replicated data, but perhaps not all columns that need to be in a table.
By default, when the table is created, it will be created with columns to match the article's definition. This means that, in our example of not copying the Contract column, the column would not exist at all in the recipients' databases. You can modify the creation script here and make sure all appropriate information and columns are included. Figure 20.20 shows how you can modify these values.
Fig. 20.20 - You can override SQL's default methods that are used to implement changes to the underlying database tables.
For more information on triggers, see Chapter 14, "Creating and Managing Triggers"
You also have the option of controlling the synchronization scripts that will be run by SQL Server. Clicking the Advanced button will let you edit what actions will be taken on synchronization of tables between the publication and subscription systems. The dialog box is shown in Figure 20.21.
Fig. 20.21 - You have full control over how indexes and existing tables will be managed during the synchronization process.
If you want to see or edit the actual script that will be created when the table is synchronized, you can click the Auto-Generate button. You'll be presented with the listing that will be used to create the table when it's initialized.
After you've completed your changes, click OK from the Manage Article dialog box, then Add from the Manage Publications dialog box. The publication will be added to the list of available publications, and you'll be ready to allow other users to start accessing the publications on your system.
You'll notice in Figure 20.22 that the AllPubs publication shows a secure publication, one with additional limits on who can see and use the information. This is indicated by the key that is showing over the book icon next to the publication name. On the NonSecureAuthorsOnly publication, there is no key, indicating that the publication is open to subscription by any subscription server with access and overall replication authority on your server.
Fig. 20.22 - The new publication and articles are added to the list of active publications, ready for a subscriber to begin receiving the replicated information.
In the next section, we'll go over how you set up subscriptions on the recipient side of the equation. Now that you've set up the publications, it's a straightforward process to indicate which publications any given subscription server needs to begin receiving replication updates for.
For installation instructions, see Installing SQL Server's Replication Services previously in this chapter.
When you're ready to initiate subscriptions, select Manage, Replication, Subscriptions from the menu in Enterprise Manager. As shown in Figure 20.23, this brings up a dialog box very similar to the Manage Publications dialog box used to define and create publications.
Fig. 20.23 - Select the publication that you want to subscribe to from the list of available items for the server and database you are interested in.
You'll be able to drill down into any given publication to see what articles are available and whether you're already subscribed to them or not. Once you've found the publication you want to subscribe to, highlight it and select Subscribe. You'll be able to specify where you want the information to go on your local system once it begins arriving from the replication server (see fig. 20.24).
Fig. 20.24 - When you select a publication, you need to indicate where you want to store the incoming information on your server.
You'll notice that there are three different options available to you for synchronization of the database. The last option is to turn off the synchronization altogether. This means that you know for a fact that the database and the concerned tables are in sync with the master system by some other means.
The first option enables you to let the system do the work for you and do the synchronization of information over the standard network connection between this system and the publication server. If you're on a locally attached LAN or a fast connection that is reliable, this is the best option to select and is the default.
If you're on a slower link, it may be that you've opted to bring the update to the system by another means. This includes copying the initial synchronization files from the publication server to tape or disk and then physically bringing them to the subscription system. If you have a slow link and initially populated tables, this may be a good option, as it will avoid the initial rush of information across an already slow connection between the two systems.
After you've set up the subscription to the publication server, you'll notice that the display of subscriptions will be updated to show the active subscription. You can see in Figure 20.25 that, although the subscription is active, it has not yet been synchronized with the publication server.
Fig. 20.25 - The list of subscriptions will change to indicate that you're now subscribed to the publication. It also indicates the destination database.
The next two sections will show how you can cover both of these types of situations using facilities within SQL Server.
From the Server menu, select Replication, Publishing You'll be able to indicate which servers are able to subscribe to your server. Figure 20.26 shows the server selection dialog box that will appear.
There are three steps for uninstalling replication in SQL Server, as follows:
You will need to follow these stepsin orderto restore your system to its state prior to installing replication. Removing connections to other servers is covered in the section immediately preceding this one. If you're removing replication from a subscription server, select Server, Replication, Subscriptions instead of Publications, but the operations of removing references to foreign servers are the same.
The next step requires issuing an interactive SQL command, so you'll need to use ISQL/W or another utility to enter the command. Enter the following stored procedure call to turn off the replication option on your system:
This will remove the distribution option from your system, disabling replication. The distribution database you established when you first installed replication will still remain in the system. Dropping this database is the final step in removing replication from your system.
From the Enterprise Manger, right-click the table that you specified.
If you select an incorrect database, you will be permanently dropping production information, and your only recourse will be to restore from a backup.
From the menu, select Drop Database and confirm that you want to drop the database. If all has gone well, you can make a quick check by selecting Server, Replication from the menus. You should have the Install Replication option available once again.
Replication with SQL Server requires that you have a primary key defined for any table you need to replicate. Define a key for the table and retry the definition for the publication. The table will then appear in the list, ready for you to select it to be replicated.
A number of things can be happening that can cause the replicated table to not yet appear on the subscription system. First, keep in mind that the replication may have been established on a scheduled basis. It may be that you set up the replication to occur, at least for synchronization tasks, during the night or some other off-peak time. Check the publication setup options on the publication server to verify when you expect table synchronization tasks to be performed.
Second, it's possible that enough time hasn't passed since the publication and subscription were set up. By default, synchronization occurs every five minutes. If your table is large, it may take some time to physically transfer the information to the subscription system, or it may be that the initial five minutes has not yet passed.
Third, the connection may be down between the two servers. Check to see that you can otherwise connect to the servers and make sure that they are up, running, and allowing logons. You'll also want to check utilization on the servers. Microsoft recommends 32M of RAM on the publication system. With less RAM, it's possible that you're running into memory constraint issues and that the server is swapping more information than it should have to while performing the operations demanded of it with replication.
SQL Server's replicated tables are read-only on the subscribing system. If needed, you'll have to create either triggers on the incoming table that insert the replicated information into a working table, or you'll need to modify your system to use the tables differently, allowing for the read-only nature of the replicated tables.
For additional information about the topics covered in this chapter, please refer to the following chapters: