In our series dedicated to SQL Server 2005 Express Edition, we have been focusing mainly on characteristics that emphasize its unique capabilities, making it a preferred choice in variety of scenarios, despite the apparent superiority of its full-featured counterparts. However, it is fairly obvious that its small footprint and free-of-charge use also have negative implications, imposing restrictions on functionality available in the Standard or Enterprise editions. This is especially conspicuous in the case of replication, which we will explore starting with this installment.
Before we turn our attention to aspects of replication specific to SQL Server 2005 Express Edition, let's briefly review its terminology and principles that are important to our discussion. The main purpose of replication is to provide a mechanism for creating multiple copies of the same data (including underlying metadata) and keeping them synchronized, regardless of where updates originate. While its basic terminology is derived from the world of traditional magazine publishing, some of its concepts appear to be better represented by Web 2.0-based online technologies, such as wikis or blogs (which allow partial or multi-directional updates to publications). In general, though, both of these analogies work well when analyzed from the point of view of three main entities participating in the replication:
- publisher - a database instance, which contains the original data. Its portions that are designated for replication are known as publications and consist of one or more units called articles, representing a database object such as a table, view, or stored procedure (which, incidentally, can be further filtered to limit the amount of replicated data).
- subscriber - a database instance, which receives a copy of data that was made available by a publisher. Depending on configuration, a subscriber might be allowed to make changes to the local data and replicate it either back to the publisher or to other subscribers (in this way assuming the dual role of the publisher and subscriber)
- distributor - a database instance responsible for hosting metadata needed to control and monitor data copy and its synchronization between a publisher and its subscribers.
Note that, as we mentioned, some of these objectives can be assigned (for example a subscriber can function as a publisher) to the same database, although distributor is always separate and unique per publisher. (However, this still allows for publisher databases to be hosted on the same SQL Server 2005 instance as the distributor, which, in such configurations, is known as "local"). In addition, it is possible to intermingle multiple replication arrangements (e.g. a single subscriber can receive distinct publications from several, independent publishers). Despite the wide range of possible variations, there are three basic replication categories, determined by the roles of participants in the process of data modification as well as the mechanism and frequency of updates:
- snapshot - the simplest but also the least efficient model, which involves copying entire publication (in the form of its snapshot) from a publisher to its subscribers. This is applicable mainly in scenarios where the frequency and number of changes is very limited (or one-time data copy is needed). Since updates are propagated unidirectionally, any changes to subscribers are automatically overwritten (hence, their data is assumed to be read-only).
- transactional - relies on transactions to apply individual modifications to subscribers shortly after they take place on a publisher (by monitoring the transaction log of the original database and replaying each entry that affects content of the publication). In the most standard form, changes are only allowed on the publisher, but there are a couple of variations to this basic scheme known as updating subscribers and peer-to-peer that allow bidirectional updates. (Potential conflicts that occur when two replication partners attempt to simultaneously modify the same article are resolved according to an arbitrarily defined resolution policy)
- merge - the most complex but offering the highest degree of flexibility, with changes that occur independently on a publisher and its subscribers (accommodating scenarios when connection between them is available only occasionally). Any intermediate updates are stored in system tables, which are subsequently used during periodical replication to propagate changes and apply conflict resolution on the row-by-row basis (rather than by employing transactional mechanisms).
All the replication arrangements described above can be configured as either
push (with publisher and distributor controlling the frequency of updates) or
pull (changes are applied following explicit requests from a subscriber).
Individual tasks that facilitate data exchange are handled by specialized
processes called agents (implemented as separate executables located in the
Program Files/Microsoft SQL Server/90/COM
folder and operating typically within the confines of SQL Server Agent jobs),
and include the following:
- Snapshot Agent - running on distributor, it is responsible for creating a copy of a publication (residing on a publisher) and transferring it to an intermediary location (from which it is subsequently delivered by the Distribution Agent to its subscribers via network or removable media, depending on configuration). Note that this agent plays an active role in all three replication types (although its use is considerably less frequent when employing a transactional or merge approach).
- Distribution Agent - delivers a copy of the snapshot (in case of snapshot and transactional replication) and transactions (for transactional replication) from the distributor to subscribers. In the case of push replication, it runs on the distributor, but with pull subscriptions, it operates on subscribers. It is not involved in merge replication.
- Log Reader Agent - residing on distributor, handles parsing of the transaction log on the publisher in transactional replication (it is not used with other replication categories) in search for relevant entries (marked for replication) and copies them to the distributor.
- Queue Reader Agent - is specific to the transactional replication model with updating subscribers. It operates on the distributor, facilitating updates initiated by subscribers by reading transactions recorded in a queue hosted in a subscriber database and applying them to the publisher database.
- Merge Agent - applicable to merge replication only, it collects incremental changes from all subscribers and the publisher since the most recent reconciliation, resolves any potential conflicts, and produces a consistent outcome that is applied to all replication partners. In the case of the push replication scenarios, the agent runs on distributor, but with pull implementations, it is hosted on subscribers. It also performs the initial copy of a snapshot to subscribers, eliminating the need for involvement of the Distribution Agent (which is responsible for this activity in snapshot and transactional replication).
Following this brief introduction to replication, let's concentrate on its characteristics that are relevant within the context of our series. First of all, it is important to realize that SQL Server 2005 Express Edition instances are limited to the role of a subscriber, although with its ability to participate as a subscriber in all types of replication topology, you can actually publish its changes in a bi-directional merge replication. (Incidentally, this makes it inferior to its predecessor - Microsoft SQL Server Desktop Engine, better known as MSDE, which, in addition to functioning as a subscriber, supported the roles of a publisher and a local distributor in snapshot or merge replication arrangements). Furthermore, since SQL Server 2005 Express Edition does not include SQL Server Agent (also present in MSDE), it does not directly support replication agents running on subscribers (such as merge or distribution agents in the pull configuration). In order to emulate their functionality, you will need to resort to workarounds that involve the use of such mechanisms as Windows Synchronization Manager, invoking replication-specific executables (such as replmerg.exe), or coding with .NET-based Replication Management Objects (RMO). The Windows Synchronization utility has been available since the release of Windows Server 2000 and Internet Explorer 5.0, via Synchronize shortcut (rebranded as Sync Center in Windows Vista) in the Accessories program group. While its primary purpose was to provide a central point of administering synchronization for e-mail, offline files, and Web pages, its capabilities have been extended to accommodate such aspects of SQL Server 2005 Express Edition replication as synchronizing, reinitializing, deleting, or modifying an update mode (when dealing with updating subscribers) for its subscriptions. Replication Management Objects assembly, combined with managed code, offers programming methods of manipulating responsibilities of replication agents.
You should keep in mind that replication functionality is not incorporated by default in the SQL Server 2005 Express Edition installation. The option controlling this behavior is accessible by expanding the Database Services node on the Feature Selection page of the setup wizard and can be modified by assigning "Will be installed on local hard drive" value to its Replication entry. In addition, if you intend to take advantage of the connectivity and Replication Management Objects (RMO), you should apply the same setting to the Connectivity Components subnode of the Clients Components node on the same page of the wizard. In case you missed these steps during initial setup, simply launch SQLEXPR32.EXE (or SQLEXPR.EXE for 64-bit systems) to modify an existing instance (for the background information regarding this process, refer to our earlier article).
By leveraging technologies described in this article, you can deploy arbitrarily filtered portions of a database residing on a designated SQL Server 2005 Standard or Enterprise instance to multiple SQL Server 2005 Express Edition systems and maintain consistency between them. This approach provides a cost-efficient solution in situations where remote users must be able to not only remain fully operational even if network connectivity to a central office is limited, but also maintain access to up-to-date data through on-demand synchronization. In our next article of this series, we will review a few examples illustrating such implementations.