SQL DBA Internals: Publication


Wednesday, October 25, 2017

Configuring a Subscription to an Azure SQL

This is my first post using a hybrid cloud setup, a topic I personally find very interesting and useful, so expect more posts like this one.

At the beginning of this week Microsoft Azure announced that an Azure SQL database can be added as a subscriber to a transactional replication (https://t.co/WlZSmOQhhT). I was already preparing a post on how to create a subscription to a transactional publication, so instead of adding an on-premises server as the subscriber I set it up against Azure SQL.

The steps to configure a Subscriber are the following:

1. Start the subscription wizard: Look for the "Replication" folder, expand it, right-click "Local Subscriptions" and choose "New Subscriptions…".

2. Choose your publication: First connect to a SQL Server that is configured as a publisher, then choose the published database, and lastly select the publication you want to subscribe to. In this case, I'm connecting to the publication created in a previous post.

3. Choose the distribution agent location: You have two options here, Push or Pull subscriptions. What does this mean? With Push subscriptions, you rely on the Publisher to send (push) the replicated data to the subscribers. With Pull subscriptions, the subscribers connect to the Distribution database and download the changes. Pull is particularly recommended for environments where even a small increase in workload has a dramatic impact, and if you plan to have multiple subscribers it reduces the overhead on the publisher. With the Azure SQL release, however, it is recommended (for now) to add the subscription as Push: Azure SQL does not (yet) have a SQL Server Agent to run the agent jobs, so you need to use the one from the publisher.

4. Select your subscribers: In this step you add the subscribers to the publication you have chosen. Initially you will only see the server you are currently connected to, but you can add more by clicking "Add Subscriber" and selecting "Add SQL Server Subscriber…".

5. Connect to your Subscriber server: After you click the Add Subscriber button, you need to connect to the server or database that you are adding. In this particular case, I'm connecting to my Azure SQL Database, as you can see by the URL.

Once you have connected successfully, you will see that server with the box on its left checked, indicating that it will be added as a subscriber, along with the database that will receive the replicated data.

6. Configure the agents security: You then need to configure the accounts that the distribution agent will use to connect. In this particular case, as I don't have Active Directory configured, I chose to connect using a local account from my VM. Important note: remember you will need to choose

7. Select the Synchronization Schedule: Once you have chosen the subscriber database and configured the Distribution agent security, you need to choose the schedule. You can select "Run continuously" or schedule the agent to run at given times; this depends on how fast you need the updates to be replicated. For transactional replication I recommend running continuously, as you will want each transaction to be replicated to the subscribers as soon as possible.

8. Initialize subscriptions: In this step you specify whether your subscription will be initialized right after the wizard finishes or later. My recommendation is to do it immediately; there are only a few scenarios where you need to do it later.

9. Confirm the actions: Choose whether you want to execute the commands immediately or generate the scripts so you can review them and run them at a later time.

And you are all set. You just need to check in the Replication Monitor the progress of the initialization process and whether the data is synchronizing:

You can see that I had some errors in there, but they came from other test cases: I had included some tables that had triggers, and those failed on Azure SQL in my tests. If you don't have that kind of setup, the process is straightforward.

The steps above apply equally if you are setting up a subscription to an on-premises server or to a SQL Server VM created in Azure.
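
For readers who prefer scripts over the wizard, the steps above roughly correspond to the T-SQL below, run at the publisher in the published database. This is only a minimal sketch and not the script the wizard generated for me: the publication, server, database, login and password values are placeholders I made up, so replace them with your own.

```sql
-- Run at the Publisher, in the published database.
-- All names and credentials below are hypothetical placeholders.
USE [YourPublishedDatabase];
GO

-- Steps 2-5 and 8: register a push subscription for the Azure SQL database.
EXEC sp_addsubscription
    @publication       = N'YourPublicationName',
    @subscriber        = N'yourserver.database.windows.net',
    @destination_db    = N'YourAzureDatabase',
    @subscription_type = N'push',       -- the Distribution Agent runs on the distributor side
    @sync_type         = N'automatic',  -- initialize the subscriber from the snapshot
    @article           = N'all';
GO

-- Steps 6-7: create the Distribution Agent job for that subscription.
EXEC sp_addpushsubscription_agent
    @publication              = N'YourPublicationName',
    @subscriber               = N'yourserver.database.windows.net',
    @subscriber_db            = N'YourAzureDatabase',
    @subscriber_security_mode = 0,   -- 0 = SQL Server authentication at the subscriber
    @subscriber_login         = N'YourAzureSqlLogin',
    @subscriber_password      = N'<your password>',
    @job_login                = N'YOURVM\ReplAgentAccount',  -- Windows account the agent job runs as
    @job_password             = N'<your password>',
    @frequency_type           = 64;  -- 64 = autostart, i.e. run continuously
GO
```

If you choose to generate the script in the last wizard step instead of executing immediately, you can compare what the wizard would run against this sketch before applying anything.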

Thanks for reading

Configuring a Transactional replication

Hello,

This post shows all the steps you need to follow when you set up a transactional replication. The first 5 steps are one-time only, assuming this is the first time you are creating a publication on your database server; if you already have a replication set up (Log Shipping, Mirroring and Always On do not count), you can skip the first 5 steps.

Step 1: Configuring the Distribution database

As discussed previously (Replication Basics), SQL Server uses this database to keep track of and distribute all the changes that happen in your published objects (tables) to your subscribers, so the first steps create it. To start the wizard, connect to your SQL Server, look for the "Replication" folder, right-click it and select the first option, "Configure Distribution".

Step 2: Selecting the type of distributor that you will have: You can select the type of distributor that you would like your database server to use. The default selection is that each server acts as its own distributor; however, in highly transactional environments even that little overhead can severely impact performance, and in that case you would choose to use the distribution services of another server. For this case, we will use the first option (own distributor).

Step 3: Configure the SQL Server agent: By default the SQL Server Agent is configured to start manually. If you haven't changed that option you will get a message like this; if you have already configured it to start automatically you won't see this screen.

Step 4: Configure the snapshot folder: You need to choose a location where SQL Server will create your publication snapshots. Remember that these are important, as you will initialize your subscriptions from them. This folder can be reconfigured later if you need to do so to improve performance, but that will be reviewed in another post.

Step 5: Choosing the name for the distribution database and the location of the files: In this step you configure the name of the database that the distribution agent will use. The default name is "distribution", but you can choose any name you like, for instance Customer_distribution. This is useful if you will be using a separate server as your distributor. After this screen you get a summary of the setup and can finish the wizard, which creates and configures your distribution database and agent. With this you are all set to start publishing your databases.

Verification: To verify that your database was created properly, go to the "System Databases" folder and look for the database name you chose in step 5.
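
The same configuration can also be scripted. The following is a minimal sketch of steps 2, 4 and 5 plus the verification, assuming the server acts as its own distributor; the password, the snapshot share and the database name are placeholders of mine, not values taken from the wizard screens.

```sql
-- Run at the server that will act as its own distributor.
-- Passwords, paths and the database name are hypothetical placeholders.
USE master;
GO

DECLARE @server sysname = @@SERVERNAME;

-- Step 2: make this server its own distributor.
EXEC sp_adddistributor
    @distributor = @server,
    @password    = N'<distributor_admin password>';

-- Step 5: create the distribution database (the default name is "distribution").
EXEC sp_adddistributiondb
    @database      = N'distribution',
    @security_mode = 1;   -- 1 = Windows authentication

-- Step 4: register this server as a publisher and set its snapshot folder.
EXEC sp_adddistpublisher
    @publisher         = @server,
    @distribution_db   = N'distribution',
    @working_directory = N'\\YourServer\ReplData',
    @security_mode     = 1;
GO

-- Verification: the distribution database is flagged in sys.databases.
SELECT name, is_distributor
FROM sys.databases
WHERE is_distributor = 1;
```

The startup mode of the SQL Server Agent (step 3) is a service setting and is not covered by this sketch.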

Creating a publication

Every time you need to create a transactional replication publication, follow these steps:

1. Choose new publication: Go to the "Replication" folder (the same folder you used to configure the distribution database), expand the tree, right-click "Local Publications" and select "New publication". That opens the "New publication Wizard".

2. Choose the database to be published: Once you are in the "New publication Wizard", select the database that you want to publish.

3. Select the publication type: As discussed before, there are 4 different types of publications to choose from. In the Replication Basics post I explain each use case in a little more detail, and the wizard screen shows a brief explanation of each one. For this particular case we will select "Transactional Publication".

4. Choose your articles: The articles are the database objects that you want to replicate. You can select tables, views or stored procedures; the only condition is that each table must have a primary key.

5. Filter table rows: You can choose to filter the data that will be replicated. Here you write a query with the conditions the data needs to meet to be replicated, e.g. a date range, a status, an age, etc. You can also leave this blank to replicate the entire table.

6. Snapshot Agent: Each publication has its own snapshot agent, which is in charge of generating the snapshot (schema and data) to be applied on each of your subscriptions. You can choose to generate the snapshot immediately at the end of the wizard or schedule it to run at a given time. You can also choose both, in case you would like to reinitialize your subscriptions on a given day of the week, although that is not recommended for transactional replication.

7. Agents setup: You need to configure the security for the Snapshot and Log Reader agents. Clicking "Security Settings" opens another window where you configure the account each agent will use to a) generate the snapshot of the data and queue it and b) read the changes made to the published objects.

8. Choose the next action: In this step you choose whether to create the publication right away or to generate the script that creates and configures it so you can execute it later.

9. Confirmation: This step shows a summary of what you have chosen, but it is also where you give the publication a name so you can later identify what it is for. This is incredibly useful when you have several replications configured on the same server, or when you have 2 or more publications from the same database that each serve a different purpose (Reports, DR, etc.).

Once you have completed all of these steps and you don't see any errors after clicking Finish, you are all set: your publication is created and you are ready to start creating the subscriptions.
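
As a scripted counterpart to the wizard, here is a minimal sketch that creates a transactional publication with a single table article. The database, publication, table and account names are all hypothetical placeholders, and the row filter from step 5 is left out, so the whole table is replicated.

```sql
-- Run at the Publisher. All object names, accounts and passwords are hypothetical placeholders.
USE [YourPublishedDatabase];
GO

-- Step 4 pre-check: tables without a primary key cannot be published as articles.
SELECT s.name AS SchemaName, t.name AS TableName
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0;
GO

-- Step 2: enable the database for publishing.
EXEC sp_replicationdboption
    @dbname  = N'YourPublishedDatabase',
    @optname = N'publish',
    @value   = N'true';

-- Step 7 (b): Log Reader Agent job that reads the changes for this database.
EXEC sp_addlogreader_agent
    @job_login               = N'YOURVM\ReplAgentAccount',
    @job_password            = N'<your password>',
    @publisher_security_mode = 1;   -- 1 = Windows authentication

-- Steps 3 and 9: create the transactional publication and give it a name.
EXEC sp_addpublication
    @publication = N'YourPublicationName',
    @repl_freq   = N'continuous',   -- transactional rather than snapshot
    @status      = N'active',
    @description = N'Example publication for reporting';

-- Steps 6 and 7 (a): Snapshot Agent job for the publication.
EXEC sp_addpublication_snapshot
    @publication             = N'YourPublicationName',
    @job_login               = N'YOURVM\ReplAgentAccount',
    @job_password            = N'<your password>',
    @publisher_security_mode = 1;

-- Step 4: add one table as an article.
EXEC sp_addarticle
    @publication   = N'YourPublicationName',
    @article       = N'Orders',
    @source_owner  = N'dbo',
    @source_object = N'Orders',
    @type          = N'logbased';

-- Step 6: generate the initial snapshot now.
EXEC sp_startpublication_snapshot
    @publication = N'YourPublicationName';
GO
```

Running the pre-check first gives you the list of tables that would have to be excluded, or given a primary key, before they can be added as articles.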

Tuesday, October 3, 2017

Introduction to SQL Server replication

This time I'll talk about some basics of SQL Server replication. Whether you need it for high availability (HA), disaster recovery (DR), load balancing your applications or any other reason, you have probably considered one of the different types of replication available with SQL Server. I'll cover some of the components that come into play when you set it up, as well as a brief description of how each type works and its benefits. So let's get started:

Requirements:

Distribution DB: This is a system database you need to set up before configuring three types of replication (Transactional, Peer-to-Peer and Merge). This database comes into play when you need the changes to be distributed across the different subscribers to your publications. It is the database used by the Distribution agents to distribute the pending transactions, and you can also query it to troubleshoot any errors shown in the Replication Monitor.

Publisher: This is the database server supporting your database publications. Think of a book publisher: from one publisher you can find different books (publications).

Publications: These are the databases published for replication; in each one you configure the articles (tables) that will be part of the publication.

Subscriber: These are the servers that receive the data from the publisher. The subscriber can be the same server as the publisher or a different one.

Snapshot: This is a copy, so to speak, of the publication. It is used to initialize the subscribers or, if you set up that type of replication (Snapshot), to synchronize them on the schedule you set up.

Replication Types

Snapshot: This type of replication is usually used when you need to move some data for reporting purposes, or when you will only read that data to run some extensive analysis whose processing would impact the transactional server.

Transactional Replication: This replication ships (depending on your settings) all the transactions from the publisher to the subscriber. It is helpful when you have reports that need to run against current data and you can't add more stress to your production database. It is also useful in HA or DR scenarios because it ships the data to the subscribers quickly. This replication depends on the distribution database to replicate the data.

Peer-to-Peer replication: This publication enables multi-master replication. The publisher streams transactions to all the peers in the topology. All peer nodes can read and write changes, and the changes are propagated to all the nodes in the topology. With this type each of the servers works as both publisher and subscriber, and you need to configure a distribution database on each of them. This replication depends on the distribution database to replicate the data.

Merge replication: This replication "merges" the changes made at the publisher and the subscribers, but only after they have been initialized with a snapshot. You need to set up a merge schedule so that the changes from one site are combined with the other. This replication depends on the distribution database to replicate the data.

Log Shipping: This type of replication is one of the oldest but most trusted methods for disaster recovery. It backs up the transaction log of your database, then copies and restores the file on your subscribers, all based on the schedule that you set up: the more frequent the backups, the closer to the point of failure you can restore. The good thing is that it also helps keep your log files at a manageable size, so they don't grow indefinitely, because you are constantly running these backups. You won't find this type of replication inside the replication folder in the management tree; instead you will find it in the "Properties" of each database.

Mirroring: Database mirroring involves redoing every insert, update, and delete operation that occurs on the principal database onto the mirror database as quickly as possible. Redoing is accomplished by sending a stream of active transaction log records to the mirror server, which applies log records to the mirror database, in sequence, as quickly as possible. Unlike replication, which works at the logical level, database mirroring works at the level of the physical log record (Microsoft). This type of replication is extremely useful for HA or DR scenarios, because failover is considerably fast when done manually, and it can also be set up to happen automatically. I like to say this is the preview version of what Always On does now, which is one of the reasons why you will see Microsoft's note that this feature will be deprecated.

Always On: This is the newest replication feature added to SQL Server. It was introduced with SQL Server 2012 and has received a lot of enhancements with each version; even the name has changed: originally it was "AlwaysOn" and in SQL Server 2016 it became "Always On availability groups". It combines the functionality described for mirroring with the Log Shipping benefit of having read-only replicas. A specific post on Always On will follow, where we will review all the features as well as how to set it up. This type is frequently used to support both HA and DR scenarios.

My hope is that this gives you a better idea of which type of replication works best for each of your needs, so you can get the most out of what the SQL engine can provide.

Thanks for reading

Friday, August 25, 2017

SQL Server Replication snapshot initialization taking long time to be applied

Hello all,
In today's post I want to share a tip that helped me troubleshoot an issue I was facing and for which I couldn't find much documentation. To my friends who read this and don't have much knowledge of databases, this entry will look like a strange language, but I promise I'm working on some posts to walk you through the database world and help you understand some of the terms used here.

Issue: I had to reinitialize a subscription of a transactional replication. When initializing with the snapshot, the process was taking a huge amount of time: after about 13 hours it had applied the snapshot for only 12 (out of 24) tables. At that speed the entire process would have taken about 26-30 hours to complete.

The snapshot was generated in 12 minutes, taking advantage of parallelism. However, I noticed that when the snapshot was being applied it was processing one file at a time (due to the parallelism of the snapshot generation, the snapshot was split into 48 chunks for each table), so I started to wonder why the initialization was sequential (serialized) whereas the generation ran in parallel.

After some deep research on the web, I found lots of explanations such as this one (https://blogs.msdn.microsoft.com/sqlsakthi/2011/08/07/maxbcpthread-parameter-for-snapshot-and-distribution-agent-in-sql-server-replication/), which explains how the parameter "-MaxBcpThreads" works but doesn't explain how to implement it. You can find that parameter in the Distribution Agent on the publisher server; however, you can't change its value from the SSMS user interface (UI).

So I stopped searching the web for a specific solution and started to build my own. Time to dust off the T-SQL and manage replication with it: I looked for a stored procedure that would let me change that parameter and found "sp_change_agent_parameter", which updates the parameters of the replication agent profiles. You can list your current Distribution agents with the following query; let's call it Query1 for future reference:
-- Query1: list each Distribution Agent with its subscriber and agent profile
SELECT
    a.[publication] AS Publication
    ,c.srvname AS SubscriberName
    ,b.profile_name AS Profile
    ,a.[name] AS DistributionJobName
    ,b.profile_id
FROM [distribution].[dbo].[MSdistribution_agents] a
INNER JOIN msdb.[dbo].[MSagent_profiles] b
    ON a.profile_id = b.profile_id
INNER JOIN master..sysservers c
    ON a.subscriber_id = c.srvid
ORDER BY b.profile_name

From those results, check that the publication you want to update is listed, and take the value from the profile_id column.
Before updating these values, keep in mind an important note from the URL mentioned above: "When applying a snapshot that was generated at the Publisher using the concurrent snapshot option, one thread is used, regardless of the number you specify for MaxBcpThreads."
This means you need to verify the sync_method configured for your publication. If it is Concurrent, you will have to change the sync method before updating your Distribution agent properties. You can check the method with the command "EXEC sp_helppublication @publication = 'YourPublicationName'", executed on your published database, and look at the 7th column. The column value is a number that maps to one of these values:
Synchronization mode:

0 = Native bulk copy program (bcp utility)
1 = Character bulk copy
3 = Concurrent, which means that native bulk copy (bcp utility) is used but tables are not locked during the snapshot
4 = Concurrent_c, which means that character bulk copy is used but tables are not locked during the snapshot.
source(https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-helppublication-transact-sql)

Once you have verified that you need to change the sync method of your publication, you can do it with the following command:

-- Run at the publisher, on the published database
EXEC sp_changepublication
    @publication = 'YourPublicationName',
    @property = 'sync_method',
    @value = 'native',
    @force_invalidate_snapshot = 1,
    @force_reinit_subscription = 1;
GO

Once that is set, you can use the following command to set the maximum number of BCP threads, where @profile_id is the value you got from Query1:

-- 4 is an example profile_id; use the one returned by Query1
EXEC sp_change_agent_parameter
    @profile_id = 4,
    @parameter_name = '-MaxBcpThreads',
    @parameter_value = '4';
GO
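
To double-check the change, the profile's parameters can be listed before and after the update. The sketch below reuses the example profile_id of 4; my understanding, which you should verify in your own environment, is that sp_change_agent_parameter only updates a parameter that already exists in the profile, so the second statement is an assumed fallback for the case where -MaxBcpThreads is not listed yet.

```sql
-- List the parameters currently stored for the profile.
-- 4 is the example profile_id used above; use your own value from Query1.
EXEC sp_help_agent_parameter @profile_id = 4;
GO

-- Assumed fallback: if -MaxBcpThreads is not listed, add it instead of changing it.
EXEC sp_add_agent_parameter
    @profile_id      = 4,
    @parameter_name  = N'-MaxBcpThreads',
    @parameter_value = N'4';
GO
```

Agent profile changes are picked up when the agent starts, so a Distribution Agent that is already running needs to be restarted before the new value applies.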

What does this change do? It lets the agent use x threads (based on what you configure in @parameter_value) to read the snapshot files and apply them in parallel. One thing worth mentioning: when you check your active executions (e.g. sp_who) you will notice that some of the parallel threads get blocked. Don't worry about it; it happens because they are working on the same table. You gain the parallelism when various tables are processed at the same time. My advice is to set parameter_value to a number close to half of the files that were generated, always taking into account your available processors; you don't want to affect the performance of the other databases while trying to remediate one issue.

The result? In my environment I was able to apply the snapshot (24 tables) in 6 hours and 13 minutes. The published database was 1.5TB and the snapshot weighed 185GB. Without the change the process ran for more than 13 hours and only 12 tables (50% in number, 35% in data) were applied. I hope this tip helps you get your snapshots applied faster.

Thanks for reading!
