SQL DBA Internals: BCP

Showing posts with label BCP. Show all posts

Monday, December 4, 2017

Configure Always on AG with SQL 2017

One of the new features that we have with SQL 2017 is that you no longer need a Windows cluster to enable the AlwaysOn feature with SQL server (remember we discuss the requirements to configure that here those are valid for versions 2012 to 2016) , if you want to review what's new in SQL 2017 you can check it here.

In this post I'll be showing you how to configure an Always On availability group with SQL Server 2017, most of the steps are the same for the older versions and I'll be telling the differences on each of the steps so lets get started.
First I'll amuse you already have SQL Server installed in two servers as stand alone instances, that is a requirement for this post.

1. Enabling AlwaysOn: In order to do that, you need to go to SQL Server Configuration Manager, and on the SQL Server Services, hit properties over the MSSQLSERVER

2. In the properties window, look for "AlwaysOn High Availability", if you are running with an older version, the checkbox "Enable AlwaysOn Availability Groups" won't be availabile until you make the machine where SQL Server part of a cluster but just part of the windows cluster, the SQL Server instance remains as an Stand Alone type. Once that you have the checkbox enabled, check it, and click apply and Ok. Enabling this feature requires a Service restart so take that in account. Also, you need to do this in all the SQL instances that will be part of the Availability Group.

3. Go to SQL Management studio, connect to any of the SQL instances that will be part of the group, look for the "Always On High Availability" folder, expand it and you will see a folder with the name "Availability Groups" do a right click on it and select "New Availability Group Wizard…"

4. In the availability group wizard, configure the name you want for your availability group.

a. Cluster Type (New with SQL 2017): you have 2 options here (the official documentation shows 3), External or None, if you choose External, it means that you will joining this availability group to an external cluster, E.g. if you have a windows cluster on your primary datacenter that is on a different network or you want to add it to a Linux cluster.
b. Database health level detection (New with SQL 2017), this will enable a constant validation of the databases that are part of the availability group that if anything goes wrong with any of the databases it will trigger an automatic failover
c. Per Database DTC support (New with 2016 SP1) Allows the Distributed Transaction Coordinator through the availability group, feature that was not available in older versions of SQL (2012 and 2014)

If you don't have a cluster created and just like this scenario, choose NONE

5. Select the databases that will be part of the availability group. Requisites for a database to be considered:
a. Be on Full recovery model
b. Have a full backup

6. In here you will be choosing the replicas that you want to add to your group. Important features here:

Initial Role: You will set the role that each replica will have once that the group becomes available
Failover Mode: Manual or Automatic
Availability Mode: Synchronous commit or Asynchronous commit
Readable Secondary: No, Yes - Read Intent Only and Yes. If you choose No or Yes - Read Intent Only, you won't be available to query your replicas, the Read Intent is when you enable that feature that routes the read operations to your available replicas. With the Yes option you will be able to query the databases in your replica.
The endpoints tab will show you the URL and ports each replica is set to.
Backup preferences tab: in here you can configure if you want to take the backups from the primary or the secondary servers.
Listener tab: In this window it lets you set the parameters to configure the availability group listener, however my advice is to do it later, configure your group first and once that is done, configure it later.
Read-Only Routing: tab This lets you configure your read only routing for the read-intent setup, this allows you to load balance the queries so you have only the queries that will Insert, Delete or Update on the primary and all the Select queries routed to your secondary's so they don't consume resources the Primary will use (I'll explain it in another post)

Automatic seeding (Starting SQL 2016): With this type, SQL Server will do everything for you, it will use the default folders configured so make sure you have enough space available on it.
Full database and log backup: Same as the Automatic seeding, however in this case it lets you choose where do you want to generate the backups
Join Only: With this one, you do prepare everything in your replica, important notice, you do a restore with norecovery in your replica and apply at least one log backup. This option is useful with really large databases.
Skip initial data synchronization: Same as before, however in this particular one, you need to restore everything just at the time before initializing the synchronization otherwise it will tell you that there are items pending to be restored.
Failover Mode: Manual or Automatic
Availability Mode: Synchronous commit or Asynchronous commit
Readable Secondary: No, Yes - Read Intent Only and Yes. If you choose No or Yes - Read Intent Only, you won't be available to query your replicas, the Read Intent is when you enable that feature that routes the read operations to your available replicas. With the Yes option you will be able to query the databases in your replica.
The endpoints tab will show you the URL and ports each replica is set to.
Backup preferences tab: in here you can configure if you want to take the backups from the primary or the secondary servers.
Listener tab: In this window it lets you set the parameters to configure the availability group listener, however my advice is to do it later, configure your group first and once that is done, configure it later.
Read-Only Routing: tab This lets you configure your read only routing for the read-intent setup, this allows you to load balance the queries so you have only the queries that will Insert, Delete or Update on the primary and all the Select queries routed to your secondary's so they don't consume resources the Primary will use (I'll explain it in another post)

8. Verify that the validation runs successfully, this warning is because I didn't setup the listener in step 6, but that is fine, every time that I've tried to configure it from the very first page it fails, so my advice is that you configure it later (part of this post).

9. After hitting finish and if none of the steps have failed you will see the screen just like this.

Setup of the listener

The listener will be your single point of entry for your availability group, no matter which server is primary and which secondary, you will only need to configure your applications to use the Listener fqn or the ip and that will give you that High Availability you are looking with Always On.
1. Go to the "Always On High Availability" folder, expand your availability group and look for the "Availability group listener" and right click on it.
1. In the Configuration screen, Configure the domain name you want your listener to respond onto, this will be like another computer in the domain, so be sure that you have permissions to create objects in the active directory, if you don't ask your domain admin to provision that name and grant you permissions over it so you can enable it.
You are all set, you can start configuring your applications to the listener and taking the advantages always On provides.

Wednesday, October 25, 2017

Configuring a Transactional replication

Hello,

This post is intended to show you all the steps you will need to follow when you setup a transactional replication, in fact the first 5 steps are one time only, assuming this is the first time you will be creating a publication in your database server, if you already have one set (Except for Log-Shipping, Mirroring or Always On) you can skip the first 5 steps.

Step 1: Configuring the Distribution database

As discussed previously (Replication Basics) SQL Server will use this database to keep a track and distribute all the changes that happen in your published objects (tables) to your subscribers, so the first steps are in order to create it. So in order to initialize that wizard, you connect to you SQL Server, and look for the "Replication" folder, over there you do a right click and select the first option "Configure Distribution"

Step 2 Selecting the type of distributor that you will have: You can select the type of distributor that you would like for your database server to use, the default selection is that each server serves as its own distributor, however, there are highly transactional environments where the little overhead can severely impact the performance and for that you will choose to use the distribution services from another server. For this case, we will use the first option (own distributor).

Step 3: Configure the SQL Server agent: By default the SQL Server agent is configure to start manually, if you haven't change that option you will get a message like this, If you already configure it to start automatically you won't see this screen.

Step 4: Configure the snapshot folder You will need to choose a location where SQL Server will be creating your Publication snapshots, remember this are important as from this you will be initializing your subscriptions. This folder can be later reconfigure in the case that you need to do so to improve the performance, but that will be review in another post.

Step 5 choosing the name for the distribution database and the location of the files: In this step you need to configure the name that you would like to give to the database that the distribution agent will be using, the default name is "distribution" however you can choose the name that you like, for instance you can use Customer_distribution to specify the name, this is useful if you will be using a separate server as your distributor. After this screen you will get a summary of the setup and will let you finish the wizard that will create and configure you Distribution database and agent. With this you are all set to start publishing you databases.

Verification: if you like to verify if your database got created properly you can go to the "System Databases" folder and look for the database name you choose in step 5 to verify.

Creating a publication

Every time that you need to create a Transactional replication publication you will need to follow this steps:

1 Choose new publication Go to the replication Folder (same folder you used to configure the distribution database), expand the Tree and over the "Local Publications" hit right click and select "New publication", that will prompt the "New publication Wizard"

2 Choose the database to be published Once that you are on the "New publication Wizard", select the database that you want to publish

3 Select the Publication type As discussed before, there are 4 different types of publications that you can choose from, in this post (Replication Basics) I explain a little bit more each case of use, and in that screen you can see a brief explanation of each one for you to choose, for this particular case we will select "Transactional Publication"

4 Choose your articles The articles are the database objects that you want to get replicated, you can select from tables, views or stored procedures, the only condition is that the table must have a Primary key.

5 Filter table rows You can choose to filter the data that will be replicated, in here you can write a query that will select the conditions the data needs to meet to be replicated, e.g. filtered by a date range, a status, age, etc. You can also leave this blank and will replicate the entire table.

6 Snapshot Agent Each publication has its own snapshot agent, this agent will be the one in charge of generating the schema to be applied on each of your subscriptions, you can choose to generate the snapshot immediately at the end of the wizard or schedule it to run at a given time. You can also choose both, in the case that you would like to reinitialize your subscriptions at a given day of the week although for transactional replications that is not recommended.

7 Agents setup you will need to configure the security for the Snapshot and Log Reader agent. When clicking on "Security Settings" it will drive you to another window where it will ask you to configure the account you will like to use to connect to the distribution agent to a) Generate the snapshot of the data and queue it and b) Read the changes that are made to the published objects.

8 Choose the next action In this step it lets you choose which action you would like to do, if you want to create the publication at that exact moment, or if you like to get the script generated to create and configure and you will be executing it later.

9 Confirmation In this step it shows you the summary of what you have chosen, but it is also an important step, where you will be giving the name to the publication so you can identify later what is it about, this is incredibly useful when you have several replications configure in the same server, or when you have 2 or more publication from the same database that each serves for different purposes (Reports, DR, etc.)

Once you have complete all of this steps and after clicking finish you don't see any error, you are all set, your publication is created and you are ready to start creating the subscriptions.

Friday, October 6, 2017

Introduction to SQL Server Always On

Today I'll write with one of the features I like the most and is Always On. There will be two post of this subject, this first one is an overview of the changes it has gone through, benefits and the scenarios where you can get it implemented, the second will detail the steps on how you can get it implemented.

So I'll try to answer this questions What is Always On? When was it Released? And How can it be used?

What is Always On?

Always On or Always On availability groups (Microsoft) is an integrated high availability (HA) and disaster recovery (DR) solution that provides redundancy within a datacenter and across datacenters to help enable fast application failover during planned and unplanned downtime. Specifically, Always On delivers failover cluster instances, multiple secondary, active secondary, and connection director capabilities (multi-subnet and read-only intent).

In English, this means, Always On provides a highly available and Disaster recovery solution for you database environment. As mentioned before is the next evolution of database mirroring. Database mirroring is relatively easy to set up, and it provides an almost instant failover when it’s used in High Safety mode. However, database mirroring has several important limitations. First, database mirroring is limited to a single failover partner. In addition, database mirroring is capable of failing over only a single database at a time. Database mirroring must be implemented either synchronously or asynchronously you have to choose one or the other. Finally, the databases on the mirroring server are in a state of recovery, so they can’t be directly accessed. To use the data, you must take point-in-time snapshots of the

databases.

This function was introduced with SQL Server 2012 and has received a lot of transformations with the different releases of SQL Server

This are some of the enhancements:

Secondary replicas With 2012 you could only have 4, with 2014 it was increased to 8
You were not allowed to create a replica in Azure, with 2014 that option was introduced
In 2012 and 2014 it was limited to the Enterprise version, in 2016 SP1 you can use "Basic Availability groups since the standard version"
In the previous versions you were limited to servers that belong to the same domain, with 2016 you can combine groups form different domains.
In 2016 Databases with Transparent Data Encryption (TDE) are now supported
If you were using Grouped Managed Service accounts (gMSA) you were not able to configure the availability groups, it is now supported with 2016
In 2016 the Direct Seeding got introduced. Direct seeding takes all the initialization steps for you, it does a full backup and synchronizes the database in the background for you.
In 2014 the read intent was introduced, meaning that all of your read operations will be routed to your replicas so that you have the primary server to run I/U/D (Insert, Update, Delete), In 2016 the engine automatically load balance this across your replicas.

How can it be used?

Always On needs some components to run:

Two sets of IPs: You will need one IP for the Windows failover cluster and one for your Listener.
Windows Failover cluster: You need to configure (create) a windows failover cluster (WSFC) so that you can enable the Always on Functionality in the SQL Server service, why? Because the failover cluster validates if the servers that will be part of the availability groups are running without problems.
Availability Groups: After you create your cluster and install SQL Server as Stand Alone instances, you will have to configure an availability group and add both SQL Server instances, this is where you Always On starts to takes life.
Listener: This will be you single point of entry to you Availability group (AOG) it could be either by the domain name or the IP you will be assigning to it in the setup section, this will always route you to your primary server, unless you configure the Read-Intent routing and it will route your read operations to your replicas.

Think of the an Always On environment as an "Share almost nothing" environment" why? You are only sharing the failover clustering part, other than that you will have independent hardware for each of your SQL Servers. Also when you setup your environment with this type of setup, it reduces the changes (or steps) that you need to take when doing a planned DR because everything is already setup (At least the database part) and you only need to worry if the synchronization was up to date.

Friday, August 25, 2017

SQL Server Replication snapshot initialization taking long time to be applied

Hello all,
In today's post I want to share a tip that helped me troubleshoot an issue I was facing and couldn't find much documentation about it. To my friends that read and don't have much knowledge in databases, this entry will look like an estrange language but I promise I'm working in some post to walk you through the database world and understated some of the terms used here.

Issue: Had to reinitialize a subscription from a transactional replication. When doing the initialization with the snapshot, the process was taking a huge amount of time, about 13 hours and it has only applied the snapshot for 12 (out of 24) tables, based on that speed the entire process would have taken about 26-30 hours to complete.

The snapshot was generated in 12 minutes and it was generated taking advantage of parallelism, however I noticed that when the snapshot was getting applied it was processing one file at the time (due to the parallelism of the snapshot generation, it split the snapshot files in 48 chunks for each table) so I started to wonder why the initialization was sequential (Serialized) whereas the generation was in parallel.

By doing a deep research on the web, I found lots of explanations such as this one (https://blogs.msdn.microsoft.com/sqlsakthi/2011/08/07/maxbcpthread-parameter-for-snapshot-and-distribution-agent-in-sql-server-replication/) which explain how the parameter "-MaxBcpThreads" work, however it doesn't explains how to get it implemented. You can find that parameter in the Distribution Agent for the publisher server, however you can't change the value from the user interface (UI) of SSMS.

So I stop trying to find for an specific solution through the web and started to build my own. Time to remove the dust and start managing the replications with T-SQL, so I looked for any stored procedures that would help me change that parameter so I found this "sp_change_agent_parameter" in there you can update the parameters for all the database agents. You can query your current Distributors but using the following query. lets call it Query1 for future reference:
SELECT
[publication] as Publication
,c.srvname as SubscriberName
,b.profile_name as Profile
,[name] as DistributionJobName
,b.profile_id
FROM [distribution].[dbo].[MSdistribution_agents] a
inner join msdb.[dbo].[MSagent_profiles] b
on a.profile_id = b.profile_id
inner join master..sysservers c
on a.subscriber_id = c.srvid
order by b.profile_name

From those results, validate if the publication you want to update is listed there, and take the value from the profile_id column.
Important note before updating this values, in the URL mentioned above, there is an important note: "When applying a snapshot that was generated at the Publisher using the concurrent snapshot option, one thread is used, regardless of the number you specify for MaxBcpThreads."
Meaning that you will need to verify the Sync_method that is configured with your publication, if it is Concurrent, then you will have to update your sync method before updating your Distribution agent properties, you can validate the method with this command "EXEC sp_helppublication @publication = 'YourPublicationName'" and look for the 7th column, this command will need to be executed on your published database. The column value will be a number, you can match each result to this property:
Synchronization mode:

0 = Native bulk copy program (bcp utility)
1 = Character bulk copy
3 = Concurrent, which means that native bulk copy (bcputility) is used but tables are not locked during the snapshot
4 = Concurrent_c, which means that character bulk copy is used but tables are not locked during the snapshot.
source(https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-helppublication-transact-sql)

Once that you have verify and noticed that you need to update the sync method to your publication, you can do it with the following command:

sp_changepublication
@publication = 'YourPublicationName',
@property = 'sync_method',
@force_invalidate_snapshot =1,
@force_reinit_subscription =1,
@value = 'native';
GO

Once that you have that set, you can use the following command to update the value you want for the maximum Bcp Threads where Profile_ID is the one you got from the Query1:

sp_change_agent_parameter
@profile_id = 4,
@parameter_name = '-MaxBcpThreads',
@parameter_value = '4'
go

What would this change do? Enable you to use x number of threads (based on what you configure in @parameter_value) to read the snapshot files and get it apply in parallel. One thing that worth's mentioning, when you check for your active executions (e.g. sp_who) you will notice that some of the parallel threads will get blocked, don't worry about it, is because you are dealing with the same table, you will gain that parallelism when you process various tables at the same time, so in here my advice is to configure that parameter_value to a number close to the half of the files that were generated always taking in account your available processors, you don't want to affect the performance for the other databases when trying to re-mediate one issue.

The result? In my environment I was able to apply the snapshot (24 tables) in 6hrs and 13 minutes, the Published database was a 1.5TB database and the snapshot weighted 185GB, without the change the process ran for more than 13hrs and only 12 tables (50% in number 35% in data) were applied, hope this tip helps you get the snapshot applied faster.

Thanks for reading!

SQL DBA Internals