Saturday, September 16, 2017

SQL Server database basics

So today I'll talk about some of the basic items for SQL server that helps to understand the reason of some performance recommendations. This first post will list and give an explanation of each of the components and later on will explain in detail how it works and what are the best practice to each one.

  1. System databases
    1. master - This database is the brain of SQL Server, in here you have the metadata for each object that is created in the SQL Server instance, that includes Logins, SQL Users, Databases, tables, startup parameters, etc.
    2. model - This database has all the default templates for SQL Server, when the SQL instance first starts, it takes the all the setup for the databases from here, for example, every time that you restart your SQL Server instance, the tempdb is created base on a template stored in here.
    3. msdb -  This database is used to store all the information related with the SQL Server jobs, every time that a job is created, an entry is inserted here, you can also find the execution log for all those jobs in this database. The maintenance plans are also stored here.
    4. distribution - This database is created and utilized every time that you setup a replication, except in the cases when you use Mirroring or Always On Availability groups, however if you use Transactional or Merge you will use this database to distribute the commands to your subscribers.
    5. ssisdb - This database was introduced in SQL Server 2012, you won't find this database under the "system databases" folder as it needs to be created on demand. This database was introduced in order to provide portability to the SSIS catalog and improve the security, in the past, in a Disaster recovery or High availability solutions, you had to create your packages on each of your sites, whether on the SSIS instance or on disk, with this you can add the database to a mirroring or Always On solution and all your packages will be replicated.
  2. User Databases
The user databases regularly follow this structure:
  1. Data file (mdf) - This is the file where all the data is stored, this is your key file, if this gets corrupted, lost, or whatever, you will be losing all of your data if you don't have a backup, so make sure where this is stored and be sure that you backup your database regularly.
  2. Log file (ldf) - This is the file that supports and stores all the transaction made in the database, this file plays a key role when: a. You setup your recovery model as Full, b. You replicate the database. One common mistake that is made is not to configure t-log backups, if you do so, you gain several benefits,
    1. You can manage the size of your log file,
    2. Prevent that your file gets highly fragmented (vlf's)
    3. You can do a restore in time
  3. Filegroups - By default the databases get created with a Primary filegroup, what does this filegroup? Think of this as an index at the beginning of a book, where you list all the Chapters. The Filegroup plays a key role when your database is becoming larger in time, because you can create an Archive filegroup, assign that filegroup to a different physical file, and move that ancient data into the new group, so that you Primary group remains supporting your daily operations. There are also more benefits you can gain when you setup different filegroups, but that will be discussed in a different post.
  4. Recovery Model - You can have different recovery models set to a database, by default when you create a new database, the template from the model sets as full, however you have the following options:
    1. FULL - This means that it will store all the data in the mdf file, and will write all the transactions in the ldf so that when you restore your database from a backup, it applies rolls forward or back the transactions based on their state in the Log to remain the integrity of the data. This also allows you to do in time restores.
    2. Simple - This model is usually used when you only care of the data that is committed in the database, also some use it as a high performance model, because that millisecond that takes to be written in the log, can create an impact with the highly transaction applications that requires a millisecond response time, or in scenarios where you setup a Dev or QA environment and you don't need to restore in time.
    3. Bulk-logged - With this model there are certain operations that won't be fully logged in the transaction log, thus this won't make it grow, operations such as BULK INSERT, CREATE INDEX, SELECT INTO.

  1. Backups
    1. Full backup - I would say this is the most frequently used type, however most of the times this is missed of failed to be reviewed and leads to loss of data. With this type, you are backing all the data that is stored in your database, at a given time.
    2. Transaction Log backup - As discussed before, this type of backups helps to prevent the transaction log to grow indefinitely and also will allow you to restore in time.
    3. Differential backup - This backup will take all the changes made to the database since the last full backup.

SQL Server 2008 end of support