Wednesday 27 June 2012

DQS Cleansing Transform in SSIS 2012

Data Quality Services is a new SQL Server product in Denali which provides data cleansing capabilities. This post describes how to make use of the DQS Cleansing Transformation which is part of the default set of SSIS data flow components.

The DQS Cleansing transform can be found in the Data Flow Toolbox.


The first thing we need to set in the DQS Cleansing Transformation Editor is the Data Quality Connection Manager. This will point to our DQS installation residing on a SQL instance.



Once the connection manager has been created, you select the Knowledge Base (KB) you want to use. The current CTP comes with a default “DQS Data” KB, which contains domains that you’d use for address validation (at TechEd they mentioned that the final release may include additional domains out of the box). Selecting the Knowledge Base you want to use will bring up its list of domains.


There are two types of domains in this list; regular Domains (ex. City, State, Zip), and Composite Domains (also called CDs), which are made up of two or more regular domains. Composite Domains allow you to validate multiple fields as a single unit. For example, the “Company – US” composite domain in the default DQS Data KB is made up of Company Name, Address Line, City, State and Country. This lets you validate that “Microsoft Corporation” (Company Name) exists at “One Redmond Way” (Address Line), “Redmond” (City), “WA” (State), “USA” (Country) (given that you have the appropriate CD rules in your knowledge base, or the CD is attached to an online reference data provider). DQS would flag the entry as incorrect if you had “Seattle” as the City – even though Seattle is a valid city name, that’s not where the Microsoft office is.

There are two ways to make use of Composite Domains in SSIS:
  1. A single (string) column – for this to work, all values must appear in the same order as the domains do. Using the “Company – US” example above, your column values would need to look like this: Microsoft Corporation, One Redmond Way, Redmond, WA, USA (a query that builds such a column is sketched below)
  2. Multiple columns – if you map a column to each domain of a composite domain, the row will be cleansed using the composite domain logic. If you have not mapped each domain from the composite domain, the columns will be cleansed individually
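As a rough illustration of option 1, a source query might concatenate the fields into one column in domain order before the data flow sends it to the transform. This is only a sketch; the table and column names below are made up for the example, and NULL handling is omitted for brevity.

-- Hypothetical source table/columns: build one string in the composite domain's order
-- (Company Name, Address Line, City, State, Country).
SELECT
    CompanyName + ', ' + AddressLine + ', ' + City + ', ' + StateProvince + ', ' + CountryName AS CompanyUS
FROM dbo.Customers;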
The Mapping tab allows you to select the columns you want to cleanse, and map them to domains in your knowledge base.



Note that the Domain drop down will automatically filter out domains with incompatible data types (for example, it won’t show domains with a String data type if you are mapping a DT_I4 column).
The Advanced tab has a number of different options – most of which control the columns that will be included in the output.


Standardize output – Enables standardization of the output according to the domain settings (which can be defined in the DQS client application). There are two kinds of standardization: formatting the output to Upper/Lower/Capitalize, and correcting to the leading value (relevant in the case of synonyms). You can see how this is defined in the client, under Domain Management | Domain Properties.
Confidence – A score that is given to any correction or suggestion. This score reflects to what extent the DQS server (or the relevant Reference Data Source) has confidence in the correction/suggestion.
Reason – In case the output is different from the original value, this field explains why. For example, it can be Invalid because of a domain rule, or Corrected because of the DQS cleansing algorithm, standardized to the leading value, etc.
Appended Data (additional data received from the reference data provider) – This setting is used when there are domains attached to a Reference Data Source (RDS). In this case, the RDS sometimes returns additional information, not only values associated with the mapped domains. For example, when sending an address, it can also return Lat/Long. The Appended Data field includes this additional information.
Appended Data Schema – If the RDS returned additional information, this field contains the schema describing how to interpret that data.
Encrypt connection – Determines whether the connection to SQL Server will be encrypted (using SSL). (Note: this setting will most likely be moved to the Connection Manager in the future.)


After mapping one of my input columns (StateProvinceName) to one of the domains (State), I check the Confidence and Reason boxes on the Advanced tab, and click OK to save the changes. If I connect the path to another transform and look at its metadata (right click on the path, Edit … and click the Metadata tab), I can see that four columns were added for each input column I chose to cleanse – Corrected_<column>, Status_<column>, Confidence_<column> and Reason_<column>. The first two show up by default – the last two are there because of the options I selected on the Advanced tab.

The Corrected column will contain the value for the field, whether or not it was corrected by DQS. This is most likely the field you’ll want to use later on in your data flow.

The Status column will contain the result of the correction operation. Possible values are:

Correct – The value was already correct, and was not modified.
Invalid – The value was marked as invalid for this domain.
Corrected – The value was incorrect, but DQS was able to correct it. The Corrected column will contain the modified value.
Unknown – The value wasn’t in the current domain, and did not match any domain rules. DQS is unsure whether or not it is valid.
Suggestion – The value wasn’t an exact match, but DQS has provided a suggestion. If you include the Confidence field, you could automatically accept rows above a certain confidence level, and redirect others to a separate table for later review.
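If you stage the cleansed output in a table, a small follow-up query can act on the Status and Confidence columns in exactly this way. The staging table name, the cleansed column names and the 0.8 threshold below are assumptions for illustration only.

-- Sketch: accept correct/corrected rows plus high-confidence suggestions...
SELECT Corrected_StateProvinceName
FROM dbo.DQS_CleansedStaging
WHERE Status_StateProvinceName IN ('Correct', 'Corrected')
   OR (Status_StateProvinceName = 'Suggestion' AND Confidence_StateProvinceName >= 0.8);

-- ...and queue everything else for manual review.
SELECT *
FROM dbo.DQS_CleansedStaging
WHERE Status_StateProvinceName NOT IN ('Correct', 'Corrected')
  AND NOT (Status_StateProvinceName = 'Suggestion' AND Confidence_StateProvinceName >= 0.8);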


Runtime

Some things to note about the DQS Cleansing transform:
  • It is implemented as an Asynchronous component
  • It does not expose an Advanced Editor (although this might change for the final release)
  • It sends incoming data to the DQS server for cleansing
    • When using an RDS, this data may be in turn sent to the SQL Azure Data Market provider


Data Quality Services Server/Client in SQL Server 2012

DQS Server is installed as part of SQL Server Setup. It includes databases for storing the knowledge base information and the policies and rules that are applied to business data during cleansing and matching activities. It also includes the DQS engine, which is what applies the information and rules in those databases to your data.

DQS Client is a stand-alone client that can be used by people in different roles to clean data, maintain the knowledge bases or configure DQS Server. You have to have sufficient permissions, either on the SQL Server as a whole or as a specific DQS role, in order to open the DQS Client and connect to the appropriate DQS Server.

DQS Client allows users and administrators to perform key duties: knowledge base management, data quality projects and administration.

Knowledge Base Management

As described in the previous post, the concept of a knowledge base is core to DQS. Within the knowledge base, you have a number of domains. A domain is a representation of a type of data that is used by the business, such as a customer name, an address, or a contact phone number. For each of these domains, there are certain values which can be trusted and certain values that are invalid. The example I used in the previous post was of counties. County would be a domain, with values like Nottinghamshire and Berkshire as trusted values because they are known to be real counties. You can include synonyms, so that the value of Notts is known to be referring to the same thing as Nottinghamshire.

Within the Knowledge Base Management part of the DQS Client, you can create and manage domains, set the trusted values and give examples of invalid values. Each knowledge base contains one or more managed domains.

Defining your knowledge base and all the domains, values and rules within it can be done manually through the DQS Client. It can also be done through the Knowledge Discovery process, which automatically generates rules that can then be approved and included in the knowledge base.

Data Quality Projects

Once you’ve built your knowledge base and created the rules which are applicable to your data, you need to use them. The Data Quality Projects section of the client tool lets you apply your knowledge base for either cleansing of data or data matching.

Cleansing projects apply rules to find data that is incomplete, inaccurate or not in line with your policies. DQS applies the rules from your knowledge base where applicable and can come up with suggestions if there are some values which are not in the domain, but which are similar to domain values. These suggestions allow you to catch typos without having to think of a rule for every single mistake that might possibly be made.

Matching projects pull in the data and look for records which are similar. DQS applies the matching rules defined in the client and, using things like synonyms and similarity of phrases, highlights duplicate records. It can combine records where there is a definite match or highlight ones where there is a possible match, including a confidence score based on the similarity of the records and the specifics of the rules.

Both of these projects then allow you to export the cleaned data so that it can be used by your systems.

Administration

This section of the tool lets you track on-going activities within DQS and update the configuration settings.

A data steward can use this part of the tool to view what processes are in progress within DQS. That person can then choose to stop processes if required.
This is also the place to define the settings for using reference data. The idea of reference data is that you can pull in information from an external source, for example the Azure data marketplace, and use that as a comparison to clean up internal data (e.g. matching addresses to postcodes). You would use the administration part of the DQS Client to configure these connections.
You can also change some settings in this section, for things like logging and notifications.

So you can see that the DQS Client tool is a vital component of a data quality project, because this is where so much of the DQS functionality is driven from.

 Installing Data Quality Services

Data Quality Services is an option in the main install, but when you go to run the Data Quality Client, you’ll get a message stating that DQS is not installed. As far as I can tell, DQS needs to be installed manually, by running the DQSInstaller.exe, which you can find in the SQL Server Binn directory. This will create two SQL Server databases:
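Once DQSInstaller.exe has finished, a quick way to confirm that the new databases exist is to query sys.databases; the DQS% name filter below is an assumption based on the default database names the installer uses.

-- Check for the databases created by DQSInstaller.exe (names assumed to start with DQS).
SELECT name, create_date
FROM sys.databases
WHERE name LIKE 'DQS%';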


Data Quality Client

Once DQS is configured, you’ll be in a position to use the Data Quality Client, which is a Windows application, available in 32-bit or 64-bit versions. Once you connect, you’ll get the following screen:


The knowledge base is the key to how DQS works, being utilised to determine the data quality of your source data. You can create your own knowledge base, using your own data, or even cloud-based data. For this example, I’m going to use the built in knowledge base called DQS Data.
As an example, I’ve created some data that I want to cleanse. It’s AdventureWorks country data that I’ve put into a separate table that I’ve called dbo.Country.
My task now is to clean this data, which I can do by creating a data quality project:


I’ve called my project Countries, and I’ve picked the built-in DQS Data knowledge base, which I noticed contains reference data for countries. The activity that I’ve selected is Cleansing, and then I’ve clicked Create.

DQS will then prompt for mapping the source data to one of the domains in the Knowledge Base. Here I’ve mapped my country name to the Country domain from the knowledge base:


After clicking the Start button on the next screen, the cleaning process starts, which gives the following results:

I put a couple of typos into my source data, such as changing Algeria to ‘Algerian’ and Albania to ‘Albana’. These were picked up by DQS, along with a few others, but a user has the opportunity to approve or reject, via the radio buttons shown above. I chose to approve the first two, and then clicked next. In the final screen, DQS allows you to output the cleaned data to a separate table, as shown below:


If you go into the table that DQS has created, you will see that there is a Name_Status column, which holds the cleaning status on a per record basis:


This can be used to update the source data, if required, and therefore address the data quality issues.
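As a rough sketch of that last step, assume the cleansed output was exported to a table called dbo.Country_Cleansed with Name_Output and Name_Status columns alongside a key that matches the source. All of these names are assumptions; your export settings determine the real ones.

-- Sketch only: push corrected values from the DQS export table back into the source.
UPDATE src
SET src.Name = cln.Name_Output
FROM dbo.Country AS src
JOIN dbo.Country_Cleansed AS cln
    ON src.CountryID = cln.CountryID   -- assumed join key
WHERE cln.Name_Status = 'Corrected';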


Data Quality in SQL Server 2012

Data Quality Services is a new set of functionality in SQL Server 2012, based around the concept of knowledge-driven quality.


The idea behind this concept is that there is a big difference between what’s invalid according to a computer system and what’s invalid according to common sense. A system might have a field for a person’s age and consider a value valid as long as it is given in numerical form as an integer. The system wouldn’t have a problem with an age listed as 222. A human being looking at this would spot it as an anomaly and guess that the person entering the age mistyped 22. Another example could be counties, stored as string values. A human being would know that Nottinghamshire is a valid county but Narnia isn’t. An automated system wouldn’t spot the mistake.

This is where a knowledge-driven solution comes in. The idea is to take these pieces of knowledge that we think of as common sense and store them in such a way that the business applications and databases can check values against a knowledge base. This knowledge base might include a list of counties so you can have the systems check all values entered in the County field of a database and look for anomalies.

You then have various rules for how these anomalies are treated. You might create automatic rules, for example to change the abbreviation Notts to the full Nottinghamshire so that you can have consistent values without needing a human being to make any changes. You might also have people, known as data stewards, who are alerted to suspected anomalies so that they can check whether values need to be fixed.

A data quality project tends to have two elements to it. One is an initial fix to clean up bad data. This is known as a data cleansing project. As the name implies, the end goal is to have a set of clean data. The tools look through the data, transforming values to match a standard, flagging outlying values that might be anomalies and suggesting changes that could be made. It also hunts for possible duplicates through data matching, applying policies to look for entries in the database that might refer to the same thing. For example, the entries for Jess Meats and Jessica Meats might be flagged up as a possible match. After the cleansing and matching, the output is delivered as a set of data that has been cured of as many inaccuracies as possible.

The second part of a data quality project is what happens next to keep the data clean. As with Master Data Management, this isn’t a fix-once act. It’s very easy for data quality issues to creep back in after the cleansing has taken place so an implementation of Data Quality Services needs to bear in mind what should happen next. The processes and policies need to be defined to ensure that the data quality knowledgebase is used in future to maintain the quality of the data. It’s also important to identify the data stewards who will be responsible for fixing any problems the knowledgebase flags.

It’s also important to think of the knowledgebase as an on-going project. Things change. Data changes. The knowledgebase should therefore also change. The set of knowledge and rules within the knowledgebase can grow over time, bringing more control and accuracy to your data. As more data passes through the knowledgebase, it becomes more tuned to picking out anomalies and better at identifying what the correct value should be.

A Data Quality Services project should include both the plan for how to clean the data initially and how to maintain quality moving forward.



Before either of these can start, however, you need to define what you want your data to look like. A common example is a male/female field. This seems like it should be simple, but one system might store this as male or female, another might have m or f, another might have 0 or 1, yet another might have 0 and 1 again, but with the digits referring to the opposite gender. If you’re merging systems, moving data between systems or pulling together data from multiple systems for reporting, you can have chaos. A key part of a data quality project is working out what you want to be the correct value. Once you’ve done that, you can start applying the rules to change the other values so you end up with consistency across your whole data set.
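As a trivial sketch of that kind of agreed standard, the different encodings could be collapsed into one value during a load. The table, column and code mappings below are purely illustrative, and, as noted above, the meaning of 0 and 1 has to be agreed per source system.

-- Illustrative only: normalise several gender encodings to a single standard value.
SELECT CustomerID,
       CASE
           WHEN UPPER(LTRIM(RTRIM(Gender))) IN ('M', 'MALE', '0')   THEN 'Male'
           WHEN UPPER(LTRIM(RTRIM(Gender))) IN ('F', 'FEMALE', '1') THEN 'Female'
           ELSE 'Unknown'
       END AS GenderStandardised
FROM dbo.SourceCustomers;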

So when you’re starting to work with Data Quality Services, first take a look at your existing data and decide what you’d like it to look like. Then you can do your data cleansing and data matching to give yourself a clean and accurate set of data to start with. Then you need to hook your knowledgebase into your processes to ensure data quality moving forward.


PowerShell in SQL Server 2012

PowerShell is the latest scripting language designed for Windows administrators, system administrators and database administrators. SQL Server 2012 allows administrators to perform management and daily tasks via PowerShell, and they can also script SQL Server 2012 tasks with it. Admins can easily build complicated scripts with PowerShell, since it allows for more robust logic than Transact-SQL.

The SQL Server 2012 PowerShell module loads a set of cmdlets for administering SQL Server 2012. This add-in, “sqlps”, enables your PowerShell console to run SQL Server-specific cmdlets, run PowerShell scripts and manage SQL Server objects. To load the SQL Server 2012 PowerShell add-in / module:

  1. Open the PowerShell console.
  2. Run “Set-ExecutionPolicy Unrestricted”
  3. Run “Import-Module sqlps”
If you wish to use SQL Server Management Objects (SMO) assemblies, you need to load them manually. Use the PowerShell script below to load them.

# Locate the directory where the sqlps provider files are installed.
$sqlpsreg = "HKLM:\SOFTWARE\Microsoft\PowerShell\1\ShellIds\Microsoft.SqlServer.Management.PowerShell.sqlps"
$item = Get-ItemProperty $sqlpsreg
$powershelldir = [System.IO.Path]::GetDirectoryName($item.Path)

# SMO and related assemblies to load.
$smoarray =
"Microsoft.SqlServer.Management.Common",
"Microsoft.SqlServer.Dmf",
"Microsoft.SqlServer.Smo",
"Microsoft.SqlServer.Instapi",
"Microsoft.SqlServer.SqlWmiManagement",
"Microsoft.SqlServer.SmoExtended",
"Microsoft.SqlServer.ConnectionInfo",
"Microsoft.SqlServer.SqlTDiagM",
"Microsoft.SqlServer.SString",
"Microsoft.SqlServer.Management.Sdk.Sfc",
"Microsoft.SqlServer.SqlEnum",
"Microsoft.SqlServer.RegSvrEnum",
"Microsoft.SqlServer.WmiEnum",
"Microsoft.SqlServer.Management.DacEnum",
"Microsoft.SqlServer.ServiceBrokerEnum",
"Microsoft.SqlServer.Management.Collector",
"Microsoft.SqlServer.Management.CollectorEnum",
"Microsoft.SqlServer.ConnectionInfoExtended",
"Microsoft.SqlServer.Management.RegisteredServers",
"Microsoft.SqlServer.Management.Dac",
"Microsoft.SqlServer.Management.Utility"

# Load each assembly by its partial name.
foreach ($assemblyname in $smoarray)
{
    [Reflection.Assembly]::LoadWithPartialName($assemblyname) | Out-Null
}

# Register the SQL Server provider formatting data, then return to the original location.
Push-Location
cd $powershelldir
Update-FormatData -PrependPath SQLProvider.Format.ps1xml
Pop-Location


Now that all the prep work is done, you can use Invoke-Sqlcmd -Query "select * from table" to run your SQL queries. For a full list of commands, run Get-Command.
Let’s start talking about interactions between Management Studio and PowerShell. Note that you can run PowerShell directly from SQL Server 2012 Management Studio. This is done by simply right-clicking the database of your choice and choosing “Start PowerShell”, which sets the selected database as the default path. When you run PowerShell directly from Management Studio, the “sqlps” module is loaded automatically and there is no need to perform the prep work described earlier. That prep work is only needed when running SQL commands from a plain PowerShell console or running standalone SQL Server PowerShell scripts.

SQL Server 2012 Agent and PowerShell

You can schedule SQL Agent tasks using PowerShell scripts by including the .ps1 script in your jobs. Basically you are launching PowerShell, loading the sqlps module and running the SQL cmdlets. To do this:
Open SQL Server Management Studio.
  1. Expand the “SQL Server Agent” node.
  2. Create a new job, open its properties, go to “Steps”, then choose “New”.
  3. Enter the name of the job step and choose “PowerShell” from the Type list.
  4. In the Command pane, paste the PowerShell script content.
  5. Follow the on-screen instructions.

Working with SQL Management Object Assemblies

SQL Server PowerShell lets you browse SQL Server objects much as you would browse the file system: objects can be located by path, and actions can be performed using their methods. Below is a list of paths:

SQLSERVER:\SQL - Database objects, such as tables, views, and stored procedures.

SQLSERVER:\SQLPolicy - Policy-based management objects, such as policies and facets.

SQLSERVER:\SQLRegistration - Registered server objects, such as server groups and registered servers.

SQLSERVER:\Utility - Utility objects, such as managed instances of the Database Engine.

SQLSERVER:\DAC - Data-tier application objects, such as DAC packages, and operations such as deploying a DAC.

SQLSERVER:\DataCollection - Data collector objects, such as collection sets and configuration stores.


    Tuesday 26 June 2012

    Column Store Index in SQL Server 2012

    A Column Store Index stores each column in its own separate SQL Server pages. As opposed to a row store index, where a collection of rows resides in a page, in a Column Store Index each column is allocated its own pages.

    Column Store Indexes are designed for Business-Intelligence (BI) queries in Online Analytical Processing (OLAP) environment where you have to play with a huge volume of data. In such BI requests, data often needs to be queried as filtered columns from a data warehouse.

    The question now arises what are the benefits of allocating a separate Page to each column? Here are the benefits:
    (1) Only required pages can be fetched from the disk resulting in improved performance of query processing. As each column can be accessed independently, IO is reduced.
    (2) Column Store Index technology makes use of compression algorithm to eliminate redundancy in column values.
    (3) Frequently accessed columns remain in memory.
    The Column Store Index in SQL Server employs Microsoft’s patented Vertipaq™ technology, which it shares with SQL Server Analysis Services and PowerPivot. SQL Server columnstore indexes don’t have to fit in main memory, but they can effectively use as much memory as is available on the server. Portions of columns are moved in and out of memory on demand.

    Limitations:


    • At this point in time, Column Store Indexes are read-only in nature, so DML statements are not possible against the indexed table; this rules out a regular OLTP environment.
    • Computed columns cannot be part of a Column Store Index.
    • Column Store Index cannot be created on indexed views.

    Creating a Column Store Index

    CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_CustomerOrders_ColumnStore] ON [CustomerOrders] (TransactionDate, StockQty, ItemID)

    Performance Test

    AdventureWorks sample database is used for performing tests.

    --Create the Test Table
    USE [AdventureWorks2008R2]
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[Test_Person](
     [BusinessEntityID] [int] NOT NULL,
     [PersonType] [nchar](2) NOT NULL,
     [NameStyle] [dbo].[NameStyle] NOT NULL,
     [Title] [nvarchar](8) NULL,
     [FirstName] [dbo].[Name] NOT NULL,
     [MiddleName] [dbo].[Name] NULL,
     [LastName] [dbo].[Name] NOT NULL,
     [Suffix] [nvarchar](10) NULL,
     [EmailPromotion] [int] NOT NULL,
     [AdditionalContactInfo] [xml](CONTENT [Person].[AdditionalContactInfoSchemaCollection]) NULL,
     [Demographics] [xml](CONTENT [Person].[IndividualSurveySchemaCollection]) NULL,
     [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
     [ModifiedDate] [datetime] NOT NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO
    -- Populate this table with the data stored in the table Person.Person.
    -- As we need plenty of data, we run the INSERT below 100 times.
    INSERT INTO [dbo].[Test_Person] 
    SELECT P1.*
    FROM Person.Person P1
    GO 100
    -- At this point we have 1,997,200 rows in the table.
    -- Create a Clustered Index on column [BusinessEntityID]
    CREATE CLUSTERED INDEX [CL_Test_Person] ON [dbo].[Test_Person]
    ( [BusinessEntityID])
    GO
    -- Creating a Non-Clustered Index on 3 columns
    -- (named differently from the ColumnStore index below, since index names must be unique per table)
    CREATE NONCLUSTERED INDEX [NonClustered_Test_Person]
    ON [dbo].[Test_Person]
    ([FirstName], [MiddleName], [LastName])
    
    -- Creating a Non-Clustered ColumnStore Index on the same 3 columns
    CREATE NONCLUSTERED COLUMNSTORE INDEX [ColumnStore__Test_Person]
    ON [dbo].[Test_Person]
    ([FirstName] , [MiddleName],[LastName])
    

    At this point we have created the ColumnStore Index on our test table. Now we will run the SELECT query with and without the ColumnStore Index and analyze performance.  

    Query Without ColumnStore Index

    select [LastName],Count([FirstName]),Count([MiddleName])
    from dbo.Test_Person 
    group by [LastName]
    Order by [LastName]
    OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

    We have used the OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) query hint so that the ColumnStore Index is not used this time.

    Query With ColumnStore Index

    select [LastName],Count([FirstName]),Count([MiddleName])
    from dbo.Test_Person 
    group by [LastName]
    Order by [LastName]
    


    Here are the Actual Execution Plans for both queries:



    We can see that the query cost when using the NonClustered Index is 59%, while with the ColumnStore Index it is 13%.
    Now if we hover the mouse over the Index Scans we can see details for these operations. Below is a comparison:


    It is clear from the results that the query performs much faster after creating the ColumnStore Index, as each column the query needs is stored in its own pages and the query does not have to read whole rows from every page to retrieve those columns.

    Performing INSERT, DELETE or UPDATE Operations

    We cannot perform DML (INSERT/UPDATE/DELETE) operations on a table that has a ColumnStore Index, because the index puts the data into a read-only mode. This makes the feature a natural fit for a data warehouse, where most operations are read-only.
    For example, if you perform a DELETE operation on a table with a ColumnStore Index you will get this error:

    Msg 35330, Level 15, State 1, Line 1
    DELETE statement failed because data cannot be updated
    in a table with a columnstore index. Consider disabling the
    columnstore index before issuing the DELETE statement,
    then rebuilding the columnstore index after DELETE is complete.

    However, to perform the operation we would need to disable the ColumnStore Index before issuing the command as shown below:
     
    ALTER INDEX [IndexName] ON [TableName] DISABLE
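    Putting that together with the test table used earlier, the full round trip is a disable, the DML statement, and then a rebuild. This is only a sketch; rebuilding the columnstore index on a large table can take a significant amount of time.

    -- Disable the columnstore index, run the DML, then rebuild the index.
    ALTER INDEX [ColumnStore__Test_Person] ON [dbo].[Test_Person] DISABLE;

    DELETE FROM [dbo].[Test_Person]
    WHERE [BusinessEntityID] = 1;

    ALTER INDEX [ColumnStore__Test_Person] ON [dbo].[Test_Person] REBUILD;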


    Creating a ColumnStore Index using Management Studio

    Right click and select New Index and select Non-Clustered Columnstore Index...


    Click add to add the columns for the index.



    After selecting the columns click OK to create the index.



    Monday 25 June 2012

    Contained Database in SQL Server 2012

    Contained Databases are an oft-requested feature which has finally arrived in SQL Server 2012. In prior SQL Server versions, databases were not 100% portable. If you move a database from one server to another using backup and restore, the data in the database will of course be preserved; however, to work with the database you need several more objects, such as logins, and because these objects live outside the database they are not included in any backup.

    With Contained databases, there is only a small functional dependency on SQL Server Instance or in other words, the database is self-contained and all the objects will reside in the database itself.

    There are three containment types:

    NONE – The default containment type; this is what we are used to in SQL Server 2008 R2 and prior versions.
    PARTIAL – Partially contained databases provide some isolation from the instance of SQL Server, but not full containment.
    FULL – Not currently available in SQL Server 2012; in FULL mode, users would not be permitted to cross the database boundary.


    Configuring Contained Databases

    Firstly, you will need to enable the Contained Database feature at the server level, either from the user interface or from a script.

    sp_configure 'show advanced options',1
    GO
    RECONFIGURE WITH OVERRIDE
    GO
    sp_configure 'contained database authentication', 1
    GO
    RECONFIGURE WITH OVERRIDE
    GO


    Alternatively you can use the SSMS user interface:
    Right click the server, select Properties > Advanced and set Enabled Contained Databases to True.






    Next you will need to configure containment at the database level. To do this using code:

    CREATE DATABASE [ContainedDatabase]
    CONTAINMENT = PARTIAL
    ON PRIMARY
    ( NAME = N'ContainedDatabase', FILENAME = N'C:\myPath\ContainedDatabase.mdf')
    LOG ON
    ( NAME = N'ContainedDatabase_log', FILENAME = N'C:\myPath\ContainedDatabase_log.ldf')
    GO
    As you can see in the above script, the CREATE DATABASE syntax is tweaked to enable the contained database feature.
    You can use the SSMS user interface to enable contained database at the database level:


    In SQL Server 2012 you will see only two options, i.e. None and Partial; we may see the Full containment type in a future release.
    The new Contained Databases feature is reflected in sys.databases but not in sys.sysdatabases:

    SELECT Name,containment
    FROM sys.databases
    As you can see in the above image, a new column is included to identify whether or not a database is a contained database.
    You can change the containment to PARTIAL or to NONE by executing the following script.

    USE [master]
    GO
    ALTER DATABASE [ContainedDatabase]
    SET CONTAINMENT = PARTIAL WITH NO_WAIT
    GO
    Let us see how user logins work with Contained Databases.
    If you are creating a user in any version prior to SQL Server 2012, this is the syntax you need to use.

    USE [master]
    GO
    CREATE LOGIN [None_Contained_Login]
    WITH PASSWORD=N'Qwerty123',
    DEFAULT_DATABASE=[master]
    GO
    USE [AdventureWorks2008R2]
    GO
    CREATE USER [None_Contained_Login]
    FOR LOGIN [None_Contained_Login]
    GO
    This script will create a login with a password and then create a user mapped to the previously created login.
    In contained databases, this is how you create a user:

    USE [ContainedDatabase]
    GO
    CREATE USER [Contained_User]
    WITH PASSWORD=N'Qwerty123'
    GO
    Note the two differences:
    • Unlike standard (pre SQL Server 2012) user creation, there is only one CREATE statement: a CREATE USER statement and no CREATE LOGIN statement.
    • The password is assigned to the USER and not to the LOGIN, unlike in standard user creation.
    The above query cannot be run against a ‘None’ contained (standard) database. If you attempt to run it there, you will trigger the following error.

    Msg 33233, Level 16, State 1, Line 1
    You can only create a user with a password in a contained database.

    However, you can still create a standard user in a contained database, since the PARTIAL containment type is used (note that this will fail under the FULL containment type in the future).
    For UI lovers, right-click Users under the Security node of the contained database and select New User…, which takes you to the following screen:


    In the above screen, you will see that the user type is “SQL user with password”, which is what the contained database requires.

    When you log in as a contained user, make sure you enter the correct database as the default database. In previous SQL Server versions, logins are unique across the server instance; however, the same contained user name can exist in more than one database. Therefore, it is essential to enter the database name when logging in, as shown below.


    After logging into the server, you will see the following in SQL Server Management Studio.



    This means you can only interact with the relevant database and its objects.
    After creating containment-specific objects, like the contained user we created, you cannot revert the database back to a ‘None’ contained database. Note the error below, which will be encountered if you attempt to do so:

    Msg 33233, Level 16, State 1, Line 1
    You can only create a user with a password in a contained database.
    Msg 5069, Level 16, State 1, Line 1
    ALTER DATABASE statement failed.

    Apart from users, temporary tables are also created within the database. In previous editions, if your database collation is different from the server collation, there can be issues with temporary tables: the temporary table takes the server collation, so joining user tables and temporary tables results in the following error:

    Cannot resolve the collation
    conflict between "SQL_Latin1_General_CP1_CI_AS" and
    "Danish_Norwegian_CI_AI" in the equal to operation.

    With contained databases, all objects are in the database itself; even temporary tables take the database collation, so you won’t have such issues.
    There is a new DMV named sys.dm_db_uncontained_entities which shows any uncontained objects in the database. If sys.dm_db_uncontained_entities returns no rows, your database does not use any uncontained entities.
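    A quick way to check is to run the DMV in the database in question; an empty result set means nothing uncontained was found.

    -- List any uncontained entities in the current database.
    USE [ContainedDatabase]
    GO
    SELECT * FROM sys.dm_db_uncontained_entities
    GO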

    Migrating Users

    You also have the option of converting a non-contained database to a partially contained database. However, after converting the database, you need to migrate its users manually, since existing database users are not migrated automatically.


    The following query will give you existing users in the database.


    SELECT
    name, type_desc, authentication_type, authentication_type_desc
    FROM sys.database_principals





    Note that, though the database has been changed to contained, the users are still at the INSTANCE (server) level. These users can be converted with the following query.


    EXECUTE sp_migrate_user_to_contained
    @username = N'Dev1',
    @rename = N'keep_name',
    @disablelogin = N'disable_login'

    You can also rename the existing user as well as disable the existing login.
    After running the above query, if you re-run the earlier query against sys.database_principals, you will see that the Dev1 user has been changed to the DATABASE level, i.e. it is now a contained user.







    Sunday 24 June 2012

    What’s new in SQL Server 2012

    What’s new in SQL Server 2012

    Business intelligence (BI) is critically important to organizations both large and small. In the latest release of SQL Server, there is no shortage of BI enhancements in addition to many other enhancements. This article will give you a preview of the requirements, new features and enhancements in SQL Server 2012 (code named Denali) including:

    Always On

    A new feature called Availability Groups that is an improved version of database mirroring

    Improved failover clustering which supports multi-site clustering across subnets, able to fail over across data centres.

    ColumnStore Index

    A new type of index for data warehouses. This is actually pretty simple: the name says it all. Here is Microsoft’s illustration:


    A columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored.

    Why do this? Because it is more efficient when the query only requests a few columns from the table. Microsoft claims performance improvements from 6X to 100X in cases where the data can be cached in RAM, and thousand-fold improvements where the working set does not fit in RAM.

    SQL Server Data Tools

    This is my favourite feature, probably because it is developer-focused. These are the tools that were code-named “Juneau” and which install into Visual Studio 2010. There are some visual tools, but this is essentially a code-centric approach to database design, where you design your database with all its tables, queries, triggers, stored procedures and so on. You can then build it and test it against a private “localdb” instance of SQL Server. What I like is that the database project includes the entire design of your database in a form that can be checked into source control and compared against other schema versions. Here is the Add New Item dialog for a database project:


    Data Quality Services

    Data Quality Services (DQS) lets you check your data against a Data Quality Knowledge Base (DQKB), the contents of which are specific to the type of data in the database and may be created and maintained by your business or obtained from a third-party. If your data includes addresses, for example, the DQKB might have all valid city names to prevent errors. Features of DQS include data cleansing, de-duplication through data matching, profiling a database for quality, and monitoring data quality.


    Updated SQL Server Management Studio

    SQL Server Management Studio now runs in the Visual Studio 2010 shell.

    LocalDB

    LocalDB is a local instance of SQL Server aimed at developers and for use as an embedded database in single-user applications. It is a variant of SQL Server Express, but different in that it does not run as a service. Rather, the LocalDB process is started on demand by the SQL native client and closed down when there are no more connections. You can attach database files at runtime by using AttachDBFileName in the connection string. LocalDB is intended to replace user instances which are now deprecated.
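    For example, a .NET connection string for LocalDB might look something like the one below; (localdb)\v11.0 is the default automatic instance name in SQL Server 2012, and the file path is just a placeholder.

    Server=(localdb)\v11.0;Integrated Security=true;AttachDbFileName=C:\MyData\MyDatabase.mdf;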

    FileTables

    The FileTable feature brings support for the Windows file namespace and compatibility with Windows applications to the file data stored in SQL Server … In other words, you can store files and documents in special tables in SQL Server called FileTables, but access them from Windows applications as if they were stored in the file system, without making any changes to your client applications.
    Enterprises can move this data from file servers into FileTables to take advantage of integrated administration and services provided by SQL Server. At the same time, they can maintain Windows application compatibility for their existing Windows applications that see this data as files in the file system.
    Integration of the file system and the database is not a new idea, and Microsoft has tried variants before, such as the “M” drive that was once part of Exchange, the aborted WinFS feature planned for Windows Longhorn (Vista), and SharePoint, which can store documents in SQL Server while presenting them as Windows file shares through WebDAV.
    That said, FileTables in SQL Server 2012 are not an attempt to reinvent the file system, but presented more as a way of supporting legacy applications while managing data in SQL Server. It is an interesting feature though, and it would not surprise me if users find some unexpected ways to exploit it.
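    As a rough T-SQL sketch, creating a FileTable looks like the statement below. It assumes the database already has FILESTREAM enabled, a FILESTREAM filegroup and a non-transacted access level configured, and the table and directory names are just examples.

    -- Sketch: create a FileTable in a FILESTREAM-enabled database.
    CREATE TABLE dbo.DocumentStore AS FileTable
    WITH
    (
        FileTable_Directory = 'DocumentStore',
        FileTable_Collate_Filename = database_default
    );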

    Power View

    Codenamed “Project Crescent”, this is a web-based reporting client for businesses that have embraced Microsoft’s platform, because it has several key dependencies:
    • SharePoint Server Enterprise Edition
    • SQL Server Reporting Services
    • Silverlight on the client
    Power View reports that I have seen do look good, and have an Office ribbon style designer for designing and customising the report. That said, I would guess that Microsoft now wishes it had used HTML 5 rather than Silverlight for this – there are those Apple iPad and Windows 8 Metro users to think of, after all.
    Microsoft emphasises that Power View is not a replacement for Report Designer or Report Builder, but an ad-hoc reporting tool.

    Hardware and Software Requirements

    Microsoft recommends using the NTFS file system instead of FAT32. FAT32 will work, but you should probably not use it.

    You can't install SQL Server 2012 (code-named Denali) on mapped drives or compressed drives.

    You have to have the "no-reboot" package installed prior to installing SQL Server 2012 (code-named Denali). This is included in Windows 7 SP1 and Windows Server 2008 R2. Otherwise, you can download the no-reboot package from Microsoft.

    SQL Server 2012 (code-named Denali) requires the .NET Framework 4.0.
    Virtualization is supported using Microsoft's Hyper-V technology.
    You will need at least 3.6 GB of free disk space.
    Microsoft recommends that you do not install SQL Server 2012 (code-named Denali) on a domain controller.
    Recommended Processors & RAM
    64-bit version: AMD Opteron, AMD Athlon 64, Intel Xeon with Intel EM64T support, or Intel Pentium IV with EM64T support, running at 2.0 GHz or faster. Recommended RAM is the maximum the operating system supports, or at least 2 GB.
    32-bit version: Pentium III or compatible running at 2.0 GHz or faster. Recommended RAM is the maximum the operating system supports, or at least 2 GB.
    Windows PowerShell 2.0 is a pre-requisite for installing SQL Server 2012 (code-named Denali). You can get this software from the Windows Management Framework page.

    Multi-Subnet Failover Clustering

    With SQL Server 2012 (code-named Denali), you can configure SQL Server where failover cluster nodes can be connected to a completely different subnet. The subnets can be spread out to different geographical locations providing disaster recovery along with high availability. In order for this to work correctly, you will need to replicate the data across the databases involved in this configuration. The SQL Server failover cluster is dependent on the Windows Server failover cluster so this has to be set up first. Keep in mind that all of the subnets involved in this configuration must be in the same Active Directory domain.

    Programming Enhancements


    Sequences: Sequences have been requested by the SQL Server community for years, and they are included in this release. A sequence is a user-defined object that generates a series of numbers. Here is an example using a sequence.

    /****** Create Sequence Object ******/

    CREATE SEQUENCE MySequence

    START WITH 1
    INCREMENT BY 1;

    /****** Create Temp Table ******/
    DECLARE @Person TABLE
    (
    ID int NOT NULL PRIMARY KEY,
    FullName nvarchar(100) NOT NULL
    );

    /****** Insert Some Data ******/
    INSERT @Person (ID, FullName)
    VALUES (NEXT VALUE FOR MySequence, 'Jim Johnson'),
    (NEXT VALUE FOR MySequence, 'Bob Thompson'),
    (NEXT VALUE FOR MySequence, 'Tim Perdue');

    /****** Show the Data ******/
    SELECT * FROM @Person;

    The results would look like this:

    ID FullName
    1 Jim Johnson
    2 Bob Thompson
    3 Tim Perdue

    Ad-Hoc Query Paging

    Paging results in SQL Server has been discussed for years. The ORDER BY option in the SQL SELECT statement has been enhanced in SQL Server 2012: using a combination of OFFSET and FETCH along with ORDER BY gives you control over paging through a result set. Using this technique can really help performance by bringing back only the results you want to show your users, when they are needed. The following T-SQL code runs against the Person table in the AdventureWorks sample database (available from Microsoft). In the sample query below, SQL Server would return 10 records beginning with record 11. The OFFSET clause provides a starting point for the SELECT statement in terms of paging, and the FETCH clause specifies how many records to return at a time.

    SELECT BusinessEntityID, FirstName, LastName
    FROM Person.Person
    ORDER BY BusinessEntityID
    OFFSET 10 ROWS
    FETCH NEXT 10 ROWS ONLY;

    Full Text Search

    The Full Text Search in SQL Server 2012 has been enhanced to allow you to search and index data stored in extended properties or metadata. Consider a PDF document that has "properties" filled in, like Name, Type, Folder path, Size, Date Created, etc. In the newest release of SQL Server, this data can be indexed and searched along with the data in the document itself. The properties do have to be exposed for this to work, but it's possible now.
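    Roughly, the moving parts are a search property list attached to the full-text index and the PROPERTY clause of CONTAINS. The table, column and list names below are illustrative, and the GUID/ID pair shown is the standard identifier for the Title property; verify it for the property you actually need.

    -- Sketch: expose the Title document property to full-text search, then query it.
    CREATE SEARCH PROPERTY LIST DocumentPropertyList;
    GO
    ALTER SEARCH PROPERTY LIST DocumentPropertyList
    ADD 'Title'
    WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9', PROPERTY_INT_ID = 2);
    GO
    -- Assumes a full-text index already exists on dbo.Documents(DocumentContent).
    ALTER FULLTEXT INDEX ON dbo.Documents SET SEARCH PROPERTY LIST DocumentPropertyList;
    GO
    SELECT DocumentId
    FROM dbo.Documents
    WHERE CONTAINS(PROPERTY(DocumentContent, 'Title'), 'Denali');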

    BI and Web Development Environment Improvements

    Microsoft moved BI (Business Intelligence) closer to the end user with SQL Server 2008 R2. The Excel PowerPivot tool helps users by providing a self-service reporting model.

    The good news is PowerPivot is being enhanced in SQL Server 2012 (code-named Denali). Microsoft is adding KPIs and drill through, which will be really useful for all users.

    Analysis Services will include a new BI Semantic Model (BISM). BISM is a 3-layer model that includes:
    • Data Model
    • Business Logic
    • Data Access
    BISM will enhance Microsoft's front-end analysis experience, including Excel, Reporting Services and SharePoint Insights. Microsoft has said that BISM is not a replacement for the current BI models but more of an alternative model. In simple terms, BISM is a relational model that includes BI artifacts such as KPIs and hierarchies.