SQLServerBrains: SQL Server 2012

Showing posts with label SQL Server 2012. Show all posts

Thursday, 2 August 2012

Microsoft SQL Server 2012 Takes on Big Data with Hadoop

At PASS Summit 2011 in Seattle, one of the biggest surprise announcements was Microsoft's support for Apache Hadoop as a part of its SQL Server 2012 announcements.

To understand why Hadoop is important and how it relates to SQL Server, we need to get an idea of what Hadoop actually is and what it isn’t. First, Hadoop isn’t a relational database system, so it’s not a replacement or substitute for SQL Server. Hadoop is an open-source project that’s managed by the Apache Software Foundation. It was designed to solve a somewhat different problem—the problem of handling large amounts of unstructured data. SQL Server and other relational databases primarily store structured data. Data can be stored by using the XML and FileStream data types, but there can be limitations to the size, as well as the amount of processing power that can be applied to access the data. The basic technology behind Hadoop was originally developed by Google so that it could index all types of textual information. Google’s ideas were then incorporated into an open-source project named Nutch and later Yahoo! worked to transform Hadoop into an enterprise application. Hadoop is used by several notable companies, perhaps the most recognizable company is Facebook. In 2010, Facebook had the largest Hadoop cluster in the world, with more than 20PB of storage.

Hadoop is written in Java and runs on a collection of commodity shared-nothing servers. You can add or remove servers from a Hadoop cluster at anytime without disrupting the service. The more servers you use, the more computing power you get. A Hadoop implementation consists of two key components: the Hadoop Distributed File System (HDFS), which provides data storage across multiple servers, and high-performance parallel data processing, which uses a technique called MapReduce. MapReduce essentially splits up data discovery and indexing tasks by sending different parts to all of the servers in your cluster. Each server works on its own piece of the data. The results are then delivered back to the user as a complete set. In essence, MapReduce maps the operation out to all of the servers in the cluster and reduces the results into a single result set.

To implement Hadoop you can buy a collection of commodity servers and run the Hadoop software on each server to create a high-performance Hadoop cluster. For better scalability, you can add more servers. When you load all your data into Hadoop, the software breaks the data into pieces and distributes it across all the available servers. There's no central location in which you access your data. The Hadoop cluster keeps track of where the data resides and automatically stores multiple copies of the data. If a server fails or is removed from the cluster, Hadoop automatically replicates the data from a known copy.

Being an open-source product you might wonder what Hadoop has to do with Windows. At PASS Summit 2011, Microsoft announced that the company had created a Windows version of Hadoop that's able to run on Windows Server for on-premises implementations or on Windows Azure for cloud implementations. In addition, Microsoft is working with HortonWorks to develop bi-directional connectors for Hadoop and SQL Server. The SQL Server connector for Apache Hadoop lets customers move large volumes of data between Hadoop and SQL Server 2008 R2 or SQL Server 2012. There will also be a SQL Server Parallel Data Warehouse (PDW) connector for Hadoop that transfers data between Hadoop and SQL Server PDW. These new connectors will enable customers to work with both structured SQL Server data and unstructured data from Hadoop.

Hadoop isn’t a replacement for SQL Server’s relational database. Instead, it provides new capabilities that weren’t previously unavailable. I think Microsoft’s view is that Hadoop will be used in conjunction with SQL Server’s relational and analytic capabilities to enable enterprises to deploy Hadoop implementations alongside their exiting IT systems. This will extend the types of data that you can use in your applications similarly to how SQL Server Analysis Services (SSAS) does with the SQL Server relational database engine. In addition, it'll help SQL Server better compete with both Oracle and IBM’s DB2, which have also embraced Hadoop. Big data is a rapidly growing a trend and the ability to incorporate big data with SQL Server is a big deal. Click here for more information about Microsoft’s implementation of Hadoop and how it relates to SQL Server.

Sunday, 22 July 2012

Fulltext Search in SQL Server 2012

Fulltext Search codebase has been significantly revamped to address both query performance and throughput on large scale (millions of documents) with concurrent updates. With SQL2008 they moved all the index storage in the database file and majority of population logic in to the core engine to make fulltext search an integral and fully manageable engine component. However there was work to be done to make it perform and scale against best of the fulltext engines out there in the industry. With Denali CTP1, they are pleased to deliver this improvement.

they looked at the entire code base from how queries block while waiting an ongoing index update to release a shared schema lock, from how much memory is allocated during index fragment population, to how they could reorganize the query code base as a streaming Table Value Function to optimize for TOP N search queries, how they could maintain key distribution histograms to execute search on parallel threads, all the way to how they could take better advantage of the processor compute instructions (scoring ranks for example)… End result is that they are able to significantly boost performance (10X in many cases when it comes to concurrent index updates with large query workloads) and scale without having to change any storage structures or existing API surface. All our customers going from SQL 2008 / R2 to Denali will benefit with this improvement.

Besides performance and scale improvement, they also added support for property scoped searches over documents with file system properties stored with-in a fulltext enabled table. One can now issue a CONTAINS query looking for all documents containing a particular term and authored by a particular author without having to maintain a separate column for the Author name in the database.

they also improved NEAR operator in the CONTAINS predicate to allow users to specify distance between two terms and if the order of the term matters. It is important to note that distances between two words in a single sentence are much smaller compared to the same across two sentences (even if words are placed next to each other with a period in between), or across paragraphs, or across bullet points or across spreadsheet columns or worksheets.

Monday, 25 June 2012

Contained Database in SQL Server 2012

Contained Databases have been an oft requested feature which has finally arrived in SQL Server 2012. In prior SQL Server versions, database were not 100% portable. In case you need to move a database from one server to another using backup and restore, the data in the database will of course be preserved. However, to work with the database, you need several more objects such as logins etc. However, these objects are outside the database and so would not be included in any backup.

With Contained databases, there is only a small functional dependency on SQL Server Instance or in other words, the database is self-contained and all the objects will reside in the database itself.

There are three Contained types.

<><> <><> <><> <><> <><>

Contained Type	Description
NONE	Default contained type is what we are used to have in SQL Server 2008 R2 and prior versions.
PARTIAL	Partially Contained Databases provides some isolation from the instance of SQL Server but not full containment.
FULL	(This is currently not available in SQL Server 2012 but in FULL mode, users will not be not permitted to cross the database)

Configuring Contained Databases

Firstly, you will need to enable the Contained Database feature at the aerver level either from the user interface or from a script.

sp_configure 'show advanced options',1
GO
RECONFIGURE WITH OVERRIDE
GO
sp_configure 'contained database authentication', 1
GO
RECONFIGURE WITH OVERRIDE
GO

Alternatively you can use the SSMS user interface:
Right click the server, select Properties > Advanced and set Enabled Contained Databases to True.

Next you will need to configure Contained Database at the database level, to do this from using code:

CREATE DATABASE [ContainedDatabase]
CONTAINMENT = PARTIAL
ON PRIMARY
( NAME = N'ContainedDatabase', FILENAME = N'C:\myPath\ContainedDatabase.mdf')
LOG ON
( NAME = N'ContainedDatabase_log', FILENAME = N'C:\myPath\ContainedDatabase_log.ldf')
GO

As you can see in the above script, the CREATE DATABASE syntax is tweaked to enable the contained database feature.
You can use the SSMS user interface to enable contained database at the database level:

In SQL Server 2012, you will see only two options i.e. None and Partial however
we may see the Full contained type in future releases of SQL Server 2012.
The new Contained Databases feature will be reflected in the sys.databases but not in sys.sysdatabases:

SELECT Name,containment
FROM sys.databases

As you can see in the above image, the new column is included to identify whether the database is contained database or not.
You can change the containment to PARTIAL or to NONE by executing following script.

USE [master]
GO
ALTER DATABASE [ContainedDatabase]
SET CONTAINMENT = PARTIAL WITH NO_WAIT
GO
Let us see how user login works with Contained Databases.
If you are creating a user in any of SQL Server 2012 prior versions, this is the syntax you need to use.

USE [master]
GO
CREATE LOGIN [None_Contained_Login]
WITH PASSWORD=N'Qwerty123',
DEFAULT_DATABASE=[master]
GO
USE [AdventureWorks2008R2]
GO
CREATE USER [None_Contained_Login]
FOR LOGIN [None_Contained_Login]
GO
This script will create a login with s password and then create a user by adding the previously created login.
In Contained databases this is how you create a user.

USE [ContainedDatabase]
GO
CREATE USER [Contained_User]
WITH PASSWORD=N'Qwerty123'
GO
Note the two two differences:

Unlike in the standard (pre SQL Server 2012) user creation, you have only one create statement. In this, you will have only a CREATE USER statement and no CREATE LOGIN statement.

In SQL server 2012 you assign the password to the USER and not to the LOGIN unlike in the standard user creation.

The above query cannot be run on ‘None’ contained or standard databases. If you attempt to run this statement you will trigger the following error.

Msg 33233, Level 16, State 1, Line 1
You can only create a user with a password in a contained database.

However, you can create a standard user in contained databases since you have used PARTIAL contained database type (Note that in future this will fail in the FULL contained database type).
For UI lovers, right click the users under the Security node in the Contained database and select New User… where you will be taken to the following screen

In the above screen, you will see that the user type is a SQL user with password which is what contained database type requires.

When you login as a contained user, make sure you enter the correct database as the default database. In previous SQL Server versions, logins are unique for the server instance. However, one contained user name can be in another database. Therefore, it is essential to enter database name at the login as shown below.

After login into the server, you will see following in SQL Server Management Studio.

Which means you can only interact with the relevant database and its objects.
After creating contained specific objects like the contained user that we created, you cannot be revert the database back to a ‘None’ contained database. Note the below error which will be encountered if you attempt to create a user using the standard method:

Msg 33233, Level 16, State 1, Line 1
You can only create a user with a password in a contained database.
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.

Apart from users, tables or temporary tables will be created within the database. In previous editions, if your database collation is different from server collation, there will be some issues with temporary tables. The temporary table collation is a server collation therefore joining user tables and temporary tables will result following error:

Cannot resolve the collation
conflict between "SQL_Latin1_General_CP1_CI_AS" and
"Danish_Norwegian_CI_AI" in the equal to operation.

With contained databases, all objects are in the database itself, even temporary tables will be created inside the database. So you won’t have such issues.
There is a new DMV named sys.dm_db_uncontained_entitieswhich shows any uncontained objects in the database. If sys.dm_db_uncontained_entitiesis empty, your database does not use any uncontained entities.

Migrating Users

In Contained Databases you also have the option of converting a non-partially contained database to a partially contained database. However, after converting the database, you need to manually convert users since existing database users are not automatically migrated as well.

The following query will give you existing users in the database.

SELECT
name, type_desc, authentication_type, authentication_type_desc
FROM sys.database_principals

Note that, though the database is changed to Containted, the users are still at INSTANCE level or server level. These users can be converted by following query.

EXECUTE sp_migrate_user_to_contained
@username = N'Dev1',
@rename = N'keep_name',
@disablelogin = N'disable_login'
You can also rename the existing user as well as disable the existing login.
After running the above query, if you execute the query you executed before, you will see that the Dev1 user is now changed to the DATABASE level which is s contained user.