Saturday, 14 July 2012

Getting started with SSIS - Introduction to SSIS

SSIS is a tool used for ETL. Before we talk on SSIS, let me walk you through what is ETL. ETL stands for Extract Transform and Load. These are simple day-to-day words we use in our daily lives. The figure below depicts ETL in real world scenario

E - Extract data from various homogeneous or non-homogeneous source systems. Data could be stored in any of the following forms though not limited to them: Flat file, Database, XML, Web queries etc. When we can have sources of such variety, the job of extraction is to fetch the data from these sources and make it available for the next step.

T - Transform Data: As already discussed that the data are coming from various sources and we cannot assume that the data is structured in the same way across all the sources. Therefore, we need to transform the data to a common format so that the other transformations can be done on them. Once we have the data we need to perform various activities like:
  • Data cleansing
  • Mandatory check
  • Data type check
  • Check for foreign key constraints
  • Check for business rules and apply business rules
  • Creation of surrogate keys
  • Sorting the data
  • Aggregating the data
  • Transposing the data
  • Trim the data to remove blanks.
The list can go on as the business requirements get complex day by day and hence the transformations get complex. While transformations are on, we need to log the anomalies in data for reporting and corrective action to be taken.

L Load Data: Once the transformations are done and the data takes the form as per the requirement, we have to load the data to the destination systems. The destinations can also be as varied as the sources. Once the data reaches the destination, it is consumed by other systems, which either stores it as historical data, generate reports out of it, build modes to take business decisions etc.

SSIS stands for SQL Server Integration Services. Microsoft introduced Business Intelligence Suite, which includes SSIS, SSAS (SQL Server Analysis Server) and SSRS (SQL Server Reporting Services).

Now what’s this Business Intelligence (BI)? Let me take some time to explain that. As the name suggests, it helps Business run across the globe. It provides the business with data and ways to look into the data and make business decisions to improve the business.

So, how do the 3 products work in the BI world or how are they organized? To start any business analysis we need data, and as I explained earlier, ETL would be used here to get the data from varied sources and put the data to tables or create Cubes for data warehouse. To do this, we make use of SSIS.

Once we have the data with us, SSAS comes into picture to organize the data and store them to cubes.

Next, we need to report the data so that it makes sense to the end user. This is where SSRS comes into picture for report generation.

The order of SSIS and SSAS could change, as both can come first. Having said this, SSIS makes the backbone of this entire domain as all the data is assembled using SSIS.
Now we dive in to what exactly SSIS is. Is it another coding language? The answer to the question is NO. SSIS is not another coding language. In fact very little coding is required in SSIS that too in very few cases. SSIS is a tool used for ETL. It is based on the paradigm shift where we need to focus more onto the business requirement and less on the actual coding to achieve the goals.

SSIS is a visual tool with drag and drop feature, which enables us to create SSIS, packages to perform ETL in a very short amount of time. The development time is widely reduced as compared to legacy systems where each aspect of ETL had to be coded and then of course, tested. Once the package is created, the visual look is good enough to give us an idea of what the ETL is doing.

SSIS provides us many tasks and transforms which help us build our ETL packages. A few examples would be Execute SQL Task which enables us to execute any T-SQL script from SSIS, Send Mail Task which enables us to send mail to any recipient. Likewise, there are lots of tasks and transforms available in SSIS to take care of most of the ETL scenarios that you can think of.

Note: I will explain what is Tasks and transforms in later chapters.

However, what if I have a scenario that cannot be accommodated in the existing tasks and transforms? SSIS has an answer to that too. In that case you can use the entire .NET library and code in either C# or VB to achieve your requirement. Though such scenarios would be very few and even if required will be of very basic nature where you need to know the basics of any programming language to get going. Let’s say that you are not satisfied with the logging provided by SSIS, you can always go ahead and write code to log the errors or activities the way you choose. Taking this a step further, if you want this new type of logging in many packages, you could convert your code to create a custom component and add it to all SSIS packages. It will now work as any other task or transform for you. Is that not great!!!

Note: C# is available only in SSIS 2008 and not 2005.
Let us peep into a simple SSIS package and get to know how SSIS looks:

This is how an SSIS package looks and feels. I will get into the details of what you see above in the coming chapters.

No comments:

Post a Comment