Sunday 15 July 2012

SSIS - Data Flow Task


Data Flow Task that literally means a task where DATA flows. That’s exactly what ETL does, Extract Transform & Load of data. Most of the ETL is done in the Data Flow Task.
A Data Flow Task has to be a part of a ‘Control Flow’. One control flow can have a number of DFTs in it, as the Control Flow is the place where we can decide the ways the multiple DFTs will be placed in our solution.
Each DFT has to have a Source, may or may not have a Transform, may or may not have a Destination. However a DFT needs to have at least a transform or Destination followed by the Source. Source cannot just get data from somewhere and do nothing with it. If that’s the case, DFT will fail.
Ideally, DFT has a Source, Transform (optional) and a Destination. Source is used to extract the data, Transform is used to transform the data whileDestination is used to Load the data to a destination. That's how ETL is implemented in DFT.
The way the Control Flow contains tasks, DFT contains Source,Transforms and destinations and they are not called Tasks as many people call them. Let list down a few examples from each:

Source:
  • OLEDB Source
  • Flat File Source
  • XML Source
  • Excel Source
  • ADO NETSource
Transformations:
  • Aggregate
  • Conditional Split
  • Derived Column
  • Fuzzy Lookup
  • Merge
  • Merge Join
  • Look up
  • Row Count
  • Sort
  • Union All
  • Script Component
  • OLEDB Command
Destinations:
  • OLEDB Destination
  • Flat File Destination
  • Excel Destination
  • Recordset Destination
  • Raw File Destination
As you can see from the list above, there are many sources & transforms, which can help you to achieve your requirement. I have selected just a few elements of DFT. There are others as well. We would talk about DFT and parallel execution later in the advanced section where we talk about the SSIS Execution tree.
In the figure below, you can have a look at how a DFT looks once we have various Sources, Transforms and Destinations attached. You would notice the Red circle on each transform, it is because they are not yet configured and SSIS is showing that this will error out if executed now.


Just remember the way we connect the tasks in Control Flow with the use of precedence constraints. Similarly, we use arrows in DFT but they are not termed as precedence constraints. Here they are available in just two colours: Green for valid records and Red in case the data is incorrect as per our setup.
Unlike Control Flow in Data Flow, the various transforms cannot have any number of inputs or outputs from them. They are predefined barring the Multicast Transform which can have any n number of outputs configured. OLEDB Source takes no input and gives one success (green) and one error output (red), we have similar rule for each source, transform and destination.








No comments:

Post a Comment