Please wait, loading...

 
What is Data Factory in Microsoft Fabric

Microsoft Fabric is an end-to-end data analytics solution in the cloud, and one of its workloads is called Data Factory. In this article, you will learn what Data Factory is, how it works with the rest of Microsoft Fabric, and what are elements and functions of Data Factory.

Video

Microsoft Fabric

To understand Data Factory, it is best to understand Microsoft Fabric first. Microsoft Fabric is an end-to-end Data Analytics software-as-a-service offering from Microsoft. Microsoft Fabric combined some products and services to cover an end-to-end and easy-to-use platform for data analytics. Here are the components (also called workloads) of Microsoft Fabric.

Microsoft Fabric

To learn more about Microsoft Fabric and enable it in your organization, I recommend reading the articles below;

Data Factory Origin

Microsoft Fabric has a workload for Data Integration. Any end-to-end data analytics system should have a data integration component. Microsoft has been a strong data integration tool and service leader for decades. This started with SQL Server tools such as DTS (Data Transformation Service) and SSIS (SQL Server Integration Services) and then stepped into cloud-based technologies such as ADF (Azure Data Factory). Microsoft also used a data transformation engine that first targeted citizen data analysts called Power Query.

Data Factory is the data integration component of Microsoft Fabric which brings the power of Azure Data Factory and Power Query Dataflows into one place. For many years, we had these two technologies doing data transformations separately. But now, these two are combined under Fabric, called Data Factory.

Power Query

Power Query Dataflows was first announced a few years ago as an additional component to Power BI for data transformation as a cloud technology that is simple to use for data analysts. But soon, it became more than just for Power BI; it became Power Platform Dataflows. These days, Power Query Dataflows are used for data transformations in Power BI projects and data migration in Power Apps projects.

Power Query

Although Power Query Dataflows is also on the dataflow side, it needed some enhancements on scalability and the control of execution with some control flow elements (such as loop structures, conditional execution, etc.).

Azure Data Factory

Azure Data Factory came into the market many years ago as the next generation of SSIS for in-the-cloud ETL. However, the data transformation engine of Azure Data Factory was not built on a strong basis, so most of the time, ADF was used for data ingestion, and then with the help of SQL stored procedures, etc., for doing the transformation afterward. ADF was not a tool for citizen data analysts. It was instead for data engineers and developers. ADF used data pipelines to execute a group of activities as a flow, and among those activities, there were tasks such as copy data, running a stored procedure, etc.

Azure Data Factory. Image sourced from: https://learn.microsoft.com/en-us/azure/data-factory/introduction

For the past few years, we have always had this split; If you wanted a simple-to-use data transformation engine but not much data, use Power Query Dataflows. If you want scalable data ingestion, then use Azure Data Factory.

Best of Both Worlds

Now in Microsoft Fabric, We combine the best things from Power Query Dataflows and Azure Data Factory Data Pipelines into one stream: Data Factory. Data Factory ensures that you still have a simple-to-use and powerful transformation engine of Power Query for data transformation, but on the other hand, you also have the scalability of Data Pipelines and can build a control flow for execution of the ETL using the Data Pipelines. In other words, Data Factory is a state-of-the-art ETL software-as-a-service offering for Microsoft Fabric.

Data Factory in Microsoft Fabric combines Azure Data Factory and Power Query Dataflows together.

Elements of Data Factory

Combining these two services brings great features that make the Data Factory an ultimate ETL service. Here are some of those below;

Data Connectors

For an ETL (Extract, Transform, Load) system, one of the most important aspects is what sources the data can be fetched from. Data Factory offers hundreds of data connectors, enabling you to get data from sources such as databases, files, folders, software-as-a-service systems, etc.

Data Factory Connectors

It is also possible to create your connector if you are keen.

Dataflows

Dataflows are the heart of Data Factory. This is where you get the data from the sources, define the data transformation and prepare it in any shape needed, and finally load it into destinations. Dataflows use the Power Query data transformation engine and the user interface for creating it using the simple-to-use Power Query Editor online.

Dataflow

Power Query Editor online is not only powerful in the graphical interface, it also enables the developer to write code in M language, which is the data transformation language for Power Query.

Power Query Editor online

To learn more about Dataflows, I suggest reading my article below.

Dataflows support a few destinations at the time of writing this article which are;

  • Azure Data Explorer (Kusto)
  • Azure SQL Database
  • Data Warehouse
  • Lakehouse

Data Pipelines

Although Dataflows are the main ETL component of the Data Factory, they can be enhanced when wrapped by a control flow execution component. This control flow execution component is called Data Pipeline. A Data Pipeline is a group of activities (or tasks) defined by a particular flow of execution. The activities in a Pipeline can involve copying data, running a Dataflow, executing a stored procedure, looping until a certain condition is met, or executing a particular set of activities if a condition is met, etc.

Data Pipeline

Data Pipelines can then be scheduled, and there is a monitoring tool to check the execution stage of the pipeline in addition to the activity-state-outputs where you can define what happens if a certain task fails or succeeds.

As mentioned, one of the most important activities that can be done in a Pipeline is the execution of a Dataflow. This is where Dataflows and Data Pipelines work together in their best way.

Executing Dataflows from Data Pipeline

To learn more about Data Pipelines, read my article below;

Summary

Data Factory is an ETL-in-cloud solution that is the data integration workload of Microsoft Fabric. Data Factory is not a new product or service; it comes from many years of Microsoft data transformation tools and services. It is built on top of Power Query and Azure Data Factory. Data Factory uses two main components to deliver the best ETL scenarios possible; Dataflows and Data Pipelines. Dataflows are for the main get data, transform, and load process, and the Data Pipeline can control the rest of the execution with control flow activities.

I highly recommend reading the articles below to study more about Data Factory;

The post What is Data Factory in Microsoft Fabric appeared first on RADACAD.

Follow Source

https://i0.wp.com/microsoftdynamics.in/wp-content/uploads/2020/04/Microsoftdynamics365.png?fit=640%2C651
Microsoft Dynamics Community Profile

Learn more