Exam DP-203 Course (Tybul) 09 Azure Data Factory
https://www.youtube.com/watch?v=nnAgdqrZZGM
Note that Azure SQL Database is not in the exam.
"SQL Databases" service in Azure is the implementation of a SQL Server in the cloud. In a server instance you can create databases. An example of PAAS - Platform as a service.
"Azure SQL Database" is the implementation of a SQL server single database in the cloud. You don't get the rest of the SQL Server there. An example of SAAS - Software as a service.
In the "Compute and storage" section when you are creating the SQL databases server, click to change the setting to "Basic" to generate a lower spec machine that is cheaper.
When you have created the SQL server resource, go into Security - Networking and change "Public network access" from "Disabled" to "Selected networks". Then, further down, tick "Allow Azure services and resources to access this server".
In Data Factory Studio, which can be opened from within the Azure Data Factory resource in the portal, you can create a linked service under the Manage blade to connect to a data source. Here you choose between a data store and a compute linked service.
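Under the hood a linked service is just JSON. A minimal sketch for an Azure SQL Database linked service (the name, server, database and credentials below are placeholders, not values from the course):

    {
      "name": "LS_AzureSqlDb",
      "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
          "connectionString": "Server=tcp:<your-server>.database.windows.net,1433;Database=<your-db>;User ID=<user>;Password=<password>;Encrypt=true;"
        }
      }
    }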
11 - Dynamic Azure Data Factory
Parameters in dataset names: instead of hard-coding e.g. a table or file name in a dataset, you define a dataset parameter, reference it with an expression, and supply the value from the pipeline.
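A sketch of what a parameterized dataset looks like in JSON (the dataset and parameter names are illustrative, and it assumes the linked service from above):

    {
      "name": "DS_SqlTable",
      "properties": {
        "linkedServiceName": {
          "referenceName": "LS_AzureSqlDb",
          "type": "LinkedServiceReference"
        },
        "parameters": {
          "TableName": { "type": "string" }
        },
        "type": "AzureSqlTable",
        "typeProperties": {
          "schema": "dbo",
          "table": {
            "value": "@dataset().TableName",
            "type": "Expression"
          }
        }
      }
    }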
12 - Integration Runtimes part 1
There is a shared set of virtual machines within Azure that are used to run Azure Data Factory jobs for all Azure tenants; this is the compute behind the built-in Azure integration runtime.
If you want to prevent this shared compute from reaching your own Azure resources over public endpoints, you create an Azure managed VNet integration runtime. This appears to dedicate a set of these machines to your own use; they are bound to a new virtual network, and you give only this virtual network access to your Azure resources.
You create the integration runtime, and then create a linked service that uses it to connect to the source database. As part of this you must also create a managed private endpoint.
Then go into the Azure SQL source resource; in Networking there is a "Private access" tab (as opposed to "Public access") where the new managed private endpoint can be seen. If you want this endpoint to be able to connect to the Azure SQL server, click "Approve".
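For reference, the JSON behind a managed VNet integration runtime looks roughly like this (a sketch based on the shape Data Factory generates; the name is illustrative):

    {
      "name": "IR-ManagedVnet",
      "properties": {
        "type": "Managed",
        "typeProperties": {
          "computeProperties": {
            "location": "AutoResolve"
          }
        },
        "managedVirtualNetwork": {
          "type": "ManagedVirtualNetworkReference",
          "referenceName": "default"
        }
      }
    }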
Azure managed VNet integration runtimes cost about $1/hour for pipeline activity execution, as opposed to about $0.005/hour on the built-in integration runtime.
Note that managed VNet integration runtimes are a little slower than the built-in one.
13 - Integration Runtimes part 2
Self-hosted integration runtime: an integration runtime that runs on a machine you manage. Here it is run in Azure, inside a virtual machine, so you have to create a virtual machine resource within the portal and start it. This is a Windows machine, and costs around $1/hour to run.
You RDP onto the virtual machine, download the installer for the above runtime, and install it.
When it is installed, run it.
It will ask for an authentication key; copy this from the new self-hosted integration runtime's settings in Data Factory to register the machine with your factory. (The connection string for the Azure SQL Database is then supplied in the linked service that uses this runtime.)
You can also run the self-hosted integration runtime software on an on-premises machine.
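A linked service points at a specific integration runtime through its connectVia property; a sketch (the runtime and linked service names are illustrative):

    {
      "name": "LS_SqlViaSelfHostedIR",
      "properties": {
        "type": "AzureSqlDatabase",
        "connectVia": {
          "referenceName": "SelfHostedIR",
          "type": "IntegrationRuntimeReference"
        },
        "typeProperties": {
          "connectionString": "Server=tcp:<your-server>.database.windows.net,1433;Database=<your-db>;User ID=<user>;Password=<password>;"
        }
      }
    }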
14 - Error handling in Azure Data Factory
If you click on a task within a pipeline, for example a Copy data task, under its General tab you can set values for Retry (number of attempts), Retry interval (in seconds), and Timeout.
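In the pipeline JSON these settings live in the activity's policy block; a sketch with illustrative values:

    "policy": {
      "timeout": "0.00:10:00",
      "retry": 2,
      "retryIntervalInSeconds": 30
    }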
The "leaf" tasks are those at the end of the chain.
If-else: task A runs, with success going to task B and failure to task C. If A fails, it goes to C, which succeeds; but B is not run (skipped), so evaluation looks at its parent (A). Since that parent failed, the pipeline fails.
Try-catch: task A runs, with only an on-fail line going to task C. If A fails, it goes to C, which succeeds; and since C is the only leaf node and was successful, the overall pipeline succeeds.
Do-if-skip-else: in the if-else case, you can add another task D after task B, connected with the skip line (as opposed to the green success, red fail, or white completed lines). The skip line is followed if task B itself is skipped, i.e. because task A failed and took the red line to task C instead. This means task D runs, and if task D succeeds, the overall pipeline now succeeds.
Try-catch-succeed: here we have task A that we want to continue on to task D even if A fails, but with task C running as the error handler when A fails (there is no task B). This is done by adding both an on-success and an on-skip line from task C to task D. If A succeeds, C is skipped, and the skip line leads to D running. If A fails, C runs, and on success goes on to D. Assuming D succeeds, the pipeline succeeds either way.
AND logic in common error handling: if you have three tasks A, B, and C in series, and you add the red (failure) output of each to a single error-handling task D, then if one of them fails, task D will in fact not run. The reason is that multiple incoming lines (all line types, in fact) use AND logic - they must all apply for the route to be followed. The fix is to replace the three red lines with just two lines from C to D: one red (fail) line and one skip line (see the sketch below). If C fails, it goes to D; if A or B fails, C is skipped, which also leads to D; and if A, B, and C all succeed, D is simply skipped and the pipeline ends. In all situations the pipeline succeeds, because the errors have been handled.
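In the pipeline JSON, those lines are dependsOn entries: conditions listed for the same upstream task are OR'ed, while entries for different upstream tasks are AND'ed. A sketch of task D from the pattern above (task names and types are illustrative):

    {
      "name": "TaskD",
      "type": "Wait",
      "typeProperties": { "waitTimeInSeconds": 1 },
      "dependsOn": [
        {
          "activity": "TaskC",
          "dependencyConditions": [ "Failed", "Skipped" ]
        }
      ]
    }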
The same principles work with Execute Pipeline tasks (tasks that run other pipelines).
Alerts
Under Monitor - Alerts & metrics you can set up alerts for failed pipeline runs.
Click New alert rule, give it severity 2 (Warning), and choose "Failed pipeline runs metrics" from the metric list.
In the Name and Failure Type dimension lists, select All.
Then accept the defaults for Alert logic (fire when the count is greater than the threshold) and Evaluated based on (how often the condition is checked).
Then under Configure Email/SMS/Push/Voice notification, click + Configure notification. Give the new action group a name, click + next to Notifications, and add an action to send an email or SMS.
This will give an error that the subscription is not registered to use Microsoft.Insights. To fix it, run the following PowerShell command in Azure Cloud Shell (with the older AzureRm module the cmdlet was Register-AzureRmResourceProvider):

Register-AzResourceProvider -ProviderNamespace Microsoft.Insights
The emails sent are not very user-friendly, because they don't contain much relevant information.
Within Data Factory there is no built-in task for sending an email.
So to send an email with better information, you need to create a logic app within the Azure portal.
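On the Data Factory side, the usual way to call such a logic app is a Web task that POSTs the error details to the logic app's HTTP URL. A sketch, in which the URL is a placeholder for the one Logic Apps generates, the body fields are illustrative, and a failed upstream task named CopyData is assumed:

    {
      "name": "SendFailureEmail",
      "type": "WebActivity",
      "dependsOn": [
        { "activity": "CopyData", "dependencyConditions": [ "Failed" ] }
      ],
      "typeProperties": {
        "url": "<logic app HTTP POST URL>",
        "method": "POST",
        "headers": { "Content-Type": "application/json" },
        "body": {
          "dataFactoryName": "@{pipeline().DataFactory}",
          "pipelineName": "@{pipeline().Pipeline}",
          "errorMessage": "@{activity('CopyData').error.message}"
        }
      }
    }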
Logic Apps
Go into Logic Apps within the portal, and click to create new.
Choose a Consumption rather than Standard hosting plan (with Standard you pay circa $145/month for the compute; Consumption is pay-per-execution).
Once created, go into Development Tools - Logic App Designer.
First you define the trigger - what makes the app run. Search for "http", and one of the search results should be "When a HTTP request is received".
In the trigger's properties, enter a JSON schema like the sample below, to tell it in what format to expect the HTTP request body:
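The exact schema from the video isn't captured in these notes; a minimal sketch of a request-body JSON schema, assuming the pipeline posts fields such as dataFactoryName, pipelineName, and errorMessage (matching the Web task sketch above):

    {
      "type": "object",
      "properties": {
        "dataFactoryName": { "type": "string" },
        "pipelineName": { "type": "string" },
        "errorMessage": { "type": "string" }
      }
    }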
Then click Save, and then the + button to add an action to be carried out when the trigger fires.