Exam DP203 Synapse Data Pipelines
Data Pipelines in Synapse Studio
Pipelines are created and managed in the Synapse Studio Integrate tab.
Pipeline activities run on an integration runtime, which provides the compute used to execute them.
For pipelines to be able to do anything, you have to define Linked Services to the resources they will need to access.
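For reference, a linked service is itself just a JSON artifact. A minimal sketch for an ADLS Gen2 account might look like the following; the name and URL are hypothetical, and the authentication settings (account key, managed identity, and so on) are omitted:

  {
    "name": "AdlsGen2LinkedService",
    "properties": {
      "type": "AzureBlobFS",
      "typeProperties": {
        "url": "https://mydatalake.dfs.core.windows.net"
      }
    }
  }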
To create a pipeline, click the + button and select "Pipeline".
When you create a pipeline in Synapse Studio, the Control Flow activities are shown graphically as they are created, much like in SSIS, with green and red arrows indicating where execution goes next on success, failure, or completion.
Once the pipeline has been created, click the curly brackets {} at the top right to display its JSON definition. A pipeline can also be generated from supplied JSON by clicking the + button and selecting "Import from Pipeline Template" (the template must be supplied as a .zip file).
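As a rough sketch of what that JSON looks like (activity names are made up and the typeProperties are trimmed), the pipeline has a properties.activities array, and the success/failure arrows from the designer appear as dependsOn entries with dependency conditions such as "Succeeded", "Failed", or "Completed":

  {
    "name": "ExamplePipeline",
    "properties": {
      "activities": [
        {
          "name": "CopyRawFiles",
          "type": "Copy",
          "typeProperties": { }
        },
        {
          "name": "NotifyOnSuccess",
          "type": "WebActivity",
          "dependsOn": [
            {
              "activity": "CopyRawFiles",
              "dependencyConditions": [ "Succeeded" ]
            }
          ],
          "typeProperties": { }
        }
      ]
    }
  }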
Data moves from a Source, through Transformations, to a Sink.
Triggers run the pipeline: immediately (Trigger Now), on a schedule, or in response to an event.
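As an illustration (names and times are made up), a scheduled trigger is also defined as JSON and references the pipeline it starts:

  {
    "name": "DailyLoadTrigger",
    "properties": {
      "type": "ScheduleTrigger",
      "typeProperties": {
        "recurrence": {
          "frequency": "Day",
          "interval": 1,
          "startTime": "2024-11-18T06:00:00Z",
          "timeZone": "UTC"
        }
      },
      "pipelines": [
        {
          "pipelineReference": {
            "type": "PipelineReference",
            "referenceName": "ExamplePipeline"
          }
        }
      ]
    }
  }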
The Monitor tab of Synapse Studio shows the run progress of a pipeline. When the workspace is connected to Microsoft Purview, this run information also feeds the data lineage that Purview records.
Create Pipeline
Under Activities > Move and Transform, select Data Flow (another option is Copy Data). Name it "LoadProducts".
On the Settings tab of this data flow object, to the right of the "Data Flow" popup list, click + New. This opens a new window for this data flow.
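In the pipeline's JSON, this data flow activity appears as an ExecuteDataFlow activity that references the data flow by name, roughly as below (a sketch only; compute and staging settings are omitted and exact property names may vary by API version):

  {
    "name": "LoadProducts",
    "type": "ExecuteDataFlow",
    "typeProperties": {
      "dataflow": {
        "referenceName": "LoadProductsDataFlow",
        "type": "DataFlowReference"
      }
    }
  }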
In this window, at the top left, click the greyed-out "Add Source" graphic to add a new Source Stream object. Then at the bottom click the + New button to create a data source. Search for "Gen2" to find the ADLS Gen2 type, then click CSV. In the "Linked Service" popup, select the existing linked service for the Gen2 storage account.
Then click the browse button to choose the source file (which fills in the path to products.csv) and click OK to return to the data flow window.
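The integration dataset created here is another JSON artifact. A sketch for a delimited-text dataset over products.csv might look like this (the linked service, container, and folder names are hypothetical):

  {
    "name": "ProductsCsv",
    "properties": {
      "type": "DelimitedText",
      "linkedServiceName": {
        "referenceName": "AdlsGen2LinkedService",
        "type": "LinkedServiceReference"
      },
      "typeProperties": {
        "location": {
          "type": "AzureBlobFSLocation",
          "fileSystem": "files",
          "folderPath": "product_data",
          "fileName": "products.csv"
        },
        "columnDelimiter": ",",
        "firstRowAsHeader": true
      }
    }
  }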
With the new source stream object selected, give it a name (output stream name) of "ProductsText", verify that "Source Type" is set to "Integration Dataset" (as opposed to "Inline" or "Workspace DB"), and that the "Dataset" popup list is correctly set to the dataset just created (for products.csv).
Next click the Projection tab for the source object. This lists the names of the columns, with the datatypes as popup lists.
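The data flow itself is also stored as JSON, with the designer's steps serialized as script lines. The source stream and its projection end up roughly like the following sketch (the column names and types are placeholders for whatever products.csv actually contains, and older definitions use a single "script" string instead of "scriptLines"):

  "scriptLines": [
    "source(output(",
    "          ProductID as string,",
    "          ProductName as string,",
    "          Category as string,",
    "          ListPrice as decimal(10,2)",
    "     ),",
    "     allowSchemaDrift: true,",
    "     validateSchema: false) ~> ProductsText"
  ]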
Next create a second source stream object for ProductCategories in the same way as above. This will sit alongside the first one.
Next these two streams can be joined using a Lookup object. To do this, on the graphic for the Products source stream, click the + button at the bottom right and select "Lookup". In its configuration there are pop-up lists for the "Primary Stream" and "Lookup Stream"; set these to Products and ProductCategories respectively.
Then the "Lookup Conditions" section has two popup menus where you select the join columns in each source.
Click OK, and the Lookup object will appear to the right of the two source stream objects.
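In the same script lines, the Lookup names the primary stream, the lookup stream, and the join condition, roughly as follows (stream and column names are illustrative):

  "scriptLines": [
    "ProductsText, ProductCategoriesText lookup(Category == CategoryName,",
    "     multiple: false,",
    "     pickup: 'any',",
    "     broadcast: 'auto') ~> MatchedProducts"
  ]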
Next we will control how each row is written to the destination. This is done by clicking the + button at the bottom right of the Lookup object and selecting Alter Row. An Alter Row transformation marks each row for insert, update, upsert, or delete at the sink, using policies selected from a popup list ("Insert if", "Update if", and so on), each paired with a condition expression.
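Again as a sketch, an Alter Row step is serialized as a script line pairing each policy with a condition expression (the Action column used here is hypothetical):

  "scriptLines": [
    "MatchedProducts alterRow(insertIf(Action == 'I'),",
    "     updateIf(Action == 'U')) ~> SetLoadAction"
  ]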