Exam DP203 Serverless SQL Pool
Serverless SQL Pool. Reads from data lake files.
In Synapse Studio Data - Linked tab, go into the Azure Data Lake Gen2 header, and drill down and right-click a file there. Right-click and select "New SQL Script".
This generates the following script:
SELECT TOP 100 * FROM OPENROWSET( BULK 'https://datalake9cxqfg2.dfs.core.windows.net/files/product_data/products.csv', FORMAT = 'CSV', PARSER_VERSION = '2.0' HEADER_ROW = TRUE ) AS result
Click the "Connect to" popup list, and select "Built-in", and run it.
"result" is the alias name of the output table.
Give the script a name on the right hand side, and click Publish. Thenafter it can be seen under the "Develop" tab.
Output the results instead as a chart by clicking the "Chart" button in the output pane, setting appropriate values in the "Category" and "Legend" popup lists.
Three file formats can be read: CSV, JSON, Parquet.
No charge for resources reserved, you're only charged for the data processed by queries you run.
Not recommended for OLTP.
you can define external objects such as tables and views in a serverless SQL database.
The BULK parameter includes the full URL to the location in the data lake containing the data files. These paths can include the * wildcard:
https://mydatalake.blob.core.windows.net/data/files/*
: All files in the files folder.https://mydatalake.blob.core.windows.net/data/files/**
: All files in the files folder, and recursively its subfolders.
FIELDTERMINATOR
ROWTERMINATOR
FIELDQUOTE - string in quotes
FIRSTROW = 2 - how many rows to skip
HEADER_ROW = TRUE (only available when using parser version 2.0)
WITH clause
Use the "WITH" clause to define the column names and datatypes:
SELECT TOP 100 * FROM OPENROWSET( BULK 'https://mydatalake.blob.core.windows.net/data/files/*.csv', FORMAT = 'csv', PARSER_VERSION = '2.0') WITH ( product_id INT, product_name VARCHAR(20) COLLATE Latin1_General_100_BIN2_UTF8, list_price DECIMAL(5,2) ) AS rows
JSON source files
If JSON files (.json) are used instead of .CSV files, use the following parameters in the above:
BULK 'https://mydatalake.blob.core.windows.net/data/files/*.json', FORMAT = 'csv', FIELDTERMINATOR ='0x0b', FIELDQUOTE = '0x0b', ROWTERMINATOR = '0x0b'
Note that FORMAT still says csv, not json.
You can run the JSON_VALUE function on the JSON output by the query, like:
SELECT JSON_VALUE(doc, '$.product_name') AS product, JSON_VALUE(doc, '$.list_price') AS price FROM OPENROWSET( BULK 'https://mydatalake.blob.core.windows.net/data/files/*.json', FORMAT = 'csv', FIELDTERMINATOR ='0x0b', FIELDQUOTE = '0x0b', ROWTERMINATOR = '0x0b' ) WITH (doc NVARCHAR(MAX)) as rows
Parquet source files
SELECT TOP 100 * FROM OPENROWSET( BULK 'https://mydatalake.blob.core.windows.net/data/files/*.*', FORMAT = 'parquet') AS rows
Partitioning Parquet files
If the data in the Parquet source files is partitioned into multiple files in multiple directories, it can be referenced in the following manner - which specific parameter directories can be referenced:
SELECT * FROM OPENROWSET( BULK 'https://mydatalake.blob.core.windows.net/data/orders/year=*/month=*/*.*', FORMAT = 'parquet') AS orders WHERE orders.filepath(1) = '2020' AND orders.filepath(2) IN ('1','2');
Create Database
CREATE DATABASE SalesDB COLLATE Latin1_General_100_BIN2_UTF8
External Data Source
Instead of putting the full path into the BULK clause every time you reference a source file as a table, you can create an External Data Source with the path:
CREATE EXTERNAL DATA SOURCE files WITH ( LOCATION = 'https://mydatalake.blob.core.windows.net/data/files/' )
and then run the main select as:
SELECT * FROM OPENROWSET( BULK 'orders/*.csv', DATA_SOURCE = 'files', FORMAT = 'csv', PARSER_VERSION = '2.0' ) AS orders
where the "files" datasource is referenced. And the BULK parameter only then needs to have the relative path to the file(s).
Scoped Credentials
CREATE DATABASE SCOPED CREDENTIAL sqlcred WITH IDENTITY='SHARED ACCESS SIGNATURE', SECRET = 'sv=xxx...'; GO CREATE EXTERNAL DATA SOURCE secureFiles WITH ( LOCATION = 'https://mydatalake.blob.core.windows.net/data/secureFiles/' CREDENTIAL = sqlcred ); GO