Exam DP203 Serverless SQL Pool: Difference between revisions
No edit summary |
|||
Line 88: | Line 88: | ||
ROWTERMINATOR = '0x0b' | ROWTERMINATOR = '0x0b' | ||
) WITH (doc NVARCHAR(MAX)) as rows | ) WITH (doc NVARCHAR(MAX)) as rows | ||
</pre> | </pre>Parquet source files |
Revision as of 18:26, 16 November 2024
Serverless SQL Pool. Reads from data lake files.
In Synapse Studio Data - Linked tab, go into the Azure Data Lake Gen2 header, and drill down and right-click a file there. Right-click and select "New SQL Script".
This generates the following script:
SELECT TOP 100 * FROM OPENROWSET( BULK 'https://datalake9cxqfg2.dfs.core.windows.net/files/product_data/products.csv', FORMAT = 'CSV', PARSER_VERSION = '2.0' HEADER_ROW = TRUE ) AS result
Click the "Connect to" popup list, and select "Built-in", and run it.
"result" is the alias name of the output table.
Give the script a name on the right hand side, and click Publish. Thenafter it can be seen under the "Develop" tab.
Output the results instead as a chart by clicking the "Chart" button in the output pane, setting appropriate values in the "Category" and "Legend" popup lists.
Three file formats can be read: CSV, JSON, Parquet.
No charge for resources reserved, you're only charged for the data processed by queries you run.
Not recommended for OLTP.
you can define external objects such as tables and views in a serverless SQL database.
The BULK parameter includes the full URL to the location in the data lake containing the data files. These paths can include the * wildcard:
https://mydatalake.blob.core.windows.net/data/files/*
: All files in the files folder.https://mydatalake.blob.core.windows.net/data/files/**
: All files in the files folder, and recursively its subfolders.
FIELDTERMINATOR
ROWTERMINATOR
FIELDQUOTE - string in quotes
FIRSTROW = 2 - how many rows to skip
HEADER_ROW = TRUE (only available when using parser version 2.0)
WITH clause
Use the "WITH" clause to define the column names and datatypes:
SELECT TOP 100 * FROM OPENROWSET( BULK 'https://mydatalake.blob.core.windows.net/data/files/*.csv', FORMAT = 'csv', PARSER_VERSION = '2.0') WITH ( product_id INT, product_name VARCHAR(20) COLLATE Latin1_General_100_BIN2_UTF8, list_price DECIMAL(5,2) ) AS rows
JSON source files
If JSON files (.json) are used instead of .CSV files, use the following parameters in the above:
BULK 'https://mydatalake.blob.core.windows.net/data/files/*.json', FORMAT = 'csv', FIELDTERMINATOR ='0x0b', FIELDQUOTE = '0x0b', ROWTERMINATOR = '0x0b'
Note that FORMAT still says csv, not json.
You can run the JSON_VALUE function on the JSON output by the query, like:
SELECT JSON_VALUE(doc, '$.product_name') AS product, JSON_VALUE(doc, '$.list_price') AS price FROM OPENROWSET( BULK 'https://mydatalake.blob.core.windows.net/data/files/*.json', FORMAT = 'csv', FIELDTERMINATOR ='0x0b', FIELDQUOTE = '0x0b', ROWTERMINATOR = '0x0b' ) WITH (doc NVARCHAR(MAX)) as rows
Parquet source files