Exam DP203 Serverless SQL Pool: Difference between revisions

Revision as of 18:26, 16 November 2024

Serverless SQL Pool. Reads from data lake files.

In Synapse Studio Data - Linked tab, go into the Azure Data Lake Gen2 header, and drill down and right-click a file there. Right-click and select "New SQL Script".

This generates the following script:

SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://datalake9cxqfg2.dfs.core.windows.net/files/product_data/products.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
        HEADER_ROW = TRUE
    ) AS result

Click the "Connect to" popup list, and select "Built-in", and run it.

"result" is the alias name of the output table.

Give the script a name on the right hand side, and click Publish. Thenafter it can be seen under the "Develop" tab.

Output the results instead as a chart by clicking the "Chart" button in the output pane, setting appropriate values in the "Category" and "Legend" popup lists.

Three file formats can be read: CSV, JSON, Parquet.

No charge for resources reserved, you're only charged for the data processed by queries you run.

Not recommended for OLTP.

you can define external objects such as tables and views in a serverless SQL database.

The BULK parameter includes the full URL to the location in the data lake containing the data files. These paths can include the * wildcard:

https://mydatalake.blob.core.windows.net/data/files/*: All files in the files folder.
https://mydatalake.blob.core.windows.net/data/files/**: All files in the files folder, and recursively its subfolders.

FIELDTERMINATOR

ROWTERMINATOR

FIELDQUOTE - string in quotes

FIRSTROW = 2 - how many rows to skip

HEADER_ROW = TRUE (only available when using parser version 2.0)

WITH clause

Use the "WITH" clause to define the column names and datatypes:

SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/data/files/*.csv',
    FORMAT = 'csv',
    PARSER_VERSION = '2.0')
WITH (
    product_id INT,
    product_name VARCHAR(20) COLLATE Latin1_General_100_BIN2_UTF8,
    list_price DECIMAL(5,2)
) AS rows

JSON source files

If JSON files (.json) are used instead of .CSV files, use the following parameters in the above:

        BULK 'https://mydatalake.blob.core.windows.net/data/files/*.json',
        FORMAT = 'csv',
        FIELDTERMINATOR ='0x0b',
        FIELDQUOTE = '0x0b',
        ROWTERMINATOR = '0x0b'

Note that FORMAT still says csv, not json.

You can run the JSON_VALUE function on the JSON output by the query, like:

SELECT JSON_VALUE(doc, '$.product_name') AS product,
           JSON_VALUE(doc, '$.list_price') AS price
FROM
    OPENROWSET(
        BULK 'https://mydatalake.blob.core.windows.net/data/files/*.json',
        FORMAT = 'csv',
        FIELDTERMINATOR ='0x0b',
        FIELDQUOTE = '0x0b',
        ROWTERMINATOR = '0x0b'
    ) WITH (doc NVARCHAR(MAX)) as rows

Parquet source files