Exam DP203 Storage 3 Azure Lake Database

From MillerSql.com
Revision as of 21:33, 16 November 2024 by NeilM (talk | contribs)

The third Storage type is: Azure Lake Database.

The underlying data is in Parquet or CSV files only.

Serverless SQL Pool, or Spark.

To create a Lake Database, go into Synapse Studio Data tab, and click the + button and select Lake Database under "Workspace". This creates a new section "Lake Database" under which the database is created, under Data - Workspace.

On the right-hand side you choose the Linked Service from the pop-up menu. This needs to be an existing Linked Service to Azure Gen2 storage already configured under section Data - Linked - Azure Data Lake Storage Gen2.

In Input Folder you give a path within this linked service where the files holding the data will be held. Data Format popup is set to CSV or Parquet

In the design section, you can choose a template from different business scenarios, or choose to create it as empty. Then after you can add tables to it, including key relationships between tables.

You can run select statements on the tables either with the Serverless pool using standard SQL syntax like:

USE RetailDB;
GO

SELECT CustomerID, FirstName, LastName
FROM Customer
ORDER BY LastName;

or by running a Spark notebook and running:

%%sql
INSERT INTO `RetailDB`.`Customer` VALUES (123, 'John', 'Yang')

%%sql
SELECT * FROM `RetailDB`.`Customer` WHERE CustomerID = 123

Note it may be necessary in the Azure Data Lake Storage Gen2. configuration, to have changed the authentication method from Access key, to Switch to Entra User Account).