Exam DP203 Storage 3 Azure Lake Database

From MillerSql.com
Revision as of 21:41, 16 November 2024 by NeilM (talk | contribs)

The third Storage type is: Azure Lake Database.

The underlying data is in Parquet or CSV files only.

Serverless SQL Pool, or Spark.

To create a Lake Database, go into Synapse Studio Data tab, and click the + button and select Lake Database under "Workspace". This creates a new section "Lake Database" under which the database is created, under Data - Workspace.

On the right-hand side you choose the Linked Service from the pop-up menu. This needs to be an existing Linked Service to Azure Gen2 storage already configured under section Data - Linked - Azure Data Lake Storage Gen2.

In Input Folder you give a path within this linked service where the files holding the data will be held.

In Data Format popup is set to CSV or Parquet

Next, you create the tables required, from the + Tables menu. You choose whether to create a blank table ready for you to assign columns, or instead choose a template from different business scenarios.

Once the tables are created, key relationships can be added between the tables.

Select statements

You can run select statements on the tables either with the Serverless pool using standard SQL syntax like:

USE RetailDB;
GO

SELECT CustomerID, FirstName, LastName
FROM Customer
ORDER BY LastName;

or by running a Spark notebook and running:

%%sql
INSERT INTO `RetailDB`.`Customer` VALUES (123, 'John', 'Yang')

%%sql
SELECT * FROM `RetailDB`.`Customer` WHERE CustomerID = 123

Note it may be necessary in the Azure Data Lake Storage Gen2. configuration, to have changed the authentication method from Access key, to Switch to Entra User Account).