Exam DP-203 Storage 3: Azure Lake Database
The third storage type is the Azure Lake Database. The underlying data is stored in Parquet or CSV files only, and the tables can be queried with either the Serverless SQL pool or Spark.
Create the Lake Database
To create a Lake Database, go to the Data tab in Synapse Studio, click the + button, and select Lake Database under "Workspace". This creates a new "Lake Database" section under Data - Workspace, under which the database is created.
On the right-hand side you choose the Linked Service from the pop-up menu. This needs to be an existing Linked Service to Azure Data Lake Storage Gen2, already configured under Data - Linked - Azure Data Lake Storage Gen2.
In Input Folder you give a path within this linked service where the files holding the data will be held. Note that it must be an existing directory, and at the point the database is created, nothing is added to the directory to show that it is being used for a database.
The Data Format popup is set to either CSV or Parquet.
Create the tables
Next, you create the tables required, from the + Tables menu. This gives three choices:
- Custom: Creates a blank table ready for you to assign columns. Note the location for the underlying files will default to the /tablename subdirectory of the directory for the database (though this can be changed).
- Template: Choose a template from different business scenarios.
- From Data Lake: Where you provide the path to the files in the Linked Service. Presumably this path can differ from the input folder given for the database?
Once the tables are created, key relationships can be added between the tables.
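As an illustration of what a lake database table amounts to (a table definition over files in Gen2 storage), a roughly equivalent table could be declared by hand in a Spark notebook. This is a hypothetical sketch: the storage account, container and path below are illustrative, not taken from a real workspace.

```sql
%%sql
-- Hypothetical sketch: declare a database and a table whose data
-- lives as Parquet files in an ADLS Gen2 folder (path is illustrative).
CREATE DATABASE IF NOT EXISTS RetailDB;

CREATE TABLE IF NOT EXISTS RetailDB.Customer (
    CustomerID INT,
    FirstName  STRING,
    LastName   STRING
)
USING PARQUET
LOCATION 'abfss://files@mydatalake.dfs.core.windows.net/retaildb/Customer';
```

The designer in Synapse Studio generates the table metadata for you; the point of the sketch is only that the table is metadata plus a folder of files.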
Select statements
You can run select statements on the tables, either with the Serverless pool using standard SQL syntax like:

```sql
USE RetailDB;
GO

SELECT CustomerID, FirstName, LastName
FROM Customer
ORDER BY LastName;
```
or by running a Spark notebook with cells such as:

```sql
%%sql
INSERT INTO `RetailDB`.`Customer` VALUES (123, 'John', 'Yang')
```

```sql
%%sql
SELECT * FROM `RetailDB`.`Customer` WHERE CustomerID = 123
```
Note that it may be necessary, in the Azure Data Lake Storage Gen2 configuration, to change the authentication method from Access key to the Entra user account ("Switch to Entra User Account").
Loading data
Following option 1 (Custom) above, the first way to load data is to upload a CSV or Parquet file directly into the table's folder in the Gen2 storage at the back end (the file is the table).
The key here is that the table already exists, and you are uploading a file into its underlying directory.
Then, possibly after a refresh, the table can be queried.
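A quick way to confirm the upload worked is to count the rows from the Serverless pool. The database and table names follow the earlier examples; the uploaded file itself is hypothetical.

```sql
-- After uploading e.g. a customer.csv file into the Customer table's
-- folder, the new rows should appear when the table is queried:
USE RetailDB;
GO

SELECT COUNT(*) AS NumberOfCustomers
FROM Customer;
```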
Create a table from an existing file
From option 3 (From Data Lake) above, a table can be created from an existing file by selecting that file in the "From Data Lake" option.