Exam DP203 Hybrid Transactional Analytical
Revision as of 22:13, 20 November 2024
Hybrid Transactional / Analytical Processing (HTAP) is where Azure Synapse Link replicates transactional data into an analytical data store.
From CosmosDB: Azure Synapse Link for CosmosDB
For Azure SQL Database or a SQL Server instance: Azure Synapse Link for SQL. Data is replicated to a dedicated SQL pool, and a Spark pool can then connect to that dedicated pool.
Azure Synapse Link for Dataverse, where Dataverse (the Power Platform database) is the OLTP database. Note that here the data goes into Data Lake Storage Gen2, not a dedicated pool.
CosmosDB
First you need to enable Azure Synapse Link in the CosmosDB resource in the Portal by clicking a button.
Alternatively this can be done in the Azure CLI by running:
az cosmosdb update --name my-cosmos-db --resource-group my-rg --enable-analytical-storage true
or PowerShell:
Update-AzCosmosDBAccount -Name "my-cosmos-db" -ResourceGroupName "my-rg" -EnableAnalyticalStorage 1
Once enabled it cannot be disabled.
Dynamic Schema Maintenance
As schema changes are made in the upstream OLTP store, they get replicated down to the analytical store, where the data is represented as JSON.
There are two schema types:
WellDefined: the first instance of a property in the JSON determines its data type; later values of a different type are not honoured.
FullFidelity: each instance of data is stored along with its data type, allowing the type to vary between documents. This is the default for the MongoDB API.
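As an illustration of the difference (a plain-Python sketch of the idea, not Cosmos DB's actual storage engine): under a well-defined schema the first document fixes each property's type and conflicting later values are nulled out, while full fidelity keeps every value under a type-suffixed column.

```python
# Illustrative sketch only -- not the real analytical-store implementation.

def well_defined(docs):
    """First-seen type wins; later values of a different type become None."""
    types, rows = {}, []
    for doc in docs:
        row = {}
        for name, value in doc.items():
            t = type(value).__name__
            types.setdefault(name, t)          # first instance fixes the type
            row[name] = value if t == types[name] else None
        rows.append(row)
    return rows

def full_fidelity(docs):
    """Each value lands in a 'name.type' column, so types can vary freely."""
    return [{f"{k}.{type(v).__name__}": v for k, v in doc.items()}
            for doc in docs]

docs = [{"price": 5}, {"price": 5.5}]
print(well_defined(docs))   # the float conflicts with the first-seen int type
print(full_fidelity(docs))  # both values survive under distinct typed columns
```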
Cosmos DB stores data in containers. You can enable analytical storage on individual containers when you create them, in addition to (as above) enabling Azure Synapse Link on the resource directly in the portal.
Or in the Azure CLI:
az cosmosdb sql container create --resource-group my-rg --account-name my-cosmos-db --database-name my-db --name my-container --partition-key-path "/productID" --analytical-storage-ttl -1
or PowerShell:
New-AzCosmosDBSqlContainer -ResourceGroupName "my-rg" -AccountName "my-cosmos-db" -DatabaseName "my-db" -Name "my-container" -PartitionKeyKind "hash" -PartitionKeyPath "/productID" -AnalyticalStorageTtl -1
Like with the resource, this config cannot be removed without deleting the container.
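The analytical-storage TTL value in the commands above controls how long records are retained in the analytical store; per the Cosmos DB documentation, -1 retains everything indefinitely, 0 (or null) disables the analytical store, and a positive value retains records for that many seconds. A small illustrative helper (hypothetical function, not part of any SDK):

```python
# Illustrative helper: interpret an analytical-storage TTL setting.
def describe_analytical_ttl(ttl):
    if ttl is None or ttl == 0:
        return "analytical store disabled"
    if ttl == -1:
        return "retain all data indefinitely"
    return f"retain data for {ttl} seconds after last update"

print(describe_analytical_ttl(-1))      # the value used in the commands above
print(describe_analytical_ttl(7200))
```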
Linked Service
Next, in Synapse Studio, you need to create a linked service to the CosmosDB, to be able to pull the data into Synapse ("Connect to external data").
Query from Spark
You can apparently use a Spark pool to query the CosmosDB using the linked service.
df = spark.read\
    .format("cosmos.olap")\
    .option("spark.synapse.linkedService", "my_linked_service")\
    .option("spark.cosmos.container", "my-container")\
    .load()

display(df.limit(10))
It says "The data is loaded from the analytical store in the container, not from the operational store"
You can also write data back to the container. Writes go through the transactional store, so the format is cosmos.oltp rather than cosmos.olap:

mydf.write.format("cosmos.oltp")\
    .option("spark.synapse.linkedService", "my_linked_service")\
    .option("spark.cosmos.container", "my-container")\
    .mode("append")\
    .save()

You can also run SQL code against the data, for example by registering the dataframe as a temporary view and querying it with Spark SQL:

df.createOrReplaceTempView("products")
spark.sql("SELECT productID, productName FROM products").show()
Query from Serverless pool
In addition to using a Spark pool, you can also query an Azure Cosmos DB analytical container by using the built-in serverless SQL pool in Azure Synapse Analytics:

SELECT *
FROM OPENROWSET(
    'CosmosDB',
    'Account=my-cosmos-db;Database=my-db;Key=abcd1234....==',
    [my-container]
) AS products_data
If the source JSON contains multi-level data, like:
{
    "productID": 126,
    "productName": "Sprocket",
    "supplier": {
        "supplierName": "Contoso",
        "supplierPhone": "555-123-4567"
    },
    "id": "62588f072-11c3-42b1-a738-...",
    "_rid": "mjMaAL...==",
    ...
}
then you can use a SELECT statement like the following, with the JSON path for parsing each level given in the WITH clause of the select:
SELECT *
FROM OPENROWSET(
    PROVIDER = 'CosmosDB',
    CONNECTION = 'Account=my-cosmos-db;Database=my-db',
    OBJECT = 'my-container',
    SERVER_CREDENTIAL = 'my_credential'
)
WITH (
    ProductNo INT '$.productID',
    ProductName VARCHAR(20) '$.productName',
    Supplier VARCHAR(20) '$.supplier.supplierName',
    SupplierPhoneNo VARCHAR(15) '$.supplier.supplierPhone'
) AS products_data
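The WITH clause maps JSON path expressions to typed columns. The sketch below (plain Python with a hypothetical json_path helper, not Synapse's parser) mimics how those '$.supplier.supplierName'-style paths resolve against the sample document above:

```python
# Sketch of how the WITH-clause JSON paths pull values out of a document.
def json_path(doc, path):
    """Resolve a simple '$.a.b' dotted path, as used in the WITH clause."""
    value = doc
    for key in path.lstrip("$.").split("."):
        value = value[key]
    return value

doc = {
    "productID": 126,
    "productName": "Sprocket",
    "supplier": {"supplierName": "Contoso", "supplierPhone": "555-123-4567"},
}

# Column name -> JSON path, mirroring the WITH clause above.
columns = {
    "ProductNo": "$.productID",
    "ProductName": "$.productName",
    "Supplier": "$.supplier.supplierName",
    "SupplierPhoneNo": "$.supplier.supplierPhone",
}

row = {col: json_path(doc, path) for col, path in columns.items()}
print(row)  # one flattened result row, as the serverless pool would return
```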