Exam DP-203: Hybrid Transactional / Analytical Processing
Hybrid Transactional / Analytical Processing (HTAP): Azure Synapse Link replicates transactional data into an analytical data store, so it can be queried without putting load on the operational database.
From CosmosDB: Azure Synapse Link for CosmosDB
For Azure SQL Database or a SQL Server instance: Azure Synapse Link for SQL (replicates to a dedicated SQL pool; a Spark pool can then connect to that dedicated pool).
Azure Synapse Link for Dataverse, where Dataverse (the Power Platform database) is the OLTP database. Note that here the data lands in Azure Data Lake Storage Gen2, not a dedicated SQL pool.
CosmosDB
First you need to enable Azure Synapse Link on the Cosmos DB resource in the Portal by clicking a button.
Alternatively this can be done in the Azure CLI by running:
az cosmosdb update --name my-cosmos-db --resource-group my-rg --enable-analytical-storage true
or Powershell:
Update-AzCosmosDBAccount -Name "my-cosmos-db" -ResourceGroupName "my-rg" -EnableAnalyticalStorage 1
Once enabled it cannot be disabled.
Dynamic Schema Maintenance
As schema changes are made in the upstream OLTP store, they are replicated down to the analytical store. The schema is inferred from the JSON documents.
There are two schema representation types:
Well-defined: the first instance of a property in the JSON determines its data type; later values of a different type are not represented in the analytical store.
Full fidelity: each value is stored together with its data type, so the type can vary between items. This is the default for the API for MongoDB.
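To make the difference concrete, here is a minimal pure-Python sketch of the two schema types (an illustration only, not Cosmos DB's actual implementation; the function names are made up):

```python
def well_defined(docs):
    """First occurrence of a property fixes its type; later values of a
    different type are dropped (sketch of the well-defined schema)."""
    schema, rows = {}, []
    for doc in docs:
        row = {}
        for key, value in doc.items():
            schema.setdefault(key, type(value))  # first instance wins
            if isinstance(value, schema[key]):
                row[key] = value                 # conforming values kept
        rows.append(row)
    return rows

def full_fidelity(docs):
    """Every value is stored together with its type, so types may vary
    between items (sketch of the full fidelity schema)."""
    return [{k: {"value": v, "type": type(v).__name__} for k, v in d.items()}
            for d in docs]

docs = [{"id": "1", "price": 10}, {"id": "2", "price": "10.5"}]

print(well_defined(docs))   # the second price is a string, so it is dropped
print(full_fidelity(docs))  # both prices kept, each tagged with its type
```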
Cosmos DB stores data in containers, and the analytical store is enabled per container. You can do this when creating the container in the Portal (in addition to enabling Synapse Link on the account, as above).
Or in the Azure CLI:
az cosmosdb sql container create --resource-group my-rg --account-name my-cosmos-db --database-name my-db --name my-container --partition-key-path "/productID" --analytical-storage-ttl -1
or Powershell:
New-AzCosmosDBSqlContainer -ResourceGroupName "my-rg" -AccountName "my-cosmos-db" -DatabaseName "my-db" -Name "my-container" -PartitionKeyKind "hash" -PartitionKeyPath "/productID" -AnalyticalStorageTtl -1
As with the account-level setting, this config cannot be removed without deleting the container.
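The `--analytical-storage-ttl` / `-AnalyticalStorageTtl` value in the commands above controls how long items are retained in the analytical store. A small sketch of the documented semantics (a hypothetical helper, not an Azure SDK function):

```python
def describe_analytical_ttl(ttl):
    """Interpret the analytical store TTL setting (hypothetical helper;
    the semantics follow the Cosmos DB documentation)."""
    if ttl is None or ttl == 0:
        return "analytical store disabled"
    if ttl == -1:
        return "items retained in the analytical store indefinitely"
    if ttl > 0:
        return f"items retained in the analytical store for {ttl} seconds"
    raise ValueError("invalid analytical TTL")

print(describe_analytical_ttl(-1))    # the value used in the examples above
print(describe_analytical_ttl(7200))  # retain for two hours
```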
Linked Service
Next, in Synapse Studio, you need to create a linked service to the Cosmos DB account, to be able to pull the data into Synapse ("Connect to external data").
Query from Spark
You can use a Spark pool to query the Cosmos DB analytical store via the linked service:
df = spark.read\
    .format("cosmos.olap")\
    .option("spark.synapse.linkedService", "my_linked_service")\
    .option("spark.cosmos.container", "my-container")\
    .load()

display(df.limit(10))
It says "The data is loaded from the analytical store in the container, not from the operational store"
You can also write data back (writes go to the operational store, using the cosmos.oltp format):
mydf.write.format("cosmos.oltp").option("spark.synapse.linkedService", "my_linked_service").option("spark.cosmos.container", "my-container").mode("append").save()
and you can run SQL code against it from Spark, e.g. by creating a Spark table over the container and querying it with %%sql.
Query from Serverless pool
In addition to using a Spark pool, you can also query an Azure Cosmos DB analytical container by using the built-in serverless SQL pool in Azure Synapse Analytics:
SELECT *
FROM OPENROWSET(
    'CosmosDB',
    'Account=my-cosmos-db;Database=my-db;Key=abcd1234....==',
    [my-container]
) AS products_data