Exam DP203 Hybrid Transactional Analytical

From MillerSql.com

Hybrid Transactional / Analytical Processing (HTAP) is where Azure Synapse Link replicates transactional (OLTP) data into an analytical data store. There are three variants, depending on the source:

From CosmosDB: Azure Synapse Link for CosmosDB

For Azure SQL Database or a SQL Server instance: Azure Synapse Link for SQL (replicates to a dedicated SQL pool; a Spark pool can then connect to that dedicated pool).

For Dataverse: Azure Synapse Link for Dataverse, where Dataverse (the Power Platform database) is the OLTP database. Note that here the data lands in Azure Data Lake Storage Gen2 rather than in a dedicated pool.

CosmosDB

First, enable Azure Synapse Link on the CosmosDB resource (the account) in the Portal by clicking a button.

Alternatively this can be done in the Azure CLI by running:

az cosmosdb update --name my-cosmos-db --resource-group my-rg --enable-analytical-storage true

or PowerShell:

Update-AzCosmosDBAccount -Name "my-cosmos-db" -ResourceGroupName "my-rg" -EnableAnalyticalStorage 1

Once enabled on the account, it cannot be disabled.

Dynamic Schema Maintenance

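As schema changes are made in the upstream OLTP store, they are replicated down to the analytical store. The source data is JSON, so the analytical schema is derived from the documents as they arrive.

There are two schema types:

Well-defined: the first instance of a property in the JSON determines its data type. This is the default for accounts using the SQL API.

Full fidelity: each instance of data is sent with its data type, which allows for changes in data type. This is the default for accounts using the MongoDB API.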
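To make the difference between the two schema types concrete, here is a toy Python model of the rule described above. It is an illustration only: the function names and the sample documents are made up for this sketch, and it does not reproduce how the analytical store is actually implemented.

```python
# Toy model of the two analytical store schema types (illustration only;
# function names and sample documents are hypothetical).

def well_defined_schema(docs):
    """The first occurrence of each property fixes its type."""
    schema = {}
    mismatches = []  # (document index, property, type seen) that disagree
    for i, doc in enumerate(docs):
        for prop, value in doc.items():
            seen = type(value).__name__
            if prop not in schema:
                schema[prop] = seen
            elif schema[prop] != seen:
                mismatches.append((i, prop, seen))
    return schema, mismatches


def full_fidelity_schema(docs):
    """Every value carries its own type, so a property may have several."""
    schema = {}
    for doc in docs:
        for prop, value in doc.items():
            schema.setdefault(prop, set()).add(type(value).__name__)
    return schema


docs = [
    {"productID": 1, "price": 9.99},
    {"productID": 2, "price": "ten"},  # price changes type upstream
]

wd_schema, wd_mismatches = well_defined_schema(docs)
ff_schema = full_fidelity_schema(docs)
print(wd_schema)
print(wd_mismatches)
print(ff_schema)
```

In the well-defined model the second price does not fit the type fixed by the first document, whereas the full fidelity model simply records both types for the same property.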

CosmosDB stores its data in containers. With Azure Synapse Link enabled on the account (as above), analytical store support is then switched on for each container you want to analyse. This can be done in the container's settings in the portal.

Or in the Azure CLI:

az cosmosdb sql container create --resource-group my-rg --account-name my-cosmos-db --database-name my-db --name my-container --partition-key-path "/productID" --analytical-storage-ttl -1

or PowerShell:

New-AzCosmosDBSqlContainer -ResourceGroupName "my-rg" -AccountName "my-cosmos-db" -DatabaseName "my-db" -Name "my-container" -PartitionKeyKind "hash" -PartitionKeyPath "/productID" -AnalyticalStorageTtl -1

As with the account-level setting, analytical store support cannot be removed from a container without deleting the container.
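The -1 passed as the analytical storage TTL in the commands above means "keep analytical data indefinitely". As I understand the setting, 0 (or leaving it unset) means the analytical store is not enabled for the container, and a positive number is a retention period in seconds. A small Python sketch of that reading, where describe_analytical_ttl is a hypothetical helper name and not part of any Azure SDK:

```python
# Hypothetical helper: explains what an analytical storage TTL value means.
# The interpretation of each value is my understanding of the setting.

def describe_analytical_ttl(ttl):
    if ttl is None or ttl == 0:
        return "analytical store not enabled"
    if ttl == -1:
        return "retain analytical data indefinitely"
    if ttl > 0:
        return f"expire analytical data {ttl} seconds after last modification"
    raise ValueError(f"unsupported analytical TTL: {ttl}")


for value in (None, 0, -1, 86400):
    print(value, "->", describe_analytical_ttl(value))
```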

Linked Service

Next, in Synapse Studio, create a linked service to the CosmosDB account so that its data can be queried from Synapse. This is done through "Connect to external data".

Query from Spark

You can use a Spark pool to query the CosmosDB analytical store through the linked service:

df = spark.read\
    .format("cosmos.olap")\
    .option("spark.synapse.linkedService", "my_linked_service")\
    .option("spark.cosmos.container", "my-container")\
    .load()

display(df.limit(10))
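If several containers are read through the same linked service, the reader options can be collected in one place. A minimal sketch: cosmos_olap_options is a hypothetical helper (not a Synapse or Spark API), and the commented-out read assumes a Synapse notebook attached to a Spark pool, where spark is predefined.

```python
# Hypothetical helper that builds the reader options for one container.

def cosmos_olap_options(linked_service, container):
    return {
        "spark.synapse.linkedService": linked_service,
        "spark.cosmos.container": container,
    }


opts = cosmos_olap_options("my_linked_service", "my-container")
print(opts)

# In a Synapse notebook (assumed environment):
# df = spark.read.format("cosmos.olap").options(**opts).load()
```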