Exam DP203 Hybrid Transactional Analytical

Revision as of 22:13, 20 November 2024

Hybrid Transactional / Analytical Processing (HTAP) is where Azure Synapse Link replicates transactional data into an analytical data store.

From CosmosDB: Azure Synapse Link for CosmosDB

For Azure SQL Database or a SQL Server instance: Azure Synapse Link for SQL (to a dedicated pool, and a spark pool can connect to that dedicated pool).

Azure Synapse Link for Dataverse. Here Dataverse (the Power Platform database) is the OLTP database. Note that in this case the data goes into Azure Data Lake Storage Gen2, not a dedicated SQL pool.

CosmosDB

First you need to enable Azure Synapse Link in the Cosmos DB resource in the Portal by clicking a button.

Alternatively this can be done in the Azure CLI by running:

az cosmosdb update --name my-cosmos-db --resource-group my-rg --enable-analytical-storage true

or PowerShell:

Update-AzCosmosDBAccount -Name "my-cosmos-db" -ResourceGroupName "my-rg" -EnableAnalyticalStorage $true

Once enabled it cannot be disabled.

Dynamic Schema Maintenance

As schema changes are made in the upstream OLTP database, they are replicated down to the analytical store, where the schema is inferred from the JSON documents.

There are two schema representation types:

Well-defined: the first instance of a field in the JSON determines its data type; later values of a different type are not represented in that column

Full fidelity: each instance of a value is stored together with its data type, allowing the type to change between documents. This is the default for Azure Cosmos DB for MongoDB accounts
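The difference between the two representations can be sketched in plain Python. This is a hypothetical illustration of the inference rules only, not the real Synapse Link engine:

```python
# Hypothetical illustration of the two schema representations
# (not the real Synapse Link inference engine).
docs = [
    {"productID": 1, "price": 9.99},    # first document: price is a float
    {"productID": 2, "price": "call"},  # later document: price is a string
]

def well_defined_schema(docs):
    """Well-defined: the first instance of a field fixes its type;
    later values of a different type are dropped (None here)."""
    types = {}
    rows = []
    for doc in docs:
        row = {}
        for key, value in doc.items():
            types.setdefault(key, type(value))
            row[key] = value if isinstance(value, types[key]) else None
        rows.append(row)
    return rows

def full_fidelity_schema(docs):
    """Full fidelity: each value is stored with a type suffix,
    so both representations of 'price' survive side by side."""
    rows = []
    for doc in docs:
        rows.append({f"{k}.{type(v).__name__}": v for k, v in doc.items()})
    return rows

print(well_defined_schema(docs))
print(full_fidelity_schema(docs))
```

In the well-defined output the second document's price is lost (it arrived as a string after the type was fixed as a float); in the full-fidelity output both values survive under type-suffixed column names.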

Data in Cosmos DB is stored in containers. You can enable Azure Synapse Link either per container when you create the container, or (as above) at the account level in the portal.

Or in the Azure CLI:

az cosmosdb sql container create --resource-group my-rg --account-name my-cosmos-db --database-name my-db --name my-container --partition-key-path "/productID" --analytical-storage-ttl -1

or PowerShell:

New-AzCosmosDBSqlContainer -ResourceGroupName "my-rg" -AccountName "my-cosmos-db" -DatabaseName "my-db" -Name "my-container" -PartitionKeyKind "hash" -PartitionKeyPath "/productID" -AnalyticalStorageTtl -1

Like with the account-level setting, this configuration cannot be removed without deleting the container. An analytical storage TTL of -1 means the analytical data is retained indefinitely.

Linked Service

Next, in Synapse Studio, you need to create a linked service to the CosmosDB, to be able to pull the data into Synapse. "Connect to external data".

Query from Spark

You can apparently use a Spark pool to query the CosmosDB using the linked service.

 df = spark.read\
     .format("cosmos.olap")\
     .option("spark.synapse.linkedService", "my_linked_service")\
     .option("spark.cosmos.container", "my-container")\
     .load()

display(df.limit(10))

It says "The data is loaded from the analytical store in the container, not from the operational store"

You can also write data back to the container's operational (OLTP) store using mydf.write.format("cosmos.oltp") with the same linked service and container options,

and you can run Spark SQL against the DataFrame by first registering it as a temporary view.

Query from Serverless pool

In addition to using a Spark pool, you can also query an Azure Cosmos DB analytical container by using the built-in serverless SQL pool in Azure Synapse Analytics:

SELECT *
FROM OPENROWSET(
    'CosmosDB',
    'Account=my-cosmos-db;Database=my-db;Key=abcd1234....==',
    [my-container]) AS products_data

If the source JSON contains multi-level data, like:

{
    "productID": 126,
    "productName": "Sprocket",
    "supplier": {
        "supplierName": "Contoso",
        "supplierPhone": "555-123-4567"
    },
    "id": "62588f072-11c3-42b1-a738-...",
    "_rid": "mjMaAL...==",
    ...
}

then you can use a SELECT statement like the following, with the WITH clause specifying how to parse these levels:

SELECT *
 FROM OPENROWSET(PROVIDER = 'CosmosDB',
                 CONNECTION = 'Account=my-cosmos-db;Database=my-db',
                 OBJECT = 'my-container',
                 SERVER_CREDENTIAL = 'my_credential'
 )
 WITH (
    ProductNo INT '$.productID',
    ProductName VARCHAR(20) '$.productName',
    Supplier VARCHAR(20) '$.supplier.supplierName',
    SupplierPhoneNo VARCHAR(15) '$.supplier.supplierPhone'
 ) AS products_data
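The effect of the WITH clause can be sketched in plain Python: each column name maps to a JSON path that is resolved against the nested document. Here json_path is a hypothetical helper for illustration, not part of any Azure SDK:

```python
# A rough Python analogue of the WITH clause above: each column is a
# JSON path applied to the nested Cosmos DB document.
doc = {
    "productID": 126,
    "productName": "Sprocket",
    "supplier": {
        "supplierName": "Contoso",
        "supplierPhone": "555-123-4567",
    },
}

def json_path(document, path):
    """Resolve a dotted path like '$.supplier.supplierName'
    (hypothetical helper, simplified: no arrays or escapes)."""
    value = document
    for part in path.lstrip("$.").split("."):
        value = value[part]
    return value

# Column definitions mirroring the WITH clause in the SQL query.
columns = {
    "ProductNo": "$.productID",
    "ProductName": "$.productName",
    "Supplier": "$.supplier.supplierName",
    "SupplierPhoneNo": "$.supplier.supplierPhone",
}

row = {name: json_path(doc, path) for name, path in columns.items()}
print(row)
```

The result is one flat row with the supplier fields pulled up out of the nested object, which is what the serverless SQL pool returns for each document.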