Exam DP-203 SQL Management

From MillerSql.com

SQL Pool management

In dedicated SQL pools, the unit of scale is an abstraction of compute power known as a Data Warehouse Unit (DWU).

You can scale a Synapse SQL dedicated pool through:

  1. Azure portal (click Scale, then drag the slider)
  2. Synapse Studio (right-click the pool, then Scale, and drag the slider)
  3. TSQL: ALTER DATABASE mySampleDataWarehouse MODIFY (SERVICE_OBJECTIVE = 'DW300c')
  4. PowerShell: Set-AzSqlDatabase -ResourceGroupName "resourcegroupname" -DatabaseName "mySampleDataWarehouse" -ServerName "sqlpoolservername" -RequestedServiceObjectiveName "DW300c"
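To confirm a scale operation, the current service objective can be read back from the master database of the logical server. A sketch, assuming the standard Azure SQL catalog views are available:

```sql
-- Run in the master database of the logical server.
-- Shows each database/pool and its current service objective (e.g. DW300c).
SELECT db.[name], ds.service_objective
FROM sys.databases AS db
JOIN sys.database_service_objectives AS ds
    ON db.database_id = ds.database_id;
```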

Spark Pools

To do the same with Spark pools, you can click "Scale" in Synapse Studio.

Spark Pool Autoscale

Autoscale is enabled via "Auto Scale Settings" in the Azure portal, or "Scale" in Synapse Studio.

Autoscale collects the following metrics as it runs:

  • Total Pending CPU - the total number of cores required to start execution of all pending jobs.
  • Total Pending Memory - the total memory (in MB) required to start execution of all pending jobs.
  • Total Free CPU - the sum of all unused cores on the active nodes.
  • Total Free Memory - the sum of unused memory (in MB) on the active nodes.
  • Used Memory per Node - the load on a node. A node on which 10 GB of memory is used is considered under more load than a worker with 2 GB of used memory.

The following conditions then trigger autoscaling:

Scale-up:
  • Total pending CPU is greater than total free CPU for more than 1 minute.
  • Total pending memory is greater than total free memory for more than 1 minute.

Scale-down:
  • Total pending CPU is less than total free CPU for more than 2 minutes.
  • Total pending memory is less than total free memory for more than 2 minutes.

Fundamentally, the pool scales up when the pending (required) CPU or memory exceeds what is free, and scales down when free capacity comfortably exceeds what is pending.

Pool pausing

To save money, pause SQL pools and Spark pools that are not being used. This can be done in Synapse Studio for both SQL and Spark pools, or in the portal for SQL pools. Spark pools can also be configured to auto-pause after a period of idleness (for example 15 minutes).
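Pausing a dedicated SQL pool can also be scripted in PowerShell. A sketch using the Az.Sql Suspend/Resume cmdlets; the resource names are placeholders matching the scaling example above:

```powershell
# Pause (suspend) the dedicated SQL pool - compute is released, storage is kept.
Suspend-AzSqlDatabase -ResourceGroupName "resourcegroupname" `
    -ServerName "sqlpoolservername" `
    -DatabaseName "mySampleDataWarehouse"

# Resume it later when it is needed again.
Resume-AzSqlDatabase -ResourceGroupName "resourcegroupname" `
    -ServerName "sqlpoolservername" `
    -DatabaseName "mySampleDataWarehouse"
```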

Workload prioritization

You can set the following properties on workloads, in order to allow the system to choose which ones to prioritize if more than one is running at the same time:

  • Workload Classification
  • Workload Importance
  • Workload Isolation

To create a workload group, run:

    CREATE WORKLOAD GROUP CEODemo WITH
    ( MIN_PERCENTAGE_RESOURCE = 50            -- share of resources reserved for this group (integer)
     ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 25 -- minimum resources granted to each request
     ,CAP_PERCENTAGE_RESOURCE = 100           -- maximum share of resources the group may use
    );

This creates workload group CEODemo; any request that runs under it has the stated resources at its disposal.
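To verify the group and its settings, the workload management catalog views of the dedicated SQL pool can be queried. A sketch, with the column names taken from the documented view (confirm against your pool):

```sql
-- List configured workload groups and their resource settings.
SELECT name,
       min_percentage_resource,
       cap_percentage_resource,
       request_min_resource_grant_percent
FROM sys.workload_management_workload_groups;
```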

Next, to create a Workload Classifier, setting the importance of a login, run:

    CREATE WORKLOAD CLASSIFIER CEO
      WITH (WORKLOAD_GROUP = 'CEODemo'
      ,MEMBERNAME = 'asa.sql.workload01'
      ,IMPORTANCE = HIGH);

This creates classifier "CEO" that applies to login "asa.sql.workload01", giving queries run by this login a high importance and assigning them to the CEODemo workload group.

This means that when that login runs queries, a SELECT (from a different connection) on sys.dm_pdw_exec_requests shows them with importance = high, and the queries run with the resources granted by the workload group.
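As a sketch, such a check might look like the following (column names assumed from the documented DMV):

```sql
-- From a separate connection: see classified requests, their importance,
-- and the workload group they were assigned to.
SELECT request_id, [status], importance, group_name, command
FROM sys.dm_pdw_exec_requests
WHERE importance IS NOT NULL
ORDER BY submit_time DESC;
```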

Other

[Max Concurrency] = [CAP_PERCENTAGE_RESOURCE] / [REQUEST_MIN_RESOURCE_GRANT_PERCENT]

For example, the CEODemo group above (CAP_PERCENTAGE_RESOURCE = 100, REQUEST_MIN_RESOURCE_GRANT_PERCENT = 25) allows 100 / 25 = 4 concurrent requests.

Azure Advisor

Click Advisor in the Azure portal to open the Advisor app. It displays tiles giving information on the following:

  • Cost
  • Security
  • Reliability
  • Operational excellence
  • Performance

You can click each tile for more information and recommendations, for example to create statistics on table columns, and to view the impacted tables.

It bases its recommendations on the following telemetry data:

  • Data Skew and replicated table information.
  • Column statistics data.
  • TempDB utilization data.
  • Adaptive Cache.

and is updated every 24 hours.

Dynamic Management Views

The following DMVs can be queried:

sys.dm_pdw_exec_sessions - who is logged in

sys.dm_pdw_exec_requests (request_id) - who is running stuff.

sys.dm_pdw_request_steps - get the distributed SQL (DSQL) plan for the Request_id. When a DSQL plan is taking longer than expected, the cause can be a complex plan, so consider optimizing your table distributions to reduce data movement.
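When a request is slow, a sketch of drilling into its steps (the QID value is a placeholder; replace it with the request_id from sys.dm_pdw_exec_requests):

```sql
-- Find the slowest steps of a given request; long data movement (DMS) steps
-- often point at a distribution or skew problem.
SELECT step_index, operation_type, [status], total_elapsed_time, row_count
FROM sys.dm_pdw_request_steps
WHERE request_id = 'QID1234'   -- placeholder request id
ORDER BY total_elapsed_time DESC;
```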

sys.dm_pdw_sql_requests - execution information of the query steps

DBCC PDW_SHOWEXECUTIONPLAN(distribution_id, spid) - show the query plan running on a given distribution, e.g. DBCC PDW_SHOWEXECUTIONPLAN(1, 78);

sys.dm_pdw_dms_workers - information about all the workers completing a Data Movement Step. This might show if there is "data skew", often caused by large numbers of nulls, that result in a concentration of data on single nodes and not spread out.
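A common way to check for skew is DBCC PDW_SHOWSPACEUSED, which reports space and row counts per distribution (the table name below is a placeholder):

```sql
-- One result row per distribution; widely differing ROWS values indicate skew.
DBCC PDW_SHOWSPACEUSED('dbo.FactSales');
```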

A Dynamic Management View (DMV) only contains the most recent 10,000 rows of data, so on a heavily-used system the data can roll off within minutes. Other DMV monitoring areas include:

  • Monitoring waits
  • Monitoring tempdb
  • Monitoring memory
  • Monitoring transaction log
  • Monitoring PolyBase