Exam DP203 Synapse Analytics Security
Synapse Analytics Security
Firewall rules
Determines what type of traffic a client IP address has to the Synapse workspace. Applies to all public endpoints. Create rules of IP address ranges in the Azure Portal properties window of the workspace.
Make sure that the firewall on your network and local computer allows outgoing communication on TCP ports 80, 443 and 1443 for Synapse Studio.
Also, you need to allow outgoing communication on UDP port 53 for Synapse Studio. To connect using tools such as SSMS and Power BI, you must allow outgoing communication on TCP port 1433.
Virtual Networks
Azure Virtual Network (VNet). Allows Azure resources to communicate with other Azure resources, local machine, and on-premise networks.
Managed workspace Virtual Network is managed by Synapse Analytics, and can only be enabled at the point the workspace is created (a checkbox). It offers the following:
- No need to configure inbound network security groups, nor subnets for Spark.
- Prevents data exfiltration.
- Allows you to create Managed private endpoints.
- Provides user-level isolation for Spark activities because each Spark cluster is in its own subnet.
Dedicated SQL pool and serverless SQL pool are multi-tenant capabilities and therefore reside outside of the Managed workspace - to communicate between these, use Azure private links.
Private Endpoints
In Synapse Studio, in the Manage tab, click on "Private Endpoints" on the left hand side. This allows you to set private endpoints up.
Only available if the workspace is enabled for Managed Workspace Virtual Network. You must create the virtual network first.
Allows connection to other Azure services, e.g. Azure Cosmos DB, over a "private link". Note that traffic doesn't go outside Microsoft's backbone network.
Private endpoints are mapped to a specific resource in Azure and not the entire service.
Conditional Access - Microsoft Entra ID
In the Azure Portal, select Microsoft Entra ID.
In here select Conditional Access. Me: This option doesn't seem to be available.
In the Conditional Access-Policies blade, click New policy, provide a name, and then click Configure rules
Under Assignments choose the users/groups the policy should apply to.
Then under cloud apps choose the Azure services the policy will apply to. For example Azure SQL Database. (In the case of Azure SQL Database, you might have to add the user as a SQL login).
Then under Access Control, select whether to grant or block access, and also which policies should apply. For example "Require Multifactor Authentication", or "Require domain-joined devices".
Azure services must authenticate with each other.
Types of security
Microsoft Entra ID. Accounts for users and groups in Entra.
Managed Identities. These are accounts for services in Entra.
SQL Authentication: Username and password created in dedicated SQL pools.
Storage Account Keys a single key is generated for each storage account. If you have this key you can access the storage account. Rather than sending out this key, it is better to keep it in the Azure key vault, and give out access to the vault instead. Portal - storage account - settings - keys.
Shared access signatures. A manually created code to send to external accounts to allow access. The client can append it to a URL.Service-level vs account-level shared access signatures.
The Synapse workspace managed identity needs permissions to perform operations in the pipelines.
You can retrieve the managed identity in the Azure portal in Overview - Managed Identity Object
When you go to create a Linked Service in Synapse Studio, pointing say to Gen2 storage, you select the Azure subscription from the popup list, and then it will tell you what the Managed Identity is, allowing you to select it.
SSMS can connect to Synapse using Active Directory Universal Authentication
Database column-level security
Dedicated SQL pools are in scope for column and row level security. The pool must be registered with AAD - Azure Active Directory, and you must have contributor access to the underlying storage account.
Run SQL command:
GRANT SELECT ON Membership(MemberID, FirstName, LastName, Phone, Email) TO TestUser;
to grant select permission on the Membership table, those listed columns only.
Alternatively create a view with the columns, and give the user access to it.
Database row-level security
CREATE SECURITY POLICY[!INCLUDEtsql] statement.
Table-value functions; Filter predicates, not block predicates.
The user must have the SELECT or EXECUTE permission on the predicate function (table function?). If SCHEMABINDING = OFF they also need access to the underlying table.
It is a good idea to create a new schema to be used with these permissions, in order to distinguish them from permissions on the underlying tables.
ALTER ANY SECURITY POLICY permission should only be intended for highly privileged users.
Try to avoid recursion in predicate functions, as it is bad for performance.
Create Login. Create User.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<user_password>'; CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Service Identity'; CREATE EXTERNAL DATA SOURCE ext_datasource_with_abfss WITH (TYPE = hadoop, LOCATION = 'abfss://<file_system_name@storage_account>.dfs.core.windows.net', CREDENTIAL = msi_cred); CREATE EXTERNAL FILE FORMAT MSIFormat WITH (FORMAT_TYPE=DELIMITEDTEXT); CREATE EXTERNAL TABLE Sales_ext WITH (LOCATION='<your_table_name>', DATA_SOURCE=ext_datasource_with_abfss, FILE_FORMAT=MSIFormat, REJECT_TYPE=Percentage, REJECT_SAMPLE_VALUE=100, REJECT_VALUE=100) AS SELECT * FROM sales;
In the following, the table function I think runs on each row, returning 1 if the SalesRep column value is equal to the currently logged in user.
CREATE SCHEMA Security; GO CREATE FUNCTION Security.fn_securitypredicate(@SalesRep AS sysname) RETURNS TABLE WITH SCHEMABINDING AS RETURN SELECT 1 AS fn_securitypredicate_result WHERE @SalesRep = USER_NAME() OR USER_NAME() = 'Manager'; CREATE SECURITY POLICY SalesFilter_ext ADD FILTER PREDICATE Security.fn_securitypredicate(SalesRep) ON dbo.Sales_ext WITH (STATE = ON); ALTER SECURITY POLICY SalesFilter_ext WITH (STATE = OFF);
Dynamic Data Masking
Azure SQL Managed Instance?
This is configured in the dedicated SQL pool.
In the portal, go to the dedicated SQL Pool resource, which is one of the resources in the resource group. In here, under the "Security" menu on the left hand side, select Dynamic Data Masking.
Here you add masking rules. Where the column to be masked is given, and the masking function to be used (say a credit card number - can only see the last 4 digits). Need to set the masking policy for the database, and then the individual rules.
Get-AzSqlDatabaseDataMaskingPolicy is the powershell cmdlet to be used (give the resourcegroup name as well). Also Set...
Get-AzSqlDatabaseDataMaskingRule
You can also GET or PUT a masking request via an API call:
PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Sql/servers/{serverName}/databases/{databaseName}/dataMaskingPolicies/Default/rules/{dataMaskingRuleName}?api-version=2021-06-01
Transparent Data Encryption - TDE
Uses a Database Encryption Key (DEK). This key is either service-managed (Microsoft keeps the key secret), or customer-managed (key saved into the Azure key vault).
Server-level. Same key is used in the destination of geo-replication.
This managed key never leaves the key vault. The TDE Protector can be generated by the key vault.
You can audit and report on all the TDE protectors in the vault.
In the portal, go to the dedicated SQL Pool resource, which is one of the resources in the resource group. In here, under the "Security" menu on the left hand side, select Transparent Data Encryption.
If you move an encrypted database, it stays encrypted. If you instead export it, it is unencrypted into a BACPAC file.
When you want to access files from the Azure Data Lake Storage Gen 2 within your Azure Synapse Analytics Workspace, it uses AAD passthrough for the authentication. Therefore, there is no need to use TokenLibrary. However, to connect to other linked services, you are enabled to make a direct call to the TokenLibrary.
from pyspark.sql import SparkSession sc = SparkSession.builder.getOrCreate() token_library = sc._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary connection_string = token_library.getConnectionString("<LINKED SERVICE NAME>") print(connection_string) or from pyspark.sql import SparkSession sc = SparkSession.builder.getOrCreate() token_library = sc._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary accountKey = token_library.getConnectionStringAsMap("<LINKED SERVICE NAME>").get("<KEY NAME>") print(accountKey)