Exam DP203 front-end services Databricks
Revision as of 17:35, 21 November 2024
The second front-end service is Databricks.
Access the Databricks portal from the Azure Portal by opening the Databricks resource and launching the Databricks workspace from there.
Databricks supports Python, Scala, R, and Spark SQL, along with multiple machine learning frameworks.
Delta Lake
Governance: Unity Catalog and Microsoft Purview
<pre>
df = spark.sql("SELECT * FROM products")
df = df.filter("Category == 'Road Bikes'")
display(df)
</pre>
Databricks File System (DBFS)
Matplotlib, Seaborn
filtered_df = df.filter(df["Age"] > 30)
You can install Python libraries such as pandas, NumPy, or scikit-learn on a cluster. Spark MLlib is available for machine learning.
<pre>
# Create a sample DataFrame
data = [("Alice", 34), ("Bob", 45), ("Cathy", 29)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
# Select columns
df.select("Name").show()
# Filter rows
df.filter(df["Age"] > 30).show()
# Group by and aggregate
df.groupBy("Age").count().show()
</pre>
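The DataFrame calls above need a running Spark session (the `spark` object that a Databricks notebook provides). As a rough sketch for checking what each operation returns without a cluster, the same select, filter, and group-by-count steps can be mimicked in plain Python. This is only an illustration of the semantics, not how Spark executes them; the variable names `rows`, `names`, `over_30`, and `age_counts` are made up for this example.

```python
from collections import Counter

# Same sample rows as the Spark example, held as plain dictionaries
# (a stand-in for a DataFrame, not a Spark object).
data = [("Alice", 34), ("Bob", 45), ("Cathy", 29)]
columns = ["Name", "Age"]
rows = [dict(zip(columns, record)) for record in data]

# Counterpart of df.select("Name"): keep a single column.
names = [row["Name"] for row in rows]

# Counterpart of df.filter(df["Age"] > 30): keep rows matching a predicate.
over_30 = [row for row in rows if row["Age"] > 30]

# Counterpart of df.groupBy("Age").count(): count rows per distinct age.
age_counts = Counter(row["Age"] for row in rows)

print(names)
print(over_30)
print(dict(age_counts))
```

With this sample data every age is distinct, so the group-by step yields a count of 1 for each age, and the filter keeps Alice and Bob.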