Data World Solution
A cluster is a collection of virtual machines.
A cluster usually has a driver node, which orchestrates the tasks performed by one or more worker nodes.
Clusters allow us to treat this group of computers as a single compute engine via the driver node.
Cluster Types
Single Node Cluster: Only one node (Driver Node). No Worker Nodes. Supports Spark workloads. Suitable for lightweight Machine Learning and Data Analysis. Not horizontally scalable.
Multi Node Cluster: One Driver Node and one or more Worker Nodes. Horizontally scalable. Suitable for large workloads like Spark Jobs.
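To make the difference concrete, here is a minimal sketch that builds a cluster specification as a plain Python dict. The field names (cluster_name, spark_version, node_type_id, num_workers, spark_conf, custom_tags) follow my understanding of the Databricks Clusters REST API and should be checked against the official reference. The helper build_cluster_spec, the cluster names, the VM size, and the runtime string are placeholders made up for illustration.

```python
def build_cluster_spec(name, node_type, runtime, num_workers):
    """Return a dict shaped like a cluster-creation request (illustrative only)."""
    if num_workers < 0:
        raise ValueError("num_workers must be zero or more")
    spec = {
        "cluster_name": name,
        "spark_version": runtime,    # placeholder runtime string
        "node_type_id": node_type,   # placeholder Azure VM size
        "num_workers": num_workers,
    }
    if num_workers == 0:
        # Single node: no workers, so Spark runs locally on the driver.
        # These settings are an assumption about how single-node mode is flagged.
        spec["spark_conf"] = {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        }
        spec["custom_tags"] = {"ResourceClass": "SingleNode"}
    return spec


single = build_cluster_spec("dev-sandbox", "Standard_DS3_v2", "13.3.x-scala2.12", 0)
multi = build_cluster_spec("etl-nightly", "Standard_DS3_v2", "13.3.x-scala2.12", 4)
```

The single-node spec cannot grow beyond the driver, while the multi-node spec scales horizontally by raising num_workers.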
Access Modes
Single User: Only one user can access the cluster. Supports Python, SQL, Scala, and R.
Shared: Multiple users, process isolation. Available on premium workspaces. Supports Python and SQL.
No Isolation Shared: Multiple users, no process isolation. Supports all four languages. Less secure.
Custom: Legacy option, not available in the latest interface.
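The language support listed above can be captured in a small lookup table, which is handy when deciding which access mode a notebook needs. This is only a sketch of the list in these notes, not something read from Databricks itself; the names ACCESS_MODE_LANGUAGES and modes_supporting are made up for illustration, and the legacy Custom mode is left out.

```python
# Language support per access mode, copied from the notes above.
ACCESS_MODE_LANGUAGES = {
    "Single User": {"python", "sql", "scala", "r"},
    "Shared": {"python", "sql"},
    "No Isolation Shared": {"python", "sql", "scala", "r"},
}


def modes_supporting(language):
    """Return the access modes (sorted by name) that support the given language."""
    lang = language.lower()
    return sorted(
        mode for mode, langs in ACCESS_MODE_LANGUAGES.items() if lang in langs
    )


scala_modes = modes_supporting("Scala")
sql_modes = modes_supporting("SQL")
```

For example, a Scala notebook rules out the Shared mode, while SQL works in all three.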
Databricks Runtimes
Databricks Runtime: Includes Apache Spark, libraries for Java, Scala, Python, and R, and GPU libraries (used on GPU-enabled clusters).
Databricks Runtime ML: Adds machine learning libraries like PyTorch, TensorFlow.
Photon Runtime: Adds Photon Engine for faster SQL workload processing.
Databricks Runtime Light: For automated workloads, no advanced features.
Auto Termination
Terminates idle clusters to avoid costs. Default value is 120 minutes, adjustable between 10 and 10,000 minutes.
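As a quick illustration of these limits, the sketch below validates a requested idle timeout against the default and range stated above. The function name autotermination_minutes and the constants are hypothetical and exist only for this example.

```python
DEFAULT_MINUTES = 120                    # default from the notes above
MIN_MINUTES, MAX_MINUTES = 10, 10_000    # allowed range from the notes above


def autotermination_minutes(requested=None):
    """Return a valid idle timeout in minutes, or raise if it is out of range."""
    if requested is None:
        return DEFAULT_MINUTES
    if not MIN_MINUTES <= requested <= MAX_MINUTES:
        raise ValueError(
            f"timeout must be between {MIN_MINUTES} and {MAX_MINUTES} minutes"
        )
    return requested
```

Calling it with no argument returns the default, and a value such as 5 is rejected because it falls below the minimum.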
Auto Scaling
Dynamically adjusts the number of Worker Nodes based on workload. Not recommended for streaming workloads.
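The idea of scaling between a minimum and maximum number of workers can be sketched with a toy sizing rule. This is not the algorithm Databricks uses; it assumes a made-up load signal (pending tasks) and a made-up capacity per worker, and simply clamps the result to the configured bounds.

```python
import math


def target_workers(pending_tasks, tasks_per_worker, min_workers, max_workers):
    """Toy rule: size the cluster to the backlog, clamped to [min, max] workers."""
    needed = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))


idle = target_workers(0, 8, 2, 8)        # no backlog: stay at the minimum
busy = target_workers(40, 8, 2, 8)       # 40 tasks at 8 per worker: 5 workers
swamped = target_workers(1000, 8, 2, 8)  # large backlog: capped at the maximum
```

The bounds are what keep cost predictable: the cluster never shrinks below min_workers and never grows past max_workers, whatever the workload.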
Azure VM Types
Memory Optimized: For memory-intensive tasks like ML.
Compute Optimized: For structured streaming applications.
Storage Optimized: For high disk throughput needs.
General Purpose: For enterprise-grade applications and analytics.
GPU Accelerated: For deep learning models.
Cluster Policies
Simplifies cluster configuration for standard users. Limits cluster size and options to ensure cost control. Available only on premium tier.
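A rough sketch of how a policy can limit cluster size and options is shown below. The rule types (fixed, range, allowlist) are modeled loosely on my understanding of Databricks policy definitions, but the checker itself (violations), the POLICY contents, and the VM sizes are hypothetical and only illustrate the idea.

```python
# Hypothetical policy: cap workers, pin the idle timeout, restrict VM sizes.
POLICY = {
    "num_workers": {"type": "range", "maxValue": 4},
    "autotermination_minutes": {"type": "fixed", "value": 30},
    "node_type_id": {
        "type": "allowlist",
        "values": ["Standard_DS3_v2", "Standard_DS4_v2"],
    },
}


def violations(spec, policy):
    """Return a list of messages for every policy rule the spec breaks."""
    problems = []
    for field, rule in policy.items():
        value = spec.get(field)
        kind = rule["type"]
        if kind == "fixed" and value != rule["value"]:
            problems.append(f"{field} must be {rule['value']}")
        elif kind == "allowlist" and value not in rule["values"]:
            problems.append(f"{field} must be one of {rule['values']}")
        elif kind == "range":
            low = rule.get("minValue", float("-inf"))
            high = rule.get("maxValue", float("inf"))
            if value is None or not low <= value <= high:
                problems.append(f"{field} must be between {low} and {high}")
    return problems


ok_spec = {
    "num_workers": 2,
    "autotermination_minutes": 30,
    "node_type_id": "Standard_DS3_v2",
}
too_big = dict(ok_spec, num_workers=16)
```

Here ok_spec passes with no violations, while too_big is rejected because 16 workers exceeds the cap of 4.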
11 months ago