Sumit Mittal
14 Important points to understand Serverless Compute in Databricks
1. If you have created classic compute, you know you typically wait 3-7 minutes for a cluster to start, and scaling up takes a similar amount of time. Serverless reduces this startup time drastically.
2. It's simple: you no longer have to set hundreds of infra settings just to balance the performance-cost tradeoff.
3. For serverless jobs you just pick a goal: standard or performance. With standard, performance is on par with classic compute; with performance, you get very quick startup and your workloads run efficiently. See the sketch below for setting this on a job.
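A minimal sketch of picking that goal when creating a serverless job with the Python databricks-sdk. The performance_target field and PerformanceTarget enum are how recent SDK/API versions express this goal, but treat the exact names as an assumption and check your SDK version; the notebook path is hypothetical.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up host/token from your local Databricks config

# No cluster spec on the task means it runs on serverless compute.
# performance_target chooses the standard vs. performance goal;
# field/enum names are assumptions, verify against your SDK version.
job = w.jobs.create(
    name="serverless-demo",
    performance_target=jobs.PerformanceTarget.PERFORMANCE_OPTIMIZED,
    tasks=[
        jobs.Task(
            task_key="main",
            notebook_task=jobs.NotebookTask(notebook_path="/Users/me/etl"),  # hypothetical path
        )
    ],
)
print(job.job_id)
```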
4. Serverless compute runs in Databricks' own cloud account (the serverless compute plane). This is a big change from classic compute, which runs in your cloud account: the infrastructure is a fleet of resources Databricks already keeps running, and your workload grabs a few cores from that fleet.
5. In the classic compute world, upgrading DBRs (Databricks Runtime versions) is a pain; it has been reported that 20% of engineers' time goes into just this maintenance work. With serverless, all of this is taken care of.
6. Currently, serverless compute is available for SQL workloads, Lakeflow Jobs, notebooks, and Declarative Pipelines (DLT).
7. Serverless compute is a versionless product: it always runs on the latest version.
8. To track usage and cost, you can set a budget policy with tags and then query the system tables to understand the DBU breakdown, as in the sample query below.
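For example, a query like this breaks DBUs out by tag from the system.billing.usage system table. The table and columns are real per current docs, but the "cost_center" tag key is just a hypothetical example of a budget-policy tag.

```python
# Run from a notebook, where `spark` is the built-in SparkSession.
# The "cost_center" tag key is a hypothetical example of a custom tag.
dbu_by_tag = spark.sql("""
    SELECT
      usage_date,
      sku_name,
      custom_tags['cost_center'] AS cost_center,
      SUM(usage_quantity)        AS dbus
    FROM system.billing.usage
    WHERE usage_date >= current_date() - INTERVAL 30 DAYS
    GROUP BY usage_date, sku_name, custom_tags['cost_center']
    ORDER BY usage_date
""")
dbu_by_tag.show()
```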
9. You can configure an environment (applicable only to serverless). For example, you can keep REPL memory at 16 GB or raise it to 32 GB when you are collecting a lot of data into the REPL, as can happen with Python pandas. A sketch of an environment spec follows below.
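A sketch of pinning an environment on a serverless job task with the databricks-sdk (class and field names as I recall them; verify against your SDK version). Note that notebook REPL memory (16 GB vs. 32 GB) is chosen in the notebook's Environment side panel rather than in this spec.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

# `client` pins the serverless environment version and `dependencies`
# pins pip packages, so you are not stuck on "latest everything".
# The script path, version, and package pin are all hypothetical.
job = w.jobs.create(
    name="serverless-env-demo",
    environments=[
        jobs.JobEnvironment(
            environment_key="default",
            spec=compute.Environment(
                client="2",
                dependencies=["pandas==2.2.2"],
            ),
        )
    ],
    tasks=[
        jobs.Task(
            task_key="main",
            environment_key="default",
            spark_python_task=jobs.SparkPythonTask(python_file="/Workspace/Users/me/etl.py"),
        )
    ],
)
```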
10. Right now, serverless supports SQL and Python.
11. When using serverless, the worker type, the number of workers, and scaling up and down are all taken care of for you, and it is done intelligently, which should significantly reduce cost.
12. With classic compute, if you ask for an 8-node cluster but never run Spark on it, you still pay for all 8 nodes even while they sit idle. With serverless, a cluster is created only when you actually use Spark; otherwise you might be on just one node, e.g. when working in pandas. See the snippet below.
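A tiny illustration of that difference, assuming a serverless notebook (the file path and table name are hypothetical):

```python
import pandas as pd

# Pure pandas runs in the notebook's REPL process; no Spark executors
# are requested for this, which is the single-node case described above.
pdf = pd.read_csv("/Volumes/main/default/raw/sales.csv")
print(pdf.groupby("region")["amount"].sum())

# The first Spark call is what makes serverless allocate executors
# from the shared fleet, sized to the work at hand.
# (`spark` is the notebook's built-in SparkSession.)
sdf = spark.read.table("main.default.sales")
sdf.groupBy("region").sum("amount").show()
```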
13. It's not that Databricks forces you onto the latest version only in serverless; if you have a specific need for an older version, you can select it from the Environments screen.
14. Serverless is the way to go!