Databricks
Databricks is available to SEAD users as a non-standard product
What is Databricks
Databricks is a cloud-based big data processing platform that gives users an integrated environment for collaborating on projects, along with a range of tools for data exploration, visualisation and analysis. Within the Databricks environment, users can:
- build pipelines for streaming data processing
- build and run machine learning tools
- create interactive dashboards
- take advantage of scalable distributed computing capability
Users also have access to the Databricks Academy training subscription (an online library of Databricks training guides). Instructions on how to set up the Databricks workspace are provided in the library drive.
How to allocate a Databricks workspace to a project
To allocate a Databricks workspace to your project, you will need to submit a request to your SEAD administrator. Once your project is allocated a Databricks workspace, it can be accessed from within your Virtual Machine (VM) using the installed Edge or Firefox browsers.
Cost consumption
As Databricks uses separate compute power, projects requesting access to Databricks should consider whether they still need their existing VM sizes. Scaling down existing VMs gives users an opportunity to reduce project costs.
What are the cluster policy arrangements?
Users can be provisioned with the following cluster policy options:
Instance: DS3 v2
- Server purpose: General purpose
- Max autoscale workers: 5
- CPU: 4
- RAM / Databricks Units (DBU): 14 GB / 0.75
Instance: D13 v2
- Server purpose: Memory optimized
- Max autoscale workers: 4
- CPU: 8
- RAM / Databricks Units (DBU): 56 GB / 2
Instance: DS3 v2
- Server purpose: Compute optimized
- Max autoscale workers: 4
- CPU: 16
- RAM / Databricks Units (DBU): 32 GB / 3
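For cost planning, the figures listed above can be turned into a rough upper bound on DBU consumption per hour. The sketch below is illustrative only. It assumes the DBU figure applies per node per hour, that the driver node is the same instance type as the workers, and that the cluster is scaled to its maximum. The names `PolicyOption` and `max_dbu_per_hour` are hypothetical and are not part of Databricks or SEAD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyOption:
    """One cluster policy option, using the figures listed above."""
    purpose: str
    max_workers: int
    dbu_per_node_hour: float  # assumption: the listed DBU value is per node, per hour


OPTIONS = [
    PolicyOption("General purpose", max_workers=5, dbu_per_node_hour=0.75),
    PolicyOption("Memory optimized", max_workers=4, dbu_per_node_hour=2.0),
    PolicyOption("Compute optimized", max_workers=4, dbu_per_node_hour=3.0),
]


def max_dbu_per_hour(option: PolicyOption, include_driver: bool = True) -> float:
    """Upper bound on DBUs per hour when the cluster is fully scaled out.

    Assumption: the driver is the same instance type as the workers.
    """
    nodes = option.max_workers + (1 if include_driver else 0)
    return nodes * option.dbu_per_node_hour


if __name__ == "__main__":
    for opt in OPTIONS:
        print(f"{opt.purpose}: up to {max_dbu_per_hour(opt):.2f} DBU/hour")
```

Actual consumption depends on how long the cluster runs and how far it autoscales, so treat these numbers as a ceiling rather than an estimate of the bill.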
Databricks cluster policies restrict the type and number of workers you can provision for a cluster. If no existing policy fits your requirements, you can request a new policy from the ABS. Further information is available in the library drive.
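To illustrate how a policy limits a cluster, the sketch below writes a policy in the general shape of a Databricks cluster policy definition (attribute paths mapped to `fixed` or `range` rules) and checks a requested configuration against it. This is an assumption-laden example, not the policy the ABS deploys: the node type ID `Standard_DS3_v2` and the limit of 5 workers are taken from the general purpose option for illustration, and the `violations` helper is hypothetical. Databricks enforces policies itself when a cluster is created.

```python
# Hypothetical policy definition, modelled on the Databricks cluster policy
# JSON format. The values are illustrative, not the ABS's actual policy.
policy_definition = {
    "node_type_id": {"type": "fixed", "value": "Standard_DS3_v2"},
    "autoscale.max_workers": {"type": "range", "maxValue": 5},
}


def violations(policy: dict, request: dict) -> list[str]:
    """Return the reasons a requested configuration breaks the policy.

    Only the 'fixed' and 'range' rule types are handled in this sketch.
    """
    problems = []
    for attribute, rule in policy.items():
        value = request.get(attribute)
        if rule["type"] == "fixed":
            if value != rule["value"]:
                problems.append(f"{attribute} must be {rule['value']!r}, got {value!r}")
        elif rule["type"] == "range" and value is not None:
            if "maxValue" in rule and value > rule["maxValue"]:
                problems.append(f"{attribute} must be at most {rule['maxValue']}, got {value}")
            if "minValue" in rule and value < rule["minValue"]:
                problems.append(f"{attribute} must be at least {rule['minValue']}, got {value}")
    return problems


if __name__ == "__main__":
    requested = {"node_type_id": "Standard_DS3_v2", "autoscale.max_workers": 8}
    for problem in violations(policy_definition, requested):
        print(problem)
```

A request that exceeds the worker limit, as in the example, is the kind of case where a new policy would need to be requested.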
To ensure the security and integrity of SEAD, partners will not have administrative access to the Databricks workspace and some usage restrictions may apply. Administration will be exclusively managed by the ABS.