Databricks


Snippet from Wikipedia: Databricks

Databricks, Inc. is a global data, analytics and artificial intelligence company founded by the original creators of Apache Spark.

The company provides a cloud-based platform to help enterprises build, scale, and govern data and AI, including generative AI and other machine learning models.

Databricks pioneered the data lakehouse, a data and AI platform that combines the capabilities of a data warehouse with a data lake, allowing organizations to manage and use both structured and unstructured data for traditional business analytics and AI workloads.

Databricks acquired MosaicML for $1.4 billion in June 2023, its largest acquisition.

In November 2023, Databricks unveiled the Databricks Data Intelligence Platform, a new offering that combines the unification benefits of the lakehouse with MosaicML’s Generative AI technology to enable customers to better understand and use their own proprietary data.

The company develops Delta Lake, an open-source project to bring reliability to data lakes for machine learning and other data science use cases.

AN OPEN AND UNIFIED DATA ANALYTICS PLATFORM FOR DATA ENGINEERING, MACHINE LEARNING, AND ANALYTICS

From the original creators of Apache Spark, Delta Lake, MLflow, and Koalas

Select a platform:

DATABRICKS PLATFORM - FREE TRIAL

For businesses

  • Collaborative environment for Data teams to build solutions together
  • Unlimited clusters that can scale to any size, processing data in your own account
  • Job scheduler to execute jobs for production pipelines
  • Fully collaborative notebooks with multi-language support, dashboards, REST APIs
  • Native integration with the most popular ML frameworks (scikit-learn, TensorFlow, Keras, …), Apache Spark™, Delta Lake, and MLflow
  • Advanced security, role-based access controls, and audit logs
  • Single Sign-On support
  • Integration with BI tools such as Tableau, Qlik, and Looker
  • 14-day full feature trial (excludes cloud charges)

CHOOSE YOUR CLOUD

Please note that Azure Databricks is provided by Microsoft Azure and is subject to Microsoft's terms. By clicking the “AWS” or “Google Cloud” button to get started, you agree to the Databricks Terms of Service.

COMMUNITY EDITION

For students and educational institutions

  • Single Spark cluster limited to 15 GB of memory, with no worker nodes
  • Basic notebooks without collaboration
  • Limited to a maximum of 3 users
  • Public environment to share your work

By clicking “Get Started” for the Community Edition, you agree to the Databricks Community Edition Terms of Service.

https://databricks.com/try-platform


Welcome to Databricks Community Edition!

Databricks Community Edition provides you with access to a free Spark micro-cluster as well as a cluster manager and a notebook environment - ideal for developers, data scientists, data engineers and other IT professionals to get started with Spark.

We need you to verify your email address by clicking on this link. You will then be redirected to Databricks Community Edition!

Get started by visiting: https://community.cloud.databricks.com/login.html

If you have any questions, please contact [email protected].

- The Databricks Team


Instructor: Adam Breindel

LinkedIn: https://www.linkedin.com/in/adbreind

Email: [email protected]

Twitter: @adbreind

  • 20+ years building systems for startups and large enterprises
  • 10+ years teaching data, ML, front- and back-end technology
  • Fun large-scale data projects…
      - Streaming neural net + decision tree fraud scoring
      - Realtime & offline analytics for banking
      - Music synchronization and licensing for networked jukeboxes
  • Industries
      - Finance, Insurance
      - Travel, Media / Entertainment
      - Energy, Government

Create a Databricks account

  • Sign up for free Community Edition now at https://databricks.com/try-databricks
  • Use Firefox, Chrome or Safari

Getting Started

These steps are illustrated on subsequent pages; this is the summary:

1. Copy the courseware link or prepare to type it ☺: https://materials.s3.amazonaws.com/2021/oreilly0708/spark.dbc

2. Import that file into your Databricks account per the instructions on the following slides.

3. Create a cluster: choose Databricks Runtime 8.2 (illustrated in the following slides).

Setup with Databricks

1. Log in to Databricks.

2. Import notebooks for today: choose URL, type or paste in today's notebook URL (https://materials.s3.amazonaws.com/2021/oreilly0708/spark.dbc), then click Import.

3. Find your notebook(s): click Workspace; your notebooks are listed there.

4. Create a cluster: choose Runtime 8.2 (Scala 2.12, Spark 3.1.1).

5. All set: let's go!

https://learning.oreilly.com/live-events/spark-31-first-steps/0636920371533/0636920061945/

https://on24static.akamaized.net/event/33/46/13/4/rt/1/documents/resourceList1631717069306/setup.pdf


Create Cluster

New Cluster — 0 Workers: 0 GB Memory, 0 Cores, 0 DBU; 1 Driver: 15.3 GB Memory, 2 Cores, 1 DBU (UI | JSON)

  • Cluster Name: Buddha
  • Databricks Runtime Version: Runtime 8.3 (Scala 2.12, Spark 3.1.1)
  • Note: Databricks Runtime 8.x and later use Delta Lake as the default table format.
  • Instance: Free 15 GB Memory

As a Community Edition user, your cluster will automatically terminate after an idle period of two hours. For more configuration options, please upgrade your Databricks subscription.
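The cluster form can also be expressed as a JSON-style spec in the format used by the Databricks Clusters API. The sketch below (as a Python dict) is an assumption matching the Community Edition settings shown above, not output copied from the UI:

```python
# Illustrative cluster spec in the Clusters-API field naming; the values
# mirror the Community Edition form above (driver-only, Runtime 8.3,
# two-hour idle auto-termination). Field values are assumptions.
cluster_spec = {
    "cluster_name": "Buddha",
    "spark_version": "8.3.x-scala2.12",  # Runtime 8.3 (Scala 2.12, Spark 3.1.1)
    "num_workers": 0,                    # Community Edition: no worker nodes
    "autotermination_minutes": 120,      # terminates after two idle hours
}
```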


Import Notebooks

Import from: File URL

https://materials.s3.amazonaws.com/2021/oreilly0708/spark.dbc

Accepted formats: .dbc, .scala, .py, .sql, .r, .ipynb, .Rmd, .html, .zip
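Besides the Import dialog, a `.dbc` archive can be pushed into a workspace programmatically via the Workspace Import REST API (`POST /api/2.0/workspace/import`), which takes the file content base64-encoded. A hedged sketch, where the host, token, and target path are placeholders for your own workspace:

```python
import base64

def build_import_payload(dbc_bytes, workspace_path):
    """Build the JSON body for the Workspace Import API.

    The API expects the archive content base64-encoded and a format
    string matching the accepted formats listed above (here: DBC).
    """
    return {
        "path": workspace_path,
        "format": "DBC",
        "content": base64.b64encode(dbc_bytes).decode("ascii"),
        "overwrite": True,
    }

def import_notebooks(host, token, dbc_bytes, workspace_path):
    # Requires the `requests` package and a real workspace URL + token.
    import requests
    resp = requests.post(
        f"{host}/api/2.0/workspace/import",
        headers={"Authorization": f"Bearer {token}"},
        json=build_import_payload(dbc_bytes, workspace_path),
    )
    resp.raise_for_status()
```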


This notebook is not attached to a cluster. Would you like to launch a new Spark cluster to continue working?

Automatically launch and attach to clusters without prompting


First Steps with Apache Spark 3.1

First Steps with Apache Spark 3.1

Class Logistics and Operations


Topics


%fs
ls /databricks-datasets/amazon/

Output (columns: path, name, size; only the path column is shown here):

  • dbfs:/databricks-datasets/amazon/README.md
  • dbfs:/databricks-datasets/amazon/data20K/
  • dbfs:/databricks-datasets/amazon/test4K/
  • dbfs:/databricks-datasets/amazon/users/
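The `%fs` magic is a notebook shorthand; roughly the same listing can be produced in a Python cell with `dbutils.fs.ls`. This is a sketch: `dbutils` is injected only inside a Databricks notebook, and the formatting helper is a hypothetical name added here for illustration:

```python
def format_listing(entries):
    """Render (path, name, size) tuples like the %fs ls table above."""
    return [f"{path}  {name}  {size}" for path, name, size in entries]

def list_amazon_datasets():
    # `dbutils` exists only in the Databricks notebook runtime; each
    # FileInfo returned by dbutils.fs.ls has path, name, and size fields.
    files = dbutils.fs.ls("/databricks-datasets/amazon/")  # noqa: F821
    return format_listing((f.path, f.name, f.size) for f in files)
```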

Where is this Spark data source? Currently, Amazon S3.

In this version of Databricks, “DBFS” (a wrapper over S3, similar to EMRFS) is the default Spark filesystem.

Other common defaults include HDFS, “local”, another cloud-based object store like Azure Blob Storage, or a Kubernetes-friendly storage layer like MinIO.
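In practice, Spark dispatches to a filesystem implementation based on the URI scheme of the path, falling back to the configured default filesystem (DBFS here) for scheme-less paths. A small sketch with hypothetical example paths:

```python
from urllib.parse import urlparse

# Hypothetical paths illustrating the schemes mentioned above: DBFS,
# S3 (via s3a), HDFS, and the local filesystem.
examples = [
    "dbfs:/databricks-datasets/amazon/data20K",
    "s3a://my-bucket/amazon/data20K",
    "hdfs://namenode:8020/data/amazon",
    "file:/tmp/amazon",
]

def fs_scheme(path, default="dbfs"):
    """Return the filesystem scheme a path would dispatch on; paths
    with no scheme fall back to the default filesystem."""
    scheme = urlparse(path).scheme
    return scheme or default
```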


%sql SELECT * FROM parquet.`/databricks-datasets/amazon/data20K`



Spark Jobs: Job 2 — View (Stages: 1/1)

df: org.apache.spark.sql.DataFrame = [rating: double, review: string]

  • rating: double
  • review: string
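The `%sql` cell above can be reproduced from Python (PySpark). A sketch, assuming a running Spark session such as the one a Databricks cluster provides; the helper for building the `parquet.` table reference is a hypothetical name added for illustration:

```python
def parquet_ref(path):
    """Build the parquet.`<path>` table reference used in Spark SQL."""
    return f"parquet.`{path}`"

def load_reviews(path="/databricks-datasets/amazon/data20K"):
    # Requires a Spark runtime (e.g. an attached Databricks cluster);
    # returns a DataFrame with the [rating: double, review: string] schema
    # shown above.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    return spark.sql(f"SELECT * FROM {parquet_ref(path)}")
```

`spark.read.parquet(path)` would load the same data; the `spark.sql` form mirrors the `%sql` cell directly.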


databricks.txt · Last modified: 2021/10/07 12:41 by 127.0.0.1