Sumit Mittal

The first thing you will most likely do when working with Databricks is ingest your data, so that it sits in Delta Lake.

What's the benefit of data sitting in Delta Lake compared to a plain data lake?

You get:
- ACID properties
- DML operations (UPDATE, DELETE, MERGE)
- Time travel
- Schema evolution and enforcement
and much more!
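
To make two of these benefits concrete, here is a minimal sketch of DML and time travel statements on a Delta table. The table name `main.demo.orders` and its columns are hypothetical, and `run_demo` assumes it is called on a Databricks cluster where `spark` is the active SparkSession.

```python
# Hypothetical table name and columns, used only for illustration.
TABLE = "main.demo.orders"

# DML: Delta tables accept row-level changes, unlike plain files in a data lake.
UPDATE_SQL = f"UPDATE {TABLE} SET status = 'shipped' WHERE order_id = 42"
DELETE_SQL = f"DELETE FROM {TABLE} WHERE status = 'cancelled'"

# Time travel: list the table's versions, then query an older one.
HISTORY_SQL = f"DESCRIBE HISTORY {TABLE}"
TIME_TRAVEL_SQL = f"SELECT * FROM {TABLE} VERSION AS OF 0"


def run_demo(spark):
    """Run the statements above; `spark` is assumed to be a Databricks SparkSession."""
    for stmt in (UPDATE_SQL, DELETE_SQL, HISTORY_SQL):
        spark.sql(stmt)
    # Returns the table as it looked at version 0, before the changes above.
    return spark.sql(TIME_TRAVEL_SQL)
```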

So getting the data into Delta Lake is the first task.

How do you do that?

Earlier, this took a mix of third-party and in-house tools.

But now things are much simpler: you can use Lakeflow Connect for data ingestion.

Your organization's data might be sitting in cloud storage (ADLS Gen2, Amazon S3, Databricks volumes), or in databases, SaaS applications, and so on.

With Lakeflow Connect, we can build efficient ingestion pipelines, all within Databricks.

There are different types of connectors:

- Upload files from local storage to a volume
- Standard connectors (ingest from cloud storage into Delta Lake)
- Managed connectors (ingest from SaaS applications / databases)

Ways of ingesting data from cloud storage (standard connectors):

- CTAS (CREATE TABLE AS SELECT)
- COPY INTO
- Auto Loader
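
The three options above can be sketched roughly as follows. This is a minimal illustration, not a complete pipeline: the helper function names, table names, and paths are all hypothetical, the CSV options are assumptions about the source files, and `start_auto_loader` only works on Databricks, where the `cloudFiles` source is available.

```python
def ctas_sql(table: str, path: str, fmt: str = "csv") -> str:
    """One-time load: create the table from the files currently at `path`."""
    return (
        f"CREATE TABLE {table} AS "
        f"SELECT * FROM read_files('{path}', format => '{fmt}')"
    )


def copy_into_sql(table: str, path: str, fmt: str = "CSV") -> str:
    """Incremental batch: COPY INTO skips files it has already loaded.

    The target table must already exist before this statement runs.
    """
    return (
        f"COPY INTO {table} FROM '{path}' "
        f"FILEFORMAT = {fmt} "
        "FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true') "
        "COPY_OPTIONS ('mergeSchema' = 'true')"
    )


def start_auto_loader(spark, table: str, path: str, checkpoint: str, fmt: str = "csv"):
    """Auto Loader: pick up new files as they land, tracking progress in a checkpoint."""
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", fmt)
        .option("cloudFiles.schemaLocation", checkpoint)
        .load(path)
        .writeStream
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)  # process what is available, then stop
        .toTable(table)
    )
```

On a cluster, the two SQL helpers would be passed to `spark.sql(...)`. CTAS suits a one-off load, while COPY INTO and Auto Loader can be rerun against the same folder as new files arrive.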

We can use fully managed connectors to ingest data from databases or SaaS applications.

Different kinds of ingestion modes:

- Batch ingestion (all data is re-ingested each time)
- Incremental batch (only new data is ingested; previously loaded records are skipped automatically)
- Incremental streaming (continuously load rows, or batches of rows, as they are generated, so that you can query the data in near real time as it arrives)
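
The difference between the first two modes comes down to bookkeeping. Here is a toy model in plain Python, not Databricks code: a dict stands in for a cloud storage folder (file name to rows), and the names `full_batch` and `IncrementalLoader` are made up for this sketch.

```python
def full_batch(source: dict) -> list:
    """Batch ingestion: read every file again on every run."""
    return [row for name in sorted(source) for row in source[name]]


class IncrementalLoader:
    """Incremental batch: remember which files were loaded and skip them next time."""

    def __init__(self):
        self.seen = set()  # names of files already ingested

    def load(self, source: dict) -> list:
        new_files = sorted(set(source) - self.seen)
        self.seen.update(new_files)
        return [row for name in new_files for row in source[name]]


source = {"a.csv": [1, 2]}
loader = IncrementalLoader()
first_run = loader.load(source)    # loads a.csv
source["b.csv"] = [3]              # a new file arrives
second_run = loader.load(source)   # loads only b.csv
everything = full_batch(source)    # re-reads both files
```

Incremental streaming uses the same kind of bookkeeping, but the load step runs continuously instead of on a schedule.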

Once the data is ingested into Delta Lake, you do ETL using Lakeflow Declarative Pipelines in Databricks. This was earlier known as DLT (Delta Live Tables).

I hope you found this helpful.

Do you want a hack on how to practise in Azure Databricks for almost free using a paid edition?

I can talk about it in my next post if you are curious to know!

4 days ago | [YT] | 39



@MohamedKhalil-ep5zz

good

1 day ago | 0

@DEwithWrangler

Is there any course talks about

3 days ago | 0

@yashpadiyar4952

We generally use DLT when we have a streaming source, na?

3 days ago | 0

@the_hustling_wanderer

Lakeflow is built on Nifi?

4 days ago | 0

@swayam7685

Sir, I want your help. I am a civil engineer. After my graduation I worked 3 years in the civil line, then I jumped to IT. I know only SQL and the basics of Power BI and SSRS. I want to switch my career. I have given 7 interviews: in 3 I was rejected in the first round, and in 4 I cleared up to the manager round, but by bad luck 2 positions were put on hold and the other 2 went to internal hires. I want to know which course I should do for my job switch. Could you please help me? Please, sir.

3 days ago | 0