You have a standard hourly ETL pipeline running in Airflow. It processes financial transactions for a fintech client. The logic is simple, as sketched below:
Extract: Pull raw logs from S3.
Transform: PySpark job to format dates and clean strings.
Load: Append the data into a Delta table in Databricks.
Notify: Send a Slack alert that the job is done.
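(A minimal sketch of that DAG, assuming Airflow 2.4+ and plain PythonOperator tasks. The dag_id, callables, and paths are invented for illustration, not the client's actual code.)

```python
# Minimal sketch of the four-task pipeline described above.
# All names here (hourly_transactions_etl, the callables) are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_s3(**context):
    """Pull this hour's raw transaction logs from S3."""
    ...


def run_spark_transform(**context):
    """Run the PySpark job that formats dates and cleans strings."""
    ...


def load_to_delta(**context):
    """Append the cleaned batch to the Delta table in Databricks."""
    ...


def notify_slack(**context):
    """Post a 'job done' message to Slack."""
    ...


with DAG(
    dag_id="hourly_transactions_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_s3)
    transform = PythonOperator(task_id="transform", python_callable=run_spark_transform)
    load = PythonOperator(task_id="load", python_callable=load_to_delta)
    notify = PythonOperator(task_id="notify", python_callable=notify_slack)

    extract >> transform >> load >> notify
```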
Then, the pipeline runs at 2:00 AM.
Step 1 (Extract): Success.
Step 2 (Transform): Success.
Step 3 (Load): Success. (The data is committed to the Delta transaction log; the write is sketched after this list.)
Step 4 (Notify): FAILURE. (The Slack API times out).
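(For concreteness, the Load step is presumably a blind append along these lines; the paths and DataFrame name are assumptions, and Delta support is built into Databricks runtimes. The property that matters: once this write commits to the Delta transaction log, the rows are durably in the table, whatever happens to the Notify task afterward.)

```python
# Sketch of the Load step: a blind append to the Delta table.
# Paths are hypothetical. Once .save() returns, the batch is a
# committed transaction in the Delta log; a later Slack failure
# does not roll it back, and re-running this appends the rows again.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical: the cleaned output of the Transform step.
cleaned_df = spark.read.parquet("s3://fintech-lake/staging/transactions/")

(
    cleaned_df.write
    .format("delta")
    .mode("append")  # append, not merge: nothing deduplicates a re-run
    .save("s3://fintech-lake/silver/transactions")
)
```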
You wake up and see the red task in Airflow. To "fix" it quickly, you clear the status of the whole DAG run so that it retries.
The Question: What did you just do to my business?