Hey Chandramouli (and Amira!),
This is a great topic; there's nothing more frustrating than a green-check pipeline that quietly loads garbage. Here are some of the most common "silent failure" patterns I've seen, plus how to proactively catch them in your ADF pipelines:
- Type mismatches silently convert or drop data
  - What happens: a source column changes from string to int (or vice versa), and the Copy activity quietly casts bad values to null or defaults.
  - How to catch it:
    - Validate the schema before your load, either with Mapping Data Flow's "Validate schema" source option or a custom script (sketched below).
    - Use Mapping Data Flow's built-in schema drift settings to surface unexpected column types.
    - Fail the pipeline if validation flags any unexpected type.
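For the custom-script route, here's a minimal sketch of a pre-load type check, e.g. run from an Azure Function the pipeline calls before the Copy activity. The connection string, table name, and expected-type map are illustrative assumptions, not part of any real setup:

```python
# Minimal pre-load type check (hypothetical names throughout).
import pyodbc

# The "golden" types you expect in the source -- adjust per dataset.
EXPECTED_TYPES = {
    "OrderId": "int",
    "OrderDate": "datetime",
    "Amount": "decimal",
}

def check_column_types(conn_str: str, table: str) -> list:
    """Return human-readable type mismatches (empty list = all good)."""
    query = (
        "SELECT COLUMN_NAME, DATA_TYPE "
        "FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = ?"
    )
    with pyodbc.connect(conn_str) as conn:
        actual = {name: dtype for name, dtype in conn.execute(query, table)}

    problems = []
    for col, expected in EXPECTED_TYPES.items():
        if col not in actual:
            problems.append(f"missing column: {col}")
        elif actual[col] != expected:
            problems.append(f"{col}: expected {expected}, got {actual[col]}")
    return problems

mismatches = check_column_types("DSN=SourceDb", "SalesData")
if mismatches:
    # Raising makes the Function return a failure, which the pipeline can
    # route into a Fail activity instead of proceeding with the load.
    raise ValueError("Schema validation failed: " + "; ".join(mismatches))
```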
- Upstream schema/table changes go unnoticed
  - What happens: a dev adds a new column, renames one, or changes a file layout, and downstream queries silently skip it.
  - How to catch it:
    - Maintain a metadata table of expected table/file schemas and use a Lookup + If Condition at the start of your pipeline to compare actual vs. expected.
    - Use the Data Factory REST API or an Azure Function to pull the dataset's JSON schema at runtime and diff it against your "golden" schema (see the sketch below).
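If you go the REST API route, one hedged sketch: fetch the dataset definition from the ADF management endpoint and diff its `schema` block (the column list ADF last imported) against a golden copy kept in source control. The subscription, resource group, factory, dataset, and file path below are placeholders:

```python
# Runtime schema diff against a "golden" copy (placeholder names throughout).
import json
import requests
from azure.identity import DefaultAzureCredential

SUB, RG, FACTORY, DATASET = "<sub-id>", "<rg>", "<factory>", "SalesData"

# Acquire an ARM token; DefaultAzureCredential works locally and in Azure.
token = DefaultAzureCredential().get_token("https://management.azure.com/.default")
url = (
    f"https://management.azure.com/subscriptions/{SUB}/resourceGroups/{RG}"
    f"/providers/Microsoft.DataFactory/factories/{FACTORY}"
    f"/datasets/{DATASET}?api-version=2018-06-01"
)
resp = requests.get(url, headers={"Authorization": f"Bearer {token.token}"})
resp.raise_for_status()

# 'schema' holds the dataset's imported column list (present once you have
# imported a schema for the dataset).
actual = resp.json()["properties"].get("schema", [])
with open("golden_schemas/SalesData.json") as f:
    golden = json.load(f)

if actual != golden:
    raise RuntimeError(f"Dataset {DATASET} drifted from its golden schema")
```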
- Child pipelines swallow failures
  - What happens: an Execute Pipeline activity inside a ForEach or parent pipeline fails, but the parent still reports success because the error path isn't wired up.
  - How to catch it:
    - In your child pipelines, build try/catch-style error handling with OnFailure paths and raise a custom error (Set Variable + Fail activity).
    - Back in the parent, hook the child's failure path into a Fail activity so any child error bubbles up. A monitoring-side safety net is sketched below.
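On top of the wiring, a monitoring-side safety net can catch anything that still slips through: query every activity run for a given pipeline run and flag failures even when the parent reports Succeeded. This sketch uses the azure-mgmt-datafactory SDK; the subscription, resource group, factory, and run ID are placeholders:

```python
# Flag failed child activities inside a "successful" parent run.
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

client = DataFactoryManagementClient(DefaultAzureCredential(), "<sub-id>")

now = datetime.now(timezone.utc)
filters = RunFilterParameters(
    last_updated_after=now - timedelta(days=1),
    last_updated_before=now,
)
runs = client.activity_runs.query_by_pipeline_run(
    "<rg>", "<factory>", "<parent-run-id>", filters
)

swallowed = [a for a in runs.value if a.status == "Failed"]
for a in swallowed:
    # These are the errors a mis-wired parent would have hidden.
    print(f"Activity {a.activity_name} failed: {a.error}")
```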
- Missing row-count or record-level checks
  - What happens: you run a daily load from SalesData, records drop from 10k to 2k, and you never notice.
  - How to catch it:
    - After your Copy/Data Flow, insert a validation step:
      - Compare source vs. target row counts (Lookup + If Condition).
      - Validate key business rules (e.g., no null dates, no negative amounts) via a simple SQL query or a Mapping Data Flow filter.
    - Fail early if any count or rule check doesn't pass.
- No auditing or metrics stored
  - What happens: you have no baseline to know whether "this run" is off by 10% vs. "normal."
  - How to catch it:
    - Log audit details to a SQL table, Azure Table, or Log Analytics workspace after each activity:
      - Row counts in/out, data skew warnings, error counts.
      - Timestamps and execution durations for each step.
    - Build a dashboard/alert for when any metric deviates beyond a threshold. A combined validation + audit sketch follows this list.
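To make the last two patterns concrete, here's one possible shape for a combined post-load validation + audit step, e.g. a stored procedure or Azure Function the pipeline calls after the Copy. The staging, target, and audit table names and the business rules are illustrative assumptions:

```python
# Post-load validation + audit logging in one step (hypothetical tables).
from datetime import datetime, timezone
import pyodbc

def validate_and_audit(conn_str: str, run_id: str) -> None:
    with pyodbc.connect(conn_str) as conn:
        src = conn.execute("SELECT COUNT(*) FROM stg.SalesData").fetchone()[0]
        tgt = conn.execute("SELECT COUNT(*) FROM dbo.SalesData").fetchone()[0]
        bad = conn.execute(
            "SELECT COUNT(*) FROM dbo.SalesData "
            "WHERE OrderDate IS NULL OR Amount < 0"
        ).fetchone()[0]

        # Always record the metrics, pass or fail: this history is the
        # baseline that later lets you alert on a 10% deviation.
        conn.execute(
            "INSERT INTO audit.PipelineRunLog "
            "(RunId, SourceRows, TargetRows, BadRows, LoggedAtUtc) "
            "VALUES (?, ?, ?, ?, ?)",
            run_id, src, tgt, bad, datetime.now(timezone.utc),
        )
        conn.commit()

    # Raise after logging so the run fails but the audit row survives.
    if tgt != src:
        raise ValueError(f"Row count mismatch: source={src}, target={tgt}")
    if bad:
        raise ValueError(f"{bad} rows violate business rules")
```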
Putting it all together, a robust pattern looks like:
- Orchestrate your ETL as:
  - Pre-validation (schema + row count)
  - Core transformation/load
  - Post-validation (business rules + row count)
  - Audit logging
- Use OnSuccess/OnFailure links on every critical activity so nothing ever silently finishes.
- Treat your pipeline's "Success" as "all validation steps passed," not "all activities ran." The skeleton below shows that control flow in miniature.
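Here's that skeleton as runnable Python; the stage bodies are stubs standing in for the activities discussed above:

```python
# Skeleton of the overall pattern: every stage either completes or raises,
# so "Success" genuinely means "all validations passed".
def pre_validate() -> None:
    """Schema/type checks plus a source row-count sanity check (stub)."""

def load() -> None:
    """Copy activity / Mapping Data Flow equivalent (stub)."""

def post_validate_and_audit() -> None:
    """Target row counts, business rules, audit-log insert (stub)."""

try:
    pre_validate()
    load()
    post_validate_and_audit()
except Exception:
    # Fail-activity equivalent: alert here, then re-raise so the run is
    # marked Failed rather than quietly succeeding.
    raise
```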
Hope this helps—and I’d love to join next Tuesday’s session!
References
- Troubleshoot pipeline validation errors in ADF: https://docs.microsoft.com/azure/data-factory/pipeline-trigger-troubleshoot-guide
- Mapping Data Flow schema drift & validation: https://docs.microsoft.com/azure/data-factory/concepts-data-flow-schema-drift
- Row count and business rule checks with Lookup + If Condition: https://docs.microsoft.com/azure/data-factory/control-flow-lookup-activity
- Audit logging patterns & metrics: https://docs.microsoft.com/azure/data-factory/concepts-pipelines-activities#monitoring-and-logging