What’s the best way to use different schedule intervals for backfilling and for ongoing runs?
For backfilling I want a daily interval, but for ongoing runs I want an hourly interval.
I can think of three approaches to this:

The easiest approach I can see is to define two DAGs in one .py file: dag_backfill, with a daily interval, a start date in the past and an end date of datetime.now(), and dag_ongoing, with an hourly interval and a start date of datetime.now(), which takes over when dag_backfill finishes. However, defining two DAGs in one file is discouraged here:

We do support more than one DAG definition per python file, but it is not recommended as we would like better isolation between DAGs from a fault and deployment perspective…

Two .py files that import the same Python functions that make up the pipeline. I worry about keeping the separate files consistent with this approach.
Only one DAG, with an hourly interval, that checks whether the run date is more than 1 day in the past and, if so, only runs at midnight for those dates. I feel that this is inelegant, though, as it would obscure the schedule the backfill runs on, at least on the GUI homepage.
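
To make the scheduling logic behind these approaches concrete, here is a minimal, framework-free sketch. It assumes Airflow-style DAG arguments (`schedule_interval`, `start_date`, `end_date`; newer Airflow releases call the first one `schedule`). The helper names `dag_kwargs` and `should_run` are hypothetical, and so is the fixed `cutover` timestamp, which stands in for datetime.now() so the values stay the same every time the file is parsed. How the scheduler treats the exact cutover boundary is not checked here.

```python
from datetime import datetime, timedelta


def dag_kwargs(backfill_start: datetime, cutover: datetime) -> dict:
    """Keyword arguments for the two DAGs in the first two approaches.

    'dag_backfill' and 'dag_ongoing' are the names used above. Both
    dicts come from one place, so two .py files could import this
    helper instead of repeating the dates (hypothetical helper).
    """
    return {
        "dag_backfill": {
            "schedule_interval": "@daily",
            "start_date": backfill_start,
            "end_date": cutover,  # assumption: fixed, not datetime.now()
        },
        "dag_ongoing": {
            "schedule_interval": "@hourly",
            "start_date": cutover,
        },
    }


def should_run(run_date: datetime, now: datetime) -> bool:
    """Third approach: gate for a single hourly DAG.

    Runs dated within the last day always proceed; older runs proceed
    only when their run date falls exactly on midnight.
    """
    if now - run_date <= timedelta(days=1):
        return True
    return (run_date.hour, run_date.minute, run_date.second) == (0, 0, 0)
```

In Airflow, a check like `should_run` could presumably be wired in as the callable of a `ShortCircuitOperator` at the start of the DAG, though that still leaves the homepage showing only the hourly schedule.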

Is there a common pattern for this or known best practice?

Kuldeep Baberwal Changed status to publish February 17, 2025