Page MenuHomePhabricator

RFC: Add partitions to arbitrary cron-based schedules
Changes PlannedPublic

Authored by dgibson on Thu, Oct 15, 3:48 PM.

Details

Summary

With the magic of cronitor, there is no reason we can't generate a partition set for arbitrary cron schedules. This would allow us to maintain the invariant that all schedules have partition sets and can be backfilled. Just putting out for any initial thoughts/feedback.

The logic is backwards-compatible with existing XXXly_schedule partition logic: at a given execution time, the partition for the generated run is the previous execution time. This might not make sense for all schedules, so we could make it configurable if we want.

Test Plan

TBD

Diff Detail

Repository
R1 dagster
Branch
timezonecronpartitions
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

dgibson published this revision for review.Thu, Oct 15, 3:51 PM
dgibson added inline comments.
python_modules/dagster/dagster/core/definitions/decorators/schedule.py
120–124

this could be configurable as well, or we could get super fancy and try to figure it out based on the cron string - would need to produce unique partition names per execution of course

This is great. I think the big question here now is whether we should always use cron_partition_range and create_cron_partition_selector_fn instead of date_partition_range and create_default_partition_selector_fn for all the other schedule decorators.

python_modules/dagster/dagster/core/definitions/decorators/schedule.py
96

good call on making a brand new decorator instead of squeezing in back-compat

120–124

what do you mean by "could get super fancy and try to figure it out based on the cron string"?

150–152

good point. a boolean flag on the decorator feels a bit gross but we could document it and explain it easily.

yeah, that would be a good future change I think. A missing piece is that we would need to be able to efficiently figure out the next partition without loading them all I think? Since you could have an hourly schedule that spans hundreds of days, for example.

Yeah, I think this looks reasonable. Let's add some tests to document the behavior we're going for?

Something similar to python_modules/dagster/dagster_tests/core_tests/partition_tests/test_utils.py

sandyryza added inline comments.
python_modules/dagster/dagster/core/definitions/decorators/schedule.py
98

Does start_date need to be required? This is one of the things I found really annoying about Airflow. For a drop/recreate table, a start_date doesn't really make sense.