Page MenuHomePhabricator

Get cron schedule from ExternalSchedule rather than the Schedule DB
ClosedPublic

Authored by dgibson on Thu, Oct 8, 4:12 PM.

Details

Summary

Instead of relying on the (potentially out of date) cron schedule in the schedule DB, use the code/ScheduleDefinition as a source of truth for how often to run. This paves the way for being able to add per-schedule timezones without needing to make a schema change, plus a glorious future where you don't need to run a 'reconcile' command every time you change your schedule.

There are two downsides though:

  • You need to load the schedule on every loop to check (which is kind of a bummer for e.g. a daily schedule - that's a lot of empty checks). There are various optimizations we can do here if that becomes a problem.
  • If there's a load failure and we can't load the schedule for some reason, we no longer have a tick to put it in.
Test Plan

BK

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Thu, Oct 8, 4:29 PM
Harbormaster failed remote builds in B19318: Diff 23476!
python_modules/dagster/dagster/scheduler/scheduler.py
108–131

stepping back - should this be workspace oriented instead of DB oriented?

python_modules/dagster/dagster/scheduler/scheduler.py
108–131

I don't think so? We need the DB to know which schedules are actually running. Once we have those origins + start time we should delegate pretty much everything else to the workspace though IMO

python_modules/dagster/dagster/scheduler/scheduler.py
108–131

Ya you still need to use the DB too - what i meant was should a given scheduler process only handles the schedules in a workspace - and it loads them all on init (and fails out if theres a problem) and holds those schedules for its life time. There are downsides with this approach but i think its worth discussing. I just imagine that the current approach of driving from the DB could have some interesting problems that are hard for a user to understand - where finding a way for dagit and this scheduler to behave in the same "predictable" way could be good.

python_modules/dagster/dagster/scheduler/scheduler.py
108–131

"and holds those schedules for its life time"

This part seems non-ideal since the schedules might change and the scheduler process might run for a long time.

Other than that I'm not opposed to the suggestion to filter the set of schedules that are considered by those that are in a certain workspace, I think it is moderately unrelated to this diff though?

ya those discussions dont need to block this diff - I am not certain on the trade offs around losing tick based error reporting for this case, but if you feel good about it you can move forward

This revision is now accepted and ready to land.Tue, Oct 13, 9:33 PM