Page MenuHomePhabricator

[3/3] Refactor scheduler to simplify usage
AbandonedPublic

Authored by sashank on Sep 17 2019, 11:59 PM.

Details

Reviewers
schrockn
Group Reviewers
Restricted Project
Summary

The previous model for interacting with the Scheduler (start, stop, resume, end) was too complex, so this diff introduces a simpler interface to use the scheduler.

In code, the user defines a set of ScheduleDefinitions which are passed to the Scheduler.

s1 = ScheduleDefinition(...)
s2 = ScheduleDefinition(...)
s3 = ScheduleDefinition(...)

scheduler = SystemCronScheduler(schedule_defs=[s1, s2, s3], artifacts_dir=...)

There is a method init defined on Scheduler which is responsible for keeping the metadata files on disk and the crontab in sync with the ScheduleDefinitions in code. When init is called:

  • If there is a new ScheduleDefinition introduced, it creates the metadata files for the schedule with status ScheduleStatus.STOPPED
  • For every previously existing ScheduleDefinition (which are definitions with the same name), any changes to the definition are persisted and the status is kept unchanged
  • If any ScheduleDefinitions were removed, then the metadata files and crontab entry is removed for that schedule

The user can initialize the schedule by running dagster schedule init. Before calling init, the user can preview the changes that will be made by running dagster schedule init --preview, which shows an output like this:

Finally, the behavior of start and stop also have changed: they simply add and remove the cron job from the crontab. Users can start and end schedules using dagster schedule start {name} and dagster schedule stop {name}, or control the schedule status from the dagit scheduler UI:

Running dagit automatically initializes the scheduler. One thing to decide is whether to block dagit opening and require user confirmation if there are changes before automatically running init.

Test Plan

unit (test_scheduler and test_cron_scheduler)

Diff Detail

Repository
R1 dagster
Branch
refactor-schedule-2
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

sashank created this revision.Sep 17 2019, 11:59 PM
sashank retitled this revision from [WIP] Refactor scheduler to simply usage to [WIP] Refactor scheduler to simplify usage.Sep 18 2019, 6:28 PM
sashank retitled this revision from [WIP] Refactor scheduler to simplify usage to [3/3] Refactor scheduler to simplify usage.Sep 20 2019, 4:45 PM
sashank updated this revision to Diff 4906.Sep 20 2019, 4:51 PM

Revert scheduler examples

sashank edited the summary of this revision. (Show Details)Sep 20 2019, 5:19 PM
sashank edited the summary of this revision. (Show Details)Sep 20 2019, 5:25 PM
sashank updated this revision to Diff 4908.Sep 20 2019, 5:26 PM
sashank edited the summary of this revision. (Show Details)

up

Harbormaster failed remote builds in B3946: Diff 4909!
sashank updated this revision to Diff 4919.Sep 20 2019, 9:05 PM

Fix tests

sashank updated this revision to Diff 4922.Sep 20 2019, 9:33 PM

More test fixes

sashank edited the test plan for this revision. (Show Details)Fri, Sep 20, 11:17 PM
sashank added a reviewer: Restricted Project.
alangenfeld added inline comments.
js_modules/dagit/src/schedules/SchedulesRoot.tsx
142

explicitly name this optimistic

274–293

some things to try get the real mutation results working:

  • rename scheduleId to id
  • query name every where too
  • alias to id field id: scheduleId
schrockn requested changes to this revision.Wed, Sep 25, 5:30 PM
schrockn added a subscriber: schrockn.

I would suggest separating the process of initing a scheduler and the schedule itself into two separate classes, once that represents the *process* of creation, and one the represents the end result. Having the scheduler be stateful like this can cause confusion. What happens when init() is called twice? Does that even make sense. Plus there's the general confusion of __init__ versus init. Instead you could have function that does all the work creating the pipeline and then the end scheduler is just the post-init representation of what is going on.

Also I don't think it's worth it to go back and chop this diff, but this would have also been pretty nice to be a few separate diffs, as it is comprised a few components that pretty independently authorable and testable (e.g. changeset computation, init process, fancy pants CLI, js frontend). Doing so makes it easier for the reviewer and less frustrating for you since in aggregate it usually ends up with less back and forth.

python_modules/dagster/dagster/cli/schedule.py
81–91

might be nice to push this capability into the scheduler itself so we could reuse in graphql or whereever

128–157

so fancy

python_modules/dagster/dagster/core/scheduler/scheduler.py
136–167

recommend extracting this out into its own function with unit tests (function that takes old_schedules and new_schedules_defs).

Documenting args with check calls would make things more clear to a new reader, etc.

This revision now requires changes to proceed.Wed, Sep 25, 5:30 PM
sashank updated this revision to Diff 4981.Wed, Sep 25, 11:10 PM
sashank marked an inline comment as done.
  • updateStore -> optimisticUpdateStore
  • moved schedule status filtering from CLI to all_schedules method
  • extract change_set logic to method get_schedule_change_set and add unit tests
sashank planned changes to this revision.Wed, Sep 25, 11:10 PM
sashank added inline comments.Wed, Sep 25, 11:12 PM
python_modules/dagster-graphql/dagster_graphql_tests/graphql/test_scheduler.py
84

Added this unit test

python_modules/dagster/dagster/cli/schedule.py
81–91

Added a status argument to all_schedules and moved the logic into there

sashank abandoned this revision.Fri, Sep 27, 4:32 PM