Page MenuHomePhabricator

Create Scheduler API
ClosedPublic

Authored by sashank on Aug 9 2019, 10:18 PM.

Details

Reviewers
alangenfeld
schrockn
Group Reviewers
Restricted Project
Commits
R1:46421ee5154c: Create Scheduler API
Summary

Scheduler API:

Scheduler is an abstract class which defines methods such as create_schedule, start_schedule, etc. Each instance of a Scheduler stores a list of RunSchedules

Two implementations of Scheduler included are:

  • TestScheduler - works entirely in memory and does not implement starting and stopping schedules (just creating and removing them)
  • SystemCronScheduler - saves schedules to a file, and uses cron to start and stop schedules.
Test Plan

unit

Diff Detail

Repository
R1 dagster
Branch
schedule-storage
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

sashank created this revision.Aug 9 2019, 10:18 PM
alangenfeld requested changes to this revision.Aug 12 2019, 9:23 PM
alangenfeld added subscribers: max, alangenfeld.
  • While cribbing from pipeline run storage was a good way to get started - i think some of the things it currently does are not great to copy. [1]
  • I don't think "storage" is the right granularity for setting up a class hierarchy for the scheduler, at least at this point. From my vantage we're on trajectory to have SystemCronScheduler which implements the Scheduler interface/base class.
  • "in memory" scheduler should be more explicit about being a TestScheduler for testing. Impl can live over in test utils too.

Happy to sit and do a high level API design chat if you want. The GraphQL schema change is usually a good framing for making sure things feel right.

python_modules/dagster-graphql/dagster_graphql/implementation/pipeline_schedule_storage.py
142–216 ↗(On Diff #3606)

[1] I'm not a huge fan of this use of OOP - I think it will likely be much cleaner to just have Scheduler conformant classes vend a value type PipelineSchedule and have the implementation details of storage and retrieval be in the details of the Scheduler

You can check with @max on how he is planning on refactoring the run storage since were hitting issues with this set up there

This revision now requires changes to proceed.Aug 12 2019, 9:23 PM
schrockn added inline comments.
python_modules/dagster-graphql/dagster_graphql/implementation/pipeline_schedule_storage.py
142–216 ↗(On Diff #3606)

+1

sashank updated this revision to Diff 3622.Aug 12 2019, 11:21 PM

Update API to design discussed in person

Scheduler API:

There is an abstract base class Scheduler, which defines abstract methods such as create_schedule, start_schedule, etc.

There are two scheduler classes that implement Scheduler:

  • TestScheduler - works entirely in memory and does not implement starting and stopping schedules (just creating and removing them)
  • SystemCronScheduler - saves schedules to a file, and uses cron to start and stop schedules.

Each scheduler stores a list of Schedules. The type of the schedules depends on the type of the scheduler. TestScheduler simply uses the base Schedule class, but SystemCronScheduler uses SystemCronSchedule, which implements the logic for reading the schedule to and from the file.

this feels worth putting in its own directory /dagster_graphql/implementation/scheduler/ maybe or just /dagster_graphql/scheduler/, might be time to move away from the catch all implementation folder.

python_modules/dagster-graphql/dagster_graphql/implementation/schedule.py
112 ↗(On Diff #3622)

lets move this in to its own file

140–146 ↗(On Diff #3622)

you can stack this diff too ahead of landing anything right?

174 ↗(On Diff #3622)

maybe call it RunSchedule, schedule on its own feels a touch vague

199–234 ↗(On Diff #3622)

hm i still think this makes sense just to have as logic in SystemCronScheduler and not have any inheritance for Schedule.

update diff title and summary as well

sashank marked 3 inline comments as done.Aug 13 2019, 12:12 AM
sashank added inline comments.
python_modules/dagster-graphql/dagster_graphql/implementation/schedule.py
140–146 ↗(On Diff #3622)

yup will implement in the stack

199–234 ↗(On Diff #3622)

good idea - moved into SystemCronScheduler

sashank updated this revision to Diff 3623.Aug 13 2019, 12:39 AM
sashank marked an inline comment as done.

Moved code to /implementation/scheduler/ and deleted SystemCronSchedule (logic moved to SystemCronScheduler)

sashank retitled this revision from Create pipeline schedule storage to Create Scheduler API.Aug 13 2019, 12:49 AM
sashank edited the summary of this revision. (Show Details)
sashank updated this revision to Diff 3624.Aug 13 2019, 1:39 AM
sashank edited the summary of this revision. (Show Details)

Change type of execution_params on RunSchedule from string to ExecutionParams

seems like a good start. will let @alangenfeld do final approval as you two have been discussing

python_modules/dagster-graphql/dagster_graphql/implementation/scheduler/scheduler.py
77

seems like a test scheduler should only live in tests or in a test.py file or something

107

namedtuples are nice for value objects like this

python_modules/dagster-graphql/dagster_graphql/implementation/scheduler/system_cron_scheduler.py
53

this is a little cute. i think we should just stick to get_schedule_by_id

schrockn resigned from this revision.Aug 13 2019, 3:12 AM

just popping off of my queue

sashank updated this revision to Diff 3627.Aug 13 2019, 3:39 AM
  • Remove TestScheduler - will add back when using it in future tests
  • Make RunSchedule a namedtuple
sashank updated this revision to Diff 3630.Aug 13 2019, 4:58 AM

Change json.loads to json.load in SystemCronScheduler._load_schedules()

sashank updated this revision to Diff 3635.Aug 13 2019, 10:24 AM

Use keys instead of attributes to access execution_params dict in all_schedules_for_pipeline

This revision is now accepted and ready to land.Aug 13 2019, 3:15 PM
sashank updated this revision to Diff 3644.Aug 13 2019, 5:46 PM

Run tests

sashank updated this revision to Diff 3648.Aug 13 2019, 5:57 PM

Rerun tests

schrockn added inline comments.Aug 13 2019, 6:01 PM
python_modules/dagster-graphql/dagster_graphql/implementation/scheduler/system_cron_scheduler.py
89

may want to consider a harder failure here

alangenfeld added inline comments.Aug 13 2019, 6:02 PM
python_modules/dagster-graphql/dagster_graphql/implementation/scheduler/system_cron_scheduler.py
89

goodcatch

ya we probably just want to let the exception bubble

sashank updated this revision to Diff 3650.Aug 13 2019, 6:10 PM

Rethrow exception

This revision was automatically updated to reflect the committed changes.
schrockn added inline comments.Aug 13 2019, 6:16 PM
python_modules/dagster-graphql/dagster_graphql/implementation/scheduler/system_cron_scheduler.py
90

going to need to use six.raise_from for py2

sashank added inline comments.Aug 13 2019, 6:18 PM
python_modules/dagster-graphql/dagster_graphql/implementation/scheduler/system_cron_scheduler.py
90

will get this in the next diff

schrockn added inline comments.Aug 13 2019, 6:18 PM
python_modules/dagster-graphql/dagster_graphql/implementation/scheduler/system_cron_scheduler.py
90

cool