Page MenuHomePhabricator

[2/3] Extract scheduler from RepositoryDefinition using ExecutionTargetHandle
ClosedPublic

Authored by sashank on Sep 12 2019, 10:19 PM.

Details

Summary

Proposal 2 on how to remove the scheduler from RepositoryDefinition:

We have a plugins: scheduler section in repository.yaml that points to a file or module and function:

repository:
  file: repo.py
  fn: define_repo
scheduler:
  file: sched.py
  fn: define_scheduler

And uses ExecutionTargetHandle to initialize and store the schedule. The schedule can then be retrieved using handle.build_scheduler(dagster_instance=instance)

We can still have one scheduler target multiple repositories – we would just have to copy the scheduler: snippet in every repository.yaml that uses the scheduler

Alternative implementation: D1023

Test Plan

todo

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sashank retitled this revision from [WIP] Extract scheduler from RepositoryDefinition using handles to [RFC] Extract scheduler from RepositoryDefinition using handles.Sep 12 2019, 10:31 PM
sashank edited the summary of this revision. (Show Details)
sashank edited the summary of this revision. (Show Details)
sashank added a reviewer: Restricted Project.Sep 12 2019, 10:34 PM
sashank retitled this revision from [RFC] Extract scheduler from RepositoryDefinition using handles to [RFC] Extract scheduler from RepositoryDefinition using ExecutionTargetHandle.
sashank edited the summary of this revision. (Show Details)Sep 12 2019, 10:49 PM
Harbormaster failed remote builds in B3768: Diff 4689!
sashank updated this revision to Diff 4716.Sep 13 2019, 5:09 PM

Update tests

sashank updated this revision to Diff 4721.Sep 13 2019, 5:45 PM

more tests

alangenfeld added inline comments.
python_modules/dagster-graphql/dagster_graphql/implementation/context.py
13–29

I really like the plugins setup in the folder - can we carry that through here?

Throwing one idea out there - we could have a DagsterPlugin interface that enforces the existence of a from_instance/for_instance and a key/name property or something and then build up a plugins dict on the GraphQL context.

sashank added inline comments.Sep 13 2019, 10:39 PM
python_modules/dagster-graphql/dagster_graphql/implementation/context.py
13–29

As discussed, will implement when we need to make our next plugin

sashank updated this revision to Diff 4752.Sep 13 2019, 10:40 PM

Remove plugin key from yaml

sashank planned changes to this revision.Mon, Sep 16, 10:37 PM
schrockn requested changes to this revision.Tue, Sep 17, 12:01 AM
schrockn added a subscriber: schrockn.

why do you we need the Scheduler to take a DagsterInstance? This seems like the inverse of the dependency that we want? It looks like you only use it for _schedule_dir which is unused?

python_modules/dagster/dagster/core/definitions/handle.py
102

it would be nice to consolidate these implementations at some point. might want to create an issue to track that. could be a good first task for someone

375

when is this true?

438

better error message?

python_modules/dagster/dagster/core/scheduler/scheduler.py
15

why do we need the _schedule_dir?

This revision now requires changes to proceed.Tue, Sep 17, 12:01 AM
sashank edited the summary of this revision. (Show Details)Tue, Sep 17, 5:15 PM
sashank marked an inline comment as done.Wed, Sep 18, 10:42 PM
sashank added inline comments.
python_modules/dagster/dagster/core/definitions/handle.py
375

added a comment

sashank updated this revision to Diff 4866.Wed, Sep 18, 10:43 PM
  • Update error message for calling schedule_entrypoint from non-repository-based ExecutionTargetHandle
  • Rename schedule_dir to artifacts_dir
  • Change Scheduler's dependency on instance to just a artifacts_dir
  • Cleanup
sashank added inline comments.Wed, Sep 18, 10:49 PM
python_modules/dagster/dagster/core/definitions/handle.py
102
sashank added inline comments.Wed, Sep 18, 10:51 PM
python_modules/dagster/dagster/core/scheduler/scheduler.py
15

We use it in SystemCronScheduler to store the .sh files and metadata for initialized schedules

schrockn added inline comments.Thu, Sep 19, 12:14 AM
python_modules/dagster/dagster/core/scheduler/scheduler.py
15

Right so why doesn't just the SystemCronScheduler have it? I should have worded my question precisely. Why is it in the base class?

alangenfeld resigned from this revision.Thu, Sep 19, 3:44 PM
sashank planned changes to this revision.Thu, Sep 19, 7:38 PM
sashank updated this revision to Diff 4892.Fri, Sep 20, 12:44 AM

re-arranging stack

sashank planned changes to this revision.Fri, Sep 20, 12:44 AM
sashank planned changes to this revision.Fri, Sep 20, 1:11 AM
sashank added inline comments.Fri, Sep 20, 4:34 PM
python_modules/dagster/dagster/core/scheduler/scheduler.py
15

Finally understood. I am realizing that the constructor signature and the instance attributes don't have to match up, and that we can push the it down into SystemCronScheduler.

I'll make Scheduler a pure interface, and push down all implementation to SystemCronScheduler to make this cleaner.

sashank updated this revision to Diff 4905.Fri, Sep 20, 4:40 PM
  • Make Scheduler a pure interface and move implementation of get_all_schedule_defs and get_schedule_def to SystemCronScheduler
sashank retitled this revision from [RFC] Extract scheduler from RepositoryDefinition using ExecutionTargetHandle to [2/3] Extract scheduler from RepositoryDefinition using ExecutionTargetHandle.Fri, Sep 20, 4:45 PM
sashank updated this revision to Diff 4935.Fri, Sep 20, 11:12 PM

update commit message

Harbormaster completed remote builds in B3967: Diff 4935.
schrockn accepted this revision.Wed, Sep 25, 5:11 PM

let's move forward with this. There's def a little bit of copy paste in here which I think can be consolidate. The MockScheduler and SystemCronScheduler duplication should be straightforward to resolve in a follow up. But let's press forward here. This is a great step.

python_modules/dagster-graphql/dagster_graphql/implementation/context.py
28

I generally think we should avoid this pattern of getting a default value that potentially does a non-trivial computation but I don't feel strongly enough to block this diff. I think having a specialized factory method for this and forcing the callsites to understand where they are getting the scheduler from is generally preferable. We can easily to do this refactor in a follow on diff.

python_modules/dagster/dagster/core/scheduler/scheduler.py
15

🙌🏻🙌🏻🙌🏻

python_modules/libraries/dagster-cron/dagster_cron/cron_scheduler.py
16–53

This is another copy paste job from MockScheduler. I think it would be relatively straight forward to extract out a InMemScheduleDefStore class to something that both the SystemCronScheduler and the MockScheduler delegate out to. Feel free to do a follow on diff.

This revision is now accepted and ready to land.Wed, Sep 25, 5:11 PM