Page MenuHomePhabricator

Pass instance into user-supplied env/tags functions to support complex partition selection logic
ClosedPublic

Authored by prha on Jan 4 2020, 2:08 AM.

Details

Summary

We want more complex partition selection logic (that examines run storage), which requires
access to the dagster instance. This diff plumbs this logic through ScheduleDefinition, using
a helper method (instead of properties) to calculate the environment_dict, tags, and should_execute
logic for a given scheduled execution request

Depends on D1776

Test Plan

Tested last_empty_partition logic through the graphql test case

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

prha created this revision.Jan 4 2020, 2:08 AM
prha updated this revision to Diff 8368.Jan 4 2020, 9:16 PM

fix py27

prha updated this revision to Diff 8369.Jan 6 2020, 1:21 AM

python 2.7 update

alangenfeld added inline comments.Jan 6 2020, 4:21 PM
python_modules/dagster/dagster/core/definitions/partition.py
190

hmm, we should think some more about what exactly we want passed here

  • should we use a context arg as a layer of indirection to ease api changes over time
  • should we pass the def or the output of get_partitions on from the def
prha planned changes to this revision.Jan 6 2020, 5:47 PM
prha added inline comments.
python_modules/dagster/dagster/core/definitions/partition.py
190

I like the idea of passing context, but in this case, the context would be the DagsterGraphQLContext, not a pipeline execution context.... I have some concerns about this being confusing, but don't know what else to do.

changing the partition selector signature to take in a list of partitions makes sense to me, will change...

prha updated this revision to Diff 8376.Jan 6 2020, 9:54 PM

Added context interface, for user-defined scheduler functions.

Kept the selector interface the same, because it's nice to read the partition set name off of the definition.

alangenfeld added inline comments.Jan 6 2020, 10:00 PM
python_modules/dagster/dagster/core/definitions/partition.py
190

I like the idea of passing context, but in this case, the context would be the DagsterGraphQLContext, not a pipeline execution context.... I have some concerns about this being confusing, but don't know what else to do.

I think we would probably make a new context class - PartitionSelectionContext or something for this as opposed to re-use something ill fitting

prha added a comment.Jan 6 2020, 10:06 PM

should note, it's a breaking change that environment_dict_fn, tags_fn, and should_execute must take in the context as an arg...

environment_dict_fn was added in 0.6.4 (but not extensively documented)
tags_fn was added to master but not yet released
should_execute was added in 0.6.0, but the default case should be updated

prha updated this revision to Diff 8399.Jan 6 2020, 10:58 PM

rebase

sashank added a comment.EditedJan 9 2020, 8:23 PM

I think we would probably make a new context class - PartitionSelectionContext or something for this as opposed to re-use something ill fitting

What do you think about having two contexts: ScheduleExecutionContext to pass to the schedule methods, and PartitionSelectionContext for the partition selector? I'm wondering if it's a good idea to expose the concept or use the term DagsterInstance for these contexts. Also, we could later add things to each context individually as needed rather than having them be coupled.

Then, we could perhaps only thread through a RunStorage instead of the entire instance, since it seems like it's what's actually useful, and avoid exposing DagsterInstance here.

python_modules/dagster/dagster/core/definitions/decorators.py
942

Do we want to change these defaults to None as well?

prha added a comment.Jan 9 2020, 8:34 PM

Ideally, we could just pass the DagsterGraphQLContext around, which would satisfy a number of interface checks (PartitionSelectionContext, SchedulingContext, etc), but it turns out to be a little awkward in Python so I punted on that to be as generic as possible.

python_modules/dagster/dagster/core/definitions/decorators.py
942

yeah, will change these also.

What do you think about having two contexts: ScheduleExecutionContext to pass to the schedule methods, and PartitionSelectionContext for the partition selector?
Ideally, we could just pass the DagsterGraphQLContext around

I think i lean towards @sashank's point of view here - I think the DagsterGraphQLContext is likely to change a lot and its not currently exposed to users anywhere - I think the more specific context objects are worth it

python_modules/dagster-graphql/dagster_graphql/implementation/context.py
9

while this interface approach would work well in a typed language - in python people may poke at the other fields here and then experience an unintended breaking changes if we pass a different implementor of the interface

prha updated this revision to Diff 8531.Jan 9 2020, 9:34 PM

renamed, moved DagsterInstanceContext to ScheduleExecutionContext, made it schedule specific.

This makes more sense to me since even the partition selection is happening within a scheduled context

prha updated this revision to Diff 8537.Jan 9 2020, 10:00 PM

use distinctly constructed context object in the scheduling context

This revision is now accepted and ready to land.Jan 9 2020, 10:09 PM
prha updated this revision to Diff 8541.Jan 9 2020, 10:13 PM

rebase

sashank added inline comments.Jan 9 2020, 10:21 PM
python_modules/dagster/dagster/core/definitions/schedule.py
14–19

What are your thoughts on these contexts having access to the complete DagsterInstance, rather than just RunStorage?

alangenfeld added inline comments.Jan 9 2020, 10:23 PM
python_modules/dagster/dagster/core/definitions/schedule.py
14–19

we *try* to abstract away the existence of run_storage on the instance (see the indirection methods on the instance) when it comes to fetching runs so i think thats the right aspiration even though we dont do it as consistent as we should and we should operate against the instance here

prha updated this revision to Diff 8542.Jan 9 2020, 10:27 PM

resolve conflict with run storage refactor