Page MenuHomePhabricator

[2/4] Move ExecutionPlan.artifact_persisted to not require loading user code

Authored by dgibson on Jan 25 2021, 3:38 PM.



Built on

As part of the work to make ExecutionPlan constructable from a snapshot (without access to the PipelineDefinition), persist the information on the outputs that require user code during the ExecutionPlan creation, so we don't need to have the PipelineDefinition available in the run worker.

Test Plan


Diff Detail

R1 dagster
Lint OK
No Unit Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Jan 25 2021, 3:57 PM
Harbormaster failed remote builds in B24803: Diff 30203!
dgibson retitled this revision from Move ExecutionPlan.artifact_persisted to not require loading user code to [2/4] Move ExecutionPlan.artifact_persisted to not require loading user code.Jan 25 2021, 6:39 PM

this looks good to me - the storage_is_persistent is more of a back-compat property for the deprecated intermediate storage.
can we test if the re-execution on dagit works properly? the re-execution flow there depends on execution_plan.artifacts_persisted


@yuhan re-execution in dagit appears to work. I believe that API also has BK test coverage (not 100% certain)

This revision is now accepted and ready to land.Jan 26 2021, 7:41 PM

thanks yuhan - @alangenfeld had some reasonable concerns about jamming more stuff in the snapshots for this purpose so wanted to get his eyes on this too

alangenfeld added a subscriber: sandyryza.

+ @sandyryza for IOManager thoughts


is there a reason we don't model this on IOManager directly, as an overridable property?


I'm fine with this bool in a world where we don't persist StepOutput, but feel much less good about it as a persisted thing. Feels like we have an immature modeling of the situation currently.

Since I've joined we have tried a few different times to capture these conditions and error out ASAP are :

  • will this output be accessible from another process (outputs_persisted ?)
  • will this output be accessible from another machine (outputs_persisted_remotely ?)

Not catching these early can lead to pretty confusing errors. Some of the checks around this are still around (leading to this diff) but we have had to remove others for reasons such as "filesystem actually means remote accessible for me because I use a network mounted file system". I think we can do better with IOManager since it's much more reasonable to for example make a nfs_io_manager by inheriting the filesystem one in userland and just overriding an hypothetical "outputs_persisted_remotely" property.

I don't think it's a hard blocker to get this right before we lock something in to persistence that if we need it to move forward on host-mode run worker, but I think we should take some time to think about it and try.


honestly this should be addressed as well before we persist StepOutput before we go down that road - does Optional[bool] really make sense? seems like we should just coerce the bool since i am guessing thats what we do at the callsite

grepping it looks like we currently only use this at resource init time - does output materialization actually work still or do we have multiple sources of truth?


are we handling the None case correctly? given the previous logic it feels like we bias towards assuming output will be persistent


just to throw an idea out

io_manager -> None if no io manager, otherwise

  outputs_persisted: bool,
  # room for outputs_persisted_remotely or other things we may want to capture and know in host land

could also consider an io_manager_key here and storing these IOManagerProperties/ IOManagerSnapshot else where in the structure


Here was my reasoning:

Now that we have IOManagerDefinition, it _would_ be possible to add this. IMO, it's still a weird thing to add for the reasons discussed in the first link, but I don't feel super strongly.

recording thoughts on this but putting it in the queue for now until host-mode execution is pressing


this makes sense - especially given how we used to model it. I wonder if there is a way to model it that feels better.

  • model the inverse: does_not_persist_values
  • more specific to what we want to interpret supports_remote / supports_out_of_process that we would be fine defaulting to true

other idea

we can add other enum entries on stuff we want catch and just bucket all non matches to USER_PROVIDED. None for legacy doesn't have one stuff

This revision now requires changes to proceed.Feb 1 2021, 6:24 PM