
[crag] Carry over fields not specified in the default config, fix bug with execute_in_process and default config, catch bad config earlier.
Abandoned · Public

Authored by cdecarolis on Jul 23 2021, 6:01 PM.

Details

Summary

Currently, we aren't carrying over fields from the original config if they aren't specified in the default config for a job. This leads to confusing errors or, in some cases, incorrect config. This diff attempts to fix the issue.

That fix also required some finagling around the executor config, which is now always carried over.
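
Roughly, the intended merge semantics - a minimal standalone sketch for illustration, not the code in this diff (op and logger names are made up):

```
# Illustrative only: the job's default config and the user-provided run config
# should be deep-merged so that fields the default config never mentions are
# still carried over, with explicitly provided run config winning over defaults.
def merge_run_config(default_config: dict, run_config: dict) -> dict:
    merged = dict(default_config)
    for key, value in run_config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_run_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# "loggers" is not mentioned in the default config, but should not be dropped.
assert merge_run_config(
    {"ops": {"my_op": {"config": {"n": 1}}}},
    {"loggers": {"console": {"config": {"log_level": "DEBUG"}}}},
) == {
    "ops": {"my_op": {"config": {"n": 1}}},
    "loggers": {"console": {"config": {"log_level": "DEBUG"}}},
}
```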

Test Plan

unit tests

Diff Detail

Repository
R1 dagster
Branch
graph_config_fix
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

python_modules/dagster/dagster/core/definitions/graph.py
768–774

should we just do a real proper validation pass over the config instead of only covering this small subset of error conditions?
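
A hedged sketch of that idea, assuming the internal config-validation helper (validate_config in dagster.config.validate; its exact signature and result shape are assumptions here) can be pointed at the job's full config schema:

```
from dagster.config.validate import validate_config  # internal helper; assumed stable


def check_default_config(job_config_schema, default_config):
    # Run the full config-schema validation pass over the default config rather
    # than a handful of ad-hoc checks, and surface every error it reports.
    result = validate_config(job_config_schema, default_config)
    if not result.success:
        raise Exception(
            "Invalid default config:\n" + "\n".join(err.message for err in result.errors)
        )
    return default_config
```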

python_modules/dagster/dagster/core/definitions/pipeline.py
569–597

this feels pretty scary to me - we should just make a special variant of the in process executor def that takes Permissive config and ignores it all
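
Something like the following, as a rough sketch of that suggestion (the executor name is made up, and the body is left as a placeholder since it would need to build the same Executor object the built-in in_process_executor builds):

```
from dagster import Permissive, executor


@executor(name="in_process_permissive", config_schema=Permissive())
def in_process_permissive(init_context):
    # Whatever ends up under "execution" in the carried-over run config is
    # accepted by Permissive() and then deliberately ignored; the executor
    # config on init_context is never read. Placeholder body: the real variant
    # would return the same in-process Executor the built-in definition creates.
    raise NotImplementedError
```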

python_modules/dagster/dagster_tests/core_tests/graph_tests/test_graph.py
345–353

to my above point - what about "ops/solids: not_my_op"

that wouldn't get caught with the current checks, right? I think it makes sense to use the core config validation machinery (see the example below)

the only open question is whether we want to support "partial" default config - I think the answer is no
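
For example (op name hypothetical), a default config like this one slips past checks that only look at top-level keys, even though the op it configures doesn't exist in the graph; the config-schema machinery would reject it:

```
default_config = {
    "ops": {
        # "not_my_op" is not an op in the graph: full schema validation catches
        # this, but a check that only inspects top-level keys does not.
        "not_my_op": {"config": {"some_field": 1}},
    }
}
```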

This revision now requires changes to proceed. Jul 23 2021, 6:36 PM
python_modules/dagster/dagster/core/definitions/pipeline.py
569–597

That's a really good point - in execute_in_process I don't think we'll ever end up using the execution config anyway.

python_modules/dagster/dagster/core/definitions/pipeline.py
569–597

cc @alangenfeld; what do we do about the top-level config blobs that will likely be part of any job, like

{execution: {multiprocess: {}}}

Even if we had an executor that takes Permissive config, the top-level entry is different and would still break.
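
Concretely (illustrative run config), the entry that breaks is the executor selector itself, one level above the executor's own config:

```
# Carried over from the original job's run config: the key under "execution"
# selects an executor by name.
carried_over = {"execution": {"multiprocess": {"config": {"max_concurrent": 4}}}}

# Even if the in-process executor's own config_schema were Permissive(), the
# selector under "execution" only knows the in-process executor's name, so
# "multiprocess" fails validation before the permissive schema is ever consulted.
```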

python_modules/dagster/dagster/core/definitions/pipeline.py
569–597

ah good point - hmmmm

This ConfigMapping stacking spooks me a bit, which is why I was hoping to find an alternative path. It might be the way we need to go, but let's think through the other options:

  • target "execute_plan" directly which would bypass executor - loses pipeline start/fail/success events
  • have the "special" in process executor change make the whole execution section permissive at RunConfigSchema generation time - I think this is promising

if we stick with this config mapping approach I'd want to see some more tests & comments

have the "special" in process executor change make the whole execution section permissive at RunConfigSchema generation time - I think this is promising

I know this isn't a huge regression, but it still feels like a regression to me. I feel like it's a pretty core change for what is fundamentally a workaround.

target "execute_plan" directly which would bypass executor - loses pipeline start/fail/success events

This approach actually resonates with me more - in some ways it neatly resolves the conceptual friction of replacing the executor with another one. It feels nicer to say that we're bypassing the executor altogether with execute_in_process rather than foisting a new one in. Maybe I'm underestimating how difficult it will be to use execute_plan, but that approach seems cleaner and actually brings execute_in_process closer to what we want.

@alangenfeld upon further examination, I don't think the execute_plan solution can work. Even if we keep the original executors around, we still need to create a new pipeline/mode in order to switch out the io manager. Then we'll get a different set of complaints about using a non-persistent IO manager with an out-of-process executor. I think we're forced to either do the config replacement or the permissive thing.