Page MenuHomeElementl

[job] add executor_def
ClosedPublic

Authored by alangenfeld on Jun 29 2021, 9:09 PM.

Details

Summary

default to multiproc and fs_io_manager

Test Plan

unit

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Jun 29 2021, 9:53 PM
Harbormaster failed remote builds in B32852: Diff 40480!

One idea around executing jobs with the graph / in_process API is to have something like execute_in_process(**job.unpack_for_test()) / execute_graph(**job.unpack_for_test()) which effectively just strips away the executor. Could be useful for swapping out individual resources too.

python_modules/dagster-test/dagster_test_tests/test_graph_job_repos.py
8–17

I think this pattern is more representative of people doing job execution from python

If we can have @repository capture its own pointer then we could net out somewhere like this:

execute(
  graph_job_dev_repo.get_job_pointer(job_name)
  instance=DagsterInstance.local_temp() # or just .get() for real real execution
)
python_modules/dagster/dagster_tests/core_tests/graph_tests/test_graph.py
70–71

these are a bit awkward - but I don't think are representative of real user situations

Harbormaster returned this revision to the author for changes because remote builds failed.Jun 29 2021, 10:46 PM
Harbormaster failed remote builds in B32859: Diff 40487!

use execute_in_process in op tests

python_modules/dagster-test/dagster_test_tests/test_graph_job_repos.py
8–16

We should maybe either make this public, or make public in some way how to use DagsterInstance.local_temp with an ephemeral dir

python_modules/dagster/dagster/core/definitions/graph.py
438–449

probably worth commenting this block

+1 to adding executor_def on jobs. The only part of this change that makes me nervous is switching the default to fs_io_manager. execute_in_process(**job.unpack_for_test()) feels gross to me. This could potentially give us the best of both worlds: https://dagster.phacility.com/D8695?

The only part of this change that makes me nervous is switching the default to fs_io_manager. ...This could potentially give us the best of both worlds: https://dagster.phacility.com/D8695?

To clarify, you mean just fs_io_manager and not also the defaulting of multiprocess execution? Can you expand on what worries you? Is it just the use of the filesystem when execute_in_process is used?

execute_in_process(**job.unpack_for_test()) feels gross to me.

Yea it was a ....unique idea. I think your suggestion of my_job.execute_in_process() and my_graph.execute_in_process(resources={...}) is much cleaner

To clarify, you mean just fs_io_manager and not also the defaulting of multiprocess execution?

Right. I think most users will be happy with multiprocess when executing from Dagit and CLI.

Can you expand on what worries you? Is it just the use of the filesystem when execute_in_process is used?

Yeah, exactly.

just the use of the filesystem when execute_in_process is used? Yeah, exactly.

Ok cool - let me see if there is a more isolated way to address that in the impl of execute_in_process in case we feel D8695 is coming in too hot

I chatted with Alex offline and he convinced me on the approach here. I'll leave it to @cdecarolis for final review, but consider my concerns addressed.

LGTM outside of nits + comments

python_modules/dagster/dagster/core/definitions/graph.py
722–724

I'm assuming this will get replaced once D8695 lands

python_modules/dagster/dagster/core/definitions/pipeline.py
525–529

It can't right now bc pipeline, but will it be possible at some point for a job to have input values? Or is it expected that those will be resolved via config when expanding a graph to a job?

540–547

Maybe explicitly comment that we are discarding provided executor / io manager here

1042–1055

I would comment block this

1052

This does make me feel a bit uneasy, but I guess people will only run into it if they haven't customized their IO manager yet so should probably be ok. Gonna be a tricky documentation maneuver to not over-explain

python_modules/dagster/dagster/core/execution/execute.py
153

nit: maybe you want type(node) here?

python_modules/dagster/dagster_tests/core_tests/graph_tests/test_graph.py
70–71

this is quite clean imo.

This revision is now accepted and ready to land.Wed, Jul 7, 4:06 PM
alangenfeld added inline comments.
python_modules/dagster/dagster/core/definitions/pipeline.py
525–529

ya if we added it - it would be in the job creation not here

This revision was automatically updated to reflect the committed changes.