Page MenuHomeElementl

[RFC] Directly invoke solids
ClosedPublic

Authored by cdecarolis on Apr 20 2021, 4:05 PM.

Details

Summary

Provide a way for users to directly invoke solids. This will have applications in unit testing and learning Dagster.

An open question after this diff is whether to type check inputs and outputs when invoking solids directly. Since type checking can require resources, providing those may be tricky given the user-facing APIs that we are providing.

Test Plan

Unit tests, migrating internal repo tests to this to see how it shakes out, and update tutorials to see how we can message this new API.

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
python_modules/dagster/dagster_tests/core_tests/test_solid_invocation.py
56–58

[x]

assert solid_with_inputs(None, 5, 6) == 11
assert solid_with_inputs(None, x=5, y=6) == 11
assert solid_with_inputs(None, 5, y=6) == 11

to align with function sig above - maybe even context= doesn't work

110

need test for iterator solid with multiple outputs. Ideally also tests for async too

python_modules/dagster/dagster_tests/core_tests/test_solid_invocation.py
60–64

Yea that makes sense. Will support that behavior instead.

110

Will add

python_modules/dagster/dagster_tests/core_tests/test_solid_invocation.py
60–64

That said though, wouldn't it mimic function behavior more exactly if you couldn't mix up params? Similar to how context= not working, I think it's a bit weird that you could provide inputs out-of-position, given they're not defined to be kwargs.

python_modules/dagster/dagster_tests/core_tests/test_solid_invocation.py
60–64

given they're not defined to be kwargs

not sure what you mean by this - the function has named input parameters x and y and when specifying by name the order does not matter

>>> def test(x, y):
...     print(x, y)
...
>>> test(1, 2)
1 2
>>> test(y=1, x=2)
2 1
python_modules/dagster/dagster_tests/core_tests/test_solid_invocation.py
60–64

to be honest I totally thought that wasn't a thing. Will fix

Macro psyduck:

Fix kwargs logic, tests for asyncio and multiple outputs

Respin with DirectSolidExecutionContext

Use more descriptive disable expressions

After discussion with @alangenfeld and @sandyryza , gonna work on this in parallel with another approachability change: making solids not necessarily require context arg

Respun using a fxn/context manager to build out a solid execution context, and on top of optional-context changes

Hey @cdecarolis - is this ready for another round of review? what are the remaining open questions?

Update on top of resource object changes

This is now ready for another round of review

python_modules/dagster/dagster/core/execution/context/invocation.py
172

Some naming nitpicks:

  • Can we get away with omitting "execution"? Would be good for this to be a short as possible.
  • For functions that are doing initialization, I think "make", "create", or "build" are better than "get".
  • Thoughts on solid_config -> config?
python_modules/dagster/dagster_tests/core_tests/test_solid_invocation.py
2

It would be helpful to add a test for what happens when a solid yields a non-Output event like an AssetMaterialization, Failure, or Retry.

The behavior might already be defined in this diff, but, if not, some suggestions:

  • Failure raises an error.
  • Retry raises an error.
  • AssetMaterialization - not entirely sure. My best guess is that we should move to a world where AssetMaterializations get placed on the context and the direct invoker can access them from there. And we change the recommended API from yield AssetMaterialization to context.asset_materialization. That could be a helpful change independently of this diff, because it would mean that users who want to yield AssetMaterializations could still use return and still use mypy annotations on their return types.

cc @alangenfeld

228

IMO it would be better for this to return some tuple-esque object that supports unpacking. That would make it more closely mirror how we expect users to deal with multiple outputs in @pipeline.

Longer term, I think it would be helpful to support some kind of way to return multiple outputs without yield. That would enable better Python type-checking. MultiOutput (or whatever we choose to call it) would be a tuple-esque object:

@solid(
    output_defs=[OutputDefinition("a"), OutputDefinition("b")
)
def my_solid() -> MultiOutput[str, int]:
    return MultiOutput("apple", 7)

++ sandy's feedback

  • more tests
  • naming
  • outputs
python_modules/dagster/dagster_tests/core_tests/test_solid_invocation.py
2

ya for asset materializations we could potentially use the "capture" pattern we use on the step context for capturing outputs and exceptions to capture asset materializations and then having something like context.get_asset_materializtions() on the direct solid context

since this is experimental and an edge case im not too worried if we change the api over this as we iterate

This revision now requires changes to proceed.May 10 2021, 4:59 PM
python_modules/dagster/dagster/core/execution/context/invocation.py
172
  • I think omitting execution is fine. Will do
  • I personally prefer build - emphasizes that there is some sort of process that takes place to get the thing (at least in my head)
  • I personally think it might confuse people a tad, since the context is constructed independently of the solid definition, but I'm not super opposed to changing it.

Fix multi-output, add tests for retry, failure, add asset materialization recording, be more strict about when context is provided.

This looks good to me! Had a few small final comments.

@alangenfeld @schrockn what do you think?

python_modules/dagster/dagster/core/execution/context/invocation.py
66

Would it make sense to just call this _materializations? "recorder" makes me think it's some kind of special object with a record method.

85

This could use a docstring.

172

The word "solid" will still be present where it's constructed, right? Because build_solid_context? Or are you thinking or something else?

would be good to have the multi output test have direct invoke right next to execute_pipeline / execute_solid to demonstrate their alignment (or lack thereof)

also test yielding the outputs in order different than their entries in output_defs

python_modules/dagster/dagster/core/definitions/solid_invocation.py
35

retry not requested

this phrasing reads odd to me - it was requested but that request was denied

python_modules/dagster/dagster/core/execution/context/invocation.py
66

or _capture- agree recorder is odd

python_modules/dagster/dagster_tests/core_tests/test_solid_invocation.py
247–250

I think this solid would fail in a pipeline/ execute_solid since the output defs are not declared so these are unknown/unexpected outputs

we use the order of the output_defs to manage the order of the items in the tuple for the composition case

I *think* we should enforce the same rules in both places

python_modules/dagster/dagster/core/definitions/solid_invocation.py
35

That makes more sense. I guess I was thinking of it as whether we're bubbling it up further, but I think what you're saying would be clearer to the user as to what's happening

fixing outputs

Add more tests for multi-output case, change arg names

python_modules/dagster/dagster_tests/core_tests/test_solid_invocation.py
279–283

[1]

298–301

This error message, as well as [1] and [2], do actually send out errors that more closely mirror the execute_solid behavior, but they are hidden behind the SolidInvocationError. My thinking is that this is not a huge deal since the original error message is still there, but not sure how to represent that in test. thoughts @alangenfeld ?

317–321

[2]

python_modules/dagster/dagster/core/definitions/solid_invocation.py
27–32

[here]

56–152

these should probably not be check 's since they are errors from user code / input

148–155

we should be more precise with this if/else block so it fails more clearly on unexpected stuff

python_modules/dagster/dagster_tests/core_tests/test_solid_invocation.py
298–301

the user code boundary at [here] is wrapping a bunch of dagster code - we should more tightly put only progressing the user code iterator in that block and then do all of our processing out side of that -

that would net us not-wrapped errors here

python_modules/dagster/dagster/core/definitions/solid_invocation.py
148–155

maybe check is fine - but they tend to be a worse user facing error  experience

maybe add a DagsterInvalidInvocationError for when the wrong info is passed?

req-ing mostly for the tighter user code boundary since i think thats valuable to get right - the other error changes are your discretion

python_modules/dagster/dagster_tests/core_tests/test_solid_invocation.py
135–147

these feel like they should be invalid definition errors instead of check errors

160–189

maybe check fine but dagster invariant is more consistent with other error experiences

This revision now requires changes to proceed.May 13 2021, 3:00 PM

Wrap iterator in tighter error bound, fix some messages

alrightalrightalright

i think this is a good place to start

python_modules/dagster/dagster/core/definitions/composition.py
73–75

mypy

python_modules/dagster/dagster/core/definitions/solid.py
117

should we fire an experimental warning for this?

python_modules/dagster/dagster/core/definitions/solid_invocation.py
29–32

*

156

raise from

or honestly just let it raise on through? not sure if thats better/worse

This revision is now accepted and ready to land.May 13 2021, 7:53 PM