Page MenuHomePhabricator

Modernize dagstermill file handling
AbandonedPublic

Authored by max on Wed, Jan 6, 12:06 AM.

Details

Summary

This drops the bespoke file handling that Dagstermill previously used
for output notebooks and for passing outputs back to solid code from the
Jupyter process, instead using the file_manager and object_manager facilities.
Documentation changes will follow after review of this basic approach.

Test Plan

Unit

Diff Detail

Repository
R1 dagster
Branch
dagstermill-file-handling
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Harbormaster returned this revision to the author for changes because remote builds failed.Wed, Jan 6, 9:15 PM
Harbormaster returned this revision to the author for changes because remote builds failed.Wed, Jan 6, 10:31 PM
Harbormaster failed remote builds in B23781: Diff 28921!
python_modules/dagster/dagster/core/definitions/events.py
810 ↗(On Diff #28921)

FYI this has been fixed in D5863 - can grab it from rebase master

Harbormaster returned this revision to the author for changes because remote builds failed.Fri, Jan 8, 9:08 PM
Harbormaster failed remote builds in B23953: Diff 29117!
max requested review of this revision.Fri, Jan 8, 9:29 PM
examples/airline_demo/airline_demo/solids.py
542–544

nit: might be good to capture this input def pattern with a function that just takes the input name

python_modules/dagster/dagster/core/execution/plan/execute_step.py
236–238

clean dis

python_modules/libraries/dagstermill/dagstermill/examples/repository.py
202–214

lol

python_modules/libraries/dagstermill/dagstermill/manager.py
192–194

add a test with fan-in inputs, you may need to unpack a list of these i think

320–321

this is in the context of generating a set of dynamic outputs which would be happening in one notebook, so i don't think collision is an issue. Will need to figure out how to support dynamic outputs from a notebook with this yield_result API or another at somepoint

python_modules/libraries/dagstermill/dagstermill/solids.py
233–334

this is real gnarly, is it feasible to decompose this at all? Some narrative about the different steps happening would be good too. It was particularly confusing to see these core abstractions re-used here without context since at first glance it seems like it might be a custom alternate path to the core execution flow as opposed to just re-using abstractions in a slightly different context

This revision now requires changes to proceed.Fri, Jan 8, 10:34 PM
python_modules/libraries/dagstermill/dagstermill/manager.py
314

Nice

max marked an inline comment as done.

rebase & nits

max planned changes to this revision.Mon, Jan 11, 5:55 PM
max planned changes to this revision.Mon, Jan 11, 6:03 PM
max planned changes to this revision.Mon, Jan 11, 10:00 PM
max planned changes to this revision.Mon, Jan 11, 10:09 PM
max planned changes to this revision.Mon, Jan 11, 10:13 PM
max planned changes to this revision.Mon, Jan 11, 10:23 PM
max planned changes to this revision.Mon, Jan 11, 10:26 PM
max planned changes to this revision.Mon, Jan 11, 10:53 PM
max planned changes to this revision.Mon, Jan 11, 11:14 PM

@alangenfeld I've reorganized and broken up the solid compute function logic to be more readable, and added support for fan-in inputs

some complex stuff but i think the comments and cleanup make it parseable at least, thanks for that.

python_modules/dagster/dagster/core/execution/plan/execute_step.py
260–380

coordinate with @yuhan on these changes since she has conflicting diff

python_modules/libraries/dagstermill/dagstermill/solids.py
229

whats this about?

This revision is now accepted and ready to land.Tue, Jan 12, 9:31 PM
python_modules/libraries/dagstermill/dagstermill/solids.py
229

sigh, yes. i don't even know what to put in a comment because this is so weird but.

>>> def f():
...     return
... 
>>> def g():
...     yield from f()
... 
>>> [x for x in g()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 2, in g
TypeError: 'NoneType' object is not iterable

but

>>> def f():
...     return
...     yield
... 
>>> def g():
...     yield from f()
... 
>>> [x for x in g()]
[]
python_modules/libraries/dagstermill/dagstermill/solids.py
229

but this function has other yield which i assume would make it work, but you found it to be otherwise?

python_modules/libraries/dagstermill/dagstermill/examples/repository.py
214

haha there has to be a term of art for a FIXME to TODO conversion

python_modules/libraries/dagstermill/dagstermill/solids.py
229

i believe you should be able to emit the return entirely.

The presence of a yield statement makes the function an iterator, even if the yield is never executed.

def f():
     return
     yield

returns an empty iterator

whereas

def g():
   return

Is just a normal function that returns None

feels like we have broad agreement on this approach, going to abandon this diff and open a stack