Page MenuHomePhabricator

change scoped pipeline context to event generator

Authored by prha on Feb 18 2020, 11:00 PM.



We want to allow pipeline initialization to start emitting events (e.g. resource init
events, system storage init events etc). Currently pipeline initialization happens in
scoped_pipeline_context, a context manager that yields the pipeline context. This diff
replaces that pipeline context manager with one that returns a "generator" object, which contains
a method to generate initialization events, and a method to retrieve the constructed pipeline

Suggestions welcome on how to better structure this, better name this, etc

Test Plan

Ran through BK test suite, also, manually ran resource-ful pipeline with long init times,
saw events emitted appropriately via execute_pipeline calls

Diff Detail

R1 dagster
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

prha created this revision.Feb 18 2020, 11:00 PM
schrockn requested changes to this revision.Feb 19 2020, 5:57 PM

This seems like a good direction. Great job wading through this. Thorny part of the codebase for sure.

One thought is that the _has_generated is a bit of a red flag. I wonder if we can have a more functional approach where you yield a stream of events and the yield a pipeline context at the end. This would require isinstance of checks at callsites but could feel a bit better.




try finally probably appropriate here,


try finally


I don't think we should really provide defaults here. This is meant to be used internally where defaults have been set by some caller. Concretely I think that the "default" value for the callback being resource_initialization_context_manager should only be encoded once in the codebase if possible

This revision now requires changes to proceed.Feb 19 2020, 5:57 PM
prha added a comment.Feb 19 2020, 6:10 PM

re: _has_generated:

I initially had it so that the resource/builder/context were yielded along with the event stream, but switched it (per @alangenfeld's suggestion) because it makes it way more explicit in the caller what is going on. I do like this better and we're not mixing types so much. It was getting confusing where we're stacking things at different levels and the type I'm trying to extract is different at each level.

prha updated this revision to Diff 9774.Feb 19 2020, 6:18 PM

add try/finally, remove double defaults

schrockn added inline comments.Feb 19 2020, 6:48 PM

we mix and match this all the time, but this seems more like a check (programming error from our end) rather than invariant violation (user-facing error)

prha updated this revision to Diff 9776.Feb 19 2020, 6:49 PM

fix scoped_pipeline_context reference

prha updated this revision to Diff 9793.Feb 20 2020, 12:06 AM
  • pipeline initialization events, but with generators INSTEAD of context managers
prha added a comment.Feb 20 2020, 12:12 AM

couple things.... still trying to figure out the exception handling. I think we weren't tearing down resources properly before, so figuring out what tests we need to write, change, etc.

I wrote a decorator to make the generator result wrapping a little prettier. It was really nice, but pylint made it very annoying to use since it does not know how to handle decorators.

schrockn added inline comments.Feb 20 2020, 12:17 AM

this is very clear. 👍🏻


can just be list(initialization_manager.generate_setup_events()) i think


maybe CreatedResourceEvent or something

certainly comment-worthy

I'm quite happy with this direction. curious what you think @alangenfeld

prha updated this revision to Diff 9802.Feb 20 2020, 2:00 AM
prha marked an inline comment as done.

add tests, make sure resource stack tears down on failures

schrockn resigned from this revision.Feb 20 2020, 5:58 PM

q mgmt.

I'm going to defer to @alangenfeld for final approval. If there are substantive changes please readd me as I'm very interested in this. This is great stuff.

nice this is pretty slick - i think we should just go a little extra hard on the tests while we have this complex bullshit paged in to our head


comments and maybe move to utils or something


does this need to be in finally?


I think we want to test for ordering too


add comments here

alangenfeld added inline comments.Feb 20 2020, 6:21 PM

we should have a battery of tests against this directly - for example

  • we should check.failed if you never yield the expected object type
  • make sure we handle all the exception conditions
  • your wildest dreams

check these inputs

alangenfeld requested changes to this revision.Feb 20 2020, 8:23 PM

your queue for tests

This revision now requires changes to proceed.Feb 20 2020, 8:23 PM
prha planned changes to this revision.Feb 20 2020, 8:31 PM
prha updated this revision to Diff 9844.Feb 20 2020, 10:46 PM
  • tests and refactor
prha updated this revision to Diff 9850.Feb 20 2020, 10:54 PM
  • add check casts to get_object calls
alangenfeld added inline comments.Feb 21 2020, 8:41 PM

ho boy this is really something

these generator things should get moved to their own files to make this insane complexity feel a little less daunting

alangenfeld accepted this revision.Feb 21 2020, 8:41 PM

at your discretion - moving to separate files may prove unreasonable

This revision is now accepted and ready to land.Feb 21 2020, 8:41 PM
prha updated this revision to Diff 9865.Feb 21 2020, 9:08 PM
  • add exception-handling tests, add comments, extract resource init
This revision was automatically updated to reflect the committed changes.