Page MenuHomeElementl

RFC object_manager_from_intermediate_storage
ClosedPublic

Authored by yuhan on Dec 16 2020, 10:21 PM.

Details

Summary

RFC: I added instance as an additional attribute to InitResourceContext
because in order to convert intermediate_storage_def to intermediate_storage (which is what IntermediateStorageAdapter takes),
the system needs InitIntermediateStorageContext, and InitIntermediateStorageContext takes many required args which are not currently
available in InitResourceContext.
Adding instance to InitResourceContext allows us to get all the info InitIntermediateStorageContext needs, see [1]

Test Plan

unit

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

yuhan added inline comments.
python_modules/dagster/dagster/core/storage/system_storage.py
139–145

[1] init_context.instance isn't available currently. adding it would unblock getting the following info

Harbormaster returned this revision to the author for changes because remote builds failed.Dec 16 2020, 10:39 PM
Harbormaster failed remote builds in B23058: Diff 28034!

run_id -> pipeline_run
instance -> instance_for_backwards_compat

yuhan requested review of this revision.Dec 17 2020, 1:14 AM
python_modules/dagster/dagster/core/execution/resolve_versions.py
174–181

@cdecarolis does intermediate storage work with memoization and versioning?
don't think it would work with the intermediate storage adapter now, because the object_manager created from intermediate storage won't have the has_asset method. what do you think?

python_modules/dagster/dagster/core/execution/resolve_versions.py
174–181

I removed memoization support for intermediate storage, it only works with asset store now, so don't worry about preserving compat there.

This all looks good to me, but seems like a pretty impactful change, so will leave it to those who understand the ramifications better.

python_modules/dagster/dagster/core/execution/context/init.py
43

Is this intentionally not-documented because its intended for internal usage?

This revision is now accepted and ready to land.Dec 21 2020, 9:35 PM
python_modules/dagster/dagster/core/execution/context/init.py
43

yep thats also the reason why we named it this bad

yuhan requested review of this revision.Dec 22 2020, 1:40 AM

cc @schrockn regarding D5709 which dropped pipeline_def from resource init context

i just realized that intermediate storage context here needed pipeline_def... i thought i could get it from pipeline_run..

What I could do here, given we've removed pipeline_def from InitResourceContext, are:
Option #1

  • drop pipeline_def from InitIntermediateStorageContext
  • unfortunately, mode_def and type_storage_plugin_registry are both depending on pipeline_def. while dropping mode_def might be fine, type_storage_plugin_registry is kinda critical to intermediate storage (it needs the pipeline_def.all_dagster_types(), so i guess I would need to do a db call from instance to fetch the pipeline snapshot then get the mode_def and all_dagster_types there, i.e. instance.get_pipeline_snapshot(pipeline_run.pipeline_snapshot_id).. cc @sandyryza i guess a db call shouldn't be too bad because it is already a rare case that a user would want to migrate a custom intermediate storage to object manager (i.e. a user calls this method)?

Option #2

  • add pipeline_def back to InitResourceContext as a bad name as pipeline_def_for_backwards_compat which will be optional and only used by this intermediate storage creation path.

Thoughts?

I can either revert it or we can add the backwards compat variant. Your call!

pipeline_def_for_backwards_compat then. it will be used in this code path only.

pipeline_def -> pipeline_def_for_backwards_compat

This revision is now accepted and ready to land.Dec 22 2020, 12:48 PM
This revision was automatically updated to reflect the committed changes.