RFC: I added instance as an additional attribute to InitResourceContext
because in order to convert intermediate_storage_def to intermediate_storage (which is what IntermediateStorageAdapter takes),
the system needs InitIntermediateStorageContext, and InitIntermediateStorageContext takes many required args which are not currently
available in InitResourceContext.
Adding instance to InitResourceContext allows us to get all the info InitIntermediateStorageContext needs, see [1]
Details
unit
Diff Detail
- Repository
- R1 dagster
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
python_modules/dagster/dagster/core/storage/system_storage.py | ||
---|---|---|
139–145 | [1] init_context.instance isn't available currently. adding it would unblock getting the following info |
python_modules/dagster/dagster/core/execution/resolve_versions.py | ||
---|---|---|
174–181 | @cdecarolis does intermediate storage work with memoization and versioning? |
python_modules/dagster/dagster/core/execution/resolve_versions.py | ||
---|---|---|
174–181 | I removed memoization support for intermediate storage, it only works with asset store now, so don't worry about preserving compat there. |
This all looks good to me, but seems like a pretty impactful change, so will leave it to those who understand the ramifications better.
python_modules/dagster/dagster/core/execution/context/init.py | ||
---|---|---|
43 | Is this intentionally not-documented because its intended for internal usage? |
python_modules/dagster/dagster/core/execution/context/init.py | ||
---|---|---|
43 | yep thats also the reason why we named it this bad |
cc @schrockn regarding D5709 which dropped pipeline_def from resource init context
i just realized that intermediate storage context here needed pipeline_def... i thought i could get it from pipeline_run..
What I could do here, given we've removed pipeline_def from InitResourceContext, are:
Option #1
- drop pipeline_def from InitIntermediateStorageContext
- unfortunately, mode_def and type_storage_plugin_registry are both depending on pipeline_def. while dropping mode_def might be fine, type_storage_plugin_registry is kinda critical to intermediate storage (it needs the pipeline_def.all_dagster_types(), so i guess I would need to do a db call from instance to fetch the pipeline snapshot then get the mode_def and all_dagster_types there, i.e. instance.get_pipeline_snapshot(pipeline_run.pipeline_snapshot_id).. cc @sandyryza i guess a db call shouldn't be too bad because it is already a rare case that a user would want to migrate a custom intermediate storage to object manager (i.e. a user calls this method)?
Option #2
- add pipeline_def back to InitResourceContext as a bad name as pipeline_def_for_backwards_compat which will be optional and only used by this intermediate storage creation path.
Thoughts?