Page MenuHomeElementl

fs_io_manager base dir + init_context.instance
ClosedPublic

Authored by yuhan on Feb 17 2021, 7:12 PM.

Details

Summary

the diff (worth mentioning on changelog)

  • fs_io_manager now defaults the base_dir to instance.storage_directory()
  • custom_path_fs_io_manager requires now requires base_dir which is base directory where all the step output will be stored in
  • renames InitResourceContext.instance_for_backwards_compat to InitResourceContext.instance

other changes:

  • instance now has a new method storage_directory() for the default storage directory
  • LocalArtifactStorage now has a new property storage_dir
Test Plan

unit

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Feb 17 2021, 7:29 PM
Harbormaster failed remote builds in B25996: Diff 31736!
alangenfeld added inline comments.
python_modules/dagster/dagster/core/storage/fs_io_manager.py
82–86

could also fetch the base_dir at load/store time off the context if there isnt a "manual override"

yuhan requested review of this revision.Feb 22 2021, 7:19 PM
python_modules/dagster/dagster/core/storage/fs_io_manager.py
82–86

I'd push back on this and go with passing base_dir in at resource init time

because:

  1. we don't need to dynamically generate base_dir as intermediate storage does (run_id is an input), because get_run_scoped_output_identifier returns run_id,
  1. if we fetch the base_dir from instance at run time, the OutputContext will require to have the access to instance, like:
def _get_path(self, context):
    """Automatically construct filepath."""
    keys = context.get_run_scoped_output_identifier()
    base_dir = self.base_dir or context.step_context.instance.io_manager_directory

    return os.path.join(base_dir, *keys)

this will then require users to mock step_context.instance.io_manager_directory when unit testing OutputContext, which conflicts our thinking before:
https://sourcegraph.com/github.com/dagster-io/dagster/-/blob/python_modules/dagster/dagster/core/execution/context/system.py#L518:54

although, alternatively, we could expose io_manager_directory and attach it to OutputContext, so users only need to mock OutputContext(..., io_manager_directory=..). but I don't think the base dir is something generic enough that we want to put on OutputContext.

yuhan edited the test plan for this revision. (Show Details)

leaving open to get thoughts on removing default value on other io managers - I think defaulting to . is very sketchy but would entertain arguments otherwise

python_modules/dagster/dagster/core/storage/fs_io_manager.py
63

i think the init_context should just have the instance - im not sure why we have the _for_backwards_compat stuff

169–171

since this is marked as experimental should we force users to pass in a dir instead of defaulting to CWD? Same for versioned_filesystem_io_manager.

python_modules/dagster/dagster/core/storage/root.py
27–29

i think storage_dir since its not just for io_managers

init_context.instance_for_backwards_compat -> init_context.instance
versioned_filesystem_io_manager defaults base_dir to instance-level base storage dir
custom_path_fs_io_manager requires base_dir

io_manager_directory -> storage_directory

revert versioned_filesystem_io_manager changes

python_modules/dagster/dagster/core/storage/fs_io_manager.py
169–171

updated.

versioned_filesystem_io_manager is more like fs_io_manager that users would want out-of-box support from it. follow up here: D6636

yuhan retitled this revision from fs io_manager base dir to fs_io_manager base dir + init_context.instance.Feb 23 2021, 6:22 AM
python_modules/dagster/dagster/core/storage/fs_io_manager.py
169–171

i lied. after digging into the code paths, i dont think versioned_filesystem_io_manager defaulting to instance base dir (D6636) is a good idea.

the real diff is D6637

illallowit

python_modules/dagster/dagster/core/execution/context/init.py
60–61

move relative to comment ? or delete comment

This revision is now accepted and ready to land.Feb 24 2021, 6:16 PM