Page MenuHomePhabricator

Remove ability to load modules resolved by the working directory
ClosedPublic

Authored by prha on Jun 26 2020, 10:19 PM.

Details

Summary

By default, the current working directory is used to resolve modules.

This means that modules that are resolved when running dagit from one directory
might not resolve when resolved from another, even when using the same repo
args. This is a problem with the current implementation of the cron scheduler,
which will not have the working directory context to resolve modules.

This diff removes that ability to make things more consistent, and adds a more
descriptive error message.

Test Plan

BK

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
prha added a comment.Jul 2 2020, 1:53 AM

remove ability to load modules resolved by local directory

prha retitled this revision from RFC: add working dir to module code pointer to Remove ability to load modules resolved by the working directory.Jul 2 2020, 1:56 AM
prha edited the summary of this revision. (Show Details)
prha edited the test plan for this revision. (Show Details)
prha updated this revision to Diff 17886.Jul 2 2020, 4:35 AM

add mock to stub out sys.path replacements

prha updated this revision to Diff 17887.Jul 2 2020, 4:54 AM

check against ImportError instead, to support py27

schrockn requested changes to this revision.Jul 2 2020, 1:12 PM
schrockn added inline comments.
python_modules/dagster/dagster/utils/__init__.py
440–449

what would you think of just parameterizing replace_sys_path_cwd with remove_from_path rather than relying on mock.patch?

447–471

this is a whole thing. seems like some copious comments would be appropriate

python_modules/dagster/dagster_tests/cli_tests/workspace_tests/autodiscovery_tests/test_autodiscovery.py
92

eliminate

This revision now requires changes to proceed.Jul 2 2020, 1:12 PM
alangenfeld added inline comments.Jul 2 2020, 2:41 PM
docs/next/src/pages/docs/overview/repositories-workspaces/workspaces.mdx
97 ↗(On Diff #17852)

The other use case for this feature to keep in mind is not how the original file is loaded but rather the fact that relative imports occur within that file depend on the CWD. This use case makes me think we may need to support working_directory in the file case as well.

prha added inline comments.Jul 2 2020, 3:59 PM
docs/next/src/pages/docs/overview/repositories-workspaces/workspaces.mdx
97 ↗(On Diff #17852)

In the file code pointer case, the path gets resolved to an absolute path, so I'm not sure that we need to.

alangenfeld added inline comments.Jul 2 2020, 4:05 PM
docs/next/src/pages/docs/overview/repositories-workspaces/workspaces.mdx
97 ↗(On Diff #17852)

but if you load that absolute pathed file from not-in-its-expected-directory, the relative imports contained within will not work

prha updated this revision to Diff 17915.Jul 2 2020, 5:14 PM

clean up, simplify path manipulation, add more comments

prha added inline comments.Jul 2 2020, 5:40 PM
docs/next/src/pages/docs/overview/repositories-workspaces/workspaces.mdx
97 ↗(On Diff #17852)

Do relative imports work anyway?

I get ImportError: attempted relative import with no known parent package

python_modules/dagster/dagster/utils/__init__.py
440–449

may be missing your point, but I think using the mock is required

alangenfeld added inline comments.Jul 2 2020, 5:56 PM
docs/next/src/pages/docs/overview/repositories-workspaces/workspaces.mdx
97 ↗(On Diff #17852)

ah sorry i mispoke, they are not relative imports but they are CWD dependent

├── git-repo
│   ├── workspace.yaml
│   ├── repo.py
│   ├── pipelines.py

repo.py:

from pipelines import foo_pipeline

@repository
def repo():
  return [foo_pipeline]

workspace.yaml

load_from:
  python_file: repo.py

this works when you operate with git-repo as your cwd

This was the setup of someone who hit issues and posted in #general

prha added inline comments.Jul 2 2020, 6:03 PM
docs/next/src/pages/docs/overview/repositories-workspaces/workspaces.mdx
97 ↗(On Diff #17852)

I see... Then I think we would be able to cheat by always changing to the parent directory of the file when loading modules?

I'll add a test case...

alangenfeld added inline comments.Jul 2 2020, 6:33 PM
docs/next/src/pages/docs/overview/repositories-workspaces/workspaces.mdx
97 ↗(On Diff #17852)

cheat by always changing to the parent directory

Eeehhhhhhhh i dunno. Going to

load_from:
  python_module:
    module_name: repo
    working_directory: .

will actually fix it, so maybe we dont need to support it on file pointer. This just conflicts with the original comment of this thread.

prha added a comment.Jul 2 2020, 8:36 PM

On reflection, I think the only reason that person in #general hit that case (CWD-dependent imports) was because we were manually adding the working directory to the path in python_modules/dagster/dagster/cli/__init__.py.

This is a proposal to explicitly break that behavior. I'm not convinced that we actually need the file-based working directory.

prha updated this revision to Diff 17967.Jul 2 2020, 10:56 PM
  • add python package load-from arg
schrockn added inline comments.Jul 2 2020, 11:08 PM
python_modules/dagster/dagster/cli/workspace/config_schema.py
46–53

unfortunately I don't think package is the right name. a package is a collection of modules that may or may not be installed

python_modules/dagster/dagster/core/code_pointer.py
266

we should also change all these from fn_name since it may or may not be a function. we use attribute elsewhere

prha added inline comments.Jul 2 2020, 11:11 PM
python_modules/dagster/dagster/cli/workspace/config_schema.py
46–53

yeah, reading the docs, there's nothing that really distinguishes between installed vs uninstalled.

module, library, package, are all just different levels of hierarchy.

package has the strongest ties to installation, since colloquially you generally always install a package.

prha added inline comments.Jul 2 2020, 11:19 PM
python_modules/dagster/dagster/cli/workspace/config_schema.py
46–53
It’s important to note that the term “package” in this context is being used as a synonym for a distribution (i.e. a bundle of software to be installed), not to refer to the kind of package that you import in your Python source code (i.e. a container of modules). It is common in the Python community to refer to a distribution using the term “package”.

From: https://packaging.python.org/tutorials/installing-packages/

prha updated this revision to Diff 17970.Jul 2 2020, 11:37 PM
  • rename fn_name
prha added inline comments.Jul 2 2020, 11:53 PM
python_modules/dagster/dagster/cli/workspace/config_schema.py
46–53

If we're okay being verbose, what about installed_python_package...

do we have a test for python_package?

python_modules/dagster/dagster/cli/workspace/config_schema.py
53

I think the best argument for package actually is that PyPi "Python Package Index" which lends a lot of credence to that it should be named package. Ok I'm cool with this

python_modules/dagster/dagster/utils/__init__.py
440–458

I'd still prefer to see these function pointers passed (e.g. I think _add_to_sys_path_hook should be passed to this in prod and then the test layer should pass in different functions) in rather than relying on mock.patch but its fine.

465–468

is there a reason why we are being so permissive. seems like it would hide legit errors

prha updated this revision to Diff 17971.Jul 3 2020, 2:00 AM
  • switch tests to pass in overriding remove_path_fn instead of using mocks
prha updated this revision to Diff 17975.Jul 3 2020, 2:29 AM

update changes, docs

schrockn accepted this revision.EditedMon, Jul 6, 6:34 PM

finallydone

thank you for working through all of the iterations

accepted. but i am concerned about

keys = sorted(index_map.keys())
for key in keys:
    sys.path.insert(key, index_map[key])

would it not be more correct/easier to do

old_path = sys.path
# body of fn
sys.path = old_path

?

python_modules/dagster/dagster/cli/workspace/cli_target.py
45

comment would be good. it's a bit confusing at first blush

python_modules/dagster/dagster/core/code_pointer.py
291–293

may be worth introducing new cli arg in follow up

python_modules/dagster/dagster/utils/__init__.py
465

shouldn't we be restoring this in the exact same order that it was originally?

prha updated this revision to Diff 18065.Mon, Jul 6, 7:18 PM
  • add ability to specify working_directory for python_file targets
prha updated this revision to Diff 18068.Mon, Jul 6, 7:52 PM
rebase, comments
prha updated this revision to Diff 18096.Mon, Jul 6, 11:05 PM
  • change test_project workspace to use installed module
alangenfeld added inline comments.Tue, Jul 7, 3:25 PM
docs/next/src/pages/learn/guides/workspaces.mdx
1

this is just a temporary duping issue and were dropping this one at somepoint right?

67

an installed package

python_modules/dagster/dagster/cli/__init__.py
31–32

ya had no idea this was hiding here the whole time

python_modules/dagster/dagster/core/code_pointer.py
88–89

should we do the remove the working directory & warn in this case too?

python_modules/dagster/dagster_tests/cli_tests/workspace_tests/autodiscovery_tests/test_autodiscovery.py
162

tests for the file -> file + working dir setup ?

prha updated this revision to Diff 18166.Tue, Jul 7, 6:52 PM
  • add file in directory module resolution warning, tests
This revision is now accepted and ready to land.Tue, Jul 7, 6:53 PM
prha updated this revision to Diff 18198.Tue, Jul 7, 8:45 PM
  • handle importerror api differences between py27, py3+
prha updated this revision to Diff 18202.Tue, Jul 7, 8:54 PM
  • rebase
prha updated this revision to Diff 18229.Tue, Jul 7, 10:27 PM
  • restore sys.modules for packages
This revision was landed with ongoing or failed builds.Wed, Jul 8, 12:02 AM
This revision was automatically updated to reflect the committed changes.