Page MenuHomePhabricator

don't error on solid type annotations that don't resolve to dagster types
Needs ReviewPublic

Authored by sandyryza on Nov 12 2020, 4:22 PM.

Details

Summary

This makes it easier to adopt Dagster incrementally. Users who use type annotations, but who don't yet want to learn Dagster types, can annotate their decorated function parameters and return values with any Python type without triggering errors.

The tests show examples of what this looks like.

Test Plan

Added tests

Diff Detail

Repository
R1 dagster
Branch
auto-type (branched from master)
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Nov 12 2020, 4:38 PM
Harbormaster failed remote builds in B21053: Diff 25536!
Harbormaster returned this revision to the author for changes because remote builds failed.Jan 16 2021, 1:18 AM
Harbormaster failed remote builds in B24512: Diff 29850!
sandyryza retitled this revision from optional mypy type to don't error on solid type annotations that don't resolve to dagster types.Jan 16 2021, 1:24 AM
sandyryza edited the summary of this revision. (Show Details)
sandyryza edited the test plan for this revision. (Show Details)
Harbormaster returned this revision to the author for changes because remote builds failed.Jan 16 2021, 1:46 AM
Harbormaster failed remote builds in B24513: Diff 29851!
Harbormaster failed remote builds in B24514: Diff 29852!

first reactions:

  • concerns around hiding errors / confusion for intermediate user getting Any when trying to actually use types
  • throwing away information by dropping all the way to Any

one idea is to have a Any variant wrapper type that captures the class name. The wrapper type could have a clear description about what it is when surfaced in dagit , something like

  • Unregistered[MyClass]
  • Undefined[MyClass]
  • Anyish[MyClass]
  • Unknown[MyClass]
  • Unchecked[MyClass]

could even go as far as to register an implicit type that will do an instanceof check at runtime

  • Implicit[MyClass]
  • Inferred[MyClass]
  • InstanceOf[MyClass]
  • Anonymous[MyClass]

The thinking is to address those concerns by:

  • making use of the information provided, even if its just getting the string MyClass somewhere
  • making the condition where a mypy typehint was consumed but no "proper' DagsterType was found unique so its clear thats what happened when observed by users

@schrockn do you have thoughts here? I think I would be OK with any of the options raised by @alangenfeld. I vaguely recall that you had some concerns about getting fancy with auto-creation/parameterization of types.

Yeah my first instinct is stress and terror given our past forays into magically commingling the python and dagster type system.

The need for this makes sense and I do not dismiss, but this adds yet another way that a declaration of @solid.

Namely a declaration of

def solid2(_, _input1: MyClass):

could correspond to the following things:

  1. A dagster type MyClass AND a python of MyClass if MyClass is marked "usable" as a DagsterType and mypy or another typechecker is used.
  2. A dagster type MyClass if MyClass is a created dagster type and mypy compat is not desired.
  3. A different dagster type if declaration within InputDefinition and then a python type of MyClass for typechecking.
  4. A dagster type Any and a mypy type MyClass if is not marked as "usable" and no InputDefinition is declared.

This changes expands the number of interpretations of this definition to include the 4th. 1-3 already exist.

Regarding alex's suggestion, it's not clear that defaulting to Any would truly be the expected behavior. I would expect instanceof checks.

We also hold the invariant true that each type name in a given type namespace points to a unique type object. This prevents the user from accidentally declaring two types of the same name that have different behaviors and overwriting each other. In previous iterations of autoregistration schemes we relied on a global registry that mapped names to their automatically created DagsterType classes. This opens up more magic/confusion and potentially subtle bugs when considering things like code inclusion order. We could special case this wrapper object to act like List or Optional which are ephemeral types and don't have that object uniqueness invariant.

If I had to go back in time, I would probably focus on making InputDefinition and OutputDefinition declarations less verbose and cleanly separate dagster types and python types and not even mess with putting dagster types in python function signatures. However I have more tolerance for boilerplate than most python devs because I have less tolerance for this type of magic. I prefer verbose and explicit to minimal and implicit/magical all things being equal, but I am in the minority here.

That all being said, this is a real user pain point and I'm open to changes in this domain. I just wanted to write out my concerns.

@schrockn those concerns make a lot of sense.

If I had to go back in time, I would probably focus on making InputDefinition and OutputDefinition declarations less verbose and cleanly separate dagster types and python types and not even mess with putting dagster types in python function signatures.

Is it definitely too late to do this? At the very least, we could phase out the "A dagster type MyClass if MyClass is a created dagster type and mypy compat is not desired." option?

I'm concerned about too many breaking changes and too much thrash. But I think that the critical thing here would be coming up with a more condense syntax for inputs and outputs. Right now if we forced users to InputDefinition and OuptutDefinition for the dagster-type-only case the code explosion would be brutal.

Another approach I've considered is something like this:

@solid_sig
def solid_name_decl(input_one: DagsterTypeOne, input_two: DagsterTypeTwo) -> DagsterTypeThree:
   pass

@solid_body(solid_sig) # "signature" object contains all solid meta information
def solid_name(_context: SolidExecutionContext, input_one: PythonTypeOne, input_two: PythonTypeTwo) -> PythonTypeThree:
   # do the actually thing
   return PythonTypeThree()

or in the case of a generator

@solid_body(solid_sig)
def solid_name(_context: SolidExecutionContext, input_one: PythonTypeOne, input_two: PythonTypeTwo) -> Generator[DagsterEvent]:
   # do the actually thing
   yield Output(PythonTypeThree())

The Signature/Header object could be constructed in other manner as well

One could also go for a dictionary based approach

@solid(inputs={'input_one': DagsterTypeOne, 'input_two': DagsterTypeTwo}, output=DagsterTypeThree)
def solid_name(input_one: PythonTypeOne, input_two: PythonTypeTwo) -> PythonTypeThree:
   return PythonTypeThree()

`

and then you could have an additional class when you want to do fancier things with inputs, such as defaults, descriptions, manager keys etc

@solid(
   inputs={
      'input_one': SolidInput(DagsterTypeOne, description='foo'), 
      'input_two': SolidInput(DagsterTypeTwo, description='bar')
   },
   output=SolidOutput(DagsterTypeThree, description='sdfsdf', io_manager_key='whatever')
)
def solid_name(input_one: PythonTypeOne, input_two: PythonTypeTwo) -> PythonTypeThree:
   return PythonTypeThree()

`

We could also get clever and try to reuse input_defs and output_defs supporting a few different variants:

input_defs={'input_one' : Type}
input_defs={'input_one': InputDef(Type)}
input_defs=[InputDefintion('input_one', Type)]

I like the variants you listed that involve overloading the input_defs arg on solid.  I think that kind of overloading is pretty Pythonic / expected. When reading that code, it's very difficult to misinterpret what's going on.

I'm pretty resistant to directions that involve using type annotations for anything other than annotating the actual Python type that's expected somewhere.  It's fairly abusive of a Python language feature, and it causes non-trivial trouble for anyone who's trying to use that language feature in the correct way.  Regarding "I'm concerned about too many breaking changes and too much thrash", I'm 100% with you.  For the existing uses of this pattern, I don't mean to advocate for raising deprecation warnings, but I do think we could make new users' lives simpler by de-emphasizing it in our docs and examples.

Btw, for reference, here are a few examples of how users define types in their solids:

"I'm pretty resistant to directions that involve using type annotations for anything other than annotating the actual Python type that's expected somewhere. It's fairly abusive of a Python language feature, and it causes non-trivial trouble for anyone who's trying to use that language feature in the correct way."

I completely agree.

We could special case this wrapper object to act like List or Optional which are ephemeral types and don't have that object uniqueness invariant.

Yea this is exactly what I was thinking, which led to all the SomeWrapperType[TheInnerType]. It'll have to be a clever since List Optional point at inner unique types where this would have to work a bit different.

Seems like there is interest in:

  • experimenting with flexible input_def args.
    • Dict[str, DagsterType] case should be easy
    • cant really do Dict[str, InputDefinition] (without changing InputDefinition) due to duping the name so then you end up with an alternate type Input, InputDef which is kinda awkward. Not obvious to me best path forward there.
  • experimenting with some sort of InstanceOf[T] fallback type
    • involves threading the needle on expectations in the type system, unclear on exactly how difficult this will be

over this diff in its current form. To your queue for now, request review if there is more you would to discuss.

This revision now requires changes to proceed.Mon, Feb 15, 4:36 PM

@alangenfeld @schrockn here's a take that autogenerates a dagster type based on the python type instead of defaulting to Any.

From slack:

my thinking is that it's better to hard fail than to let there be different behavior depending on the order that these two blocks execute:

make_python_type_usable_as_dagster_type(MyType, ...)
@solid
def my_solid(_, input1: MyType):
    ...

This is a good point - but I think I lean that the implicit entry in global registry is a worse overall problem then this one. Reasoning being:

  • the description for the auto wrapped type can call out the potential make_python_type_usable_as_dagster_type problem (or call it out in the docs that are linked to)
  • the fix is relatively clear once the problem has been discovered

With the registration - when you try to import a solid from a place that defines a conflicting type you are in a tougher spot. Note you don't even have to use the offending solid - it just has to be defined in the module you are importing. To workaround you would have to add explicit dagster types for your conflicts or not import that library, or change the type names or something.

python_modules/dagster/dagster/core/types/dagster_type.py
775–780

I think you can avoid the registry by making a DagsterType that has a key of the python type name but no name property and overrides display_name to return the python_type name

type dict rules:
https://dagster.phacility.com/source/dagster/browse/master/python_modules/dagster/dagster/core/types/dagster_type.py$847-872

python_modules/dagster/dagster_tests/general_tests/py3_tests/test_inference.py
365

we should add tests for composite IO mapping and fan-in