Page MenuHomePhabricator

Per live discussions, remove usable types from libraries

Authored by schrockn on Feb 11 2020, 12:32 AM.



We are pushing users to scope these mappings
to their projects instead of libraries.

I did take my a couple times to get this right and
put the mapping call in the right spot, so there's that.

Test Plan


Diff Detail

R1 dagster
Lint OK
No Unit Test Coverage

Event Timeline

schrockn updated this revision to Diff 9485.Feb 11 2020, 12:32 AM
schrockn created this revision.


Harbormaster failed remote builds in B7706: Diff 9485!
schrockn updated this revision to Diff 9552.Feb 11 2020, 11:05 PM
schrockn retitled this revision from kill all library mappings to Per live discussions, remove usable types from libraries.
schrockn edited the summary of this revision. (Show Details)
schrockn edited the test plan for this revision. (Show Details)


Code wise this lgtm. I wasn't a part of this convo, but is the consequence of this change that if I ever import DataFrame from dagster_pyspark, I would need to call make_python_type_usable_as_dagster_type? What is the advantage of moving this out?

You definitely don't have to do this. You only do this *if* you want to be able to annotate your solids with the *python* type and have dagster associate the dagster type with it.

Let me throw up a variant that uses a different strategy just for bay_bikes for the purposes of comparison

The advantage of moving this out is that it is quite "presumptuous" for a library to claim a fundamental type and map it to the dagster type. It could cause some quite magical behavior. Imagine a user who wants to use mypy but then they forget to use an InputDefinition or OutputDefinition with their specialized dagster types. The system would magically coerce it to the library's dagster type.

Here's an example of what it would look like by using dagster types only in InputDefinitions and OutputDefinitions, and unmapped python types only in type annotations:

themissinghlink added a comment.EditedFeb 11 2020, 11:55 PM

Hold up. That counter example just shows that not using make_python_type_usable_as_dagster_type leads to verbose solid signatures. I def agree that we would want make_python_type_usable_as_dagster_type but the question is really, should libraries call make_python_type_usable_as_dagster_type or should that be left to the user. One claim is that explicit better than implicit. However, if this presumption is true 90% of the time, I fear that we are adding more surface area for the user to think about when building a pipeline.

Think of it this way, you are embarking on adding custom types to your dagster pipeline and you start evaluating dagster's type system. You then see multiple different ways to use and propagate custom types. That seems like a not great experience. Basically the question I have for you is, finer grained control is great, but how much is too much?

alangenfeld added a comment.EditedFeb 12 2020, 1:55 AM

You then see multiple different ways to use and propagate custom type

I would argue we're already in a confusing state of affairs and this simply removes some obscurity around the existing fundamental problems and approaches. We'll need to keep iterating based on how users receive it.

I say lets land D2009 as the most explicit version then we can rebase this diff and contrast mapping with explicit I/O defs and see how that feels and consider whether a named utility method helps things feel better.


a pattern of libraries providing a utility method might be nice



max added a comment.Feb 12 2020, 5:00 PM

I think I'm on the same page as Alex