We are pushing users to scope these mappings
to their projects instead of libraries.
I did take my a couple times to get this right and
put the mapping call in the right spot, so there's that.
schrockn on Feb 11 2020, 12:32 AM.Authored by
Code wise this lgtm. I wasn't a part of this convo, but is the consequence of this change that if I ever import DataFrame from dagster_pyspark, I would need to call make_python_type_usable_as_dagster_type? What is the advantage of moving this out?
You definitely don't have to do this. You only do this *if* you want to be able to annotate your solids with the *python* type and have dagster associate the dagster type with it.
Let me throw up a variant that uses a different strategy just for bay_bikes for the purposes of comparison
The advantage of moving this out is that it is quite "presumptuous" for a library to claim a fundamental type and map it to the dagster type. It could cause some quite magical behavior. Imagine a user who wants to use mypy but then they forget to use an InputDefinition or OutputDefinition with their specialized dagster types. The system would magically coerce it to the library's dagster type.
Hold up. That counter example just shows that not using make_python_type_usable_as_dagster_type leads to verbose solid signatures. I def agree that we would want make_python_type_usable_as_dagster_type but the question is really, should libraries call make_python_type_usable_as_dagster_type or should that be left to the user. One claim is that explicit better than implicit. However, if this presumption is true 90% of the time, I fear that we are adding more surface area for the user to think about when building a pipeline.
Think of it this way, you are embarking on adding custom types to your dagster pipeline and you start evaluating dagster's type system. You then see multiple different ways to use and propagate custom types. That seems like a not great experience. Basically the question I have for you is, finer grained control is great, but how much is too much?
I would argue we're already in a confusing state of affairs and this simply removes some obscurity around the existing fundamental problems and approaches. We'll need to keep iterating based on how users receive it.
I say lets land D2009 as the most explicit version then we can rebase this diff and contrast mapping with explicit I/O defs and see how that feels and consider whether a named utility method helps things feel better.