Adding Dagster Type Guide to explain the new API
General comments on first pass through. It's looking good!
can we throw all code blocks for documentation into examples and refer to them by line range? This way whenever somebody changes a call-site, tests fail and notify them to fix something, rather than hoping that someone will remember to update the docs during review time.
might be worth getting rid of these comments. It's noisy.
Is this really the right way to do this? I thought the intention was to have the assert predicate in a type check method? This seems sketchy, as you would get assertion errors during runtime. This is bad because it is possible to turn off asserts globally in the python interpreter, and you lose all the metadata that you get with a type check failure.
Maybe I am missing a lot of context here from a past revision, in which case, this shouldn't block the revision.
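To make the `assert` concern concrete, here is a minimal plain-Python sketch contrasting an `assert`-based check with one that returns a result object carrying metadata. It is not Dagster's API: `TypeCheckResult`, `check_positive_with_assert`, and `check_positive` are hypothetical names. The `assert` version is stripped entirely when the interpreter runs with `-O`, while the explicit version always runs and reports why it failed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


# Hypothetical stand-in for a type-check result; Dagster's real TypeCheck
# class has its own constructor, which this sketch does not reproduce.
@dataclass
class TypeCheckResult:
    success: bool
    description: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


def check_positive_with_assert(value):
    # Skipped entirely when Python runs with -O, and a failure surfaces
    # as a bare AssertionError with no structured metadata.
    assert value > 0, "value must be positive"
    return True


def check_positive(value):
    # Always runs regardless of interpreter flags, and reports
    # why the check failed in a structured form.
    if not isinstance(value, (int, float)):
        return TypeCheckResult(False, "expected a number", {"actual_type": type(value).__name__})
    if value <= 0:
        return TypeCheckResult(False, "expected a positive number", {"value": value})
    return TypeCheckResult(True)


result = check_positive(-3)
print(result.success, result.description, result.metadata)
# False expected a positive number {'value': -3}
```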
the general case?
I feel like we can be a lot more succinct in this section.
Might be worth not mentioning the intermediate store, if a person hasn't read about intermediates, this could be confusing. Metadata is a pretty simple concept to explain on its own without introducing a dagster concept.
somewhere in here, maybe at the bottom, we should have some verbiage like "The dagster type system is independent from the PEP 484 Python type system, although we overload the type annotation syntax on functions to make it easier to specify the input and output types of your solids. Python classes are *not* necessarily Dagster types. If you're using a static type checker like mypy, please...."
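One way to illustrate "overloading the annotation syntax" is a decorator that reads a function's annotations and runs its own runtime checks, independent of what a static checker does. This is a plain-Python sketch under that assumption, not how Dagster resolves types internally; `RUNTIME_CHECKS` and `checked` are hypothetical names.

```python
import functools
import inspect
from typing import Any, Callable, Dict

# Hypothetical mapping from Python annotations to runtime checks.
# Only illustrates that annotation syntax can feed a separate runtime system.
RUNTIME_CHECKS: Dict[Any, Callable[[Any], bool]] = {
    int: lambda v: isinstance(v, int) and not isinstance(v, bool),
    str: lambda v: isinstance(v, str),
}


def checked(fn: Callable) -> Callable:
    """Hypothetical decorator: read fn's annotations and check arguments at call time."""
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            annotation = sig.parameters[name].annotation
            # Unannotated or unknown annotations fall back to "accept anything".
            check = RUNTIME_CHECKS.get(annotation, lambda v: True)
            if not check(value):
                raise TypeError(f"{name}={value!r} failed the runtime check for {annotation}")
        return fn(*args, **kwargs)

    return wrapper


def plain_double(x: int) -> int:
    # Python never enforces this annotation at runtime; mypy would flag a str argument.
    return x * 2


@checked
def checked_double(x: int) -> int:
    return x * 2


print(plain_double("ab"))  # abab: the annotation is ignored at runtime
print(checked_double(3))   # 6
```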
A solid's metadata describes the conditions -- the state of the external world -- that must hold for the computation to succeed.
this could usefully be reworded
concise example could be useful. really solids declare a schema for their configuration.
hm, interesting -- what are you trying to evoke by this distinction? i presume it's to point out to the reader that not everything on which a solid operates has to flow between solids. but fwiw, even in the case of a database table, the reference to the table is managed by the intermediate store.
this is interesting but a little arcane -- not sure the best place to put this tho there is obv a class of readers who will appreciate this pointer
the typechecks are flexible, but i don't see what they have to do with being optional
from existing (data?)
hm, i think this is the best way to communicate the signature of the type_check_fn
i think you mean, a python type that is directly usable as a dagster type
this is potentially misleading, i would qualify it -- "there is a 1:1 relationship between each of these python types"
by their inputs
for a solid
use either the singular or the plural for data but not both in the same para
cut this line
consolidate para into one sentence
if i understand this correctly, you're trying to point to a couple of cross-cutting concerns:
this parenthetical note should be expanded into a separate section on fan-in
define their own
both in type annotations..
@; link to the sphinx autogen docs using :py:func:
maybe flesh this out
therefore.. must not
is there an open issue
Implementer's choice, but I would prefer regular english. The signature of the type_check_fn is explicitly described in english below this. If people really want to know signatures, we should have this in docstrings. Right now, I think it distracts more than it helps.
not following this one
Well, this example imagines a business object, however contrived. Imagine this class being used outside of dagster: this is a reasonable thing to do there. The point is that meaningful validation is done at construction time. Once constructed, the only thing dagster needs to do is an isinstance check.
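A minimal sketch of that pattern, assuming a hypothetical `EmailAddress` business object and a hypothetical `email_type_check` function (plain Python, not Dagster's API): all meaningful validation happens in the constructor, so the type check reduces to an instance check.

```python
import re


class EmailAddress:
    """Hypothetical business object that validates itself at construction."""

    _PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def __init__(self, raw: str):
        if not isinstance(raw, str) or not self._PATTERN.match(raw):
            raise ValueError(f"invalid email address: {raw!r}")
        self.value = raw


def email_type_check(value) -> bool:
    # Meaningful validation already happened in EmailAddress.__init__,
    # so the type check only needs an instance check.
    return isinstance(value, EmailAddress)


print(email_type_check(EmailAddress("a@example.com")))  # True
print(email_type_check("a@example.com"))                # False: a raw str is not the object
```

The design choice is that an invalid `EmailAddress` can never exist, so anything passing the instance check is already known to be valid.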
google docs failed me!
this is more of a note to myself and us. we need to finalize this before wed
ya that is the plan. should have made that clear in summary. getting feedback on core content before getting down and dirty on the literal includes, which are a pain
imo this should be left for a different part of the document (or the config document). Would like to leave this intro section example code free
I'm setting up the taxonomy we describe later in terms of different types of inputs. The data dependencies are really the super set of data, metadata, Nothings, etc. Trying to tee that up
yeah agree. also not useful to include given that we haven't implemented this nature of tooling ourselves.
the default to 'Any', combined with the arbitrary strictness of the typecheck, makes the type system qualify as optional
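A rough sketch of that argument, using a hypothetical registry `TYPE_CHECKS` and helper `check_input` (plain Python, not Dagster's API): an input with no declared type falls back to an "Any" check that accepts everything, while declared types can be as loose or as strict as their check functions make them.

```python
from typing import Any, Callable, Dict


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


# Hypothetical registry of type checks, keyed by type name.
TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "Any": lambda value: True,                                # never rejects anything
    "Int": _is_int,                                           # structural check only
    "PositiveInt": lambda value: _is_int(value) and value > 0,  # stricter, value-level check
}


def check_input(value: Any, type_name: str = "Any") -> bool:
    # An input with no declared type falls back to "Any", so it always passes.
    return TYPE_CHECKS[type_name](value)


print(check_input("anything"))          # True
print(check_input(-1, "Int"))           # True
print(check_input(-1, "PositiveInt"))   # False
```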
we can leave the sig for the docblock
I'm going to leave this as a TODO for now
I'm just trying to run through the use cases and delineate what it gets you in each case
I ran this by dwall and he said it was clarifying, if we are looking for a data point that isn't us
Generally the tone of these guides is less tutorial and more definitive source, in which case I think we should feel free to refer to other concepts
Added issue here
hmmmm document is quite long already
in mypy? no idea
this is within striking distance - accepting so you can proceed at your discretion
do a final pass against previous in line comments before you land
maybe a lot for one section - consider some subheadings?
don't need to go crazy - but I feel this instance of DagsterType should deep link to API docs
update the type check functions now that i landed the context change
:py:class: these if they don't already deep link
to express one
^ marked as done but typo still present
this file will need updates post rebase
well, it's optional whether to impose types at all -- then the type checks can be as strict as they want to be. not worth harping on and i hate every part of my own soul that is tempted to care about type system terminology.
i encourage going crazy on this stuff, it feels insane to write but is so helpful for users
what i mean is that there isn't a 1:1 relationship between every python type and a corresponding dagster type -- just *these* types -- i stumbled when i first read this sentence
"Suppose you have a business object that does all of its meaningful validation when it is constructed..."
cf the custom types tutorial
Uses of the :py:class:`~dagster.Nothing` type typically point to a situation in which, although there does exist some semantic dependency between two solids, usually to do with external state -- "such and such a table was created in a database by the upstream solid", "such and such upstream solid ran successfully within the last SLA period" -- that dependency isn't yet described and encoded in metadata. In general, this kind of dependency makes a pipeline harder to understand, and solids within that pipeline harder to test and reuse. or sth
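A plain-Python sketch of the testability point, assuming a hypothetical in-memory `DATABASE` and hypothetical steps `create_table`, `count_events`, and `run_in_order` (none of these are Dagster APIs). The two steps are linked only by ordering, the way a `Nothing` dependency links solids, and the downstream step silently relies on external state.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory "external world" standing in for a database.
DATABASE: Dict[str, List[int]] = {}


def create_table() -> None:
    # Upstream step: acts only through a side effect and returns no value.
    DATABASE["events"] = [1, 2, 3]


def count_events() -> int:
    # Downstream step: depends on state that create_table set up,
    # but nothing in its signature records that dependency.
    return len(DATABASE["events"])


def run_in_order(steps: List[Callable[[], object]]) -> List[object]:
    # An ordering-only ("Nothing"-style) dependency: each step runs after
    # the previous one, but no value is passed between them.
    return [step() for step in steps]


print(run_in_order([create_table, count_events]))  # [None, 3]

# Run on its own, the downstream step fails because the implicit
# precondition (the table exists) is not met.
DATABASE.clear()
try:
    count_events()
except KeyError as exc:
    print(f"missing table: {exc}")  # missing table: 'events'
```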