Page MenuHomePhabricator

Dagster Type Guide
ClosedPublic

Authored by schrockn on Feb 6 2020, 7:33 PM.

Details

Summary

Adding Dagster Type Guide to explain the new API

Test Plan

read

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

General comments on first pass through. It's looking good!

docs/sections/learn/guides/dagster_types.rst
44

can we throw all code blocks for documentation into examples and refer to them via code lines. This way whenever somebody changes a call-site tests fail notifying them to fix something rather than hoping that someone will remember to update the docs during review time.

48

might be worth getting rid of these comments. It's noisy.

112

Is this really the right way to do this? I thought the intention was to have the assert predicate in a type check method? This seems sketchy as you would get assertion errors during runtime. This is bad because it is possible to turn off asserts globally in the python interpreter and you loose all the metadata that you get with a type check failure.

Maybe I am missing a lot of context here from a past revision, in which case, this shouldn't block the revision.

151

the general case?

193

I feel like we can be a lot more succinct in this section.

197

Might be worth not mentioning the intermediate store, if a person hasn't read about intermediates, this could be confusing. Metadata is a pretty simple concept to explain on its own without introducing a dagster concept.

max added inline comments.Feb 10 2020, 9:49 PM
docs/sections/learn/guides/dagster_types.rst
6

somewhere in here, maybe at the bottom, we should have some verbiage like "The dagster type system is independent from the PEP 484 Python type system, although we overload the type annotation syntax on functions to make it easier to specify the input and output types of your solids. Python classes are *not* necessarily Dagster types. If you're using a static type checker like mypy, please...."

8

A solid's metadata describes the conditions -- the state of the external world -- that must hold for the computation to succeed.

15

this could usefully be reworded

16

concise example could be useful. really solids declare a schema for their configuration.

17

hm, interesting -- what are you trying to evoke by this distinction? i presume it's to point out to the reader that not everything on which a solid operates has to flow between solids. but fwiw, even in the case of a database table, the reference to the table is managed by the intermediate store.

26

this is interesting but a little arcane -- not sure the best place to put this tho there is obv a class of readers who will appreciate this pointer

32

the typechecks are flexible, but i don't see what they have to do with being optional

34

from existing (data?)

48

hm, i think this is the best way to communicate the signature of the type_check_fn

77

*static

80

type_check_fn

92

i think you mean, a python type that is directly usable as a dagster type

93

this is potentially misleading, i would qualify it -- "there is a 1:1 relationship between each of these python types"

137

by their inputs

138

for a solid

139

use either the singular or the plural for data but not both in the same para

147

cut this line

184

provide example

189

consolidate para into one sentence

195

if i understand this correctly, you're trying to point to a couple of cross-cutting concerns:

  1. is the data passed from solid to solid, or is something else (metadata, a reference to data, or a sentinal value) passed instead? i.e., what does the intermediates machinery & serialization actually touch, what is the overhead, etc.
  1. how would you stub each of these types of input or output for test?
  1. what is it possible for a type check to usefully assert in each case?
247

this parenthetical note should be expanded into a separate section on fan-in

282

define their own

288

both in type annotations..

292

@; link to the sphinx autogen docs using :py:func:

336

maybe flesh this out

341

therefore.. must not

358

is there an open issue

docs/sections/learn/guides/dagster_types.rst
48

Implementers choice, but I would prefer regular english. The signature of the type_check_fn is explicitly described in english below this. If people really want to know signatures, we should have this in docstrings. Right now, I think it distracts more than it helps.

alangenfeld requested changes to this revision.Feb 10 2020, 9:57 PM

Q mgmt

This revision now requires changes to proceed.Feb 10 2020, 9:57 PM
schrockn added inline comments.Feb 11 2020, 1:04 AM
docs/sections/learn/guides/dagster_types.rst
44

ya

92

sure do!

93

not following this one

112

Well, this contrived example is imagining a business object, however contrived. Imagine this class being used outside of dagster. This is a reasonable thing to do there. The point is that meaningful validation is done at construction time. Once constructed the only think dagster needs to do is check instanceof.

139

google docs failed me!

151

this is more of a note to myself and us. we need to finalize this before wed

schrockn added inline comments.Feb 11 2020, 1:05 AM
docs/sections/learn/guides/dagster_types.rst
44

early return.

ya that is the plan. should have made that clear in summary. getting feedback on core content before getting down and dirty on the literal includes, which are a pain

schrockn marked 20 inline comments as done.Feb 11 2020, 10:16 PM
schrockn added inline comments.
docs/sections/learn/guides/dagster_types.rst
16

imo this should be left for different part of the document. (or the config document). Would like to leave this intro section example code free

17

I'm setting up the taxonomy we describe later in terms of different types of inputs. The data dependencies are really the super set of data, metadata, Nothings, etc. Trying to tee that up

26

yeah agree. also not useful to include given that we haven't implemented this nature of tooling ourselves.

32

the default to 'Any', combined with the arbitrary strictness of the typecheck, make the type system qualify as optional

48

we can leave the sig for the docblock

184

I'm going to leave this as a TODO for now

184
195

I'm just trying to run through the use cases and delienate what it gets you in each case

195

I ran this by dwall and he said it was clarifying, if we are looking for a data point that isn't us

197

Generally the tone of these guide are less tutorial and more definitive source in which case we should feel free to refer to other concepts I think

247

Added issue here

247
336

hmmmm document is quite long already

358

in mypy? no idea

schrockn updated this revision to Diff 9544.Feb 11 2020, 10:20 PM
schrockn marked 6 inline comments as done.

feedback; next step is to switch to code literal includes

alangenfeld accepted this revision.Feb 12 2020, 1:13 AM

this is within striking distance - accepting so you can proceed at your discretion

do a final pass against previous in line comments before you land

docs/sections/learn/guides/dagster_types.rst
4–40

maybe a lot for one section - consider some subheadings?

45
:py:class:`DagsterType`

dont need to go crazy - but I feel this instance of DagsterType should deep link to API docs

51

update the type check functions now that i landed the context change

89

:py:class: these if they dont already deep link

201

:py:class Nothing

208–209

to express a data dependencies

to express one
to express a data dependency

210–211

are operational concerns only

reads weird

283

^ marked as done but typo still present

examples/dagster_examples_tests/intro_tutorial_tests/test_type_guide.py
24

this file will need updates post rebase

This revision is now accepted and ready to land.Feb 12 2020, 1:13 AM
max added inline comments.Feb 12 2020, 7:19 PM
docs/sections/learn/guides/dagster_types.rst
32

well, it's optional whether to impose types at all -- then the type checks can be as strict as they want to be. not worth harping on and i hate every part of my own soul that is tempted to care about type system terminology.

45

i encourage going crazy on this stuff, it feels insane to write but is so helpful for users

77

*systems

93

what i mean is that there isn't a 1:1 relationship between every python type and a corresponding dagster type -- just *these* types -- i stumbled when i first read this sentence

112

"Suppose you have a business object that does all of its meaningful validation when it is constructed..."

184

cf the custom types tutorial

213

Uses of the :py:class:~dagster.Nothing type typically point to a situation in which although there does exist some semantic dependency between two solids, usually to do with external state -- "such and such a table was created in a database by the upstream solid", "such and such upstream solid ran successfully within the last SLA period" -- that dependency isn't yet described and encoded in metadata. In general, this kind of dependency makes a pipeline harder to understand, and solids within that pipeline harder to test and reuse. or sth

290

as deliver

schrockn retitled this revision from Dagster Type Guide Stab at dagster docs to Dagster Type Guide.Feb 12 2020, 7:51 PM
schrockn edited the summary of this revision. (Show Details)
This revision was automatically updated to reflect the committed changes.