Page MenuHomePhabricator

(RFC) scaffolding for validation/quality classes
AbandonedPublic

Authored by max on Jun 10 2020, 5:58 PM.

Details

Test Plan

unit

Diff Detail

Repository
R1 dagster
Branch
dataquality (branched from master)
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Jun 10 2020, 6:13 PM
Harbormaster failed remote builds in B13187: Diff 16211!
  • basic validation engines written
  • basic responses and as_solid
  • adding exception collection
  • bug fixes
  • removing pandas from sample validator
  • black
  • python 2.7
  • more scaffolding work
  • basic validation engines written
  • basic responses and as_solid
  • formatting
  • sample validator function
  • adding exception collection
  • bug fixes
  • removing pandas from sample validator
  • removing pandas from sample validator
  • as_solid for engines
max retitled this revision from scaffolding for validation/quality classes to (RFC) scaffolding for validation/quality classes.Jun 12 2020, 10:36 PM
python_modules/dagster/dagster/core/definitions/validators.py
8

i think I would call this Validator -- it's not really an abstract class (it has no abstract properties or methods, and there's actually nothing wrong with using it directly with user-provided params). it's likely that we'd want the final API to look more like our other core APIs and use syntax like @validator to decorate the validation function

9

prefer validation_fn to be consistent with style throughout codebase

10

we should check that this is callable

34

consider min_value/max_value or min_/max_

36

why the caps

40

what's implicit about this

41

no type checking is done so this can raise an unhandled runtime TypeError, e.g. when data is ['a', 'b'] and minim is 3

48

again, i don't think this is really an abstract class, since it can be directly instantiated and work fine. i *think* what you're trying to get at is that a user might want to implement their own behavior for AbstractResponse.react in the non-exception case, thus the NotImplementedError there?

50

we should check types in the constructor

59

my instinct would be to sugar this in the constructor rather than in react, esp if you want users to be able to override react

66

should probably be a DagsterError

75

consider ValidationEngine

79

since this and report_or_fail aren't yet hardened concepts it's helpful to have docstrings making plain their intention & how they interact.

87

prefer a docstring describing what the user has to implement -- this is unreachable code

93

it seems like the branch on l. 55 can never be hit?

95

is this backwards?

109

would be nice for these intermediate solids to have names

110

helpful for future readers who lack context on how this closure works to have a note here explaining what value=val and response=resp are doing. it might be clearer if you broke out the solid generation into a separate factory utility

111

i presume this doesn't actually work (it would have to be value.as_solid()(dat) and then i think you'd see a different error)

112

hm, interesting that you didn't choose to follow the same pattern here and have AbstractResponse.as_solid. it seems to me that you can *either* blow them both out into solids & construct a composite *or* construct a single validation solid that aggregates the internal logic, but that mixing them as here is unlikely to play nicely

117

i'm not sure what this is trying to do exactly -- self is probably not what you think it is here

127

the _ prefix is really intended for unused params -- just call this context. again, i don't think self is what you think.

max requested changes to this revision.Jun 22 2020, 2:07 PM
This revision now requires changes to proceed.Jun 22 2020, 2:07 PM
  • black
  • python 2.7
  • more scaffolding work
  • basic validation engines written
  • basic responses and as_solid
  • formatting
  • sample validator function
  • adding exception collection
  • bug fixes
  • removing pandas from sample validator
  • removing pandas from sample validator
  • as_solid for engines
  • bug fixes
  • basic changes to validators
max requested changes to this revision.Jun 25 2020, 6:46 PM

just returning this to you for queue management as it's dormant

This revision now requires changes to proceed.Jun 25 2020, 6:46 PM
max edited reviewers, added: leoeer; removed: max.
This revision now requires review to proceed.Tue, Jan 5, 8:34 PM