Page MenuHomeElementl

Dagster Pandas exists composition and Null event_metadata_fn handling
ClosedPublic

Authored by themissinghlink on Feb 6 2020, 11:40 PM.

Details

Summary

This revision implements an exists composition for the PandasColumn and also handles the case where people pass None into the constructor for event_metadata_fn.

Test Plan

unit

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

schrockn added inline comments.
python_modules/libraries/dagster-pandas/dagster_pandas/validation.py
48

why are these class methods? do we expect these work on subclasses? Also it seems a lot to have a global registry, why not just construct them directly? seems like a lot of indirect.

This isn't new tho so shouldn't block this

This revision is now accepted and ready to land.Feb 7 2020, 12:04 AM
python_modules/libraries/dagster-pandas/dagster_pandas/validation.py
48

How do you do it without a classmethod? Also where is this global registry coming from? Do I not actually understand what classmethod is doing?

Is the alternative

@staticmethod
def exists(name, ...):
    return PandasColumn(name, ...)

Is that not the same as using a classmethod decorator?

python_modules/libraries/dagster-pandas/dagster_pandas/validation.py
48

@classmethod is more dynamic as it passes in the class whereas @staticmethod does not. This implies that the class fulfills some sort of contract. In this case that contract is implicit, in that the ctor method of any subclass must have the same signature as the base class.

When I see a classmethod, I assume that the author designed the class with inheritance in mind for usage, because that's the only time it has meaningful semantics versus staticmethod. It's a signal to the class consumer.

PandasColumn is not designed to be subclasses. Users inject custom behavior via the constraints, which is a much smaller API surface area for us to worry about.

If all these were static methods constructing PandasColumn invoking PandasColumn.add_configurable_constraints

Sidenote: add_configurable_constraints is a misleading name as well. I thought this end up doing something like adding a entry to mutable global registry key the cls as the key or something.

I would just directly construct the constraints in a helper function

def construct_keyword_constraints(exists, unique):
    contraints = []
    if unique:
       constraints.add(UniqueColumnConstraint())
    # same same for exists
   return constraints

You could decide to have a registry if you really want but seems like overkill. This structure makes the code way more obvious and requires the reader to search around the file a bunch.

If you agree and feel inspired, would be cool to see those refactors in a followup

python_modules/libraries/dagster-pandas/dagster_pandas/validation.py
48

"If all these were static methods constructing PandasColumn invoking PandasColumn.add_configurable_constraints"

i meant
"If all these were static methods constructing PandasColumn invoking PandasColumn.add_configurable_constraints I think it would be clearer what the intended usage is"

python_modules/libraries/dagster-pandas/dagster_pandas/validation.py
48

Damn.....did not think of that.

mindblown

I am going to land this and get a revision out which implements those changes since that will be a slightly bigger code change. Thanks for the knowledge drop, will factor that into future designs.