Page MenuHomePhabricator

(make-pipeline-a-solid-5) NodeInvocation and SolidInvocation
Needs RevisionPublic

Authored by schrockn on Oct 27 2020, 8:48 PM.

Details

Summary

Just to give you guys a sense of what this will look like. It is truly unfortunate. However SolidInvocation is exported top-level so we will need a story.

The more obvious structure:

class NodeInvocation(namedtuple("Solid", "name alias tags hook_defs")):
    def __new__(cls, name, alias=None, tags=None, hook_defs=None):
        name = check.str_param(name, "name")
        alias = check.opt_str_param(alias, "alias")
        tags = frozentags(check.opt_dict_param(tags, "tags", value_type=str, key_type=str))
        hook_defs = frozenset(check.opt_set_param(hook_defs, "hook_defs", of_type=HookDefinition))
        return super(cls, NodeInvocation).__new__(cls, name, alias, tags, hook_defs)


class SolidInvocation(NodeInvocation):
    """Identifies an instance of a solid in a pipeline dependency structure.
    Args:
        name (str): Name of the solid of which this is an instance.
        alias (Optional[str]): Name specific to this instance of the solid. Necessary when there are
            multiple instances of the same solid.
        tags (Optional[Dict[str, Any]]): Optional tags values to extend or override those
            set on the solid definition.
        hook_defs (Optional[Set[HookDefinition]]): A set of hook definitions applied to the
            solid instance.
    Examples:
        .. code-block:: python
            pipeline = PipelineDefinition(
                solid_defs=[solid_1, solid_2]
                dependencies={
                    SolidInvocation('solid_1', alias='other_name') : {
                        'input_name' : DependencyDefinition('solid_1'),
                    },
                    'solid_2' : {
                        'input_name': DependencyDefinition('other_name'),
                    },
                }
            )
    In general, users should prefer not to construct this class directly or use the
    :py:class:`PipelineDefinition` API that requires instances of this class. Instead, use the
    :py:func:`@pipeline <pipeline>` API:
    .. code-block:: python
        @pipeline
        def pipeline():
            other_name = solid_1.alias('other_name')
            solid_2(other_name(solid_1))
    """

    def __new__(cls, name, alias=None, tags=None, hook_defs=None):
        name = check.str_param(name, "name")
        alias = check.opt_str_param(alias, "alias")
        tags = frozentags(check.opt_dict_param(tags, "tags", value_type=str, key_type=str))
        hook_defs = frozenset(check.opt_set_param(hook_defs, "hook_defs", of_type=HookDefinition))
        return super(cls, SolidInvocation).__new__(cls, name, alias, tags, hook_defs

Results in the unfortunate error:

python_modules/dagster/dagster/core/definitions/dependency.py:153: in __new__
    return super(cls, SolidInvocation).__new__(cls, name, alias, tags, hook_defs)
python_modules/dagster/dagster/core/definitions/dependency.py:112: in __new__
    return super(cls, NodeInvocation).__new__(cls, name, alias, tags, hook_defs)
E   TypeError: super(type, obj): obj must be an instance or subtype of type

So the solution I came up was to have an inheritance hierarchy and then have it
contain a namedtuple. I did containment to ensure maintainence precisely the same
hashing and equality semantics. (Side note: hooks seem problematic here)

I'd love to have a better solution here so I'm all ears

Depends on D4912

Test Plan

BK

Diff Detail

Repository
R1 dagster
Branch
make-pipeline-a-solid-5
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

this should work:

class Foo(namedtuple("_Foo", "baz")):
    def __new__(cls, baz):
        return super(cls, Foo).__new__(cls, baz)


class Bar(Foo):
    def __new__(cls, baz):
        return Foo(baz)

caveat is that then you get an instance of Foo, not Bar

yeah it's technically a breaking change

curious to see if anyone else has any bright ideas. none of these solutions are particularly satisfying.

Sent via Superhuman ( https://sprh.mn/?vip=schrockn@elementl.com )

it is a small class - we could just duplicate it and add a marker interface BackCompatNodeInvocation that we inst check against

since there is only one public api input for this, the dependencies arg, you could just do conversions at that input

caveat is that then you get an instance of Foo, not Bar
yeah it's technically a breaking change

I think I'm in the camp that as long as we don't break the clear majority use (in this case defining a PipelineDefinition manually) its fine. I think its acceptable for SolidInvocation to return NodeInvocation – I can't really imagine a use out side of like a test making isinstance assertions breaking, and i think its acceptable to break that. At least at the stage of project we are at.

ok in that case I can just have

def SolidInvocation(...)
  return NodeInvocation(...)

I think what this does make clear is that our notion of node is definitely going to leak (I think top level nodes: in config is inevitable)

I think what this does make clear is that our notion of node is definitely going to leak

yeah I think the question is at what "level" does it leak - having to understand it to manually construct PipelineDefinition which is an exercise only really necessary for programmatic pipeline construction feels ok since that is an advanced use case.

(I think top level nodes: in config is inevitable)

I think graph: is a potential clever way to hide this from beginner \ novice users

That said, its probably still worth thinking through alternatives to Node (at least name wise) since I'm not *certain* we can keep it from bleeding in to the beginner / novice levels of understanding the system

ah yes. graph is much better! forgot about that excellent idea from mr @max

to your queue to resolve

This revision now requires changes to proceed.Oct 28 2020, 11:09 PM