Page MenuHomePhabricator

cdecarolis (Christopher DeCarolis)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 20 2020, 3:42 PM (9 w, 1 d)

Recent Activity

Today

cdecarolis accepted D4465: Use version resolution logic for result versions.

Looks like this is passing in bk - not sure why it's saying it failed.

Tue, Sep 22, 2:08 PM
cdecarolis accepted D4461: create execution plan at most once in create_run_for_pipeline.
Tue, Sep 22, 2:06 PM

Yesterday

cdecarolis updated the diff for D4498: changed solid version to index by string-ified version of solid handle rather than name of solid - handles composite case.

up

Mon, Sep 21, 11:19 PM
cdecarolis updated the diff for D4498: changed solid version to index by string-ified version of solid handle rather than name of solid - handles composite case.

Used string method by convention

Mon, Sep 21, 11:05 PM
cdecarolis requested review of D4497: Added extensive documentation to all version resolution computations..
Mon, Sep 21, 9:42 PM
cdecarolis updated the summary of D4494: added test to ensure that version is passed properly from solid decorator to solid definition..
Mon, Sep 21, 8:38 PM
cdecarolis requested review of D4494: added test to ensure that version is passed properly from solid decorator to solid definition..
Mon, Sep 21, 7:35 PM
cdecarolis updated the diff for D4480: Added full solid versioning support to step selection..

up

Mon, Sep 21, 7:16 PM
cdecarolis updated the diff for D4480: Added full solid versioning support to step selection..

Added return to docstring

Mon, Sep 21, 6:50 PM
cdecarolis added inline comments to D4480: Added full solid versioning support to step selection..
Mon, Sep 21, 6:35 PM
cdecarolis requested changes to D4465: Use version resolution logic for result versions.

This looks largely good, although I think it would also be affected by https://dagster.phacility.com/D4480. Would you be able to refactor to take that into account?

Mon, Sep 21, 6:21 PM
cdecarolis added inline comments to D4479: Implement get_addresses_for_step_output_versions for SQL event storage.
Mon, Sep 21, 6:18 PM
cdecarolis requested changes to D4461: create execution plan at most once in create_run_for_pipeline.

I think a lot of the most recent changes to https://dagster.phacility.com/D4480 will break this, unfortunately. Most notably the renaming of the config_value parameter,

Mon, Sep 21, 6:12 PM
cdecarolis updated the diff for D4480: Added full solid versioning support to step selection..

Renamed config_value to run_config, added an additional test for resourced solids.

Mon, Sep 21, 5:59 PM
cdecarolis added inline comments to D4480: Added full solid versioning support to step selection..
Mon, Sep 21, 4:08 PM
cdecarolis added inline comments to D4480: Added full solid versioning support to step selection..
Mon, Sep 21, 3:48 PM
cdecarolis updated the diff for D4481: passed version param to ResourceDefinition.

Added tests to test_required_resources.py to ensure that default version was correct, and version was being passed all the way to resource definition.

Mon, Sep 21, 3:30 PM
cdecarolis added a comment to D4481: passed version param to ResourceDefinition.

When I run into small bugs like this, I normally try to include a test that would have caught the bug in the first place. Might be easy to add or augment something in test_resource_definitions.

Mon, Sep 21, 2:36 PM

Fri, Sep 18

cdecarolis requested review of D4482: added solid required_resource_keys to executionstep.
Fri, Sep 18, 8:47 PM
cdecarolis requested review of D4481: passed version param to ResourceDefinition.
Fri, Sep 18, 8:44 PM
cdecarolis published D4480: Added full solid versioning support to step selection. for review.

@sandyryza I needed to make one change to create_run_for_pipeline.py - want to make sure this doesn't screw up what you were working on.

Fri, Sep 18, 8:27 PM
cdecarolis updated the diff for D4322: [core] added warning when solid not invoked from composition function.

Made helper function to avoid directly accessing _composition_stack global.

Fri, Sep 18, 2:26 PM

Thu, Sep 17

cdecarolis accepted D4457: non-optional execution plan in resolve_step_versions.
Thu, Sep 17, 8:44 PM
cdecarolis added a comment to D4457: non-optional execution plan in resolve_step_versions.
In D4457#120776, @yuhan wrote:

you should be able to get both mode and run_config from execution_plan.pipeline_def

Thu, Sep 17, 8:35 PM
cdecarolis added a comment to D4457: non-optional execution plan in resolve_step_versions.

I actually think we should do the opposite (don't provide an execution plan, but provide a pipeline def, run config, and mode); there is no convenient way of obtaining mode or environment_config, both of which will be necessary in order to resolve config + resource versions.

Per Slack conversation - if we need the run_config and mode to resolve versions, is there any issue with just passing those in to resolve_step_versions directly? If we build the ExecutionPlan inside resolve_step_versions, we'll also need to build it inside resolve_unmemoized_steps. While this isn't a performance concern, I think it does hurt readability and debuggability a bit.

Thu, Sep 17, 8:27 PM
cdecarolis requested changes to D4457: non-optional execution plan in resolve_step_versions.

I actually think we should do the opposite; there is no convenient way of obtaining mode or environment_config, both of which will be necessary in order to resolve config + resource versions.

Thu, Sep 17, 7:18 PM
cdecarolis accepted D4456: move version resolution to its own module.
Thu, Sep 17, 7:17 PM
cdecarolis added a comment to D4457: non-optional execution plan in resolve_step_versions.

This makes sense. It was kind of wonky having both options, and added a lot of unnecessary complication.

Thu, Sep 17, 5:32 PM
cdecarolis added a comment to D4456: move version resolution to its own module.

This approach makes sense to me. However, I do feel like some of these helper functions should probably live in some util file or something (like join_and_hash, etc)

Thu, Sep 17, 5:31 PM

Wed, Sep 16

cdecarolis requested review of D4450: function to resolve version for config.
Wed, Sep 16, 10:15 PM
cdecarolis added inline comments to D4437: Fixed type in docstring for solid versions. Added an experimental warning to DagsterTypeLoader versions..
Wed, Sep 16, 8:56 PM
cdecarolis updated the diff for D4437: Fixed type in docstring for solid versions. Added an experimental warning to DagsterTypeLoader versions..

documentation up

Wed, Sep 16, 8:56 PM
cdecarolis added reviewers for D4436: version for resource definition: sandyryza, yuhan.
Wed, Sep 16, 8:53 PM
cdecarolis updated the diff for D4437: Fixed type in docstring for solid versions. Added an experimental warning to DagsterTypeLoader versions..

Changed default value of external input version loader to None. Added experimental warning for _external_version_fn

Wed, Sep 16, 8:52 PM
cdecarolis requested review of D4436: version for resource definition.
Wed, Sep 16, 7:41 PM
cdecarolis added inline comments to D4437: Fixed type in docstring for solid versions. Added an experimental warning to DagsterTypeLoader versions..
Wed, Sep 16, 7:16 PM
cdecarolis updated the diff for D4322: [core] added warning when solid not invoked from composition function.

Made warning message easier to read. Improved tests. Pending invocations tracked at the composition context level.

Wed, Sep 16, 7:13 PM
cdecarolis updated the diff for D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..

Fxn naming changes. Documentation.

Wed, Sep 16, 3:06 PM
cdecarolis added inline comments to D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..
Wed, Sep 16, 2:36 PM
cdecarolis added reviewers for D4437: Fixed type in docstring for solid versions. Added an experimental warning to DagsterTypeLoader versions.: sandyryza, yuhan.
Wed, Sep 16, 2:19 PM

Tue, Sep 15

cdecarolis requested review of D4437: Fixed type in docstring for solid versions. Added an experimental warning to DagsterTypeLoader versions..
Tue, Sep 15, 10:18 PM
cdecarolis added inline comments to D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..
Tue, Sep 15, 5:37 PM
cdecarolis updated the diff for D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..

Up

Tue, Sep 15, 5:26 PM
cdecarolis updated the diff for D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..

Combined hash and join helper methods. Moved resolution of step versions to its own function on the dagster instance. Expanded testing to verify correct versions being produced.

Tue, Sep 15, 5:25 PM
cdecarolis accepted D4424: emr pyspark example updates for blog post.
Tue, Sep 15, 2:29 PM

Mon, Sep 14

cdecarolis requested review of D4406: added parameter to default fxn for dagster type loader version..
Mon, Sep 14, 7:01 PM
cdecarolis updated the diff for D4406: added parameter to default fxn for dagster type loader version..

Added public loader_version property.

Mon, Sep 14, 6:28 PM
cdecarolis updated the diff for D4406: added parameter to default fxn for dagster type loader version..

Made loader version a public property

Mon, Sep 14, 6:10 PM
cdecarolis added inline comments to D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..
Mon, Sep 14, 5:28 PM
cdecarolis added inline comments to D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..
Mon, Sep 14, 5:22 PM
cdecarolis updated the diff for D4404: added solid parameter to ExecutionStep (it was already being passed to constructor, but not saved as part of instance)..

Changed parameter to solid_version. Refactored tags and hooks to use solid.

Mon, Sep 14, 4:12 PM
cdecarolis added inline comments to D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..
Mon, Sep 14, 3:52 PM
cdecarolis added a comment to D4404: added solid parameter to ExecutionStep (it was already being passed to constructor, but not saved as part of instance)..
In D4404#119477, @yuhan wrote:

would it make more sense to have a version param instead of solid?

Mon, Sep 14, 3:36 PM
cdecarolis added inline comments to D4404: added solid parameter to ExecutionStep (it was already being passed to constructor, but not saved as part of instance)..
Mon, Sep 14, 3:34 PM
cdecarolis added inline comments to D4406: added parameter to default fxn for dagster type loader version..
Mon, Sep 14, 3:24 PM
cdecarolis abandoned D4351: Versioning for ExecutionStep, StepInput, and StepOutputHandle.
Mon, Sep 14, 3:18 PM
cdecarolis accepted D4415: fix and simplify usage examples for s3 and azure storages.

Good stuff. Makes sense to me.

Mon, Sep 14, 3:14 PM
cdecarolis added inline comments to D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..
Mon, Sep 14, 3:06 PM

Fri, Sep 11

cdecarolis added a comment to D4404: added solid parameter to ExecutionStep (it was already being passed to constructor, but not saved as part of instance)..

@alangenfeld just to provide some context, I use this to pass version information from solid when computing the version of a step. I figured since solid was already supplied as an input to the __new__ method, this was the least intrusive way to do that. Additionally since ExecutionSteps aren't persisted, it wouldn't affect run storage, etc.

Fri, Sep 11, 8:28 PM
cdecarolis updated the diff for D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..

Fixed functionality

Fri, Sep 11, 8:21 PM
cdecarolis updated the diff for D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..

Made it explicit that "no version provided" sentinel is the None value

Fri, Sep 11, 8:15 PM
cdecarolis updated the diff for D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..

Up

Fri, Sep 11, 7:14 PM
cdecarolis updated the diff for D4406: added parameter to default fxn for dagster type loader version..

Up

Fri, Sep 11, 6:59 PM
cdecarolis added a comment to D4405: added a new tag for memoized runs..
In D4405#119352, @yuhan wrote:

Is it going to be used by dagit/graphql flow only or is it also going to be used by api and cli? More specifically, is this tag required for execution that dagster requires this tag when executing a memorized pipeline?
In the current system, only the RESUME_RETRY_TAG is part of the execution logic but only in dagit/gql flow.
Other tags are primarily for storage purpose, where we don't use them in the core execution logic. For example, as for the STEP_SELECTION_TAG and SOLID_SELECTION_TAG, the reason why we use tags is for dagit to easily grab the user's selection value (the alternative was to persist them in the pipeline run table, but because we were not confident about the data model and didn't want to worry about potential data migration, we ended up putting them in tags for the short term)

Just wanted to give you context about how we currently use tags and how we think about persistency -- which are worth thinking ahead. But either way, persisting the info in tags at least for the short term sounds good!

Fri, Sep 11, 6:56 PM
cdecarolis updated the diff for D4406: added parameter to default fxn for dagster type loader version..

Added tests, and made default computed version None (we're using None as our sentinel value for no versions provided).

Fri, Sep 11, 6:49 PM
cdecarolis updated the diff for D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached..

Checked that tags exist

Fri, Sep 11, 6:06 PM
cdecarolis published D4407: [Depends on D4404, D4405, D4406] Algorithm to decide which steps to re-execute based on which versions are cached. for review.
Fri, Sep 11, 5:42 PM
cdecarolis added a comment to D4405: added a new tag for memoized runs..
In D4405#119340, @yuhan wrote:

whats the plan of using this tag?

Fri, Sep 11, 5:40 PM
cdecarolis added reviewers for D4405: added a new tag for memoized runs.: yuhan, sandyryza.
Fri, Sep 11, 5:35 PM
cdecarolis requested review of D4406: added parameter to default fxn for dagster type loader version..
Fri, Sep 11, 5:34 PM
cdecarolis requested review of D4405: added a new tag for memoized runs..
Fri, Sep 11, 5:31 PM
cdecarolis added reviewers for D4404: added solid parameter to ExecutionStep (it was already being passed to constructor, but not saved as part of instance).: sandyryza, yuhan.
Fri, Sep 11, 5:31 PM
cdecarolis requested review of D4404: added solid parameter to ExecutionStep (it was already being passed to constructor, but not saved as part of instance)..
Fri, Sep 11, 5:26 PM

Thu, Sep 10

cdecarolis requested review of D4393: sha1 expects a utf-8 encoded string.
Thu, Sep 10, 10:50 PM

Wed, Sep 9

cdecarolis added inline comments to D4341: RFC: record address and version on step output and surface it on instance.
Wed, Sep 9, 8:26 PM
cdecarolis added inline comments to D4341: RFC: record address and version on step output and surface it on instance.
Wed, Sep 9, 8:22 PM
cdecarolis added a comment to D4341: RFC: record address and version on step output and surface it on instance.

cache_scope not a great name either - but gets at the idea. To be clear in my head all it does is get hashed in to compute the final version_hash of a step/solid in the same way that the run_config will or the input values in downstream solids

Wed, Sep 9, 8:20 PM
cdecarolis added inline comments to D4341: RFC: record address and version on step output and surface it on instance.
Wed, Sep 9, 8:18 PM
cdecarolis added a comment to D4341: RFC: record address and version on step output and surface it on instance.

Under the current system, the logical object should probably be the step output

The problematic case now is 2 different solids with manually the same version string, aliased to the same name such that the output handles are the same in the two different pipelines so collide in storage.

@solid(version='v1')
def return_one():
  return 1

@solid(version='v1')
def return_two():
  return 2

@pipeline
def pipeline_a():
  return_one.alias('emit_number')()

@pipeline
def pipeline_b():
  return_two.alias('emit_number')()

which is certainly pretty contrived - but i think represents a real rare event that would be super hard for a user to figure out if they hit it

I have this gut feeling that we will want some sort of namespacing or something that can control which pipelines participate in the same pool of memoized results. Should it just be pipeline scoped? Should pipeline have a version such that you can "blow away the whole cache" so to speak by bumping that number?

Wed, Sep 9, 8:13 PM
cdecarolis updated the diff for D4351: Versioning for ExecutionStep, StepInput, and StepOutputHandle.

Rebased onto master

Wed, Sep 9, 6:58 PM
cdecarolis updated the diff for D4351: Versioning for ExecutionStep, StepInput, and StepOutputHandle.

Made StepInput version computed, and propagated up to ExecutionStep version as a result.

Wed, Sep 9, 6:39 PM
cdecarolis added inline comments to D4351: Versioning for ExecutionStep, StepInput, and StepOutputHandle.
Wed, Sep 9, 6:23 PM
cdecarolis updated the diff for D4322: [core] added warning when solid not invoked from composition function.

up

Wed, Sep 9, 6:17 PM
cdecarolis accepted D4326: version attribute for solid.

LGTM!

Wed, Sep 9, 6:15 PM
cdecarolis updated the diff for D4366: Version for DagsterTypeLoader. Incorporates loading code and external version fxn into a single version..

Updated to take config_value, and not context parameter

Wed, Sep 9, 4:23 PM
cdecarolis updated the diff for D4351: Versioning for ExecutionStep, StepInput, and StepOutputHandle.

Updated to reflect changes to DagsterTypeLoader. Reformatted via make black.

Wed, Sep 9, 4:22 PM
cdecarolis added inline comments to D4366: Version for DagsterTypeLoader. Incorporates loading code and external version fxn into a single version..
Wed, Sep 9, 3:49 PM
cdecarolis updated the diff for D4366: Version for DagsterTypeLoader. Incorporates loading code and external version fxn into a single version..

Added documentation. Changed version parameter to loader_version to reflect its usage more effectively. Reflected that external_version_fn will take in extra parameters. Rebased onto master.

Wed, Sep 9, 3:46 PM
cdecarolis requested review of D4366: Version for DagsterTypeLoader. Incorporates loading code and external version fxn into a single version..
Wed, Sep 9, 2:37 PM
cdecarolis requested review of D4351: Versioning for ExecutionStep, StepInput, and StepOutputHandle.
Wed, Sep 9, 2:14 PM

Tue, Sep 8

cdecarolis updated the diff for D4322: [core] added warning when solid not invoked from composition function.

Up

Tue, Sep 8, 7:40 PM
cdecarolis updated the diff for D4322: [core] added warning when solid not invoked from composition function.

Implementation for hooks and tags, more comprehensive tests. Flushing of stack upon checking whether invoked.

Tue, Sep 8, 7:35 PM

Fri, Sep 4

cdecarolis updated the diff for D4322: [core] added warning when solid not invoked from composition function.

More descriptive error messages, accounting for tags and hooks.

Fri, Sep 4, 9:23 PM
cdecarolis added a reviewer for D4322: [core] added warning when solid not invoked from composition function: alangenfeld.
Fri, Sep 4, 2:29 PM

Thu, Sep 3

cdecarolis updated the diff for D4322: [core] added warning when solid not invoked from composition function.

Formatting up

Thu, Sep 3, 9:41 PM
cdecarolis updated the diff for D4322: [core] added warning when solid not invoked from composition function.

Added test

Thu, Sep 3, 7:16 PM

Wed, Sep 2

cdecarolis requested review of D4325: soild -> solid :).

Flaky

Wed, Sep 2, 2:42 PM
cdecarolis requested review of D4322: [core] added warning when solid not invoked from composition function.

Flakey build failures.

Wed, Sep 2, 2:41 PM

Tue, Sep 1

cdecarolis added inline comments to D4211: Lakehouse CLI.
Tue, Sep 1, 7:49 PM

Thu, Aug 27

cdecarolis accepted D4291: fix changelog typo.

LGTM

Thu, Aug 27, 7:53 PM