Page MenuHomeElementl

cdecarolis (Christopher DeCarolis)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 20 2020, 3:42 PM (54 w, 15 h)

Recent Activity

Today

cdecarolis updated the summary of D9061: [memoization 4/n][rfc] memoization strategy for jobs.
Tue, Aug 3, 1:39 AM
cdecarolis requested review of D9191: test execute_in_process with aliasing.
Tue, Aug 3, 12:29 AM
cdecarolis requested review of D9139: [mypy] definitions/events.py.
Tue, Aug 3, 12:23 AM
cdecarolis retitled D9190: [crag] Add validation step to default config buildup for to_job. from Add validation step to default config buildup for to_job. to [crag] Add validation step to default config buildup for to_job..
Tue, Aug 3, 12:12 AM
cdecarolis requested review of D9190: [crag] Add validation step to default config buildup for to_job..
Tue, Aug 3, 12:11 AM
cdecarolis requested review of D9189: [crag] Ignore executor config when using execute_in_process.
Tue, Aug 3, 12:09 AM
cdecarolis abandoned D9045: [crag] Add fields not brought about in default config, fix bug with execute_in_process and default config, catch bad config earlier on ..
Tue, Aug 3, 12:08 AM

Yesterday

cdecarolis accepted D9172: [crag] remove job arg from partitioned schedule decorators.

See comment

Mon, Aug 2, 9:18 PM
cdecarolis accepted D9184: split up test_decorators.

yay

Mon, Aug 2, 9:16 PM
cdecarolis updated the diff for D9061: [memoization 4/n][rfc] memoization strategy for jobs.

Get rid of code example

Mon, Aug 2, 9:13 PM
cdecarolis updated the diff for D9061: [memoization 4/n][rfc] memoization strategy for jobs.

Rebase + allow tag to toggle memoization off

Mon, Aug 2, 7:45 PM

Sat, Jul 31

cdecarolis requested review of D9175: Carry over field aliases to default config for job.
Sat, Jul 31, 7:28 PM

Fri, Jul 30

cdecarolis added a comment to D8963: [memoization 3/n] move core memoization logic to live on the execution plan.
In D8963#237434, @yuhan wrote:

in terms of user experience, i think it's fine to implicitly add a tag to a run - some other features work like that, e.g. we implicitly add parent_run_id and root_run_id as tags to new runs, and solid/step selection too. besides, it's also good that users can view those info in the Runs page such as:

image.png (802×2 px, 185 KB)

to clarify, i agree that tags are less robust as it doesn't have good checks and could get lost - that may be the trade-off we are making or config may be a better alternative bc it's schematized.

Fri, Jul 30, 11:47 PM
cdecarolis updated the diff for D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

Fix plumbing

Fri, Jul 30, 8:00 PM
cdecarolis requested review of D9166: [bug] fix dynamic outputs issue with adls2 io manager.
Fri, Jul 30, 7:03 PM
cdecarolis added a comment to D8963: [memoization 3/n] move core memoization logic to live on the execution plan.
In D8963#237378, @yuhan wrote:

Which brings us back to Alex's initial q: what's the right way to toggle memoization on and off?

I'd like to switch it on/off in dagit, so making it toggleable via either config or tag sounds good to me - if we have it defined on jobs, imo it'd be less efficient for development like ml training.

Fri, Jul 30, 6:58 PM
cdecarolis added a comment to D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

cc @alangenfeld regarding tags:

Fri, Jul 30, 6:14 PM
cdecarolis updated the diff for D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

Account for run tags to determine whether to use memoization

Fri, Jul 30, 6:00 PM
cdecarolis updated the diff for D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

Up

Fri, Jul 30, 4:34 PM
cdecarolis updated the diff for D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

Mypy execution plan snapshot, add a test to ensure that we don't carry around None values for step_output_versions

Fri, Jul 30, 2:00 AM
cdecarolis added inline comments to D8963: [memoization 3/n] move core memoization logic to live on the execution plan.
Fri, Jul 30, 1:17 AM

Thu, Jul 29

cdecarolis added inline comments to D9061: [memoization 4/n][rfc] memoization strategy for jobs.
Thu, Jul 29, 7:17 PM
cdecarolis updated the diff for D9061: [memoization 4/n][rfc] memoization strategy for jobs.

Reimplement using VersionStrategy instance, add code example

Thu, Jul 29, 5:50 PM
cdecarolis added inline comments to D9061: [memoization 4/n][rfc] memoization strategy for jobs.
Thu, Jul 29, 5:48 PM
cdecarolis added inline comments to D9061: [memoization 4/n][rfc] memoization strategy for jobs.
Thu, Jul 29, 4:58 PM
cdecarolis updated the diff for D9061: [memoization 4/n][rfc] memoization strategy for jobs.

Implement VersionStrategy class

Thu, Jul 29, 2:50 AM
cdecarolis updated the diff for D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

Up

Thu, Jul 29, 2:15 AM
cdecarolis updated the diff for D9059: [memoization 2/n] provide default base dir for versioned fs io manager.

Address comments

Thu, Jul 29, 2:09 AM

Wed, Jul 28

cdecarolis updated the summary of D9085: [memoization 6/n] make default fs io manager memoizable.
Wed, Jul 28, 6:47 PM
cdecarolis requested review of D9130: [mypy] typing.NamedTuple for dependency.py.
Wed, Jul 28, 6:04 PM
cdecarolis added a comment to D9061: [memoization 4/n][rfc] memoization strategy for jobs.

This strategy only really works for solids. A follow up diff will address the resources situation. Not sure if the best solution there is to keep the version argument on the resource decorator and make it not required to use memoization, or to add a resource_versioning_strategy argument (also not required) as well.

Hmm, I feel like I need to know the whole end state to evaluate this.

I think the f(solid) -> version pattern is reasonable, but should it be passed in just as a free function or something like a class with staticmethods that meets some interface MemoizationStrategy or something

Wed, Jul 28, 4:59 PM
cdecarolis requested review of D9114: [mypy] leftover definition_config_schema.
Wed, Jul 28, 1:20 AM
cdecarolis added a comment to D9074: [caprisun] Run ops after assets.

While I can def see the utility of the idea, this interface feels a bit off to me. From what I understand, the difference between nodes and logical assets isn't entirely cut and dry, and I'm wondering if people shouldn't be able to just add nodes to their list of assets, and we infer them as such / foist them into an asset?

Wed, Jul 28, 12:08 AM
cdecarolis accepted D9098: add instructions for setting up DAGSTER_HOME to new-project README.

Nice

Wed, Jul 28, 12:02 AM

Tue, Jul 27

cdecarolis added a comment to D9059: [memoization 2/n] provide default base dir for versioned fs io manager.

cc @yuhan @alangenfeld might abandon this in favor of https://dagster.phacility.com/D9085, but the discussion is still relevant I think. I'm certainly open to having a versioned_outputs directory or something at that layer.

Tue, Jul 27, 8:03 PM
cdecarolis abandoned D9088: [crag] Remove use of executor from execute_in_process..

This doesn't actually work

Tue, Jul 27, 7:07 PM
cdecarolis requested review of D9088: [crag] Remove use of executor from execute_in_process..
Tue, Jul 27, 7:03 PM
cdecarolis added a comment to D9045: [crag] Add fields not brought about in default config, fix bug with execute_in_process and default config, catch bad config earlier on ..

@alangenfeld upon further examination, I don't think the execute_plan solution can work. Even if we keep around the original executors, we still need to create a new pipeline/mode in order to switch out the io manager. Then, we'll get a different set of complaints regarding the fact that we're using a non-persistent IO manager with an out of proc executor. I think we're forced to either do the config replacement or the permissive thing.

Tue, Jul 27, 6:59 PM
cdecarolis requested review of D9085: [memoization 6/n] make default fs io manager memoizable.
Tue, Jul 27, 6:00 PM
cdecarolis updated the diff for D9061: [memoization 4/n][rfc] memoization strategy for jobs.

Fix tests

Tue, Jul 27, 5:30 PM
cdecarolis added a comment to D9045: [crag] Add fields not brought about in default config, fix bug with execute_in_process and default config, catch bad config earlier on ..

cc @alangenfeld ^ bc I forgot to tag you

Tue, Jul 27, 5:15 PM
cdecarolis added a comment to D9045: [crag] Add fields not brought about in default config, fix bug with execute_in_process and default config, catch bad config earlier on ..

have the "special" in process executor change make the whole execution section permissive at RunConfigSchema generation time - I think this is promising

I know this isn't a huge regression, but it still feels like a regression to me. I feel like it's a pretty core change for what is fundementally a workaround.

Tue, Jul 27, 4:06 AM
cdecarolis updated the summary of D9061: [memoization 4/n][rfc] memoization strategy for jobs.
Tue, Jul 27, 3:58 AM
cdecarolis updated the diff for D9061: [memoization 4/n][rfc] memoization strategy for jobs.

Attempting new strategy - ingest solid definition and produce version from it.

Tue, Jul 27, 3:49 AM

Mon, Jul 26

cdecarolis requested review of D9060: [memoization 5/n] enable memoization with root input managers.
Mon, Jul 26, 11:45 PM
cdecarolis planned changes to D9061: [memoization 4/n][rfc] memoization strategy for jobs.

Respin with approach Sandy & I discussed: A function that takes in a solid definition and produces a version from it.

Mon, Jul 26, 11:02 PM
cdecarolis requested review of D9061: [memoization 4/n][rfc] memoization strategy for jobs.
Mon, Jul 26, 9:21 PM
cdecarolis updated the diff for D8550: [op] permit op config entry.

Finish updating snapshots

Mon, Jul 26, 7:49 PM
cdecarolis added inline comments to D9045: [crag] Add fields not brought about in default config, fix bug with execute_in_process and default config, catch bad config earlier on ..
Mon, Jul 26, 7:35 PM
cdecarolis updated the diff for D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

Fix snapshots and cli tests

Mon, Jul 26, 6:00 PM
cdecarolis updated the diff for D8550: [op] permit op config entry.

Fix snapshots (tests are passing on local)

Mon, Jul 26, 5:52 PM
cdecarolis requested review of D9059: [memoization 2/n] provide default base dir for versioned fs io manager.
Mon, Jul 26, 5:48 PM
cdecarolis updated the diff for D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

Fix issue with version retrieval

Mon, Jul 26, 5:05 PM
cdecarolis added inline comments to D9045: [crag] Add fields not brought about in default config, fix bug with execute_in_process and default config, catch bad config earlier on ..
Mon, Jul 26, 4:58 PM

Fri, Jul 23

cdecarolis requested review of D9045: [crag] Add fields not brought about in default config, fix bug with execute_in_process and default config, catch bad config earlier on ..
Fri, Jul 23, 6:26 PM
cdecarolis added inline comments to D8550: [op] permit op config entry.
Fri, Jul 23, 6:02 PM
cdecarolis updated the diff for D8550: [op] permit op config entry.

Address comments.

Fri, Jul 23, 5:57 PM
cdecarolis updated the diff for D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

Up

Fri, Jul 23, 5:55 PM
cdecarolis updated the diff for D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

Revamp tests, revamp serialization of step output versions, more efficient resource implementation

Fri, Jul 23, 5:53 PM

Thu, Jul 22

cdecarolis updated the diff for D8923: [memoization improvements 2/n] allow dagster execution with no steps..

Add description to dev loop tests

Thu, Jul 22, 7:02 PM
cdecarolis accepted D9014: support get_mapping_key in direct invocation.

LGTM aside from testing nit

Thu, Jul 22, 6:26 PM
cdecarolis updated the diff for D8920: [memoization improvements 1/n] get rid of step_keys_to_execute parameter on create_run_for_pipeline, simplify implementation of create_run_for_pipeline..

Get rid of universal step_keys_to_execute check

Thu, Jul 22, 4:15 PM
cdecarolis updated the diff for D8550: [op] permit op config entry.

Address comments: make implementations more readable, add documentation, add hashing tests, fix errors

Thu, Jul 22, 1:23 AM

Wed, Jul 21

cdecarolis updated the diff for D8550: [op] permit op config entry.

Fix issue with config post-processing

Wed, Jul 21, 8:14 PM
cdecarolis updated the diff for D8979: [crag] partition schedules migration guide.

Snapshot up

Wed, Jul 21, 7:18 PM
cdecarolis updated the diff for D8550: [op] permit op config entry.

Give implementations another pass. Finish addressing comments

Wed, Jul 21, 7:14 PM
cdecarolis updated the diff for D8550: [op] permit op config entry.

Update implementation and addressed most comments

Wed, Jul 21, 6:43 PM
cdecarolis updated the diff for D8979: [crag] partition schedules migration guide.

Second pass

Wed, Jul 21, 2:46 PM

Tue, Jul 20

cdecarolis requested review of D8979: [crag] partition schedules migration guide.
Tue, Jul 20, 11:46 PM
cdecarolis requested review of D8980: Fix sensor testing docs.
Tue, Jul 20, 11:32 PM
cdecarolis added inline comments to D8920: [memoization improvements 1/n] get rid of step_keys_to_execute parameter on create_run_for_pipeline, simplify implementation of create_run_for_pipeline..
Tue, Jul 20, 10:18 PM
cdecarolis added a comment to D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

cc @alangenfeld I
think it's also worth considering: what end state do we want to reach? Do we envision memoization to always be opt-in? Or could we at some point make it opt-out?
Making it opt-out would require making our default io managers memoizable (which should be possible in my eyes), and providing a default version on all solids/resources. Is that a crazy notion?

Tue, Jul 20, 10:07 PM
cdecarolis added a comment to D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

what is the right way to opt-in to memoization behavior? Is it a toggle on the Job? Something else? I don't know if I have a good answer.

Tue, Jul 20, 10:05 PM
cdecarolis added inline comments to D8963: [memoization 3/n] move core memoization logic to live on the execution plan.
Tue, Jul 20, 10:02 PM
cdecarolis added a comment to D8923: [memoization improvements 2/n] allow dagster execution with no steps..

feels like there should be a few more tests but im struggling to think of what cases make sense to test. reexecute_pipeline with explicit empty step selection?

Tue, Jul 20, 10:00 PM
cdecarolis updated the diff for D8963: [memoization 3/n] move core memoization logic to live on the execution plan.

Fix tests

Tue, Jul 20, 7:44 PM
cdecarolis published D8963: [memoization 3/n] move core memoization logic to live on the execution plan for review.

Still more tests to fix but looking for feedback on the general idea

Tue, Jul 20, 5:23 PM
cdecarolis added inline comments to D8550: [op] permit op config entry.
Tue, Jul 20, 5:22 PM
cdecarolis updated the diff for D8550: [op] permit op config entry.

Fix docs example tests

Tue, Jul 20, 4:50 PM
cdecarolis updated the diff for D8550: [op] permit op config entry.

Update yet more snapshot tests

Tue, Jul 20, 4:47 PM
cdecarolis updated the diff for D8550: [op] permit op config entry.

Fix more snap tests

Tue, Jul 20, 3:55 PM
cdecarolis accepted D8898: [mypy] update check.inst to not blank out types.

I think this looks good. Some of the checks I don't have full context on but the lack of mypy complaining seems like it should be alright (until we mypy more of the system and find out something else is broken).

Tue, Jul 20, 3:52 PM
cdecarolis added inline comments to D8942: [crag] Job description.
Tue, Jul 20, 3:44 PM
cdecarolis updated the diff for D8550: [op] permit op config entry.

Update snaps

Tue, Jul 20, 3:29 PM
cdecarolis updated the diff for D8903: Add solid as parameter to build_hook_context.

Address comments

Tue, Jul 20, 3:23 PM
cdecarolis added a comment to D8550: [op] permit op config entry.

If we want to go back to the previous direction, we could walk / refer to the pipeline [1] structure when transforming the config.

The benefit this current direction has is providing something that we could be sent over graphql to dagit to allow the frontend to more gracefully handle these field aliases.

for what its worth I think this direction could get to a better feeling place with some iteration

Tue, Jul 20, 3:10 PM

Mon, Jul 19

cdecarolis updated the diff for D8903: Add solid as parameter to build_hook_context.

Up

Mon, Jul 19, 3:18 PM

Fri, Jul 16

cdecarolis updated the diff for D8903: Add solid as parameter to build_hook_context.

Add graph container when constructing solid

Fri, Jul 16, 11:15 PM
cdecarolis retitled D8923: [memoization improvements 2/n] allow dagster execution with no steps. from allow dagster execution with no steps. to [memoization improvements 2/n] allow dagster execution with no steps..
Fri, Jul 16, 9:26 PM
cdecarolis retitled D8920: [memoization improvements 1/n] get rid of step_keys_to_execute parameter on create_run_for_pipeline, simplify implementation of create_run_for_pipeline. from get rid of step_keys_to_execute parameter on create_run_for_pipeline, simplify implementation of create_run_for_pipeline. to [memoization improvements 1/n] get rid of step_keys_to_execute parameter on create_run_for_pipeline, simplify implementation of create_run_for_pipeline..
Fri, Jul 16, 9:25 PM
cdecarolis requested review of D8923: [memoization improvements 2/n] allow dagster execution with no steps..
Fri, Jul 16, 7:52 PM
cdecarolis abandoned D8524: [crag] execute_job API.

launch API attempt up next

Fri, Jul 16, 7:14 PM
cdecarolis published D8920: [memoization improvements 1/n] get rid of step_keys_to_execute parameter on create_run_for_pipeline, simplify implementation of create_run_for_pipeline. for review.
Fri, Jul 16, 5:49 PM
cdecarolis added a comment to D8903: Add solid as parameter to build_hook_context.

@alangenfeld how would you feel about requiring both solid and container to use the context.solid param? That way we could avoid a lot of the nastiness that results from making the container optional on Solid

Fri, Jul 16, 5:43 PM
cdecarolis requested review of D8916: put under test that jobs with partitioned config can be successfully backfilled..
Fri, Jul 16, 12:12 AM

Thu, Jul 15

cdecarolis published D8909: switch from using String to Text to match schema for mode col for review.
Thu, Jul 15, 6:57 PM
cdecarolis accepted D8907: nav entry for the graph/job/op migration guide.

Nice. Maybe move it a bit higher on the list?

Thu, Jul 15, 6:20 PM
cdecarolis requested review of D8903: Add solid as parameter to build_hook_context.
Thu, Jul 15, 6:17 PM
cdecarolis updated the diff for D8550: [op] permit op config entry.

Implement an approach that preserves correctness - at great great cost

Thu, Jul 15, 12:21 AM

Wed, Jul 14

cdecarolis added a comment to D8678: [RFC] time-based partitioned config and schedule from partitioned config.

Gonna go with separate args for minute and hour to keep in theme.

Wed, Jul 14, 5:42 PM