Page MenuHomePhabricator

sandyryza (Sandy Ryza)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 3 2020, 4:04 PM (28 w, 5 d)

Recent Activity

Tue, Oct 20

sandyryza accepted D4830: Yet another line number fix 😔.

Gah you're right

Tue, Oct 20, 5:00 PM
sandyryza added inline comments to D4830: Yet another line number fix 😔.
Tue, Oct 20, 4:36 PM
sandyryza abandoned D4745: Don't load pyspark when importing dagster_aws.emr.

Superseded by https://dagster.phacility.com/D4806

Tue, Oct 20, 1:14 AM
sandyryza requested review of D4806: create dagster-aws-pyspark library with emr_pyspark_step_launcher.
Tue, Oct 20, 1:09 AM

Mon, Oct 19

sandyryza closed D4763: remove selector from tutorial.
Mon, Oct 19, 11:55 PM
sandyryza committed R1:fcc2fdb20a2c: remove selector from tutorial (authored by sandyryza).
remove selector from tutorial
Mon, Oct 19, 11:55 PM
sandyryza added a comment to D4811: RFC: decorator-based mode definition.

Would you eliminate @solid?

I think there are almost no cases where SolidDefinition is preferable to @solid, because solids wrap functions. Unless we strongly guide them otherwise, I think some people might prefer to use ModeDefinition(resource_defs), because they might find putting their resource_defs inside a function weird.

Mon, Oct 19, 7:00 PM
sandyryza added a comment to D4811: RFC: decorator-based mode definition.

I think this is a net positive. My two reservations would be:

  • Two ways of doing something is worse than one.
  • Operating on a function instead of a resource dict gives users an extra degree of flexibility that they might find confusing.
Mon, Oct 19, 6:55 PM

Fri, Oct 16

sandyryza added inline comments to D4749: Reform docs frontpage.
Fri, Oct 16, 10:03 PM

Wed, Oct 14

sandyryza closed D4640: publish lakehouse module.
Wed, Oct 14, 10:05 PM
sandyryza committed R1:c6cee64f699d: publish lakehouse module (authored by sandyryza).
publish lakehouse module
Wed, Oct 14, 10:05 PM
sandyryza updated the diff for D4640: publish lakehouse module.

up

Wed, Oct 14, 8:44 PM
sandyryza retitled D4640: publish lakehouse module from Revert "Revert "publish lakehouse module"" to publish lakehouse module.
Wed, Oct 14, 8:43 PM
sandyryza updated the diff for D4640: publish lakehouse module.

up

Wed, Oct 14, 8:38 PM
sandyryza accepted D4782: [dagit] Remove "historical" tag on Run table.

I like the result!

Wed, Oct 14, 8:37 PM
sandyryza added a comment to D4579: address-store-1 AddressStore.

Quick turnaround! I like the direction this has headed in a lot. I think the AssetStore API makes a ton of sense, and I like how it's threaded through.

Wed, Oct 14, 6:56 PM
sandyryza added a reviewer for D4763: remove selector from tutorial: yichendai.
Wed, Oct 14, 6:05 PM
sandyryza published D4781: RFC: allow optional configuration of parent pipeline solids for subsetted pipeline for review.
Wed, Oct 14, 5:45 PM
sandyryza accepted D4780: fix longitudinal config.
Wed, Oct 14, 5:19 PM

Tue, Oct 13

sandyryza closed D4768: back-compat for ObjectStoreOperationResultData.
Tue, Oct 13, 10:40 PM
sandyryza committed R1:0a870782e642: back-compat for ObjectStoreOperationResultData (authored by sandyryza).
back-compat for ObjectStoreOperationResultData
Tue, Oct 13, 10:40 PM
sandyryza updated the test plan for D4763: remove selector from tutorial.
Tue, Oct 13, 10:19 PM
sandyryza updated the summary of D4763: remove selector from tutorial.
Tue, Oct 13, 10:19 PM
sandyryza updated the diff for D4768: back-compat for ObjectStoreOperationResultData.

up

Tue, Oct 13, 10:18 PM
sandyryza requested review of D4763: remove selector from tutorial.
Tue, Oct 13, 10:18 PM
sandyryza closed D4760: fix test_addresses_for_version on windows.
Tue, Oct 13, 8:18 PM
sandyryza committed R1:87c85d202a82: fix test_addresses_for_version on windows (authored by sandyryza).
fix test_addresses_for_version on windows
Tue, Oct 13, 8:18 PM
sandyryza requested review of D4768: back-compat for ObjectStoreOperationResultData.
Tue, Oct 13, 7:29 PM
sandyryza added a comment to D4739: asset store discussion pseudo code.

We discussed this a bit on this thread: https://threads.com/34386850200

Tue, Oct 13, 4:59 PM
sandyryza accepted D4713: This adds a CLI to print the stored versions of each output produced by a pipeline. It can be executed by running `dagster pipeline print_versions <options>`..

This LGTM. As chatted about in Slack, we should get approval in the #proposed-api-changes channel.

Tue, Oct 13, 1:00 AM
sandyryza closed D4738: have IntermediateStorage determine version addresses.
Tue, Oct 13, 12:47 AM
sandyryza committed R1:5375c346c28c: have IntermediateStorage determine version addresses (authored by sandyryza).
have IntermediateStorage determine version addresses
Tue, Oct 13, 12:47 AM

Mon, Oct 12

sandyryza updated the diff for D4738: have IntermediateStorage determine version addresses.

up

Mon, Oct 12, 6:46 PM
sandyryza updated the diff for D4738: have IntermediateStorage determine version addresses.

up

Mon, Oct 12, 5:59 PM
sandyryza added inline comments to D4738: have IntermediateStorage determine version addresses.
Mon, Oct 12, 5:50 PM

Fri, Oct 9

sandyryza requested review of D4749: Reform docs frontpage.
Fri, Oct 9, 10:29 PM
sandyryza requested review of D4745: Don't load pyspark when importing dagster_aws.emr.
Fri, Oct 9, 9:51 PM
sandyryza added a comment to D4738: have IntermediateStorage determine version addresses.

Awesome stuff - just want to make sure I understand the breaking correctly here: A call such as Output("value", address="some_address") is no longer permitted?

Fri, Oct 9, 8:54 PM
sandyryza closed D4733: test_addresses_for_version wasn't running.
Fri, Oct 9, 8:52 PM
sandyryza committed R1:b4a2d29e01a5: test_addresses_for_version wasn't running (authored by sandyryza).
test_addresses_for_version wasn't running
Fri, Oct 9, 8:52 PM
sandyryza added a comment to D4730: Enabled single address to point to multiple step outputs..

I might not be understanding the issue correctly, but I actually think the current behavior might be the correct behavior. If we have two steps that are writing to the same address, then the step that writes last will presumably be overwriting what was written by any earlier steps. Which would mean that it's incorrect to indicate that we've cached the results written by the earlier steps.

Fri, Oct 9, 6:40 PM
sandyryza closed D4716: Conditional execution docs example.
Fri, Oct 9, 5:35 PM
sandyryza committed R1:10548e74d70f: Conditional execution docs example (authored by sandyryza).
Conditional execution docs example
Fri, Oct 9, 5:35 PM
sandyryza requested review of D4738: have IntermediateStorage determine version addresses.
Fri, Oct 9, 5:13 AM
sandyryza requested review of D4733: test_addresses_for_version wasn't running.
Fri, Oct 9, 5:11 AM
sandyryza added a comment to D4713: This adds a CLI to print the stored versions of each output produced by a pipeline. It can be executed by running `dagster pipeline print_versions <options>`..

Nice! This implementation looks very straightforward. I thought it would need to be more complicated.

Fri, Oct 9, 5:10 AM
sandyryza accepted D4736: [RFC] DagsterInstance.optimize_for_dagit.

This seems straightforward enough. Any idea why we went with NullPool in the first place?

Fri, Oct 9, 12:33 AM
sandyryza accepted D4717: Fix sphinx docs for k8s run launcher.
Fri, Oct 9, 12:32 AM
sandyryza retitled D4717: Fix sphinx docs for k8s run launcher from Fix sphnix docs for k8s run launcher to Fix sphinx docs for k8s run launcher.
Fri, Oct 9, 12:31 AM
sandyryza accepted D4727: Enabled versioning for commonly used builtin types when loading inputs from config..

LGTM

Fri, Oct 9, 12:16 AM

Thu, Oct 8

sandyryza added a comment to D4579: address-store-1 AddressStore.

Consolidating get/set_intermediate with get/set_intermediate_to/from_address seems like the right move to me. These aren't public APIs, so I'm not worried about the change.

Thu, Oct 8, 4:27 PM
sandyryza closed D4678: factor ObjectStoreOperation creation into IntermediateStorage.
Thu, Oct 8, 4:11 PM
sandyryza committed R1:6e312868ebe5: factor ObjectStoreOperation creation into IntermediateStorage (authored by sandyryza).
factor ObjectStoreOperation creation into IntermediateStorage
Thu, Oct 8, 4:11 PM
sandyryza updated the diff for D4678: factor ObjectStoreOperation creation into IntermediateStorage.

up

Thu, Oct 8, 3:13 PM

Wed, Oct 7

sandyryza requested review of D4716: Conditional execution docs example.
Wed, Oct 7, 11:39 PM
sandyryza updated the diff for D4678: factor ObjectStoreOperation creation into IntermediateStorage.

up

Wed, Oct 7, 9:12 PM
sandyryza updated the diff for D4678: factor ObjectStoreOperation creation into IntermediateStorage.

up

Wed, Oct 7, 8:20 PM
sandyryza added a comment to D4678: factor ObjectStoreOperation creation into IntermediateStorage.

Going forward, ObjectStoreOperation sounds only related to object_store. how about create a IntermediateOperation (or maybe AssetOperation) event which has information of either ObjectStoreOperation or AssetMaterialization? - so we could consolidate all the set/get operations and merge all different ways to handle intermediate (both in default intermediate basedir and externally stored through object_store or type materializer/loader)

Wed, Oct 7, 8:18 PM
sandyryza accepted D4659: Enabled hydration of step inputs from address when address exists. Added test boilerplate for entire memoized dev loop..
Wed, Oct 7, 8:12 PM
sandyryza updated the diff for D4678: factor ObjectStoreOperation creation into IntermediateStorage.

up

Wed, Oct 7, 5:10 PM
sandyryza updated the diff for D4678: factor ObjectStoreOperation creation into IntermediateStorage.

up

Wed, Oct 7, 4:35 PM
sandyryza added inline comments to D4659: Enabled hydration of step inputs from address when address exists. Added test boilerplate for entire memoized dev loop..
Wed, Oct 7, 4:26 PM
sandyryza updated the diff for D4678: factor ObjectStoreOperation creation into IntermediateStorage.

up

Wed, Oct 7, 4:05 PM
sandyryza closed D4696: with memoized reexecution, only copy outputs that current plan won't generate.
Wed, Oct 7, 4:04 PM
sandyryza committed R1:f1a162aadb57: with memoized reexecution, only copy outputs that current plan won't generate (authored by sandyryza).
with memoized reexecution, only copy outputs that current plan won't generate
Wed, Oct 7, 4:04 PM
sandyryza added a comment to D4640: publish lakehouse module.

Problem from last time: https://dagster.phacility.com/P98

Wed, Oct 7, 3:36 PM
sandyryza updated the diff for D4678: factor ObjectStoreOperation creation into IntermediateStorage.

up

Wed, Oct 7, 3:11 AM
sandyryza updated the diff for D4696: with memoized reexecution, only copy outputs that current plan won't generate.

up

Wed, Oct 7, 1:33 AM
sandyryza accepted D4694: Make execution plan available to context.
Wed, Oct 7, 1:17 AM
sandyryza requested review of D4696: with memoized reexecution, only copy outputs that current plan won't generate.
Wed, Oct 7, 1:15 AM
sandyryza updated the diff for D4678: factor ObjectStoreOperation creation into IntermediateStorage.

up

Wed, Oct 7, 12:57 AM
sandyryza closed D4684: eliminate InMemoryIntermediateStorage.
Wed, Oct 7, 12:47 AM
sandyryza committed R1:5e0365b3cf8e: eliminate InMemoryIntermediateStorage (authored by sandyryza).
eliminate InMemoryIntermediateStorage
Wed, Oct 7, 12:47 AM
sandyryza added a comment to D4692: Fetch PySpark driver IP if not set in resource config..

As discussed with Johann, if we can get away with avoiding this fanciness and having the user define a custom resource, I think that's preferable. If this becomes a widespread request and source of pain, then worth considering building this into the pyspark_resource at that point.

Wed, Oct 7, 12:05 AM
sandyryza updated the diff for D4678: factor ObjectStoreOperation creation into IntermediateStorage.

up

Wed, Oct 7, 12:01 AM

Tue, Oct 6

sandyryza added a comment to D4694: Make execution plan available to context.

Have you checked to see whether this gets the test perf back into a reasonable range?

Tue, Oct 6, 11:22 PM
sandyryza published D4694: Make execution plan available to context for review.

Thanks for taking this up. This looks good to me. Have you checked to see whether this gets the test perf back into a reasonable range? This cuts out generating the execution plan for every output, but it still resolves versions for every output, so wondering whether it's worth a followup to rein that in.

Tue, Oct 6, 11:21 PM
sandyryza updated the diff for D4678: factor ObjectStoreOperation creation into IntermediateStorage.

up

Tue, Oct 6, 11:03 PM
sandyryza published D4678: factor ObjectStoreOperation creation into IntermediateStorage for review.
Tue, Oct 6, 6:22 PM
sandyryza requested review of D4684: eliminate InMemoryIntermediateStorage.
Tue, Oct 6, 4:41 PM
sandyryza accepted D4659: Enabled hydration of step inputs from address when address exists. Added test boilerplate for entire memoized dev loop..

LGTM!

Tue, Oct 6, 3:46 PM

Mon, Oct 5

sandyryza added a comment to D4659: Enabled hydration of step inputs from address when address exists. Added test boilerplate for entire memoized dev loop..

This seems reasonable to me. Had mostly minor stylistic / readability comments.

Mon, Oct 5, 7:04 PM
sandyryza added inline comments to D4658: added addresses to StepInput.
Mon, Oct 5, 7:00 PM
sandyryza closed D4661: run non-dagit tests when diff includes non-dagit changes.
Mon, Oct 5, 4:12 PM
sandyryza committed R1:4bc546d205ef: run non-dagit tests when diff includes non-dagit changes (authored by sandyryza).
run non-dagit tests when diff includes non-dagit changes
Mon, Oct 5, 4:12 PM
sandyryza closed D4650: enable partial versioning of a pipeline.
Mon, Oct 5, 4:12 PM
sandyryza committed R1:d4d055d5f4be: enable partial versioning of a pipeline (authored by sandyryza).
enable partial versioning of a pipeline
Mon, Oct 5, 4:12 PM

Fri, Oct 2

sandyryza closed D4570: partition asset materializations in longitudinal pipeline.
Fri, Oct 2, 11:31 PM
sandyryza committed R1:7f323b76be9a: partition asset materializations in longitudinal pipeline (authored by sandyryza).
partition asset materializations in longitudinal pipeline
Fri, Oct 2, 11:31 PM
sandyryza updated the diff for D4570: partition asset materializations in longitudinal pipeline.

up

Fri, Oct 2, 10:39 PM
sandyryza updated the diff for D4661: run non-dagit tests when diff includes non-dagit changes.

up

Fri, Oct 2, 10:39 PM
sandyryza closed D4660: fix test_solids snapshot after asset partitions.
Fri, Oct 2, 10:39 PM
sandyryza committed R1:cb42aaeb9546: fix test_solids snapshot after asset partitions (authored by sandyryza).
fix test_solids snapshot after asset partitions
Fri, Oct 2, 10:38 PM
sandyryza requested review of D4661: run non-dagit tests when diff includes non-dagit changes.
Fri, Oct 2, 10:31 PM
sandyryza updated the diff for D4660: fix test_solids snapshot after asset partitions.

up

Fri, Oct 2, 10:20 PM
sandyryza updated the diff for D4570: partition asset materializations in longitudinal pipeline.

up

Fri, Oct 2, 10:14 PM
sandyryza closed D4526: AssetMaterialization partitions.
Fri, Oct 2, 8:34 PM
sandyryza committed R1:8564f4f56f5b: AssetMaterialization partitions (authored by sandyryza).
AssetMaterialization partitions
Fri, Oct 2, 8:34 PM
sandyryza abandoned D4636: typo in dbt apidocs.
Fri, Oct 2, 8:20 PM
sandyryza updated the diff for D4640: publish lakehouse module.

up

Fri, Oct 2, 8:19 PM