Pipeline to download real-time JSON files from Bay Bikes.
Seems fine! A few suggestions and open questions for my edification. We might also want an integration test for this to make sure things work, or at least a preset so we can confirm it runs (rough sketch of what I mean below)?
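Something along these lines, purely illustrative and the exact `PresetDefinition` kwargs may differ by dagster version:

```python
from dagster import PresetDefinition, pipeline


# Illustrative only: a preset bundles run config with the pipeline so dagit
# (or a test) can launch it without hand-writing config every time.
@pipeline(
    preset_defs=[
        PresetDefinition(
            name="smoke_test",
            run_config={},  # placeholder; real config would point at test paths/URLs
        )
    ]
)
def download_gbfs_files():
    ...
```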
Is the point of this pipeline to download schemas from GBFS periodically to make sure that the schemas haven't changed? If so, can we rename the pipeline to something to that effect, e.g. `sync_gbfs_schemas` or similar?
Instead of making the same repetitive call, which can be a bit overwhelming, I find it easier on the eyes to throw the inputs into a constant list that sits outside of the pipeline, for example:
```python
SCHEMA_INFORMATION = [
    ('schema/gbfs.json', 'https://gbfs.baywheels.com/gbfs/gbfs.json', 'download_and_validate_gbfs'),
    ....
]

@pipeline
def download_gbfs_files():
    for expected_schema_file, bay_bikes_schema_url, solid_name in SCHEMA_INFORMATION:
        valid_schema_download_solid = download_and_validate_json(...., bay_bikes_schema_url, name=solid_name)
        valid_schema_download_solid()
```
That way you can decouple setup from pipeline logic, which makes life easier for everyone when you inevitably start making changes to the pipeline. (A sketch of the factory signature this snippet assumes is below.)
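For completeness, here's a minimal, hypothetical sketch of the `download_and_validate_json` factory my snippet assumes; the `requests` usage and the validation step are placeholders, not the actual implementation:

```python
import json

import requests  # assumption: the real solid may fetch the files differently
from dagster import solid


def download_and_validate_json(expected_schema_file, url, name):
    """Hypothetical factory matching the call above: bakes the schema path and
    URL into a solid definition named `name`."""

    @solid(name=name)
    def _download_and_validate(context):
        response = requests.get(url)
        response.raise_for_status()
        payload = response.json()
        # Placeholder validation step; the real pipeline presumably checks the
        # payload against the stored schema file.
        with open(expected_schema_file) as handle:
            expected_schema = json.load(handle)
        context.log.info(
            "Downloaded {}; expected top-level keys {}".format(url, list(expected_schema))
        )
        return payload

    return _download_and_validate
```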
This might just be me, but I actually like it when people are explicit about solid factories; that's up to you, though!
So this seems legit, but why go the composite solid route where you have a ton of these factories, as opposed to a pipeline that aliases a few different types of solids that download the JSON, validate it, and save it to disk (rough sketch below)? Is it because you don't actually need to configure them manually via dagit?
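For context, the alias route I'm imagining would look roughly like this (reusing the `SCHEMA_INFORMATION` list from my earlier comment; names and config shape are illustrative, not a definitive implementation):

```python
from dagster import pipeline, solid


@solid(config_schema={"url": str, "expected_schema_file": str})
def download_and_validate_json(context):
    # One reusable solid; each aliased invocation reads its own URL and schema
    # path from run config (e.g. supplied through dagit's playground).
    context.log.info("downloading {}".format(context.solid_config["url"]))
    ...


@pipeline
def download_gbfs_files():
    for _, _, alias in SCHEMA_INFORMATION:
        # Same solid definition, one aliased invocation per GBFS file.
        download_and_validate_json.alias(alias)()
```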
Nice. I like this!!!!
Hm, no. This assumes the schemas are already present and downloads the actual files; I'll try to find more descriptive names.
What do you mean by explicit?
I'm not sure I understand.