Hacker news

Top
New
Past
Ask
Show
Jobs

Show HN: Streaming gigabyte medical images from S3 without downloading them (https://github.com)

100 points by el_pa_b about 9 hours ago | 31 comments | View on ycombinator

mlhpdx 20 minutes ago |

A while back I worked on a project where s3 held giant zip files containing zip files (turtles all the way down) and also made good use of range requests. I came up with seekable-s3-stream[1] to generalize working with them via an idiomatic C# stream.

[1] https://github.com/mlhpdx/seekable-s3-stream

tomnicholas1 about 1 hour ago |

The generalized form of this range-request-based streaming approach looks something like my project VirtualiZarr [0].

Many of these scientific file formats (HDF5, netCDF, TIFF/COG, FITS, GRIB, JPEG and more) are essentially just contiguous multidimensional array(/"tensor") chunks embedded alongside metadata about what's in the chunks. Efficiently fetching these from object storage is just about efficiently fetching the metadata up front so you know where the chunks you want are [1].

The data model of Zarr [2] generalizes this pattern pretty well, so that when backed by Icechunk [3], you can store a "datacube" of "virtual chunk references" that point at chunks anywhere inside the original files on S3.

This allows you to stream data out as fast as the S3 network connection allows [4], and then you're free to pull that directly, or build tile servers on top of it [5].

In the Pangeo project and at Earthmover we do all this for Weather and Climate science data. But the underlying OSS stack is domain-agnostic, so works for all sorts of multidimensional array data, and VirtualiZarr has a plugin system for parsing different scientific file formats.

I would love to see if someone could create a virtual Zarr store pointing at this WSI data!

[0]: https://virtualizarr.readthedocs.io/en/stable/

[1]: https://earthmover.io/blog/fundamentals-what-is-cloud-optimi...

[2]: https://earthmover.io/blog/what-is-zarr

[3]: https://earthmover.io/blog/icechunk-1-0-production-grade-clo...

[4]: https://earthmover.io/blog/i-o-maxing-tensors-in-the-cloud

[5]: https://earthmover.io/blog/announcing-flux

rwmj about 6 hours ago |

https://dicom.nema.org/dicom/dicomwsi/

Interesting guide to the Whole Slide Images (WSI) format. The surprising thing for me is that compression is used, and they note does not affect use in diagnostics.

Back in the day we used TIFF for a similar application (X-ray detector images).

matthberg about 7 hours ago |

Seems very similar to how maps work on the web these days, in particular protomap files [0]. I wonder if you could view the medical images in leaflet or another frontend map library with the addition of a shim layer? Cool work!

0: https://protomaps.com/

tokyovigilante about 6 hours ago |

This is really a job for JPEG-XL, which supports decode of portions of larger images and has recently been added to the DICOM standard.

Sleaker about 2 hours ago |

Maybe a bit pedantic, but if you're streaming it, then you're still downloading portions of it, yah? Just not persisting the whole thing locally before viewing it.

Edit: Looks like this is a slight discrepancy between the HN title and the GitHub description.

yread about 2 hours ago |

You could probably do it completely clientside. I have a parser for 12 scanner formats in js. It doesnt read the pixels, just parses metadata but jpeg is easy and most common anyway

lametti about 6 hours ago |

Interesting - I'm not so familiar with S3 but I wonder if this would work for WSI stored on-premises. Imposing lower network requirememts and a lightweight web viewer is very advantageous in this use case. I'll have to try it out!

isuckatcoding about 1 hour ago |

Is there a visual demo of this?

invaderJ1m about 4 hours ago |

How does this compare to things like COGs (Cloud Optimised GeoTIFFs) or other binary blob + index raster pyramid formats?

Was there a requirement to work with these formats directly without converting?

Nora23 about 6 hours ago |

How does this handle images with different compression formats?

andrewstuart about 3 hours ago |

Please don’t use AWS S3 there’s vast numbers of much cheaper compatible choices.

tonyhart7 about 7 hours ago |

hey, I need this

huflungdung about 2 hours ago |

[dead]