Scaled-down minimum viable product
date: 2025-06-19
inspiration: idc-index
- If the user has a username and password for private data:
med-imagenet download <PatientID>/<SeriesInstanceUID>/<Collection> --nbia_username <x> --nbia_password <x>
- If the user only wants specific modalities:
  - this should download only RTSTRUCTs that contain GTVp, together with the series they reference
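The command line above could be prototyped roughly as follows. This is a minimal sketch using `argparse`, not the actual `med-imagenet` implementation. It assumes the positional `<PatientID>/<SeriesInstanceUID>/<Collection>` means "one of these three identifiers", and the `--modalities` flag is a hypothetical name that does not appear in these notes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser for `med-imagenet download`."""
    parser = argparse.ArgumentParser(prog="med-imagenet")
    subcommands = parser.add_subparsers(dest="command", required=True)

    download = subcommands.add_parser("download", help="download series from TCIA")
    # Assumption: a single identifier that is a PatientID, a SeriesInstanceUID,
    # or a Collection name.
    download.add_argument("identifier")
    # Credentials are optional; they are only needed for private data.
    download.add_argument("--nbia_username", default=None)
    download.add_argument("--nbia_password", default=None)
    # Hypothetical flag (not in the notes): restrict the download to modalities.
    download.add_argument("--modalities", nargs="*", default=None)
    return parser


def wants_private_data(args: argparse.Namespace) -> bool:
    """Return True only when both credentials were supplied."""
    return args.nbia_username is not None and args.nbia_password is not None
```

For example, `build_parser().parse_args(["download", "SomeCollection", "--modalities", "RTSTRUCT"])` yields a namespace with `identifier="SomeCollection"` and `modalities=["RTSTRUCT"]`. The GTVp filtering would still have to happen after parsing.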
A more complicated but useful approach:
- Downloading:
  - check whether the data already exists locally
  - be able to download newer series for projects that are still ongoing
- Database building:
  - use something like TCIA's updated-series endpoint to refresh only the new series every month or so, rather than running a huge job
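The "skip what exists, fetch what changed" logic above might look like the sketch below. It makes no network calls. The record keys (`SeriesInstanceUID`, `updated`) and the one-directory-per-series layout are assumptions made for illustration, not the format returned by any TCIA endpoint.

```python
from datetime import date
from pathlib import Path


def series_to_download(series_records, output_dir, last_sync):
    """Pick the series that still need downloading.

    series_records: iterable of dicts with the assumed keys
        "SeriesInstanceUID" (str) and "updated" (datetime.date).
    output_dir: directory assumed to hold one sub-directory per series.
    last_sync: date of the previous run; series updated after it are re-fetched.
    """
    output_dir = Path(output_dir)
    selected = []
    for record in series_records:
        uid = record["SeriesInstanceUID"]
        already_on_disk = (output_dir / uid).is_dir()
        updated_since_sync = record["updated"] > last_sync
        # Download when the series is missing locally or has changed upstream.
        if not already_on_disk or updated_since_sync:
            selected.append(uid)
    return selected
```

Running this once a month with `last_sync` set to the previous run date would keep the job small, since unchanged series already on disk are skipped.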
TODO:
- Set up releases on this repo for every database crawled from TCIA
  - options for this:
    - download everything and crawl :(
    - be able to scrape from the TCIA 'GetDicomTags' endpoint > create DICOM file (without pixel data)
- Set up a way to query the crawl db
  - MVP: some minimal query options, to be extended in the future
- Use the output of the query with NBIAToolkit to download
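As a rough illustration of "minimal query options", the sketch below stores crawl results in SQLite and filters by collection and modality. The table layout, the column names, and the demo rows are all placeholders invented for this example. The resulting list of SeriesInstanceUIDs is what would then be handed to the download step.

```python
import sqlite3

# Placeholder rows: (SeriesInstanceUID, PatientID, Collection, Modality).
DEMO_ROWS = [
    ("1.2.3.1", "P001", "DemoCollectionA", "CT"),
    ("1.2.3.2", "P001", "DemoCollectionA", "RTSTRUCT"),
    ("1.2.3.3", "P002", "DemoCollectionB", "MR"),
    ("1.2.3.4", "P002", "DemoCollectionB", "RTSTRUCT"),
]


def build_demo_db():
    """Create an in-memory crawl db with an assumed single-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE series ("
        "SeriesInstanceUID TEXT PRIMARY KEY, PatientID TEXT, "
        "Collection TEXT, Modality TEXT)"
    )
    conn.executemany("INSERT INTO series VALUES (?, ?, ?, ?)", DEMO_ROWS)
    return conn


def query_series(conn, collection=None, modality=None):
    """Return SeriesInstanceUIDs matching the optional filters."""
    clauses, params = [], []
    if collection is not None:
        clauses.append("Collection = ?")
        params.append(collection)
    if modality is not None:
        clauses.append("Modality = ?")
        params.append(modality)
    sql = "SELECT SeriesInstanceUID FROM series"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY SeriesInstanceUID"
    # Values are bound as parameters, never interpolated into the SQL string.
    return [row[0] for row in conn.execute(sql, params)]
```

Adding a new query option later means adding one more optional keyword and one more clause, which fits the "to be extended in the future" goal.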
Stretch goal
Create a quick website using Streamlit that uses the crawl db and lets users investigate the data:
- query the db really quickly
- plot some metrics
- potentially generate a manifest easily
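The data-side helpers behind such a page could be as small as the sketch below. It covers only the "metrics" and "manifest" parts and leaves the Streamlit UI out. The row keys are assumptions, and the output is a plain list of UIDs, not the real TCIA manifest format, which has additional header fields not reproduced here.

```python
from collections import Counter


def modality_counts(rows):
    """Count series per modality; rows are dicts with an assumed "Modality" key."""
    return Counter(row["Modality"] for row in rows)


def uid_list_text(rows):
    """Render one SeriesInstanceUID per line (a simplified stand-in for a manifest)."""
    return "".join(row["SeriesInstanceUID"] + "\n" for row in rows)
```

The counts could feed a bar chart on the page, and the text could be offered to the user as a downloadable file.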