Scaled-down minimum viable product

date: 2025-06-19

inspiration: idc-index

idc-index download <PatientID>/<SeriesInstanceUID>/<Collection>

med-imagenet download <PatientID>/<SeriesInstanceUID>/<Collection>

if the user has a username and password for private data:

med-imagenet download  <PatientID>/<SeriesInstanceUID>/<Collection> --nbia_username <x> --nbia_password <x>

if the user only wants specific modalities:

med-imagenet download <Collection> --modalities "CT,RTSTRUCT" --run-<autopipeline>
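The `download` invocations above could be parsed along these lines. This is a minimal sketch using `argparse`, not the actual med-imagenet CLI: the positional name `target`, the helper names, and the rule that both credentials must be given together are assumptions. The `--run-<autopipeline>` placeholder is left out because its final form is not settled here.

```python
import argparse


def _modality_list(value: str) -> list:
    # "CT, rtstruct" -> ["CT", "RTSTRUCT"]
    return [m.strip().upper() for m in value.split(",") if m.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="med-imagenet")
    sub = parser.add_subparsers(dest="command", required=True)

    dl = sub.add_parser("download", help="download a patient, series, or collection")
    # Hypothetical positional: PatientID, SeriesInstanceUID, or Collection
    dl.add_argument("target")
    dl.add_argument("--nbia_username", default=None)
    dl.add_argument("--nbia_password", default=None)
    dl.add_argument("--modalities", type=_modality_list, default=None)
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: credentials are only meaningful as a pair
    if (args.nbia_username is None) != (args.nbia_password is None):
        parser.error("--nbia_username and --nbia_password must be given together")
    return args


args = parse_args(["download", "SomeCollection", "--modalities", "CT, rtstruct"])
```

Here `args.modalities` is a normalized list, so downstream code does not need to re-split the comma-separated string.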

this should download only the RTSTRUCTs that contain a GTVp ROI, along with the series they reference:

med-imagenet download <Collection> --query RTSTRUCTS where "GTVp" in "ROINames"

optionally, if e.g. both CT,PT,RTSTRUCT and CT,RTSTRUCT combinations exist and only the CT,RTSTRUCT one is wanted:

med-imagenet download <Collection> --query RTSTRUCTS where "GTVp" in "ROINames" --modalities "CT,RTSTRUCT"
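One way to read the GTVp query is sketched below over in-memory records. Everything here is an assumption made for illustration: the `SeriesRecord` fields, the function name, exact matching on ROI names, and in particular the interpretation of `--modalities` as "keep only patients whose full set of modalities equals the requested set".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SeriesRecord:  # hypothetical shape of one row in the crawl db
    series_uid: str
    patient_id: str
    modality: str
    roi_names: List[str] = field(default_factory=list)
    referenced_series_uid: Optional[str] = None


def select_series(records, roi_name, modalities=None):
    """Return UIDs of matching RTSTRUCTs plus the series they reference."""
    by_uid = {r.series_uid: r for r in records}
    patient_modalities = {}
    for r in records:
        patient_modalities.setdefault(r.patient_id, set()).add(r.modality)

    wanted = set(modalities) if modalities else None
    selected = []
    for r in records:
        if r.modality != "RTSTRUCT" or roi_name not in r.roi_names:
            continue
        # Assumed meaning of --modalities: exact match on the patient's modality set
        if wanted is not None and patient_modalities[r.patient_id] != wanted:
            continue
        selected.append(r.series_uid)
        ref = r.referenced_series_uid
        if ref in by_uid and ref not in selected:
            selected.append(ref)
    return selected


records = [
    SeriesRecord("ct1", "P1", "CT"),
    SeriesRecord("rs1", "P1", "RTSTRUCT", ["GTVp", "Brain"], "ct1"),
    SeriesRecord("ct2", "P2", "CT"),
    SeriesRecord("pt2", "P2", "PT"),
    SeriesRecord("rs2", "P2", "RTSTRUCT", ["GTVp"], "ct2"),
    SeriesRecord("ct3", "P3", "CT"),
    SeriesRecord("rs3", "P3", "RTSTRUCT", ["Lung"], "ct3"),
]
all_gtvp = select_series(records, "GTVp")
ct_rt_only = select_series(records, "GTVp", ["CT", "RTSTRUCT"])
```

If the intended grouping is per study or per linked sample rather than per patient, only the `patient_modalities` key would need to change.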

complicated but useful approach:

med-imagenet query <Collection> <QUERY> | med-imagenet download <--from-file>
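For the piped form, the download side needs to read whatever the query side prints. A minimal sketch follows, assuming the query emits one SeriesInstanceUID per line and that blank lines and `#` comments should be ignored; the function name and this line format are hypothetical, not a decided interface.

```python
import io


def read_series_uids(stream):
    """Read one SeriesInstanceUID per line, dropping blanks, comments and duplicates."""
    seen = set()
    uids = []
    for line in stream:
        uid = line.strip()
        if not uid or uid.startswith("#"):
            continue
        if uid not in seen:
            seen.add(uid)
            uids.append(uid)
    return uids


# Stand-in for the piped output of a query; a real CLI would pass sys.stdin
# or an opened file instead.
piped = io.StringIO("# query: GTVp\n1.2.3\n\n1.2.4\n1.2.3\n")
uids = read_series_uids(piped)
```

Because the function only needs an iterable of lines, the same code path would serve both a file argument and standard input.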

download behaviour:
- check if the data already exists before downloading
- be able to download 'newer' series if projects are ongoing
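The "already exists" check could be as simple as the sketch below. It assumes a layout of one directory per series, `<output_dir>/<SeriesInstanceUID>/`, and treats a non-empty directory as already downloaded; both are assumptions, and a real check might compare file counts or checksums instead.

```python
import tempfile
from pathlib import Path


def series_to_download(series_uids, output_dir):
    """Return the UIDs that have no non-empty directory under output_dir."""
    output_dir = Path(output_dir)
    missing = []
    for uid in series_uids:
        series_dir = output_dir / uid
        if series_dir.is_dir() and any(series_dir.iterdir()):
            continue  # something is already there, skip it
        missing.append(uid)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "1.2.3"
    done.mkdir()
    (done / "image.dcm").write_bytes(b"placeholder")
    (Path(tmp) / "1.2.4").mkdir()  # empty directory, e.g. an interrupted download
    missing = series_to_download(["1.2.3", "1.2.4", "1.2.5"], tmp)
```

Treating empty directories as missing means an interrupted run is retried rather than silently skipped.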

database building:
- use something like TCIA's updated series endpoint to update only the new series every month or so, to avoid rerunning a huge job
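The incremental update idea can be sketched independently of the actual endpoint. In the sketch below, `fetch_updated_series` is a hypothetical callable standing in for whatever wraps the updated-series endpoint, and the record shape (a dict with a `SeriesInstanceUID` key) is likewise an assumption.

```python
from datetime import date


def update_index(index, fetch_updated_series, last_crawl):
    """Merge series updated since last_crawl into index; return the changed UIDs."""
    changed = []
    for record in fetch_updated_series(last_crawl):
        uid = record["SeriesInstanceUID"]
        if index.get(uid) != record:
            index[uid] = record
            changed.append(uid)
    return changed


# Fake data source used only for this example
def fake_fetch(since):
    all_series = [
        ({"SeriesInstanceUID": "1.2.3", "Modality": "CT"}, date(2025, 5, 1)),
        ({"SeriesInstanceUID": "1.2.4", "Modality": "RTSTRUCT"}, date(2025, 6, 10)),
    ]
    return [rec for rec, released in all_series if released >= since]


index = {"1.2.3": {"SeriesInstanceUID": "1.2.3", "Modality": "CT"}}
changed = update_index(index, fake_fetch, date(2025, 6, 1))
```

Only the UIDs in `changed` would then need to be crawled again, which is the point of avoiding the full job.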

TODO:

Set up releases on this repo for every database scraped (crawled) from TCIA
options for this:
- download everything and crawl :(
- scrape from the TCIA 'GetDicomTags' endpoint and create DICOM files (without pixel data)
Set up a way to query the crawl db
MVP: some minimal query options, to be extended in the future
Use the output of the query with NBIAToolkit to download
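As a rough idea of what "minimal query options" over the crawl db might look like, here is a sketch using SQLite from the standard library. The table name, columns, and function names are invented for illustration; the actual crawl db format is still undecided in these notes.

```python
import sqlite3


def build_demo_db(rows):
    """Create an in-memory stand-in for the crawl db (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE series ("
        "series_uid TEXT PRIMARY KEY, collection TEXT, patient_id TEXT, modality TEXT)"
    )
    conn.executemany("INSERT INTO series VALUES (?, ?, ?, ?)", rows)
    return conn


def query_series(conn, collection, modalities=None):
    """Return series UIDs in a collection, optionally restricted to some modalities."""
    sql = "SELECT series_uid FROM series WHERE collection = ?"
    params = [collection]
    if modalities:
        placeholders = ",".join("?" for _ in modalities)
        sql += f" AND modality IN ({placeholders})"
        params.extend(modalities)
    sql += " ORDER BY series_uid"
    return [row[0] for row in conn.execute(sql, params)]


conn = build_demo_db([
    ("1.2.1", "CollA", "P1", "CT"),
    ("1.2.2", "CollA", "P1", "RTSTRUCT"),
    ("1.2.3", "CollA", "P2", "PT"),
    ("1.2.4", "CollB", "P3", "CT"),
])
coll_a_ct_rt = query_series(conn, "CollA", ["CT", "RTSTRUCT"])
```

The resulting list of UIDs is the kind of output that could then be handed to the download step.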

Stretch goal

create a quick website using Streamlit that uses the crawl db and lets users investigate the data:
- query the db really quickly
- plot some metrics
- potentially generate a manifest easily