
scaled-down minimum viable product

date: 2025-06-19

inspiration: idc-index

idc-index download <PatientID>/<SeriesInstanceUID>/<Collection>
med-imagenet download <PatientID>/<SeriesInstanceUID>/<Collection>
  • if they have a username and password for private data
med-imagenet download <PatientID>/<SeriesInstanceUID>/<Collection> --nbia_username <x> --nbia_password <x>
  • if they only want modalities
med-imagenet download <Collection> --modalities "CT,RTSTRUCT" --run-<autopipeline>
  • this should only download RTSTRUCTs with GTVp and their references; optionally, if e.g. CT,PT,RTSTRUCT and CT,RTSTRUCT both exist and we only want the CT,RTSTRUCT, add --modalities:
    med-imagenet download <Collection> --query RTSTRUCTS where "GTVp" in "ROINames" \
      --modalities "CT,RTSTRUCT"
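
A rough sketch of what the GTVp query could do once series metadata is in memory. The record keys (`ROINames`, `ReferencedSeriesUIDs`) and the function name are assumptions made for illustration, not an existing med-imagenet API. It also assumes one interpretation of `--modalities`: the RTSTRUCT plus the series it references must have exactly that modality set.

```python
from __future__ import annotations


def select_rtstructs_with_roi(series: list[dict], roi_name: str,
                              modalities: list[str] | None = None) -> list[str]:
    """Return SeriesInstanceUIDs of matching RTSTRUCTs and the series they reference.

    Each record is a hypothetical dict with the keys "SeriesInstanceUID",
    "Modality", and optionally "ROINames" and "ReferencedSeriesUIDs".
    """
    by_uid = {s["SeriesInstanceUID"]: s for s in series}
    selected: list[str] = []
    seen: set[str] = set()
    for s in series:
        if s["Modality"] != "RTSTRUCT":
            continue
        if roi_name not in s.get("ROINames", []):
            continue
        # The RTSTRUCT itself plus every referenced series present in the index.
        group = [s] + [by_uid[u] for u in s.get("ReferencedSeriesUIDs", []) if u in by_uid]
        if modalities is not None:
            # Keep the group only if its modality set is exactly the requested one.
            if {g["Modality"] for g in group} != set(modalities):
                continue
        for g in group:
            uid = g["SeriesInstanceUID"]
            if uid not in seen:
                seen.add(uid)
                selected.append(uid)
    return selected


demo = [
    {"SeriesInstanceUID": "CT1", "Modality": "CT"},
    {"SeriesInstanceUID": "RT1", "Modality": "RTSTRUCT",
     "ROINames": ["GTVp", "Lung"], "ReferencedSeriesUIDs": ["CT1"]},
    {"SeriesInstanceUID": "CT2", "Modality": "CT"},
    {"SeriesInstanceUID": "RT2", "Modality": "RTSTRUCT",
     "ROINames": ["Lung"], "ReferencedSeriesUIDs": ["CT2"]},
    {"SeriesInstanceUID": "CT3", "Modality": "CT"},
    {"SeriesInstanceUID": "PT3", "Modality": "PT"},
    {"SeriesInstanceUID": "RT3", "Modality": "RTSTRUCT",
     "ROINames": ["GTVp"], "ReferencedSeriesUIDs": ["CT3", "PT3"]},
]
print(select_rtstructs_with_roi(demo, "GTVp", ["CT", "RTSTRUCT"]))
```
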
    

complicated but useful approach:

med-imagenet query <Collection> <QUERY> | med-imagenet download <--from-file> 
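
For the piped form, the download side mostly needs to read identifiers from a stream. A minimal sketch, assuming the query command prints one SeriesInstanceUID per line (that output format, and the function name, are assumptions):

```python
from __future__ import annotations

import io
from typing import TextIO


def read_series_uids(stream: TextIO) -> list[str]:
    """Parse one SeriesInstanceUID per line from a file or stdin.

    Blank lines and lines starting with '#' are skipped, and duplicates
    are dropped while keeping the first-seen order.
    """
    seen: set[str] = set()
    uids: list[str] = []
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line not in seen:
            seen.add(line)
            uids.append(line)
    return uids


# In a real CLI the stream would be sys.stdin or an opened manifest file.
example = io.StringIO("1.2.3\n\n# comment\n1.2.4\n1.2.3\n")
print(read_series_uids(example))
```
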

download stuff:
  • check if the data already exists
  • be able to download 'newer' series if projects are ongoing
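
One way the existence check could work, sketched under the assumption that each series is stored in its own directory named after its SeriesInstanceUID (the layout and the function name are hypothetical). Series added to an ongoing project would then show up as missing directories and be downloaded on the next run.

```python
from __future__ import annotations

from pathlib import Path


def plan_downloads(requested: list[str], output_dir: str | Path) -> list[str]:
    """Return the requested series UIDs that still need downloading.

    A series counts as present only if <output_dir>/<uid> is a non-empty
    directory, so an empty directory left by a failed run is retried.
    """
    root = Path(output_dir)
    todo: list[str] = []
    for uid in requested:
        series_dir = root / uid
        if series_dir.is_dir() and any(series_dir.iterdir()):
            continue  # already on disk, skip
        todo.append(uid)
    return todo
```

Checking file counts or checksums against the index would be stricter; the non-empty directory test is only the cheapest possible check.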

database building:
  • use something like TCIA's updated series endpoint to update only the new series every month or so, to avoid running a huge job
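
A sketch of the incremental update step, leaving out the actual API call. It assumes the updated-series response has already been fetched and parsed into dicts keyed by `SeriesInstanceUID`; the in-memory index shape and the function name are made up for illustration.

```python
from __future__ import annotations


def merge_updated_series(index: dict[str, dict], updated: list[dict]) -> tuple[int, int]:
    """Upsert updated series records into the index, in place.

    Returns (added, replaced) so a monthly job can log how much changed.
    """
    added = replaced = 0
    for record in updated:
        uid = record["SeriesInstanceUID"]
        if uid in index:
            replaced += 1
        else:
            added += 1
        index[uid] = record
    return added, replaced


index = {"1.2.3": {"SeriesInstanceUID": "1.2.3", "Modality": "CT"}}
updated = [
    {"SeriesInstanceUID": "1.2.3", "Modality": "CT", "ImageCount": 120},
    {"SeriesInstanceUID": "1.2.9", "Modality": "RTSTRUCT"},
]
print(merge_updated_series(index, updated))
```
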

TODO:

  • Set up releases on this repo for every database scraped (crawled) from TCIA
  • options for this:
    • download everything and crawl :(
    • be able to scrape from the TCIA 'GetDicomTags' endpoint > create DICOM file (without pixel data)
  • Set up a way to query the crawl db
  • MVP: some minimal query options, to be extended in the future
  • Use the output of the query with NBIAToolkit to download
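
A minimal sketch of what "some minimal query options" could look like if the crawl db were a SQLite file. The `series` table, its columns, and both function names are assumptions; the real crawl db schema is not defined in these notes. The returned UIDs are what would be handed to the download step.

```python
from __future__ import annotations

import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the crawl db (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE series (series_uid TEXT PRIMARY KEY, collection TEXT, modality TEXT)"
    )
    conn.executemany(
        "INSERT INTO series VALUES (?, ?, ?)",
        [
            ("1.1", "DemoA", "CT"),
            ("1.2", "DemoA", "RTSTRUCT"),
            ("1.3", "DemoA", "PT"),
            ("2.1", "DemoB", "CT"),
        ],
    )
    return conn


def query_series(conn: sqlite3.Connection, collection: str,
                 modalities: list[str] | None = None) -> list[str]:
    """Return series UIDs in a collection, optionally limited to some modalities."""
    sql = "SELECT series_uid FROM series WHERE collection = ?"
    params: list[str] = [collection]
    if modalities:
        # One placeholder per modality keeps the query parameterized.
        placeholders = ",".join("?" for _ in modalities)
        sql += f" AND modality IN ({placeholders})"
        params.extend(modalities)
    sql += " ORDER BY series_uid"
    return [row[0] for row in conn.execute(sql, params)]


conn = build_demo_db()
print(query_series(conn, "DemoA", ["CT", "RTSTRUCT"]))
```
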

Stretch goal

create a quick website using Streamlit that uses the crawl db and lets users investigate data:
  • query the db really quickly
  • plot some metrics
  • potentially generate a manifest easily