Developer Notes
Removing COVID-19 CT Lung from analysis
2026-01-07
We chose to remove these samples from analysis as they do not have tumours present and only include a small number of samples.
Did not include them in the dataset_anatomy_match.csv we made.
Column setup for aaura index and med-imagetools processing
2026-01-07 For datasets that have been processed by med-imagetools, we will take in the index-simple.csv and extract the subset of columns we use for the aaura index.
Columns that will need to be calculated in addition are:
- annotation_type
- annotation_coords
- largest_slice_index
- lesion_location
- source (Optional, mostly used for datasets composed of multiple other datasets)
Mask indexing for saving starting at 1 to reflect labels
2026-01-07
Starting the mask labelling at 1 to reflect the voxel values they were in the original image. 0 is reserved for background.
Add option to append new processed dataset index to existing index
2026-01-08
Today's solution chosen for handling what to do if processing new data but you want to preserve the already processed data.
append_index argument can be set such that an existing index file will be loaded in, the new processed data index will be concatenated to the end, checked for duplicates, sorted, and then saved. For duplicate checking, the new processed data entry will be kept.
This way, if processing a dataset breaks in the middle, can update the metadata file with what data to process. Didn't want to implement image existence checking yet.
mit_to_aaura_index setup for datasets
2026-01-12
HCC-TACE-Seg
datasource = "TCIA"
dataset = "HCC-TACE-Seg"
ROI_key = "Mass"
image_modality = "CT"
mask_modality = ["SEG"]
disease_site = "Abdomen"
4D-Lung
datasource = "TCIA"
dataset = "4D-Lung"
ROI_key = "Tumor_"
image_modality = "CT"
mask_modality = ["RTSTRUCT"]
disease_site = "Lung"
RIDER-LungCT-Seg
datasource = "TCIA"
dataset = "RIDER-LungCT-Seg"
ROI_key = "GTVp|Neoplasm"
image_modality = "CT"
mask_modality = ["SEG","RTSTRUCT"]
disease_site = "Lung"
RADCURE
* Using mit_RADCURE_windowed
datasource = "TCIA"
dataset = "RADCURE"
ROI_key = "GTVp"
image_modality = "CT"
mask_modality = ["RTSTRUCT"] # THIS HAS TO BE A LIST
disease_site = "HeadNeck"
special_prefix = "OCSCC_"
special_suffix = "_windowed"
CPTAC-CCRCC
datasource = "TCIA"
dataset = "CPTAC-CCRCC"
ROI_key = ".*"
image_modality = "CT"
mask_modality = ["RTSTRUCT"] # THIS HAS TO BE A LIST
disease_site = "Abdomen"
special_prefix = ""
special_suffix = ""
CPTAC-PDA
datasource = "TCIA"
dataset = "CPTAC-PDA"
ROI_key = ".*"
image_modality = "CT"
mask_modality = ["RTSTRUCT"] # THIS HAS TO BE A LIST
disease_site = "Abdomen"
special_prefix = ""
special_suffix = ""