Usage Guide
Project Configuration
Each dataset needs a configuration YAML file with the following settings filled in
DATA_SOURCE: "" # where the data came from, will be used for data organization
DATASET_NAME: "" # the name of the dataset , will be use for data organization
### MED-IMAGETOOLS settings
MIT:
MODALITIES: # Modalities to process with autopipeline
image: CT
mask: RTSTRUCT
ROI_STRATEGY: MERGE # How to handle multiple ROI matches
ROI_MATCH_MAP: # Matching map for ROIs in dataset (use if you only want to process some of the masks in a segmentation)
KEY:ROI_NAME # NOTE: there can be no spaces in KEY:ROI_NAME
The file should be saved in the config
directory and named {DATASET_NAME}.yaml
.
Data Setup
The following sections describe how to set up the data you wish to process with this pipeline following the BHKLab Data Management Protocol (DMP). This will ensure data remains separate from the project directory and accessible to other users.
Raw Data
Set up a separate main data directory outside of the project directory. We'll call this Datasets
.
In Datasets
, set up a directory for the dataset you wish to process as follows:
Datasets
|---- {DATASET_SOURCE}_{DATASET_NAME}
|-- clinical
| `-- {Clinical Data File}.csv OR {Clinical Data File}.xlsx
`-- images
|-- {DATASET_NAME}
| |-- {PatientID}
| | `-- {StudyUID}
| | |-- {Image DICOM directory}
| | | |-- 1-01.dcm
| | | |-- ...
| | | |-- 1-N.dcm
| | |-- {Mask DICOM directory}
| | | `-- 1-01.dcm
| |-- {PatientID}
| |-- ...
| `-- {PatientID}
`-- annotations
`-- {DATASET_NAME}
|-- DICOM-SR_annotation_file.dcm
|-- DICOM-SR_annotation_file.dcm
`-- DICOM-SR_annotation_file.dcm
Image directory structure may vary depending on the source. This example is based on the structure setup by TCIA when downloading with a manifest file.
However, for the pipeline to run correctly, images/{DATASET_NAME}
must exist in the {DATASET_SOURCE}_{DATASET_NAME}
directory. Everything within {DATASET_NAME} may vary though.
!!! note "BHKLab DMP Setup"
If using the BHKLab DMP, the Datasets
directory will be structured with rawdata/{DiseaseRegion}/{DATASET_SOURCE}_{DATASET_NAME}
. In the next step, you can create the symbolic link starting from {DATASET_SOURCE}_{DATASET_NAME}
.
Once this data directory is setup, run the following in a terminal from the main directory of the project.
ln -s /path/to/Datasets/{DATASET_SOURCE}_{DATASET_NAME} data/rawdata
This will create a symbolic link to your dataset in the Datasets
directory.
You can confirm this worked by running:
ls -l data/rawdata
and you should see,
total 5
-rw-rw-r-- 1 bhkuser root 1395 Jun 4 15:21 README.md
lrwxrwxrwx 1 bhkuser root 80 Jun 4 15:46 {DATASET_SOURCE}_{DATASET_NAME} -> /path/to/Datasets/{DATASET_SOURCE}_{DATASET_NAME}
Now, document the dataset you've added on the Data Sources page following the provided template.
Processed Data
If you wish to use the BHKLab DMP strategy, follow the process below.
Create a processed data directory for your dataset in the external Datasets
directory as follows:
mkdir /path/to/Datasets/procdata/{DiseaseRegion}/{DATASET_SOURCE}_{DATASET_NAME}
Note: the Disease Region must match what is in the rawdata
path to the dataset.
Now you can create a symbolic link to this directory in the project directory, like we did for the raw data.
ln -s /path/to/Datasets/procdata/{DiseaseRegion}/{DATASET_SOURCE}_{DATASET_NAME} data/procdata
Results
TODO:: describe results directory setup
Running Your Analysis
1. Running Med-ImageTools
The first step in the pipeline is to run Med-ImageTools index and autopipeline to organize and process the image and mask data.
Requirements:
1. You've set up a dataset config
yaml file as described above.
2. You've set up the symbolic links for the dataset in both rawdata
and procdata
From the home project directory, run the following:
pixi run mit config/{DATASET_NAME}.yaml
This will generate NiFTi files for each image and it's corresponding mask, where the masks will be named KEY__[ROI_NAME]
. The output format should be as follows:
``bash
data
-- procdata
-- {DATASET_SOURCE}_{DATASET_NAME}
-- images
-- mit_{DATASET_NAME}
-- {PatientID}{SampleNumber}
|-- {ImageModality}{SeriesInstanceUID}
| -- {ImageModality}.nii.gz
-- {SegmentationModality}_{SeriesInstanceUID}
`-- {KEY}__[{ROI_name}].nii.gz