# AutoPipeline Usage
To use AutoPipeline, follow the installation instructions found at https://github.com/bhklab/med-imagetools#installing-med-imagetools.
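If you are installing from PyPI, this is typically:

```sh
pip install med-imagetools
```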
## Intro to AutoPipeline
AutoPipeline will crawl and process any DICOM dataset. To run the most basic variation of the script, run the following command:
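```sh
autopipeline INPUT_DIRECTORY OUTPUT_DIRECTORY
```

(Here `autopipeline` is the console command installed with med-imagetools.)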
Replace `INPUT_DIRECTORY` with the directory containing all of your DICOM data, and `OUTPUT_DIRECTORY` with the directory that you want the processed data to be written to.
The `--modalities` option allows you to process only certain modalities present in the DICOM data. The available modalities are:
- CT
- MR
- RTSTRUCT
- PT
- RTDOSE
Set the modalities you want to use by separating each one with a comma. For example, to use CT and RTSTRUCT, run AutoPipeline with `--modalities CT,RTSTRUCT`, as shown below.
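The full command would then be:

```sh
autopipeline INPUT_DIRECTORY OUTPUT_DIRECTORY --modalities CT,RTSTRUCT
```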
## AutoPipeline Flags
AutoPipeline comes with many built-in features to make your data processing easier. A sketch of an invocation combining several of them follows this list:
- **Spacing**

  The spacing for the output image. default = (1., 1., 0.). A spacing of 0. along an axis keeps that axis's spacing as-is, so a spacing of (0., 0., 0.) will not resample the image at all.
- **Parallel Job Execution**

  The number of jobs to be run in parallel. Set to -1 to use all cores. default = -1
- **Dataset Graph Visualization** (not recommended for large datasets)
  Whether to visualize the entire dataset using PyVis.
- **Continue Pipeline Processing**

  Whether to continue the most recent run of AutoPipeline that terminated prematurely for that output directory. This will only work if the `.imgtools` directory from the previous run was not deleted. Using this flag will retain the same flags and parameters as the previous run.
- **Processing Dry Run**

  Whether to execute a dry run, only generating the `.imgtools` folder, which includes the crawled index.
- **Show Progress**

  Whether to print AutoPipeline progress to the standard output.
- **Warning on Subject Processing Errors**

  Whether to warn instead of raising an error when processing a subject fails.
- **Overwrite Existing Output Files**

  Whether to overwrite existing file outputs.
- **Update Existing Crawled Index**

  Whether to update the existing crawled index.
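As a sketch, an invocation combining several of these features might look like the following. The flag names shown (`--spacing`, `--n_jobs`, `--show_progress`, `--overwrite`) are taken from the med-imagetools CLI; run `autopipeline --help` for the authoritative list and exact syntax.

```sh
# Resample to 1 mm x 1 mm in-plane while keeping the original slice
# spacing, use all cores, print progress, and overwrite existing outputs.
autopipeline INPUT_DIRECTORY OUTPUT_DIRECTORY \
    --modalities CT,RTSTRUCT \
    --spacing 1.0 1.0 0.0 \
    --n_jobs -1 \
    --show_progress \
    --overwrite
```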
## Flags for parsing RTSTRUCT contours/regions of interest (ROI)
Contours can be selected by creating a YAML file that defines a regular expression (regex), a list of potential contour names, or a combination of both. If none of the flags below are set or the YAML file does not exist, AutoPipeline will default to processing every contour. A sketch of an invocation using these flags follows this list.
- **Defining YAML file path for contours**

  Whether to read a YAML file that defines regex or string options for contour names for regions of interest (ROI). By default, AutoPipeline will look for and read from `INPUT_DIRECTORY/roi_names.yaml`.

  Path to the above-mentioned YAML file; the path can be absolute or relative. default = "" (each ROI will have its own label index in `dataset.json` for nnUNet)
- **Defining contour selection behaviour**

  A typical ROI YAML file may look like this:
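  ```yaml
  GTV: GTV*
  LUNG:
    - LUNG*
    - LNUG
  NODES:
    - IL1
    - IVL2
    - IVL3
    - IVL4
  ```

  Each top-level key is the ROI name under which matching contours are saved; its value is a single regex or a list of regexes/strings to match against contour names. (The names above are illustrative, chosen to match the examples below.)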
  By default, all ROIs that match any of the regexes or strings will be saved as one label. For example, GTVn, GTVp, and GTVfoo will all be saved as GTV. However, this is not always the desirable behaviour.
  **Only select the first matching regex/string**

  The StructureSet iterates through the regexes and strings in the order they are written in the YAML file. When this flag is set, once any contour matches a regex or string, the search for that ROI is interrupted and moves on to the next ROI. This may be useful if you have a priority order of potentially matching contour names.

  If a patient has contours [GTVp, LNUG, IL1, IVL4], with the above YAML file and the `--roi_select_first` flag set, it will only process the [GTVp, LNUG, IL1] contours, as [GTV, LUNG, NODES] respectively.
  **Process each matching contour as a separate ROI**

  Any matching contour will be saved separately, with its contour name appended as a suffix to the ROI name. This does not apply to ROIs that only have one regex/string.

  If a patient has contours [GTVp, LNUG, IL1, IVL4], with the above YAML file and the `--roi_separate` flag set, it will process the contours as [GTV, LUNG_LNUG, NODES_IL1, NODES_IVL4], respectively.
- **Ignore patients with no contours**

  Ignore patients with no contours that match any of the defined regexes or strings, instead of throwing an error.
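As a sketch, assuming the flag names `--read_yaml_label_names`, `--roi_yaml_path`, and `--ignore_missing_regex` from the med-imagetools CLI (verify with `autopipeline --help`), contour selection might be driven like this:

```sh
# Read ROI definitions from a YAML file, keep only the first match per
# ROI, and skip (rather than fail on) patients with no matching contours.
autopipeline INPUT_DIRECTORY OUTPUT_DIRECTORY \
    --modalities CT,RTSTRUCT \
    --read_yaml_label_names \
    --roi_yaml_path INPUT_DIRECTORY/roi_names.yaml \
    --roi_select_first \
    --ignore_missing_regex
```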
## Additional nnUNet-specific flags
- **Format Output for nnUNet Training**

  Whether to format output for nnUNet training. Modalities must be CT,RTSTRUCT or MR,RTSTRUCT: run with `--modalities CT,RTSTRUCT` or `--modalities MR,RTSTRUCT`.
- **Training Size**

  Training size of the train-test-split. default = 1.0 (all data will be in `imagesTr`/`labelsTr`)
- **Random State**

  Random state for the train-test-split. Uses sklearn's `train_test_split()`. default = 42
- **Custom Train-Test-Split YAML**

  Whether to use a custom train-test-split. The split must be defined in a file found at `INPUT_DIRECTORY/custom_train_test_split.yaml`. All subjects not defined in this file will be randomly split to fill the defined value for `--train_size` (default = 1.0). The file must conform to:
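  ```yaml
  train:
    - subject_1
    - subject_2
  test:
    - subject_3
    - subject_4
  ```

  The subject IDs shown are placeholders for the subject identifiers in your dataset.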
## Additional flags for nnUNet Inference
- **Format Output for nnUNet Inference**

  Whether to format output for nnUNet inference.
- **Path to `dataset.json`**

  The path to the `dataset.json` file for nnUNet inference. A dataset JSON file may look like this:
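  ```json
  {
      "modality": {
          "0": "CT"
      },
      "labels": {
          "0": "background",
          "1": "GTV"
      }
  }
  ```

  Here `modality` maps channel indices to image modalities and `labels` maps label indices to ROI names, following the nnUNet dataset format; the specific entries above are illustrative.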