DICOMSorter

imgtools.dicom.sort

Sorting DICOM Files by Specific Tags and Patterns.

This module provides functionality to organize DICOM files into structured directories based on customizable target patterns.

The target patterns allow metadata-driven file organization using placeholders for DICOM tags, enabling flexible and systematic storage.

Extended Summary

Target patterns define directory structures using placeholders, such as %<DICOMKey> and {DICOMKey}, which are resolved to their corresponding metadata values in the DICOM file.

This approach organizes files by their metadata while retaining their original basenames. Files whose values differ for any field in the pattern resolve to separate directories, which keeps distinct identifiers apart.

Examples of target patterns:

- `%PatientID/%StudyID/{SeriesID}/`
- `path/to_destination/%PatientID/images/%Modality/%SeriesInstanceUID/`

Important: Only the directory structure is modified during the sorting process. The basename of each file remains unchanged.
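As a rough illustration of how placeholders in a target pattern could be turned into a format string and a set of keys, here is a minimal sketch. The regular expression and the `parse_pattern` helper are assumptions made for this example; the library's own `DEFAULT_PATTERN_PARSER` may match placeholders differently.

```python
import re

# Hypothetical placeholder matcher: accepts %Key or {Key}.
# This is NOT the library's DEFAULT_PATTERN_PARSER, only a stand-in.
PLACEHOLDER = re.compile(r"%(\w+)|\{(\w+)\}")


def parse_pattern(target_pattern: str) -> tuple[str, set[str]]:
    """Return a str.format-style string and the keys it references."""
    keys: set[str] = set()

    def to_braces(match: re.Match) -> str:
        # Exactly one of the two groups is populated for any match.
        key = match.group(1) or match.group(2)
        keys.add(key)
        return "{" + key + "}"

    return PLACEHOLDER.sub(to_braces, target_pattern), keys


fmt, keys = parse_pattern("%PatientID/%StudyID/{SeriesID}/")
print(fmt)           # {PatientID}/{StudyID}/{SeriesID}/
print(sorted(keys))  # ['PatientID', 'SeriesID', 'StudyID']
```

Both placeholder styles end up in the same brace form, so the rest of a sorter only has to deal with one notation.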

Notes

The module ensures that:

  1. Target patterns are resolved accurately based on the metadata in DICOM files.
  2. Files are placed in directories that reflect their resolved metadata fields.
  3. Original basenames are preserved to prevent unintended overwrites.

Examples:

Source file:

/source_dir/HN-CHUS-082/1-1.dcm

Target directory pattern:

./data/dicoms/%PatientID/Study-%StudyInstanceUID/Series-%SeriesInstanceUID/%Modality/

would result in the following structure for each file:

data/
└── dicoms/
    └── {PatientID}/
        └── Study-{StudyInstanceUID}/
            └── Series-{SeriesInstanceUID}/
                └── {Modality}/
                    └── 1-1.dcm

And so the resolved path for the file would be:

./data/dicoms/HN-CHUS-082/Study-06980/Series-67882/RTSTRUCT/1-1.dcm

Here, the file is relocated into the resolved directory structure:

./data/dicoms/HN-CHUS-082/Study-06980/Series-67882/RTSTRUCT/

while the basename 1-1.dcm remains unchanged.
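The resolution step in this example can be sketched with `pathlib` alone. The `resolve_path` helper and the hard-coded metadata dictionary are hypothetical; in the library, the values would be read from the DICOM header rather than supplied by hand.

```python
from pathlib import Path


def resolve_path(source: Path, fmt: str, metadata: dict[str, str]) -> Path:
    """Fill the directory format from metadata and append the unchanged basename."""
    return Path(fmt.format(**metadata)) / source.name


# Values copied from the example above (assumed, not read from a real file).
metadata = {
    "PatientID": "HN-CHUS-082",
    "StudyInstanceUID": "06980",
    "SeriesInstanceUID": "67882",
    "Modality": "RTSTRUCT",
}
fmt = (
    "./data/dicoms/{PatientID}/Study-{StudyInstanceUID}/"
    "Series-{SeriesInstanceUID}/{Modality}/"
)

resolved = resolve_path(Path("/source_dir/HN-CHUS-082/1-1.dcm"), fmt, metadata)
print(resolved.as_posix())
# data/dicoms/HN-CHUS-082/Study-06980/Series-67882/RTSTRUCT/1-1.dcm
```

Only the parent directories come from the pattern; `source.name` carries the basename through untouched.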

imgtools.dicom.sort.DICOMSorter

DICOMSorter(
    source_directory: Path,
    target_pattern: str,
    pattern_parser: Pattern = DEFAULT_PATTERN_PARSER,
)

A specialized implementation of the SorterBase for sorting DICOM files by metadata.

This class resolves paths for DICOM files based on specified target patterns, using metadata extracted from the files. The filename of each source file is preserved during this process.

Attributes:

Name Type Description
source_directory Path

The directory containing the files to be sorted.

logger Logger

The instance logger bound with the source directory context.

dicom_files list of Path

The list of DICOM files found in the source_directory.

format str

The parsed format string with placeholders for DICOM tags.

keys Set[str]

DICOM tags extracted from the target pattern.

invalid_keys Set[str]

DICOM tags from the pattern that are invalid.

Methods:

Name Description
execute

Execute the file action on DICOM files.

print_tree

Display the pattern structure as a tree visualization.

validate_keys

Validate the DICOM keys in the target pattern.

Source code in src/imgtools/dicom/sort/dicomsorter.py
def __init__(
    self,
    source_directory: Path,
    target_pattern: str,
    pattern_parser: Pattern = DEFAULT_PATTERN_PARSER,
) -> None:
    super().__init__(
        source_directory=source_directory,
        target_pattern=target_pattern,
        pattern_parser=pattern_parser,
    )
    self.logger.debug('All DICOM Keys are Valid in target pattern', keys=self.keys)

format property

format: str

Get the formatted pattern string.

invalid_keys property

invalid_keys: Set[str]

Get the set of invalid keys.

This checks each key in the pattern with pydicom.datadict.dictionary_has_tag and returns the set of keys that are not in the DICOM data dictionary.

Returns:

Type Description
Set[str]

The set of invalid keys.

keys property

keys: Set[str]

Get the set of keys extracted from the pattern.

pattern_preview property

pattern_preview: str

Returns a human-readable preview of the pattern.

Useful for visualizing the pattern structure and can be highlighted using Rich Console.

Examples:

>>> target_pattern = '%key1/%key2/%key3'
>>> pattern_preview = '{key1}/{key2}/{key3}'

execute

execute(
    action: FileAction | str = MOVE,
    overwrite: bool = False,
    dry_run: bool = False,
    num_workers: int = 1,
) -> None

Execute the file action on DICOM files.

Users are encouraged to use FileAction.HARDLINK for efficient storage and performance on large datasets, and for protection against data loss.

Using hard links can save disk space and improve performance by creating multiple directory entries (links) for a single file instead of duplicating the file content. This is particularly useful when working with large datasets, such as DICOM files, where storage efficiency is crucial.
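To make the hard-link behavior concrete, the following standard-library sketch creates a second directory entry for a file and shows that the data survives when the original entry is removed. It does not use the library's FileAction handling, the file is a placeholder rather than a real DICOM file, and it assumes a filesystem that supports hard links with both paths on the same volume.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder content standing in for a DICOM file.
    source = Path(tmp) / "1-1.dcm"
    source.write_bytes(b"placeholder bytes, not a real DICOM file")

    target = Path(tmp) / "sorted" / "1-1.dcm"
    target.parent.mkdir(parents=True)

    # Create a second directory entry that points at the same data.
    os.link(source, target)

    same_file = os.path.samefile(source, target)  # both names, one file
    link_count = source.stat().st_nlink           # number of directory entries

    # Removing the original name leaves the data reachable via the link.
    source.unlink()
    survives = target.exists()

print(same_file, link_count, survives)
```

Because no file content is copied, the sorted tree costs almost no extra disk space, and the source directory stays intact until its entries are deleted explicitly.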

Parameters:

Name Type Description Default

action

FileAction | str
The action to apply to the DICOM files (e.g., move, copy).
FileAction.MOVE

overwrite

bool
If True, overwrite existing files at the destination.
False

dry_run

bool
If True, perform a dry run without making any changes.
False

num_workers

int
The number of worker processes to use for processing files (file handling runs in a ProcessPoolExecutor).
1

Source code in src/imgtools/dicom/sort/dicomsorter.py
def execute(
    self,
    action: FileAction | str = FileAction.MOVE,
    overwrite: bool = False,
    dry_run: bool = False,
    num_workers: int = 1,
) -> None:
    """Execute the file action on DICOM files.

    Users are encouraged to use FileAction.HARDLINK for
    efficient storage and performance on large datasets, and for
    protection against data loss.

    Using hard links can save disk space and improve performance by
    creating multiple directory entries (links) for a single file
    instead of duplicating the file content. This is particularly
    useful when working with large datasets, such as DICOM files,
    where storage efficiency is crucial.

    Parameters
    ----------
    action : FileAction | str, default: FileAction.MOVE
            The action to apply to the DICOM files (e.g., move, copy).
    overwrite : bool, default: False
            If True, overwrite existing files at the destination.
    dry_run : bool, default: False
            If True, perform a dry run without making any changes.
    num_workers : int, default: 1
            The number of worker processes to use for processing files.

    Raises
    ------
    ValueError
            If the provided action is not a valid FileAction.
    """
    if not isinstance(action, FileAction):
        action = FileAction.validate(action)

    self.logger.debug(f'Mapping {len(self.dicom_files)} files to new paths')

    # Create a progress bar that can be used to track everything
    with self._progress_bar() as progress_bar:
        ################################################################################
        # Resolve new paths
        ################################################################################
        file_map: Dict[Path, Path] = self._resolve_new_paths(
            progress_bar=progress_bar, num_workers=num_workers
        )
    self.logger.info('Finished resolving paths')

    ################################################################################
    # Check if any of the resolved paths are duplicates
    ################################################################################
    file_map = self._check_duplicates(file_map)
    self.logger.info('Finished checking for duplicates')

    ################################################################################
    # Handle files
    ################################################################################
    if dry_run:
        self._dry_run(file_map)
        return

    with self._progress_bar() as progress_bar:
        task_files = progress_bar.add_task('Handling files', total=len(file_map))
        new_paths: List[Path | None] = []
        with ProcessPoolExecutor(max_workers=num_workers) as executor:
            future_to_file = {
                executor.submit(
                    handle_file, source_path, resolved_path, action, overwrite
                ): source_path
                for source_path, resolved_path in file_map.items()
            }
            for future in as_completed(future_to_file):
                try:
                    result = future.result()
                    new_paths.append(result)
                    progress_bar.update(task_files, advance=1)
                except Exception as e:
                    self.logger.exception(
                        'Failed to handle file',
                        exc_info=e,
                        file=future_to_file[future],
                    )

print_tree

print_tree(base_dir: Path | None = None) -> None

Display the pattern structure as a tree visualization.

Notes

This only prints the target pattern, parsed and formatted. Performing a dry-run execute will display more information.

Source code in src/imgtools/dicom/sort/sorter_base.py
def print_tree(self, base_dir: Path | None = None) -> None:
    """
    Display the pattern structure as a tree visualization.

    Notes
    -----
    This only prints the target pattern, parsed and formatted.
    Performing a dry-run execute will display more information.

    Raises
    ------
    SorterBaseError
        If the tree visualization fails to generate.
    """
    try:
        base_dir = base_dir or Path.cwd().resolve()
        tree = self._setup_tree(base_dir)
        self._generate_tree_structure(self.pattern_preview, tree)
        self._console.print(tree)
    except Exception as e:
        errmsg = 'Failed to generate tree visualization.'
        raise SorterBaseError(errmsg) from e

validate_keys

validate_keys() -> None

Validate the DICOM keys in the target pattern. This implements the validation hook that SorterBase leaves to its subclasses.

If any invalid keys are found, the method prints similar valid keys as suggestions for each one and raises InvalidDICOMKeyError.
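The suggest-then-fail flow can be approximated with the standard library. This sketch uses `difflib.get_close_matches` as a stand-in for the library's `similar_tags`, whose implementation is not shown here, and a tiny hypothetical list of valid keywords in place of the full DICOM data dictionary.

```python
from difflib import get_close_matches

# Hypothetical subset of valid DICOM keywords, for illustration only.
VALID_KEYS = [
    "PatientID",
    "StudyID",
    "StudyInstanceUID",
    "SeriesInstanceUID",
    "Modality",
]


def suggest(key: str, n: int = 3, cutoff: float = 0.6) -> list[str]:
    """Return up to n valid keywords that look similar to key."""
    return get_close_matches(key, VALID_KEYS, n=n, cutoff=cutoff)


def find_invalid(keys: set[str]) -> dict[str, list[str]]:
    """Map each invalid key to its suggested replacements."""
    return {k: suggest(k) for k in sorted(keys) if k not in VALID_KEYS}


report = find_invalid({"PatientID", "Modalty", "StudyInstanceUD"})
for key, similar in report.items():
    hint = (
        f"Did you mean: {', '.join(similar)}?"
        if similar
        else "No similar keys found."
    )
    print(f"Invalid DICOM key: {key}. {hint}")

if report:
    # A real sorter would raise here; the sketch only reports.
    print("Invalid DICOM keys found.")
```

Reporting every invalid key before failing lets a user fix the whole pattern in one pass instead of one key per run.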

Source code in src/imgtools/dicom/sort/dicomsorter.py
def validate_keys(self) -> None:
    """Validate the DICOM keys in the target pattern.

    If any invalid keys are found, it
    suggests similar valid keys and raises an error.
    """
    if not self.invalid_keys:
        return

    for key in sorted(self.invalid_keys):
        # TODO: keep this logic, but make the suggestion more user-friendly/readable
        similar = similar_tags(key)
        suggestion = (
            f"\n\tDid you mean: [bold green]{', '.join(similar)}[/bold green]?"
            if similar
            else ' And [bold red]no similar keys[/bold red] found.'
        )
        _error = f'Invalid DICOM key: [bold red]{key}[/bold red].{suggestion}'
        self._console.print(f'{_error}')
    self._console.print(f'Parsed Path: `{self.pattern_preview}`')
    errmsg = 'Invalid DICOM Keys found.'
    raise InvalidDICOMKeyError(errmsg)