Sorter base

sorter_base #

Base module for sorting files based on customizable patterns.

This module provides a foundation for implementing file sorting logic, particularly for handling DICOM files or other structured data.

The SorterBase class serves as an abstract base class for:

- Parsing and validating patterns used for organizing files.
- Visualizing the target directory structure through a tree representation.
- Allowing subclasses to implement specific validation and resolution logic.

Important: While this module helps define the target directory structure for files based on customizable metadata-driven patterns, it does not alter the filename (basename) of the source files. The original filename is preserved during the sorting process. This ensures that files with the same metadata fields but different filenames are not overwritten, which is critical when dealing with fields like InstanceNumber that may have common values across different files.

Examples:

Given a source file: /source_dir/HN-CHUS-082/1-1.dcm

And a target pattern: ./data/dicoms/%PatientID/Study-%StudyInstanceUID/Series-%SeriesInstanceUID/%Modality/

The resolved path will be: ./data/dicoms/HN-CHUS-082/Study-06980/Series-67882/RTSTRUCT/1-1.dcm

The SorterBase class ensures that only the directory structure is adjusted based on metadata, leaving the original filename intact.
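The example above can be reproduced with a short standalone sketch. It is not the library's implementation: `sketch_resolve` and the hard-coded `metadata` dictionary are hypothetical stand-ins for reading tags from a real DICOM file, and the fallback value "Unknown" is borrowed from the `resolve_path` source shown further down this page.

```python
import re
from pathlib import Path

# Same placeholder syntax as the documented default parser: %KEY or {KEY}.
PLACEHOLDER = re.compile(r"%([A-Za-z]+)|\{([A-Za-z]+)\}")


def sketch_resolve(source: Path, pattern: str, metadata: dict) -> Path:
    """Fill each placeholder from `metadata`, then append the original basename."""

    def fill(match: re.Match) -> str:
        # Exactly one of the two groups is set, depending on the syntax used.
        key = match.group(1) or match.group(2)
        return metadata.get(key, "Unknown")

    target_dir = PLACEHOLDER.sub(fill, pattern)
    # Only the directory comes from metadata; the filename is kept as-is.
    return Path(target_dir) / source.name


# Hypothetical metadata; a real sorter would read these values from the file.
metadata = {
    "PatientID": "HN-CHUS-082",
    "StudyInstanceUID": "06980",
    "SeriesInstanceUID": "67882",
    "Modality": "RTSTRUCT",
}
resolved = sketch_resolve(
    Path("/source_dir/HN-CHUS-082/1-1.dcm"),
    "./data/dicoms/%PatientID/Study-%StudyInstanceUID/Series-%SeriesInstanceUID/%Modality/",
    metadata,
)
print(resolved)
```

Two files that share every metadata value but have different basenames end up as two distinct files in the same target directory, which is the behaviour the note above describes.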

Functions:

Name Description
resolve_path

Worker function to resolve a single path.

SorterBase #

SorterBase(
    source_directory: pathlib.Path,
    target_pattern: str,
    pattern_parser: typing.Pattern = imgtools.dicom.sort.sorter_base.DEFAULT_PATTERN_PARSER,
)

Bases: abc.ABC

Abstract base class for sorting files based on customizable patterns.

This class provides functionalities for:

- Pattern parsing and validation
- Tree visualization of file structures
- Extensibility for subclass-specific implementations

Parameters:

Name Type Description Default

source_directory #

pathlib.Path

The directory containing the files to be sorted.

required

target_pattern #

str

The pattern string for sorting files.

required

pattern_parser #

typing.Pattern

Custom regex pattern for parsing the target pattern. The default matches placeholders of the form %KEY or {KEY}: re.compile(r"%([A-Za-z]+)|\{([A-Za-z]+)\}").

imgtools.dicom.sort.sorter_base.DEFAULT_PATTERN_PARSER
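To illustrate what a parser of this shape does, here is a minimal sketch that extracts the keys from a pattern and rewrites it into a `%(KEY)s` format string. Several things here are assumptions: the local `DEFAULT_PATTERN_PARSER` is a stand-in that escapes braces with a single backslash, `parse_pattern` is a hypothetical helper rather than the library's `PatternParser`, and the `%(KEY)s` output form is inferred from the `format_str % tags` call in the `resolve_path` source on this page.

```python
import re
from typing import List, Tuple

# Local stand-in for the documented default; matches %KEY or {KEY}.
DEFAULT_PATTERN_PARSER = re.compile(r"%([A-Za-z]+)|\{([A-Za-z]+)\}")


def parse_pattern(
    pattern: str, parser: re.Pattern = DEFAULT_PATTERN_PARSER
) -> Tuple[str, List[str]]:
    """Return a %-style format string and the keys found, in order of appearance."""
    keys: List[str] = []

    def to_named_field(match: re.Match) -> str:
        # Group 1 is set for %KEY, group 2 for {KEY}.
        key = match.group(1) or match.group(2)
        keys.append(key)
        return f"%({key})s"

    return parser.sub(to_named_field, pattern), keys


fmt, keys = parse_pattern("%PatientID/{Modality}/Series-%SeriesInstanceUID")
print(fmt)   # %(PatientID)s/%(Modality)s/Series-%(SeriesInstanceUID)s
print(keys)  # ['PatientID', 'Modality', 'SeriesInstanceUID']

# The format string can then be filled from a mapping of tag values.
print(fmt % {"PatientID": "P1", "Modality": "CT", "SeriesInstanceUID": "67882"})
# P1/CT/Series-67882
```

Passing a different compiled regex as `parser` is enough to support another placeholder syntax, as long as it exposes the key name in its first or second capture group.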

Attributes:

Name Type Description
source_directory pathlib.Path

The directory containing the files to be sorted.

format str

The parsed format string with placeholders for keys.

dicom_files list of Path

The list of DICOM files to be sorted.

Methods:

Name Description
print_tree

Display the pattern structure as a tree visualization.

validate_keys

Validate extracted keys. Subclasses should implement this method.

Source code in src/imgtools/dicom/sort/sorter_base.py
def __init__(
    self,
    source_directory: Path,
    target_pattern: str,
    pattern_parser: Pattern = DEFAULT_PATTERN_PARSER,
) -> None:
    if not source_directory.exists() or not source_directory.is_dir():
        errmsg = f"Source directory {source_directory} does not exist or is not a directory."
        raise SorterBaseError(errmsg)

    self.source_directory = source_directory
    self._target_pattern = target_pattern
    self._pattern_parser = pattern_parser
    self._keys: Set[str] = set()
    self._console: Console = self._initialize_console()
    self.logger = logger.bind(source_directory=self.source_directory)

    try:
        self.dicom_files = find_dicoms(
            directory=self.source_directory,
            check_header=False,
            recursive=True,
            extension="dcm",
        )
        self.logger.info(f"Found {len(self.dicom_files)} files")
    except Exception as e:
        errmsg = "Failed to find files in the source directory."
        raise SorterBaseError(errmsg) from e

    try:
        self._parser = PatternParser(
            self._target_pattern, self._pattern_parser
        )
        self._format, parsed_keys = self._parser.parse()
        self._keys = set(parsed_keys)
    except Exception as e:
        errmsg = "Failed to initialize SorterBase."
        raise SorterBaseError(errmsg) from e
    self.validate_keys()

format property #

format: str

Get the formatted pattern string.

keys property #

keys: typing.Set[str]

Get the set of keys extracted from the pattern.

pattern_preview property #

pattern_preview: str

Returns a human-readable preview of the pattern.

Useful for visualizing the pattern structure; the preview can be highlighted using Rich Console.

Examples:

>>> target_pattern = "%key1/%key2/%key3"
>>> pattern_preview = "{key1}/{key2}/{key3}"

print_tree #

print_tree(base_dir: pathlib.Path | None = None) -> None

Display the pattern structure as a tree visualization.

Notes

This only prints the target pattern, parsed and formatted. Performing a dry-run execute will display more information.

Source code in src/imgtools/dicom/sort/sorter_base.py
def print_tree(self, base_dir: Path | None = None) -> None:
    """
    Display the pattern structure as a tree visualization.

    Notes
    -----
    This only prints the target pattern, parsed and formatted.
    Performing a dry-run execute will display more information.

    Raises
    ------
    SorterBaseError
        If the tree visualization fails to generate.
    """
    try:
        base_dir = base_dir or Path.cwd().resolve()
        tree = self._setup_tree(base_dir)
        self._generate_tree_structure(self.pattern_preview, tree)
        self._console.print(tree)
    except Exception as e:
        errmsg = "Failed to generate tree visualization."
        raise SorterBaseError(errmsg) from e
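As a rough picture of what a pattern tree looks like, the sketch below nests each component of a pattern preview one level deeper than the previous one. It uses plain strings instead of the Rich tree the class renders, and `render_tree` is a hypothetical helper, not the private `_setup_tree` or `_generate_tree_structure` methods called above.

```python
from pathlib import PurePosixPath
from typing import List


def render_tree(base_dir: PurePosixPath, pattern_preview: str) -> List[str]:
    """Return one line per path component, indented by nesting depth."""
    lines = [f"{base_dir}/"]
    # Skip empty components produced by leading or trailing slashes.
    parts = [part for part in pattern_preview.split("/") if part]
    for depth, part in enumerate(parts):
        lines.append("    " * depth + "└── " + part + "/")
    return lines


tree_lines = render_tree(
    PurePosixPath("/data"), "{PatientID}/Study-{StudyInstanceUID}/{Modality}"
)
print("\n".join(tree_lines))
```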

validate_keys abstractmethod #

validate_keys() -> None

Validate extracted keys. Subclasses should implement this method to perform specific validations based on their context.

Source code in src/imgtools/dicom/sort/sorter_base.py
@abstractmethod
def validate_keys(self) -> None:
    """
    Validate extracted keys. Subclasses should implement this method
    to perform specific validations based on their context.
    """
    pass
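A subclass might implement `validate_keys` along the following lines. To keep the sketch runnable without the package installed, it uses `MiniSorterBase`, a hypothetical stand-in that mirrors only the relevant part of `SorterBase` (keys are stored, then `validate_keys` is called at the end of `__init__`). The allow-list and the use of `ValueError` are likewise illustrative choices, not the library's behaviour.

```python
from abc import ABC, abstractmethod
from typing import Set


class MiniSorterBase(ABC):
    """Hypothetical stand-in for SorterBase: stores keys, then validates them."""

    def __init__(self, keys: Set[str]) -> None:
        self._keys = set(keys)
        # As in the documented __init__, validation runs as the final step.
        self.validate_keys()

    @property
    def keys(self) -> Set[str]:
        return self._keys

    @abstractmethod
    def validate_keys(self) -> None:
        """Raise if any extracted key is unacceptable."""


class AllowListSorter(MiniSorterBase):
    """Hypothetical subclass that accepts only a fixed set of keys."""

    ALLOWED = {"PatientID", "StudyInstanceUID", "SeriesInstanceUID", "Modality"}

    def validate_keys(self) -> None:
        unknown = self.keys - self.ALLOWED
        if unknown:
            raise ValueError(f"Unsupported keys: {sorted(unknown)}")


sorter = AllowListSorter({"PatientID", "Modality"})
print(sorted(sorter.keys))  # ['Modality', 'PatientID']

try:
    AllowListSorter({"PatientID", "NotATag"})
except ValueError as exc:
    message = str(exc)
    print(message)  # Unsupported keys: ['NotATag']
```

Because validation runs inside the constructor, an invalid pattern fails at construction time rather than partway through a sort.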

resolve_path #

resolve_path(
    path: pathlib.Path,
    keys: typing.Set[str],
    format_str: str,
    truncate: int = 5,
    check_existing: bool = True,
    force: bool = True,
) -> typing.Tuple[pathlib.Path, pathlib.Path]

Worker function to resolve a single path.

Parameters:

Name Type Description Default

path #

pathlib.Path

The source file path.

required

keys #

typing.Set[str]

The DICOM keys required for resolving the path.

required

format_str #

str

The format string for the resolved path.

required

check_existing #

bool

If True, check if the resolved path already exists (default is True).

True

truncate #

int

The number of characters to truncate UID values to (default is 5).

5

force #

bool

Passed to pydicom.dcmread() to force reading the file (default is True).

True

Returns:

Type Description
typing.Tuple[pathlib.Path, pathlib.Path]

The source path and resolved path.

Source code in src/imgtools/dicom/sort/sorter_base.py
def resolve_path(
    path: Path,
    keys: Set[str],
    format_str: str,
    truncate: int = 5,
    check_existing: bool = True,
    force: bool = True,
) -> Tuple[Path, Path]:
    """
    Worker function to resolve a single path.

    Parameters
    ----------
    path : Path
        The source file path.
    keys : Set[str]
        The DICOM keys required for resolving the path.
    format_str : str
        The format string for the resolved path.
    check_existing : bool, optional
        If True, check if the resolved path already exists (default is True).
    truncate : int, optional
        The number of characters to truncate UID values to (default is 5).
    force : bool, optional
        Passed to pydicom.dcmread() to force reading the file (default is True).

    Returns
    -------
    Tuple[Path, Path]
        The source path and resolved path.
    """
    tags: Dict[str, str] = read_tags(
        path, list(keys), truncate=truncate, force=force, default="Unknown"
    )
    resolved_path = Path(format_str % tags, path.name)
    if check_existing and not resolved_path.exists():
        resolved_path = resolved_path.resolve()
    elif check_existing:
        errmsg = f"Path {resolved_path} already exists."
        logger.error(errmsg, source_path=path, resolved_path=resolved_path)
        raise FileExistsError(errmsg)

    return path, resolved_path
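The control flow of `resolve_path` can be exercised without any DICOM files using the sketch below. It is a simplified stand-in, not the library function: `sketch_resolve_path` takes the tag values directly instead of calling `read_tags`, so the `truncate` and `force` parameters are omitted, and the temporary directory and tag values are made up for the demonstration.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def sketch_resolve_path(
    path: Path,
    tags: Dict[str, str],
    format_str: str,
    check_existing: bool = True,
) -> Tuple[Path, Path]:
    """Mirror the documented flow, with `tags` supplied instead of read from DICOM."""
    # Directory comes from the filled format string, filename from the source.
    resolved = Path(format_str % tags, path.name)
    if check_existing:
        if resolved.exists():
            raise FileExistsError(f"Path {resolved} already exists.")
        resolved = resolved.resolve()
    return path, resolved


with tempfile.TemporaryDirectory() as tmp:
    # Escape any literal % in the temp path so %-formatting leaves it alone.
    fmt = tmp.replace("%", "%%") + "/%(PatientID)s/%(Modality)s"
    tags = {"PatientID": "HN-CHUS-082", "Modality": "RTSTRUCT"}
    source = Path("/source_dir/HN-CHUS-082/1-1.dcm")

    _, target = sketch_resolve_path(source, tags, fmt)
    first_name = target.name

    # Simulate a file already sitting at the target location.
    target.parent.mkdir(parents=True)
    target.touch()
    try:
        sketch_resolve_path(source, tags, fmt)
        collided = False
    except FileExistsError:
        collided = True

print(first_name, collided)  # 1-1.dcm True
```

Since the function takes a single path and returns a `(source, target)` pair, it maps naturally over a list of files, for example through a process pool, which is presumably why the page calls it a worker function.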