Extending the AbstractBaseWriter class

The AbstractBaseWriter is designed to be extended, allowing you to create custom writers tailored to your specific needs. This guide will walk you through the steps to extend the class and implement your custom functionality.


Setting Up Your Writer

To create a custom writer, you need to extend the AbstractBaseWriter and implement the save method. This method is the core of your writer, handling how and where data is saved.

For a walkthrough of all key methods and features, see the Key Methods section below.

Steps to Set Up

  1. Inherit from AbstractBaseWriter:
    Create a new class and extend AbstractBaseWriter with the appropriate type parameter. For example, use AbstractBaseWriter[str] if you are saving text data, or AbstractBaseWriter[sitk.Image] if you are saving image data.

  2. Define the save Method:
    Use resolve_path() or preview_path() to generate file paths.
    Implement the logic for saving data.

  3. Customize Behavior (Optional):
    Override any existing methods for specific behavior.
    Add additional methods or properties to enhance functionality.

Simple Example

from pathlib import Path
from imgtools.io import AbstractBaseWriter

class MyCustomWriter(AbstractBaseWriter[str]):
    def save(self, content: str, **kwargs) -> Path:
        # Resolve the output file path
        output_path = self.resolve_path(**kwargs)

        # Write content to the file
        with output_path.open(mode="w", encoding="utf-8") as f:
            f.write(content)

        # Log and track the save operation
        self.add_to_index(output_path, **self.context)

        return output_path

Implementing the save Method

The save method is the heart of your custom writer. It determines how data is written to files and interacts with the core features of AbstractBaseWriter.

Key Responsibilities of save

  1. Path Resolution:

    • Use resolve_path() to dynamically generate file paths based on the provided context and filename format.
    • You can optionally use preview_path() as well.
    • Ensure paths are validated to prevent overwriting or duplication.
  2. Data Writing:

    • Define how the content will be written to the resolved path.
    • Use file-handling best practices to ensure reliability.
  3. Logging and Tracking:

    • Log each save operation for debugging or auditing purposes.
    • Use add_to_index() to maintain a record of saved files and their associated context variables.
  4. Return Value:

    • Return the Path object representing the saved file.
    • This allows users to access the file path for further processing or verification.

Example Implementation

Here’s a minimal implementation of the save method for a custom writer.

from pathlib import Path
from imgtools.io import AbstractBaseWriter

class MyCustomWriter(AbstractBaseWriter[str]):
    def save(self, content: str, **kwargs) -> Path:
        # Step 1: Resolve the output file path
        # Wrap this in try/except if the writer may be in FAIL mode,
        # or just let the FileExistsError propagate.
        # Unless it raises, resolve_path always returns the path.
        output_path = self.resolve_path(**kwargs)

        # OPTIONAL handling for SKIP mode
        if output_path.exists():
            # This is only true if the existing-file mode is set to SKIP:
            # - OVERWRITE will have already deleted the file
            # - FAIL will have already raised FileExistsError
            # It is up to the developer to decide how to handle this case.
            pass

        # Step 2: Write the content to the resolved path
        with output_path.open(mode="w", encoding="utf-8") as f:
            f.write(content)

        # Step 3: Log and track the save operation
        self.add_to_index(output_path, filepath_column="filepath", **kwargs)

        # Step 4: ALWAYS Return the saved file path
        return output_path

Key Methods

The AbstractBaseWriter provides several utility methods that simplify file writing and context management. These methods are designed to be flexible and reusable, allowing you to focus on your custom implementation.

resolve_path

resolve_path(**kwargs: object) -> pathlib.Path

Generate a file path based on the filename format, subject ID, and additional parameters.

Meant to be used by developers when creating a new writer class and used internally by the save method.

What It Does:

  • Dynamically generates a file path based on the provided context and filename format.

When to Use It:

  • This method is meant to be used in the save method to determine the file’s target location, but can also be used by external code to generate paths.
  • It ensures you’re working with a valid path and can handle file existence scenarios.
  • Only raises FileExistsError if the file already exists and the mode is set to FAIL.

Parameters:

  • **kwargs (typing.Any, default {}): Parameters for resolving the filename and validating existence.

Returns:

  • resolved_path (pathlib.Path): The resolved path for the file.

Source code in src/imgtools/io/writers/abstract_base_writer.py
def resolve_path(self, **kwargs: object) -> Path:
    """
    Generate a file path based on the filename format, subject ID, and
    additional parameters.

    Meant to be used by developers when creating a new writer class
    and used internally by the `save` method.

    **What It Does**:

    - Dynamically generates a file path based on the provided context and
    filename format.

    **When to Use It**:

    - This method is meant to be used in the `save` method to determine the
    file’s target location, but can also be used by external code to
    generate paths.
    - It ensures you’re working with a valid path and can handle file
    existence scenarios.
    - Only raises `FileExistsError` if the file already exists and the mode
    is set to `FAIL`.

    Parameters
    ----------
    **kwargs : Any
        Parameters for resolving the filename and validating existence.

    Returns
    -------
    resolved_path: Path
        The resolved path for the file.

    Raises
    ------
    FileExistsError
        If the file already exists and the mode is set to FAIL.
    """
    out_path = self._generate_path(**kwargs)
    if not out_path.exists():
        if self.create_dirs:
            self._ensure_directory_exists(out_path.parent)
        # should we raise this error here?
        # elif not out_path.parent.exists():
        #     msg = f"Directory {out_path.parent} does not exist."
        #     raise DirectoryNotFoundError(msg)
        return out_path
    match self.existing_file_mode:
        case ExistingFileMode.SKIP:
            return out_path
        case ExistingFileMode.FAIL:
            msg = f"File {out_path} already exists."
            raise FileExistsError(msg)
        case ExistingFileMode.OVERWRITE:
            logger.debug(f"Deleting existing {out_path} and overwriting.")
            out_path.unlink()
            return out_path
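
To see the three existing-file modes side by side, the branching above can be reproduced in a small standalone sketch. Nothing here imports imgtools: the ExistingFileMode enum below is a stand-in whose member names come from the source listing, while its string values and the resolve_existing helper are assumptions made for illustration only.

```python
import tempfile
from enum import Enum
from pathlib import Path


class ExistingFileMode(Enum):
    """Stand-in enum; member names follow the listing, values are assumed."""

    OVERWRITE = "overwrite"
    SKIP = "skip"
    FAIL = "fail"


def resolve_existing(path: Path, mode: ExistingFileMode) -> Path:
    """Hypothetical helper mirroring the existence branching of resolve_path."""
    if not path.exists():
        return path
    if mode is ExistingFileMode.FAIL:
        raise FileExistsError(f"File {path} already exists.")
    if mode is ExistingFileMode.OVERWRITE:
        path.unlink()  # the old file is removed before the caller writes
    return path  # SKIP returns the path and leaves the file in place


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "report.txt"
    target.write_text("old", encoding="utf-8")

    # SKIP: the path comes back and the file is untouched
    skipped = resolve_existing(target, ExistingFileMode.SKIP)
    still_there = skipped.exists()

    # FAIL: an existing file raises FileExistsError
    try:
        resolve_existing(target, ExistingFileMode.FAIL)
        failed = False
    except FileExistsError:
        failed = True

    # OVERWRITE: the existing file is deleted and the path is returned
    overwritten = resolve_existing(target, ExistingFileMode.OVERWRITE)
    removed = not overwritten.exists()
```

Note that in SKIP mode the returned path still points at an existing file, which is why a save implementation may want to check output_path.exists() before writing.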

preview_path

preview_path(
    **kwargs: object,
) -> typing.Optional[pathlib.Path]

Pre-checks file existence and sets up the writer context.

Meant to be used to skip expensive computations when a file already exists and you don't want to overwrite it. The main difference between this and resolve_path is that this method returns None instead of the path if the file exists and the mode is set to SKIP.

resolve_path keeps returning the path in that case because the .save() method should be able to return the path even if the file exists.

What It Does:

  • Pre-checks the file path based on context without writing the file.
  • Returns None if the file exists and the mode is set to SKIP.
  • Raises a FileExistsError if the file exists and the mode is set to FAIL.
  • An added benefit of using preview_path is that it automatically caches the context variables for future use, and save() can be called without passing in the context variables again.

Examples:

The main idea is to let users avoid wasted computation when they choose to skip existing files: if the file exists and the mode is SKIP, preview_path returns None, so the caller can skip the computation.

>>> if nifti_writer.preview_path(subject="math", name="test") is None:
...     logger.info("File already exists. Skipping computation.")
...     continue  # could be `break` or `return` depending on the use case

If the mode is FAIL, preview_path raises an error when the file exists, so the user doesn't have to perform an expensive computation only to fail when saving.

Useful Feature

The context is saved in the instance, so running .save() after this will use the same context, and the user can optionally update the context with new values passed to .save().

>>> if path := writer.preview_path(subject="math", name="test"):
...     ...  # do some expensive computation to generate the data
...     writer.save(data)

.save() automatically uses the context for subject and name that was passed to preview_path.

Parameters:

  • **kwargs (typing.Any, default {}): Parameters for resolving the filename and validating existence.

Returns:

  • pathlib.Path | None: The path if the file does not exist. If the file exists and the mode is SKIP, returns None. If the file exists and the mode is FAIL, raises a FileExistsError. If the file exists and the mode is OVERWRITE, logs a debug message, deletes the existing file, and returns the path.

Source code in src/imgtools/io/writers/abstract_base_writer.py
def preview_path(self, **kwargs: object) -> Optional[Path]:
    """
    Pre-checking file existence and setting up the writer context.

    Meant to be used by users to skip expensive computations if a file
    already exists and you don't want to overwrite it.
    Only difference between this and resolve_path is that this method
    does not return the path if the file exists and the mode is set to
    `SKIP`.

    This is because the `.save()` method should be able to return
    the path even if the file exists.

    **What It Does**:

    - Pre-checks the file path based on context without writing the file.
    - Returns `None` if the file exists and the mode is set to `SKIP`.
    - Raises a `FileExistsError` if the mode is set to `FAIL`.
    - An added benefit of using `preview_path` is that it automatically
    caches the context variables for future use, and `save()` can be called
    without passing in the context variables again.

    Examples
    --------

    Main idea here is to allow users to save computation if they choose to
    skip existing files.

    i.e. if file exists and mode is **`SKIP`**, we return
    `None`, so the user can skip the computation.
    >>> if nifti_writer.preview_path(subject="math", name="test") is None:
    >>>     logger.info("File already exists. Skipping computation.")
    >>>     continue # could be `break` or `return` depending on the use case

    If the mode is **`FAIL`**, we raise an error if the file exists, so the
    user doesn't have to perform expensive computation only to fail when saving.

    **Useful Feature**
    ----------------------
    The context is saved in the instance, so running
    `.save()` after this will use the same context, and user can optionally
    update the context with new values passed to `.save()`.

    ```python
    >>> if path := writer.preview_path(subject="math", name="test"):
    >>>     ... # do some expensive computation to generate the data
    >>>     writer.save(data)
    ```
    `.save()` automatically uses the context for `subject` and `name` we
    passed to `preview_path`

    Parameters
    ----------
    **kwargs : Any
        Parameters for resolving the filename and validating existence.

    Returns
    ------
    Path | None
        If the file exists and the mode is `SKIP`, returns `None`. if the file
        exists and the mode is FAIL, raises a `FileExistsError`. If the file
        exists and the mode is OVERWRITE, logs a debug message and returns
        the path.

    Raises
    ------
    FileExistsError
        If the file exists and the mode is FAIL.
    """
    out_path = self._generate_path(**kwargs)

    if not out_path.exists():
        return out_path
    elif out_path.is_dir():
        msg = f"Path {out_path} is already a directory that exists."
        msg += " Use a different filename format or context to avoid this."
        raise IsADirectoryError(msg)

    match self.existing_file_mode:
        case ExistingFileMode.SKIP:
            return None
        case ExistingFileMode.FAIL:
            msg = f"File {out_path} already exists."
            raise FileExistsError(msg)
        case ExistingFileMode.OVERWRITE:
            logger.debug(
                f"File {out_path} exists. Deleting and overwriting."
            )
            out_path.unlink()

    return out_path
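
The preview-then-save pattern can be tried end to end with a small stand-in. This is a sketch under stated assumptions, not the library's implementation: StubWriter is hypothetical, imitates only SKIP-mode behaviour, and hard-codes a "{subject}_{name}.txt" filename format; expensive_computation is likewise a placeholder.

```python
import tempfile
from pathlib import Path
from typing import Optional

calls: list[str] = []  # records which subjects were actually computed


class StubWriter:
    """Hypothetical stand-in that imitates SKIP-mode behaviour only."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.context: dict[str, object] = {}

    def _path(self) -> Path:
        # Assumed filename format, chosen for this example
        return self.root / "{subject}_{name}.txt".format(**self.context)

    def preview_path(self, **kwargs: object) -> Optional[Path]:
        self.context.update(kwargs)  # cache the context for a later save()
        path = self._path()
        return None if path.exists() else path

    def save(self, content: str, **kwargs: object) -> Path:
        self.context.update(kwargs)  # optional overrides at save time
        path = self._path()
        path.write_text(content, encoding="utf-8")
        return path


def expensive_computation(subject: str) -> str:
    """Placeholder for real work; logs each call so skips are visible."""
    calls.append(subject)
    return subject.upper()


with tempfile.TemporaryDirectory() as tmp:
    writer = StubWriter(Path(tmp))
    # Pretend "math" was already written by an earlier run
    (Path(tmp) / "math_test.txt").write_text("cached", encoding="utf-8")

    for subject in ["math", "physics"]:
        if writer.preview_path(subject=subject, name="test") is None:
            continue  # file exists: skip the expensive step
        data = expensive_computation(subject)
        saved = writer.save(data)  # reuses the cached subject and name

    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")
```

Only "physics" reaches expensive_computation, and save() needs no context arguments because preview_path cached them on the instance.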

clear_context

clear_context() -> None

Clear the context for the writer.

Useful for resetting the context after using preview_path or save, when you want to make sure that the context is empty for new operations.

Source code in src/imgtools/io/writers/abstract_base_writer.py
def clear_context(self) -> None:
    """
    Clear the context for the writer.

    Useful for resetting the context after using `preview_path` or `save`
    and want to make sure that the context is empty for new operations.
    """
    self.context.clear()
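
Why clearing matters is easiest to see with a stale key. The sketch below uses a hypothetical ContextHolder that keeps only the context-caching behaviour; it assumes, as the _generate_path listing suggests, that new values are merged on top of whatever is already cached.

```python
class ContextHolder:
    """Hypothetical stand-in: keeps only the context-caching behaviour."""

    def __init__(self) -> None:
        self.context: dict[str, object] = {}

    def set_context(self, **kwargs: object) -> None:
        # Assumed merge semantics: new values layer over cached ones
        self.context = {**self.context, **kwargs}

    def clear_context(self) -> None:
        self.context.clear()


holder = ContextHolder()
holder.set_context(subject="math", name="test", session="baseline")

# A later call that omits `session` still carries the cached value.
holder.set_context(subject="physics", name="test")
stale = dict(holder.context)

# Clearing first guarantees that only the new keys are present.
holder.clear_context()
holder.set_context(subject="physics", name="test")
fresh = dict(holder.context)
```

Here stale still contains session="baseline" while fresh does not, which is the situation clear_context is meant to prevent.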

add_to_index

add_to_index(
    path: pathlib.Path,
    include_all_context: bool = True,
    filepath_column: str = "path",
    replace_existing: bool = False,
    **additional_context: object
) -> None

Add or update an entry in the shared CSV index file.

What It Does:

  • Logs the file’s path and associated context variables to a shared CSV index file.
  • Uses inter-process locking to avoid conflicts when multiple writers are active.

When to Use It:

  • Use this method to maintain a centralized record of saved files for auditing or debugging.

Relevant Writer Parameters

  • The index_filename parameter allows you to specify a custom filename for the index file. By default, it will be named after the root_directory with _index.csv appended.
  • If the index file already exists in the root directory, it is overwritten unless the overwrite_index parameter is set to False.
  • The absolute_paths_in_index parameter controls whether the paths in the index file are absolute or relative to the root_directory, with False (relative) being the default.

Parameters:

  • path (pathlib.Path, required): The file path being saved.
  • include_all_context (bool, default True): If True, write the existing context variables passed into the writer and the additional context to the CSV. If False, write only the context keys parsed from the filename_format (excluding all other context variables and unused context keys).
  • filepath_column (str, default "path"): The name of the column to store the file path.
  • replace_existing (bool, default False): If True, checks if the file path already exists in the index and replaces it.
  • **additional_context (typing.Any, default {}): Additional context information to include in the CSV, passed in as keyword arguments.

Notes

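When replace_existing is set to True and the index file already exists, the method first uses csv.Sniffer to confirm that the file has a header row and checks that the file path column is present, raising a ValueError otherwise. It then drops any existing row with the same file path and writes the new row in its place. If the file path does not exist in the index file, it adds a new row with the file path and context information.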

Source code in src/imgtools/io/writers/abstract_base_writer.py
def add_to_index(
    self,
    path: Path,
    include_all_context: bool = True,
    filepath_column: str = "path",
    replace_existing: bool = False,
    **additional_context: object,
) -> None:
    """
    Add or update an entry in the shared CSV index file.


    **What It Does**:

    - Logs the file’s path and associated context variables to a
        shared CSV index file.
    - Uses inter-process locking to avoid conflicts when
        multiple writers are active.

    **When to Use It**:

    - Use this method to maintain a centralized record of saved
    files for auditing or debugging.

    **Relevant Writer Parameters**
    ------------------------------

    - The `index_filename` parameter allows you to specify a
    custom filename for the index file.
    By default, it will be named after the `root_directory`
    with `_index.csv` appended.

    - If the index file already exists in the root directory,
    it will overwrite it unless
    the `overwrite_index` parameter is set to `False`.

    - The `absolute_paths_in_index` parameter controls whether
    the paths in the index file are absolute or relative to the
    `root_directory`, with `False` being the default.

    Parameters
    ----------
    path : Path
        The file path being saved.
    include_all_context : bool
        If True, write existing context variables passed into writer and
        the additional context to the CSV.
        If False, determines only the context keys parsed from the
        `filename_format` (excludes all other context variables, and
        unused context keys).
    filepath_column : str
        The name of the column to store the file path. Defaults to "path".
    replace_existing : bool
        If True, checks if the file path already exists in the index and
        replaces it.
    **additional_context : Any
        Additional context information to include in the CSV passed in as
        keyword arguments.

    Notes
    -----
    When `replace_existing` is set to True, the method will check if the
    file path already exists in the index file using `csv.Sniffer` and
    replace the row if it does. If the file path does not exist in the
    index file, it will add a new row with the file path and context
    information.
    """

    lock_file = self._get_index_lock()
    self._ensure_directory_exists(self.index_file.parent)

    # Prepare context and resolve the file path
    context = {**self.context, **additional_context}
    resolved_path = (
        path.resolve().absolute()
        if self.absolute_paths_in_index
        else path.relative_to(self.root_directory)
    )
    fieldnames = [
        filepath_column,
        *(
            context.keys()
            if include_all_context
            else self.pattern_resolver.keys
        ),
    ]

    rows = []
    # Check if replacing existing entries and if the index file exists
    if replace_existing and self.index_file.exists():
        # Read and validate the index file format
        try:
            with (
                InterProcessLock(lock_file),
                self.index_file.open(mode="r", encoding="utf-8") as f,
            ):
                # Use csv.Sniffer to check if the file has a header
                sniffer = csv.Sniffer()
                if not sniffer.has_header(f.readline()):
                    msg = (
                        f"Index {self.index_file} is missing a header row."
                    )
                    raise ValueError(msg)

                # Reset the file pointer after sampling
                f.seek(0)
                reader = csv.DictReader(f)
                # Check if the required column is present in the index file
                if (
                    reader.fieldnames is None
                    or filepath_column not in reader.fieldnames
                ):
                    msg = (
                        f"Index file {self.index_file} does "
                        f"not contain the column '{filepath_column}'."
                    )
                    raise ValueError(msg)
                # Filter out the existing entry for the resolved path
                rows = [
                    row
                    for row in reader
                    if row[filepath_column] != str(resolved_path)
                ]
        except Exception as e:
            # Log and raise any exceptions encountered during validation
            logger.exception(
                f"Error validating index file {self.index_file}.", error=e
            )
            raise

    # Add the new or updated row
    rows.append({filepath_column: str(resolved_path), **context})

    # Write the updated rows back to the index file
    try:
        with (
            InterProcessLock(lock_file),
            self.index_file.open(
                mode="w", newline="", encoding="utf-8"
            ) as f,
        ):
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    except Exception as e:
        logger.exception(
            f"Error writing to index file {self.index_file}.", error=e
        )
        raise
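
The replace-or-append step at the core of replace_existing=True can be isolated with only the standard csv module. This is a simplified sketch, not the library's code: upsert_row is a hypothetical helper, inter-process locking and header validation are omitted, and every row is assumed to share the same columns.

```python
import csv
import tempfile
from pathlib import Path


def upsert_row(index_file: Path, row: dict[str, str], key: str = "path") -> None:
    """Hypothetical helper: replace the row whose `key` matches, else append."""
    rows: list[dict[str, str]] = []
    if index_file.exists():
        with index_file.open(mode="r", encoding="utf-8", newline="") as f:
            # Keep every row except the one being replaced
            rows = [r for r in csv.DictReader(f) if r[key] != row[key]]
    rows.append(row)

    # Rewrite the whole file: header first, then the kept rows plus the new one
    with index_file.open(mode="w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        writer.writeheader()
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    index = Path(tmp) / "out_index.csv"
    upsert_row(index, {"path": "math/test.txt", "subject": "math"})
    upsert_row(index, {"path": "physics/test.txt", "subject": "physics"})
    # Same path again: the earlier "math" row is replaced, not duplicated
    upsert_row(index, {"path": "math/test.txt", "subject": "maths"})

    with index.open(mode="r", encoding="utf-8", newline="") as f:
        final_rows = list(csv.DictReader(f))
```

Because the matching row is filtered out and the new row is appended, a replaced entry moves to the end of the index. If rows carried different columns, csv.DictWriter would raise a ValueError for the unexpected keys, so a consistent set of context keys across saves keeps the index well formed.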

_generate_path

_generate_path(**kwargs: object) -> pathlib.Path

Helper for resolving paths with the given context.

Source code in src/imgtools/io/writers/abstract_base_writer.py
def _generate_path(self, **kwargs: object) -> Path:
    """
    Helper for resolving paths with the given context.
    """
    save_context = {
        **self._generate_datetime_strings(),
        **self.context,
        **kwargs,
    }
    self.set_context(**save_context)
    try:
        filename = self.pattern_resolver.resolve(save_context)
    except MissingPlaceholderValueError as e:
        # Replace the class name in the error message dynamically
        raise MissingPlaceholderValueError(
            e.missing_keys,
            class_name=self.__class__.__name__,
            key=e.key,
        ) from e
    if self.sanitize_filenames:
        filename = self._sanitize_filename(filename)
    out_path = self.root_directory / filename
    logger.debug(
        f"Resolved path: {out_path} and {out_path.exists()=}",
        handling=self.existing_file_mode,
    )
    return out_path

What It Does:

  • A helper method for resolving file paths based on the current context and filename format.
  • Automatically sanitizes filenames if sanitize_filenames=True.

When to Use It:

  • Typically called internally by resolve_path() and preview_path(), which handle additional validation and error handling.
  • Can be called by your class methods to generate paths without the additional file-existence checks.

Example:

custom_path = writer._generate_path(subject="math", name="example")
print(f"Generated path: {custom_path}")
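
The order in which _generate_path merges its inputs decides which value wins: datetime strings first, then the cached context, then the keyword arguments of the current call. The sketch below illustrates that precedence with str.format; the generate_path function, the "date" key, and the "{subject}/{name}.txt" format are assumptions for illustration, and the library's pattern resolver may use a different placeholder syntax.

```python
from datetime import datetime
from pathlib import Path


def generate_path(
    root: Path,
    filename_format: str,
    context: dict[str, object],
    **kwargs: object,
) -> Path:
    """Hypothetical sketch of the merge order used when resolving a filename."""
    merged = {
        "date": datetime.now().strftime("%Y-%m-%d"),  # assumed datetime key
        **context,  # cached context overrides datetime defaults
        **kwargs,   # per-call values override the cached context
    }
    # Keys that the format does not reference are simply ignored
    return root / filename_format.format(**merged)


cached = {"subject": "math", "name": "draft"}
path = generate_path(Path("out"), "{subject}/{name}.txt", cached, name="example")
```

The per-call name="example" overrides the cached name="draft", while subject falls back to the cached value.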

By using these key methods effectively, you can customize your writer to handle a wide range of file-writing scenarios while maintaining clean and consistent logic.