Abstract Base Writer
The AbstractBaseWriter
class is the foundation for all writers in this library.
It provides a standard interface, reusable methods, and tools that writers can extend to handle file writing tasks efficiently and consistently.
If you’re building a writer to manage file outputs with custom paths, filenames, or formats, this is where you start!
For details on implementing the AbstractBaseWriter
in your custom writer, see the
Implementing Writers guide.
Introduction
What is the AbstractBaseWriter
?
The AbstractBaseWriter
is:
- A reusable template: Manage file-writing tasks consistently across different writer implementations.
- Customizable: Extend it to handle your file formats and workflows.
- Safe and robust: Features context management, filename sanitization, and optional CSV indexing.
When should you extend AbstractBaseWriter
for your custom writer?
If you write many files with dynamic paths and filenames, or need
to manage file existence scenarios, you might consider extending AbstractBaseWriter
(or even one of its subclasses) to simplify your implementation.
AbstractBaseWriter
is useful when you need:
- Dynamic paths and filenames (e.g.,
{subject_id}/{study_date}.nii.gz
). - Configurable handling of existing files (
OVERWRITE
,SKIP
, etc.). - Logging of saved files via an optional CSV index.
- Thread-safe and multiprocessing-compatible file writing.
- A consistent interface across different types of writers.
Core Concepts
Root Directory and Filename Format Parameters
Root Directory:
- Base folder for all saved files, automatically created if missing (via
create_dirs
parameter)
Filename Format:
- A string template defining your file and folder names.
- Uses placeholders like
{key}
to insert context values dynamically.
Example:
writer = ExampleWriter(
root_directory="./data",
filename_format="{person_name}/{date}_{message_type}.txt",
)
# Save a file with context variables
data = "Hello, World!"
writer.save(
data,
person_name="JohnDoe",
date="2025-01-01",
message_type="greeting"
)
# Saved file path:
# ./data/JohnDoe/2025-01-01_greeting.txt
File Existence Modes
Why It Matters:
- When your writer saves a file, it needs to decide what to do if a file with the same name already exists.
- This is especially important in batch operations or when writing to shared directories.
- The
AbstractBaseWriter
provides several options to handle this scenario through the use of anenum
calledExistingFileMode
.
It is important to handle these options carefully in your writer's save()
method to
avoid data loss or conflicts.
ExistingFileMode
Enum to specify handling behavior for existing files.
Attributes:
Name | Type | Description |
---|---|---|
OVERWRITE |
str
|
Overwrite the existing file. Logs as debug and continues with the operation. |
FAIL |
str
|
Fail the operation if the file exists. Logs as error and raises a FileExistsError. |
SKIP |
str
|
Skip the operation if the file exists. Meant to be used for previewing
the path before any expensive computation. |
Advanced Concepts
Lifecycle Management
Context Manager Support:
- Writers can be used with
with
statements to ensure proper setup and cleanup.
What Happens on Exit?:
- Removes lock files used for the index file.
- Deletes empty directories created during the writing process (if no files were written).
Example:
with TextWriter(root_directory="/data", filename_format="{id}.txt") as writer:
data = "Hello, World!"
writer.save(data, id="1234")
Previewing File Paths and Caching Context
In the simplest usage of a writer, users can pass in the context information as
keyword arguments to each save()
call.
However, this can become cumbersome when the same context variables are used across multiple save operations.
Example:
In the above example, the date
and message_type
context variables are the same for all
students. Instead of passing them in every time, you can store these variables in the writer
itself and update them as needed.
Let's use the following example to illustrate this:
Say we want to save greetings for students in a particular highschool class:
writer = TextWriter(
root_directory="./data",
filename_format="{grade}/{class_subject}/{person_name}/{date}_{message_type}.txt",
)
We see here that the context variables for grade
,
class_subject
, date
, and message_type
are the same for all students.
This can become even worse with more context variables, allowing for mistakes, and making the code harder to read.
student, message = "Alice", "Hello, Alice!"
writer.save(
message,
person_name=student,
grade="12",
class_subject="math",
date="2025-01-01",
message_type="greeting"
)
student, message = "Bob", "Good morning, Bob!"
writer.save(
message,
person_name=student,
grade="12",
class_subject="math",
date="2025-01-01",
message_type="greeting"
)
Instead of passing in the context variables every time,
you can store these variables in the writer and update them as needed
using the set_context()
method.
Then only pass in the unique context variables for each .save()
operation.
If majority of the context variables are the same across all save operations, you can set context when initializing the writer.
Note that here, we must pass as a dictionary to the context
parameter
instead of individual keyword arguments.
writer = TextWriter(
root_directory="./data",
filename_format="{class_subject}/{person_name}/{date}_{message_type}.txt",
context={"grade": "12", "class_subject": "math", "date": "2025-01-01", "message_type": "greeting"}
)
student, message = "Alice", "Hello, Alice!"
writer.save(message, person_name=student)
student, message = "Bob", "Good morning, Bob!"
writer.save(message, person_name=student)
Previewing File Paths
Oftentimes, you may want to check if a file exists before performing an expensive computation.
If you set the existence mode to ExistingFileMode.SKIP
, the preview_path()
method will return None
if the
file already exists, allowing you to skip the computation.
This method also caches the additional context variables for future use.
Here's an example of how you might handle this:
# assuming writer is already initialized with `existing_file_mode=ExistingFileMode.SKIP`
# set some context variables
writer.set_context(class_subject="math", date="2025-01-01", message_type="greeting")
if (path := writer.preview_path(person_name="Alice")) is None:
print("File already exists, skipping computation.")
else:
print(f"Proceed with computation for {path}")
...
# perform expensive computation
...
writer.save(content="Hello, world!")
Index File Management
What is the Index File?:
- A CSV file used to log details about saved files, like their paths and context variables.
- Helps track what files have been written, especially useful in batch operations.
- Additionally can save all the context variables used for each file, convienient for saving additional metadata, while improving traceability.
How It Works:
- By default, it is named
{root_directory.name}_index.csv
. - You can customize the filename or provide an absolute path for more control.
- This is something that the
save()
method in theAbstractBaseWriter
should optionally implement, or let users decide to include the index by callingadd_to_index(path)
aftersave()
.
Key Features:
- Customizable Filename: Use
index_filename
to set a custom name or absolute path. - Absolute/Relative Paths: Control file paths in the index with
absolute_paths_in_index
. - Inter-process Locking: Prevents conflicts in concurrent writing environments.
Sanitizing Filenames
Why Sanitize Filenames?:
- To ensure that filenames are safe and compatible across different operating systems.
How It Works:
- Replaces illegal characters (e.g.,
<
,>
,:
,"
,/
,\
,|
,?
,*
) with underscores. - Trims leading or trailing spaces and periods to avoid issues.
When Is It Applied?:
- Automatically applied when generating filenames, unless disabled by setting
sanitize_filenames=False
.
Multiprocessing Compatibility
Why It Matters:
- In batch operations or high-performance use cases, multiple processes may write files simultaneously.
Key Features:
- Supports multiprocessing with inter-process locking to ensure thread-safe file writes.
- Avoids conflicts or data corruption when multiple instances of a writer are running.