Understanding the .gitignore File
What and why?
The .gitignore file tells Git which files and folders to ignore when tracking changes. Think of it as keeping the party clean by not letting in the uninvited guests (like temporary files, build artifacts, and system-specific files).
Our template comes with two .gitignore files:
- root: TEMPLATE/.gitignore, crafted to work with both Python and R development workflows. It's organized into logical sections to make it easy to understand and maintain.
- data: TEMPLATE/data/.gitignore, written specifically for the data directory. It is pre-configured to ignore everything except the README.md files, to prevent users from accidentally committing raw data files.
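To make the "ignore everything except the README.md files" idea concrete, here is a minimal sketch of what such a data-directory .gitignore could look like. The template's actual file is not shown here, so treat these exact patterns as an assumption rather than a copy of TEMPLATE/data/.gitignore.

```gitignore
# Sketch only: assumed patterns, not the template's actual file.

# Ignore everything in this directory and below...
*

# ...but let Git descend into subdirectories so nested READMEs stay reachable
!*/

# ...and keep the README files and this .gitignore itself
!README.md
!.gitignore
```

The order matters: Git applies the last matching pattern, so the negated (`!`) entries have to come after the catch-all `*`.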
Why shouldn't I commit my data files?
Answer
- Large files bloat the repo: Every clone/download pulls the whole history, so a single 500 MB dataset balloons to gigabytes after a few updates.
- Git is text-oriented: Binary data can't be delta-compressed well, so each revision is stored almost in full, slowing every operation.
- Privacy!: Patient or proprietary data in a public repo is dangerous and can lead to Data Use Agreements being violated.
- Reproducibility best practice: Keep code in Git and store immutable data elsewhere (e.g. Zenodo, FigShare) so others can fetch the exact snapshot you used and reuse it across different projects.
- Backup strategy separation: Repos are for source; archives belong in object storage, not in version control.
- CI/CD efficiency: Smaller repositories mean faster pipelines and lower bandwidth costs for every contributor.
What's inside the main .gitignore file?
The .gitignore file is organized into these major sections:
- Operating System files - Keeps those pesky .DS_Store files (macOS), Thumbs.db (Windows), and other OS-specific clutter out of your repository
- Python related files - Ignores bytecode, package builds, and other Python-specific temporary files
- R related files - Skips R history, session data, and package build files
- IDE/Editor files - Prevents editor configs from PyCharm, VS Code, and others from being shared
- Dependency Management - Handles ignoring appropriate files from tools like poetry, pipenv, and pdm
- Build/Test artifacts - Keeps build directories and test results from cluttering your repo
- Documentation builds - Ignores generated documentation that should be built on-demand
- Project-specific entries - A section reserved for your specific project needs
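As a rough illustration of the sections listed above, the fragment below shows one or two typical patterns per section. These are common conventions chosen for illustration, not a copy of the template's file; the real TEMPLATE/.gitignore may use different or additional patterns.

```gitignore
# Illustrative sketch: representative patterns, assumed rather than copied.

# Operating system files
.DS_Store
Thumbs.db

# Python related files
__pycache__/
*.py[cod]

# R related files
.Rhistory
.RData

# IDE/Editor files
.idea/
.vscode/

# Dependency management
.venv/
.pdm-python

# Build/Test artifacts
build/
dist/
.pytest_cache/

# Documentation builds (assumes docs are generated into site/)
site/
```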
Why is this important?
A well-configured .gitignore file:
- Keeps your repository clean - No more accidental commits of temporary files
- Reduces conflicts - Prevents system-specific files from causing merge headaches
- Improves performance - Git works faster when it doesn't have to track thousands of irrelevant files
- Maintains security - Prevents accidental commits of sensitive information (like environment files)
Tip
If you find yourself repeatedly using git add -f to force-add ignored files, that might be a sign you need to adjust your .gitignore file.
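One way to make that adjustment is a negation pattern, which re-includes a specific file that a broader pattern would otherwise ignore. The sketch below is hypothetical: the `*.csv` rule and the `tests/fixtures/sample.csv` path are invented for illustration and are not part of the template.

```gitignore
# Hypothetical example: ignore all CSV files...
*.csv

# ...but always track one small test fixture
!tests/fixtures/sample.csv
```

Note that a negation only works when the file's parent directory is not itself ignored; Git does not look inside an excluded directory.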
Handle with care
The .gitignore file in this template has been carefully optimized for data science projects using Python and R. Modify it only if you know what you're doing! Removing patterns can lead to system files, caches, or even sensitive information being accidentally committed to your repository.
Customizing for your project
Need to add project-specific patterns? Look for the "PROJECT-SPECIFIC ENTRIES" section at the end of the file:
#############################################################################
# 8. PROJECT-SPECIFIC ENTRIES
#############################################################################
# Add your project-specific entries here
# For example:
# models/
# sandbox/
# etc.
This is where you can safely add patterns specific to your project without disrupting the carefully balanced patterns above.
Common additions
You might want to add patterns for:
- Large data files (sandbox/ directory if you use it)
- Model checkpoints or weights (.h5, .pkl)
- Generated figures or outputs
- Environment-specific configuration files (.env, .local)
- Temporary directories (temp/, cache/)
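Put together, the additions above might look like this inside the project-specific section. This is a sketch under a few assumptions: the extensions are written as glob patterns (`*.h5`, `*.pkl`, `*.local`), and `figures/` is a hypothetical name for a generated-output directory, so adapt the names to your own layout.

```gitignore
# 8. PROJECT-SPECIFIC ENTRIES (example additions; adjust to your project)

# Large data files / scratch space
sandbox/

# Model checkpoints or weights
*.h5
*.pkl

# Generated figures or outputs (hypothetical directory name)
figures/

# Environment-specific configuration files
.env
*.local

# Temporary directories
temp/
cache/
```

Comments need to sit on their own lines: Git does not support trailing comments after a pattern, so `temp/  # scratch` would be read as part of the pattern.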