Methodology

Data Collection and Curation Process

  • Source Identification: Identify and collect diverse imaging data from trusted sources, ensuring a broad representation of cancer types and demographic groups.
  • Data Cleaning and Preprocessing: Implement rigorous data cleaning to remove inconsistencies and prepare images for standardized processing.
  • Annotation Standards: Apply consistent annotation protocols across all datasets, with expert validation to ensure quality and reliability.
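
As a rough illustration of the cleaning step, the sketch below drops incomplete and duplicate metadata records and normalizes a few text fields. The field names (`image_id`, `modality`, `cancer_type`, `annotator`) and the function `clean_records` are hypothetical and chosen only for this example; they are not taken from the Med-ImageNet schema, which this page does not define.

```python
# Hypothetical required fields; the actual schema is not specified here.
REQUIRED_FIELDS = ("image_id", "modality", "cancer_type", "annotator")


def clean_records(records):
    """Return normalized records, skipping incomplete or duplicate entries."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Skip records where any required field is missing or blank.
        values = [rec.get(field) for field in REQUIRED_FIELDS]
        if any(v is None or not str(v).strip() for v in values):
            continue

        # Normalize whitespace and letter case so equal values compare equal.
        norm = {field: str(rec[field]).strip() for field in REQUIRED_FIELDS}
        norm["modality"] = norm["modality"].upper()
        norm["cancer_type"] = norm["cancer_type"].lower()

        # Keep only the first record seen for each image identifier.
        if norm["image_id"] in seen_ids:
            continue
        seen_ids.add(norm["image_id"])
        cleaned.append(norm)
    return cleaned
```

Keeping the first record per identifier is an arbitrary choice for the sketch; a real pipeline would more likely flag conflicting duplicates for expert review than discard them silently.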

MedImage-Tools Development

  • Tool Design: Develop tools that streamline the standardization process, allowing researchers to easily convert raw imaging data into a format compatible with Med-ImageNet standards.
  • Automation and Scalability: Design MedImage-Tools with automation capabilities to handle large-scale data efficiently, so that the dataset stays up to date with minimal manual intervention.
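
One way to keep a large dataset current without manual bookkeeping is to process only files that are new or have changed since the last run. The sketch below does this with a checksum manifest, using only the Python standard library. It is not the MedImage-Tools implementation; the function names and the manifest layout (relative path mapped to SHA-256 digest) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_new_or_changed(root, manifest):
    """Return {relative_path: sha256} for files absent from, or differing
    from, the manifest (a dict from a previous run; hypothetical layout)."""
    root = Path(root)
    updates = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        checksum = file_sha256(path)
        if manifest.get(rel) != checksum:
            updates[rel] = checksum
    return updates
```

A scheduled job could call `find_new_or_changed`, run the conversion step on the returned paths only, and then merge the result into the stored manifest with `manifest.update(updates)`.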

Integration of FAIR Principles

  • Findability: Ensure that all dataset components are tagged and indexed for easy discoverability within the research community.
  • Accessibility: Provide an open-access platform where researchers can securely access and download data for AI model training and validation.
  • Interoperability: Structure data in widely accepted formats, facilitating cross-platform compatibility and collaboration.
  • Reusability: Design data with future adaptability in mind, enabling its application across a variety of oncology research initiatives.
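
To make the findability and interoperability points concrete, the sketch below builds a small metadata record, checks that no field is empty, and derives a keyword index for lookup. The record fields and both function names are hypothetical; this page does not specify a metadata schema, so treat the example as one possible shape rather than the project's format.

```python
import json


def build_record(dataset_id, title, keywords, license_id, file_format):
    """Build a metadata record (hypothetical fields) and reject empty values."""
    record = {
        "identifier": dataset_id,
        "title": title,
        # Normalize keywords so lookups are case-insensitive and duplicate-free.
        "keywords": sorted({k.strip().lower() for k in keywords if k.strip()}),
        "license": license_id,
        "format": file_format,
    }
    missing = [name for name, value in record.items() if not value]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    return record


def build_keyword_index(records):
    """Map each keyword to the identifiers of the records tagged with it."""
    index = {}
    for record in records:
        for keyword in record["keywords"]:
            index.setdefault(keyword, []).append(record["identifier"])
    return index


# Example values are illustrative only.
example = build_record("ds-001", "Lung CT", ["Lung", "CT", "lung"], "CC-BY-4.0", "NIfTI")
print(json.dumps(example, indent=2))
```

Serializing the record as JSON keeps it readable by tools on any platform, and stating the license and file format explicitly in each record supports later reuse.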