Description
Health data is among the most complex and valuable types of data. It is produced by Electronic Health Records (EHR), Personal Health Records (PHR), medical imaging systems, genomic analyses, wearable devices, and sensors. Managing it properly is a prerequisite for personalized medicine within the P4 paradigm (predictive, preventive, personalized, participatory), for clinical decision support systems, and for the application of Artificial Intelligence techniques. The course provides a comprehensive introduction to the fundamental principles, technologies, and challenges of health data management.
Weekly Course Outline
Week 1: Introduction: Types, sources and particularities of health data. From “reactive” to P4 medicine. Overview of the medical informatics field. European Health Data Space (EHDS), EUCAIM.
Week 2: Health Information Systems: HIS, LIS, RIS, Electronic Health Record (EHR/EMR) versus PHR. Architectures, data flows, key challenges.
Week 3: Interoperability Standards I — HL7: interoperability levels, HL7 v2/CDA, HL7 FHIR (resources, RESTful APIs, profiles, IGs). Lab: queries on a public FHIR server.
Week 4: Standards II — Terminologies, Ontologies, DICOM: SNOMED CT, ICD-10/11, LOINC, MeSH, RxNorm, ATC. BioPortal, OBO Foundry. DICOM, PACS. Lab: mapping clinical terms to SNOMED/LOINC.
Week 5: Common Data Models (CDMs): OMOP (OHDSI), i2b2, Sentinel, PCORnet. ETL from EHR to CDM. ATLAS, cohort definition. Lab: exploring a synthetic OMOP dataset in SQL.
Week 6: Semantic Web and Biomedical Ontologies: RDF, RDFS, OWL, SPARQL. Linked Data principles. Bio2RDF, UMLS. Designing a health ontology. Lab: SPARQL on biomedical endpoints.
Week 7: Semantic Integration and Knowledge Graphs: Mediator/wrapper architectures, Ontology-Based Data Access (OBDA), R2RML/RML mappings. Knowledge graphs in healthcare as a foundation for AI applications.
Week 8: Big Data Management in Health: NoSQL databases (document, graph, column-family), data warehouses, data lakes, lakehouses. Health Data Spaces — the European strategy (EHDS, EHDS2). Lab: storing/querying FHIR resources in MongoDB.
Week 9: Data Quality and FAIR: quality dimensions (completeness, accuracy, consistency, timeliness), data curation, data profiling. FAIR principles (Findable, Accessible, Interoperable, Reusable). Summaries of large data sources.
Week 10: Privacy, Security, Federated Analysis: GDPR, anonymization vs pseudonymization, privacy models (k-anonymity, l-diversity, differential privacy). Federated learning / federated analytics. Personal Health Trains. Lab: anonymizing a dataset with ARX.
Week 11: AI-based Data Management & Real-World Evidence: cohort discovery, automated data integration, learned indexes, and semantic search in EHR. RWD/RWE. Relationship with Clinical Decision Support Systems.
Week 12: Data Infrastructures for AI in Medical Imaging: from local PACS to European federated repositories. Case studies: ProCancer-I, EUCAIM, GenoMed4All, RadioVal. Curation pipelines, annotation, quality control. Invited talk.
Week 13: Current trends & Project Presentations: mHealth, wearables, patient-generated health data, patient empowerment. LLMs/Generative AI for health data — opportunities and risks. Group project presentations.
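As a flavour of the Week 3 lab (queries on a public FHIR server), the following minimal Python sketch shows how a FHIR search request is composed and how a returned Bundle can be read. It is only an illustration, not the official lab material: the base URL (the public HAPI FHIR R4 test server), the helper names `build_search_url` and `resource_ids`, and the hand-written sample Bundle are all assumptions introduced here. No network call is made.

```python
from urllib.parse import urlencode

# Assumption: a public FHIR R4 test server; the lab may use a different one.
FHIR_BASE = "https://hapi.fhir.org/baseR4"


def build_search_url(base, resource_type, params):
    """Compose a FHIR RESTful search URL: [base]/[type]?[parameters]."""
    return f"{base}/{resource_type}?{urlencode(params)}"


def resource_ids(bundle):
    """Return the logical ids of the resources contained in a search Bundle."""
    return [
        entry["resource"]["id"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    ]


# Search for heart-rate observations (LOINC code 8867-4), at most 5 per page.
url = build_search_url(
    FHIR_BASE,
    "Observation",
    {"code": "http://loinc.org|8867-4", "_count": 5},
)

# Hand-written toy Bundle standing in for a server response (not real data).
sample_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Observation", "id": "obs-1"}},
        {"resource": {"resourceType": "Observation", "id": "obs-2"}},
    ],
}

print(url)
print(resource_ids(sample_bundle))
```

The `code` parameter uses the FHIR token form `system|code`, which is why the LOINC system URI is percent-encoded in the resulting query string.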
Group Project (indicative options)
• FHIR-based mini EHR: a REST API over FHIR resources.
• OMOP cohort study: ETL of synthetic data into the OMOP CDM, phenotype-based cohort definition, statistical analysis.
• Knowledge graph for clinical data: integration of sources (DrugBank, SIDER, ClinicalTrials.gov) queried with SPARQL.
• Federated analytics demo: training a model across two “hospitals” without transferring patient-level data.
• Privacy-preserving release: applying anonymization techniques to an EHR dataset and evaluating the utility/privacy trade-off.
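To make the utility/privacy trade-off in the last option concrete, the sketch below measures k-anonymity on a toy table before and after generalizing the quasi-identifiers. It is a minimal illustration under stated assumptions: the four records are invented, the column names (`age`, `zip`, `dx`) and the helpers `k_anonymity` and `generalize` are hypothetical, and a real project would more likely rely on a dedicated tool such as ARX.

```python
from collections import Counter

# Invented toy records (not real patient data); "dx" is the sensitive attribute.
records = [
    {"age": 34, "zip": "71301", "dx": "E11"},
    {"age": 36, "zip": "71305", "dx": "I10"},
    {"age": 52, "zip": "71409", "dx": "E11"},
    {"age": 57, "zip": "71402", "dx": "J45"},
]

QUASI_IDENTIFIERS = ["age", "zip"]


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def generalize(row):
    """Coarsen age to a 10-year band and keep only the first 3 digits of the zip code."""
    decade = row["age"] // 10 * 10
    return {
        **row,
        "age": f"{decade}-{decade + 9}",
        "zip": row["zip"][:3] + "**",
    }


generalized = [generalize(row) for row in records]

print(k_anonymity(records, QUASI_IDENTIFIERS))      # every record is unique
print(k_anonymity(generalized, QUASI_IDENTIFIERS))  # records now share groups
```

Generalization raises k from 1 to 2 here, at the cost of precision: exact ages and zip codes are no longer available for analysis, which is the utility loss the project is expected to evaluate.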
Learning Outcomes:
Upon successful completion of the course, students will have achieved the following outcomes:
- Knowledge: They will know the nature, sources, and particularities of health data (EHR, PHR, medical imaging, genomic, sensor data), as well as the ecosystem of health information systems. They will be familiar with the dominant interoperability standards (HL7 FHIR, DICOM), the biomedical terminologies/ontologies (SNOMED CT, ICD-10/11, LOINC, MeSH), and the common data models (OMOP CDM, i2b2).
- Comprehension: They will understand the dimensions of interoperability (technical, syntactic, semantic, organizational), the concept of health data quality, the FAIR principles, as well as the requirements of GDPR and privacy-protection techniques (k-anonymity, differential privacy, federated learning).
- Application: They will apply modern semantic technologies (RDF/OWL/SPARQL) and knowledge graphs to integrate heterogeneous health data sources. In addition, they will develop applications on top of FHIR APIs, execute ETL flows towards OMOP CDM, and use NoSQL databases and data lakes.
- Analysis: They will learn to analyze and critically evaluate the quality of a health dataset, to identify integration and security problems, and to propose well-documented solutions. They will also learn to recognize when a problem requires a centralized, distributed, or federated architecture.
- Synthesis: They will learn to design and implement integrated health data management solutions that combine interoperability standards, semantic technologies, big data infrastructures, and AI-based data management techniques for specific clinical scenarios.
- Evaluation: They will learn to assess the advantages, disadvantages, and limitations of different technological approaches to health data management, as well as the ethical, legal, and social stakes of managing sensitive data.
Student Performance Evaluation:
Assessment method (summative with formative elements):
• Group Project (submission and oral presentation): 40%
• Lab assignments (3 deliverables): 25%
• Final written exam (combination of short answer and problem solving): 30%
• Participation / scientific paper presentation: 5%
The assessment criteria for each component are explicitly defined and available to students through the course website and the eLearn platform. An overall grade of at least 5/10 is required to pass, and submission of the project is mandatory.