Course Description
The course runs over 13 weeks (4 hours of lectures per week) and is organized into four work cycles (sprints) of approximately 3 weeks each, with a presentation (demo) at the end of each sprint.
Week 1 — Introduction and field overview: Why programming agents matter now; the tools ecosystem (Copilot, Cursor, Claude Code, Devin); course organization; responsible AI framework; project topics announcement.
Week 2 — Core concepts of language models: Tokens and context windows; how text is generated; prompt engineering; reasoning models (chain-of-thought, o1/o3, R1); cost, latency, and capabilities; first steps with the API. (Sprint 1 begins)
Week 3 — Context management and RAG: The context stack; RAG architecture; embeddings; vector databases; hybrid search; memory systems; prompt injection.
Week 4 — Agent architecture and tool use: ReAct loops; tool/function calling; state management; agent harnesses; implementing an agent in fewer than 200 lines of code; decomposing complex tasks.
Week 5 — Sprint 1 presentation + agent development frameworks: LangGraph, CrewAI, PydanticAI, OpenAI Agents SDK—comparison and trade-offs; sequential, routing, and collaborative patterns. (Sprint 1 ends / Sprint 2 begins)
Week 6 — Multi-agent systems: Multi-agent orchestration; specialized roles; debate, reflection, and mixture-of-agents patterns; AutoGen and CrewAI in depth; common failure modes.
Week 7 — Coding agents and code generation: Historical overview; HumanEval and SWE-bench; Chain-of-Thought, Tree of Thoughts, Reflexion, LATS; agent configuration (CLAUDE.md, agents.md); vibe coding and its limitations.
Week 8 — Sprint 2 presentation + correctness verification and CI/CD: Automated testing of AI-generated code; CI/CD pipelines for AI applications; correctness, security, and performance. (Sprint 2 ends / Sprint 3 begins)
Week 9 — Production operations and observability: Cloud deployment (AWS, Azure AI Foundry); observability, logging, and cost management; scaling; data flywheel and sustainable AI products.
Week 10 — Security, protection, and responsible AI: Prompt injection; sandboxing; trust and verification; the EU AI Act; fairness, bias, and interpretability.
Week 11 — Sprint 3 presentation + evaluation methods: Evaluation frameworks; red teaming; LLM-as-a-judge; AgentBench, WebArena; human-in-the-loop. (Sprint 3 ends / Sprint 4 begins)
Week 12 — Guest lectures + project refinement: Guest lecture from FORTH or industry; final project improvements, documentation, and security assessment.
Week 13 — Demo Day: Final project presentations before a committee of faculty members and invited guests; Q&A session; peer evaluation; course reflection.
Learning Outcomes
Upon successful completion of the course, students will be able to:
Knowledge and Understanding
- Explain the architecture of programming agents based on large language models, including context windows, tool use, memory, reasoning loops, and multi-agent coordination.
- Describe the state of the art in code-generation evaluation benchmarks (e.g., HumanEval, SWE-bench) and their limitations.
- Analyze the main risks associated with autonomous AI systems, including prompt injection, hallucinations, overreliance, and security vulnerabilities.
- Identify the requirements of the European Union AI Act (EU AI Act) as they apply to software-development tools and autonomous programming systems.
Skills and Application
- Design and implement the core logic of a programming agent (reasoning loop, tool integrations, memory, and orchestration) and demonstrate an end-to-end solution using modern frameworks such as LangGraph and CrewAI.
- Build a Retrieval-Augmented Generation (RAG) pipeline and integrate it into an agent-based system.
- Apply software engineering best practices (including testing, CI/CD, observability, and version control) to AI-enabled projects.
- Systematically evaluate agents with respect to functional correctness, security, and reliability, and deploy them to cloud-based production environments with appropriate monitoring and guardrails.
Judgment and Critical Thinking
- Critically assess when AI-generated code should be trusted, verified, corrected, or rejected.
- Analyze the ethical, legal, and societal implications of using autonomous programming agents in professional software development.
Assessment
Assessment is continuous, and there is no final written examination.
Language of assessment: Greek or English.
The assessment consists of the following components:
Weekly Quizzes (×10) — 40%
Short in-class quizzes consisting of 5–8 questions (Weeks 2–11), based on the week’s readings and/or videos.
Sprint Presentations (×3) — 30%
Team presentations held during Weeks 5, 8, and 11.
Evaluation criteria:
Each presentation contributes 10% to the final grade.
Demo Day – Final Project — 20%
Final team presentation before a faculty and invited-expert review panel.
Evaluation criteria:
- Technical quality and ambition: 40%
- Software engineering practices: 25%
- Responsible AI assessment: 20%
- Presentation clarity and effectiveness: 15%
Participation and Peer Evaluation — 10%
- Active participation in class activities.
- Contribution to feedback during presentations.
- Submission of a peer-evaluation form.
All assessment criteria are announced at the beginning of the semester and are available through the course platform.
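As an illustration of how the weights above combine, the following is a minimal sketch rather than the official grading procedure. It assumes every component is scored on a 0–100 scale, that the ten quizzes and the three sprint presentations are each averaged with equal weight, and that the Demo Day score is the weighted sum of its four criteria. The function and key names are hypothetical.

```python
# Component weights taken from the assessment breakdown above.
COMPONENT_WEIGHTS = {
    "quizzes": 0.40,
    "sprint_presentations": 0.30,
    "demo_day": 0.20,
    "participation": 0.10,
}

# Demo Day criteria weights taken from the list above.
DEMO_DAY_WEIGHTS = {
    "technical_quality": 0.40,
    "engineering_practices": 0.25,
    "responsible_ai": 0.20,
    "presentation": 0.15,
}


def demo_day_score(criteria):
    """Weighted sum of the four Demo Day criteria (each assumed 0-100)."""
    return sum(weight * criteria[name] for name, weight in DEMO_DAY_WEIGHTS.items())


def final_grade(quizzes, sprints, demo_criteria, participation):
    """Combine all components into a final grade on an assumed 0-100 scale."""
    if len(quizzes) != 10:
        raise ValueError("expected 10 quiz scores")
    if len(sprints) != 3:
        raise ValueError("expected 3 sprint presentation scores")
    components = {
        # Assumption: quizzes and sprint presentations are equally weighted
        # within their own component, so a plain average is used.
        "quizzes": sum(quizzes) / len(quizzes),
        "sprint_presentations": sum(sprints) / len(sprints),
        "demo_day": demo_day_score(demo_criteria),
        "participation": participation,
    }
    return sum(COMPONENT_WEIGHTS[name] * score for name, score in components.items())


if __name__ == "__main__":
    example = final_grade(
        quizzes=[80] * 10,
        sprints=[70, 75, 85],
        demo_criteria={
            "technical_quality": 90,
            "engineering_practices": 80,
            "responsible_ai": 70,
            "presentation": 85,
        },
        participation=100,
    )
    print(f"Final grade: {example:.2f}")
```

Averaging the three sprint scores and applying the 30% weight is equivalent to each presentation contributing 10% to the final grade, as stated above.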