Instructor
Course Description
Agentic AI systems are autonomous entities capable of perceiving, reasoning, learning, and acting toward goals using large language models (LLMs) with minimal human oversight. While these systems offer significant potential advantages, they also introduce systemic risks. Misaligned or poorly defined objectives can drive agents to take unsafe shortcuts, bypass safeguards, or behave deceptively. As AI agents become increasingly embedded in real-world applications, ensuring their security, reliability, and alignment is becoming a critical priority.
In this class we will study architectures and applications of agentic AI systems, examine threat models and attacks against them, and survey proposed defenses.
The objectives of the course are the following:
- Provide an overview of current frameworks for developing agentic AI systems and of the threat models relevant in this context.
- Read recent, state-of-the-art research papers from both security and machine learning conferences on attacks against agentic AI systems and proposed defenses, and discuss them in class. Students will actively participate in class discussions and lead the discussion of multiple papers during the semester.
- Experiment with agentic AI systems through programming exercises and a semester-long research project. Students can select the topic of the research project.
Grade
The grade will be based on participation in in-class paper discussions (PD), presenting papers and leading discussions in class (PL), one programming assignment (PA), and a research project (RP). Paper reviews are due by 9pm the day before the lecture in which the paper is discussed. Submission is through Gradescope.
The grade is computed as follows:
Grade = 15%*PD + 15%*PL + 20%*PA + 50%*RP.
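As a quick illustration of this formula (not an official grade calculator), assuming each component is scored on a 0-100 scale, a hypothetical set of scores would combine as follows:

```python
# Hypothetical example of the weighted grade formula; the component scores are made up.
weights = {"PD": 0.15, "PL": 0.15, "PA": 0.20, "RP": 0.50}
scores = {"PD": 90, "PL": 85, "PA": 80, "RP": 95}  # each assumed to be on a 0-100 scale

grade = sum(weights[part] * scores[part] for part in weights)
print(f"Final grade: {grade:.2f}")  # 13.5 + 12.75 + 16.0 + 47.5 = 89.75
```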
Academic Integrity
Academic honesty and ethical behavior are required in this course,
as they are in all courses at Northeastern University. There is zero
tolerance for cheating.
You are encouraged to talk with the professor about any questions
you have about what is permitted on any particular assignment.
Resources
- How to read research papers: [PDF]
- How to write a review [PDF]
- How to prepare presentations [HTML]
- Computing ecosystem literacy [HTML]
- Docker tutorial [HTML]
A tentative schedule is posted below for public access.
The class platform is Canvas, available through myNortheastern.
All additional material for the class will be posted on Canvas, and
all class communication will take place there. For the most
up-to-date information, check Canvas.
Schedule
Week 1: Introduction
- Class overview. Introduction to security, LLMs, and Agentic AI.
Week 2: Attacks against LLMs
- Taxonomy of adversarial attacks on predictive and generative AI. Chapters 1 and 2. [PDF]
- Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack. [PDF]
Week 3: Governing Agentic Systems
- Practices for Governing Agentic AI Systems. [PDF]
- Harms from Increasingly Agentic Algorithmic Systems. [PDF]
Week 4: Agent Directory Services
- SAGA: A Security Architecture for Governing AI Agentic Systems. [PDF]
- The AGNTCY Agent Directory Service: Architecture and Implementation. [PDF]
Week 5: Attacks against MCP
- When MCP Servers Attack: Taxonomy, Feasibility, and Mitigation. [PDF]
- MINDGUARD: Tracking, Detecting, and Attributing MCP Tool Poisoning Attack via Decision Dependence Graph. [PDF]
- Securing the Model Context Protocol (MCP): Risks, Controls, and Governance. [PDF]
Week 6: Attacks against web agents
- Mind the Web: The Security of Web Use Agents. [PDF]
- WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks. [PDF]
- Context manipulation attacks: Web agents are susceptible to corrupted memory. [PDF]
Week 7: Attacks against multi-agent systems
- Multi-Agent Systems Execute Arbitrary Malicious Code. [PDF]
- On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents. [PDF]
- Demonstrations of Integrity Attacks in Multi-Agent Systems. [PDF]
Week 8: Project proposal presentation
Week 9: Spring break
Week 10: Solutions against prompt injection
- Defeating Prompt Injections by Design. [PDF]
- StruQ: Defending Against Prompt Injection with Structured Queries. [PDF]
Week 11: Defenses against attacks in agent systems
- Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems. [PDF]
- ACE: A Security Architecture for LLM-Integrated App Systems. [PDF]
- Progent: Programmable Privilege Control for LLM Agents. [PDF]
Week 12: Defenses against attacks in web-based agents
- BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents. [PDF]
- AI Kill Switch for Malicious Web-Based LLM Agent. [PDF]
Week 13: Defenses against attacks in agent systems
- Systems Security Foundations for Agentic Computing. [PDF]
- Trusted AI Agents in the Cloud. [PDF]
Week 14: Privacy issues in agentic systems
- Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies. [PDF]
- Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents. [PDF]
Week 15: Automated red-teaming
- RedCodeAgent: Automatic Red-Teaming Agent Against Diverse Code Agents. [PDF]
- Multi-Agent Penetration Testing AI for the Web. [PDF]
Week 16: Project presentations