Security of Agentic AI Systems
(CS 7670)


This site is maintained for public access. If you are enrolled in the class, see the Canvas page for detailed information.

Class Information
Instructor
Course Description

Agentic AI systems are autonomous entities capable of perceiving, reasoning, learning, and acting toward goals using large language models (LLMs) with minimal human oversight. While these systems offer significant potential advantages, they also introduce systemic risks. Misaligned or poorly defined objectives can drive agents to take unsafe shortcuts, bypass safeguards, or behave deceptively. As AI agents become increasingly embedded in real-world applications, ensuring their security, reliability, and alignment is a critical priority. In this class we will study architectures and applications of agentic AI systems, examine threat models and attacks against them, and analyze existing proposed defenses.

The objectives of the course are the following:

  • Provide an overview of current frameworks for developing agentic AI systems, and of the threat models relevant in this context.
  • Read recent, state-of-the-art research papers from both security and machine learning conferences focused on attacks against agentic AI systems and proposed defenses, and discuss them in class. Students will actively participate in class discussions, and lead discussions on multiple papers during the semester.
  • Experiment with agentic AI systems through programming exercises and a semester-long research project. Students can select the topic of the research project.

Grade

The grade will be based on participation in paper discussions in class (PD), paper presentations and discussion leading in class (PL), one programming assignment (PA), and a research project (RP). Paper reviews are due by 9pm the day before the lecture in which the paper is discussed; submission is through Gradescope. The grade is computed as follows:

Grade = 15%*PD + 15%*PL + 20%*PA + 50%*RP.
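The formula above is a simple weighted average. As a minimal sketch (the component scores below are hypothetical examples on a 0-100 scale, not actual data):

```python
# Weights taken from the grading formula in the syllabus.
WEIGHTS = {"PD": 0.15, "PL": 0.15, "PA": 0.20, "RP": 0.50}

def final_grade(scores):
    """Weighted average: 15% paper discussions, 15% presentations/leads,
    20% programming assignment, 50% research project."""
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

# Hypothetical example: strong project performance dominates the total.
print(final_grade({"PD": 90, "PL": 85, "PA": 80, "RP": 95}))  # 89.75
```

Note that the research project alone carries half the weight, so project performance dominates the final grade.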
Academic Integrity

Academic honesty and ethical behavior are required in this course, as they are in all courses at Northeastern University. There is zero tolerance for cheating.

You are encouraged to talk with the professor about any questions you have about what is permitted on any particular assignment.

Resources

Schedule

A tentative schedule is posted below for public access. The class platform is Canvas, available through myNortheastern. All additional material for the class and all class communication will be on Canvas; for the most up-to-date information, check Canvas.

Week Topics
Week 1 Introduction.
  • Class overview. Introduction to security, LLMs, and Agentic AI.
Week 2 Attacks against LLMs.
  • Taxonomy of adversarial attacks on predictive and generative AI. Chapters 1 and 2. PDF
  • Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack. PDF
Week 3 Governing Agentic Systems.
  • Practices for Governing Agentic AI Systems PDF
  • Harms from Increasingly Agentic Algorithmic Systems. PDF
Week 4 Agent Directory Services
  • SAGA: A Security Architecture for Governing AI Agentic Systems. PDF
  • The AGNTCY Agent Directory Service: Architecture and Implementation. PDF
Week 5 Attacks against MCP
  • When MCP Servers Attack: Taxonomy, Feasibility, and Mitigation. PDF
  • MINDGUARD: Tracking, Detecting, and Attributing MCP Tool Poisoning Attack via Decision Dependence Graph. PDF
  • Securing the Model Context Protocol (MCP): Risks, Controls, and Governance. PDF
Week 6 Attacks against web-agents
  • Mind the Web: The Security of Web Use Agents PDF
  • WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks PDF
  • Context Manipulation Attacks: Web Agents are Susceptible to Corrupted Memory. PDF
Week 7 Attacks against multi-agent systems
  • Multi-Agent Systems Execute Arbitrary Malicious Code PDF
  • On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents PDF
  • Demonstrations of Integrity Attacks in Multi-Agent Systems PDF
Week 8 Project proposal presentation.
Week 9 Spring break.
Week 10 Solutions against prompt injection.
  • Defeating Prompt Injections by Design PDF
  • StruQ: Defending Against Prompt Injection with Structured Queries PDF
Week 11 Defenses against attacks in agent systems
  • Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems. PDF
  • ACE: A Security Architecture for LLM-Integrated App Systems PDF
  • Progent: Programmable Privilege Control for LLM Agents PDF
Week 12 Defenses against attacks in web-based agents
  • BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents. PDF
  • AI Kill Switch for Malicious Web-Based LLM Agent PDF
Week 13 Defenses against attacks in agent systems
  • Systems Security Foundations for Agentic Computing PDF
  • Trusted AI Agents in the Cloud PDF
Week 14 Privacy issues in agentic systems
  • Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies PDF
  • Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents PDF
Week 15 Automated red-teaming
  • RedCodeAgent: Automatic Red-Teaming Agent Against Diverse Code Agents PDF
  • Multi-Agent Penetration Testing AI for the Web PDF
Week 16 Project presentations.



Additional Reading List




Copyright © 2025 Cristina Nita-Rotaru. Send your comments and questions to Cristina Nita-Rotaru.