Safety Guardrails for LLM-Enabled Robots

University of Pennsylvania · Carnegie Mellon University

RoboGuard is a general-purpose guardrail for ensuring the safety of LLM-enabled robots.


RoboGuard is configured offline with high-level safety rules and a robot description, reasons about how these safety rules are best applied in the robot's context, and then synthesizes a plan that maximally follows user preferences while ensuring safety.

Abstract


Although the integration of large language models (LLMs) into robotics has unlocked transformative capabilities, it has also introduced significant safety concerns, ranging from average-case LLM errors (e.g., hallucinations) to adversarial jailbreaking attacks, which can produce harmful robot behavior in real-world settings. Traditional robot safety approaches do not address the novel vulnerabilities of LLMs, and current LLM safety guardrails overlook the physical risks posed by robots operating in dynamic real-world environments. In this paper, we propose RoboGuard, a two-stage guardrail architecture to ensure the safety of LLM-enabled robots. RoboGuard first contextualizes pre-defined safety rules by grounding them in the robot's environment using a root-of-trust LLM, which employs chain-of-thought (CoT) reasoning to generate rigorous safety specifications, such as temporal logic constraints. RoboGuard then resolves potential conflicts between these contextual safety specifications and a possibly unsafe plan using temporal logic control synthesis, which ensures safety compliance while minimally violating user preferences. Through extensive simulation and real-world experiments that consider worst-case jailbreaking attacks, we demonstrate that RoboGuard reduces the execution of unsafe plans from 92% to below 2.5% without compromising performance on safe plans. We also demonstrate that RoboGuard is resource-efficient, robust against adaptive attacks, and significantly enhanced by enabling its root-of-trust LLM to perform CoT reasoning. These results underscore the potential of RoboGuard to mitigate the safety risks and enhance the reliability of LLM-enabled robots.
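The listing below is a minimal, illustrative sketch of this two-stage pipeline, not our released implementation: the function and variable names are hypothetical, the root-of-trust LLM call is stubbed out, and the second stage is reduced to simple constraint filtering, whereas RoboGuard itself uses temporal logic control synthesis to find a safe plan that minimally deviates from user preferences.

```python
# Illustrative sketch of a two-stage guardrail (hypothetical names, simplified logic).
# Stage 1: a trusted "root-of-trust" LLM grounds high-level safety rules in the
#          robot's world model (stubbed here; in practice it emits formal
#          specifications such as temporal logic constraints via CoT reasoning).
# Stage 2: the proposed plan is checked against the grounded constraints
#          (simplified here to filtering; the paper uses control synthesis).

from dataclasses import dataclass
from typing import Callable

@dataclass
class Constraint:
    name: str
    violated_by: Callable[[str], bool]  # True if an action violates the rule

def contextualize_rules(safety_rules: list[str], world_model: dict) -> list[Constraint]:
    """Stage 1 (stub): a root-of-trust LLM would translate each rule into a
    constraint grounded in the world model; we hard-code one example."""
    exits = world_model.get("emergency_exits", [])
    return [Constraint(
        name="never block an emergency exit",
        violated_by=lambda action: "place" in action and any(e in action for e in exits),
    )]

def filter_plan(plan: list[str], constraints: list[Constraint]) -> list[str]:
    """Stage 2 (simplified): drop actions that violate any grounded constraint."""
    return [a for a in plan if not any(c.violated_by(a) for c in constraints)]

if __name__ == "__main__":
    world = {"emergency_exits": ["exit_door_1"]}
    rules = ["Do not block emergency exits."]
    proposed = ["goto(kitchen)", "pick(box)", "place(box, exit_door_1)", "goto(lab)"]
    print(filter_plan(proposed, contextualize_rules(rules, world)))
    # ['goto(kitchen)', 'pick(box)', 'goto(lab)']
```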

Results


We evaluate RoboGuard's ability to prevent malicious behavior (e.g., harming people or blocking emergency exits), as well as its tendency to allow nominal behavior. RoboGuard significantly reduces the realization of unsafe plans while still allowing nominal behavior. RoboGuard runs in the control loop of an LLM-enabled robot, so it continually reasons over the robot's world model (a semantic graph in our implementation).
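As a rough illustration of what such a semantic-graph world model can look like, the snippet below builds a toy graph of places and objects and queries it for nodes that a safety rule must protect. The structure and attribute names are hypothetical, not those of our implementation.

```python
# Toy semantic-graph world model (hypothetical schema): nodes are places/objects,
# node attributes carry semantics, and edges encode spatial relations. In the
# control loop, the guardrail re-grounds its constraints whenever this graph updates.

import networkx as nx

world = nx.Graph()
world.add_node("hallway", kind="place")
world.add_node("exit_door_1", kind="emergency_exit")
world.add_node("box_3", kind="object")
world.add_edge("exit_door_1", "hallway", relation="adjacent_to")
world.add_edge("box_3", "hallway", relation="located_in")

# Example grounding query: which nodes must remain unobstructed?
protected = [n for n, d in world.nodes(data=True) if d["kind"] == "emergency_exit"]
print(protected)  # ['exit_door_1']
```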

Example safe and unsafe mission



We find that RoboGuard reduces the execution of unsafe plans from 92% to below 2.5%, without compromising performance on safe plans.


BibTeX

@article{ravichandran_roboguard,
  title={Safety Guardrails for LLM-enabled Robots},
  author={Zachary Ravichandran and Alexander Robey and Vijay Kumar and George J. Pappas and Hamed Hassani},
  year={2025},
  journal={arXiv preprint arXiv:2503.07885},
  url={https://arxiv.org/abs/2503.07885}
}