Safety Guardrails for LLM-Enabled Robots

University of Pennsylvania · Carnegie Mellon University

RoboGuard is a general-purpose guardrail for ensuring the safety of LLM-enabled robots.


RoboGuard is configured offline with high-level safety rules and a robot description, reasons about how these safety rules are best applied in the robot's context, and then synthesizes a plan that maximally follows user preferences while ensuring safety.

Abstract


Although the integration of large language models (LLMs) into robotics has unlocked transformative capabilities, it has also introduced significant safety concerns, ranging from average-case LLM errors (e.g., hallucinations) to adversarial jailbreaking attacks, which can produce harmful robot behavior in real-world settings. Traditional robot safety approaches do not address the novel vulnerabilities of LLMs, and current LLM safety guardrails overlook the physical risks posed by robots operating in dynamic real-world environments. In this paper, we propose RoboGuard, a two-stage guardrail architecture to ensure the safety of LLM-enabled robots. RoboGuard first contextualizes pre-defined safety rules by grounding them in the robot's environment using a root-of-trust LLM, which employs chain-of-thought (CoT) reasoning to generate rigorous safety specifications, such as temporal logic constraints. RoboGuard then resolves potential conflicts between these contextual safety specifications and a possibly unsafe plan using temporal logic control synthesis, which ensures safety compliance while minimally violating user preferences. Through extensive simulation and real-world experiments that consider worst-case jailbreaking attacks, we demonstrate that RoboGuard reduces the execution of unsafe plans from 92% to below 2.5% without compromising performance on safe plans. We also demonstrate that RoboGuard is resource-efficient, robust against adaptive attacks, and significantly enhanced by enabling its root-of-trust LLM to perform CoT reasoning. These results underscore the potential of RoboGuard to mitigate the safety risks and enhance the reliability of LLM-enabled robots.
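The listing below is a minimal, illustrative sketch of this two-stage pipeline, not our released implementation: the function and variable names are hypothetical, the root-of-trust LLM call is stubbed out, and the second stage is reduced to simple constraint filtering, whereas RoboGuard itself uses temporal logic control synthesis to find a safe plan that minimally deviates from user preferences.

```python
# Illustrative sketch of a two-stage guardrail (hypothetical names, simplified logic).
# Stage 1: a trusted "root-of-trust" LLM grounds high-level safety rules in the
#          robot's world model (stubbed here; in practice it emits formal
#          specifications such as temporal logic constraints via CoT reasoning).
# Stage 2: the proposed plan is checked against the grounded constraints
#          (simplified here to filtering; the paper uses control synthesis).

from dataclasses import dataclass
from typing import Callable

@dataclass
class Constraint:
    name: str
    violated_by: Callable[[str], bool]  # True if an action violates the rule

def contextualize_rules(safety_rules: list[str], world_model: dict) -> list[Constraint]:
    """Stage 1 (stub): a root-of-trust LLM would translate each rule into a
    constraint grounded in the world model; we hard-code one example."""
    exits = world_model.get("emergency_exits", [])
    return [Constraint(
        name="never block an emergency exit",
        violated_by=lambda action: "place" in action and any(e in action for e in exits),
    )]

def filter_plan(plan: list[str], constraints: list[Constraint]) -> list[str]:
    """Stage 2 (simplified): drop actions that violate any grounded constraint."""
    return [a for a in plan if not any(c.violated_by(a) for c in constraints)]

if __name__ == "__main__":
    world = {"emergency_exits": ["exit_door_1"]}
    rules = ["Do not block emergency exits."]
    proposed = ["goto(kitchen)", "pick(box)", "place(box, exit_door_1)", "goto(lab)"]
    print(filter_plan(proposed, contextualize_rules(rules, world)))
    # ['goto(kitchen)', 'pick(box)', 'goto(lab)']
```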

Results


We evaluate RoboGuard's ability to prevent malicious behavior (e.g., harming people or blocking emergency exits), as well as its tendency to allow nominal behavior. RoboGuard significantly reduces the realization of unsafe plans while still allowing nominal behavior. RoboGuard runs in the control loop of an LLM-enabled robot, so it continually reasons over the robot's world model (a semantic graph in our implementation).
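As a rough illustration of what such a semantic-graph world model can look like, the snippet below builds a toy graph of places and objects and queries it for nodes that a safety rule must protect. The structure and attribute names are hypothetical, not those of our implementation.

```python
# Toy semantic-graph world model (hypothetical schema): nodes are places/objects,
# node attributes carry semantics, and edges encode spatial relations. In the
# control loop, the guardrail re-grounds its constraints whenever this graph updates.

import networkx as nx

world = nx.Graph()
world.add_node("hallway", kind="place")
world.add_node("exit_door_1", kind="emergency_exit")
world.add_node("box_3", kind="object")
world.add_edge("exit_door_1", "hallway", relation="adjacent_to")
world.add_edge("box_3", "hallway", relation="located_in")

# Example grounding query: which nodes must remain unobstructed?
protected = [n for n, d in world.nodes(data=True) if d["kind"] == "emergency_exit"]
print(protected)  # ['exit_door_1']
```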

Example safe and unsafe mission



We find that RoboGuard reduces the execution of unsafe plans from 92% to below 2.5%, without compromising performance on safe plans.


BibTeX

@article{ravichandran_roboguard,
  title={Safety Guardrails for LLM-enabled Robots},
  author={Zachary Ravichandran and Alexander Robey and Vijay Kumar and George J. Pappas and Hamed Hassani},
  year={2025},
  journal={arXiv preprint arXiv:2503.07885},
  url={https://arxiv.org/abs/2503.07885}
}