
As Large Language Models (LLMs) become more integrated into our daily lives, powering everything from customer support chatbots to advanced coding assistants, they also become targets for sophisticated adversarial attacks such as prompt injections and jailbreak attempts. Recognizing these vulnerabilities, our research introduces JavelinGuard, a suite of transformer-based model architectures specifically optimized to safeguard LLMs against these threats efficiently and accurately.
We systematically explored five distinct model architectures designed to capture subtle malicious intent, provide interpretability, and deliver rapid, robust predictions.
We then optimized each to:
Our models were evaluated against nine diverse benchmarks, including our newly introduced JavelinBench—which emphasizes hard-to-classify, borderline prompts.
We found that JavelinGuard models consistently outperformed existing open-source solutions and large commercial LLMs (like GPT-4o) in terms of accuracy, inference speed, and false-positive minimization.
Our research also addressed key practical concerns like the "lost in the middle" issue, demonstrating strategies to maintain detection accuracy even with very long inputs. Future work will explore even more advanced architectures, innovative tokenization methods, and domain-specific benchmark development to further strengthen LLM security. We extend special thanks to Erick Galinkin from NVidia and our dedicated reviewers, whose valuable insights greatly enhanced this research.
Explore the full paper for a deep dive into each architecture and benchmark. We have been actively using these models for internal validations and are excited to see how our customers use them in real-world AI workflows across a spectrum of use cases.
🔍 Read the full paper: Link
🚀 Secure your LLM stack with Javelin: www.getjavelin.com
If you are interested in JavelinGuard models or our Security Fabric that includes runtime enforcement, we'd love to hear from you.