Skip to main content

PI-004: Semantic Smuggling

CategoryPrompt Injection
FrameworksOWASP: LLM01 · FG-I003

Encode payloads to bypass keyword-based filters while preserving meaning for the model. Exploits the gap between filter logic and model comprehension.

Technique

# Encoding techniques
- Base64: encode payload, ask model to decode
- ROT13: simple substitution cipher
- Pig Latin / word reversal
- Unicode homoglyphs (visually identical chars)
- Token splitting: "pass" + "word" = "password"
- Language switching: payload in another language
- Leetspeak: r00t, p4ssw0rd

Key Concepts

  • Semantic smuggling exploits a fundamental architectural gap: input filters and the LLM operate at different levels of comprehension. Keyword filters match strings; models understand meaning. Any encoding that changes the string representation while preserving recoverable meaning bypasses the filter but not the model.
  • Base64 encoding is particularly effective because LLMs can decode Base64 natively (it appears extensively in training data). The filter sees an opaque string, but the model decodes and executes the payload.
  • Language switching exploits the fact that most safety filters are optimized for English. A payload in Mandarin, Arabic, or a low-resource language may pass through filters entirely while the multilingual model comprehends and follows it.
  • Token splitting breaks keywords into fragments that individually appear benign. The model's contextual understanding reassembles the fragments, but substring-matching filters never see the complete keyword.
  • These techniques can be layered: encode a payload in Base64, split it across multiple messages, and request the model decode and concatenate. Each layer adds bypass coverage against different filter types.

Detection

  • Deploy decoding-aware input filters that detect and decode Base64, ROT13, and other common encodings before evaluating content against safety rules.
  • Implement multilingual safety classification rather than English-only keyword matching, covering the languages the model supports.
  • Use anomaly detection to flag inputs with unusual character distributions, encoding patterns, or mixed-script content that may indicate smuggling attempts.

Mitigation

  • Shift from keyword-based to semantic-based input filtering that evaluates meaning regardless of encoding, language, or character substitution.
  • Implement a canonicalization step in the input pipeline that normalizes Unicode homoglyphs, decodes common encodings, and standardizes text before safety evaluation.
  • Add output-side semantic classification as a second layer, catching cases where smuggled input produces restricted output even if the input filter was bypassed.