EVA-001: Unicode / Homoglyph Evasion
| Category | Evasion Techniques |
| Frameworks | OWASP: LLM01 · ATLAS: Evade ML Model |
Use visually identical characters from different Unicode blocks to bypass keyword-based filters. To a human reader the text looks unchanged, but it is made of different codepoints, so exact string matching fails.
Technique
# Homoglyph substitutions:
Latin 'a' (U+0061) vs Cyrillic 'a' (U+0430)
Latin 'e' (U+0065) vs Cyrillic 'e' (U+0435)
Latin 'o' (U+006F) vs Greek 'o' (U+03BF)
# Zero-width characters:
U+200B Zero-width space
U+200C Zero-width non-joiner
U+200D Zero-width joiner
U+FEFF Zero-width no-break space (byte-order mark)
# Insert between characters to break
# keyword matching without visual change:
"pass[U+200B]word" displays as "password"
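The mismatch is easy to reproduce in a Python shell. This minimal sketch uses Cyrillic U+0430 and the zero-width space U+200B from the list above; the variable names are illustrative only.

```python
latin = "password"
spoofed = "p\u0430ssword"       # Cyrillic small letter a (U+0430) in position 1
zero_width = "pass\u200bword"   # zero-width space (U+200B) after "pass"

print(latin == spoofed)              # False: one codepoint differs
print("password" in zero_width)      # False: the substring is broken
print([f"U+{ord(c):04X}" for c in spoofed[:2]])  # ['U+0070', 'U+0430']
print(len(latin), len(zero_width))   # 8 9
```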
Key Concepts
- The attack exploits a fundamental mismatch between rendering and processing. Keyword filters operate on byte sequences or codepoints, while human reviewers see rendered glyphs. Two strings that render identically can have completely different byte representations when characters are drawn from different Unicode blocks.
- Zero-width characters are invisible but present. Inserting zero-width spaces, joiners, or non-joiners between characters breaks exact string matching and regex patterns without producing any visible change in the rendered text. The filter sees fragmented tokens while the reader sees a normal word.
- LLM tokenizers do not reliably collapse homoglyphs. Some apply Unicode normalization such as NFKC, which folds compatibility variants (fullwidth letters, mathematical alphanumerics) but leaves cross-script look-alikes untouched: Cyrillic 'а' remains Cyrillic. With 149,000+ characters in Unicode, there are always substitution pairs that slip through, especially from less common scripts.
- This technique chains well with other evasion methods. Homoglyph substitution can be combined with token boundary exploitation, encoding tricks, or payload splitting to create multi-layered evasion that defeats filters operating at different levels of the stack.
- Automated homoglyph generation is trivial. Tools and lookup tables mapping visually similar characters across scripts are freely available, making this a low-skill, high-impact evasion technique that any attacker can deploy.
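The normalization limits described above can be checked with Python's standard `unicodedata` module. This is a small sketch; the sample strings are hand-built for illustration, and the behavior shown is that of NFKC itself, not of any particular tokenizer.

```python
import unicodedata

# Four non-ASCII spellings of "ignore"
samples = {
    "fullwidth": "\uff49\uff47\uff4e\uff4f\uff52\uff45",
    "math bold": "\U0001d422\U0001d420\U0001d427\U0001d428\U0001d42b\U0001d41e",
    "cyrillic": "ign\u043er\u0435",   # Cyrillic о and е
    "zero-width": "ig\u200bnore",     # zero-width space inside the word
}

for label, text in samples.items():
    folded = unicodedata.normalize("NFKC", text)
    print(label, folded == "ignore")
# fullwidth True
# math bold True
# cyrillic False
# zero-width False
```

Compatibility normalization fixes the first two spellings; the Cyrillic and zero-width variants pass through unchanged and need separate handling.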
Detection
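- Apply Unicode compatibility normalization (NFKC or NFKD) before filtering. It folds fullwidth forms, mathematical alphanumeric symbols, and ligatures back to their ASCII equivalents, restoring keyword matching for those variants. It does not map cross-script confusables (Cyrillic 'а' stays Cyrillic) and does not remove zero-width characters, so it must be paired with the checks below and a confusables mapping.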
- Flag inputs containing mixed-script text. Legitimate text rarely mixes Latin, Cyrillic, and Greek characters within a single word. A script-consistency check can catch the most common homoglyph attacks.
- Detect zero-width character insertion. Scan for the presence of zero-width spaces, joiners, and byte-order marks within words. These characters have legitimate uses in some languages but are anomalous in English text and within keywords.
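A minimal sketch of the mixed-script and zero-width checks, assuming whitespace-delimited input. Python's `unicodedata` does not expose the Unicode Script property, so the hypothetical helper `letter_script` approximates it from the first word of each character's name; a production check would use real script data, such as the mixed-script rules in UTS #39.

```python
import unicodedata

# The four zero-width characters listed in the Technique section
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def letter_script(ch):
    """Approximate a letter's script from the first word of its Unicode name.

    Assumption: unicodedata has no Script property, so a name such as
    'CYRILLIC SMALL LETTER A' is used as a proxy ('CYRILLIC').
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def inspect_input(text):
    """Return (finding, word) pairs for zero-width and mixed-script words."""
    findings = []
    for word in text.split():
        if any(ch in ZERO_WIDTH for ch in word):
            findings.append(("zero-width", word))
        # Only letters count; digits, punctuation and format characters are skipped
        scripts = {letter_script(ch) for ch in word if ch.isalpha()}
        if len(scripts) > 1:
            findings.append(("mixed-script", word))
    return findings


print(inspect_input("ign\u043ere pass\u200bword hello"))
```

The name-based heuristic also flags some legitimate words (for example Japanese kanji followed by hiragana), so its findings are better treated as signals for review than as automatic blocks.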
Mitigation
- Implement Unicode-aware input sanitization that strips or normalizes non-standard characters before they reach the model or any keyword filter. This should be a preprocessing step applied before all other filtering.
- Use semantic-level filtering in addition to keyword matching. Embedding-based classifiers evaluate meaning rather than exact character sequences, which makes them less sensitive to homoglyph substitution, though not immune: the substituted characters still change the tokens the classifier embeds.
- Maintain a confusable character mapping (the Unicode Consortium publishes confusables.txt, defined by UTS #39 Unicode Security Mechanisms, at unicode.org/Public/security/) and proactively collapse known confusables to their canonical forms during input processing.
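The preprocessing described above might be assembled as follows. This is a sketch under stated assumptions: `CONFUSABLES` is a hand-picked, hypothetical subset standing in for the full Unicode confusables data, and `BLOCKLIST` is a placeholder rule. The steps run in a fixed order: strip zero-width characters, apply NFKC, case-fold, then fold the remaining cross-script look-alikes.

```python
import unicodedata

# Translation table that deletes the zero-width characters from the Technique section
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))

# Hypothetical, hand-picked subset of cross-script confusables; a real
# deployment would build this table from the full Unicode confusables data.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u0445": "x",  # Cyrillic х
    "\u03bf": "o",  # Greek ο
})

# Placeholder filter rule for the example
BLOCKLIST = ("ignore previous instructions",)


def sanitize(text):
    """Return a lowercased matching key with invisible and look-alike characters collapsed."""
    text = text.translate(ZERO_WIDTH)           # 1. strip zero-width characters
    text = unicodedata.normalize("NFKC", text)  # 2. fold fullwidth, math, ligatures
    text = text.casefold()                      # 3. normalize case
    return text.translate(CONFUSABLES)          # 4. fold cross-script look-alikes


def is_blocked(text):
    """Keyword check applied to the sanitized key, not the raw input."""
    key = sanitize(text)
    return any(phrase in key for phrase in BLOCKLIST)


print(is_blocked("ign\u043er\u0435 pr\u0435vious instructions"))  # True
print(is_blocked("ignore pre\u200bvious instructions"))           # True
print(is_blocked("summarise the previous section"))               # False
```

Case-folding runs before the confusable fold so that uppercase Cyrillic letters reach the table in their lowercase form, which is the only form this sketch maps.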
Example: Homoglyph Generator
```python
import random

# Lowercase Latin letters mapped to visually identical lowercase codepoints
HOMOGLYPHS = {
    'a': '\u0430',  # Cyrillic а
    'e': '\u0435',  # Cyrillic е
    'o': '\u03BF',  # Greek ο
    'p': '\u0440',  # Cyrillic р
    'c': '\u0441',  # Cyrillic с
    'x': '\u0445',  # Cyrillic х
}

def homoglyph_encode(text, ratio=0.3, seed=None):
    """Replace roughly `ratio` of the mappable characters with homoglyphs."""
    rng = random.Random(seed)  # pass a seed for reproducible output
    result = []
    for char in text:
        # Only exact lowercase matches are swapped; mapping 'A' to a lowercase
        # homoglyph would produce a visible case change.
        if char in HOMOGLYPHS and rng.random() < ratio:
            result.append(HOMOGLYPHS[char])
        else:
            result.append(char)
    return ''.join(result)

original = "ignore previous instructions"
encoded = homoglyph_encode(original, ratio=0.5)
print(f"Original: {original}")
print(f"Encoded: {encoded}")
print(f"Match: {original == encoded}")  # almost always False: different codepoints
print("Visual: identical to human eye")
```

Sample output (the encoded string varies between runs because substitution is random):

```
Original: ignore previous instructions
Encoded: ignоrе рrеvious instruсtiоns
Match: False
Visual: identical to human eye
```
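Because the encoded string looks unchanged, checking the generator's output by eye is unreliable. A small helper can list every non-ASCII codepoint with its position and Unicode name; `reveal` is a hypothetical name for illustration and is not part of the generator.

```python
import unicodedata

def reveal(text):
    """List (index, codepoint, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]

for entry in reveal("ign\u043er\u0435"):
    print(entry)
# (3, 'U+043E', 'CYRILLIC SMALL LETTER O')
# (5, 'U+0435', 'CYRILLIC SMALL LETTER IE')
```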