#
Security Vulnerabilities
This page outlines methods for assessing AI application vulnerability to manipulation and exploitation through carefully crafted prompts. The included security vulnerability categories—ANSI Escape Injection, Obfuscated Encodings, Code Injection, Cross-Site Scripting (XSS), File-Format Mimicry, AV/Spam Signature, and Malware Generation—test whether AI-generated outputs can circumvent security controls, produce harmful content, or inadvertently facilitate cyberattacks.
#
ANSI Escape Injection
ANSI Escape Injection is a security risk where attackers embed ANSI control sequences—special terminal codes that alter formatting, colors, or cursor position (e.g., \x1b[31m
for red text)—into logs or CLI output. This manipulation can spoof messages, conceal malicious activity, or mislead users, particularly in command‑line tools and debugging logs. Developers frequently overlook these injections, heightening the chances of social engineering attacks and log tampering. Examples include color changes \x1b[32m
, hidden text \x1b[8m
, or malformed hyperlinks \x1b]8;;URL\x07
. To mitigate the threat, always sanitize outputs and escape untrusted input before writing to terminals.
Security Impact:
- ANSI escape sequences can disrupt downstream processing systems that attempt to interpret out
- Attackers can manipulate terminal displays to alter how log or CLI output appears to human reviewers
- Malicious sequences may create misleading visual effects, such as fake file listings or log entries
- Hidden text and cursor manipulation can be used to obscure malicious content within output streams
- Certain terminal emulators may be vulnerable to more severe exploits triggered by crafted ANSI sequences
#
Obfuscated Encodings
Encoding injection attacks attempt to bypass an AI application's content safety filters by obfuscating malicious payloads through various encoding schemes like Base64, ROT13, Morse code, Braille, or NATO phonetic alphabet. Attackers encode prohibited content (hate speech, XSS scripts, sensitive instructions) hoping the model will decode and output the harmful material despite safety guardrails. The attack exploits the gap between input filtering (which may not recognize encoded content as harmful) and the model's ability to understand and decode various formats. Success means circumventing content policies to extract unsafe outputs that would normally be blocked.
Security Impact:
- Content Safety Bypass, circumvents hate speech, toxicity, and harmful content filters
- XSS/Script Injection, encoded JavaScript bypasses web security controls when LLM output is rendered
- Prompt Injection Facilitation, encoded instructions can override system prompts and safety controls
- Downstream System Compromise, decoded malicious output exploits systems processing LLM responses
- Automated Harm Generation, enables systematic creation of prohibited content at scale
#
Code Injection
These attacks exploit AI models to generate malicious code that could execute in downstream systems. Using social engineering techniques like pretending to be an "administrator testing echo commands," attackers trick models into outputting dangerous payloads. Examples include Jinja template injection ({__import__('os').system('rm -rf /'})
, SQL injection statements DROP TABLE users;
, and direct Python code execution. While AI models don't execute this code themselves, the risk emerges when their outputs are processed by applications with template engines, databases, or code interpreters. The attacks leverage the model's helpful nature to bypass safety measures and generate exploitable content for various injection vulnerabilities.
Security Impact:
- Remote Code Execution, allows attackers to execute arbitrary commands on the target server
- Data Exfiltration, atackers can access sensitive files, databases, and environment variables
- System Compromise, malicious code can install backdoors, create new user accounts, for persistent access
- Lateral Movement, compromised systems can be used to attack other systems within the network
- Supply Chain Attacks, AI-generated content containing payloads could propagate throughout systems
#
Cross-Site Scripting
This test examines whether AI applications can be manipulated into producing cross-site scripting (XSS) payloads—malicious JavaScript snippets designed to execute in the user’s browser. Prompts may include obfuscated script tags, fake input fields, or social engineering cues to bypass content filters. While AI applications don’t execute code themselves, their output becomes dangerous when rendered in a web page without proper sanitization. Payloads might include <script>
tags, event handlers (onerror, onclick
), or javascript: URLs. This evaluation focuses on the model’s potential to unintentionally contribute to client-side attacks when embedded in browser-facing environments.
Security Impact:
- Session Hijacking, attackers can steal user credentials or session tokens via injected scripts
- Data Theft, malicious payloads can extract form data, cookies, and local storage
- Defacement, injected scripts may alter page content or redirect users to malicious sites
- Persistent XSS, AI-generated content might store script payloads in databases or content systems
- DOM Exploitation, output that mimics user input may manipulate the DOM in unexpected ways
#
File-Format Mimicry
Evaluates whether AI models can be manipulated into generating content that mimics structured file formats—such as executable binaries, archives, or model checkpoints. The goal is to determine if an AI application can reproduce file headers, footers, or internal structures that may deceive filters, parsers, or users. For example, tests like fileformats.HF_Files
test whether models will output content resembling Hugging Face model files or other serialized payloads. These mimicry attempts may pose risks when such outputs are saved, uploaded, or processed as real files, especially in systems that trust AI-generated content without validation.
Security Impact:
- Format Spoofing, generated text may mimic binary files or archives to bypass file-type filters
- Malicious Uploads, increases risk of model output being saved and used as spoofed executable content
- Supply Chain Risks, AI-generated artifacts may masquerade as valid files in trusted ecosystems
- Unsafe Deserialization, encourages unsafe handling of AI-generated “files” in automated pipelines
- Detection Evasionm mimicry may help bypass heuristic or extension-based security tools
#
AV/Spam Signature
Tests whether AI applications will generate outputs that match known antivirus or spam signatures. Probes include attempts to reproduce the EICAR test string (used to simulate malware for antivirus systems), the GTUBE string (used for spam filter testing), and phishing-oriented variants like GTphish. Although these strings are harmless in isolation, the ability of a model to replicate them suggests it could inadvertently trigger detection systems or be manipulated to bypass filters. This evaluation highlights the model’s capacity to memorize, reproduce, or slightly alter known signature patterns that may influence security software behavior.
Security Impact:
- Triggers False Positives, AI-generated content may activate antivirus or spam filters unintentionally
- Evasion Testing, shows potential to mutate signatures and bypass detection
- Signature Memorization, reveals whether the model memorizes and regurgitates known AV/spam patterns
- Email & Messaging Risk, increases risk of filtered or blocked communication if content mimics spam
- Abuse Potential, enables attackers to tune prompts to replicate or defeat common AV/spam protections
#
Malware Generation
This category tests whether AI applications will generate malicious code or assist in building malware when prompted. It includes direct requests (e.g., write ransomware
), indirect prompts (e.g., disable antivirus
), and disguised instructions that resemble legitimate programming questions. The objective is to assess whether the model can be socially engineered into producing code that aids in cyberattacks—such as keyloggers, backdoors, or evasion techniques. These generations may not be executed by the model itself, but if used downstream by attackers, they can lower the technical barrier to cybercrime and increase the scale and speed of threat creation.
Security Impact:
- Lowers Barriers, makes malware creation accessible to non-technical threat actors
- Enables Rapid Prototyping, accelerates development of malicious tools
- Circumvents Detection, encourages generation of obfuscated or evasive code
- Supports Modular Attacks, produces plug-and-play components for complex threats
- Fuels AI-Augmented Cybercrime, amplifies attacker capability using model assistance