
Disrupting malicious uses of AI | February 2026

Sebastian Relard

AI · Security · Threats
relard.dev

AI integrations need security measures at multiple layers. I show how companies can fend off abuse scenarios such as prompt injection, malicious links, and manipulation via social media with concrete building blocks. The occasion is the new OpenAI report on malicious use, which shows how attackers combine AI with websites and accounts.

What the new OpenAI report means for your architecture

In its current threat report, OpenAI describes very clearly how real attacks work today: AI is rarely abused in isolation, but in combination with websites, social media accounts, and automations. The report is explicit: "Threat activity is rarely limited to a single platform." And further: threat actors use "different AI models at different points in their operational workflow." One case study centers on a Chinese influence actor that operates exactly this way.

For companies, this means: The risk is not only in the model. It lies in the interplay of your web app, your integrations, your agents that open links, and your connected platform accounts. OpenAI separately points to two key attack surfaces that I regularly secure in projects:

  • "Keeping your data safe when an AI agent clicks a link" makes it clear how quickly an agent can, via a simple click, get pulled into a chain of phishing, malware, or data exfiltration.
  • "How we continuously harden ChatGPT Atlas against prompt-injection attacks" shows that you cannot rely on a single prompt pattern. Injection defense is a process, not a switch.

In addition, "Trusted Access for Cyber" was introduced. I read this as a signal for stricter trust boundaries: identities, accesses, and contexts must be more tightly scoped and controlled per use case.

In short: If attackers think multi-model and multi-platform, we must build defensive layers along the entire workflow. That's exactly where I come in with relard.dev.

Security building blocks I include by default in AI integrations

I integrate AI into web apps, n8n workflows, and existing systems. Security is not an add-on but part of the design. The following building blocks have proven themselves in German companies.

  1. Input hardening and tool control
  • Strict function calls and validation: I use JSON schemas, type validators, and limits. No free text may flow directly into dangerous tools.
  • Prompt architecture with guardrails: System prompts define usage boundaries, combined with runtime checks. Example: A support bot may only quote from approved knowledge sources, never touch confidential systems.
  • Content and policy checks before the model: Toxicity, PII, and business policies are classified upfront, not retrospectively. This lets us stop risky contexts early.
  2. Secure retrieval and knowledge access
  • Source allowlist and signing: RAG pulls only from permitted buckets, indexes, and domains. Documents are hashed and signed. Unsigned content is rejected.
  • Metadata fencing: I filter by document type, department, approval status. A sales chatbot sees no HR documents, even if a user prompts for them.
  • Query sandbox: I limit query complexity and set per-request limits. Injection-like patterns in queries are rejected.
  3. Heavily restrict agents that click links: The OpenAI article on link clicking is required reading for me. In practice I implement:
  • Network sandbox: headless browser without persistent storage, strict egress filtering, DNS pinning, no access to internal networks. SSRF is actively blocked.
  • Allowlist over blocklist: the agent visits only preapproved domains. Unknown means stop, not warn.
  • Content-safe reader: default is text-only rendering without JavaScript. Binary downloads are blocked. File size limits and MIME checks apply before download.
  • Reputation and scans: URL reputation, AV scan for attachments, HTML sanitizing. Suspicious redirects are aborted.
  • Human-in-the-loop: risky actions like form submissions or social posts require approval. I add approval stages to the UI or to n8n.
  4. Control outputs and make abuse harder
  • Moderation of model responses: before display or action, a policy check runs. Violations are regenerated or discarded.
  • Confidence and provenance labels: responses get a score and source attributions. Low confidence forces a follow-up question rather than an action.
  • Social posting with dual control: AI drafts go into a queue. Publish only after approval. Optionally with delay and four-eyes principle.
  5. Keys, identities, rates
  • Scoped API keys: separated per user and service. Minimal privileges, automated rotation. Secrets live in the vault, not in code.
  • Rate limiting and pattern detection: per user, per IP, per route. Anomalies like sudden model switching or high error rates trigger blocks and alerts.
  • Model router with policies: only approved models for defined tasks. No free mixing in production.
  6. Logging, data protection, compliance
  • Logs without secrets: I log metadata and masks, not raw prompts with sensitive content. PII is hashed or removed.
  • Retention and region: logs in EU regions, clear retention. GDPR and BSI baseline protection are guardrails, not footnotes.
  • Red-team playbooks: I document known attack patterns and countermeasures. That reduces response time in an incident.
  7. Operate n8n workflows securely
  • Roles, secrets, least privilege: workflows run with service accounts, each connection has minimal rights. Secrets in the n8n vault.
  • Signed webhooks and replay protection: only requests with a valid signature are processed. Nonces or timestamps prevent replays.
  • Approval nodes and audit: critical steps require approvals. Every run is traceable. Changes to workflows are versioned.
  • Isolated environments: prod and staging separated. I test prompts and tools with simulated attacks before anything goes live.
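The signed-webhook pattern from the n8n section above can be sketched in a few lines. This is a minimal illustration, not the article's actual implementation: the timestamp-plus-body signing scheme, the replay window, and the shared secret handling are assumptions for the example.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # replay window: reject requests older than 5 minutes

def verify_webhook(secret: bytes, body: bytes, timestamp: str, signature: str) -> bool:
    """Accept a webhook only if it carries a fresh, valid HMAC signature.

    Assumption for this sketch: the sender signs f"{timestamp}.{body}" with
    HMAC-SHA256 and transmits the timestamp and hex signature in headers.
    """
    # Replay protection: stale or malformed timestamps are rejected outright.
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    if abs(time.time() - sent_at) > MAX_SKEW_SECONDS:
        return False

    expected = hmac.new(secret, f"{timestamp}.".encode() + body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(expected, signature)
```

Pairing the signature with a timestamp (or a nonce store) is what turns a plain integrity check into replay protection: an intercepted request expires after the skew window.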

Concrete example from my day-to-day: A retailer wanted an agent to reply to Instagram comments. Risk per the OpenAI report: the social-and-web combination becomes an entry point. My solution: the agent drafts responses that go into a moderation queue. It may read only product data, not the CRM. Links in comments are never visited automatically. Domains for any references are allowlisted. Result: fast response time, but without an open flank.
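The "allowlist over blocklist" rule from the retailer example can be sketched as a single gate function. The domains here are hypothetical placeholders; the point is the default-deny logic and the exact-match check that blocks lookalike hosts.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: subdomains must be listed explicitly.
ALLOWED_DOMAINS = {"example-shop.de", "docs.example-shop.de"}

def may_visit(url: str) -> bool:
    """Allowlist over blocklist: unknown means stop, not warn."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # no ftp://, file://, javascript: etc.
    host = (parsed.hostname or "").lower()
    # Exact match only, so "example-shop.de.evil.com" does not slip through.
    return host in ALLOWED_DOMAINS
```

In practice this gate sits in front of the sandboxed browser; everything that returns False is logged and dropped, never "visited with a warning".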

A second example: Internal knowledge bot on the intranet. I built a content pipeline that signs and classifies documents and writes only approved labels into the vector index. Even if someone slips in a crafted PDF, without a signature it won’t make it into the index. Prompt injection in the document runs into a runtime check that blocks tool calls. I implement the OpenAI guidance on continuously hardening against prompt injection as a recurring test run.
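The sign-then-admit step of such a content pipeline can be sketched as follows, assuming an HMAC over the document hash with a key that only the pipeline holds; key name and function names are illustrative, not the actual system.

```python
import hashlib
import hmac
from typing import Optional

SIGNING_KEY = b"pipeline-signing-key"  # assumption: known only to the content pipeline

def sign_document(content: bytes) -> str:
    """Content pipeline: hash an approved document and sign the hash."""
    digest = hashlib.sha256(content).digest()
    return hmac.new(SIGNING_KEY, digest, hashlib.sha256).hexdigest()

def admit_to_index(content: bytes, signature: Optional[str]) -> bool:
    """Indexer: unsigned or tampered documents never reach the vector index."""
    if signature is None:
        return False  # a crafted PDF without a signature is rejected here
    return hmac.compare_digest(sign_document(content), signature)
```

The separation matters: only the approval step can produce valid signatures, so slipping a document past review does not get it indexed.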

How I implement this with your team

I work iteratively, with a clear focus on impact rather than buzzwords.

  • Security discovery and threat modeling: 1 to 2 workshops, data flows, attack paths, assumptions. I use a lightweight STRIDE model for AI workflows, augmented with specific patterns such as prompt injection, jailbreak, tool abuse, data exfiltration.
  • Security architecture and prioritization: I map your use cases to the building blocks above. This yields a prioritized backlog that can be delivered within 2 to 6 weeks.
  • Implementation in web app and n8n: guardrails in the API, sandbox for link handling, policy engine for outputs, approval flows in n8n, secret management, rate limiting.
  • Tests and monitoring: prompt-injection test catalog, link-click simulations, load profiles, alerting. Dashboards with metrics like block rate, error rate, time to approve, model costs.
  • Compliance by design: data minimization, EU regions, data processing agreements, TOMs. Aligned with ISO 27001 and BSI. No sensitive logs. Clear retention periods.
  • Enablement: short trainings for developers and business teams. Playbooks for incidents. Clarity about who stops what, and when.

My experience: Most companies achieve 80 percent risk reduction with 6 to 10 targeted measures, without killing the user experience. The rest is continuity. The OpenAI report reminds us that attackers iterate. So we iterate faster.

Conclusion

The OpenAI report shows that AI abuse is rarely monocausal. "Threat activity is rarely limited to a single platform" and leverages "different AI models" along a workflow. The answer is a multilayered security architecture across your entire integration. I build these layers into your web apps and n8n flows, with clear rules, measurable effects, and minimal friction. If you want to deploy AI productively and safely, we start today with the building blocks that remove the most risk.

Frequently asked questions

Do these protections noticeably slow down our AI applications?

In the short term minimally, in the long term not at all. Most checks are lightweight and parallelizable. More expensive steps like approvals only kick in for risky actions. In projects, the added latency of typical requests was usually in the tens of milliseconds.
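Why the checks add so little latency: independent policy checks can run concurrently, so the added time is the slowest check, not the sum. A minimal sketch with asyncio, using stand-in checks instead of real classifiers:

```python
import asyncio

async def check_pii(text: str) -> bool:
    await asyncio.sleep(0.01)  # stand-in for a real PII classifier call
    return "iban" not in text.lower()

async def check_toxicity(text: str) -> bool:
    await asyncio.sleep(0.01)  # stand-in for a real toxicity model call
    return "hate" not in text.lower()

async def passes_policy(text: str) -> bool:
    # Independent checks run concurrently: latency is max(check), not sum.
    results = await asyncio.gather(check_pii(text), check_toxicity(text))
    return all(results)
```

Only sequential steps such as human approval add real wall-clock time, and those are reserved for risky actions.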

Do agents have to stop clicking links altogether?

No. But we switch from free browsing to controlled, sandboxed visits with an allowlist, JS-free rendering, and human-in-the-loop for critical actions. OpenAI addresses exactly this in "Keeping your data safe when an AI agent clicks a link." Here, security is an operating mode, not a ban.

What is the fastest first step if we want to start today?

Three measures deliver immediate impact: source allowlist for RAG, rate limiting with anomaly detection, and a moderation layer before output or action. In parallel we plan sandboxing and approval for link clicks. In 2 to 3 weeks the most important gaps are closed.
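Of the three quick wins above, rate limiting is the simplest to sketch. A per-user token bucket, shown here as a minimal in-memory illustration (production setups would back this with Redis or a gateway feature):

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-user token bucket: bursts up to `capacity`, refilled at `rate`/second."""

    def __init__(self, capacity: float = 10.0, rate: float = 1.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = defaultdict(lambda: capacity)   # each user starts full
        self.updated = defaultdict(time.monotonic)    # last refill timestamp

    def allow(self, user: str) -> bool:
        now = time.monotonic()
        elapsed = max(0.0, now - self.updated[user])
        self.updated[user] = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens[user] = min(self.capacity, self.tokens[user] + elapsed * self.rate)
        if self.tokens[user] >= 1:
            self.tokens[user] -= 1
            return True
        return False
```

The same structure extends to per-IP and per-route keys; denied requests feed the anomaly alerts mentioned above.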

