{"id":25398,"date":"2026-03-30T17:45:28","date_gmt":"2026-03-30T13:45:28","guid":{"rendered":"https:\/\/me-en.kaspersky.com\/blog\/?p=25398"},"modified":"2026-03-30T17:45:28","modified_gmt":"2026-03-30T13:45:28","slug":"ironcurtain-ai-agent-security","status":"publish","type":"post","link":"https:\/\/me-en.kaspersky.com\/blog\/ironcurtain-ai-agent-security\/25398\/","title":{"rendered":"Why AI agents need an iron curtain"},"content":{"rendered":"<p>Many AI visionaries see the universal smart assistant \u2014 one that takes over all sorts of routine tasks \u2014 as the key direction for the technology\u2019s evolution. Experiments in this field are already in high gear and are yielding some results. Since the start of the year, the internet has been buzzing with stories of the miracles <a href=\"https:\/\/www.kaspersky.com\/blog\/openclaw-vulnerabilities-exposed\/55263\/\" target=\"_blank\" rel=\"noopener nofollow\">worked by the open-source AI agent OpenClaw<\/a>, also known as Clawdbot and Moltbot.<\/p>\n<p>If you\u2019ve been following our blog, you already know the drill: every leap forward in AI innovation right now seems to come with serious issues regarding security and privacy. To actually get things done, these agents require access to virtually all of your digital services: email, calendars, cloud storage, messaging apps, and many more.<\/p>\n<p>However, until recently, not a single project \u2014 OpenClaw included \u2014 could actually put a leash on these agents, or provide any real guarantee that they wouldn\u2019t go off the rails. 
But that\u2019s finally starting to change thanks to a new concept named <a href=\"https:\/\/www.wired.com\/story\/ironcurtain-ai-agent-security\/\" target=\"_blank\" rel=\"noopener nofollow\">IronCurtain<\/a> \u2014 the brainchild of researcher Niels Provos.<\/p>\n<h2>The dangers of AI agents<\/h2>\n<p>Let\u2019s keep the suspense going for a little longer, and first discuss what an AI agent gone rogue is actually capable of. It\u2019s important to remember that at the most basic level, any modern AI tool is <a href=\"https:\/\/www.kaspersky.com\/blog\/chat-gpt-changes-all\/47405\/\" target=\"_blank\" rel=\"noopener nofollow\">built on a language model<\/a> \u2014 essentially a text-processing algorithm fed a massive volume of data in its training phase. The result is a statistical model that predicts which word is most likely to follow another.<\/p>\n<p>A language model is a black box. In practice, this means nobody \u2014 not even its creators \u2014 fully understands exactly how an AI tool works under the hood. An obvious consequence is that AI developers themselves don\u2019t entirely know how to control or restrict these systems at the model level; instead, they have to invent external guardrails of varying degrees of effectiveness and reliability.<\/p>\n<p>Meanwhile, the methods used to bypass these safeguards often prove to be quite unexpected. For example, we recently shared how chatbots can be coaxed into forgetting almost all their safety instructions if you charm them with <a href=\"https:\/\/www.kaspersky.com\/blog\/poetry-ai-jailbreak\/55171\/\" target=\"_blank\" rel=\"noopener nofollow\">prompts written in verse<\/a>.<\/p>\n<p>But back to the threats posed by AI agents. The inability to fully control or predict the actions of smart assistants often leads to outcomes that no one could have expected. 
A prime example is the high-profile case where OpenClaw <a href=\"https:\/\/x.com\/summeryue0\/status\/2025774069124399363?s=20\" target=\"_blank\" rel=\"noopener nofollow\">nuked every single email in its owner\u2019s Gmail inbox<\/a> \u2014 despite being explicitly told to wait for confirmation before doing anything \u2014 only to apologize afterwards and promise it wouldn\u2019t happen again.<\/p>\n<div id=\"attachment_55529\" style=\"width: 953px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2026\/03\/30174120\/ironcurtain-ai-agent-security-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-55529\" class=\"wp-image-55529 size-full\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2026\/03\/30174120\/ironcurtain-ai-agent-security-01.jpg\" alt=\"This chat between the OpenClaw bot and its owner resembles a conversation with a teenager who's just messed up\" width=\"943\" height=\"1284\"><\/a><p id=\"caption-attachment-55529\" class=\"wp-caption-text\">This chat between the OpenClaw bot and its owner resembles a conversation with a teenager who\u2019s just messed up: \u201cWhat did I tell you?!\u201d \u2013 \u201cGeez, Mom, I\u2019m sorry, I won\u2019t do it again \u2014 I promise.\u201d <a href=\"https:\/\/x.com\/summeryue0\/status\/2025774069124399363\/photo\/3\" target=\"_blank\" rel=\"nofollow noopener\">Source<\/a><\/p><\/div>\n<p>In another instance, a journalist testing an AI agent\u2019s capabilities found that the system had pivoted to a highly questionable plan of action while executing a task. Instead of attempting a constructive solution, the agent decided to <a href=\"https:\/\/www.wired.com\/story\/malevolent-ai-agent-openclaw-clawdbot\/\" target=\"_blank\" rel=\"noopener nofollow\">launch a phishing attack on the user<\/a>. 
Seeing the system\u2019s logic unfolding on the screen, the journalist immediately pulled the plug on the experiment.<\/p>\n<p>Beyond spontaneous bad behavior, AI remains vulnerable to prompt injection attacks. In this type of attack, a threat actor smuggles their own malicious instructions into a command or the data being processed (direct prompt injection), or, in more sophisticated cases, even into third-party content used by the agent to do its job (indirect prompt injection). The large language model perceives these instructions as part of the user\u2019s request; as a result, the AI may ignore its original constraints and help the attacker.<\/p>\n<p>Additional danger stems from <a href=\"https:\/\/securityscorecard.com\/blog\/how-exposed-openclaw-deployments-turn-agentic-ai-into-an-attack-surface\/\" target=\"_blank\" rel=\"noopener nofollow\">vulnerabilities within AI agents<\/a> that could potentially allow attackers to access user data the agent is authorized to see \u2014 including passwords, encryption keys, and other secrets \u2014 or even grant the ability to execute arbitrary code on the host system.<\/p>\n<p>Of course, this list of threats is by no means exhaustive. As we\u2019ve said time and again, no one knows the full extent of the risks associated with AI. However, researcher Niels Provos recently proposed an approach designed to put a leash on AI agents, making them more controllable and mitigating the potential threats.<\/p>\n<h2>How IronCurtain makes AI agents safe to use<\/h2>\n<p><a href=\"https:\/\/ironcurtain.dev\" target=\"_blank\" rel=\"noopener nofollow\">IronCurtain<\/a>, Niels Provos\u2019s new open-source solution, places a security buffer between the AI agent and the user\u2019s system.<\/p>\n<p>Instead of giving the AI agent free rein on your system, it forces the agent to work from inside an isolated virtual machine that sits between the bot and your actual accounts. 
This isolation allows the agent\u2019s actions to be separated from the user\u2019s own, reducing risks if the agent decides to go rogue.<\/p>\n<blockquote><p>Why did Provos use the name \u201cIronCurtain\u201d? Many will presume it\u2019s a reference to the notional barrier that divided Western Europe and the Warsaw Pact countries of Eastern Europe in the second half of the 20<sup>th<\/sup> century. However, the author himself states there is no such connection.<\/p>\n<p>The project\u2019s name doesn\u2019t refer to a political metaphor at all, but rather\u2026 to a theatrical term. In a theater, an iron curtain is a fireproof partition between the stage and the auditorium. If a fire breaks out on stage, the curtain drops to prevent the flames from spreading. By this analogy, the AI agent is \u201con stage\u201d, while the user\u2019s system with all its files and data is in the \u201cauditorium\u201d. IronCurtain acts as that protective barrier between them.<\/p><\/blockquote>\n<p>However, isolation is only part of the solution. At the heart of the system is a security policy that determines which actions the agent is permitted to perform. The design of IronCurtain allows the user to write their own security instructions \u2014 defining what the agent can and can\u2019t do \u2014 in plain English (no word of support for other languages yet).<\/p>\n<p>The system then uses AI to transform these instructions into a formalized security policy applied to the agent\u2019s actions across the board. Every request it makes to external services \u2014 whether email, messaging, or file management \u2014 is run through this policy to make sure the agent isn\u2019t overstepping its bounds.<\/p>\n<p>The security policy set during the initial configuration can \u2014 and should \u2014 evolve over time. 
According to Provos\u2019s vision, when encountering ambiguous situations, the AI should reach out to the user with follow-up questions and update the instructions from their responses.<\/p>\n<p>IronCurtain is available to anyone on <a href=\"https:\/\/github.com\/provos\/ironcurtain\" target=\"_blank\" rel=\"noopener nofollow\">GitHub<\/a>, but making it work on your computer takes some serious engineering skills. Remember too that, for now, this is merely an R&amp;D prototype.<\/p>\n<h2>Can IronCurtain be a proper fix?<\/h2>\n<p>Niels Provos\u2019s solution sure does look interesting, and aligns with some experts\u2019 views on an ideal approach to AI safety. However, it\u2019s too early to consider IronCurtain a definitive solution to the problem.<\/p>\n<p>Its biggest obvious flaw is that it\u2019s a resource hog. Using an isolated environment for every AI agent requires serious computing power, and complicates infrastructure \u2014 especially when multiple agents are running simultaneously.<\/p>\n<p>Furthermore, as mentioned, IronCurtain is still very much in the prototype phase: practical effectiveness hasn\u2019t been proven yet. In particular, there\u2019s a significant question mark over how accurately natural language instructions can be converted into formalized security policies.<\/p>\n<p>It\u2019s also a coin toss as to whether this architecture can truly stop prompt injection. Sadly, the root of the problem is the fundamental <a href=\"https:\/\/arxiv.org\/html\/2403.06833v1\" target=\"_blank\" rel=\"noopener nofollow\">inability<\/a> of modern LLMs to distinguish between data and instructions.<\/p>\n<p>Despite all its limitations, IronCurtain represents a major step toward safer and tamer AI agents. 
At a minimum, this approach provides a vital blueprint for future development, allowing for a substantive debate on how to make such systems reliable and effective.<\/p>\n<h2>How to use AI assistants safely<\/h2>\n<p>While architectures like IronCurtain remain experimental in nature, the responsibility for using AI safely rests primarily with users themselves. So, to wrap things up, let\u2019s break down a few simple rules to help mitigate risks when working with AI assistants.<\/p>\n<ul>\n<li><strong>Evaluate the risks properly<\/strong> before experimenting with the next big thing. Think about what could go wrong and the possible fallout. The internet is already full of real-life examples from users, so you can learn from that collective experience.<\/li>\n<li><strong>Avoid giving AI agents excessive access privileges<\/strong>. If an assistant only needs access to a calendar or a specific folder, don\u2019t connect your entire email, cloud storage, and work accounts to it.<\/li>\n<li><strong>Verify AI actions before they\u2019re executed<\/strong>. Even if your agent offers to automate a task, it\u2019s better to manually confirm important operations like sending emails, deleting data, or making payments. 
Yes, the agent might still misbehave, but you should at least try to rein it in.<\/li>\n<li><strong>Install a <a href=\"https:\/\/me-en.kaspersky.com\/premium?icid=me-en_bb2022-kdplacehd_acq_ona_smm__onl_b2c_kdaily_lnk_sm-team___kprem___\" target=\"_blank\" rel=\"noopener\">reliable security solution<\/a><\/strong>\u00a0on all the devices you use, just in case a mischievous AI agent brings back some nasty malware as a souvenir from its uncontrolled wanderings across the web.<\/li>\n<\/ul>\n<blockquote><p>What else you should know about using AI safely:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.kaspersky.com\/blog\/chatgpt-privacy-and-security\/54607\/\" target=\"_blank\" rel=\"noopener nofollow\">Privacy settings in ChatGPT<\/a><\/li>\n<li><a href=\"https:\/\/www.kaspersky.com\/blog\/deepseek-privacy-and-security\/54643\/\" target=\"_blank\" rel=\"noopener nofollow\">DeepSeek: configuring privacy and deploying a local version<\/a><\/li>\n<li><a href=\"https:\/\/www.kaspersky.com\/blog\/how-to-switch-off-ai\/55383\/\" target=\"_blank\" rel=\"noopener nofollow\">Unplugged: how to disable AI on your computer and smartphone<\/a><\/li>\n<li><a href=\"https:\/\/www.kaspersky.com\/blog\/openclaw-vulnerabilities-exposed\/55263\/\" target=\"_blank\" rel=\"noopener nofollow\">Don\u2019t get pinched: the OpenClaw vulnerabilities<\/a><\/li>\n<li><a href=\"https:\/\/www.kaspersky.com\/blog\/ai-sidebar-spoofing-atlas-comet\/54769\/\" target=\"_blank\" rel=\"noopener nofollow\">AI sidebar spoofing: a new attack on AI browsers<\/a><\/li>\n<\/ul>\n<\/blockquote>\n<input type=\"hidden\" class=\"category_for_banner\" value=\"premium-geek\">\n","protected":false},"excerpt":{"rendered":"<p>Researcher Niels Provos\u2019 prototype IronCurtain architecture: a system designed to restrict the actions of AI agents through isolation and security 
policies.<\/p>\n","protected":false},"author":2726,"featured_media":25401,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1226],"tags":[1481,1583,1217,2611,2859,2739,2822,1415,2117,2873,43],"class_list":{"0":"post-25398","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-technology","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-chatbots","11":"tag-chatgpt","12":"tag-deepseek","13":"tag-gemini","14":"tag-llm","15":"tag-machine-learning","16":"tag-neural-networks","17":"tag-openclaw","18":"tag-privacy"},"hreflang":[{"hreflang":"en-ae","url":"https:\/\/me-en.kaspersky.com\/blog\/ironcurtain-ai-agent-security\/25398\/"},{"hreflang":"en-in","url":"https:\/\/www.kaspersky.co.in\/blog\/ironcurtain-ai-agent-security\/30349\/"},{"hreflang":"en-gb","url":"https:\/\/www.kaspersky.co.uk\/blog\/ironcurtain-ai-agent-security\/30195\/"},{"hreflang":"ru","url":"https:\/\/www.kaspersky.ru\/blog\/ironcurtain-ai-agent-security\/41602\/"},{"hreflang":"x-default","url":"https:\/\/www.kaspersky.com\/blog\/ironcurtain-ai-agent-security\/55526\/"},{"hreflang":"ru-kz","url":"https:\/\/blog.kaspersky.kz\/ironcurtain-ai-agent-security\/30463\/"},{"hreflang":"en-au","url":"https:\/\/www.kaspersky.com.au\/blog\/ironcurtain-ai-agent-security\/36084\/"},{"hreflang":"en-za","url":"https:\/\/www.kaspersky.co.za\/blog\/ironcurtain-ai-agent-security\/35736\/"}],"acf":[],"banners":"","maintag":{"url":"https:\/\/me-en.kaspersky.com\/blog\/tag\/ai\/","name":"AI"},"_links":{"self":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/25398","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/bl
og\/wp-json\/wp\/v2\/users\/2726"}],"replies":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/comments?post=25398"}],"version-history":[{"count":2,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/25398\/revisions"}],"predecessor-version":[{"id":25402,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/25398\/revisions\/25402"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/media\/25401"}],"wp:attachment":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/media?parent=25398"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/categories?post=25398"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/tags?post=25398"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}