“Delete This” — And My AI Almost Did the Wrong One

People tend to raise two major concerns about adopting OpenClaw. One is security risk: attacks from outside and the exposure that comes with them. The other is the risk of the AI taking dangerous actions on its own, which covers both security incidents and operational mishaps such as unintended data deletion. I nearly had a minor incident of the second kind this morning, so I’m writing it down.

For a while now, I’ve been using my OpenClaw agent, N.I.C.K., to help manage my Naver Mail—one of the major email services in Korea. Every morning, it creates a Discord thread and posts a summary of that day’s Naver Mail inbox. I review the emails and tell it what to delete or archive. Once that’s done, I’m essentially done with the thread.

This morning started as usual. There was only one email—something from Instagram—so I casually told N.I.C.K., “Delete this.” And then it suddenly tried to delete the Discord thread. Luckily, it failed—probably due to permission constraints. After failing, it even tried to explain to me how to delete the thread myself. I corrected it: “No, delete that email.” Only then did it actually delete the email.

The truth is, human natural language is highly ambiguous. Yet we communicate without trouble most of the time, for two reasons. First, we understand context. Second, when something is unclear, we ask questions to remove the ambiguity. This morning’s incident was exactly a case of that ambiguity. In the command “Delete this,” the word “this” could have referred to the email, or it could have referred to the thread we were talking in. So N.I.C.K.’s attempt to delete the thread wasn’t completely unreasonable.

Still, this could have turned into a bigger problem. It made me think it’s better to explicitly instruct the agent to ask when a target is ambiguous. After discussing it with N.I.C.K., I decided to add the following section to AGENT.md.

### Confirm destructive actions
For irreversible operations such as delete/modify:
- If the target is ambiguous (pronouns, omissions, context-dependent references),
always ask for clarification.
- Reconfirm explicitly (e.g., “Do you want to delete the email?” /
“Do you mean delete the calendar event?”).
- Applies to all targets: emails, calendar events, files, messages, and more.

Hope this works.
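
If the prompt rule alone turns out not to be enough, the same idea could also be checked in code before any destructive tool call runs. Below is a minimal sketch in Python of what that might look like. Every name in it (`check_command`, `Decision`, `DESTRUCTIVE_VERBS`) is made up for illustration and is not part of OpenClaw, and the word matching is deliberately naive: the command is executed only when exactly one of the possible targets is named explicitly, and otherwise the agent asks.

```python
import string
from dataclasses import dataclass
from typing import Optional

# Assumed list of verbs treated as destructive; adjust to taste.
DESTRUCTIVE_VERBS = {"delete", "remove", "modify"}


@dataclass
class Decision:
    action: str            # "execute", "ask", or "pass"
    target: Optional[str]  # the resolved target, if any
    message: str           # what to do, or what to ask the user


def check_command(command: str, candidates: list) -> Decision:
    """Decide whether a command may run or needs a clarifying question.

    `candidates` are the things "this" could plausibly refer to in the
    current context, e.g. ["email", "thread"].
    """
    # Lowercase each word and strip surrounding punctuation and curly quotes.
    strip_chars = string.punctuation + "“”‘’"
    words = [w.strip(strip_chars).lower() for w in command.split()]

    verb = next((w for w in words if w in DESTRUCTIVE_VERBS), None)
    if verb is None:
        # Not destructive, so this guard has nothing to say about it.
        return Decision("pass", None, "not a destructive command")

    # Targets that the user named explicitly in the command.
    named = [c for c in candidates if c.lower() in words]
    if len(named) == 1:
        return Decision("execute", named[0], f"{verb} the {named[0]}")

    # Zero or several named targets: ambiguous, so ask instead of acting.
    options = " or the ".join(candidates)
    return Decision("ask", None, f"Do you want to {verb} the {options}?")


if __name__ == "__main__":
    context = ["email", "thread"]
    print(check_command("Delete this.", context).message)
    # prints: Do you want to delete the email or the thread?
    print(check_command("No, delete that email.", context).target)
    # prints: email
```

With this morning's conversation, “Delete this” would have produced a question instead of an attempt on the thread, while “No, delete that email” would have gone straight through. A real guard would need a much better notion of what counts as a target than matching single words, so treat this as an illustration of the rule rather than something to drop in as-is.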


[The conversation in the captured image was originally conducted in Korean and has been translated into English using Google Lens.]
