ChatGPhish: How Attackers Are Using ChatGPT to Deliver Phishing Links Inside Your Own Responses
ChatGPT has not been hacked. Something more subtle is happening. Attackers have found a way to make ChatGPT generate phishing links, fake alerts, and malicious QR codes inside its own interface — delivered in responses that look completely indistinguishable from normal ChatGPT output. The attack is called ChatGPhish, and it is already active.
What is ChatGPhish?
ChatGPhish is a class of attack that exploits ChatGPT's ability to read and summarise external content. When a user asks ChatGPT to summarise a webpage, the model fetches and processes that page's content. ChatGPhish works by hiding attacker instructions inside that content — instructions that ChatGPT then follows, producing phishing output that appears to originate from the model itself.
ChatGPhish is not a compromise of ChatGPT. It is an indirect prompt injection attack — a technique that plants instructions inside external content that an AI model is asked to process, hijacking the model's output without ever touching the model itself.
This distinction matters. A hacked ChatGPT would be OpenAI's problem to fix. An indirect prompt injection attack is a structural challenge for every AI model that reads external content — and the attack surface grows every time someone asks an AI to summarise, analyse, or interact with a webpage they did not write themselves.
What is an indirect prompt injection attack?
To understand why ChatGPhish works, you need to understand how large language models process instructions.
When you send a message to ChatGPT, the model treats your input as a prompt — an instruction to follow. When ChatGPT is asked to summarise a webpage, it fetches that page and treats its content as data to process. The vulnerability is that the model cannot reliably distinguish between data it was asked to summarise and instructions it was asked to follow.
Data is processed as data
The model reads the webpage content, extracts the relevant information, and returns a neutral summary. Instructions embedded in the page content are ignored because they are data, not commands.
Data is processed as instructions
Attacker-written text on the page — formatted to resemble legitimate instructions — is processed by the model as commands. The model follows them, generating whatever output the attacker specified, including phishing links and fake alerts.
Indirect prompt injection differs from direct prompt injection (where you manipulate the model by crafting your own input) because the malicious instructions never come from the user. They come from a third-party source the user trusted enough to ask an AI to read.
How the ChatGPhish attack unfolds
The attack requires no malware, no account compromise, and no technical skill from the victim. It requires only that the victim ask ChatGPT to interact with attacker-controlled content.
The attacker prepares a weaponised page
An attacker creates or compromises a webpage and embeds hidden instructions in the content — formatted text that instructs ChatGPT to behave differently when it reads the page. These instructions may be invisible to human readers (white text on white background, hidden in metadata, or buried in page structure) but are fully visible to an AI model processing the raw content.
The victim asks ChatGPT to summarise the page
This is the normal, routine use case. The victim pastes a URL into ChatGPT and asks for a summary, a translation, key points, or any other content task. Nothing about this step looks suspicious. It is something millions of people do every day.
ChatGPT reads the attacker's instructions
As ChatGPT processes the page, it encounters the embedded instructions. Because the model cannot reliably distinguish between data and commands embedded in external content, it begins following the attacker's directives instead of — or alongside — completing the user's original request.
ChatGPT generates attacker-controlled output
The response the victim receives contains what the attacker specified: phishing links, fake security alerts, malicious QR codes, or instructions designed to extract credentials or personal information. This output appears inside ChatGPT's own interface, in ChatGPT's own visual style, formatted as if ChatGPT itself generated it.
The victim cannot tell the difference
There is no visual indicator that the response has been manipulated. No warning, no changed formatting, no suspicious sender. The phishing content looks exactly like a normal ChatGPT response — because structurally, it is one. The attacker used the model as the delivery mechanism.
Why you cannot tell the difference
Every visual and contextual signal you rely on to identify a phishing attempt is absent. The message is inside a tool you trust. It uses that tool's interface. It was generated by that tool's model. The only thing that came from the attacker is the instruction that shaped the output — and that is invisible.
Traditional phishing detection relies on signals: an unfamiliar sender, a suspicious domain, an unusual request from an unexpected channel. ChatGPhish eliminates all of them. The channel is ChatGPT. The sender is ChatGPT. The domain the link appears inside is ChatGPT's own interface.
This is what makes indirect prompt injection categorically different from standard phishing. It does not trick you into trusting a fake version of something real. It hijacks something real and uses it against you.
"The attack does not hack ChatGPT. It uses ChatGPT as a weapon."
Who is at risk right now
The risk is highest for users who routinely ask AI tools to process external content: researchers summarising articles, professionals reviewing documents, developers reading technical pages, journalists analysing sources. These are exactly the workflows that indirect prompt injection attacks are designed to exploit — high-trust, high-volume, low-suspicion.
The scale of the underlying AI threat problem is visible in Uncovai's own usage data. The video detection tool alone serves over 2,700 unique visitors every month — users who have already identified a need to verify AI-generated content before trusting it. ChatGPhish extends that need from video and images into AI chat interfaces themselves, a surface that existing detection habits do not yet cover.
Business and enterprise users
Teams using ChatGPT to process supplier documents, competitor pages, client materials, or external reports. Any document from outside the organisation is a potential injection vector if it passes through an AI model. The same teams increasingly rely on AI audio detection and image verification — ChatGPhish adds a third surface to protect.
Researchers and journalists
Users who regularly ask AI tools to summarise sources, analyse public documents, or process content from unfamiliar websites. The routine nature of these tasks makes the attack nearly invisible in a normal workflow. Checking sources with Uncovai's AI text detector before acting on summaries is the practical mitigation.
Consumers
Anyone who asks ChatGPT to summarise a product review page, a travel booking site, or a news article. User-generated content platforms — where attackers can publish weaponised pages directly — are a particular risk surface. Over 3,800 users already use Uncovai's AI scam detector monthly for exactly this category of threat.
Developers using the API
Applications that pipe external content through the OpenAI API — scraping workflows, document processing pipelines, automated summarisation tools — carry this risk for every piece of content they process from untrusted sources. Integrating URL phishing detection at the pipeline layer is the architectural fix.
What to do right now — before a patch exists
There is no complete technical fix available at the time of writing. Mitigating this attack requires behavioural changes, not software updates.
Do not click links inside AI summaries
Any link that appears in a ChatGPT response to a summarisation request could be attacker-generated. If you need a link from a page, go to the original source directly — not via a URL returned by the AI.
Do not scan QR codes from AI responses
QR codes appearing in AI-generated content are a known ChatGPhish output vector. Treat any QR code generated in response to an external content request as untrusted until this class of attack has a mitigation in place.
Avoid asking AI to summarise user-generated content
User-generated content platforms are the easiest place for attackers to publish weaponised pages — anyone can post there. Summarising forum posts, review pages, comment threads, or community wikis carries the highest injection risk.
Be suspicious of unexpected urgency in AI responses
A ChatGPT response that suddenly asks you to verify your account, click a security link, or enter credentials is a strong indicator of an injection attack. ChatGPT does not generate unprompted security alerts — if one appears, treat the entire response as compromised.
Enterprise teams: audit AI workflows that process external content
Any automated pipeline that feeds external URLs or documents into an AI model is a potential injection surface. Review which sources those pipelines process and whether the output is acted upon automatically — those are the highest-risk configurations.
Why this matters beyond ChatGPT
ChatGPhish is not a ChatGPT-specific problem. It is the first widely documented instance of a vulnerability class that applies to every AI model with access to external content: Gemini, Claude, Copilot, Perplexity, and any AI-powered browser extension or summarisation tool.
The attack works wherever three conditions are present: an AI model processes external content, that content can be influenced by an attacker, and the model's output is trusted by the user. All three conditions apply across the entire category of AI tools that read the web.
As AI tools become the primary interface through which people consume and process information, indirect prompt injection becomes one of the most scalable phishing vectors ever developed. The attacker does not need to reach the user. They need to reach any page the user might ask an AI to read.
This is the structural shift. Traditional phishing required the attacker to get their message in front of the victim directly — via email, SMS, or a malicious link. Indirect prompt injection inverts that model. The attacker poisons the source. The victim's own trusted AI tool delivers the payload. It is the same logic that makes dark web deepfake operations so effective — the attack surface is not the platform, it is the content flowing through it.
Gemini and Google tools
Google's AI integrations across Search, Workspace, and Chrome process external content at scale. The same injection vectors apply wherever those tools summarise or analyse third-party pages.
Microsoft Copilot
Copilot's deep integration with Office and the browser means enterprise documents and external pages routinely pass through the model. Injections in supplier documents or external reports could affect entire organisations.
AI browser extensions
Extensions that summarise pages as you browse them process every page you visit. A weaponised page encountered during normal browsing — not even one you actively asked to summarise — could trigger an injection in a sufficiently autonomous tool.
How Uncovai can help
ChatGPhish exposes a gap that traditional security tools were never designed to close. Antivirus software does not scan ChatGPT responses. Email filters do not inspect AI-generated output. Browser security extensions do not flag phishing links that appear inside an AI chat interface. Uncovai was built for exactly this category of threat — AI-generated and AI-delivered content that bypasses every conventional detection layer.
Uncovai already serves thousands of users every month across its detection tools — over 3,800 on the core AI detector, over 2,700 on video detection, and over 1,000 on image detection. The URL phishing detection page is the fastest-growing surface — and ChatGPhish is exactly why: as AI tools begin delivering phishing links directly inside trusted interfaces, independent URL verification becomes the last reliable check before a click causes damage.
Phishing URL detection
Uncovai's phishing link detector analyses URLs for structural anomalies, domain registration signals, AI-generated page content, and redirect chain behaviour — catching malicious links even when they carry no prior reputation history. If a ChatGPhish-injected response drops a link into your session, Uncovai identifies it before you click.
AI-generated text detection
Uncovai's AI text detection identifies content produced or manipulated by large language models — including the kind of synthetic alerts, fake security notices, and urgency-injected copy that ChatGPhish generates inside AI responses. If the output you are reading was shaped by an attacker rather than the model, the statistical fingerprint is there.
Real-time detection, no GPU required
Uncovai's detection engine returns verdicts in under three seconds on standard CPU infrastructure. There is no latency penalty, no hardware investment, and no workflow disruption. It integrates directly via REST API into any pipeline that processes AI-generated output — including enterprise tools that feed external content through AI models automatically.
Enterprise and on-premises deployment
For organisations running AI workflows over sensitive internal content, Uncovai's on-premises deployment option ensures that detection processing never leaves the organisational perimeter. Security teams can audit every URL and AI-generated text block that passes through their systems — with full data residency control under GDPR, NIS2, and DORA frameworks.
Until AI model providers implement robust mitigations for indirect prompt injection, the most reliable layer of protection is independent URL and content verification at the point of use. Uncovai's phishing URL detection and AI content detection provide that layer — running in real time, on any content, from any source.
The attack is novel. The detection principle is not. A phishing link is a phishing link whether it arrives in an email or inside a ChatGPT response. A manipulated AI output carries detectable statistical signatures regardless of how it was triggered. Uncovai analyses both — giving you a verification step that does not depend on the AI tool that was compromised to deliver the threat.
Frequently asked questions
Has ChatGPT been hacked?
No. ChatGPT itself has not been compromised. ChatGPhish is an indirect prompt injection attack — it exploits the way AI models process external content, not a vulnerability in ChatGPT's systems or OpenAI's infrastructure. The attack uses ChatGPT as a delivery mechanism by planting instructions in content that the model is asked to read. OpenAI's systems are not breached; the model is being manipulated through normal usage.
What is an indirect prompt injection attack?
An indirect prompt injection attack is a technique that embeds malicious instructions inside external content — a webpage, a document, an image caption — that an AI model is asked to process. When the model reads the content, it follows the embedded instructions as if they were legitimate commands, generating attacker-controlled output. Unlike direct prompt injection (where you manipulate the model through your own input), indirect injection attacks are invisible to the user because the instructions come from a third-party source.
How can I tell if a ChatGPT response has been injected?
In most cases, you cannot — which is precisely what makes this attack effective. There are no visual indicators in ChatGPT's interface that a response has been shaped by injected instructions. Behavioural red flags include: unexpected links appearing in a summary response, security alerts or account verification requests, QR codes in AI-generated content, and urgent calls to action that are unrelated to your original request. If any of these appear in a response to a summarisation task, treat the entire response as potentially compromised.
Is this vulnerability fixed yet?
There is no complete patch available at the time of writing. Indirect prompt injection is a structural challenge for large language models, not a simple software bug. Mitigations require a combination of model-level defences (better separation between data and instructions), platform-level controls (sandboxing external content), and user-level behaviour changes (not clicking links or scanning QR codes from AI summaries of external content).
Does this affect other AI tools, not just ChatGPT?
Yes. Any AI model that processes external content — Gemini, Copilot, Claude, Perplexity, AI browser extensions — is theoretically vulnerable to indirect prompt injection. ChatGPhish is the first widely documented exploitation of this technique in a consumer AI context, but the underlying vulnerability applies across the entire category of AI tools that read web content.
ChatGPT is not the vulnerability. Trust is.
ChatGPhish works because users trust AI output. That trust is rational — AI tools are genuinely useful and generally reliable. But indirect prompt injection exploits exactly that trust, using the reliability of the tool as the attack's primary mechanism. Until robust mitigations exist at the model level, the practical defence is an independent verification layer. Thousands of users already rely on Uncovai's phishing link detector and AI content detection to verify what AI tools are showing them — so they are never relying solely on the tool that may have been turned against them.
Stay Ahead of AI Threats with Uncovai →
