CASE STUDY | TECHNICAL SEO | MALWARE RECOVERY

Eliminating 20,000 malicious URLs

How I diagnosed, contained, and reversed a malware-driven index pollution on a WordPress site, using a deliberate four-phase sequence.

20,000+

Fraudulent URLs indexed at peak

138,000+

Additional URLs blocked before indexing

Malicious URLs remaining at close

CONTEXT

The problem that wasn't visible until it was

The site is a personal project, a travel agency I built and manage on WordPress. After migrating from a local development environment to a live hosting provider, I noticed anomalies in Google Search Console, a sudden spike in indexed pages far beyond the actual content of the site.

The source turned out to be a compromised PHP file from another project hosted on the same shared server. The infection had spread silently, generating thousands of spam URLs that Google had begun crawling and indexing, pages that had nothing to do with the site and that, left unchecked, would have caused lasting reputational and ranking damage.

20,000+

INDEXED AT DISCOVERY

Fraudulent spam URLs appearing in Google’s index under my domain

100,000+

QUEDED FOR INDEXING

Additional malicious URLs pending in Search Console, not yet indexed but actively being crawled

DIAGNOSIS

Finding the source before fixing the symptoms

The first step was not to start submitting removal requests. That would have been treating the symptom while the source continued generating new URLs. The priority was locating and eliminating the infection before anything else.

The instinct is to act immediately on what’s visible in Search Console. But submitting removal requests against an active infection is whack-a-mole, for every URL removed, new ones are generated. Containment had to come first, deindexing second.

I ran a deep manual inspection of the server file structure alongside Wordfence, a WordPress security plugin, to scan for modified or injected files. After a couple days of methodical investigation, I located the compromised PHP file. Once identified, I could map the full scope of the damage and plan the recovery sequence.

What the infection had done
The malicious file was generating spam URLs following three distinct root paths. All 20,000+ indexed URLs and the 100,000+ queued fell into one of these three patterns, which meant a targeted blocking approach was viable rather than URL-by-URL removal.
Scope of server exposure
The same shared hosting account contained other projects. A full server inspection revealed additional infected files across those projects, all of which needed to be addressed as part of the same remediation, not just the primary site.

STRATEGY APPROACH

Why sequencing mattered more than speed

The recovery required four distinct phases executed in a specific order. Doing them out of sequence, for example, starting deindexing before containment, would have wasted effort and potentially prolonged the damage.

Phase order: Contain → Block → Restrict → Accelerate. Each phase was a prerequisite for the next. Skipping or reordering any of them would have made the subsequent steps less effective or entirely ineffective.

This wasn’t a complex problem technically, it was a problem that required correct diagnosis and disciplined prioritization. The tools involved were all standard: .htaccess, robots.txt, Search Console, and XML sitemaps. The outcome depended entirely on the sequence in which they were used.

RECOVERY SEQUENCE

Four phases, executed in order

Phase 1: Containment, eliminate the source
Deleted the compromised PHP file. Inspected and cleaned all other projects on the same hosting account. Changed all FTP, hosting panel, and database credentials. Activated daily automated security scanning via plugin. Until this step was complete, no other action made sense. I manually submitted close to 200 removal requests in Search Console, targeting key examples of the spam URLs. This served two purposes: reduce visibility of the malicious content and alert Google that something was off. [Pictures 1]
Phase 2: Server-level blocking, 410 Gone directives
Since all malicious URLs followed just three root paths, I applied 410 Gone directives in .htaccess for each path. A 410 response tells search engines the content is permanently gone, stronger than a 404, and the correct signal for content that should never return.
Phase 3: Crawl restriction, robots.txt reinforcement
Added Disallow directives in robots.txt for the same three paths. This prevented Googlebot from crawling the remaining queued URLs while the 410 signals propagated, reducing the volume of new crawl activity against URLs that would never contain legitimate content.
Phase 4: Accelerated deindexing, forced recrawl via sitemap
Once the indexed URL count had dropped from 20,000+ to under a few hundred through natural deindexing, I created a dedicated XML sitemap containing only the remaining malicious URLs, then temporarily removed the robots.txt Disallow directives. This forced Googlebot to recrawl those specific URLs immediately, encounter the 410 responses, and remove them from the index faster than passive deindexing would allow [Picture 2]. Once complete, the Disallow directives were reinstated.

RECOVERY SEQUENCE

Four phases, executed in order

Down from over 20,000 indexed at peak. The remaining 8 are in the final deindexing queue after 1-2 months wornking on it. The 138,000+ URLs that were blocked before indexing never became a live problem, they were contained at Phase 1 and never required individual remediation.

Today, the remaining URLs is 0 and the URLs blocked are 888 [Picture 3].

TIME TO CONTAINMENT

Days

Source identified and eliminated within days of discovery

INDEXED REDUCTION

99.9%

From 20,000+ fraudulent URLs indexed to 8 remaining

TAKEWAYS

What this demonstrates

Security and SEO are the same problem at the infrastructure level. A compromised server doesn’t create a security incident and an SEO problem separately, it creates one compounding problem that worsens daily while undetected. Treating them as separate disciplines would have delayed the diagnosis and extended the damage window.

Sequence over speed. The temptation in a crisis is to act immediately on what’s visible. Submitting removal requests against an active infection would have produced no lasting result. The correct move was containment first, then systematic deindexing, in that order. Reversing the sequence would have made every subsequent step less effective.

Search Console as a diagnostic tool, not just a reporting tool. The index anomaly, a sudden spike in indexed pages, was detectable through GSC patterns before the malicious pages were even visible in normal browsing. Regular monitoring of index health, not just traffic and rankings, is what made early detection possible and kept the damage from becoming unrecoverable.

Eliminating 20,000 malicious URLs

The problem that wasn't visible until it was

20,000+

100,000+

Finding the source before fixing the symptoms

What the infection had done

Scope of server exposure

Why sequencing mattered more than speed

Four phases, executed in order

Phase 1: Containment, eliminate the source

Phase 2: Server-level blocking, 410 Gone directives

Phase 3: Crawl restriction, robots.txt reinforcement

Phase 4: Accelerated deindexing, forced recrawl via sitemap

Four phases, executed in order

Days

99.9%

What this demonstrates

Contact Me

Ontario | Open to agency, in-house, and remote roles

Links

About me

Projects