CASE STUDY | TECHNICAL SEO | MALWARE RECOVERY

Eliminating 20,000 malicious URLs

How I diagnosed, contained, and reversed a malware-driven index pollution on a WordPress site, using a deliberate four-phase sequence.

20,000+

Fraudulent URLs indexed at peak

138,000+

Additional URLs blocked before indexing

0

Malicious URLs remaining at close

CONTEXT

The problem that wasn't visible until it was

The site is a personal project, a travel agency I built and manage on WordPress. After migrating from a local development environment to a live hosting provider, I noticed an anomaly in Google Search Console: a sudden spike in indexed pages far beyond the site's actual content.

The source turned out to be a compromised PHP file from another project hosted on the same shared server. The infection had spread silently, generating thousands of spam URLs that Google had begun crawling and indexing: pages that had nothing to do with the site and that, left unchecked, would have caused lasting reputational and ranking damage.

20,000+

INDEXED AT DISCOVERY

Fraudulent spam URLs appearing in Google’s index under my domain

100,000+

QUEUED FOR INDEXING

Additional malicious URLs pending in Search Console, not yet indexed but actively being crawled

DIAGNOSIS

Finding the source before fixing the symptoms

The first step was not to start submitting removal requests. That would have been treating the symptom while the source continued generating new URLs. The priority was locating and eliminating the infection before anything else.

The instinct is to act immediately on what’s visible in Search Console. But submitting removal requests against an active infection is whack-a-mole: for every URL removed, new ones are generated. Containment had to come first, deindexing second.

I ran a deep manual inspection of the server file structure alongside Wordfence, a WordPress security plugin, to scan for modified or injected files. After a couple of days of methodical investigation, I located the compromised PHP file. Once identified, I could map the full scope of the damage and plan the recovery sequence.

  • What the infection had done

    The malicious file was generating spam URLs following three distinct root paths. All 20,000+ indexed URLs and the 100,000+ queued fell into one of these three patterns, which meant a targeted blocking approach was viable rather than URL-by-URL removal.

  • Scope of server exposure

    The same shared hosting account contained other projects. A full server inspection revealed additional infected files across those projects, all of which needed to be addressed as part of the same remediation, not just the primary site.
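Because every malicious URL fell under one of three root paths, the blocking could be expressed as a handful of rules rather than 120,000 individual removals. A minimal .htaccess sketch of that approach follows; the path prefixes are hypothetical placeholders, not the actual infection patterns:

```apache
# Sketch only: answer "410 Gone" for entire spam path families.
# The three prefixes below are hypothetical placeholders.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^fake-shop/ - [G,L]     # [G] = respond 410 Gone
RewriteRule ^cheap-deals/ - [G,L]
RewriteRule ^spam-offers/ - [G,L]
</IfModule>
```

A 410 signals that the pages are permanently gone, which Google typically treats as a stronger removal signal than a 404.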

STRATEGIC APPROACH

Why sequencing mattered more than speed

The recovery required four distinct phases executed in a specific order. Doing them out of sequence (for example, starting deindexing before containment) would have wasted effort and prolonged the damage.

Phase order: Contain → Block → Restrict → Accelerate. Each phase was a prerequisite for the next. Skipping or reordering any of them would have made the subsequent steps less effective or entirely ineffective.

This wasn’t a technically complex problem; it was a problem that required correct diagnosis and disciplined prioritization. The tools involved were all standard: .htaccess, robots.txt, Search Console, and XML sitemaps. The outcome depended entirely on the sequence in which they were used.
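One concrete illustration of why the order matters: robots.txt can stop queued URLs from ever being crawled, but a URL blocked in robots.txt is never recrawled, so Google cannot observe the 410 needed to deindex it. The two tools therefore target different URL populations. An illustrative sketch, with hypothetical path prefixes:

```text
# robots.txt - illustrative sketch; path prefixes are hypothetical.
# Suitable for URLs that were queued but never indexed: blocking
# crawl here prevents them from ever entering the index.
# Already-indexed URLs must stay crawlable so the 410 is observed.
User-agent: *
Disallow: /fake-shop/
Disallow: /cheap-deals/
Disallow: /spam-offers/
```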

RECOVERY SEQUENCE

Four phases, executed in order

8

Down from over 20,000 indexed at peak. The remaining 8 were in the final deindexing queue after 1-2 months of work. The 138,000+ URLs blocked before indexing never became a live problem: they were contained in Phase 1 and never required individual remediation.

Today, 0 malicious URLs remain in the index and 888 remain blocked [Picture 3].
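The acceleration step (the sitemap 410 removal shown in Picture 2) can be sketched as a temporary sitemap listing only the dead URLs, submitted in Search Console so Googlebot recrawls them quickly and observes the 410. The URLs below are hypothetical examples:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Temporary "removal" sitemap: every entry is a URL that now
     returns 410. Listing them prompts a faster recrawl. The
     sitemap is deleted once the URLs drop out of the index. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/fake-shop/page-1/</loc></url>
  <url><loc>https://example.com/fake-shop/page-2/</loc></url>
</urlset>
```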

TIME TO CONTAINMENT

Days

Source identified and eliminated within days of discovery

INDEXED REDUCTION

99.9%

From 20,000+ fraudulent URLs indexed to 8 remaining

Picture 1 - Deindexed URLs
Picture 2 - Sitemap 410 removal
Picture 3 - Indexed pages

TAKEAWAYS

What this demonstrates

  • Security and SEO are the same problem at the infrastructure level. A compromised server doesn’t create a security incident and an SEO problem separately; it creates one compounding problem that worsens daily while undetected. Treating them as separate disciplines would have delayed the diagnosis and extended the damage window.

 

  • Sequence over speed. The temptation in a crisis is to act immediately on what’s visible. Submitting removal requests against an active infection would have produced no lasting result. The correct move was containment first, then systematic deindexing, in that order. Reversing the sequence would have made every subsequent step less effective.

 

  • Search Console as a diagnostic tool, not just a reporting tool. The index anomaly, a sudden spike in indexed pages, was detectable through GSC patterns before the malicious pages were even visible in normal browsing. Regular monitoring of index health, not just traffic and rankings, is what made early detection possible and kept the damage from becoming unrecoverable.

Contact Me

Available from May 13 | Ontario | Open to agency, in-house, and remote roles

© 2025 Juanjo González