A face-search engine is not searching the live internet when someone uploads a photo of you. It's searching its own pre-built database — a private copy of faces it harvested ahead of time and indexed for instant lookup. The whole business depends on having already collected you before anyone searches. Here's how that collection actually happens, stage by stage.
Stage 1 — Crawling: where the images come from
Engines run web crawlers that pull images from the parts of the internet that are publicly reachable. The richest sources, roughly in order of value to them:
- Public social media. Open profiles, public posts, and anything not locked down. Profile photos are gold — frontal, well-lit, labeled with a name.
- Other people's posts of you. Tagged photos, group shots, event galleries. You don't control these, which is why a private account doesn't make you safe.
- Professional and institutional pages. Company "team" pages, conference speaker lists, university directories, bar-association and licensing profiles, news articles. Your LinkedIn headshot is likely among the most-scraped photos of you.
- News, blogs, and forums. Any site that published a photo with your face — a local paper, a race-results page, a hobby forum avatar.
- Aggregated and resold image sets. Some engines bootstrap from large existing image datasets rather than crawling everything from scratch.
Crucially, the crawler doesn't need your permission and doesn't ask. If an image is reachable by a public URL, it's collectible.
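As a rough illustration of the collection step, the sketch below pulls image URLs out of one page's HTML and records the page each came from. It uses only the Python standard library and runs on an inline sample string, so nothing is fetched. The names `ImageLinkCollector`, `collect_images`, and the example.com URLs are hypothetical and chosen for illustration; this is not a description of any particular engine's crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkCollector(HTMLParser):
    """Collects every <img> on a page together with the page it was found on."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        if src:
            self.records.append({
                # Resolve relative paths against the page URL.
                "image_url": urljoin(self.page_url, src),
                # The source page is what later links a face back to a name.
                "source_page": self.page_url,
                "alt_text": attr_map.get("alt") or "",
            })


def collect_images(page_url, html):
    collector = ImageLinkCollector(page_url)
    collector.feed(html)
    return collector.records


# Hypothetical sample page, used instead of a network request.
SAMPLE_HTML = (
    '<html><body>'
    '<img src="/team/jane.jpg" alt="Jane Doe, CTO">'
    '<img src="https://cdn.example.com/group.png">'
    '</body></html>'
)
records = collect_images("https://example.com/about", SAMPLE_HTML)
for record in records:
    print(record)
```

Note that the record keeps the source page alongside the image URL. That pairing is what makes the later stages identifying rather than merely visual.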
Stage 2 — Face detection: finding the faces in the images
A crawled image is just pixels. The engine runs a face-detection model over every image to locate the faces in it — drawing a box around each one and cropping it out. A single group photo can yield a dozen separate face crops, each treated as its own candidate. This is the step that turns "a photo that happens to contain you" into "a record about your face specifically."
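The fan-out from one image to several face records can be sketched as follows. The detector here is a stand-in that returns fixed boxes, since a real detection model is outside the scope of a short example; `FaceCrop`, `fan_out`, and `fake_detector` are hypothetical names, and the "image" is a tiny grid of numbers rather than real pixel data.

```python
from dataclasses import dataclass


@dataclass
class FaceCrop:
    source_page: str
    box: tuple    # (top, left, bottom, right); bottom and right are exclusive
    pixels: list  # the cropped region, as rows of pixel values


def crop(image, box):
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def fan_out(image, source_page, detect_faces):
    """Turn one crawled image into one record per detected face."""
    return [
        FaceCrop(source_page, box, crop(image, box))
        for box in detect_faces(image)
    ]


def fake_detector(image):
    # Assumption: a real detector would return one box per face it finds.
    return [(0, 0, 2, 2), (1, 2, 3, 4)]


# A 3x4 toy "image" whose pixel value encodes its row and column.
image = [[r * 10 + c for c in range(4)] for r in range(3)]
crops = fan_out(image, "https://example.com/event-gallery", fake_detector)
for face in crops:
    print(face.box, face.pixels)
```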
Stage 3 — Embedding: turning your face into a faceprint
This is the heart of it. Each cropped face is fed through a neural network that outputs a faceprint: a fixed-length vector of numbers (a "face embedding") that encodes the distinguishing features of your face. The network is trained so that two photos of the same person produce faceprints that sit close together in that mathematical space, while photos of different people sit farther apart.
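To make "close together" concrete, here is a minimal sketch that compares faceprints with cosine similarity, one common way to measure closeness between embeddings. The four-number vectors are invented toy values; real faceprints have far more dimensions and come from a trained model, which this example does not include.

```python
import math


def l2_normalize(vec):
    # Assumes a non-zero vector; a zero vector would divide by zero.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Near 1.0 means the vectors point the same way; near 0 means unrelated."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy faceprints (hypothetical values, not output from a real model).
photo_a = [0.9, 0.1, 0.3, 0.2]       # person A, photo 1
photo_b = [0.8, 0.2, 0.35, 0.15]     # person A, photo 2
stranger = [0.1, 0.9, 0.2, 0.7]      # someone else

same = cosine_similarity(photo_a, photo_b)
diff = cosine_similarity(photo_a, stranger)
print(f"same person: {same:.3f}, different person: {diff:.3f}")
```

An engine would typically treat a pair as a match when the similarity clears some threshold. Where that threshold sits is a tuning decision that trades false matches against missed ones.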
Stage 4 — Indexing: making billions of faceprints instantly searchable
Storing faceprints isn't enough; the engine has to find matches in milliseconds across potentially billions of records. It builds a specialized index (a vector / nearest-neighbor index) so that when someone uploads a query photo, the engine can compute that photo's faceprint and near-instantly return the stored faces closest to it — along with the source URLs where those faces were found.
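One simple family of nearest-neighbor indexes is random-hyperplane hashing: each faceprint is assigned to a bucket based on which side of several random planes it falls, and a query only compares against faceprints in its own bucket. The sketch below is illustrative and is not claimed to be what any engine uses; `HyperplaneIndex` and its parameters are hypothetical, and production systems generally rely on more sophisticated approximate indexes.

```python
import math
import random
from collections import defaultdict


class HyperplaneIndex:
    """Toy locality-sensitive index: similar vectors tend to share a bucket."""

    def __init__(self, dim, n_planes=8, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)
        ]
        self.buckets = defaultdict(list)

    def _key(self, vec):
        # One bit per plane: which side of the plane the vector lies on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0
            for plane in self.planes
        )

    def add(self, vec, source_url):
        self.buckets[self._key(vec)].append((vec, source_url))

    def search(self, query, k=3):
        # Only the query's own bucket is scanned, not the whole database.
        candidates = self.buckets.get(self._key(query), [])
        ranked = sorted(candidates, key=lambda item: math.dist(query, item[0]))
        return [(url, math.dist(query, vec)) for vec, url in ranked[:k]]


index = HyperplaneIndex(dim=4)
index.add([0.9, 0.1, 0.3, 0.2], "https://example.com/team")
index.add([0.1, 0.9, 0.2, 0.7], "https://example.org/forum")

# A scaled copy points in the same direction, so it lands in the same bucket.
hits = index.search([1.8, 0.2, 0.6, 0.4])
print(hits)
```

The speed comes at a cost: a true neighbor that falls just across a plane ends up in a different bucket and is missed. Real indexes reduce this with multiple hash tables or graph-based structures.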
Stage 5 — The lookup: a photo becomes your identity
When someone runs a search, the engine: (1) detects the face in their uploaded photo, (2) embeds it into a faceprint, (3) finds the nearest stored faceprints, and (4) returns the matches plus the web pages they came from. Those source pages are what carry your name, your accounts, and your context. The face is the key; the source URLs are the payoff. This is the de-anonymization chain in action.
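The four steps compose into a short function. In this sketch the detector, embedder, and index are passed in as plain callables so the chain itself stays visible; `run_search`, the `demo_*` stand-ins, and the `max_distance` cutoff are all hypothetical, and the two-number "faceprints" are toy values.

```python
import math


def run_search(uploaded_image, detect, embed, nearest, max_distance):
    matches = []
    for face_crop in detect(uploaded_image):               # (1) detect
        faceprint = embed(face_crop)                       # (2) embed
        for source_url, distance in nearest(faceprint):    # (3) nearest stored
            if distance <= max_distance:
                matches.append((source_url, distance))
    # (4) return matches with their source pages, closest first
    return sorted(matches, key=lambda match: match[1])


# Toy stand-ins so the example runs without a model or a real index.
STORED = [
    ([1.0, 0.0], "https://example.com/team"),
    ([0.0, 1.0], "https://example.org/forum"),
]


def demo_detect(image):
    return [image]  # pretend the whole upload is one face


def demo_embed(face_crop):
    return face_crop  # pretend the crop already is its faceprint


def demo_nearest(faceprint):
    return [(url, math.dist(faceprint, vec)) for vec, url in STORED]


results = run_search([0.9, 0.1], demo_detect, demo_embed, demo_nearest,
                     max_distance=0.5)
print(results)
```

The output is a list of URLs, not faces. The searcher never needs to see the stored faceprint, only the pages it points back to.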
Why "just delete the photo" is whack-a-mole
With the pipeline laid out, the futility is obvious:
- They already have a copy and a faceprint. Deleting your source post doesn't reach into their database. The faceprint they extracted persists independently, which is why deleted photos still show up.
- You don't control most of the inputs. Even a perfectly scrubbed personal presence leaves the tagged photos, the event galleries, and the institutional pages other people posted.
- They recrawl continuously. The web keeps producing new images of you, and the crawlers keep coming back. A successful removal today can be undone by a freshly scraped photo next month.
That last point is the one people miss. Because collection is continuous, removal can't be a one-time act — it has to be a standing process that re-checks and re-files whenever you reappear.
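A standing process boils down to a repeated diff: compare what each engine returns now against what earlier requests already covered, and file again for anything new. The sketch below shows only that comparison; `plan_refiles`, the engine names, and the URLs are hypothetical, and the scanning and filing steps themselves are assumed to happen elsewhere.

```python
def plan_refiles(scan_results, filed_requests):
    """Return, per engine, the source URLs found now that no request covers.

    scan_results:   {engine: set of source URLs the engine returns today}
    filed_requests: {engine: set of source URLs covered by earlier requests}
    """
    actions = {}
    for engine, found in scan_results.items():
        new_hits = found - filed_requests.get(engine, set())
        if new_hits:
            actions[engine] = sorted(new_hits)
    return actions


# Hypothetical state from one monitoring pass.
scan = {
    "engine-a": {"https://example.com/team", "https://example.net/race-results"},
    "engine-b": {"https://example.org/forum"},
}
filed = {
    "engine-a": {"https://example.com/team"},
    "engine-b": {"https://example.org/forum"},
}

todo = plan_refiles(scan, filed)
print(todo)
```

Run on a schedule, an empty result means nothing has reappeared since the last pass; a non-empty one is the list of requests to file next.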
What removal actually targets
Given the pipeline, getting "out" means getting your faceprint deleted from each engine's database — not just hiding a photo. Each engine has an opt-out / erasure process (usually requiring a reference photo so they can scope the deletion to your faces). We identify which engines hold you, file those requests, fight the rejections, and keep monitoring so the next recrawl doesn't quietly put you back.
The searchable thing isn't your photo — it's the faceprint they built from it.
We find the engines that hold a faceprint of you, file the erasure requests, handle rejections, and keep monitoring as the crawlers come back around. That's what the pipeline requires.