SPUDSTALKER

Anonymize Your HTML

Table of Contents

  1. Don't Use HTML
  2. Anonymizing Your HTML with HTML Tidy
  3. What About CSS?
  4. Conclusion

Your coding style says a lot about you.

Anybody who's talked to programmers knows how petty arguments over coding style can get.

The same arguments and differing styles apply to XML, which HTML is very similar to.

These slight differences between different programmers can be a dead giveaway as to who's behind a certain website or article (assuming you're writing articles for somebody else's website in HTML and not something like markdown).
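To make "slight differences" concrete, here's a rough sketch of the kind of features somebody could pull out of raw markup. The function name, the feature names, and the two sample snippets are all made up for illustration, and a real stylometry tool would use far more signals than this.

```python
import re

def style_features(html: str) -> dict:
    """Collect a few crude formatting habits from raw HTML source."""
    lines = html.splitlines()
    # Names of opening tags only; closing tags start with "</" and are skipped.
    tags = re.findall(r"<\s*([A-Za-z][A-Za-z0-9]*)", html)
    return {
        # Indentation habits: tabs versus spaces.
        "tab_indented_lines": sum(1 for line in lines if line.startswith("\t")),
        "space_indented_lines": sum(1 for line in lines if line.startswith(" ")),
        # Attribute quoting habits.
        "double_quoted_attrs": len(re.findall(r'=\s*"', html)),
        "single_quoted_attrs": len(re.findall(r"=\s*'", html)),
        # Tag-name casing and XHTML-style self-closing slashes.
        "uppercase_tags": sum(1 for tag in tags if tag.isupper()),
        "self_closing_slashes": len(re.findall(r"/\s*>", html)),
    }

# Two hypothetical authors writing the same structure.
sample_a = '<DIV CLASS="post">\n\t<BR>\n</DIV>\n'
sample_b = "<div class='post'>\n  <br />\n</div>\n"

if __name__ == "__main__":
    print(style_features(sample_a))
    print(style_features(sample_b))
```

Both samples render the same way in a browser, but the two feature sets don't overlap at all. That is all a fingerprint needs.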

Don't believe me? Here's a link to another paper on a related subject: matching up programmers to their binaries using nothing but those binaries.

So, how do we prevent this? Read on.

Don't Use HTML

If you're using something like a content management system, or posting on forums which (for some unknown reason) allow posting HTML as a configurable option, DON'T USE HTML.

Not only does it (usually) break things when you try to force your horrible inline styling on the content, it also allows other people to fingerprint you.

If you are generating your own pages (through whatever means), you can easily use any of the existing libraries for handling Markdown, which will take some of your signature away... that is, unless you modify the library's functions to generate your style of HTML!

Anonymizing Your HTML with HTML Tidy

Tidy is a small utility which "cleans up" XML and HTML. If you're using HTML, either for generating templates or content, you should consider passing your documents through Tidy to remove any "traces" of your style.
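If you generate pages with a script, running Tidy can be one more step in it. Below is a minimal sketch, assuming the tidy binary is installed and on your PATH. The helper names (tidy_command, anonymize_file) and the choice of options are mine, not anything Tidy prescribes. In particular, turning off tidy-mark (so Tidy doesn't add its own generator meta tag) and hiding comments are my guesses at what counts as a "trace".

```python
import subprocess

def tidy_command(path: str) -> list:
    """Build a Tidy invocation that rewrites `path` in place."""
    return [
        "tidy",
        "-q",                      # suppress nonessential output
        "-i",                      # let Tidy choose the indentation
        "-m",                      # modify the input file in place
        "-utf8",                   # read and write UTF-8
        "--tidy-mark", "no",       # omit Tidy's own generator <meta> tag
        "--hide-comments", "yes",  # drop HTML comments from the output
        path,
    ]

def anonymize_file(path: str) -> int:
    """Run Tidy on `path` and return its exit status (0 clean, 1 warnings)."""
    result = subprocess.run(tidy_command(path), capture_output=True, text=True)
    if result.returncode == 2:
        # Exit status 2 means Tidy hit errors it could not repair.
        raise RuntimeError(f"tidy reported errors for {path}:\n{result.stderr}")
    return result.returncode
```

Calling anonymize_file("index.html") (a hypothetical filename) overwrites the file, so keep your nicely indented original somewhere else if you want to keep editing it.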

While I enjoy looking at perfectly indented markup all day when editing either the template or my pages, it's still just HTML that most people will never see.

What About CSS?

CSS is a rather special case. While you can make the formatting look a certain way, the actual content of the stylesheet can still give you away. There are several ways to do almost anything in CSS, and some tricks are rare enough to stand out. At the end of the day, try to leave as little of a signature as possible, avoid any clever tricks other than inheritance, and get over it.

Conclusion

While this is almost certainly pointless for this site, given that even the "Whois Privacy" registration offered by most domain registrars could be subject to subpoena, there are other use cases.

For instance, if you're posting on somebody else's site or a free hosting service that you registered for properly, using Tor and fake emails, you should also make sure people can't apply these "simple" heuristics to your code to link you to your other projects or works.

Certainly, if you run a Tor service serving hand-edited static files, you should consider this heavily.