So, if I was an evil user, I’d create a 3 megabyte HTML page, and I’d “trackback” your site every second. Or, I could have my zombie web farm send you a bunch of trackbacks, hundreds per second, pointing to garbage URLs.
Perhaps, but it might take a long time for any trackback spammers to get to that point. Not to mention that it would affect their bandwidth costs.
This would be easy to counter: only grab the first X KB of data from a blog. If the link is not there, so be it.
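In code, that “first X KB” idea might look something like the minimal sketch below, using only Python’s standard library. The 64 KB cap, the 5-second timeout, and the helper names are illustrative assumptions, not anything from the trackback spec.

```python
# A minimal sketch of the "only grab the first X KB" idea, standard library
# only. The 64 KB cap and 5-second timeout are arbitrary placeholders.
from urllib.request import urlopen

MAX_BYTES = 64 * 1024  # never download more than this per trackback ping

def fetch_head(url: str, max_bytes: int = MAX_BYTES) -> bytes:
    """Download at most max_bytes of the linked page, then stop reading."""
    with urlopen(url, timeout=5) as response:
        return response.read(max_bytes)  # read() stops at the byte cap

def quick_link_check(page_url: str, my_post_url: str) -> bool:
    """Crude check: does the first chunk of page_url mention my post's URL?"""
    return my_post_url.encode() in fetch_head(page_url)
```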
Love the blog, Jeff, it’s a daily stop for me, but, when I went to read your post today the new, larger fonts slapped me in the eyeballs! Why such a jump? Just for my two cents it makes it less readable than before. I feel like I’m in the large print section of my local library.
After blogger B spent the time and effort to write an entire post to reply to blogger A’s post, he could also post a comment at A’s with a two-line summary of the reply and a link to B. It’s negligible extra work compared with writing the reply.
Hmm… a random specification for a track-back system, eh? Well, I tend to randomly come up with specifications; I’ve come up with about a dozen over the years. Email clients that allow threading on messages (closer in functionality to a web forum than to email) and participation from other people; messaging clients that allow exchange of small widgets and effectively let people “code” together (can you imagine two people collaborating on the same piece of code in real time, looking at the same information and both able to modify the code live? I can).

And even a new protocol for a file-sharing system that someone else later came up with as well, which became known as BitTorrent (his idea was way better, though, because it broke the file into chunks. Mine just moved the entire file wholesale across the network, sharable from anyone; sort of a queue-based system where you just downloaded from the first available person. I got the idea after Hotline sucked so much).

Maybe a track-back protocol wouldn’t be that hard to envision.
Marius, by asking the poster to wait for a response and then manually decode the image, you are negating one of the advantages of trackback: speed and ease of use.

Granted, it’s slightly quicker to wait and then read a word. However, it doesn’t take much more effort on the person’s part to just copy, paste, and post a link, and then both servers have less work to do.
An absolute travesty? Really? […] so too did the first version of TrackBack have shortcomings.
That’s fine, but why is the latest version of the Trackback spec two and a half years old? You’d think a small, nimble company like Six Apart could do better than the W3C, but I guess not.
The fact that there have been zero updates to the trackback spec in the last 2 1/2 years to address the ongoing epidemic of trackback spam is, indeed, an absolute travesty. Really.
There are millions of blogs now, so all they’d have to do is send you a permalink to every blog post ever created and bang, they’ve gotten you to DOS yourself, amplifying the small number of bytes they sent you by however many bytes you decide to read from the linked post.
Exactly. It makes an inverted DOS attack trivial to mount-- your server is doing all the work!
The problem I find with trackbacks is that when you’re browsing a blog and see a trackback in the comments, it’s often a quote from the blog entry you’ve just read. And if you follow the link, it goes to someone else’s blog, where there’s just a link back to the blog you came from, with that quote as the text and no extra comment.
I guess it’s different if you’re the blog author involved.
An evil user can use a post verification system to DOS you even if you only read 100 bytes from every post to verify. They’d just create a distributed DOS with a large number of posts to hit your trackback URL. They don’t even have to be their own posts. There are millions of blogs now, so all they’d have to do is send you a permalink to every blog post ever created and bang, they’ve gotten you to DOS yourself, amplifying the small number of bytes they sent you by however many bytes you decide to read from the linked post.
So far, all I’ve been able to come up with is a (presumably PHP) script-based system where the ping is received by a script that takes the URL from the referrer and fetches the first page from the referring URL. It then parses the entire HTML document it’s fed, looking for an exact duplicate of the original post’s URL. If it finds it, the site is accepted; if it does not, the site is rejected.
The pros to this are that it forces them to actually link to you on a page. The cons are, well, they’re just large. It’s so easy to defeat that it’s not funny.
There are additional safety features you could embed in the parsing script: while parsing for the originating URL, the script could also look for meta refreshes that would take the user away from that page, and for potentially malicious JavaScript. But the drawback there is that the script may deny a genuine blogger. I mean, where do you draw the line?
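For what it’s worth, a rough sketch of that verification script in Python (rather than PHP) might look like this. The exact-match link check and the meta-refresh flag follow the description above; the names and the decision to skip JavaScript scanning are assumptions for the sake of the example.

```python
# A rough sketch of the verification script described above: accept the
# trackback only if the referring page contains an <a> whose href exactly
# matches the original post's URL, and treat a meta refresh as a reason
# to reject.
from html.parser import HTMLParser

class TrackbackVerifier(HTMLParser):
    def __init__(self, original_url: str):
        super().__init__()
        self.original_url = original_url
        self.links_back = False
        self.has_meta_refresh = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href") == self.original_url:
            self.links_back = True        # exact link back to the original post
        http_equiv = (attrs.get("http-equiv") or "").lower()
        if tag == "meta" and http_equiv == "refresh":
            self.has_meta_refresh = True  # page silently redirects visitors

def accept_trackback(html_text: str, original_url: str) -> bool:
    verifier = TrackbackVerifier(original_url)
    verifier.feed(html_text)
    return verifier.links_back and not verifier.has_meta_refresh
```

Of course, as the comment above concedes, this only raises the bar slightly: a spammer can still serve a page that contains the link and passes the check.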
Thus far, I can’t think of any method that wouldn’t be cracked almost immediately. Basically, the nature of links on the web is too fluid. Anyone can link to anything, and the only way to weed out the good from the bad is by hand. There’s already been a ton of proof that filters are imperfect and can only handle so much before they’re bypassed. There’s always some clever monkey.
I will say that the track-back service you have above reminds me a great deal of the old Web Rings that used to exist… I always knew that tech would make its way back in, it was just a matter of time…
It doesn’t look like the Technorati link is working. It says ‘6 blog reactions’, but when I click on the link it shows no results. Trying the same search with Google Blogsearch:
They’re like those jerks from Nigeria that keep sending me emails about how I’ve won $20,000,000 and all I have to do is send them my life’s savings to claim it.

They should all have their hands cut off before they’re shot…
Regarding the DDOS attack, certainly that’s a worry for the large sites. But that’s hardly a worry for the great majority of blogs.
Just like my Invisible CAPTCHA control. It should be trivially easy for someone to break it, but have they? No. It won’t happen until it makes monetary sense for them to.
Speaking of external dependencies, why choose Technorati over Akismet? You’ve mentioned the problem of having to review for false positives. I say don’t. If a few accidentally get caught by the filter and don’t show up, so be it. That still seems more accurate than Technorati. How many blogs that reference your posts are being missed by Technorati?
But that’s hardly a worry for the great majority of blogs.
I’m sure that’s exactly what they thought about trackbacks originally.
How many blogs that reference your posts are being missed by Technorati?
But there were also blogs that referenced my posts that never sent trackbacks, either. Google search would be best, but I can’t scope the linking query the way I want. Plus I get weird hits on forums, and other oddball places that aren’t conversational. Technorati, for all its warts, is very blog-focused, unlike a generic Google search. And Google blog search is even worse…
The potential for spam was already well known when trackback was specified, but there wasn’t the critical mass necessary for spammers to exploit it. It would have died an early death if bloggers had proactively spammed trackback before it was widely deployed (?)(!!).
It would have spawned protective features like link condoms before trackback spam got out of hand.
Jeff Atwood:
when a trackback ping comes in, I read the putative trackback URL and look for a link to my post there. No link, no trackback. (This was Nikhil’s idea, to be clear.)

This is a reasonable idea, but it doesn’t scale. Furthermore, it could easily become a huge DDOS (distributed denial of service) vector. The last time I checked, I was getting 75 spam trackbacks…
I submit that, rather than the original commenter’s idea not scaling, your mental implementation doesn’t scale.
The implementation you’re imagining is as naive as the original trackback specification. A validation algorithm would need some degree of intelligence in terms of remembering and rating URLs (to prevent flooding the same URL) and paying attention to page size.
Defensive coding would make this a workable solution.
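As an illustration of that kind of defensive coding, the sketch below remembers when each host was last fetched and refuses to re-fetch it within a fixed window, alongside a hard cap on how many bytes get read. The one-minute window, the 64 KB cap, and the function names are assumptions for the sake of the example, not part of any trackback spec.

```python
# A hedged sketch of "defensive" trackback verification: rate-limit fetches
# per host so an attacker can't make the server hammer the same target (or
# itself), and never read more than a fixed number of bytes per ping.
import time
from urllib.parse import urlparse

MAX_BYTES = 64 * 1024          # hard cap on bytes downloaded per verification
MIN_SECONDS_PER_HOST = 60      # fetch any given host at most once per minute

_last_fetch: dict[str, float] = {}  # host -> timestamp of last verification

def allowed_to_fetch(url: str) -> bool:
    """Return False if this host was verified too recently."""
    host = urlparse(url).netloc
    now = time.time()
    if now - _last_fetch.get(host, 0.0) < MIN_SECONDS_PER_HOST:
        return False
    _last_fetch[host] = now
    return True
```

Pair that with the byte-capped fetch and link check sketched earlier, and the amplification the DDOS comments describe drops to a small, bounded number of bytes per host per minute.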