
Identifying comment spam April 28, 2006

Posted by curtmonash in Uncategorized.
2 comments

It's no problem to make up a word list that catches half or more of all spam without a major false positives problem. Just think of the things people most commonly advertise through spam — porn, medication, loans. Porn and medication words are unlikely to give false positives, although loan words are more of a problem. And some of the worst offender sites can get added to the list manually.

Well, actually, on some of my blogs one might explicitly discuss the problem of spam, in which case any word could show up in the comments. Whoops.
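For illustration, here is a minimal sketch of the word-list idea in Python. The word list and the function name are made up for this example, and this is not what any particular plug-in does. The last call shows the false-positive problem: a legitimate comment that merely discusses spam gets caught too.

```python
import re

# Hypothetical word list, for illustration only; a real one would be
# tuned per blog and would include the worst offender sites by hand.
SPAM_WORDS = {"porn", "viagra", "cialis", "mortgage", "refinance"}


def looks_like_spam(comment, spam_words=SPAM_WORDS):
    """Return True if any whole word in the comment is on the word list."""
    words = set(re.findall(r"[a-z0-9]+", comment.lower()))
    return not words.isdisjoint(spam_words)


print(looks_like_spam("Cheap Viagra here"))                 # an obvious ad
print(looks_like_spam("Nice analysis of DBMS pricing"))     # a normal comment
print(looks_like_spam("I keep getting viagra spam here"))   # a comment about spam
```

Matching on whole words rather than substrings avoids some accidental hits, but it does nothing for a comment that legitimately uses a listed word.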

Anyhow, that problem even aside, a large minority of spam can't easily be identified with keyword/keyphrase filters. These comments are just a bland, friendly sentence or two (changing in form often enough that you can't catch even the majority of them with keyphrase filters, although I've gotten some mileage out of filtering on "nice blog" and the like). Their purpose is simply to put a URL onto your site.

And thus the key to filtering comment spam is, in an even more extreme form, the same as the key to filtering email spam: filter on the "call to action" (usually a URL). Whether it's to click on a URL, buy a stock, or whatever, almost all spammers want you to do something specific. The part of the spam in which they describe precisely what that is is the part that is sufficiently invariant to make effective filtering possible.
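As a rough sketch of filtering on the call to action, the Python below ignores the wording of a comment entirely and looks only at the domains it links to. The regular expression, the function names, and the blocklist entry are assumptions made for this example; a real filter would also need to look at the commenter's URL field and cope with obfuscated links.

```python
import re
from urllib.parse import urlparse

# Simplified URL matcher (an assumption; it will miss obfuscated links).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

# Hypothetical blocklist; the entry is only an example.
BLOCKED_DOMAINS = {"bettingonhorses.org"}


def linked_domains(comment):
    """Return the set of host names linked from the comment text."""
    domains = set()
    for url in URL_PATTERN.findall(comment):
        try:
            host = urlparse(url.rstrip(".,;:!?)")).hostname or ""
        except ValueError:
            continue  # skip malformed URLs rather than crash
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.add(host)
    return domains


def has_blocked_link(comment, blocked=BLOCKED_DOMAINS):
    """Return True if the comment links to any blocked domain."""
    return not linked_domains(comment).isdisjoint(blocked)


print(has_blocked_link("Nice blog! http://www.bettingonhorses.org/tips"))
print(has_blocked_link("See http://example.com/post for details."))
```

The bland "Nice blog!" text plays no part in the decision, which is the point: the wording changes constantly, but the link target does not.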

I hope (and believe) that's part of what Akismet is doing. But I gotta say this: based on the Monash Report, which is one of the blogs I've turned it on for, it definitely lets pretty obvious spam through. I mean, c'mon now, just how many comments from bettingonhorses.org is it going to take before Akismet starts filtering them out??
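To make the complaint concrete, here is a sketch of the behavior one might hope for: count how often each domain shows up in comments already judged to be spam, and block it once it crosses a threshold. This is purely illustrative and says nothing about how Akismet actually works; the class name and the threshold of 3 are invented for the example.

```python
from collections import Counter


class RepeatDomainTracker:
    """Block a domain once it has appeared in enough confirmed spam."""

    def __init__(self, threshold=3):
        # threshold is an arbitrary illustrative value
        self.threshold = threshold
        self.spam_counts = Counter()

    def record_spam(self, domain):
        """Note that a confirmed spam comment linked to this domain."""
        self.spam_counts[domain.lower()] += 1

    def is_blocked(self, domain):
        """Return True once the domain has reached the threshold."""
        return self.spam_counts[domain.lower()] >= self.threshold


tracker = RepeatDomainTracker(threshold=3)
for _ in range(3):
    tracker.record_spam("bettingonhorses.org")
print(tracker.is_blocked("bettingonhorses.org"))
```

A shared service could in principle pool these counts across many blogs, so a domain that spams one site gets blocked everywhere else sooner.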

On the plus side, Akismet does capture half or so of my spam so far, in a small sample size, and I've decided it's safe to turn on even on my busiest blog. The problem is that it tells you how much new spam there is, and also how much old spam, and doesn't make it clear whether it grabbed old "spam" just from your already-deleted-comments file.

However, a web search doesn't show people screaming about a real problem, so there probably isn't one. What's more, the UI text is a little more comforting when it has both old and new spam in the queue than just old spam. So, as I said, I've now turned it on even on my busiest blog.

By the way, that failed search for complaints did turn up a couple of interesting things. One is pretty much the original Akismet-popularizing blog post. The other contains a long list of "social engineering" comments appearing to be real, an all-too-high fraction of which are quite familiar to me already.

Different themes April 26, 2006

Posted by curtmonash in Uncategorized.

One thing I’m using this blog for is to experiment with different themes. On my main blogs, set up in WordPress 1.5, I went with the same theme for all of them: Sharepoint-like, which I lightly customized. Mainly, I selected it for its efficient use of space. And nested categories; at first, I wasn’t willing to hack around that on my own. And I like the colors; blue/yellow/white is what decorated my room when I was growing up, and those were major colors in my favorite New York City apartment as well.

But if I keep it I need to fiddle with the colors a little; that yellow text is pretty unreadable where used. And I’ve found I make a lot of long posts, so a theme that automatically truncated them on the first page (with suitable links to the whole thing) might be good for me. Like Hemingway, the theme I’m playing with here at the moment I write this. Another nice thing about Hemingway is that the About part seems to just be an editable page, rather than being hardwired into the code the way it is in Sharepoint-like. (I just deleted it altogether, put that info into initial blog posts, and then made one of my post categories “About This Blog”. Not the worst idea, if I may say so myself …) And yet another one is that there’s a convenient place for the titles of recent posts, something that for some reason doesn’t wind up being in the themes I otherwise prefer.

No one theme ever exactly suits me. I foresee a little more hacking in my future …

WordPress and Akismet weirdness April 26, 2006

Posted by curtmonash in Uncategorized.

I had no intention of starting yet another blog. However, in order to activate the much-needed Akismet antispam feature, I seemingly was forced to sign up for this account.

So it occurs to me — I might as well use it to document my dealings with WordPress and similar software, when I am in the mood to write on such subjects.

Come to think of it, listing the series of steps that led me here is too tedious. Suffice it to say that setup of the Akismet plug-in is, well, weird.

One interesting thing I learned is that deleted old comments aren't really deleted. The "last 15 days of Akismet spam" on DBMS2.com turned out to include 626 items, almost all of which I'd already deleted, most of which were more than 15 days old.

What's particularly weird is that the system helpfully suggests that since a mass delete can't be undone, I should check all 626 items to make sure I want to delete them — and then doesn't provide any visible way of actually checking more than the first 150 of them. That is, and I use the term advisedly, an interesting design choice.

Another odd design choice, although this may just be a matter of difficulty in coding, is this: There's no way to Mass Edit spam that Akismet missed, and mark it as spam. I currently have 20 spam comments that Akismet missed on the blog. I can Mass Delete them in one go, or I can tell Akismet about all 20 of them, one after the other. Interesting choice on my part; just how public-minded am I?

Actually, I just went back and deactivated Akismet on DBMS2.com. A pity. I need something like that, and maybe after I find discussion forums on it I'll feel more secure with it and turn it back on, but at the moment it seems just too risky to use exactly where it is most needed — on an active blog getting inundated with comment spam.

But I left it turned on for now on the other three.

Mass Edit still REALLY needs the feature "Mark as Spam"