
The "unintended side effect of filtering spam" explanation makes no sense.

How likely is it that the same company which builds bleeding-edge machine-learning systems to track and predict our behavior online, and which uses these AI predictions constantly to maximize their ad revenue, somehow cannot find a better way to filter out spam invites?

How likely is it that the same company that houses the likes of Hinton, Norvig and Kurzweil under the same roof can't find a better way?

Google is packed with experts at solving the "spam filtering" (i.e., pattern recognition) problem.

It appears this was done on purpose[1], driven by a corporate culture that no longer cares as much about openness. [Please read jholman's responses below. He's right, I went too far with this last sentence.]

--

[1] http://mail.jabber.org/pipermail/operators/2013-February/001...

--

Edits: added "it appears" at the end, to tone down the language. Also, reworded and added sentences to make my point clearer, and corrected text to refer to invites, not messages (thanks for pointing that out, mdc!) and point out that this was indeed done on purpose.



I don't know what's going on, and I prefer to trust the FSF, but I notice that they are wilfully misrepresenting the statement by pergu@google on the operators@xmpp list, and so are you, cs702.

Per said, a month ago, "is there anything you can do about it in that case, otherwise we will have to institute very tight limits of invites per day being sent from federated domains", speaking about specific domains, and speculating about a possible future strategy.

FSF says (paraphrasing): "we have this symptom, we're convinced it's for this technical cause..." (so far so good) "...and this email thread says that Google is doing it on purpose". Bullshit, that thread says nothing of the kind. Stick to what you know.


jholman: could you imagine Google doing the same thing for email? ("We will have to institute very tight limits of emails per day being sent from external domains.")


Please go back and read my comment again.

My point is not that this is okay [0]. My point is that FSF is claiming "Google is doing this on purpose and this link says so", and YOU said "this link says so", and that's not what that link says. So stop making shit up. Stick to what you know, and/or to your opinions.

FSF could have avoided my complaint by changing "According to this thread, Google is doing this on purpose" to "Based on this thread we're guessing Google may be doing this on purpose". You could have avoided my complaint by leaving your whole last sentence off.

[0] Supposing that this IS what's happening, I'm not defending it (nor opposing it, I'm not forming an opinion). But as a side note, my understanding is that nearly all email providers DO do this thing for email; if a given domain is spamming hard enough, eventually email providers start dropping mail silently. Don't they?


It also says that they were considering limiting invites per domain per day, which is quite a different thing than blocking all external invites, and it actually isn't all that different from what is done with email from domains that tend to send out a lot of spam.

However, the problems with this solution are a) that it doesn't discriminate between domains with a lot of traffic and domains with a lot of spammy traffic (though, again, that's just speculation and maybe they do have that data), and b) there's no message to the user who is being ignored. I don't know if jabber supports rejection messages, but it would be much better if they could get a message that let them know why they were being turned down so they could pressure their chat server operator to rein in the spam accounts.
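
To make the idea concrete, here is a minimal sketch of a per-domain daily invite cap that also returns a reason string the sender could be shown. This is not Google's implementation; the class name, the limit, and the reason text are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Tuple


class DailyInviteLimiter:
    """Caps invites accepted per sending domain per calendar day (hypothetical policy)."""

    def __init__(self, max_per_day: int):
        self.max_per_day = max_per_day
        self._counts = defaultdict(int)  # (domain, day) -> invites accepted so far

    def check(self, sender_jid: str, today: date) -> Tuple[bool, str]:
        # Take the domain part of a bare JID such as "user@example.com".
        domain = sender_jid.rsplit("@", 1)[-1].lower()
        key = (domain, today)
        if self._counts[key] >= self.max_per_day:
            # Returning a reason addresses point (b): the sender learns why.
            return False, f"daily invite limit reached for {domain}"
        self._counts[key] += 1
        return True, "accepted"


limiter = DailyInviteLimiter(max_per_day=2)
day = date(2013, 3, 1)
results = [limiter.check(f"user{i}@example.com", day) for i in range(3)]
other = limiter.check("someone@other.example", day)
```

Note that the counter only sees volume, so a busy legitimate domain hits the cap just as fast as a spammy one, which is exactly problem (a).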

It appears from the FSF post that they've actually tested invite requests, though, so presumably they are rejecting all invites, which makes the earlier email even less relevant as evidence of what is going on now. Surely the FSF has a contact at Google?


I agree.

It's really unclear what's going on, actually.

We don't know exactly what the FSF is seeing, nor if they did fair tests, nor if they're reporting fairly (though I tend to assume so, tentatively).

We don't know if Google did something deliberately, nor who did it. We don't know how it relates to Per's mailing list message, if at all. If there was action, we don't know if it's an experiment, or a bug, or who it affects, or how often.

More generally, it doesn't seem like this is an urgent issue that requires panicking. The world can get along fine for a month without people being able to invite gmail users. But, on the other hand, if the worst case is true, that Google has started blocking invites from ALL other domains, ALL the time, then I think the FSF's political position is sound.

So far, though, this really isn't like Reader.


Sure the world may get by, but federated IM should be the future of online communications, possibly even more so than email. People love to chat online, SMS, iMessage, etc. But these systems shouldn't be so fragmented into isolated little islands. Apple's iMessage shouldn't even exist and should just redirect people to an XMPP service, while SMS should just be seen as a legacy.


I agree, which is why I said "If <blah blah blah>, then I think the FSF's political position is sound."


jholman: after reading everything again and thinking about it, you're right, I went too far with my last sentence. I added a note to my comment. Thank you.


I have spam-filtered email addresses through 3 different email providers, one of them being Google.

Only Google manages to constantly produce false positives (including mail from, sweet irony, Google services like Analytics) and regularly allow spam and phishing mails through.

The other two, run by relatively small providers, are nearly perfect.

Don't overestimate Google.

(The same "intelligent" Google also seems to be unable to figure out which language I use, despite me telling them on a regular basis.)


How many spammy emails do you get at each address? You seem to be measuring the absolute rate of false negatives/positives, but not measuring the relative rate.
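
A tiny sketch of the distinction, with numbers that are made up purely for illustration (nobody in this thread has posted real counts):

```python
def miss_rate(spam_in_inbox: int, spam_received: int) -> float:
    """Fraction of incoming spam that the filter failed to catch."""
    return spam_in_inbox / spam_received


# Hypothetical figures: provider A lets more spam through in absolute terms
# (5 vs 1) but faces 100x the volume, so its relative miss rate is lower.
provider_a = miss_rate(spam_in_inbox=5, spam_received=10_000)
provider_b = miss_rate(spam_in_inbox=1, spam_received=100)
a_is_better = provider_a < provider_b
```

Without the denominator, the inbox counts alone can't tell you which filter is doing the better job.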


I have no numbers, but the non-Google addresses are much older (one dating back to the mid-90's) and have been liberally strewn around the internet for well over a decade.

They both get several times more spam than the much more recent business-only Google address, yet if their filtering lets through one per month it's a lot. I can't even remember the last false positive.

The majority of the mail that ends up in my Google spambox consists of legitimate email from reputable sources (Amazon, Facebook, Google itself), and barely any actual spam. Extra annoying: perfectly fine email from our own services regularly gets flagged as spam by Google, and we often have no f-ing clue why.

And don't get me started on the Google Groups spam filter, which for some reason is even worse. I have to turn it off for any group address I want to make accessible to non-members.


They may be much older, but if your Google address is @gmail then I guarantee you it gets far more spam, just from scattershot spammers.

As an anecdotal example, I have an email address that's been strewn about the internet for almost 2 decades. It's currently hosted on Google Apps, but it has a non-Google domain. I get a spam message in my inbox maybe once every couple of months.

I also have a gmail address. I never use the thing. But it gets inundated with spam, and every month or so when I go look at it I have to clean lots of junk out of the inbox.

Since they're both hosted by Google, I'm forced to conclude that the gmail one gets many orders of magnitude more spam merely by virtue of ending in @gmail.com.


It's not an @gmail.com address.

And why do people keep making excuses for Google?

Google simply isn't very good at filtering spam, something most regular ISPs can handle perfectly well, and the lack of options in Gmail and Groups clearly shows that they don't care very much about it either.


Fun fact, Google doesn't play nicely with ESPs either. Most mail providers (Yahoo, Hotmail, etc) use a feedback loop when you report spam. After you mash the spam button, the ESP that sent the mail is notified that a particular email was flagged as spam.

This allows the ESP to curtail spam problems on their end (for example, Mailchimp heavily throttles your emails or outright bans you if your spam rate creeps past a very low percentage). It's an all-around good thing for the ecosystem, with the exception perhaps of publishers that get the unlucky "spam instead of unsubscribe" user action.

But Gmail does not participate in this loop. They don't tell any ESP that a user has marked an email as spam...that data all stays in house. Why? Hell if I know - perhaps they don't want to tip off spammers to being detected. On the flipside, reputable ESPs get less leverage on spammers in their network.


Because it would allow spam houses to train their software to avoid the Google spam filter. No feedback loop makes it harder to train (not impossible, just harder).


My experience has been the opposite. I get very few spams per month in gmail, but other mail providers I've used have done far worse spam filtering (walla mail, yahoo mail, netvision).


Every day I get hit by about 100k connections for email to my domain (0x58.com), the spammers hit <random>@0x58.com. So far they haven't hit a single actual email address that exists.

Out of those 100k connections, one or two emails come through to my valid email account. So yes, scattershot makes sense, but from looking at my logs, unless your account includes a lot of numbers you aren't going to get hit :P


I have the opposite anecdote: all of my email addresses redirect to my Gmail, and I don't recall a false positive for at least 6 months. And in the past they have usually been things like activation emails.


I get a maximum of one spam mail/month. The rest is all filtered by google. False positives happen but weirdly enough the only thing I repeatedly see there are plus.google.com notifications…


What other e-mail providers do you use, if you don't mind? A primary reason I use Google Apps mail is for the spam filtering.


Spam filtering by now is pretty trivial if you don’t mind being an ass about standards - require valid HELOs, valid hostnames, valid PTR records and valid A/AAAA records for these PTR records and hostnames listing the connecting IP and you will hardly get any spam.
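
Roughly, those checks could be sketched like this. It is a simplified illustration, not a drop-in MTA policy: the function names are made up, the resolvers are passed in so the logic can be tried without network access, and IPv6 addresses are compared as plain strings without normalisation.

```python
import socket
from typing import Callable, List, Optional


def system_ptr(ip: str) -> Optional[str]:
    """Reverse lookup via the system resolver; None if there is no PTR record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def system_addrs(hostname: str) -> List[str]:
    """Forward lookup (A and AAAA) via the system resolver."""
    try:
        return [info[4][0] for info in socket.getaddrinfo(hostname, None)]
    except OSError:
        return []


def passes_strict_checks(
    ip: str,
    helo: str,
    ptr: Callable[[str], Optional[str]] = system_ptr,
    addrs: Callable[[str], List[str]] = system_addrs,
) -> bool:
    """True if both the PTR name and the HELO name resolve back to the connecting IP."""
    ptr_name = ptr(ip)
    if ptr_name is None:
        return False
    return ip in addrs(ptr_name) and ip in addrs(helo)


# Fake resolvers standing in for DNS, using documentation-only addresses.
fake_ptr = {"192.0.2.10": "mail.example.org"}.get
fake_addrs = lambda host: {"mail.example.org": ["192.0.2.10"]}.get(host, [])

good = passes_strict_checks("192.0.2.10", "mail.example.org", fake_ptr, fake_addrs)
bad_helo = passes_strict_checks("192.0.2.10", "spoofed.example.net", fake_ptr, fake_addrs)
no_ptr = passes_strict_checks("192.0.2.99", "mail.example.org", fake_ptr, fake_addrs)
```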

Add to this ‘temporary addresses’[0] and train your spam filter on everything sent to invalid such addresses and basically nothing gets through.

[0] I use a scheme where my website and mailing list addresses are of the form claudius_YYMM@example.com. Mails to these addresses are marked as spam after the 15th of MM+1 and before the 15th of MM-1. Obviously only works for mailing lists if they’re open to non-subscribers as well.
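
A minimal sketch of that date window, under a few assumptions that are not stated above: YY means 20YY, mail arriving on the 15th itself still counts as valid, and the helper names are invented for the example.

```python
import re
from datetime import date
from typing import Optional, Tuple

ADDRESS_RE = re.compile(r"^claudius_(\d{2})(\d{2})@example\.com$")


def shift_month(year: int, month: int, delta: int) -> Tuple[int, int]:
    """Move (year, month) by delta months, handling year rollover."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def is_spam_by_date(address: str, received: date) -> Optional[bool]:
    """None if the address does not follow the scheme; otherwise True when
    the mail arrived outside the address's validity window."""
    match = ADDRESS_RE.match(address.lower())
    if match is None:
        return None
    year, month = 2000 + int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        return None
    start = date(*shift_month(year, month, -1), 15)  # 15th of MM-1
    end = date(*shift_month(year, month, +1), 15)    # 15th of MM+1
    return not (start <= received <= end)


in_window = is_spam_by_date("claudius_1303@example.com", date(2013, 3, 20))
too_late = is_spam_by_date("claudius_1303@example.com", date(2013, 5, 1))
year_wrap = is_spam_by_date("claudius_1301@example.com", date(2012, 12, 20))
unrelated = is_spam_by_date("someone@example.com", date(2013, 3, 20))
```

Anything flagged this way can then be fed to the filter as training data, as described above.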


I've found this to be untrue in my experience filtering spam. Being a hard ass drops a lot of legitimate email, and a lot of spammers follow the RFCs beautifully now.


Surely YMMV, but so far I have only observed two problems: daily Wikipedia articles when delivered via IPv6 (IPv4 worked), and StackExchange while on the west coast due to one of these nice hurricanes.


Maybe Hinton didn't make any progress on it yesterday so they decided to call it quits?

It does suck that they did this. I have been hit with many spam requests through Google Talk recently though. You should at least be able to whitelist people in your address book or something. Yeah, you can still send them an invite, but what if the third party's service adopted the same policy as Google?

They should at least let you opt in to requests.


The article mentions blocking invites, not messages themselves, so there's not really any content to use as a basis for spam filtering.


A good starting point for that would be to allow invites from people you have already added to your contact list, i.e. X@example.com authorised Y@gmail.com to get status updates and Y added them to their contact list, but authorisation requests from X to Y still appear to be dropped.


This makes perfect sense to me. Chat is an important part of a service that my company provides to our users. We received a great response after launching "gtalk integration" for chat until some new users started reporting problems due to this issue. We tried the other way around, i.e. having them send an invite to us, but sadly that doesn't work either. Hope Google comes up with a better solution soon.


You could allow invites, then use subsequent content to determine spam vs non-spam. Block content when it's spam and notify the user, blacklist the JID where it came from and eventually domains where there is a high proportion of spam. Also allow users to report spammers. You could possibly even get clever and learn to recognise patterns in the JIDs and domains chosen by spammers, but this is bound to block legitimate content as well.
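
As a rough sketch of the JID and domain blacklisting part (the class, the thresholds, and the example domains are all hypothetical, and the spam verdicts are assumed to come from some content classifier or user reports):

```python
from collections import defaultdict


class DomainReputation:
    """Tracks spam verdicts per JID and per domain (thresholds are hypothetical)."""

    def __init__(self, min_messages: int = 20, max_spam_ratio: float = 0.5):
        self.min_messages = min_messages
        self.max_spam_ratio = max_spam_ratio
        self.blocked_jids = set()
        self._total = defaultdict(int)  # domain -> messages seen
        self._spam = defaultdict(int)   # domain -> messages judged spam

    @staticmethod
    def _domain(jid: str) -> str:
        return jid.rsplit("@", 1)[-1].lower()

    def record(self, jid: str, is_spam: bool) -> None:
        domain = self._domain(jid)
        self._total[domain] += 1
        if is_spam:
            self._spam[domain] += 1
            self.blocked_jids.add(jid.lower())

    def domain_blocked(self, domain: str) -> bool:
        total = self._total[domain]
        if total < self.min_messages:
            return False  # not enough data to judge the whole domain
        return self._spam[domain] / total > self.max_spam_ratio

    def allows(self, jid: str) -> bool:
        return (jid.lower() not in self.blocked_jids
                and not self.domain_blocked(self._domain(jid)))


rep = DomainReputation(min_messages=4, max_spam_ratio=0.5)
for i in range(3):
    rep.record(f"bot{i}@spam.example", True)
rep.record("alice@spam.example", False)
rep.record("bob@ok.example", False)
rep.record("eve@ok.example", True)

alice_allowed = rep.allows("alice@spam.example")
bob_allowed = rep.allows("bob@ok.example")
eve_allowed = rep.allows("eve@ok.example")
```

Here the hypothetical alice never sent spam but ends up blocked along with her domain, which is the collateral damage mentioned above.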

You could perhaps increase the requirements for sending invites, such as having the recipient's server send a CAPTCHA, although spammers seem to be able to get around CAPTCHAs anyway. Perhaps there would be some other solutions that I haven't thought of.


> How likely is it that the same company which builds bleeding-edge machine-learning systems to track and predict our behavior online, and which uses these AI predictions constantly to maximize their ad revenue, somehow cannot find a better way to filter out spam invites?

Spam detection algorithms, even Google's, are rarely perfect. They probably came to the conclusion that blocking foreign invites is a good tradeoff. We don't have enough information to evaluate if it was a good tradeoff. I think it was. I was recently getting a lot of spammy chat invites (chat bots that tried to convince me to do an online payment for something), and it was quite annoying.


Honestly, it makes perfect sense to me - most of the spam I get through non-google instant messenger accounts is in the form of invite spam, and if they're seeing a massive spike of that then temporarily blocking them isn't entirely unreasonable.

So long as it's temporary (and given you can request whitelisting in the meantime), I don't really see the problem.


So if it's not about spam, what is it about? Trying to get everyone to use Google Talk? Do enough people use other Jabber servers to have the tiniest effect on Google's business?



