4. Provide a compressed archive of the data the scrapers want and make it availa...

FuzzyDunlop · on May 17, 2012

One might argue that indexing from a data-dump will lead to search results that are only as up to date as the last dump.

In StackExchange's case, most of these are now a week or more old.

Maybe it's a good idea, but I'm not sure how many would want to dump their data on a daily basis to keep Google updated, when Google can quite easily crawl their sites as and when it needs to.

barbazfoo12 · on May 17, 2012

Have you considered rsync? Dropbox uses it, so lots of people who don't even know what rsync is are now using it. We could all be using it for much more than just Dropbox. And if you have ever used gzip on HTML, you know how well it compresses. The savings are quite substantial. Do you think most browsers are normally requesting compressed HTML?
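To put a rough number on the gzip point, here is a minimal sketch using Python's standard `gzip` module. The sample page is made up for illustration (a repetitive table of rows, which is typical of listing pages), and the helper name `gzip_ratio` is hypothetical; real pages will compress more or less depending on their content.

```python
import gzip


def gzip_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)


# Hypothetical sample page: repetitive markup, as on a typical listing page.
rows = "".join(
    f'<tr class="row"><td class="id">{i}</td>'
    f'<td class="title">Question {i}</td></tr>\n'
    for i in range(500)
)
html = f"<html><body><table>\n{rows}</table></body></html>".encode("utf-8")

print(f"original: {len(html)} bytes, gzip ratio: {gzip_ratio(html):.2f}")
```

Markup this repetitive shrinks to a small fraction of its original size, which is why a compressed dump (or compressed transfer in general) costs far less bandwidth than serving the raw pages one by one.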

minikomi · on May 17, 2012

It could be /data.zip like /robots.txt
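A minimal sketch of how a crawler might locate such a file, assuming the `/data.zip` convention proposed here (it is not an existing standard, and the helper name `dump_url` is hypothetical). As with `/robots.txt`, the path is resolved against the site root, whatever page the crawler started from.

```python
from urllib.parse import urljoin


def dump_url(page_url: str, well_known_path: str = "/data.zip") -> str:
    """Resolve the hypothetical well-known dump path against the site root."""
    # A path starting with "/" replaces the whole path of the base URL,
    # so any page on the site maps to the same dump location.
    return urljoin(page_url, well_known_path)


print(dump_url("https://example.com/questions/1234"))
```

A crawler could check that URL first and fall back to ordinary crawling if the file is absent.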