
The vast majority of open source "big-data" infrastructure is written in Java (Hadoop, HBase, Cassandra, Solr, Elasticsearch, etc.). It works pretty well.

I'm not sure what your question is, but I've experimented with loading netflow data into Solr and I'm averaging sub-2-second query times. That's on a laptop, with a couple of minutes of netflow (around 10 GB).

With proper indexing, your search response time shouldn't increase linearly with your data size.
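To make the indexing point concrete, here is a minimal sketch in Python contrasting a linear scan with a lookup through an inverted index (Solr is built on Lucene, whose core structure is an inverted index). The record layout `(src_ip, dst_ip, bytes)` and the function names are hypothetical illustrations, not Solr's schema or API. The scan touches every record, so its cost grows with the data; the indexed lookup does one hash lookup and then touches only the matching records.

```python
from collections import defaultdict

# Hypothetical netflow-style record: (src_ip, dst_ip, bytes).
# This layout is an illustrative assumption, not a real Solr schema.
Flow = tuple[str, str, int]


def linear_scan(flows: list[Flow], src_ip: str) -> list[Flow]:
    """Check every record: cost grows linearly with len(flows)."""
    return [f for f in flows if f[0] == src_ip]


def build_index(flows: list[Flow]) -> dict[str, list[int]]:
    """Map each src_ip to the positions of its records (an inverted index)."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos, flow in enumerate(flows):
        index[flow[0]].append(pos)
    return index


def indexed_lookup(flows: list[Flow], index: dict[str, list[int]], src_ip: str) -> list[Flow]:
    """One hash lookup, then fetch only the matching records."""
    # .get() does not insert missing keys into the defaultdict.
    return [flows[pos] for pos in index.get(src_ip, [])]


if __name__ == "__main__":
    flows = [
        ("10.0.0.1", "10.0.0.2", 500),
        ("10.0.0.3", "10.0.0.2", 1200),
        ("10.0.0.1", "10.0.0.9", 80),
    ]
    index = build_index(flows)
    print(indexed_lookup(flows, index, "10.0.0.1"))
```

The index costs one pass to build, but after that a query's cost depends mainly on how many records match, not on how many records exist in total.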



Loading 10 GB from a traditional HDD takes more than 2 s; doing it in 2 s would mean a 5 GB/s read speed (nice hard drive). Either your data is in RAM and you have a lot of RAM, or it's just not 2 s, or it's not a 10 GB index.
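The read-speed arithmetic above can be checked with a quick sketch. It uses decimal units (1 GB = 1000 MB), and the 150 MB/s sequential rate for a spinning disk is an assumed ballpark figure, not a measurement from this thread.

```python
def required_read_mb_s(data_gb: float, seconds: float) -> float:
    """Sequential read rate in MB/s needed to read data_gb within `seconds`."""
    return data_gb * 1000 / seconds


def seconds_to_read(data_gb: float, disk_mb_s: float) -> float:
    """Time in seconds to read data_gb at a sustained disk_mb_s."""
    return data_gb * 1000 / disk_mb_s


if __name__ == "__main__":
    # 10 GB in 2 s would need 5000 MB/s, i.e. 5 GB/s.
    print(required_read_mb_s(10, 2))
    # At an assumed ~150 MB/s HDD rate, 10 GB takes roughly a minute.
    print(round(seconds_to_read(10, 150), 1))
```

This only holds if the query actually has to read the whole 10 GB from disk; an index or a warm page cache changes the picture.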

And I'm talking about 100 GB+ indexes ;-)

Obviously 2 minutes of netflow data ain't much. I would want to see the results over, for example, 200 hours (or more) of netflow data.


No, querying the data takes less than 2 seconds. I can't remember the load time.

Obviously 2min of netflow data ain't much

Depends where you work...

I just checked, and it was 2 GB of netflow I tested on. That seemed small, so I looked a bit deeper, and indeed I was only using a small fraction of our total netflow for that period. It was adequate for what I was trying, though.




