Show HN: Public Apache Iceberg datasets via a REST catalog (googleblog.com)
13 points by talatuyarer 50 days ago | 6 comments
Hi HN,

I’m one of the creators of this project.

We noticed that while many developers want to experiment with Apache Iceberg, the "entry cost" is often high. You usually have to set up your own storage buckets, configure a catalog (like Hive), and ingest data before you can run even a single SELECT statement.

We wanted to lower that barrier. We’ve hosted a production-grade Iceberg REST Catalog on BigLake with public datasets (starting with the NYC Taxi data) that anyone can query.

You can point Spark, Trino, or Flink directly at the REST endpoint and start querying immediately.

You do need a Google Cloud Project ID for authentication/quota, but the data access itself is free and public.
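
To make the setup above concrete, here is a minimal sketch in Python of the Spark properties an Iceberg REST catalog typically needs. It only builds the settings; it does not start Spark or contact any service. The property names follow the standard Iceberg Spark catalog conventions, and passing the project ID as an `x-goog-user-project` header is an assumption about how the quota project is supplied. The catalog name, endpoint URI, warehouse value, and project ID are hypothetical placeholders, and authentication credentials are omitted, so take the real values from the linked post.

```python
# Sketch only: builds Spark properties for an Iceberg REST catalog.
# REST_URI and WAREHOUSE are placeholders, not the real endpoint values.
REST_URI = "https://<rest-catalog-endpoint>"
WAREHOUSE = "<warehouse-location>"


def rest_catalog_conf(catalog, uri, warehouse, project_id):
    """Return Spark properties that register `catalog` as an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
        # Assumption: the project ID used for quota is sent as a request header.
        f"{prefix}.header.x-goog-user-project": project_id,
    }


def to_spark_args(conf):
    """Flatten the properties into `--conf key=value` command-line arguments."""
    return [arg for key, value in conf.items() for arg in ("--conf", f"{key}={value}")]


if __name__ == "__main__":
    # "public_data" and "my-gcp-project" are hypothetical names.
    conf = rest_catalog_conf("public_data", REST_URI, WAREHOUSE, "my-gcp-project")
    print(" ".join(to_spark_args(conf)))
```

The printed arguments can be appended to a `spark-sql` or `pyspark` invocation that already has the Iceberg Spark runtime on its classpath; the same key/value pairs can also be set on a `SparkSession` builder.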

I’d love to hear your thoughts. Are there specific datasets or Iceberg features you’d like to see added?



This looks really useful. Is it possible to access the REST catalog and query the datasets directly from Python?

If so, do you have a minimal Python example?



Nice work, this really does lower the barrier to actually trying Iceberg instead of just reading about it. Looking forward to poking at it with Trino/Spark.


Nice way to lower the barrier to trying Iceberg!

Do you have any plans or timeline for supporting the Iceberg v3 spec?


Thank you.

Yes, we plan to publish datasets that use Iceberg v3 spec features such as Variant and deletion vectors. I can update this comment when we have a release date.


You should combine this with a Trino Docker config.



