Hi HN,
I’m one of the creators of this project.
We noticed that while many developers want to experiment with Apache Iceberg, the "entry cost" is often high. You usually have to set up your own storage buckets, configure a catalog (like the Hive Metastore), and ingest data before you can even run a single SELECT statement.
We wanted to lower that barrier. We’ve hosted a production-grade Iceberg REST Catalog on BigLake with public datasets (starting with the NYC Taxi data) that anyone can query.
You can point Spark, Trino, or Flink directly at the REST endpoint and start querying immediately.
You do need a Google Cloud Project ID for authentication/quota, but the data access itself is free and public.
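For Spark, "pointing at the REST endpoint" comes down to a handful of catalog properties. Here's a sketch of what that configuration might look like — the catalog name `lake`, the endpoint URI, the header property, and the table name are my illustrative assumptions, not documented values, so check our docs for the real ones:

```properties
# spark-defaults.conf sketch (names and endpoint are hypothetical)
spark.jars.packages              org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2
spark.sql.catalog.lake           org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.lake.type      rest
spark.sql.catalog.lake.uri       https://biglake.googleapis.com/iceberg/v1/restcatalog
# The Google Cloud project ID used for authentication/quota:
spark.sql.catalog.lake.header.x-goog-user-project   YOUR_PROJECT_ID
# An OAuth bearer token, e.g. from `gcloud auth application-default print-access-token`:
spark.sql.catalog.lake.token     YOUR_ACCESS_TOKEN
```

With something like that in place, a query such as `SELECT * FROM lake.nyc.taxi LIMIT 10` (assumed table name) should run from spark-sql with no buckets or metastore of your own.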
I’d love to hear your thoughts. Are there specific datasets or Iceberg features you’d like to see added to the catalog? If so, a minimal Python example of how you’d want to use them would help us prioritize.