I wrote a simple queue implementation after reading the Turbopuffer blog post on queues on S3. My implementation wrote the complete SQLite file to S3 on every enqueue/dequeue/ack, using the previous ETag for compare-and-set.
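A minimal sketch of that whole-file compare-and-set write, assuming an in-memory stand-in for a single S3 object (`FakeObjectStore`, `PreconditionFailed`, `apply_to_db` and `cas_update` are hypothetical names, not from the original). Against real S3 the `put` would be a PutObject carrying the previous ETag in an `If-Match` header, which fails with 412 Precondition Failed if another writer got there first.

```python
import hashlib
import os
import sqlite3
import tempfile


class PreconditionFailed(Exception):
    """Stand-in for S3's 412 response when If-Match no longer matches."""


class FakeObjectStore:
    """Hypothetical in-memory stand-in for one S3 object with conditional puts."""

    def __init__(self):
        self.body = None
        self.etag = None
        self.puts = 0  # every attempted write, successful or not

    def get(self):
        return self.body, self.etag

    def put(self, body, if_match):
        self.puts += 1
        if if_match != self.etag:  # someone else wrote since our read
            raise PreconditionFailed()
        self.body = body
        self.etag = hashlib.sha256(body).hexdigest()  # opaque version tag
        return self.etag


def apply_to_db(body, sql, params=()):
    """Load the whole SQLite file, run one statement, return the new file bytes."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "queue.db")
        if body is not None:
            with open(path, "wb") as f:
                f.write(body)
        conn = sqlite3.connect(path)
        try:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS jobs "
                "(id INTEGER PRIMARY KEY, payload TEXT, state TEXT)"
            )
            conn.execute(sql, params)
            conn.commit()
        finally:
            conn.close()
        with open(path, "rb") as f:
            return f.read()


def cas_update(store, sql, params=(), max_retries=10):
    """Read-modify-write the whole file; re-read and retry on a CAS conflict."""
    for _ in range(max_retries):
        body, etag = store.get()
        new_body = apply_to_db(body, sql, params)
        try:
            return store.put(new_body, if_match=etag)
        except PreconditionFailed:
            continue
    raise RuntimeError("gave up after repeated CAS conflicts")
```

Every enqueue, dequeue and ack goes through `cas_update`, so each one costs a full download, rewrite and upload of the file.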
The experiment and back-of-the-envelope calculations show that it can only support ~5 jobs/sec. The only major lever for increasing throughput is larger group commits, i.e. more jobs per write.
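A rough sketch of what a group commit means here: buffer pending queue operations and persist them with one object write instead of one write each. `GroupCommitter` and its fields are hypothetical; `write_batch` stands in for a single whole-file CAS upload.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GroupCommitter:
    """Hypothetical sketch: buffer queue operations and persist them together."""

    write_batch: Callable[[List[str]], None]  # e.g. one CAS PutObject of the file
    max_batch: int = 10
    pending: List[str] = field(default_factory=list)
    writes: int = 0  # number of object writes actually issued

    def submit(self, op: str) -> None:
        self.pending.append(op)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        self.write_batch(batch)  # one write covers every op in the batch
        self.writes += 1


persisted = []
gc = GroupCommitter(write_batch=persisted.append, max_batch=10)
for i in range(25):
    gc.submit(f"enqueue job-{i}")
gc.flush()
print(gc.writes)  # 3
```

A real implementation would also flush on a timer and only acknowledge callers after the write succeeds; the trade-off is added latency per job in exchange for fewer writes.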
I don't think shipping CDC instead of whole SQLite files would change the calculations, because what mattered in this experiment was the number of writes, not their size.
So yes, with a minimum of 3 writes per job, this design can only support very low throughput.
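The back-of-the-envelope model can be written down as a one-line formula, assuming every write is a serialized CAS on a single object, so throughput is bounded by write round-trip latency. Working backwards (an inference, not a measurement), ~5 jobs/sec at 3 writes per job implies roughly 15 writes/sec, i.e. about 67 ms per read-modify-write cycle. `max_jobs_per_sec` and its parameters are hypothetical.

```python
def max_jobs_per_sec(write_latency_s, writes_per_job=3, jobs_per_write=1):
    """Upper bound when all writes are serialized CAS updates of one object."""
    writes_per_sec = 1.0 / write_latency_s
    return writes_per_sec * jobs_per_write / writes_per_job


print(round(max_jobs_per_sec(0.067), 1))                     # 5.0
print(round(max_jobs_per_sec(0.067, jobs_per_write=10), 1))  # 49.8
```

Under this model, shipping smaller payloads (CDC) might shave some latency but leaves `writes_per_job` unchanged, while batching multiplies `jobs_per_write` directly.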
I’m building an open-source project to reduce GitHub Actions CI costs by running jobs on self-hosted runners on owned hardware.
The motivation is to fill the gap between local workflow execution by projects like https://github.com/nektos/act and self-hosted runner setups in the cloud.
My team's requirements are simple and we don't need all the features. We hope to keep ops simple and save costs. Any efficiency boost from caching will be a bonus.
I am creating a couple of open-source tools for data governance. The first one is a data catalog (1) with tags for PII data. The second one is a data lineage application (2). The goal is to keep these as simple as possible to install and use.
IMO the current options are too complicated or expensive, and only appropriate for the largest companies. I cannot hack together a simple application for data discovery or usage statistics. So I am building a dead simple data catalog that I can reuse. The data lineage app is the first app built on it.