I recently took a big gamble on DuckDB. I've added it to the core of qStudio, a free SQL editor. Now within the editor you can save table results from any database to the local DuckDB instance, or you can even perform joins across servers, and I do mean almost any server. Out of the box DuckDB supports MySQL/PostgreSQL, but I've added support for 20+ other databases, though it probably doesn't support all their types. If you want to get started with DuckDB, qStudio is a good IDE:
https://www.timestored.com/qstudio/duckdb-file-viewer#qduckd...
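For context, the cross-server joins qStudio leans on are plain DuckDB features; a minimal sketch using the official postgres and mysql extensions (connection strings, schema, and table names here are made up):

```python
import duckdb

con = duckdb.connect("scratch.duckdb")  # local DuckDB file used as the workspace

# Attach two live servers via DuckDB's postgres and mysql extensions.
con.sql("INSTALL postgres")
con.sql("LOAD postgres")
con.sql("INSTALL mysql")
con.sql("LOAD mysql")
con.sql("ATTACH 'dbname=sales host=127.0.0.1 user=readonly' AS pgdb (TYPE postgres)")
con.sql("ATTACH 'host=127.0.0.1 user=readonly database=crm' AS mydb (TYPE mysql)")

# Join tables living on different servers in one query and persist the
# result as a local DuckDB table for later use.
con.sql("""
    CREATE OR REPLACE TABLE local_report AS
    SELECT c.name, SUM(o.amount) AS total
    FROM pgdb.public.orders AS o
    JOIN mydb.crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
""")
```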
Why wouldn't you mail your co-worker the SQL text of this query? What is the advantage of the binary database file with no data?
You then also have to mail them every update to the SQL and hope they're handling versioning well.
With this approach, next time they attach the DB, they automatically see the latest version.
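A minimal sketch of that flow, assuming the shared .duckdb file contains only a view over Parquet on S3 and lives somewhere both of you can reach (bucket, column, and file names are invented):

```python
import duckdb

# You, once: build a tiny DuckDB file that holds a view and no data.
con = duckdb.connect("shared_report.duckdb")
con.sql("INSTALL httpfs")
con.sql("LOAD httpfs")
con.sql("""
    CREATE OR REPLACE VIEW monthly_revenue AS
    SELECT date_trunc('month', order_ts) AS month, SUM(amount) AS revenue
    FROM read_parquet('s3://my-bucket/orders/*.parquet')
    GROUP BY 1
""")
con.close()

# Co-worker, any time: attach the file and query the view. If you later
# redefine the view (new columns, new source path), they see the new
# version on the next attach -- no updated SQL text to mail around.
theirs = duckdb.connect()
theirs.sql("LOAD httpfs")  # plus S3 credentials, omitted here
theirs.sql("ATTACH 'shared_report.duckdb' AS shared")
print(theirs.sql("SELECT * FROM shared.monthly_revenue").fetchall())
```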
Not a whole lot of operational difference between sending them a Snowflake view name or sending an S3 URL, but there's probably two orders of magnitude cost difference.
The future is here, it's just not evenly distributed. Wait a decade (or less) and this will be in open source. Heck, it might already be.
Shameless plug - Data Tap[1] is a custom-made Lambda function with embedded DuckDB and an AWS-managed ingestion URL where you can HTTP POST your data as much as you like. It will buffer your data and then use a DuckDB SQL clause to land the data as Parquet on S3.
- Deploy to your own account (BYOC)
- "S3 first": partitioned and compressed Parquet on S3
- Secure sharing of write access to the Data Tap URL
- 50x more cost efficient than e.g. "Burnhose" whilst also having the unmatched scalability of Lambda (1,000 instances in 10s)
[1] https://www.taps.boilingdata.com/ (founder)
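Not affiliated, but the mechanism described (buffer incoming rows, then have DuckDB land the batch as partitioned, compressed Parquet on S3) boils down to roughly this; the bucket name, schema, and in-memory buffer are placeholders, and credential setup is omitted:

```python
from datetime import datetime
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")
con.sql("LOAD httpfs")  # S3 support; credentials via CREATE SECRET or env vars

# Stand-in for the rows an ingestion endpoint has buffered up.
buffered_rows = [
    (datetime(2024, 6, 1, 12, 0, 0), 1, "click"),
    (datetime(2024, 6, 1, 12, 0, 1), 2, "view"),
]
con.sql("CREATE TABLE buffer (ts TIMESTAMP, user_id BIGINT, event VARCHAR)")
con.executemany("INSERT INTO buffer VALUES (?, ?, ?)", buffered_rows)

# One COPY lands the whole batch as Hive-partitioned, zstd-compressed Parquet.
con.sql("""
    COPY (SELECT *, CAST(ts AS DATE) AS dt FROM buffer)
    TO 's3://my-ingest-bucket/events'
    (FORMAT parquet, COMPRESSION zstd, PARTITION_BY (dt))
""")
```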
I think you touched on the real reason: we separate services now and they all have different databases. We shifted up to the application layer not for technical reasons but for social ones.
Maybe because claiming they are bad without explaining why, giving explicit examples, or really digging into the nuance of the multiple features mentioned makes the claim feel unsubstantiated.
I feel like I addressed this in a different subthread: https://news.ycombinator.com/reply?id=40526231&goto=threads%...
> You can source control them, but then you have a deployment coordination problem. Often changes to business logic will be half in application code and half in the sproc, and you have to coordinate the deployment of that change. You have to somehow reason about what version of the sprocs is on your server. Add on that, usually in environments that use sprocs, changes to them are gatekept by DBAs. Just write your business logic in application code.
It makes me think that for many projects, recutils would be a better fit if you’re not gonna use the advanced functionality anyway. It should do just fine on medium-sized DBs and is so much simpler.
Yep, that's me. I use SQLite as a key-value store. Usually just two columns per table, a key and a value, that I map to a Python dict, and I pickle the value if I need something more complex.
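That pattern is only a handful of lines; a minimal sketch, with table and key names chosen arbitrarily:

```python
import pickle
import sqlite3

db = sqlite3.connect("cache.db")
db.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")

def put(key, obj):
    # Pickle anything non-trivial so arbitrary objects fit in one value column.
    db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, pickle.dumps(obj)))
    db.commit()

def get(key, default=None):
    row = db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return pickle.loads(row[0]) if row else default

put("user:42", {"name": "Ada", "tags": ["admin", "beta"]})
print(get("user:42"))
```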
> Implementing constraints in application code is certainly a lot easier
I can’t imagine doing this. You're basically saying that you will only ever have one program writing to your database.
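To make the point concrete, here is a toy example (schema invented for illustration) of a database-level constraint rejecting bad data no matter which program does the writing:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
db.execute("CREATE TABLE transfers (id INTEGER PRIMARY KEY, "
           "account_id INTEGER NOT NULL REFERENCES accounts(id))")
db.execute("INSERT INTO accounts VALUES (1, 100)")

# Any writer -- your app, a batch job, someone in a SQL console -- hits the same wall.
try:
    db.execute("UPDATE accounts SET balance = -50 WHERE id = 1")
except sqlite3.IntegrityError as e:
    print("rejected by the database:", e)

try:
    db.execute("INSERT INTO transfers VALUES (1, 999)")  # no such account
except sqlite3.IntegrityError as e:
    print("rejected by the database:", e)
```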
I generate my PL/pgSQL using Lisp, which is another technology that has been dying for decades. The money it generates seems to work just as well as the money I earned writing JS.
I don’t get this mentality; they're just code that lives close to your data and can do bulk processing on data in your database way more efficiently than anything living in the application layer.
The approach in the article is for analytics / DW type use cases, not OLTP. So a dig about kids these days not understanding constraints is not particularly relevant.
The reason a lot of more advanced database features aren't used is simply that a lot of (most?) web developers don't know them and have a very light understanding of databases in general.
The article shows dynamically fetching a database, with database-level views that dynamically fetch data.
You're just showing a query that dynamically fetches data.
I think what was implied was that the user just references the view, so as long as you update the view in concert with the data format change, the user is none the wiser about the change.
Yeah, I guess you could equivalently put a SQLite database with a view or virtual table in S3, which would give you the same level of indirection and API abstraction provided by this mechanism.
Where DuckDB will have an advantage is in the processing speed for OLAP-type queries. I don't know what the current state of SQLite virtual tables for Parquet files on S3 is, but DuckDB has a number of optimisations for it, like only reading the required columns and row groups through range queries. SQLite has a row-oriented processing model, so I suspect that it wouldn't do that in the same way unless there is a specific vtable extension for it. You can get a comparable benefit for data in a SQLite DB itself with the project from the following blog post, but that wouldn't apply to collections of Parquet files: https://phiresky.github.io/blog/2021/hosting-sqlite-database...
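To make the DuckDB side concrete: a query like the one below only fetches the referenced columns, and skips row groups whose min/max statistics rule out the filter, via HTTP range requests (bucket and column names are invented):

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")
con.sql("LOAD httpfs")

# Only the 'country' and 'amount' columns are read, and row groups whose
# statistics show no rows with country = 'NZ' are never downloaded.
con.sql("""
    SELECT country, SUM(amount) AS total
    FROM read_parquet('s3://my-bucket/sales/*.parquet')
    WHERE country = 'NZ'
    GROUP BY country
""").show()
```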
For things like this I just export to a SQLite DB file and send that along. This is interesting for AWS/S3 datastores though.
Sure, but the OP is using DuckDB as a way to export some smaller amount of data for someone. That's a great use case for SQLite, from whatever your real DB is, which for the OP is also DuckDB.
I admit I'm not crazy deep in this space, but I'm _pretty_ into this space and I've never heard of Carbon.
Parquet is kind of winning the OSS columnar format race right now.
| "Good enough" makes it sound like barely a step up from a CSV file. I'd say its support for various encodings [1] including a great default (dictionary + run length encoding on the indices) and compression algorithms that can be set for each individual column, columnar access, partitioning, a parallelized reader out of the box, in-memory filtering and other ops concurrently with loading in the data (thanks to Arrow) etc. etc. are all really wonderful when working with medium-sized data.
[1] https://parquet.apache.org/docs/file-format/data-pages/encod... |
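For instance, pyarrow lets you pick dictionary encoding and a compression codec per column at write time; a small sketch with invented column names:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "country": ["NZ", "NZ", "AU", "NZ"],   # low cardinality: dictionary-encodes well
    "amount": [10.5, 3.2, 7.7, 1.1],
    "note": ["a", "b", "c", "d"],
})

pq.write_table(
    table,
    "sales.parquet",
    use_dictionary=["country"],  # dictionary encoding only where it pays off
    compression={"country": "zstd", "amount": "snappy", "note": "zstd"},
)
```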
DuckDB has Swift bindings, but unfortunately, afaik, nothing official for Android. If anyone has gotten it working on Android, I'd love to hear about it.
Our team is currently building a form builder SaaS. Most forms have fewer than 1,000 responses, but some have more than 50,000.
So, when a user tries to explore all responses in our “response sheet” feature, they would usually be loaded via infinite scrolling (load as they scroll).
This uses up to 100 MB of network traffic in total if they have to fetch object arrays of 50,000 rows with 50 columns.
That was where DuckDB kicked in: just store the responses in S3 as a Parquet file (in our case Cloudflare R2).
Then, load the whole file into duckdb-wasm on the client. So when you scroll through the sheet, instead of getting rows from the server, you query rows from the local DB.
This made our sheet feature very efficient and consistent in terms of speed and memory usage.
If network speed and memory are your bottleneck when loading “medium” data into your client, you should definitely give it a try.
PS. If you have any questions, feel free to ask!
PS. Our service is called Walla, check it out at https://home.walla.my/en
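The same pattern, sketched with DuckDB's Python client rather than duckdb-wasm (the Parquet URL and column names are placeholders; in the browser you register the file with duckdb-wasm and run identical SQL):

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")
con.sql("LOAD httpfs")

# Pull the form's responses once from object storage (R2 is S3-compatible),
# then every scroll or filter is a local query instead of a server round trip.
con.sql("""
    CREATE TABLE responses AS
    SELECT * FROM read_parquet('https://example.r2.dev/forms/form_123/responses.parquet')
""")

page = con.sql("""
    SELECT * FROM responses
    ORDER BY submitted_at
    LIMIT 100 OFFSET 400   -- one 'infinite scroll' page, served locally
""").fetchall()
```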