Not surviving more than 2 weeks in a QF role because of kdb, and then suggesting they should rewrite everything in LISP, is one of the more HN-level recidivous comments I think I have ever seen.
Not real time, just historical. (I don't see why it can't be used for real time though... but haven't thought through the caveats.)

Also, not sure what you mean by Parquet not being good at appending? On the contrary, Parquet is designed for an append-only paradigm (like Hadoop back in the day). You can just drop a new Parquet file and it's appended. If you have 1.parquet, all you have to do is drop 2.parquet in the same folder or Hive hierarchy, then query.

DuckDB automatically scans all the Parquet files in that directory structure when it queries. If there's a predicate, it uses Parquet header information to skip files that don't contain the data requested, so it's very fast. In practice we use a directory structure called Hive partitioning, which helps DuckDB do partition elimination to skip over irrelevant partitions, making it even faster: https://duckdb.org/docs/data/partitioning/hive_partitioning

Parquet is great for appending! It's not so good at updating, though, because it's a write-once format (not read-write). Updating a single record in a Parquet file entails regenerating the entire file. So if you have late-arriving updates, you need to do extra work to identify the partition involved and overwrite it. Either that, or use bitemporal modeling (add a data-arrival timestamp [1]) and put a latest-date clause in your query (entailing more compute).

If you have a scenario where existing data changes a lot, Parquet is not a good format for you. You should look into Timescale (a time-series database built on Postgres).
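To make the append-then-query workflow concrete, here is a minimal sketch assuming a local `ticks/` directory laid out as `ticks/trade_date=YYYY-MM-DD/*.parquet` and the Python `duckdb`, `pandas`, and `pyarrow` packages; the column names and layout are invented for illustration:

```python
# Minimal sketch: "appending" is just writing another Parquet file into the
# partitioned directory layout, then querying the whole directory with DuckDB.
import os
import duckdb
import pandas as pd

# Drop a new file into an existing (hypothetical) Hive-partitioned layout.
os.makedirs("ticks/trade_date=2024-05-01", exist_ok=True)
new_rows = pd.DataFrame({
    "sym": ["AAPL", "MSFT"],
    "price": [189.5, 410.2],
    "ts": pd.to_datetime(["2024-05-01 09:30:00", "2024-05-01 09:30:01"]),
})
new_rows.to_parquet("ticks/trade_date=2024-05-01/2.parquet", index=False)

con = duckdb.connect()

# DuckDB scans every Parquet file matching the glob; hive_partitioning=true
# turns the trade_date=... folder names into a queryable column, so the WHERE
# clause lets it skip whole partitions (and Parquet footer stats let it skip
# individual files).
df = con.execute("""
    SELECT sym, avg(price) AS avg_price
    FROM read_parquet('ticks/*/*.parquet', hive_partitioning = true)
    WHERE trade_date = '2024-05-01'
    GROUP BY sym
""").df()
print(df)
```

For the late-arrival/bitemporal case, one common pattern is to keep every version of a row and filter to the newest one at query time, e.g. a `row_number() OVER (PARTITION BY sym, ts ORDER BY arrival_ts DESC) = 1` style filter over the same scan, which avoids rewriting files at the cost of extra compute per query.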
It's not a good filter in that case. I can learn obscure languages just fine, but that doesn't make me any more pleasant to hang out with.
I agree that being able to write one piece of code that solves your use case is a big benefit over having to cobble together a message queue, stream processor, database, query engine, etc.

We've been playing around with the idea of building such an integration layer in SQL on top of open-source technologies like Kafka, Flink, Postgres, and Iceberg, with some syntactic sugar to make timeseries processing nicer in SQL: https://github.com/DataSQRL/sqrl/

The idea is to give you the power of kdb+ with open-source technologies and SQL in an integrated package by transpiling SQL, building the computational DAG, and then running a cost-based optimizer to "cut" the DAG to the underlying data technologies.
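Purely as an illustration of the DAG-cutting idea (this is not DataSQRL's actual code or API; the operators, engines, and cost numbers below are invented), a toy cost-based "cut" might look like this:

```python
# Toy sketch of assigning DAG operators to execution engines by cost.
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    needs_streaming: bool   # must run continuously on incoming events
    cost: dict              # hypothetical per-engine cost estimate

# A made-up pipeline: ingest ticks -> windowed aggregate -> serve API queries.
dag = [
    Op("ingest_ticks",   True,  {"flink": 1, "postgres": float("inf")}),
    Op("minute_vwap",    True,  {"flink": 3, "postgres": 10}),
    Op("top_movers_api", False, {"flink": float("inf"), "postgres": 2}),
]

def cut_dag(dag):
    """Greedy 'cut': streaming-only ops go to the stream processor, everything
    else to whichever engine is cheapest for that operator."""
    plan = {}
    for op in dag:
        candidates = {"flink"} if op.needs_streaming else op.cost.keys()
        plan[op.name] = min(candidates, key=lambda e: op.cost[e])
    return plan

print(cut_dag(dag))
# {'ingest_ticks': 'flink', 'minute_vwap': 'flink', 'top_movers_api': 'postgres'}
```

A real optimizer would also have to weigh data-movement costs between engines and which operators each engine can actually execute, not just per-operator costs.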
Point 3 is lost on people who use Q and related things for financial calculations. They picked kdb+ for a reason, and it wasn't the database. I took that as the point of the post.
Is it still possible to learn from scratch and make big bucks developing for kdb+ (k/q)? I remember seeing an open position a few years ago which paid like 1MM per year. Astounding.
It's also a column store, with compression. Runs super fast, I've used it in a couple of financial applications. Huge amounts of tick data, all coming down to your application nearly as fast as the hardware will allow.
Good support, the guys on Slack are responsive. No, I don't have shares in it, I just like it.
Regarding kdb, I've used it, but there are significant drawbacks. Costs a bunch of money, that's a big one. And the language... I mean it's nice to nerd out sometimes with a bit of code golf, but at some point you are going to snap out of it and decide that single characters are not as expressive as they seem.
If your thing is ad-hoc quant analysis, then maybe you like kdb. You can sit there and type little strings into the REPL all day in order to find money. But a lot of things are more like cron jobs, you know you need this particular query run on a schedule, so just turn it into something legible that the next guy will understand and maintain.