Comment by faassen - Hacker Neue

faassen Mar 28, 2025 parent

I myself actually had no idea succinct data structures existed until last December , but then I found a paper that used them in the context of XML. Just to be clear: it's 120% of the original size; as it stands this library still uses more memory than the original document, just not a lot of overhead. Normal tree libraries, even if the tree is immutable, take a parent pointer, and a first child pointer and next and previous sibling pointers per node. Even though some nodes can be stored more compactly it does add up.

I suspect with the right FM-Index Xoz might be able to store huge documents in a smaller size than the original, but that's an experiment for the future.

lambda Mar 28, 2025

Would you be able to parse it in a streaming fashion and just store the structure of the document in memory, with just offsets for all of the string locations, and then re-read those from disk as needed?

With modern SSDs and disk cache, that's likely enough to be plenty performant without having to store the whole document in memory at once.

This item has no comments currently.