Would you be able to parse it in a streaming fashion and just store the structure of the document in memory, with just offsets for all of the string locations, and then re-read those from disk as needed?
With modern SSDs and disk cache, that's likely enough to be plenty performant without having to store the whole document in memory at once.
I suspect with the right FM-Index Xoz might be able to store huge documents in a smaller size than the original, but that's an experiment for the future.