38/60 Days System Design Questions

Your LLM has 128K tokens. Your document has 150K words. Something has to give. What do you do? A) Chunk the document into fixed-size pieces and embed each one — retrieve the top-k at query time. B) Use a sliding window — process the document in overlapping chunks, stitch the outputs together. C) Summarize each section progressively — feed the running summary forward as context. D) Truncate to the most recent tokens and hope the answer is near the end. Three of these are real strategies teams shi