Four Models in One Training Loop: Architecting SDAR on AWS (Before Renting a Single GPU)

Recap. In Part 1 we landed on the core idea of SDAR (arXiv:2605.15155): keep RL as the backbone, bolt on a privileged teacher for dense token-level guidance, and put a sigmoid gate between them so the student amplifies the teacher's confident advice and softens its noisy rejections. We also said the quiet part out loud: this is not a Bedrock fine-tuning checkbox. This part is the blueprint: the whole system on one diagram, mapped to AWS services, with the memory math that picks your instance