Abstract
Large Language Models (LLMs) are being applied in a wide array of settings, well beyond typical language-oriented use cases. In particular, LLMs are increasingly used as a plug-and-play method for generating predictions on tabular data. Prior work has shown that LLMs, via in-context learning or supervised fine-tuning, perform comparably to many tabular supervised learning techniques. However, we identify a critical vulnerability of using LLMs for tabular prediction: making changes t