Inner speech (IS), or imagined speech without overt articulation, is a promising target for brain-computer interfaces (BCIs) aimed at restoring communication in individuals with severe speech impairments, such as those caused by locked-in syndrome. Foundation models (FMs), typically trained with self-supervised learning (SSL) on large-scale datasets, offer new opportunities for learning transferable and robust representations from neural signals. This mini-review provides an overview of FM-based approaches for