ArkTS code generation: A comprehensive evaluation with large language models
This paper presents the first systematic evaluation of ArkTS code generation with large language models. The benchmark uses 300 prompts across three difficulty levels and measures Pass@1, compilation rate, and generation time in milliseconds. Compiler messages are mapped into four failure categories: syntax, type, undefined reference, and other. An independent LLM judge with a fixed scoring rule complements these automatic metrics. We evaluate 21 models and find that functional correctness stays low and compilation varies widely, si

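As a rough illustration of how the reported metrics fit together, the sketch below aggregates per-prompt results into Pass@1, compilation rate, mean generation time, and failure counts. It is only a sketch under stated assumptions: the `RunResult` shape, the `classifyFailure` keyword rules, and the choice of one generated sample per prompt are all hypothetical and are not taken from the paper, whose actual mapping rules are not given here.

```typescript
// Failure categories named in the abstract.
type FailureKind = "syntax" | "type" | "undefined_reference" | "other";

// Hypothetical per-prompt record; field names are illustrative only.
interface RunResult {
  compiled: boolean;        // did the generated ArkTS code compile?
  passed: boolean;          // did the single sample pass its functional tests?
  generationMs: number;     // generation time in milliseconds
  compilerMessage?: string; // first compiler message, if compilation failed
}

// Assumed keyword rules for mapping a compiler message to a category.
// A real mapping would likely key on compiler diagnostic codes instead.
function classifyFailure(message: string): FailureKind {
  const m = message.toLowerCase();
  if (/cannot find name|is not defined/.test(m)) return "undefined_reference";
  if (/syntax error|unexpected token/.test(m)) return "syntax";
  if (/not assignable|type mismatch/.test(m)) return "type";
  return "other";
}

// Aggregate results, assuming exactly one sample per prompt, so that
// Pass@1 is the fraction of prompts whose sample compiles and passes.
function summarize(results: RunResult[]) {
  const n = results.length;
  const failures: Record<FailureKind, number> = {
    syntax: 0,
    type: 0,
    undefined_reference: 0,
    other: 0,
  };
  let compiled = 0;
  let passed = 0;
  let totalMs = 0;
  for (const r of results) {
    if (r.compiled) {
      compiled++;
      if (r.passed) passed++;
    } else {
      failures[classifyFailure(r.compilerMessage ?? "")]++;
    }
    totalMs += r.generationMs;
  }
  return {
    passAt1: n > 0 ? passed / n : 0,
    compilationRate: n > 0 ? compiled / n : 0,
    meanGenerationMs: n > 0 ? totalMs / n : 0,
    failures,
  };
}

// Made-up sample data, for demonstration only.
const sample: RunResult[] = [
  { compiled: true, passed: true, generationMs: 100 },
  { compiled: true, passed: false, generationMs: 200 },
  { compiled: false, passed: false, generationMs: 300, compilerMessage: "Cannot find name 'foo'." },
  { compiled: false, passed: false, generationMs: 400, compilerMessage: "Type 'string' is not assignable to type 'number'." },
];
const summary = summarize(sample);
```

Keeping the classifier separate from the aggregation makes it straightforward to swap the keyword rules for a stricter mapping without touching the metric computation.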