Trusting ChatGPT? When a Subtle Variation in the Prompt Can Significantly Alter the Results
Jaime E. Cuellar·Juan Guillermo Torres Hurtado·Jaime Andres Pavlich-Mariscal·Óscar Moreno-Martínez·Paula Sofía Torres Rodríguez·Andrés Felipe Micán Castiblanco
How much can we trust highly complex predictive models like ChatGPT? This study tests the hypothesis that subtle changes in prompt structure do not produce significant variations in the sentiment polarity classifications generated by the LLM GPT-4o mini. The model classified 100,000 Spanish-language comments about four Latin American presidents as positive, negative, or neutral on 10 occasions, with the prompt varied each time. The experimental methodology included exploratory and confirmatory analyses
