Evaluating the consistency between human raters and three AI systems on the scoring of argumentative essays