Automatic generation of Labanotation based on a hybrid transformer–LSTM network with multi-scale spatio-temporal features