An audio-video based multi-modal fusion approach for emotion recognition