🎭 Multimodal Emotion Recognition
This system predicts the emotion expressed in a video by automatically extracting and analyzing three modalities:
- 🎤 Audio (extracted from the video)
- Text (transcribed from the audio using Whisper)
- 🎥 Video (visual frames)
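A minimal preprocessing sketch of the three extraction steps, assuming the `ffmpeg` binary is on `PATH` and that transcription uses the `openai-whisper` package (the section names Whisper but not the package or model size). The helper names, the `base` model size, the mono 16 kHz WAV output, and the 16-frame uniform sampling are illustrative choices, not the project's confirmed settings. Only frame-index selection is shown; decoding the frames themselves is left out.

```python
import subprocess

# Wav2Vec2 checkpoints and Whisper both expect 16 kHz audio.
SAMPLE_RATE = 16_000


def ffmpeg_audio_cmd(video_path: str, wav_path: str, sample_rate: int = SAMPLE_RATE) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",            # overwrite the output file without prompting
        "-i", video_path,
        "-vn",                     # discard the video stream
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # resample to the target rate
        "-acodec", "pcm_s16le",    # 16-bit little-endian PCM
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> str:
    """Run ffmpeg (hypothetical helper); raises CalledProcessError on failure."""
    subprocess.run(ffmpeg_audio_cmd(video_path, wav_path), check=True, capture_output=True)
    return wav_path


def transcribe(wav_path: str, model_name: str = "base") -> str:
    """Transcribe speech, assuming the openai-whisper package and its 'base' model."""
    import whisper  # imported lazily so the other helpers work without it

    model = whisper.load_model(model_name)
    return model.transcribe(wav_path)["text"].strip()


def sample_frame_indices(total_frames: int, num_frames: int = 16) -> list[int]:
    """Pick evenly spaced frame indices, one from the centre of each equal segment."""
    if total_frames <= 0:
        return []
    num = min(num_frames, total_frames)
    step = total_frames / num
    return [int(step * i + step / 2) for i in range(num)]
```

For example, `sample_frame_indices(100, 4)` returns `[12, 37, 62, 87]`, spreading the sampled frames across the whole clip instead of taking only the first few.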
How to use:
- Upload a video file (MP4, AVI, MOV, etc.)
- Click "Predict Emotion"
- The system will automatically extract audio, transcribe speech, and analyze all modalities
The model then returns an emotion prediction that draws on all three inputs.
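The flow described above can be wired together roughly as follows. This is a hypothetical sketch: the `Pipeline` class and its fields are invented for illustration, each stage is passed in as a callable, and the model's output order is assumed to match the order of the supported emotions listed in the notes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Assumed output order of the classifier (taken from the supported-emotions list).
LABELS = ("Angry", "Happy", "Neutral", "Sad")


@dataclass
class Pipeline:
    """Hypothetical orchestration of the prediction steps."""
    extract_audio: Callable[[str], str]                         # video path -> WAV path
    transcribe: Callable[[str], str]                            # WAV path -> transcript
    load_frames: Callable[[str], Sequence]                      # video path -> sampled frames
    classify: Callable[[str, str, Sequence], Sequence[float]]   # (wav, text, frames) -> scores

    def predict(self, video_path: str) -> dict[str, float]:
        wav_path = self.extract_audio(video_path)   # step 1: audio track
        transcript = self.transcribe(wav_path)      # step 2: speech to text
        frames = self.load_frames(video_path)       # step 3: visual frames
        scores = self.classify(wav_path, transcript, frames)
        if len(scores) != len(LABELS):
            raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
        # Map each score to its emotion label.
        return {label: float(s) for label, s in zip(LABELS, scores)}
```

Injecting the stages keeps `predict` testable with stubs and lets one step (for example, a different Whisper model) be swapped without touching the others. The returned label-to-score dictionary is also the shape Gradio's `gr.Label` component accepts, should the interface be built with Gradio (the section does not say).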
Notes:
- Supported emotions: Angry, Happy, Neutral, Sad
- The model uses Wav2Vec2 for audio, BERT for text, and ResNet18 for video frames
- Works best with clear audio, intelligible speech (so that the Whisper transcript is accurate), and clearly visible faces
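The section does not say how the three models' outputs are combined. One common option is late fusion: each modality produces four class logits, which are converted to probabilities with a softmax and averaged using per-modality weights. The sketch below assumes that rule and the label order listed above; the actual model may instead concatenate the encoders' embeddings (768-dimensional for Wav2Vec2-base and BERT-base, 512-dimensional for ResNet18's pooled features) and train a single classifier on top.

```python
import math
from typing import Mapping, Optional, Sequence

# Assumed class order, matching the supported-emotions list.
LABELS = ("Angry", "Happy", "Neutral", "Sad")


def softmax(logits: Sequence[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    logits_by_modality: Mapping[str, Sequence[float]],
    weights: Optional[Mapping[str, float]] = None,
) -> dict[str, float]:
    """Weighted average of per-modality class probabilities (hypothetical fusion rule)."""
    weights = weights or {m: 1.0 for m in logits_by_modality}  # equal weights by default
    total_w = sum(weights[m] for m in logits_by_modality)
    if total_w <= 0:
        raise ValueError("modality weights must sum to a positive value")
    fused = [0.0] * len(LABELS)
    for modality, logits in logits_by_modality.items():
        if len(logits) != len(LABELS):
            raise ValueError(f"{modality}: expected {len(LABELS)} logits, got {len(logits)}")
        for i, p in enumerate(softmax(logits)):
            fused[i] += weights[modality] * p / total_w
    return dict(zip(LABELS, fused))
```

The weights let an unreliable modality count for less; for example, giving text a weight of 0 ignores the transcript of a clip with no speech, while the fused scores still sum to 1.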