🎭 Multimodal Emotion Recognition

This system predicts emotions from video by automatically extracting and analyzing:

  • 🎤 Audio (extracted from video)
  • 📝 Text (transcribed from audio using Whisper)
  • 🎥 Video (visual frames)

How to use:

  1. Upload a video file (MP4, AVI, MOV, etc.)
  2. Click "Predict Emotion"
  3. The system will automatically extract audio, transcribe speech, and analyze all modalities

The model then returns emotion predictions based on all three modalities: audio, text, and video.
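The extraction and transcription steps above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: it assumes `ffmpeg` is installed on the system and the `openai-whisper` package is available, and the helper names (`build_ffmpeg_audio_cmd`, `extract_audio`, `transcribe`), the 16 kHz mono format, and the `"base"` model size are all hypothetical choices made for the example.

```python
import subprocess
from pathlib import Path

# Assumption: 16 kHz mono audio, a common input rate for speech models.
SAMPLE_RATE = 16_000


def build_ffmpeg_audio_cmd(video_path: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes mono WAV."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", video_path,          # input video
        "-vn",                     # ignore the video stream
        "-ac", "1",                # downmix to a single channel
        "-ar", str(SAMPLE_RATE),   # resample to SAMPLE_RATE
        wav_path,
    ]


def extract_audio(video_path: str) -> str:
    """Run ffmpeg and return the path of the extracted WAV file."""
    wav_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(build_ffmpeg_audio_cmd(video_path, wav_path), check=True)
    return wav_path


def transcribe(wav_path: str, model_size: str = "base") -> str:
    """Transcribe speech with Whisper (model size is an illustrative choice)."""
    import whisper  # openai-whisper, imported lazily so the helpers above work without it

    model = whisper.load_model(model_size)
    return model.transcribe(wav_path)["text"].strip()
```

With these helpers, `transcribe(extract_audio("clip.mp4"))` would yield the text that feeds the text branch, while the WAV file and the video frames feed the audio and video branches.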


📌 Notes:

  • Supported emotions: Angry, Happy, Neutral, Sad
  • The model uses Wav2Vec2 (audio), BERT (text), and ResNet18 (video)
  • Results are best with clear audio, accurate transcripts, and visible faces
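
How the three branches are combined is not stated here. As one possible illustration, the sketch below uses simple late fusion: each branch produces four logits, these are turned into probabilities with a softmax, and the probabilities are averaged. The averaging strategy, the label order, the function names, and the example logits are all assumptions made for this sketch, not a description of the actual model.

```python
import math

# Assumption: label order follows the list of supported emotions above.
EMOTIONS = ["Angry", "Happy", "Neutral", "Sad"]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(per_modality_logits):
    """Average the per-modality probability vectors (late fusion)."""
    probs = [softmax(logits) for logits in per_modality_logits.values()]
    return [sum(p[i] for p in probs) / len(probs) for i in range(len(EMOTIONS))]


def predict(per_modality_logits):
    """Return the most likely emotion and its fused probability."""
    fused = fuse(per_modality_logits)
    best = max(range(len(EMOTIONS)), key=fused.__getitem__)
    return EMOTIONS[best], fused[best]


# Made-up logits, for illustration only.
example = {
    "audio": [0.1, 2.0, 0.3, 0.2],   # e.g. from a Wav2Vec2 branch
    "text":  [0.0, 1.5, 0.5, 0.1],   # e.g. from a BERT branch
    "video": [0.2, 1.0, 0.9, 0.0],   # e.g. from a ResNet18 branch
}
label, confidence = predict(example)
print(label, round(confidence, 3))
```

Averaging probabilities keeps each branch's contribution equal; a trained fusion layer over concatenated features is a common alternative when the branches differ in reliability.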