This project investigates the performance of EfficientNet-B0 on the long-tailed CIFAR-10-LT dataset and explores techniques to mitigate class imbalance. The baseline model achieves an overall accuracy of 84.23% but exhibits a significant imbalance between head and tail classes, with high recall but low precision for head classes and low recall but high precision for tail classes. To address this, six strategies are implemented: Reweight, and decoupled training with five advanced loss functions (Modified Cross-Entropy, Balanced Softmax, Focal Loss, Class-Balanced Loss, and Class-Balanced Focal Loss).
Decoupled training separates feature representation learning from classifier training, allowing targeted adjustments at the classifier level. Performance is evaluated using accuracy, F1-score, precision, and recall. Results show that Reweight improves tail-class recall but lowers head-class recall, whereas decoupling combined with advanced loss functions significantly improves both precision and recall across head and tail classes, with Modified Cross-Entropy achieving the highest overall accuracy of 93.78%. This project demonstrates that decoupled training with advanced loss functions effectively addresses the class imbalance problem in CIFAR-10-LT and improves the classification accuracy of EfficientNet-B0.
The project focuses on reproducing the EfficientNet-B0 baseline on the CIFAR-10-LT dataset, addressing the dataset's class imbalance problem, and improving the baseline model's classification accuracy. The baseline model source code was provided, with the following fixed training hyperparameters (a configuration sketch follows the list):
Learning rate: 1e-4
Weight decay: 0.9999
Number of epochs: 30
Batch size: 128
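As a point of reference, the sketch below shows how these fixed settings might be wired into a PyTorch training setup. The provided source code is not reproduced here, so the use of the Adam optimizer and of torchvision's efficientnet_b0 constructor are assumptions for illustration only; the hyperparameter values are the ones listed above.

```python
import torch
from torchvision.models import efficientnet_b0

# Fixed hyperparameters from the provided baseline configuration
LEARNING_RATE = 1e-4
WEIGHT_DECAY = 0.9999   # value as given in the provided settings
NUM_EPOCHS = 30
BATCH_SIZE = 128

model = efficientnet_b0(num_classes=10)   # CIFAR-10-LT has 10 classes
optimizer = torch.optim.Adam(model.parameters(),
                             lr=LEARNING_RATE,
                             weight_decay=WEIGHT_DECAY)
criterion = torch.nn.CrossEntropyLoss()
```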
The EfficientNet-B0 baseline was reproduced on the CIFAR-10-LT dataset and achieved an overall accuracy of 84.23%. While this looks reasonably strong at first, the per-class metrics reveal substantial class imbalance: some classes are far harder for the model than others. Figure 1 shows the confusion matrix for the baseline model; squares with a darker blue shade indicate higher recall (around 90%), while lighter squares indicate lower recall (around 70%).
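Per-class recall can be read directly off the diagonal of the row-normalized confusion matrix. A minimal sketch using scikit-learn, assuming arrays of ground-truth and predicted class indices are already available:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_recall(y_true, y_pred):
    # Diagonal of the row-normalized confusion matrix = recall per class
    cm = confusion_matrix(y_true, y_pred)
    return cm.diagonal() / cm.sum(axis=1)

# Toy check: class 0 fully recovered, class 1 half recovered
print(per_class_recall(np.array([0, 0, 1, 1]),
                       np.array([0, 0, 1, 0])))   # -> [1.0, 0.5]
```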
The Reweight technique uses class-balanced sampling so that, during training, each class is sampled with roughly equal probability regardless of how many samples it actually has, preventing the model from being biased toward head classes. To achieve this, the modified model extracted all class labels from the dataset and used a Counter to calculate how many samples belonged to each class.
It then assigned each class a weight inversely proportional to its frequency, meaning that tail classes received higher weights.
Every sample in the dataset was assigned its corresponding class weight, and the model used torch.utils.data.WeightedRandomSampler to sample data points based on these weights, thereby improving the representation of tail classes during training.
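A minimal sketch of this sampling scheme is shown below, assuming a dataset object (here called train_dataset, a hypothetical name) that exposes its labels through a targets attribute, as torchvision's CIFAR-10 dataset does:

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

labels = list(train_dataset.targets)        # assumed label attribute
class_counts = Counter(labels)

# Weight each class inversely to its frequency, then assign
# every sample the weight of its class.
class_weights = {cls: 1.0 / n for cls, n in class_counts.items()}
sample_weights = torch.tensor([class_weights[y] for y in labels],
                              dtype=torch.double)

# Draw samples with replacement, with probability proportional to
# their weights, so tail classes appear about as often as head classes.
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels),
                                replacement=True)
train_loader = DataLoader(train_dataset, batch_size=128, sampler=sampler)
```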
The EfficientNet-B0 with the Reweight technique achieved an overall accuracy of 85.82%, an increase of 1.59% over the baseline, and a weighted-average F1-score of 85.84%, an increase of 1.66%. The normalized confusion matrix for the Reweight model is shown in Figure 2, and the corresponding performance metrics in Table 2. Comparing Figures 1 and 2 shows that the Reweight technique mitigates the class imbalance by increasing recall for several tail classes, visible as darker shading in the last few diagonal squares.
To reduce the effect of class imbalance and separate representation learning from classification bias, a two-stage decoupled training strategy was adopted. The idea is to first train the model end-to-end to learn strong visual representations, and then freeze the backbone while retraining only the classifier using different advanced loss functions.
The decoupled training strategy separates the processes of representation learning and classifier learning to mitigate bias introduced by long-tailed data distributions. In standard training, the backbone (feature extractor) and classifier are optimized simultaneously using cross-entropy loss. However, in imbalanced datasets, head classes with more samples dominate the gradient updates, causing the model to develop biased decision boundaries that favor these overrepresented classes. To address this, decoupled training divides the learning process into two stages: feature learning and classifier training.
In Stage 1 (Representation Learning), both the backbone and the classifier head are trained together using the standard cross-entropy loss. This stage focuses on extracting visual features from all categories to build a strong, general feature representation for each class. Standard cross-entropy is used so that every sample contributes equally, allowing the model to learn discriminative features across all categories even when the dataset is long-tailed. Because the total budget was limited to 30 epochs, Stage 1 was trained for 20 epochs. Once the backbone had learned strong representations, it was frozen to prevent further updates, and only the classifier layer was retrained for the remaining 10 epochs using the various advanced loss functions, as sketched below.
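A sketch of Stage 1 and the subsequent freeze, reusing the model, optimizer, and train_loader names from the earlier snippets and assuming torchvision's EfficientNet layout, where model.features is the backbone and model.classifier the head:

```python
# Stage 1: train backbone and classifier end-to-end with standard
# cross-entropy for 20 of the 30 available epochs.
criterion = torch.nn.CrossEntropyLoss()
model.train()
for epoch in range(20):
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()

# Freeze the backbone so Stage 2 updates only the classifier head.
for param in model.features.parameters():
    param.requires_grad = False
```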
In Stage 2 (Classifier Re-training, or the Decoupling Phase), the backbone is frozen and only the classifier is re-trained using the various advanced loss functions. This stage recalibrates the classifier's decision boundaries without altering the learned feature representations, effectively decoupling representation learning from classification. Freezing the backbone ensures that the learned representations remain stable and unbiased by class frequency, while the classifier is optimized independently to correct the imbalance absorbed in the first stage.
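As one illustration of Stage 2, the sketch below retrains only the classifier with Balanced Softmax, one of the five advanced losses listed earlier: it shifts each logit by the log of its class prior before applying cross-entropy, and the other losses slot in the same way by swapping the loss function. Whether Stage 2 reuses the balanced sampler or a plain loader is a design choice; this sketch simply reuses the earlier loader and the class_counts computed for the reweighting step.

```python
import torch
import torch.nn.functional as F

# Balanced Softmax: add the log class priors to the logits before
# cross-entropy, counteracting the label-frequency bias.
counts = torch.tensor([class_counts[c] for c in range(10)],
                      dtype=torch.float)
log_prior = torch.log(counts / counts.sum())

def balanced_softmax_loss(logits, targets):
    return F.cross_entropy(logits + log_prior, targets)

# Stage 2: optimize only the (unfrozen) classifier parameters.
clf_optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
model.train()
for epoch in range(10):                     # remaining 10 epochs
    for images, targets in train_loader:
        clf_optimizer.zero_grad()
        loss = balanced_softmax_loss(model(images), targets)
        loss.backward()
        clf_optimizer.step()
```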