This project investigates the performance of EfficientNet-B0 on the long-tailed CIFAR-10-LT dataset and explores techniques to mitigate class imbalance. The baseline model achieves an overall accuracy of 84.23% but shows a significant imbalance between head and tail classes, with high recall but low precision for head classes and low recall but high precision for tail classes. To address these issues, six strategies are implemented: Reweight, and decoupled training with five advanced loss functions (Modified Cross-Entropy, Balanced Softmax, Focal Loss, Class-Balanced Loss, and Class-Balanced Focal Loss).
The decoupled training separates feature representation learning from classifier training, allowing targeted adjustments at the classifier level. Performance is evaluated using accuracy, F1-score, precision, and recall. Results show that Reweight improves tail-class recall but lowers head-class recall, whereas decoupling combined with advanced loss functions significantly improves both precision and recall across head and tail classes, with Modified Cross-Entropy achieving the highest overall accuracy of 93.78%. This project demonstrates that decoupled training with advanced loss functions effectively addresses the class imbalance problem in the CIFAR10-LT dataset and improves the classification accuracy of EfficientNet-B0.
The project focuses on reproducing the EfficientNet-B0 model on the CIFAR10-LT dataset, addressing the class imbalance problem and improving the classification accuracy of the EfficientNet-B0 baseline model. The baseline model source code was provided, with fixed hyperparameters for training:
Learning rate: 1e-4
Weight decay: 0.9999
Number of epochs: 30
Batch size: 128
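A minimal sketch of how these fixed hyperparameters might be wired up in PyTorch. The optimizer is not specified in the provided source code, so Adam is an assumption here, and the tiny linear model merely stands in for EfficientNet-B0.

```python
import torch

# Fixed hyperparameters from the provided baseline source code.
LEARNING_RATE = 1e-4
WEIGHT_DECAY = 0.9999   # as given in the report
NUM_EPOCHS = 30
BATCH_SIZE = 128

model = torch.nn.Linear(8, 10)   # stand-in for EfficientNet-B0

# Assumption: Adam optimizer; the baseline code may use a different one.
optimizer = torch.optim.Adam(
    model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY
)
```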
The EfficientNet-B0 was reproduced using the CIFAR10-LT dataset, where the baseline model achieved an overall accuracy of 84.15%. While this looks reasonably strong at first, the per-class metrics reveal substantial class imbalance and uneven performance. Figure 1 shows the confusion matrix for the baseline model. Squares with a darker blue shade indicate higher recall (around 90%), while those with a lighter shade indicate lower recall (around 70%).
This ensures that during training, each class is sampled with equal probability regardless of how many samples it actually has, preventing the model from being biased toward head classes. To achieve this, the modified model extracted all class labels from the dataset and used a Counter to calculate how many samples belonged to each class.
It then assigned each class a weight inversely proportional to its frequency, meaning that tail classes received higher weights.
Every sample in the dataset was assigned its corresponding class weight, and the model used torch.utils.data.WeightedRandomSampler to sample data points based on these weights, thereby improving the representation of tail classes during training.
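The steps above can be sketched as follows. The toy label list stands in for the real CIFAR10-LT labels, and the commented-out DataLoader line shows where the sampler would be used; variable names are illustrative, not taken from the project code.

```python
from collections import Counter
from torch.utils.data import WeightedRandomSampler, DataLoader

# Toy label list standing in for the real CIFAR10-LT training labels.
labels = [1, 1, 1, 0, 2, 1]

counts = Counter(labels)                                  # samples per class
class_weights = {c: 1.0 / n for c, n in counts.items()}   # inverse frequency
sample_weights = [class_weights[y] for y in labels]       # weight per sample

# Sample indices in proportion to the per-sample weights, so tail-class
# samples are drawn more often despite being rarer.
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,
)
# loader = DataLoader(train_dataset, batch_size=128, sampler=sampler)
```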
The EfficientNet-B0 with the reweight technique achieved an overall accuracy of 85.82%, an increase of 1.59% over the baseline model, and a weighted average F1-score of 85.84%, an increase of 1.66% over the baseline model. The confusion matrix with the reweight technique is shown in Figure 2, and the performance metrics with reweighting are shown in Table 2. Comparing Figures 1 and 2 shows that the reweight technique mitigates the class imbalance by increasing recall in some of the tail classes, visible as darker shades in the last few squares.
To reduce the effect of class imbalance and separate representation learning from classification bias, a two-stage decoupled training strategy was adopted. The idea is to first train the model end-to-end to learn strong visual representations, and then freeze the backbone while retraining only the classifier using different advanced loss functions.
The decoupled training strategy separates the processes of representation learning and classifier learning to mitigate bias introduced by long-tailed data distributions. In standard training, the backbone (feature extractor) and classifier are optimized simultaneously using cross-entropy loss. However, in imbalanced datasets, head classes with more samples dominate the gradient updates, causing the model to develop biased decision boundaries that favor these overrepresented classes. To address this, decoupled training divides the learning process into two stages: feature learning and classifier training.
In Stage 1 (Representation Learning), both the backbone and the classifier head are trained together using the standard cross-entropy loss. This stage focuses on representation learning: extracting visual features from all categories to obtain a strong, general feature representation for each class. The standard cross-entropy is used to ensure that every sample contributes equally, allowing the model to learn discriminative features across all categories, even when the dataset is long-tailed. Note that due to the limited total budget of 30 epochs, Stage 1 was trained for 20 epochs. Once the backbone had learned strong representations, it was frozen to prevent further updates, and only the classifier layer was retrained for the remaining 10 epochs using various advanced loss functions.
In Stage 2 (Classifier Re-training, or the Decoupling Phase), the backbone is frozen and only the classifier is re-trained using various advanced loss functions. This stage recalibrates the classifier's decision boundaries without altering the learned feature representations, effectively decoupling representation learning from classification. Freezing the backbone ensures that the learned representations remain stable and unbiased by class frequency; the classifier is then optimized independently to recalibrate the decision boundaries, correcting for the imbalance learned in the first stage.
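The freezing step of Stage 2 can be sketched as below. It assumes a torchvision-style EfficientNet-B0 whose head is stored in a `classifier` attribute; only those parameters keep gradients, so the Stage 2 optimizer updates the classifier alone.

```python
import torch.nn as nn

def freeze_backbone(model: nn.Module) -> list:
    """Freeze all parameters except the classifier head.

    Returns the list of trainable parameters to hand to the
    Stage 2 optimizer. Assumes the head is `model.classifier`.
    """
    for p in model.parameters():
        p.requires_grad = False          # freeze everything first
    for p in model.classifier.parameters():
        p.requires_grad = True           # then unfreeze only the head
    return [p for p in model.parameters() if p.requires_grad]
```

In Stage 2 the optimizer would then be built as, e.g., `torch.optim.Adam(freeze_backbone(model), lr=1e-4)`, leaving the frozen backbone untouched.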
The modified cross-entropy loss extends the standard CE by introducing inverse-frequency class weighting to reduce the dominance of head classes during training. While standard CE assumes all samples contribute equally, the modified version reweights the loss to downplay head classes and emphasize tail classes. This was implemented by multiplying the standard CE loss by a class-dependent weighting factor wc, defined as inversely proportional to the number of samples in each class, which rebalances the gradient contribution between head and tail classes. The weights are then normalized to maintain the overall loss scale and passed to PyTorch's nn.CrossEntropyLoss to reweight the contribution of each class. This ensures that minority (tail) classes carry proportionally greater weight and prevents the classifier from being dominated by frequent categories during the classifier training stage, leading to more balanced decision boundaries.
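A minimal sketch of the weighting scheme just described. The normalization to mean 1 is one common way to preserve the overall loss scale; the exact normalization used in the project may differ.

```python
import torch
import torch.nn as nn

def weighted_ce(class_counts):
    """Build a CE loss with normalized inverse-frequency class weights.

    class_counts: per-class sample counts, head classes first.
    """
    counts = torch.tensor(class_counts, dtype=torch.float)
    weights = 1.0 / counts             # w_c inversely proportional to n_c
    weights = weights / weights.mean() # normalize so the mean weight is 1
    return nn.CrossEntropyLoss(weight=weights)
```

With counts [100, 10], the tail class ends up with a tenfold larger weight than the head class, while the average weight stays at 1.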
The EfficientNet-B0 model with the decoupling technique using a modified Cross-Entropy loss achieved a significant improvement over the baseline. The overall accuracy increased from 84.23% to 93.78%, a gain of 9.55%, while the weighted F1-score increased from 84.18% to 93.77%, demonstrating an overall improvement in both precision and recall.
Comparing Figures 1 and 3 demonstrates that the decoupling technique with modified Cross-Entropy substantially improves the recall across nearly all classes, with every class achieving at least 90% recall, except for the truck class.
This demonstrates that the model now predicts tail classes more confidently, capturing more true positives without sacrificing precision. The significant increase in recall for cats addresses prior misclassification with visually similar classes (such as dogs). The model also improved the F1-score from 78–85% to over 90% for all tail classes.
Balanced Softmax is a variant of the standard softmax designed to compensate for class imbalance in long-tailed datasets. Instead of treating all classes equally, it adjusts the logits by the logarithm of the per-class sample counts during training; when the adjustment is removed at inference, underrepresented classes effectively receive higher scores, without changing the backbone features.
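The logit adjustment can be sketched as below: the log class counts are added to the logits before the standard cross-entropy, so during training the model must produce larger raw logits for tail classes to achieve the same loss. The function name and signature are illustrative.

```python
import torch
import torch.nn.functional as F

def balanced_softmax_loss(logits, targets, class_counts):
    """Cross-entropy over logits shifted by the log class counts.

    logits: (batch, num_classes) raw classifier outputs.
    class_counts: per-class training sample counts n_c.
    """
    log_prior = torch.log(torch.tensor(class_counts, dtype=torch.float))
    # Head classes get a larger additive shift, so tail classes are
    # implicitly up-weighted relative to the standard softmax.
    return F.cross_entropy(logits + log_prior, targets)
```

At inference time the shift is dropped and the plain softmax over the raw logits is used, which is why this correction only affects training.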