A Submodular Optimization Framework for Imbalanced Text Classification with Data Augmentation