Undersampling: Case Studies of Flaviviral Inhibitory Activities

dc.contributor.author: Barigye, Stephen J.
dc.contributor.author: Vega, José Manuel García de la
dc.contributor.author: Castillo-Garit, Juan A.
dc.date.accessioned: 2023-02-09T11:02:11Z
dc.date.available: 2023-02-09T11:02:11Z
dc.date.issued: 2019
dc.description.abstract: Imbalanced datasets, comprising more inactive than active compounds, are a common challenge in ligand-based model building workflows for drug discovery. This is particularly true for neglected tropical diseases, since efforts to identify therapeutics for these diseases are often limited. In this report, we analyze the performance of several undersampling strategies in modeling the Dengue Virus 2 (DENV2) inhibitory activity, as well as the anti-flaviviral activities for the West Nile (WNV) and Zika (ZIKV) viruses. To this end, we build datasets comprising 1218 (159 actives and 1059 inactives), 1044 (132 actives and 912 inactives) and 302 (75 actives and 227 inactives) molecules with known DENV2, WNV and ZIKV inhibitory activity profiles, respectively. We develop ensemble classifiers for these endpoints and compare the performance of the different undersampling algorithms on external sets. It is observed that data pruning algorithms yield superior performance relative to data selection algorithms. The best overall performance is provided by the one-sided selection algorithm, with test set balanced accuracy (BACC) values of 0.84, 0.74 and 0.77 for the DENV2, WNV and ZIKV inhibitory activities, respectively. For the model building, we use the recently proposed GT-STAF information indices and compare the predictivity of 3 molecular fragmentation approaches: connected subgraphs, substructure and alogp atom types, which are observed to show comparable performance. On the other hand, a combination of indices based on these fragmentation strategies enhances the predictivity of the built ensembles. The built models could be useful for screening new molecules with possible DENV, WNV and ZIKV inhibitory activities. ADMET modelers are encouraged to adopt undersampling algorithms in their workflows when dealing with imbalanced datasets. (en_US)
dc.identifier.citation: Barigye, S. J., García de la Vega, J. M., & Castillo-Garit, J. A. (2019). Undersampling: case studies of flaviviral inhibitory activities. Journal of Computer-Aided Molecular Design, 33, 997-1008. https://doi.org/10.1007/s10822-019-00255-3 (en_US)
dc.identifier.uri: https://nru.uncst.go.ug/handle/123456789/7666
dc.language.iso: en (en_US)
dc.publisher: Journal of Computer-Aided Molecular Design (en_US)
dc.subject: Dengue virus (en_US)
dc.subject: West Nile virus (en_US)
dc.subject: Information index (en_US)
dc.subject: Zika virus (en_US)
dc.subject: Support vector machine (en_US)
dc.subject: Undersampling (en_US)
dc.title: Undersampling: Case Studies of Flaviviral Inhibitory Activities (en_US)
dc.type: Article (en_US)
Files
Original bundle
Name: Undersampling case studies of flaviviral inhibitory activities.pdf
Size: 794.06 KB
Format: Adobe Portable Document Format
Description: Undersampling: case studies of flaviviral inhibitory activities