Generative Adversarial Networks (GANs) Based Synthetic Sampling for Predictive Modeling

Abstract
In this report we evaluate the utility of Generative Adversarial Networks (GANs) in mapping the chemical structural space for molecular property profiles, with the goal of subsequently generating synthetic (artificial) samples for ligand-based molecular modeling. Two case studies are considered: BACE-1 (β-Secretase 1) and DENV (Dengue Virus) inhibitory activities, with the former focused on data populating and the latter on data balancing tasks. We train GANs using subsamples extracted from datasets for each bioactivity endpoint, and apply the trained networks to generate synthetic examples from the respective bioactivity chemical spaces. Original and synthetic samples are pooled together and used to build BACE-1 and DENV inhibitory activity classifiers, whose performance is evaluated over tenfold external validation sets. In both case studies, the obtained classifiers demonstrate satisfactory predictivity, with the former yielding accuracy (ACC) and Matthews correlation coefficient (MCC) values of 0.80 and 0.59, while the latter produces balanced accuracy (BACC) and MCC values of 0.81 and 0.70, respectively. Moreover, the statistics of these classifiers are compared with those of other models in the literature, demonstrating comparable to better performance. These results suggest that GANs may be useful in mapping the chemical space for molecular property profiles of interest, and thus allow for the extraction of synthetic examples for computational modeling.
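For readers unfamiliar with the three statistics quoted in the abstract, the following minimal sketch shows how ACC, BACC and MCC are conventionally derived from a binary confusion matrix. It is not the authors' code: the function name `classification_metrics` and the confusion-matrix counts are hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return ACC, BACC and MCC for a binary confusion matrix.

    tp, tn, fp, fn are counts of true positives, true negatives,
    false positives and false negatives (hypothetical inputs here).
    """
    def safe_div(num, den):
        # Return 0.0 rather than raising when a class is empty.
        return num / den if den else 0.0

    total = tp + tn + fp + fn
    acc = safe_div(tp + tn, total)

    sensitivity = safe_div(tp, tp + fn)  # recall on the active class
    specificity = safe_div(tn, tn + fp)  # recall on the inactive class
    bacc = (sensitivity + specificity) / 2

    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, denom)

    return {"ACC": acc, "BACC": bacc, "MCC": mcc}


# Illustrative counts only; these are not taken from the paper.
metrics = classification_metrics(tp=40, tn=40, fp=10, fn=10)
print(metrics)
```

Balanced accuracy averages the per-class recalls, which is why it is the natural choice for the data balancing (DENV) case, whereas plain accuracy is adequate when the classes are of similar size.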
Keywords
Generative Adversarial Network, β-Secretase, Dengue Virus, Machine Learning
Citation
Barigye, S. J., Garcia de la Vega, J. M., & Perez‐Castillo, Y. (2020). Generative adversarial networks (GANs) based synthetic sampling for predictive modeling. Molecular Informatics, 39(10), 2000086. https://doi.org/10.1002/minf.202000086