Browsing by Author "Marrero-Ponce, Yovani"
Now showing 1 - 18 of 18

Item Antiprotozoan lead discovery by aligning dry and wet screening: Prediction, synthesis, and biological assay of novel quinoxalinones (Bioorganic & medicinal chemistry, 2014)
Martins Alho, Miriam A.; Marrero-Ponce, Yovani; Barigye, Stephen J.; Meneses-Marcel, Alfredo
Protozoan parasites have been among the most significant public health problems for centuries, and several human infections caused by them have a massive global impact. Most of the current drugs used to treat these illnesses have been in use for decades and have many limitations, such as the emergence of drug resistance, severe side effects, low-to-medium efficacy, administration routes, and cost. These drugs have been largely neglected as models for drug development because they are mainly used in countries with limited resources and, as a consequence, scarce marketing possibilities. There is now a pressing need to identify and develop new drug-based antiprotozoan therapies. To address this problem, the main purpose of this study is to develop a QSAR-based ensemble classifier for antiprotozoan drug-like entities from a heterogeneous collection of compounds. Here, we use some of the TOMOCOMD-CARDD molecular descriptors and linear discriminant analysis (LDA) to derive individual linear classification functions that discriminate between antiprotozoan and non-antiprotozoan compounds, enabling the computational screening of virtual combinatorial datasets and/or already approved drugs. First, we construct a wide-spectrum benchmark database comprising 680 organic chemicals with great structural variability (254 antiprotozoan agents and 426 drugs with other clinical uses). This series of compounds was processed by k-means cluster analysis in order to design training and prediction sets. In total, seven discriminant functions were obtained by using the whole set of atom-based linear indices.
All the LDA-based QSAR models show accuracies above 85% in the training set, and the Matthews correlation coefficients (C) vary from 0.70 to 0.86. The external validation set shows fairly good global classifications of around 80% (92.05% for the best equation). Later, we developed a multi-agent QSAR classification system, in which the individual QSAR outputs are the inputs of the aforementioned fusion approach. Finally, the fusion model was used to identify a novel generation of lead-like antiprotozoan compounds through ligand-based virtual screening of ‘available’ small molecules (with synthetic feasibility) in our ‘in-house’ library. A new molecular subsystem (quinoxalinones) was then theoretically selected as a promising lead series, and its derivatives were subsequently synthesized, structurally characterized, and experimentally assayed by in vitro screening with a battery of five parasite-based assays. The chemicals 11(12) and 16 are the most active (hits) against apicomplexa (sporozoa) and mastigophora (flagellata) subphylum parasites, respectively. Both compounds showed good activity in every protozoan in vitro panel and did not show unspecific cytotoxicity on the host cells. The described technical framework seems to be a promising QSAR-classifier tool for the molecular discovery and development of novel classes of broad-spectrum antiprotozoan drugs, which may meet the dual challenges posed by drug-resistant parasites and the rapid progression of protozoan illnesses.

Item Derivatives in discrete mathematics: a novel graph-theoretical invariant for generating new 2/3D molecular descriptors. I.
Theory and QSPR application (Journal of computer-aided molecular design, 2012)
Marrero-Ponce, Yovani; Santiago, Oscar Martínez; Barigye, Stephen J.; Torrens, Francisco
In this report, we present a new mathematical approach for describing chemical structures of organic molecules at the atomic-molecular level, proposing for the first time the use of the concept of the derivative (∂) of a molecular graph (MG) with respect to a given event (E) to obtain a new family of molecular descriptors (MDs). For this purpose, a new matrix representation of the MG, which generalizes the traditional incidence matrix of graph theory, is introduced. This matrix, denominated the generalized incidence matrix, Q, arises from the Boolean representation of the molecular sub-graphs that participate in the formation of the molecular skeleton of the MG and could be complete (representing all possible connected sub-graphs) or constitute sub-graphs of determined orders or types, as well as a combination of these. The Q matrix is non-quadratic and unsymmetrical in nature; its columns (n) and rows (m) are the conditions (letters) and the collections of conditions (words) with which the event occurs. This non-quadratic and unsymmetrical matrix is transformed, by algebraic manipulation, into a quadratic and symmetric matrix known as the relations frequency matrix, F, which characterizes the participation intensity of the conditions (letters) in the events (words). With F, we calculate the derivative over a pair of atomic nuclei. The local index for the atomic nucleus i, Δi, can therefore be obtained as a linear combination of all the pair derivatives of the atomic nucleus i with all the other atomic nuclei j. Here, we also define new strategies that generalize the present form of obtaining global or local (group or atom-type) invariants from atomic contributions (local vertex invariants, LOVIs). In this respect, metric (norms), means and statistical invariants are introduced.
These invariants are applied to a vector whose components are the values Δi for the atomic nuclei of the molecule or its fragments. Moreover, to differentiate among different atoms, an atomic weighting scheme (atom-type labels) is used in the formation of the matrix Q or in LOVIs state. The obtained indices were used to describe the partition coefficient (Log P) and the reactivity index (Log K) of the 34 derivatives of 2-furylethylenes. In all cases, our MDs showed better statistical results than those previously obtained using some of the most used families of MDs in chemometric practice. Therefore, it has been demonstrated that the proposed MDs are useful in molecular design and permit obtaining simpler and more robust mathematical models than the majority of those reported in the literature. All these possibilities open “the doors” to the creation of a new family of MDs using the graph derivative, and provide a new tool for QSAR/QSPR and molecular diversity/similarity studies.

Item Discrete Derivatives for Atom-Pairs as a Novel Graph Theoretical Invariant for Generating New Molecular Descriptors: Orthogonality, Interpretation and QSARs/QSPRs on Benchmark Databases (Molecular Informatics, 2014)
Martínez-Santiago, Oscar; Marrero-Ponce, Yovani; Barigye, Stephen J.; Torrens, Francisco; Pérez-Giménez, Facundo
This report presents a new mathematical method based on the concept of the derivative of a molecular graph (G) with respect to a given event (S) to codify chemical structure information. The derivative over each pair of atoms in the molecule is defined as ∂G/∂S(vi, vj) = (fi − 2fij + fj)/fij, where fi (or fj) and fij are the individual frequency of atom i (or j) and the reciprocal frequency of the atoms i and j, respectively. These frequencies characterize the participation intensity of atom pairs in S.
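
The pair derivative defined above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that fi counts the sub-graphs of the event containing atom i and that fij counts those containing both i and j; the function names and the hand-enumerated 3-atom chain are hypothetical and are not taken from the authors' software.

```python
from itertools import combinations


def relation_frequencies(subgraphs, n_atoms):
    """Count f_i (sub-graphs containing atom i) and f_ij (sub-graphs containing both i and j).

    Assumption: each sub-graph is given simply as a tuple of atom indices.
    """
    f = [0] * n_atoms
    f_pair = {pair: 0 for pair in combinations(range(n_atoms), 2)}
    for sub in subgraphs:
        atoms = sorted(set(sub))
        for i in atoms:
            f[i] += 1
        for pair in combinations(atoms, 2):
            f_pair[pair] += 1
    return f, f_pair


def pair_derivative(fi, fj, fij):
    """Evaluate (f_i - 2*f_ij + f_j) / f_ij for one atom pair."""
    if fij == 0:
        raise ValueError("pair never co-occurs in the event; the derivative is undefined")
    return (fi - 2 * fij + fj) / fij


# Hypothetical event: all connected sub-graphs of the 3-atom chain 0-1-2.
subgraphs = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 1, 2)]
f, f_pair = relation_frequencies(subgraphs, n_atoms=3)
derivatives = {
    pair: pair_derivative(f[pair[0]], f[pair[1]], fij)
    for pair, fij in f_pair.items()
}
```

In this toy case the terminal atoms 0 and 2 co-occur only in the full chain, so their pair derivative is larger than that of the bonded pairs.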
Here, the event space is composed of the molecular sub-graphs that participate in the formation of the G skeleton; it may be complete (representing all possible connected sub-graphs) or comprise sub-graphs of certain orders or types, or combinations of these. The atom-level graph derivative index, Δi, is expressed as a linear combination of all atom-pair derivatives that include atom i. Global [total or local (group or atom-type)] indices are obtained by applying the so-called invariants over a vector of Δi values. The novel MDs are validated using a data set of 28 alkyl-alcohols and other benchmark data sets proposed by the International Academy of Mathematical Chemistry. Also, the boiling point of the alcohols, the adrenergic blocking activity of N,N-dimethyl-2-halo-phenethylamines and physicochemical properties of polychlorinated biphenyls and octanes are modeled. These models exhibit satisfactory predictive power compared with other 0–3D indices implemented successfully by other researchers. In addition, trends in the proposed indices are investigated using examples of various types of molecular structures, covering chain lengthening, branching, heteroatom content, and multiple bonds. Furthermore, the relation of the atom-based derivative indices with 17O NMR of a series of ethers and carbonyls indicates that the new MDs encode electronic, topological and steric information. Linear independence between the graph derivative indices and other 0–3D MDs is demonstrated by principal component analysis on a dataset of 41 heterogeneous molecules.
It is concluded that the graph derivative indices are independent indices containing important structural information for use in QSPR/QSAR and drug design studies, and that they permit obtaining simpler, more interpretable and more robust mathematical models than the majority of those reported in the literature.

Item Extended GT-STAF Information Indices based on Markov Approximation Models (Chemical Physics Letters, 2013)
Barigye, Stephen J.; Marrero-Ponce, Yovani; Alfonso-Reguera, Vitalio; Pérez-Giménez, Facundo
A series of novel information theory-based molecular parameters, derived from the view of a molecular structure as a chemical communication system, were recently presented and usefully employed in QSAR/QSPRs (J. Comp. Chem, 2013, 34, 259; SAR and QSAR in Environ. Res. 2013, 24). This approach permitted the application of Shannon’s source and channel coding entropic measures to a chemical information source comprised of molecular ‘fragments’, using the zero-order Markov approximation model (atom-based approach). This report covers the theoretical aspects of the extension of this approach to higher-order models, introducing the first-, second- and generalized-order Markov approximation models.

Item Extending Graph (Discrete) Derivative Descriptors to N-Tuple Atom-Relations (Match-Communications in Mathematical and in Computer Chemistry, 2015)
Santiago, Oscar Martínez; Marrero-Ponce, Yovani; Cabrera, Reisel Millán; Barigye, Stephen J.; Martínez, Luis M. Artiles
In the present manuscript, an extension of the previously defined Graph Derivative Indices (GDIs) is discussed. To this end, the concept of a hypermatrix, built from the frequencies of triple and quadruple atom relations in a set of connected sub-graphs, is introduced. This set of sub-graphs is generated following a predefined criterion, known as the event (S), which in this particular case is the connectivity among atoms.
The triple and quadruple relations frequency matrices serve as a basis for the computation of triple and quadruple discrete derivative indices, respectively. The GDIs are implemented in a computational program denominated DIVATI (acronym for DIscrete DeriVAtive Type Indices), a module of the TOMOCOMD-CARDD program. Shannon’s entropy-based variability analysis demonstrates that the GDIs show greater variability than other indices used in QSAR/QSPR research. In addition, it can be seen that when the indices are extended over n elements of the graph their quality increases, principally when they are used in combination.

Item IMMAN: Free Software for Information theory-based Chemometric Analysis (Molecular diversity, 2015)
Urias, Ricardo W. Pino; Barigye, Stephen J.; Marrero-Ponce, Yovani; García-Jacas, César R.; Perez-Gimenez, Facundo
The features and theoretical background of a new and free computational program for chemometric analysis denominated IMMAN (acronym for Information theory-based CheMoMetrics ANalysis) are presented. This is multi-platform software developed in the Java programming language, designed with a user-friendly graphical interface for the computation of a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection tasks. A total of 20 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches each. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. In addition, a generalization scheme for the previously defined differential Shannon’s entropy is discussed, and the Jeffreys information measure is introduced for supervised feature selection.
Moreover, well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty, are incorporated into the IMMAN software (http://mobiosd-hub.com/imman-soft/), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing values processing, dataset partitioning, and browsing. Moreover, single-parameter or ensemble (multi-criteria) ranking options are provided. Consequently, this software is suitable for tasks like dimensionality reduction, feature ranking, and comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between the IMMAN and WEKA feature selection tools using the Arcene dataset was performed, demonstrating similar behavior. In addition, it is revealed that the use of IMMAN unsupervised feature selection methods improves the performance of both IMMAN and WEKA supervised algorithms.

Item In silico Antibacterial Activity Modeling Based on the TOMOCOMD-CARDD Approach (Journal of the Brazilian Chemical Society, 2015)
Castillo-Garit, Juan A.; Marrero-Ponce, Yovani; Barigye, Stephen J.; Medina-Marrero, Ricardo
In recent times, the race to cope with the increasing multidrug resistance of pathogenic bacteria has lost much of its momentum, and health professionals are grasping for solutions to deal with the unprecedented resistance levels. As a result, there is an urgent need for a concerted effort towards the development of new antimicrobial drugs to stay ahead in the fight against the ever-adapting bacteria. In the present report, antibacterial classification functions (models) based on the topological molecular computational design-computer aided ‘‘rational’’ drug design (TOMOCOMD-CARDD) atom-based non-stochastic and stochastic bilinear indices are presented.
These models were built using the linear discriminant analysis (LDA) method over a balanced dataset of 2230 molecular structures, with a diverse range of structural and molecular mechanism modes. The results of this study indicated that the non-stochastic and stochastic bilinear indices provided excellent classification of the chemical compounds (with accuracies of 86.31% and 84.92%, respectively, in the training set). These models were further externally validated, yielding correct classification percentages of 86.55% and 87.91% for the non-stochastic and stochastic bilinear models, respectively. Additionally, the obtained models were compared with those reported in the literature and demonstrated comparable results, although the latter were built over much smaller datasets and with much higher degrees of freedom. Finally, a simulated ligand-based virtual screening of 116 compounds, recently identified as potential antibacterials, was performed, yielding 86.21% and 83.62% correct classification, respectively, and thus demonstrating the utility of the obtained TOMOCOMD-CARDD models in the search for novel compounds with desirable antibacterial activity.

Item Novel 3D Bio-Macromolecular Bilinear Descriptors for Protein Science: Predicting Protein Structural Classes (Journal of theoretical biology, 2015)
Marrero-Ponce, Yovani; Contreras-Torres, Ernesto; García-Jacas, César R.; Barigye, Stephen J.
In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the ℝn space on the coulombic matrix. For the calculation of these descriptors, macromolecular vectors belonging to the ℝn space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalization approaches for the calculation of spatial distances between amino acid residues based on Minkowski metrics are proposed.
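
As a rough illustration of the Minkowski family of distances mentioned above, the following Python sketch computes the distance of order p between two points in 3D space. The coordinates and the function name are hypothetical; how the authors select p or define residue positions is not stated here.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length coordinate tuples.

    p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.
    The triangle inequality (and hence a true metric) holds only for p >= 1.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical coordinates (in angstroms) of two residue centers.
residue_a = (0.0, 0.0, 0.0)
residue_b = (3.0, 4.0, 0.0)

manhattan = minkowski_distance(residue_a, residue_b, p=1)
euclidean = minkowski_distance(residue_a, residue_b, p=2)
```

Varying p changes how strongly the largest coordinate difference dominates the distance, which is one way a single distance matrix can be generalized into a family of matrices.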
The simple- and double-stochastic schemes were defined as approaches to normalize the coulombic matrix. Local-fragment indices for both amino acid types and amino acid groups are presented in order to characterize fragments of interest in proteins. In addition, geometric and topological cut-offs are defined so that specific interactions among amino acids can be taken into account in global or local indices. To assess the utility of the global and local indices, a classification model for the prediction of the four major protein structural classes was built with the Linear Discriminant Analysis (LDA) technique. The developed LDA model correctly classifies 92.6% and 92.7% of the proteins in the training and test sets, respectively. The obtained model showed high values of the generalized square correlation coefficient (GC2) on both the training and test series. The statistical parameters derived from the internal and external validation procedures demonstrate the robustness, stability and high predictive power of the proposed model. The performance of the LDA model demonstrates the capability of the proposed indices not only to codify relevant biochemical information related to the structural classes of proteins, but also to yield suitable interpretability. It is anticipated that the current method will benefit the prediction of other protein attributes or functions.

Item Novel global and local 3D atom-based linear descriptors of the Minkowski distance matrix: theory, diversity–variability analysis and QSPR applications (Journal of Mathematical Chemistry, 2015)
Cubillán, Néstor; Marrero-Ponce, Yovani; Ariza-Rico, Harold; Barigye, Stephen J.; Valdes-Martini, José R.; Alvarado, Ysaías J.
A new family of alignment-free 3D descriptors based on the TOMOCOMD-CARDD framework, namely the 3D-linear indices, has been designed.
In this report, we have proposed the use of a generalized form of the geometric pairwise atom-atom distance matrix as the structural information matrix. This matrix, denominated non-stochastic, is used as the matrix form of the linear maps, together with its algebraic transformations: the stochastic, double-stochastic and mutual probability matrices. The methodology for 3D-QSAR studies is based on the combined use of global and local approaches. Principal component analysis reveals that the novel indices are capable of capturing structural information not codified by the indices implemented in the DRAGON software. Moreover, Shannon’s entropy-based variability analysis comparing the 3D-linear indices with some relevant descriptors suggests that the former encode a similar or greater amount of structural information than these descriptors. Finally, a search for the best regressions for congeneric databases in QSPR modeling was performed. The overall results demonstrate satisfactory behavior.

Item Overlap and Diversity in Antimicrobial Peptide Databases: compiling a non-redundant set of sequences (Bioinformatics, 2015)
Aguilera-Mendoza, Longendri; Marrero-Ponce, Yovani; Tellez-Ibarra, Roberto; Barigye, Stephen J.; Liu, Jun
The large variety of antimicrobial peptide (AMP) databases developed to date is characterized by a substantial overlap of data and similarity of sequences. Our goals are to analyze the levels of redundancy across all available AMP databases and to use this information to build a new non-redundant sequence database. For this purpose, a new software tool is introduced.

Item Physico-Chemical and Structural Interpretation of Discrete Derivative Indices on N-Tuples Atoms (International journal of molecular sciences, 2016)
Martínez-Santiago, Oscar; Marrero-Ponce, Yovani; Barigye, Stephen J.; Vivas-Reyes, Ricardo
This report examines the interpretation of the Graph Derivative Indices (GDIs) from three different perspectives (i.e., in structural, steric and electronic terms).
It is found that the individual vertex frequencies may be expressed in terms of the geometrical and electronic reactivity of the atoms and bonds, respectively. In addition, it is demonstrated that the GDIs are sensitive to progressive structural modifications in terms of size, ramifications, electronic richness, conjugation effects and molecular symmetry. Moreover, it is observed that the GDIs quantify the interaction capacity among molecules and codify information on the activation entropy. A structure-property relationship study reveals a direct correspondence between the individual frequencies of atoms and Hückel’s Free Valence, as well as between the atomic GDIs and the chemical shift in NMR, which collectively supports the theory that these indices codify steric and electronic information about the atoms in a molecule. Taking into consideration the regularity and coherence found in the experiments performed with the GDIs, it is possible to say that the GDIs possess a plausible interpretation in structural and physicochemical terms.

Item QSRR Prediction of Gas Chromatography Retention Indices of Essential Oil Components (Chemical Papers, 2018)
Marrero-Ponce, Yovani; Barigye, Stephen J.; Jorge-Rodríguez, María E.; Tran-Thi-Thu, Trang
A comprehensive database, the largest to the best of our knowledge, of 791 essential oil components (EOCs) with corresponding gas chromatographic retention properties has been built. With this data set, quantitative structure–retention relationship (QSRR) models for the prediction of the Kováts retention indices (RIs) on the non-polar DB-5 stationary phase have been built using the DRAGON molecular descriptors and two regression methods: multiple linear regression (MLR) and artificial neural networks (ANN). The obtained models demonstrate good performance, evidenced by the satisfactory statistical parameters for the best MLR (R2 = 96.75% and Q2ext = 98.0%) and ANN (R2 = 97.18% and Q2ext = 98.4%) models, respectively.
In addition, the built models provide information on the factors that influence the retention of EOCs on the DB-5 stationary phase. Comparisons of the statistical parameters for the QSRR models in the present study with those reported in the literature demonstrate comparable to superior performance for the former. The obtained models constitute valuable tools for the prediction of RIs for new EOCs whose experimental data are undetermined.

Item QuBiLS-MIDAS: A Parallel Free-Software for Molecular Descriptors Computation Based on Multilinear Algebraic Maps (Journal of Computational Chemistry, 2014)
García-Jacas, César R.; Marrero-Ponce, Yovani; Barigye, Stephen J.; Contreras-Torres, Ernesto
The present report introduces the QuBiLS-MIDAS software belonging to the ToMoCoMD-CARDD suite for the calculation of three-dimensional molecular descriptors (MDs) based on the two-linear (bilinear), three-linear, and four-linear (multilinear or N-linear) algebraic forms. It is thus the only software that computes these tensor-based indices. These descriptors establish relations for two, three, and four atoms by using several (dis-)similarity metrics or multimetrics, matrix transformations, cutoffs, local calculations and aggregation operators. The theoretical background of these N-linear indices is also presented. The QuBiLS-MIDAS software was developed in the Java programming language and employs the Chemistry Development Kit library for the manipulation of the chemical structures and the calculation of the atomic properties. This software is composed of a user-friendly desktop interface and an Application Programming Interface (API) library. The former was created to simplify the configuration of the different options of the MDs, whereas the library was designed to allow easy integration into other software for chemoinformatics applications. This program provides functionalities for data cleaning tasks and for batch processing of the molecular indices.
In addition, it offers parallel calculation of the MDs through the use of all available processors in current computers. The complexity studies of the main algorithms demonstrate that these were implemented efficiently with respect to their trivial implementation. Lastly, the performance tests reveal that this software behaves suitably as the number of processors is increased. Therefore, the QuBiLS-MIDAS software constitutes a useful application for the computation of the molecular indices based on N-linear algebraic maps, and it can be used freely to perform chemoinformatics studies.

Item Relations Frequency Hypermatrices in Mutual, Conditional and Joint Entropy-Based Information Indices (Journal of Computational Chemistry, 2013)
Barigye, Stephen J.; Marrero-Ponce, Yovani; Martínez-López, Yoan; Torrens, Francisco
Graph-theoretic matrix representations constitute the most popular and significant source of topological molecular descriptors (MDs). Recently, we introduced a novel matrix representation, named the duplex relations frequency matrix, F, derived from the generalization of an incidence matrix whose row entries are connected subgraphs of a given molecular graph G. Using this matrix, a series of information indices (IFIs) were proposed. In this report, an extension of F is presented, introducing for the first time the concept of a hypermatrix in graph-theoretic chemistry. The hypermatrix representation explores the n-tuple participation frequencies of vertices in a set of connected subgraphs of G. In this study, however, we focus on triple and quadruple participation frequencies, generating triple and quadruple relations frequency matrices, respectively. The introduction of hypermatrices allows us to redefine the recently proposed MDs, that is, the mutual, conditional, and joint entropy-based IFIs, in a generalized way.
These IFIs are implemented in GT-STAF (acronym for Graph Theoretical Thermodynamic STAte Functions), a new module of the TOMOCOMD-CARDD program. Information theory-based variability analysis of the proposed IFIs suggests that the use of hypermatrices enhances the entropy and, hence, the variability of the previously proposed IFIs, especially the conditional and mutual entropy-based IFIs. The predictive capacity of the proposed IFIs was evaluated by analyzing the regression models obtained for two physico-chemical properties, the partition coefficient (Log P) and the specific rate constant (Log K), of 34 derivatives of 2-furylethylene. The statistical parameters for the best models obtained for these properties were compared to those reported in the literature and showed better performance. This result suggests that the hypermatrix-based approach, in the redefinition of the previously proposed IFIs, provides further valuable tools for QSPR studies and diversity analysis.

Item Structural and Physicochemical Interpretation of GT-STAF Information Theory-Based Indices (Bulletin of the Chemical Society of Japan, 2015)
Barigye, Stephen J.; Marrero-Ponce, Yovani; Zupan, Jure; Pérez-Giménez, Facundo
The underlying structural and physicochemical interpretation of the recently defined information indices (denominated GT-STAF indices) is examined, with the aim of gaining greater insight into the codified chemical information. It is found that these indices are related to molecular symmetry in the context of the defined molecular “fragment” model. Moreover, these indices are sensitive to structural differences, demonstrating gradual changes consistent with modifications in the molecular structure. A principal component analysis reveals that the GT-STAF indices generally codify conformational, physicochemical, and thermodynamic properties of amino acids.
A study with aniline derivatives demonstrates that the GT-STAF indices do not directly correlate with the ionization constant (pKa), but rather require multivariate contributions to yield correlations comparable with univariate models for quantum chemical parameters, suggesting that the former codify some other form of electronic information orthogonal to the latter. Finally, an evaluation of atomic contributions to the molecular hydrophobicity in furylethylenes demonstrates that the GT-STAF approach generally approximates chemical properties quite well.

Item The Summation of Atomic Contributions is an Overly Simplified Characterization of the Holistic Molecular Behavior (Letters in Drug Design & Discovery, 2016)
Martínez-López, Yoan; Marrero-Ponce, Yovani; Jaramillo, Gustavo Echeverri; Barigye, Stephen J.
The present report introduces a set of aggregation operators (AOs) to calculate total and local molecular descriptors (MDs) as a generalization of the sum of the components of an atomic weight vector. These AOs are classified into four groups: Norms, Means, Statistic Invariants and “Classical Algorithms”. In order to evaluate the usefulness of the proposed MDs in correlation studies, QSAR models for the binding affinity to the corticosteroid-binding globulin of Cramer’s steroid dataset were built using the multiple linear regression technique. The statistical parameters of these models demonstrate that the indices obtained with AOs other than the summation yield better performance than those obtained using the summation exclusively. Additionally, a comparison between the statistics for the best model obtained using the AOs strategy and those reported in the literature reveals superior performance for the former, despite its simplicity.
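
To make the idea of aggregation operators concrete, here is a minimal Python sketch that collapses a vector of atomic contributions into several global descriptors instead of a plain sum. The selection of operators, the dictionary keys and the example vector are illustrative assumptions, not the authors' full operator set.

```python
import math
import statistics


def aggregate(contributions):
    """Collapse a vector of atomic contributions into several global descriptors."""
    values = [float(v) for v in contributions]
    if not values:
        raise ValueError("at least one atomic contribution is required")
    return {
        "sum": math.fsum(values),                                       # classical summation
        "manhattan_norm": math.fsum(abs(v) for v in values),            # norm of order 1
        "euclidean_norm": math.sqrt(math.fsum(v * v for v in values)),  # norm of order 2
        "arithmetic_mean": statistics.fmean(values),                    # a mean
        "variance": statistics.pvariance(values),                       # a statistical invariant
    }


# Hypothetical atomic contributions for a three-atom fragment.
example = aggregate([1.0, 2.0, 2.0])
```

Each key yields a different descriptor from the same atomic vector, so a regression model can choose among them rather than relying on the sum alone.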
Therefore, it can be concluded that the proposed generalization scheme constitutes a useful tool to be taken into account in different chemoinformatics tasks.

Item Towards Better BBB Passage Prediction Using an Extensive and Curated Data Set (Molecular informatics, 2015)
Brito-Sanchez, Yoan; Marrero-Ponce, Yovani; Barigye, Stephen J.; Perez, Carlos Morell; Cherkasov, Artem
In the present report, the challenging task of drug delivery across the blood-brain barrier (BBB) is addressed via a computational approach. The BBB passage was modeled using classification and regression schemes on a novel, extensive and curated data set (the largest to the best of our knowledge) in terms of log BB. Prior to model development, data analysis steps comprising chemical data curation and structural, cutoff and cluster analysis (CA) were conducted. Linear Discriminant Analysis (LDA) and Multiple Linear Regression (MLR) were used to fit classification and correlation functions. The best LDA-based model showed overall accuracies over 85% and 83% for the training and test sets, respectively. An MLR-based model that explains more than 69% of the variance in the experimental log BB was also developed. A brief and general interpretation of the proposed models allowed an estimation of how ‘near’ our computational approach is to the factors that determine the passage of molecules through the BBB. In a final effort, some popular and powerful Machine Learning methods were considered. Comparable performance was observed with respect to the simpler linear techniques. Most of the compounds with anomalous behavior were set aside in a group denoted the controversial set, and a discussion of these compounds is provided. Finally, our results were compared with methodologies previously reported in the literature, showing comparable to better results.
The results could represent useful tools that are available to, and reproducible by, the whole scientific community in the early stages of neuropharmaceutical drug discovery/development projects.

Item Trends in Information Theory-based Chemical Structure Codification (Molecular diversity, 2014)
Barigye, Stephen J.; Marrero-Ponce, Yovani; Pérez-Giménez, Facundo; Bonchev, Danail
This report offers a chronological review of the most relevant applications of information theory in the codification of chemical structure information through the so-called information indices. These are derived from the analysis of the statistical patterns of molecular structure representations, which include primitive global chemical formulae, chemical graphs, or matrix representations. Finally, new approaches that attempt to go “back to the roots” of information theory, in order to integrate other information-theoretic measures in chemical structure coding, are discussed.
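
As a simple illustration of an information index computed from a global chemical formula, the following Python sketch evaluates the Shannon entropy of the elemental composition of a molecule. Treating each element as one equivalence class is an assumption made for the example, and the function name is hypothetical; the reviewed indices cover many other structure representations.

```python
import math


def composition_entropy(atom_counts):
    """Shannon entropy (bits per atom) of an elemental composition.

    atom_counts maps an element symbol to the number of atoms of that element;
    each element is treated as one equivalence class with probability n_k / N.
    """
    total = sum(atom_counts.values())
    if total <= 0:
        raise ValueError("the formula must contain at least one atom")
    entropy = 0.0
    for count in atom_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Ethanol, C2H6O: nine atoms distributed over three element classes.
ethanol_entropy = composition_entropy({"C": 2, "H": 6, "O": 1})
```

A formula with a single element gives zero entropy, while an even split between two elements gives exactly one bit per atom.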