Prediction of melting points of a diverse chemical set using fuzzy regression tree

Author

Department of Chemistry, Mahshahr Branch, Islamic Azad University, Mahshahr, Iran

Abstract

Classification and regression trees (CART) have the advantage of handling large data sets and
yielding readily interpretable models. Despite these advantages, they are also recognized as
highly unstable with respect to minor perturbations in the training data; in other words, these
methods exhibit high variance. Fuzzy logic improves on these aspects owing to the elasticity of
the fuzzy set formalism. The ant colony system (ACS), a meta-heuristic algorithm derived from the
observation of real ants, was used to optimize the fuzzy parameters. The purpose of this study
was to explore the use of a fuzzy regression tree (RT) for modeling the melting points of a large
variety of chemical compounds. To test the ability of the resulting tree, a set of 4173
structures and their melting points was used (3000 compounds as the training set and 1173 as the
validation set). Further, an external test set containing 277 drugs was used to validate the
predictive ability of the tree. Comparison of the results obtained from both trees showed that
the fuzzy RT performs better than the tree produced by the recursive partitioning procedure.
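
The contrast between a crisp split and a fuzzy split can be illustrated with a minimal sketch. It is not the procedure used in this study: the sigmoid membership function, the `width` parameter, the function names, and the numeric values are all assumptions chosen for illustration. The sketch shows why a fuzzy node is less sensitive to small perturbations: a sample near the threshold contributes to both branches in proportion to its membership, so the prediction changes smoothly instead of jumping between leaf values.

```python
import math


def crisp_predict(x, threshold, left_value, right_value):
    """Crisp CART-style split: the sample falls entirely into one leaf."""
    return left_value if x <= threshold else right_value


def fuzzy_predict(x, threshold, width, left_value, right_value):
    """Fuzzy split: membership-weighted blend of the two leaf values.

    The sigmoid membership and its `width` parameter are illustrative
    assumptions, not the membership functions used in the paper.
    """
    mu_right = 1.0 / (1.0 + math.exp(-(x - threshold) / width))
    mu_left = 1.0 - mu_right
    return mu_left * left_value + mu_right * right_value


# Hypothetical descriptor values just below and just above the split point.
for x in (0.49, 0.51):
    crisp = crisp_predict(x, 0.5, 100.0, 200.0)
    fuzzy = fuzzy_predict(x, 0.5, 0.1, 100.0, 200.0)
    print(x, crisp, round(fuzzy, 1))
```

In a setting like the one described above, parameters such as `threshold` and `width` would be the quantities tuned by the optimizer; here they are fixed by hand.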

Keywords


V. Zare-Shahabadi / J. Iran. Chem. Res. 4 (2011) 97-103