The Bond-Valence Substituent Index for Predicting the Boiling Temperatures of Aliphatic Hydrocarbons

This paper develops a simple structure-based molecular descriptor for predicting the boiling temperature (BT) of alkanes. The topological index correlates the boiling temperature of aliphatic hydrocarbons with their bond-valence substituent structure rather than with an atom-to-atom branching framework. The predictive power of the bond-valence substituent index (BVSI) was evaluated against a popular predictor in the literature, the Randic index, and a more recently proposed index, the Fi of Manso et al. (2012). The model obtained by a second-order regression of the alkanes' boiling temperatures on the BVSI predicted them well enough that the method was also applied to a combined set of aliphatic hydrocarbons: alkanes, alkenes, alkynes and cycloalkanes. The index provided a higher correlation with small deviations compared with the topological index used for comparison. Future work can explore the BVSI for other organic compounds with different functional groups and for physical properties other than boiling temperature.


INTRODUCTION
Structural formulas are important guides in describing both the structure and the reactivity of a molecule. At the least, they indicate molecular constitution by showing the links between atoms in a molecule. Structural formulas also give an approximation of electron distribution by using lines that correspond to electron pairs in bonds and dots for unshared electrons. In quantitative terms, molecular structure specifies the relative position of all atoms in a molecule (Carey and Sundberg, 1993).
G. N. Lewis (1916) introduced this concept, which led to correct structural formulas being deduced for a wide variety of organic compounds. The concept of valence had already been recognized in the second half of the nineteenth century, i.e. carbon almost always formed four bonds and nitrogen three. The objective of the second method is to convert, by a theoretical pathway, the chemical information encoded in a model into one or more numbers or molecular descriptors and to establish quantitative relationships between structure and properties: chemical and physical properties, biological activities, and other experimental properties.
There are two approaches to explaining complex relationships between molecules and observed quantities: one looks for relationships between molecular structures and physicochemical properties (QSPR, Quantitative Structure-Property Relationships) and the other searches for relationships between molecular structures and biological activities (QSAR, Quantitative Structure-Activity Relationships). The main objective is to correlate some important functionality with the molecular structure as closely as possible. Thus, from the pioneering work of Wiener in 1947, followed by Randic and Hosoya in the 1970s, QSPR studies have correlated properties with molecular descriptors or topological indices. Since then, several hundred molecular descriptors, mainly two-dimensional (2D) descriptors, have been proposed for predicting the values of physical properties and biological activities through QSPR and QSAR.
Theoretical molecular descriptors such as the 2D descriptors in general make use of molecular graphs that allow a two-dimensional representation of a molecule known as topological representation. Graph theory is used in constructing these topologies derived from the configuration of molecules from which new concepts are developed. These topological representations, which are non-numerical expressions, are transformed into related numbers, which eventually are known as topological indices, a kind of theoretical molecular descriptor. The numerical basis for topological indices is derived either from the adjacency matrix or the topological distance matrix. These indices separate the molecules according to their size, degree of branching, flexibility, and shape. Size means bulk, mass or volume of the molecular structure while shape means the degree and nature of branching present in the molecule (Randic, 1975).
In organic chemistry, one way of correlating structure with property or reactivity is by studying the effect of the substituent atoms or groups of atoms on bonds in a molecule. For example, the qualitative polarity of a molecule can be assessed by estimating the dipole moments of the double bonds due to the electron-donating or electron-withdrawing effects of the substituents. The boiling points of alkane isomers can be well ordered according to their degree of branching or to the increase or decrease of their surface area. Another example, in this case correlating substituents with reactivity, is the Hammett equation, applied mainly to derivatives of benzene (Hammett, 1937). Here, more quantitative values can be derived for certain electronic effects of functional groups attached to the aromatic ring.
The objective of this paper is therefore to propose a simple molecular descriptor defined according to the chemical concept of substituent effects on the physical property (and later perhaps on the chemical and biological properties) of a molecule, by emphasizing the bond and its substituents. For now, the proposal is limited to predicting one physical property, the boiling point of aliphatic hydrocarbons, mainly alkanes, and bases the correlation on the degree of substitution of each chemical bond, with an equation that sums these contributions into a numerical index for quantitative correlation. This index is named here the bond-valence substituent index (BVSI).
The topological index most closely related to the proposed BVSI is the Platt index (1952), which is expressed by the equation $F = \sum_{f=1}^{f_{\text{total}}} \deg e_f$. The Platt index is defined as the summation of the number of bonds adjacent to each of the bonds in a molecule, where $\deg e_f$ represents the number of edges adjacent to edge $f$, and $f_{\text{total}}$ is the total number of edges of the molecular graph $G$. However, over time, the best and most used topological indices have been those developed by Wiener in 1947, Hosoya in 1971 and Randic in 1975. Kier and Hall (1976) showed that the Randic index has the highest correlation with the boiling points of alkanes compared with the other two indices. A more recent paper by Manso et al. (2012) reported an index that improves on the Randic index. Hence we compare here the Randic index $\chi$ and the Fi index of Manso et al. with the BVSI to determine the BVSI's worth as a molecular descriptor and index.
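The Platt definition given in words above (for every bond, count the bonds adjacent to it, then sum) can be illustrated with a short Python sketch. The function name `platt_index`, the edge-list representation and the vertex numbering are assumptions made for illustration; they are not taken from Platt (1952).

```python
from collections import Counter


def platt_index(edges):
    """Sum, over all edges, of the number of edges adjacent to each edge.

    `edges` is a list of (i, j) vertex pairs of a hydrogen-suppressed graph.
    An edge (i, j) has deg(i) + deg(j) - 2 neighbouring edges.
    """
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return sum(degree[i] + degree[j] - 2 for i, j in edges)


# Isopentane (2-methylbutane); the numbering is illustrative, with the
# branching carbon labelled 3.
ISOPENTANE = [(1, 3), (2, 3), (3, 4), (4, 5)]

print(platt_index(ISOPENTANE))  # 8
```

Because only vertex degrees enter the sum, any consistent numbering of the carbon atoms gives the same value.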
In spite of the numerous graph-theoretically derived vertex and edge degree-based topological indices that have been invented, and the several characterizations of their usefulness and applications, the BVSI is presented here with the novelty of introducing the bond-valence concept for predicting the boiling temperature (BT) of the alkanes, including how to treat the presence of double and triple bonds. Its predictive power for 137 alkanes is analyzed herein and compared with known predictors through the correlation model, and the index is then applied to a combination of alkanes, alkenes, alkynes, and cycloalkanes. The results obtained with the BVSI as molecular descriptor are reassuring with regard to predicting properties based on the molecular structure of alkanes and related aliphatic hydrocarbons. The proposed index can be described as a summation, over all bonds, of the inverse square roots of the bond terms, each bond term being one plus a fraction: the number of substituents of the bond, $S$, over the maximum number of substituents of that particular bond, $V$, or simply the number of its valence electrons, i.e. $\mathrm{BVSI} = \sum_{\text{bonds}} (1 + S/V)^{-1/2}$ (1). Note that the maximum possible number of valence electrons a bond can have differs between saturated and unsaturated hydrocarbons, and if heteroatoms such as oxygen, nitrogen and halogens are involved, the same consideration can be applied.
In the calculation of the Randic topological index, the hydrogen-suppressed graph of the molecule is drawn (the hydrogen atoms are assumed to be positioned according to the valence of the carbon). The vertices are a set of points, which are connected by lines called edges. For isopentane the graph has 5 vertices (v1, v2, v3, v4, and v5) and 4 edges (1, 2, 3, 4).
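[Figure: the graph for isopentane.]

The number of edges meeting at a vertex is called the degree of the vertex; for v1 it is written d1. For example, in isopentane v3 has a degree of three, d3 = 3. The Randic index is the sum, over all edges, of the inverse square root of the product of the degrees of the two end vertices, $\chi = \sum_{\text{edges}} (d_i d_j)^{-1/2}$. For isopentane, the Randic index is therefore $\chi = (1 \times 3)^{-1/2} + (1 \times 3)^{-1/2} + (3 \times 2)^{-1/2} + (2 \times 1)^{-1/2} = 2.2701$.

The Fi index of Manso et al. (2012), on the other hand, is defined by the equation $F_i = \sum O_i + \sum T_j$, where $O_i$ is the value of $i$ for a given ordered pair $(i, j)$ and $T_j = (j_h \times j_{h'})^{1/2}$ is the value for $(j_h, j_{h'})$, with $h$ and $h'$ any adjacent vertices. The sum of the $O_i$ is simply the total number of hydrogens present in the structure. Each vertex of the molecular graph is associated with an ordered pair $(i, j)$ whose sum is invariably equal to 3, i.e. $i + j = 3$. This leads to $j = 3 - i$. For isopentane, $\sum O_i = 12$ and $\sum T_j = (0 \times 2)^{1/2} + (2 \times 0)^{1/2} + (2 \times 1)^{1/2} + (1 \times 0)^{1/2} = 1.4142$, so that $F_i = 12 + 1.4142 = 13.4142$.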
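As a check on the isopentane values, the following Python sketch recomputes both indices from a hydrogen-suppressed edge list. It is a minimal illustration under stated assumptions: the molecule is a saturated hydrocarbon, so each carbon carries 4 minus its degree hydrogens, and $i$ is taken to be that hydrogen count, an interpretation inferred from the worked numbers above rather than from the full definition of Manso et al. (2012). The function names and the vertex numbering are illustrative.

```python
from collections import Counter
from math import sqrt


def vertex_degrees(edges):
    """Degree of every vertex in a hydrogen-suppressed graph."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def randic_index(edges):
    """Sum over edges of 1 / sqrt(d_a * d_b)."""
    d = vertex_degrees(edges)
    return sum(1.0 / sqrt(d[a] * d[b]) for a, b in edges)


def fi_index(edges):
    """Fi = (total hydrogens) + sum over edges of sqrt(j_a * j_b).

    Assumption: i is the hydrogen count of a saturated carbon (4 - degree)
    and j = 3 - i, as in the isopentane example of the text.
    """
    d = vertex_degrees(edges)
    i_val = {v: 4 - deg for v, deg in d.items()}
    j_val = {v: 3 - i for v, i in i_val.items()}
    o_sum = sum(i_val.values())
    t_sum = sum(sqrt(j_val[a] * j_val[b]) for a, b in edges)
    return o_sum + t_sum


# Isopentane with the branching carbon labelled 3 (degree three).
ISOPENTANE = [(1, 3), (2, 3), (3, 4), (4, 5)]

print(f"{randic_index(ISOPENTANE):.4f}")  # 2.2701
print(f"{fi_index(ISOPENTANE):.4f}")      # 13.4142
```

Both functions need only the vertex degrees, which is why a plain edge list is enough as input.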

RESULTS AND DISCUSSION
From the start, our aim was to relate a molecular index to the sum of the number of substituents for each bond in the molecule, just as Platt (1952) did, to predict the boiling temperatures of the alkanes. Later we realized that it would be better to use the fraction of the number of substituents with respect to the maximum possible number of substituents, and to transform it into a reciprocal term so that the order of branching of the alkanes would have a direct relationship with the boiling point. To avoid overlapping of indices belonging to different sets of isomers, we made use of the square root. The number 1 was added last to include the ethane molecule, whose single bond has no substituents, and also to improve the correlation. Thus, the original equation took the form $\mathrm{BVSI} = \sum_{\text{bonds}} \left(1 + \frac{\text{number of substituents}}{\text{maximum number of substituents}}\right)^{-1/2}$.
This equation is just equation 1, since the number of substituents is simply S and the maximum number of substituents is equal to V.
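The construction described above (one plus the substituent fraction, inverse square root, summed over bonds) can be sketched in Python for alkanes. This is an illustrative sketch, not the authors' code: it assumes that the number of substituents of a C-C bond is the number of other carbon atoms attached to its two carbons, and that the maximum number of substituents of a C-C single bond is 6. The function name `bvsi`, the parameter `max_substituents` and the vertex numberings are hypothetical.

```python
from collections import Counter
from math import sqrt


def bvsi(edges, max_substituents=6):
    """Sum over bonds of 1 / sqrt(1 + S / V) for a hydrogen-suppressed graph.

    S is the number of carbon substituents on the two atoms of the bond
    (degree of the edge); V is the maximum number of substituents, assumed
    here to be 6 for every C-C single bond.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    total = 0.0
    for a, b in edges:
        s = degree[a] + degree[b] - 2
        total += 1.0 / sqrt(1.0 + s / max_substituents)
    return total


ETHANE = [(1, 2)]
N_PENTANE = [(1, 2), (2, 3), (3, 4), (4, 5)]
ISOPENTANE = [(1, 3), (2, 3), (3, 4), (4, 5)]
NEOPENTANE = [(1, 2), (1, 3), (1, 4), (1, 5)]

print(f"{bvsi(ETHANE):.4f}")      # 1.0000
print(f"{bvsi(N_PENTANE):.4f}")   # 3.5837
print(f"{bvsi(ISOPENTANE):.4f}")  # 3.4744
print(f"{bvsi(NEOPENTANE):.4f}")  # 3.2660
```

Under these assumptions the index decreases with increasing branching among the pentane isomers, which is the direct relationship with boiling point that the reciprocal form was meant to give, and ethane receives a finite value of 1 because of the added 1.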
Since the majority of molecular indices are based on graph theory, from which they are called topological indices, we provide here a transformation of the BVSI to the nearest form of a vertex-degree graph-theoretical equation. In a graph $G$, the degree of an edge (bond), which is its number of substituents $S$, is $d_i + d_j - 2$, where $d_i$ and $d_j$ are the degrees of the adjacent vertices $v_i$ and $v_j$. The maximum number of substituents a bond can have is found by adding the valences of the two atoms and subtracting the number of electrons that form the bond (denoted here $n_b$). Thus $V = Z_i^v + Z_j^v - n_b$, and therefore $\mathrm{BVSI} = \sum_{\text{edges}} \left(1 + \frac{d_i + d_j - 2}{Z_i^v + Z_j^v - n_b}\right)^{-1/2}$ (4). Because the subject of this paper is primarily the alkanes, which are characterized by C-C single bonds with a total number of bond-valence electrons of 6, equation 4 can be simplified to equation 5, $\mathrm{BVSI} = \sum_{\text{edges}} \left(1 + \frac{d_i + d_j - 2}{6}\right)^{-1/2}$ (5). From its form, equation 5 can clearly be considered a vertex degree-based index in the category of adjacency-based topological indices.

Gutman (2013) has reviewed several vertex degree-based topological indices, including the Randic index. He summarized their main properties and provided a critical comparative study based on their prediction of the boiling temperatures (BT) and heats of formation of the octane series. The related vertex-degree topological indices he reviewed were the Zagreb indices (Gutman and Trinajstic, 1972; Gutman et al., 1975), the Narumi-Katayama and Multiplicative Zagreb Indices (Narumi and Katayama, 1984; Gutman and Ghorbani, 2012; Klein and Rosenfeld, 2010), the Atom-Bond Connectivity Index (Estrada et al., 2008), the Augmented Zagreb Index (Furtula et al., 2010), the Geometric-arithmetic Index (Vukicevic and Furtula, 2009), the Harmonic Index (Fajtlowicz, 1987) and the Sum-connectivity Index (Zhou and Trinajstic, 2009). Mohammadinasab (2017) investigated the relationship between the boiling temperature of a series of alkanes and some topological indices and geometrical descriptors using the multilinear regression method.
That study found that the Wiener, Randic and volume descriptors play a more important role in describing the boiling points of alkanes than the other molecular descriptors. Hosamani et al. (2017) studied the QSPR of ten degree-based topological indices (see above) and characterized the useful topological indices based on their predictive power against eight physical properties (boiling point, critical pressure, critical temperature, molar volume, molar refraction, heat of vaporization, surface tension, and melting point) of 67 alkanes from n-butane to the nonanes. Hosamani and Shirkol (2019) reported the QSPR of selected distance-based topological indices to characterize their usefulness as predictors of the same eight physical properties.
Here, the BVSI is compared with the Randic and Fi indices for the 55 alkane isomers selected by Kier and Hall in their book on medicinal chemistry (1976), with the observed boiling point as the physical property for comparison. This set was chosen because it has already been used to show the superiority of the Randic index over other indices, such as those of Wiener (1947) and Hosoya (1971), with respect to the boiling temperatures of alkanes, and so does not depend on a subjective choice by the present authors. Table 1 lists the 55 alkane isomers from C-2 to C-8 with their Randic ($\chi$), Fi and BVSI indices against the corresponding observed boiling temperatures (from the alkanes provided by Kier and Hall, 1976). The square roots of the indices were used, since both the Randic and the Fi indices have been found to give improved linear correlations for the alkanes in this form (Manso et al., 2012). Their linear regressions are plotted in Figs. 1, 2 and 3, which also show the respective correlation coefficients ($R^2$). Table 2 shows that, for the chosen 55 alkanes, the variance of the boiling temperatures predicted by the BVSI is smaller than that of the Randic index but not as small as that of the Fi index. These data show that the BVSI can be considered a molecular and structural descriptor for predicting the boiling temperature of alkanes, as it compares well with both the Fi index and the more popular Randic index with respect to their relative correlation coefficients and variances.
However, when the same linear regression was applied to more alkanes (82 additional alkanes), a very significant deviation was observed at higher temperatures, as shown in Figure 4, the plot of the B.T. values against the corresponding BVSI index (Table 3). The correlation coefficient of the plot in Fig. 4 also decreased to $R^2 = 0.970$. The linear model thus seriously failed to predict the B.T. for alkanes with $(\mathrm{BVSI})^{1/2}$ higher than 5.
[Figure 4: plot of B.T. against the BVSI index for the alkanes of Table 3. The colored line refers to the data produced from Eqn. 8.]
The B.T. predicted by the linear regression of the curve in Fig. 4 is $\mathrm{B.T.}_{\text{predicted}} = 117.6\,(\mathrm{BVSI})^{1/2} - 168$. Thus, to improve the predictive power of the BVSI for the boiling temperature of the 137 alkanes, a second-order polynomial regression was applied to the curve in Figure 4. The result is shown for the alkanes of Table 3, where the colored line refers to the data produced by the quadratic equation (Eqn. 10).
The B.T. predicted by this quadratic model is $\mathrm{B.T.}_{\text{predicted}} = -9.229\,(\mathrm{BVSI}) + 195.9\,(\mathrm{BVSI})^{1/2} - 308.2$ (10). The variance of the predicted B.T. with respect to the experimental B.T. turned out to be only 57.62. This is much smaller than the variance obtained for the 55 alkanes above (variance = 240.02) and is now at almost the same level as the variance with Fi. Fig. 6 shows the plot of the normal boiling temperature (B.T.) predicted by the quadratic equation (Eq. 10) against the experimental data for the alkanes of Table 3. The straight line inserted in the plot obeys the equation y = x, i.e. an "ideal model". Figure 6 shows convincingly that the B.T. can be predicted by Eq. 10 with low deviation for alkanes within a wide range of normal boiling temperatures.
[Figure 6: B.T. predicted by Eq. 10 plotted against the experimental data (Lide, 2011) for the 137 alkanes listed in Table 3.]
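The two regression equations quoted in the text can be evaluated side by side with a short Python sketch. The coefficients are copied from the linear and quadratic equations above; everything else is an assumption for illustration: the function names are hypothetical, the n-octane BVSI is computed from the bond terms for two terminal bonds (one substituent each) and five internal bonds (two substituents each) with a maximum of 6, and `mean_squared_deviation` is only one plausible reading of the "variance" reported in the text, whose exact definition is not given.

```python
from math import sqrt


def bt_linear(bvsi_value):
    """Linear model in sqrt(BVSI), coefficients as quoted in the text."""
    return 117.6 * sqrt(bvsi_value) - 168.0


def bt_quadratic(bvsi_value):
    """Second-order model in sqrt(BVSI) (Eq. 10 of the text)."""
    return -9.229 * bvsi_value + 195.9 * sqrt(bvsi_value) - 308.2


def mean_squared_deviation(predicted, observed):
    """Assumed form of the 'variance': mean of squared prediction errors."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# n-Octane: 2 terminal C-C bonds (S = 1) and 5 internal C-C bonds (S = 2).
N_OCTANE_BVSI = 2 / sqrt(1 + 1 / 6) + 5 / sqrt(1 + 2 / 6)

print(f"{N_OCTANE_BVSI:.4f}")                # 6.1818
print(f"{bt_linear(N_OCTANE_BVSI):.1f}")     # 124.4
print(f"{bt_quadratic(N_OCTANE_BVSI):.1f}")  # 121.8
```

The outputs are in the temperature units of the source's tables. For a mid-range alkane such as n-octane the two models differ by only a few degrees; the text attributes the failure of the linear model to the high end of the BVSI range.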
The values of the BVSI indices in Table 4 were calculated as described earlier for the alkanes. The model chosen is the quadratic one, which has the higher correlation coefficient ($R^2$) and the smaller variance. Fig. 7 shows a plot of the experimental boiling temperatures against the B.T. predicted by the quadratic model using the BVSI index. The predictive power of the index for aliphatic hydrocarbons such as the alkanes, alkenes, alkynes and cycloalkanes is shown by the straight line obtained from the quadratic plot.
[Figure 7: experimental B.T. against the B.T. predicted by the quadratic model for the hydrocarbons of Table 4.]
This study thus shows the predictive power of the bond-valence substituent index as a molecular descriptor and as a topological index. It is hoped that this index could be applied to other physical and chemical properties of organic compounds besides the prediction of boiling temperature. This index could also be applied, for example, to their vapor pressure, surface tension, flash point, refractive index, octane index, gas chromatographic Rf and many others. Further studies are expected to proceed using the bond-valence substituent index in the chemical, biological and other technological fields where this index may find its use.

CONCLUSIONS
This paper has proposed a simple, easy-to-calculate molecular descriptor, based on the degree of substitution of the C-C bonds of the molecular structure, to predict the boiling temperature of hydrocarbons such as the alkanes and potentially of other organic compounds. This topological index emphasizes the contribution of the number of bond substituents to the physical property of the C-C bond and of the whole molecule, a familiar concept to chemists, in addition to the concept of branching from the vertex or atom degree-based explanation of physical properties. A sample set of 55 alkanes containing isomers from ethane to the octanes (previously used by Kier and Hall [1976] to promote the use of the Randic index) was used to evaluate the proposed BVSI by comparison with the most popular and most cited topological index, the Randic index, and the more recently proposed Fi index of Manso et al. (2012), using a linear model. In this comparison, the proposed BVSI model was found to have a higher coefficient of correlation than the Randic index and the Fi index in predicting the boiling temperatures of the 55 alkanes. The BVSI proved effective, with a higher correlation coefficient and a smaller deviation (variance) than the Randic index, but the linear model failed when the number of alkanes was increased to 137. Applying the index with a second-order regression to this larger set of 137 alkanes improved the correlation coefficient and greatly decreased the deviation (variance) from the experimental B.T. of the alkanes, boosting the predictive power of the BVSI. Application of the same process to a combination of alkanes, alkenes, alkynes and cycloalkanes showed that a potentially general quadratic model can be derived to predict the boiling temperatures of a combination of aliphatic hydrocarbons.
Based on the data provided in this paper, the BVSI proved to be a powerful index for predicting the boiling temperatures of alkanes, with relatively small deviations compared with other topological indices found in the literature. This topological index can be applied to study other properties of interest, and the concept worked out in this paper can be applied to other organic compounds. Therefore, the study of the BVSI of other organic compounds, in conjunction with other molecular descriptors reported in the literature, will be a good contribution to the study of the structure and properties of organic compounds and would be of interest to materials scientists studying the chemistry and physics of organic materials.