QSAR of Lipid-Coated Virus Inhibition by Fatty Acids and Monoglycerides Using Electrotopological Descriptors

E-State QSAR models were developed for the inhibition of a lipid-coated virus by a set of 19 fatty acids (FA) and their monoglycerides (MG). To improve the correlations, the FA and MG were also divided into two subsets, the saturated set (n = 12) and the unsaturated set (n = 7). QSAR of the subsets yielded better correlation statistics: the saturated FA and MG gave r² = 0.87, s = 0.26, while the unsaturated FA and MG gave r² = 0.9999, s = 0.0041. The combined-set model, however, yielded only r² = 0.81, s = 0.32. Statistically satisfactory four- and five-variable models were developed for the saturated and unsaturated subsets, respectively, and a three-variable model for the combined set of FA and MG. A structural interpretation is given for each variable, emphasizing its effect on the resulting viral inhibition (pIC values). A leave-one-out (LOO) method was applied to investigate the predictive quality of each QSAR equation. This predictive quality of the multiple regression equations will be useful in the prediction of new potential antiviral compounds against lipid-coated viruses. It is hoped that the study can also be applied to SARS-CoV-2, which is likewise a lipid-coated virus.


INTRODUCTION
Our country is faced with the great problem of eliminating the new coronavirus SARS-CoV-2. While vaccines have arrived, the arrival of enough supply to protect the whole population is very slow, even to just the level at which herd immunity could be felt. Drugs such as remdesivir, hydroquinone and steroid drugs like dexamethasone have been recommended to alleviate the suffering of those affected by the virus. But the plague seems to be a mythical hydra whose octopus-like arms spring back every time they are cut, proliferating whenever and wherever the population relaxes its guard. We therefore have to find our own solution for inhibiting the proliferation of the virus, one way or another, as it appears from place to place while we wait for the vaccine. Even the recommended drugs are not easily available to the majority of our citizens. There is a need to try any possible solution, such as searching for already known safe drugs or molecules that may possess anti-coronavirus properties.
Volume 32, Number 2, July 2021 • KIMIKA
The objective of this paper is to provide a means of finding new or known molecules that potentially have antiviral properties and can be applied directly where the virus concentrates. The search could be facilitated if we know of molecules that have been studied against a similar virus, for example the fatty acids (FA) and monoglycerides (MG) that have been reported to attack lipid-coated viruses, resulting in viral inhibition. Fortuitously, the coronavirus SARS-CoV-2 belongs to the class of lipid-coated viruses, which proliferate from person to person by penetrating the host's cell and then assuming its double-layer lipid membrane as their own. Because of this behavior, FA and MG can block the virus at the first stage of entry by intercalating with the lipid membrane of the virus, thus inducing changes in the lipid coating and dampening viral infectivity even without lysis [O'Donnell et al., 2020]. While there is not yet a report of a direct study of SARS-CoV-2 inhibition by these FA and MG, we can take advantage of the relationship between their antiviral activity and their structure to develop a structure-activity relationship (SAR). We can use the same SAR in searching for other antiviral compounds that may be appropriate for particular applications against SARS-CoV-2, for example as a component of a mouthwash [O'Donnell et al., 2020; Carrouel et al., 2021; Meiller et al., 2005], nasal spray [Hilmarsson et al., 2013; Schroder and Svenson, 1999] or simple inhaler [Patne et al., 2020].
A study in the 1970s showed that FA and MG of 16-18 carbon chain length are highly effective in vitro, reducing survival of Herpes simplex virus to around 50% at micromolar concentrations [Sands, Auperin et al., 1979]. Thormar et al. [1987] reported a more or less complete antiviral activity of FA and MG against the lipid-coated viruses VSV (vesicular stomatitis virus), HSV-I (Herpes simplex virus) and VV (visna virus). This antiviral activity is specific to lipid-coated viruses (other known lipid-coated viruses are the well-known SARS, MERS and H1N1 and, of course, SARS-CoV-2). These FA and MG were found to destroy the lipid envelope of VSV, HSV and VV, leading to leakage and then complete disintegration of the envelope and the viral particles. They thus bring about lysis followed by death of the virus [Thormar et al., 1987]. Although other mechanisms have been proposed, there is evidence, as other authors have reported, that the same process of inhibition by FA and MG occurs with different lipid-coated viruses [Pfaender et al., 2013].
In this paper we use topological methods as the basis for developing models of viral inhibition, particularly by FA and MG, with the intention of finding and predicting new candidate molecules that possess viral inhibition properties. These topological models directly give specific structure information to guide the design of molecules that are antiviral and suitable for a desired application. Topological structure descriptors serve as the basis for structural representation in the model. These electrotopological indices have been used in model equations for predicting physico-chemical and biological properties of molecules. The topological structure representation approach has been used successfully in QSAR [Contrera et al., 2005; Maw and Hall, 2000; Hall and Vaughn, 1997; Abou-Shaaban et al., 1996] as well as in physico-chemical QSPR [Hall and Story, 1996; Hall and Story, 1997; Votano et al., 2004; Huuskonen, 2001] studies. These applications clearly indicate the usefulness of the topological approach in drug design and property estimation. The method based on topological structure representation is considered less time-consuming, and hence less costly to use for prediction, than 3-D modeling [Rose and Hall, 2003]. This topologically based method allows easy computation of pIC values (viral inhibition) for candidate molecules and can thus act as a preliminary antiviral bioassay. The QSAR modeling is followed by statistical validation, namely the leave-one-out (LOO) statistics, used as a means of indicating predictive ability.

METHODS
Data Input. Viral inhibition data were measured by Thormar et al. [1987] and are expressed here as the negative logarithm of the molar concentration, pIC = log(1/C) = -log C. See Table 1 for the molecular representations and the experimental pIC values. The available combined data (n = 19) of FA and their MG were then subdivided into two subsets, the saturated (n = 12) and the unsaturated (n = 7) FA and MG, to improve the structure-activity relationships.
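The conversion described above can be sketched as follows. This is a minimal illustration, assuming the inhibitory concentration is tabulated in mM and is converted to mol/L before the logarithm is taken; the function name and the example concentration are hypothetical and are not values from Table 1.

```python
import math

def pic_from_millimolar(conc_mM):
    """Return pIC = log10(1/C), with C in mol/L.

    The input is assumed to be in mM (an assumption about the table units).
    """
    if conc_mM <= 0:
        raise ValueError("concentration must be positive")
    conc_M = conc_mM / 1000.0      # mM -> mol/L
    return -math.log10(conc_M)     # log10(1/C) = -log10(C)

# Hypothetical example: an inhibitory concentration of 10 mM
print(round(pic_from_millimolar(10.0), 2))
```

A lower inhibitory concentration gives a larger pIC, so larger pIC values correspond to more potent compounds.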

Validation Studies. Leave-one-out method.
To demonstrate that the model is useful for prediction, we carried out a leave-one-out (LOO) procedure in which one equation (one compound) is left out of a set and a multiple regression is performed on the reduced set [Hall et al., 1991]. The resulting regression coefficients were then substituted into the left-out equation to give the predicted pIC value. This was repeated as many times as there are equations in the set.
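The leave-one-out procedure described above can be sketched in a few lines. This is an illustrative sketch only, assuming ordinary least squares with an intercept; the function names and the small demonstration data set are hypothetical and do not reproduce the descriptor values used in this study.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y = b0 + X @ b (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def loo_predictions(X, y):
    """Predict each compound from a regression fitted without it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i               # drop compound i
        coef = fit_ols(X[keep], y[keep])       # regression on reduced set
        preds[i] = coef[0] + X[i] @ coef[1:]   # substitute left-out row
    return preds

# Hypothetical descriptor matrix (6 compounds, 2 descriptors) and pIC values
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0],
                   [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]])
y_demo = np.array([2.1, 4.4, 4.6, 7.4, 7.1, 11.0])

preds = loo_predictions(X_demo, y_demo)
print(np.round(preds, 2))
print("mean absolute LOO residual:", round(float(np.mean(np.abs(y_demo - preds))), 2))
```

Comparing the mean absolute LOO residual with the mean absolute residual of the full regression gives a simple indication of how stable the model is when a compound is removed.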
Randomization Study. To test the possibility that the model is no different from a simple random model, the pIC values were scrambled 10 times. For each randomized model, the r² and F values were listed and then averaged. The results were then compared with those of the model.
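The randomization test can be illustrated with the sketch below. It is a hedged example, not the spreadsheet procedure actually used: it assumes ordinary least squares with an intercept, uses the standard F statistic for a regression with k variables and n compounds, and runs on synthetic data generated inside the block (all names and numbers are hypothetical).

```python
import numpy as np

def r2_and_f(X, y):
    """Return (r2, F) for an ordinary least-squares fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    f = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return r2, f

def y_randomization(X, y, n_trials=10, seed=0):
    """Average r2 and F over regressions on scrambled activity values."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    results = [r2_and_f(X, rng.permutation(y)) for _ in range(n_trials)]
    r2s, fs = zip(*results)
    return float(np.mean(r2s)), float(np.mean(fs))

# Synthetic stand-in for a 19-compound, three-descriptor data set
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(19, 3))
y_demo = 1.0 + X_demo @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.1, size=19)

print("model:     r2, F =", r2_and_f(X_demo, y_demo))
print("scrambled: mean r2, mean F =", y_randomization(X_demo, y_demo))
```

As a check on the F formula, r² = 0.81 with n = 19 and three variables gives F ≈ 21, in line with the value quoted for Eqn. 1.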

RESULTS
The QSAR models developed for the combined set (n = 19) of FA and MG, the saturated set (n = 12) and the unsaturated set (n = 7) are given in the following equations. Quantities in parentheses are the standard deviations of the coefficients. The variables in the models (Eqns. 1-3) are statistically independent (their pairwise correlation coefficients are below r² = 0.80). The summary data for the LOO method are listed in Tables 5-7. For the combined data set, no residual exceeds two standard deviations (0.31). For the saturated subset, no residual exceeds two standard deviations (0.21), and for the unsaturated subset no residual exceeds two standard deviations (0.004), except only that of the coefficient of Sᵀ(-CH3).
Randomization. The dependent variable, the pIC values (of the combined set), was randomized using the random function of EXCEL. The randomized data replaced the inhibition data in the data set. When the pIC values were randomly shuffled ten times, the resulting r² values ranged from 0.013 to 0.40 with an average r² = 0.16, while the resulting F values ranged from 0.005 to 3.3 with an average F = 1.2.

DISCUSSION
QSAR by E-State Modeling. The E-State model, whose approach is topological in structure representation, is used in many QSAR studies [Hall et al., 2007]. This approach is very appropriate for obtaining useful information about structural features that influence a property such as a particular biological activity. Models are expressed in E-State indices in both atom-type and atom-level forms. For structure representation, hydride groups are treated as atoms for simplicity. To compute the E-State index of an atom in a molecule, the equation below is used:

$$S_i = I_i + \Delta I_i$$

where $S_i$ is the E-State index for an atom or a group of atoms (hydride group), $I_i$ is the atom's intrinsic state, and $\Delta I_i$ is the perturbation term. The atom intrinsic state I is expressed as a function of the valence of the atom or group of atoms over the number of nearest neighboring atoms or groups:

$$I = \frac{(2/N)^2\,\delta^v + 1}{\delta}$$

where N = the row of the element in the periodic table, $\delta^v$ = valence of atom i and $\delta$ = number of neighboring atoms.
When the atom involved is an element in the second row of the periodic table, N = 2, giving

$$I = \frac{\delta^v + 1}{\delta}$$

The perturbation term is

$$\Delta I_i = \sum_{j \neq i} \frac{I_i - I_j}{r_{ij}^2}$$

where $r_{ij}$ = the number of atoms in the shortest path between atoms i and j, $I_j$ = intrinsic state of atom j, and $\Delta I_i$ = the perturbation term. In this way the atom's E-State index encodes the electronic state and the topological structure from all other atoms within the structure. The nearer a neighboring atom is to atom i, the stronger its influence on the E-State value; the farther away it is, along a path of several bonds, the smaller its influence. This influence diminishes as the square of the number of atoms in the path.
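To make the atom-level calculation concrete, the following sketch evaluates the intrinsic state, the perturbation term and the E-State index for a hydrogen-suppressed skeleton. It is a minimal illustration that assumes every atom is a second-row element by default (a single N for the whole molecule); the function names and the propane example are chosen here for illustration and are not taken from the data set.

```python
from collections import deque

def intrinsic_state(delta_v, delta, n_row=2):
    """Intrinsic state I = ((2/N)^2 * delta_v + 1) / delta."""
    return ((2.0 / n_row) ** 2 * delta_v + 1.0) / delta

def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in adjacency[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist

def estate_indices(delta_v, adjacency, n_row=2):
    """Atom-level E-State values S_i = I_i + sum_j (I_i - I_j) / r_ij^2."""
    n = len(delta_v)
    # delta (number of skeletal neighbours) is read from the adjacency list
    intrinsic = [intrinsic_state(delta_v[i], len(adjacency[i]), n_row)
                 for i in range(n)]
    values = []
    for i in range(n):
        dist = bond_distances(adjacency, i)
        # r_ij counts the atoms in the shortest path, i.e. bonds + 1
        perturbation = sum((intrinsic[i] - intrinsic[j]) / (dist[j] + 1) ** 2
                           for j in range(n) if j != i)
        values.append(intrinsic[i] + perturbation)
    return values

# Propane skeleton CH3-CH2-CH3: valence deltas 1, 2, 1
propane_delta_v = [1, 2, 1]
propane_bonds = {0: [1], 1: [0, 2], 2: [1]}
print(estate_indices(propane_delta_v, propane_bonds))
```

For propane this gives 2.125 for each methyl carbon and 1.25 for the methylene carbon; the sum of the E-State values equals the sum of the intrinsic states (5.5), since the perturbation terms cancel pairwise.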
A similar expression is used for the hydrogen E-State [Molecular Descriptor's Guide, 2008], in which KHE denotes the Kier-Hall electronegativity.
When the E-State indices of the same atom or group type are combined in a common skeletal core, they can be used in an equation to serve as the E-State descriptors, or variables, in the multiple regression. The atom-type E-State descriptor is then the sum of the individual atom-level E-State values for a particular atom type:

$$S^T(X) = \sum_{i \in X} S_i$$

This descriptor combines three important aspects of structural information: 1) electron accessibility at the atom, 2) presence or absence of the atom and 3) count of the atom. A similar expression is used for the hydrogen atom-type E-State descriptor:

$$HS^T(X) = \sum_{i \in X} HS_i$$

The hydrogen atom-type E-State descriptor expresses very similar information to the atom-type E-State descriptor, except that accessibility refers to proton accessibility.
In this viral inhibition study, with nineteen straight-chain FA and MG, the atom-level E-State and hydrogen E-State indices were used to develop QSAR models, along with atom-type descriptors and molecular connectivity χ indices.
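The summation that turns atom-level values into atom-type descriptors can be sketched as below. This is only an illustration: the type labels are assigned by hand, and the propane values are atom-level E-State values worked out by hand from the Kier-Hall formulas rather than data from this study.

```python
from collections import defaultdict

def atom_type_descriptors(atom_values):
    """Sum atom-level E-State values per atom type.

    atom_values: list of (type_label, atom_level_value) pairs.
    """
    totals = defaultdict(float)
    for label, value in atom_values:
        totals[label] += value
    return dict(totals)

# Propane, CH3-CH2-CH3 (hand-computed atom-level values, for illustration)
propane = [("-CH3", 2.125), ("-CH2-", 1.25), ("-CH3", 2.125)]
descriptors = atom_type_descriptors(propane)
print(descriptors)
# A type that is absent from the molecule contributes zero to the model
print(descriptors.get("-OH", 0.0))
```

Because the descriptor is a sum, it reflects at once whether the atom type is present, how many such atoms there are, and how electron-accessible each one is.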
Interpretation of the models. Combined saturated and unsaturated FA/MG model. The three-variable model, Eqn. 1, adequately represents the pIC data, based on direct statistics as well as on the validation method. Each of the variables is a descriptor of an aspect of the molecular structure and will be discussed for the three models to indicate the specific structure information encoded.
The ¹χᵛ Descriptor. The variable that makes the greatest contribution to the calculated pIC is ¹χᵛ, a representation of molecular variation [Molecular Descriptor's Guide, 2008]. The descriptor ¹χᵛ is the first-order molecular connectivity valence chi index. The ¹χᵛ descriptor (Eqn. 1) contributes 58% on average to the calculated pIC value. The significance of this descriptor is further indicated by the fact that the largest pIC values correspond to the largest ¹χᵛ values and the smallest pIC values correspond to the smallest ¹χᵛ values. This structural descriptor provides molecular architecture information. The ¹χᵛ descriptor decreases with increased skeletal branching but increases with the number of skeletal atoms. Because of its positive coefficient, less branching leads to increased calculated pIC values.
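The dependence of the first-order valence connectivity index on size and branching can be illustrated with a short sketch. It assumes the standard Kier-Hall definition, in which each skeletal bond contributes the reciprocal square root of the product of the valence delta values of its two atoms; that formula is not written out in the text above, and the hydrocarbon examples are chosen here purely for illustration.

```python
import math

def chi1_valence(delta_v, bonds):
    """First-order valence chi index: sum over bonds of (dv_i * dv_j)^(-1/2)."""
    return sum(1.0 / math.sqrt(delta_v[i] * delta_v[j]) for i, j in bonds)

# Hydrogen-suppressed skeletons; delta_v lists the valence delta of each carbon
propane = chi1_valence([1, 2, 1], [(0, 1), (1, 2)])
n_butane = chi1_valence([1, 2, 2, 1], [(0, 1), (1, 2), (2, 3)])
isobutane = chi1_valence([3, 1, 1, 1], [(0, 1), (0, 2), (0, 3)])

print(round(propane, 3), round(n_butane, 3), round(isobutane, 3))
```

In this example the index rises from propane to n-butane (more skeletal atoms) and falls from n-butane to isobutane (same number of atoms, more branching), which is the behavior described for the descriptor.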
The Sᵀ(-CH2-) Descriptor. The second variable in the model is the atom-type E-State descriptor for the skeletal carbon atoms of the methylene groups. Sᵀ(-CH2-), the second most significant descriptor, encodes the electron accessibility of the carbon in the methylene group. The Sᵀ(-CH2-) descriptor contributes 24% on average to the calculated pIC values. Because of the negative coefficient on the Sᵀ(-CH2-) descriptor (Eqn. 1), larger values are related to a decrease in pIC values and smaller values are related to an increase in pIC values.
In the case of monoglycerides, where the -CH2- groups are directly attached to electronegative atoms such as the oxygen of the hydroxyl groups, the Sᵀ(-CH2-) values of these groups are large and negative and thus decrease the total Sᵀ(-CH2-) value gained from the attached fatty acid, thereby increasing pIC. In addition, because larger values of the Sᵀ(-CH2-) descriptor also mean a larger molecule, and therefore a larger value of the ¹χᵛ descriptor, the overall effect is an increase in the pIC value owing to the larger positive coefficient of the ¹χᵛ descriptor. The value of the Sᵀ(-CH2-) descriptor also decreases as the number of double bonds increases, which contributes to the increase in pIC values.
The Sᵀ(-OH) Descriptor. The third variable in the model is the atom-type E-State descriptor Sᵀ(-OH) (Eqn. 1). It encodes the electron accessibility of the oxygen atoms in each molecule. The Sᵀ(-OH) descriptor contributes 19% on average to the calculated pIC values. The value of the Sᵀ(-OH) descriptor can be divided into two sources, the -OH of the FA and the two -OH groups on the glyceride side of the MG. Because of the negative coefficient on the Sᵀ(-OH) descriptor, a larger value of this descriptor relates to a diminished value of the pIC.
However, in MG the addition of the glyceride group also increases the size of the molecule, thereby adding to the value of ¹χᵛ, which has a larger positive coefficient, and decreasing the value of Sᵀ(-CH2-), which has a negative coefficient. Both of these effects are reflected as an overall increase in the value of the pIC.
In the saturated FA/MG model (Eqn. 2), the Sᵀ(O=) descriptor is a significant variable, contributing 59% on average to the pIC and making it the most significant contributor to pIC. Because of the positive coefficient on Sᵀ(O=), larger values of the descriptor are related to larger pIC values.
Increasing the length of the alkyl chain also slightly increases the Sᵀ(O=) value. Moreover, the substitution of a glyceride group into the molecule provides an additional ester oxygen that renders its electrons accessible to the carbonyl oxygen, thereby greatly increasing the positive value of the Sᵀ(O=) descriptor and thus the pIC value.
The HSᵀ(other) Descriptor. The HSᵀ(other) descriptor is the sum of the hydrogen atom-level E-State indices for all non-polar hydrogen atoms in the molecule (from the hydrogens of the methyl group to the methylene hydrogens of the fatty alkyl chain and those in the glyceride group). This descriptor encodes the structure attributes of hydride groups that are non-polar. Such a descriptor represents parts of the structure that may participate in hydrophobic-like interactions. HSᵀ(other) is the second most significant variable in the model (Eqn. 2), contributing 27% on average to the calculated pIC. Because of the negative coefficient on HSᵀ(other), smaller values are related to larger pIC values. The values of the HSᵀ(other) descriptor increase with increasing size of the molecule and with the addition of the glyceride group. These increases diminish the pIC values.
The Sᵀ(-CH2-) Descriptor. The Sᵀ(-CH2-) descriptor is the atom-type E-State index for the methylene carbon atoms in the molecule. It encodes the electron accessibility of the methylene carbon atoms. The Sᵀ(-CH2-) descriptor contributes 10.8% on average to the saturated FA/MG model (Eqn. 2). Because of the positive coefficient on Sᵀ(-CH2-), larger values are related to larger pIC values. The Sᵀ(-CH2-) value increases with increasing size of the fatty acid but decreases with the addition of the monoglyceride group, affecting the pIC values as discussed above.
The Sᵀ(-CH3) Descriptor. The Sᵀ(-CH3) descriptor is the fourth variable in the model and is the atom-type E-State descriptor for the carbon atom of the methyl group. Sᵀ(-CH3) encodes the electron accessibility of the methyl carbon atom in each molecule. The Sᵀ(-CH3) descriptor contributes 12.4% on average to the calculated pIC values. Because of the positive coefficient on Sᵀ(-CH3), larger values are related to larger pIC values.
The unsaturated FA/MG model. The Sᵀ(-CH3) Descriptor. The Sᵀ(-CH3) descriptor is the atom-type E-State descriptor for the carbon atom of the methyl group in the molecule. Sᵀ(-CH3) encodes the electron accessibility of the methyl carbon atom in each molecule. The Sᵀ(-CH3) descriptor is a significant variable in the model, since it contributes 21% on average to the calculated pIC values (Eqn. 3). Because of the positive coefficient on Sᵀ(-CH3), larger values are related to larger pIC values.
The HSᵀ(other) Descriptor. The HSᵀ(other) of the unsaturated FA and MG is the sum over the hydrogens described above for the saturated model, with the addition of the vinyl hydrogens from the double bond(s). This descriptor encodes the structure attributes of hydride groups that are non-polar. It contributes 25% on average to the calculated pIC values (Eqn. 3). It is the most significant variable in the model equation. Because of the negative coefficient on HSᵀ(other), smaller values are related to larger pIC values.
The descriptor HSᵀ(other) can be partitioned among three atom-type hydrogen E-State indices: HSᵀ(other) = HSᵀ(Csats) + HSᵀ(Csatu) + HSᵀ(vinyl). These descriptors represent hydrogen atoms in three different environments: HSᵀ(Csats), hydrogens on saturated carbon atoms bonded to saturated carbon atoms; HSᵀ(Csatu), hydrogens on saturated carbon atoms bonded to unsaturated carbon atoms; and HSᵀ(vinyl), hydrogens on vinylic carbon atoms. When each of these three variables in turn replaces HSᵀ(other) in the model, the correlation statistics are as follows: if HSᵀ(other) is replaced by HSᵀ(Csats), r² = 0.98 and F = 8.3; if by HSᵀ(Csatu), r² = 1.00 and F = 9500; if by HSᵀ(vinyl), r² = 0.87 and F = 1.8. The analysis indicates that, of these three regions of the carbon skeleton, HSᵀ(Csatu) dominates the HSᵀ(other) descriptor.
The Sᵀ(-CH=) Descriptor. The Sᵀ(-CH=) descriptor is the atom-type E-State index for the vinyl carbon atoms in the molecule. It encodes the electron accessibility of the vinyl carbon. The Sᵀ(-CH=) descriptor contributes 12.4% on average to the pIC values. Because of the positive coefficient on the Sᵀ(-CH=) descriptor, larger values are related to larger pIC values. An increasing number of double bonds increases the value of Sᵀ(-CH=) and therefore also the pIC.
The Sᵀ(-OH) Descriptor. The Sᵀ(-OH) descriptor encodes the electron accessibility of the oxygen atoms in each molecule. The Sᵀ(-OH) descriptor contributes 9.94% on average to the calculated pIC. The value of the Sᵀ(-OH) descriptor can be divided into two sources, the -OH of the FA and the two -OH groups in the glyceride of the MG. Because of the positive coefficient on the Sᵀ(-OH) descriptor, larger values lead to larger pIC values.
The Sᵀ(-CH2-) Descriptor. The Sᵀ(-CH2-) descriptor is the atom-type E-State index for the methylene carbon atoms of the unsaturated molecule. It encodes the electron accessibility of the methylene carbon atom. The Sᵀ(-CH2-) descriptor contributes 9.47% on average to the pIC of the unsaturated FA/MG model. Because of the positive coefficient on Sᵀ(-CH2-), larger values are related to larger pIC values (Eqn. 3).
These percent contribution values serve to guide the reader to the calculated significance of each descriptor. For each descriptor, the absolute value of each contribution was taken to compute the percentage contribution.
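One way such percent contributions could be computed is sketched below. The sketch assumes that the contribution of a descriptor in a given molecule is the absolute value of its coefficient times its descriptor value, expressed as a percentage of the sum of such terms (the intercept is left out, which is an assumption), and then averaged over the molecules. The coefficients and descriptor values are hypothetical.

```python
def percent_contributions(coefficients, descriptor_rows):
    """Average percent contribution of each descriptor to the calculated value.

    coefficients:    dict mapping descriptor name -> regression coefficient
    descriptor_rows: list of dicts (one per molecule), name -> descriptor value
    """
    names = list(coefficients)
    totals = {name: 0.0 for name in names}
    for row in descriptor_rows:
        # absolute value of each term b_k * x_k for this molecule
        terms = {name: abs(coefficients[name] * row[name]) for name in names}
        denom = sum(terms.values())
        for name in names:
            totals[name] += 100.0 * terms[name] / denom
    n = len(descriptor_rows)
    return {name: totals[name] / n for name in names}

# Hypothetical two-descriptor model and two molecules
coef_demo = {"x1": 2.0, "x2": -1.0}
rows_demo = [{"x1": 1.0, "x2": 2.0}, {"x1": 3.0, "x2": 2.0}]
print(percent_contributions(coef_demo, rows_demo))
```

Under this convention the percentages for a model sum to 100 (apart from rounding), so they rank the descriptors by the size of their terms regardless of the sign of the coefficient.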
To summarize, the combined model, Eqn. 1, demonstrates that inhibition of lipid-coated viruses is increased for FA and MG with less skeletal branching, larger size, possession of a glyceride chain, and a greater number of double bonds. This model allows the prediction of pIC values for organic compounds and blends the structure information of the two subsets, the saturated and unsaturated FA and MG. It can be noticed, however, that some descriptors counteract the effect of others, and that the sign of a descriptor value is not always in the direction of the expected pIC of the molecule; when the coefficients are applied, however, the calculation yields a value close to or equal to the expected one.
Validation Studies. The predictive quality of each of the equations is shown in Tables 5-7. The leave-one-out method, as clearly seen from Table 5, shows that the coefficients are very stable and that the predicted values are quite acceptable for the combined set. The average residual from LOO is essentially the same as that of the full regression and is quite acceptable (0.25→0.31). This quality of the combined equation is not seen in the two remaining models, where the coefficients are only moderately similar, although acceptable overall. While their correlation coefficients are relatively higher (Tables 6-7), and impressive in the case of the unsaturated set (Table 7 and Fig. 3), the differences in their residuals are larger, especially for the saturated set (0.16→0.48) and less so for the unsaturated set (0.00→0.10). These differences are probably due to the reduced sample size in each subset model (n = 12 for the saturated and n = 7 for the unsaturated). The leave-one-out method thus shows that the regression equations predict satisfactorily provided that the number of samples is sufficient.
The dependent variable, the pIC values of the combined set (n = 19), was randomized using the random generator of EXCEL. The randomized data replaced the inhibition data in the data set.
When the pIC values were randomly scrambled ten times, the largest r² and F values found were r² = 0.40 and F = 3.3, with an average r² = 0.16 and an average F = 1.2. When these random statistics are compared with those of the model (Eqn. 1), r² = 0.81 and F = 21, the results clearly indicate that the model is significantly different from a random model, giving credence to the model based on the topological variables used in Eqn. 1. The same conclusion can also be derived from the randomization of the two subsets.

CONCLUSIONS
For the viral inhibition data set, satisfactory QSAR models were developed with two E-State structure descriptors and one molecular connectivity valence chi index for the combined set. To improve the correlation coefficient, two subsets were introduced: the saturated FA and MG subset, for which the QSAR was developed with one hydrogen E-State structure descriptor and three atom-type E-State descriptors, and the unsaturated FA and MG subset, for which it was developed with one hydrogen E-State descriptor and four atom-type E-State descriptors. Furthermore, cross-validation of the QSAR models indicates that they may be used to make useful predictions for compounds not in the original data set. The structure information encoded in each descriptor is described, indicating significant and specific structure information that may be useful as an aid to compound design. This encoded structure information may be interpreted directly in terms of structural features that can be helpful to synthetic chemists: skeletal branching and general molecular size, ¹χᵛ; presence and polar nature of oxygen atoms, Sᵀ(-OH); presence of non-polar hydrogen atoms, HSᵀ(other); and electron accessibility of the oxygen atom in the carbonyl group, Sᵀ(O=). These qualitative statements can be converted into numerical values of pIC by the QSAR model equations.
These findings on the important functional groups that an antiviral (at least one against lipid-coated viruses) should possess for activity call to mind recent developments in the use of antiviral essential oil components against the lipid-coated SARS-CoV-2 virus [Asif et al., 2020; Yadalam et al., 2021; Patne et al., 2020]. These essential oils contain monoterpenes [Denison et al., 1995; Astani et al., 2010] that are not far in structural make-up from the FA and MG and can be one of the targets for further study after this paper. Our model equations can be applied to recognize potential antiviral compounds among essential oil components or any other structurally related compounds. This does not prevent us from determining the antiviral activities of several fatty acid derivatives such as the fatty alcohols, fatty acid esters, fatty alcohol esters, and amides.
Our use of the E-State method is very appropriate for this kind of problem because in virus inhibition, as in toxicity or physico-chemical property studies, the dominant force is the noncovalent interaction. Electron accessibility attributes are very significant in this kind of interaction, where both the electronic and the steric forces are well encoded. In addition, the E-State method is based on the topological representation of molecular structure. The E-State method combines valence state electronegativity with skeletal characterization of the molecule. Noncovalent interaction, meanwhile, involves intermolecular electron accessibility, which is the essence of the molecular connectivity principle. Besides, the topological method avoids the problems arising from the assumption of a partitioning mechanism, as well as the problems associated with finding a physico-chemical system to use as the basis for estimating, for example, the log P values for each molecule [Rose and Hall, 2003].

ACKNOWLEDGEMENT
I would like to acknowledge the help of Dr. Junie Billones of the College of Math and Sciences, UP Manila, for providing the journals needed to address a particular problem met while writing this paper. Thanks are also due to Dr. Eloise Prieto-Valdez of the Molecular Biology and Biotechnology, College of Science, University of the Philippines, Diliman, Quezon City, for a critical review of the original problem before the paper was conceptualized.

Figure 1. Plot of calculated pIC values versus observed pIC values for the combined saturated and unsaturated data set using equation (1).
The saturated FA/MG model. The Sᵀ(O=) Descriptor. The first variable of the equation, Sᵀ(O=), an atom-type E-State index, encodes the electron accessibility of the carbonyl oxygen in the molecule. Larger values of this descriptor, which has a positive coefficient, are related to larger pIC values.

Table 1. Viral inhibition by incubation with fatty acids and monoglycerides at 37 °C for 30 minutes [Thormar et al., 1987]. Columns: Fatty Acids (FA) & Monoglycerides (MG); Inhibition conc. a (mM); pIC b [log(1/C)].
Before the multiple regression was performed, the indices were first paired and their correlations were computed to determine whether any pair had a correlation coefficient greater than 0.80. For each pair that produced such a correlation coefficient, one of the two correlated variables was eliminated before preparing the QSAR equation. Such pairs were easily spotted using a correlation matrix. The E-State indices, or variables, finally selected were primarily those considered to be more easily interpreted in terms of molecular structure.
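The screening step can be sketched with a correlation matrix as follows. This is an illustrative sketch that assumes the Pearson correlation coefficient and a cutoff applied to its absolute value; the descriptor names and values are hypothetical.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.80):
    """List descriptor pairs whose |Pearson r| exceeds the threshold."""
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

# Hypothetical descriptor table: rows are molecules, columns are descriptors
names_demo = ["a", "b", "c"]
X_demo = np.array([[1.0, 2.0, 5.0],
                   [2.0, 4.0, 1.0],
                   [3.0, 6.0, 4.0],
                   [4.0, 8.0, 2.0],
                   [5.0, 10.5, 3.0]])

for first, second, r in correlated_pairs(X_demo, names_demo):
    print(first, second, round(r, 3))
```

In this example only the pair (a, b) is flagged, so one of those two descriptors would be dropped before the regression; which member of a pair to keep is a judgment made on interpretability, as described above.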
b Carbon atoms : double bonds. c In molar concentrations.
HSᵀ(other): atom-type hydrogen E-State index for non-polar hydrogen atoms.
Sᵀ(O=): atom-type E-State index for the carbonyl oxygen atom.
Sᵀ(-CH3): atom-type E-State index for the methyl carbon atom.
Sᵀ(-CH=): atom-type E-State index for the vinyl carbon atom.