This preprint can also be viewed on ChemRxiv.
Hydrogen bonding is one of the most important non-covalent interactions, and hydrogen bonds frequently comprise key structural elements in biological systems or pharmaceuticals.1–3 In medicinal chemistry, tuning the strength of hydrogen-bond donors and acceptors can modulate P-glycoprotein transport, efflux ratio, lipophilicity, and blood–brain barrier permeability.4–7 For this reason, quantifying the strength of individual hydrogen bonds and understanding how changes to molecular structure affect the strength of individual hydrogen bonds is important in rational molecular design.
Despite the ubiquity of hydrogen bonding in biological systems and medicinal chemistry, the relative strength of different hydrogen-bond donors and acceptors is much less appreciated than other key structural quantities like pKa, bond-dissociation energy, or torsional preferences. Most frequently, the Brønsted acidity or basicity of a given functional group is used as a proxy for its hydrogen-bonding strength. Although this relationship can be valid along a narrow structural series, in general there is "little to no relationship"8 between a molecule's Brønsted acidity and its hydrogen-bond-donor or acceptor characteristics.8–10
The strength of different hydrogen-bond acceptors can be measured experimentally by measuring the association constant with a model hydrogen-bond donor in a given solvent.8–11 Taking the base-10 logarithm of this association constant gives pKBHX, the strength of the hydrogen-bond acceptor. While pKBHX values can be measured for any combination of hydrogen-bond donor, hydrogen-bond acceptor, and solvent, for "a mixture of historical and technical reasons" (Laurence et al.11) 4-fluorophenol is typically used as the hydrogen-bond donor and carbon tetrachloride is typically used as the solvent. Hydrogen-bond-acceptor strengths measured under these conditions correlate with pKBHX values measured with other hydrogen-bond donors or in different solvents, although exceptions exist: for instance, pKBHX values measured with 4-fluorophenol are poorly predictive for C–H hydrogen-bond donors.11
Here, we follow precedent & existing datasets and consider only pKBHX values measured with 4-fluorophenol in carbon tetrachloride. These pKBHX values typically vary between about −1 and 5—weak hydrogen-bond acceptors like alkenes have a pKBHX value between −1 and 0, while strong hydrogen-bond acceptors like amides have pKBHX values between 2 and 2.5, and only very strong hydrogen-bond acceptors like N-oxides have pKBHX values greater than 3 (Figure 1).
Figure 1: Archetypal pKBHX values for various functional groups; this figure depicts only general trends, and exceptions exist in all cases. pKBHX values were measured using 4-fluorophenol as the hydrogen-bond donor and carbon tetrachloride as the solvent.
While hundreds of pKBHX values have been experimentally measured under these conditions over the past three decades,11,12 there is still a need to compute hydrogen-bond-acceptor strength for new organic compounds. Experimental pKBHX measurements produce a single measure of hydrogen-bond-acceptor strength for the entire molecule, while rational design of functional molecules typically requires atom-level quantification of hydrogen-bond basicity. The strength of different hydrogen-bond acceptors is quite sensitive to the electronic properties of the acceptor, making it crucial to correctly account for the specific substitution of the molecule in question. For this reason, the efficient and accurate prediction of hydrogen-bond basicity has been a topic of active research in computational chemistry for decades.6,13–21
Since the 1990s, researchers have found that the minimum electrostatic potential in the region of lone pairs (Figure 2) can be used to predict the strength of various hydrogen-bond acceptors, with different scaling parameters employed for each functional group.6,13,22,23 Nevertheless, the computational cost and complexity of these workflows has limited their broader adoption in rational molecular optimization. Here, we report an efficient workflow for the -based automated prediction of hydrogen-bond basicity for unseen organic molecules. Rowan's pKBHX prediction workflow uses neural network potentials to accelerate geometry optimization, requires only a single low-cost DFT calculation per molecule, and gives high-accuracy results across a wide set of structurally varied molecules and medicinally relevant scaffolds. We illustrate the utility of these predictions with a variety of case studies drawn from the medicinal chemistry literature, and show that this same strategy can be applied to the prediction of hydrogen-bond-donor strength.
Figure 2: Visualization of the −0.04 electrostatic potential isosurface at the r2SCAN-3c level of theory24 for a multifunctional druglike molecule in Rowan.25 Values of are obtained by numerical minimization within the regions illustrated.
Rowan's hydrogen-bond-basicity workflow begins by running a conformer search on the input molecule using the ETKDG algorithm as implemented in RDKit.26–28 Following initial conformer generation and MMFF94 optimization,29 we filter the resulting conformational ensemble using the CREST screening protocol and GFN2-xTB energies to de-duplicate structures and remove high-energy conformers.30,31 A 2% rotational constant threshold, a 0.25 Å RMSD similarity threshold, and a 50 kcal/mol energy cutoff window are employed for CREST screening.
We score & optimize the output conformers with the AIMNet2 neural network potential32 and use the lowest energy conformer for subsequent calculations. Following Peter Kenny, we predict the basicity of each hydrogen-bond acceptor by computing around the hydrogen-bond-accepting atom (using the low-cost "Swiss-army knife" r2SCAN-3c method)24 and linearly scaling these potentials to match experimental pKBHX values.6 Electrostatic potential minima were located by numerical minimization of the electrostatic potential (computed via the ESPPropCalc object in Psi4)33 with the BFGS algorithm in Scipy.34
All density-functional-theory computations were conducted with Psi4 1.9.1.33 (The default settings in Psi4 were modified somewhat: a (99,590) integration grid with "robust" pruning, the Stratmann–Scuseria–Frisch quadrature scheme was employed,35 and an integral tolerance of 10−14 was used throughout.36) Density fitting was employed for all calculations and a level shift of 0.10 Hartree was applied to accelerate SCF convergence.37 AIMNet2 geometry optimizations were run using geomeTRIC 1.0.2.38
To find values for the scaling constants, we employed Kenny's 2020 database of experimentally measured pKBHX values,12 and predicted the measured per-molecule pKBHX values by combining the predicted pKBHX values from each distinct electrostatic potential minimum (up to three per hydrogen-bond acceptor, following Equation 3 from Kenny et al.6). A total of 434 molecules were employed after excluding those for which 3D conformations could not be generated using ETKDG. We used Levenberg–Marquardt least-squares fitting39,40 as implemented in LMFIT41 to solve the resulting non-linear optimization problem.
Functional Group | Number | Slope () | Intercept | MAE | RMSE |
---|---|---|---|---|---|
Amine | 171 | -34.4386 | -1.4884 | 0.212 | 0.324 |
Aromatic N | 71 | -52.8126 | -3.1376 | 0.113 | 0.150 |
Imine | 28 | -48.4007 | -2.3309 | 0.180 | 0.236 |
Nitrile | 28 | -50.1167 | -3.2273 | 0.144 | 0.198 |
N-oxide | 16 | -74.3261 | -4.4159 | 0.455 | 0.589 |
Chalcogen oxide | 17 | -47.7009 | -2.2794 | 0.186 | 0.224 |
Pnictogen oxide | 16 | -61.1141 | -3.3839 | 0.437 | 0.549 |
Carbonyl | 128 | -57.2911 | -3.5271 | 0.160 | 0.208 |
Ether/hydroxyl | 99 | -35.9245 | -2.0338 | 0.188 | 0.239 |
Thiocarbonyl | 10 | -51.8837 | -2.2649 | 0.330 | 0.384 |
Divalent S | 17 | -39.1666 | -2.1243 | 0.086 | 0.127 |
Aromatic O | 11 | -35.9245 | -2.0338 | 0.125 | 0.158 |
Fluorine | 23 | -16.4441 | -1.2540 | 0.202 | 0.276 |
Total | 434 | 0.188 | 0.270 |
Table 1: Summary of number of data points, scaling slope, scaling intercept, mean absolute error (MAE), and root mean squared error (RMSE) for each functional group.
Our final workflow gives a mean error of approximately 0.19 pKBHX units (Table 1), comparable to previously reported methods.6,42 Most of the outliers are bulky amines (Figure 3): electrostatic potential–based predictions seem to overestimate the basicity of these sites, since bulky groups will block the approach of hydrogen-bond donors like 4-fluorophenol while minimally perturbing the electrostatic potential of the lone pair. For instance, triisopropylamine has a predicted pKBHX value of 1.39 units, but an experimental pKBHX value of 0.30 units, likely due to the bulky substituents preventing access to the central nitrogen.
Figure 3: Plot of predicted vs. experimental pKBHX values for compounds in Kenny's 2020 pKBHX database.12 Dotted lines indicate errors of ±0.5 pKBHX units.
This sequence of steps avoids time-consuming geometry optimizations at the DFT level of theory, instead using the AIMNet2 neural network potential to optimize structures and running a single low-cost r2SCAN-3c calculation on the final minimum. Predictions can be run for the entire dataset above in a few hours on a conventional laptop, and even 50-atom systems run in fewer than 10 minutes. The above hydrogen-bond-acceptor-strength-prediction workflow has been implemented in the Rowan cloud computational-chemistry platform,25 making it simple to submit calculations and quickly visualize the relative strength and position of different hydrogen-bond acceptors (Figure 4).
Figure 4: Visualization of predicted pKBHX sites and values for an example small molecule through the Rowan platform.25
We next investigated whether the above workflow could be modified to simultaneously predict hydrogen-bond acidity (pKα). Hydrogen-bond-donor sites have no such associated extrema on the electrostatic-potential surface, making naïve extension of the above -based strategy to hydrogen-bond donors impossible. However, previous work by Peter Kenny showed that the electrostatic potential at a fixed distance from the donor hydrogen atom could be linearly scaled to match experimental hydrogen-bond-donor-strength values, prompting us to investigate including this approach within our existing workflow.23
Following Kenny, we compute the characteristic electrostatic potential Vα for a given hydrogen-bond-donor motif X–H at a point 0.55 Å past the hydrogen along the bond axis, following precedent from Kenny.23 This electrostatic potential is converted to a per-donor hydrogen-bond-acidity value through linear scaling, and per-donor hydrogen-bond acidities are combined into a single per-molecule hydrogen-bond-acidity value that can be compared to experimental pKα values.
The above protocol has only two tunable parameters: the slope m and intercept b used to scale Vα. To obtain values for these parameters, we employed Kenny's dataset of 41 association constants between hydrogen-bond donors and N-methylpyrrolidone in 1,1,1-trichloroethane. Levenberg–Marquardt least-squares fitting39,40 yielded values of 42.18 for m and -12.46 for b, in rough agreement with constants obtained by Kenny,23 and a good fit to experimental data (Figure 5).
Figure 5: Plot of predicted vs. experimental pKα values for compounds in Kenny’s 2009 pKα database.23 Dotted lines indicate errors of ±0.5 pKα units.
Here, we demonstrate how hydrogen-bond-basicity and acidity prediction can be used to gain chemical insight and motivate the design of molecules with better potency or biophysical properties. The original compound numbers from each paper have been kept for easy reference back to the original publication, with a prefix added for disambiguation.
The below discussion assumes some familiarity with terminology from medicinal chemistry. Briefly: medicinal chemists commonly rely on parameters such as IC50, EC50, logD7.4, permeability, and efflux ratio to guide the optimization of pharmacokinetic and pharmacodynamic properties.43 The IC50 value quantifies the concentration of a compound needed for half-maximal inhibition of a specific enzyme or pathway, while the EC50 is the concentration at which half the maximal effect (e.g., in a cellular or functional assay) is observed—lower values of either indicate higher potency or efficacy. LogD is the distribution coefficient (often measured at a specific pH, such as 7.4) between n-octanol and water: higher values indicate that a compound is more lipophilic, while lower values indicate that a compound is more hydrophilic.43
The efflux ratio, often measured in polarized cell monolayer assays such as the human Caco-2 cell line, captures how readily a compound is transported out of cells. In such assays, the apparent permeability from the apical to basolateral side () is compared to the permeability in the reverse direction ().43 A high efflux ratio suggests active pumping by transporters like P-glycoprotein (P-gp), an ATP-dependent efflux transporter. Mean residence time refers to the average time a drug molecule remains in the body—from the moment it enters circulation until it is completely eliminated—providing a useful measure of how long the compound stays available for therapeutic effect. In most drug discovery campaigns, scientists aim to increase permeability and decrease efflux while maintaining high on-target potency.43
In 2018, Sébastien Degorce and co-workers embarked on a campaign to optimize the biophysical properties of IRAK4 inhibitors, with the goal of identifying potent and orally bioavailable inhibitors as potential treatments for diffuse large B-cell lymphoma.44 As a part of this effort, scaffold hopping between different pyrrolopyrimidine and pyrrolotriazine cores was investigated, with the goal of increasing lipophilicity and lowering the hydrogen-bond basicity of the scaffold. The authors found that moving from the original pyrrolopyrimidine scaffold (exemplified by A.14) to an isomeric pyrrolopyrimidine (A.17) decreased permeability and lipophilicity, despite removing one hydrogen-bond donor (Figure 6).
We investigated these systems with the -based pKBHX workflow described above, and found that the pyrimidine ring in A.17 was predicted to show significantly increased hydrogen-bond basicity. The basicity of the pyrimidine increased by pKBHX units, thus offsetting any potential gains in lipophilicity. In contrast, pyrrolotriazine A.20 showed dramatic improvements in permeability and efflux ratio, which can be attributed to the computed 0.7 pKBHX-unit decrease in the hydrogen-bond basicity of the aromatic nitrogen. The order-of-magnitude increase or decrease in hydrogen-bond-acceptor-strength induced by single-atom changes underscores the utility of prospective pKBHX prediction in silico.
Figure 6: Pyrrolopyrimidine and pyrrolotriazole cores investigated by Degorce and co-workers. Basic nitrogens are labeled in green, with the corresponding pKBHX values computed by Rowan next to them. For experimental assay details, refer to Degorce et al.44
Bernard Barlaam and co-workers modified a kinase inhibitor (B.1) to achieve high selectivity for the PI3Kα kinase.45 Based on structural data, the authors hypothesized that introducing a nitrogen at "N1" (Figure 7) on the five-membered ring might induce lone-pair repulsion with several carbonyl groups found in other kinase binding pockets and thus lower the compound's affinity for these "anti-targets." A variety of azaarene linkers were tested against PI3Kα and the kinase anti-target KDR, and the addition of a nitrogen did indeed increase the selectivity for PI3Kα (Figure 7). Nitrogens that were better hydrogen-bond acceptors were found to be more selective: strong hydrogen-bond acceptors B.4 and B.6 bound to KDR with low potency, while weaker hydrogen-bond acceptors like B.2, B.3, and B.5 bound to KDR with high potency. Ultimately, triazole B.4 best balanced potency against PI3Kα with inter-kinase selectivity and was selected for further optimization.
Computed hydrogen-bond-acceptor strengths correlate well with the observed KDR activity: azaarenes with N1 pKBHX values greater than 1 have KDR IC50 values above 1 μM, while azaarenes with N1 pKBHX below 1 have have KDR IC50 values in the high nanomolar regime. If further scaffold exploration was necessary, computed pKBHX values could be used to rapidly screen out compounds that were likely to competitively inhibit KDR.
Figure 7: Kinase inhibitors investigated by Barlaam and co-workers.45 Basic nitrogens are labeled in green, with the corresponding pKBHX values computed by Rowan next to them. Hydrogen-bond-acceptor-strength calculations were run on the abbreviated structure shown in the box. For experimental assay details, refer to Barlaam et al.45
Andreas Roecker and co-workers investigated a series of pyrazole-based compounds as glucosylceramide synthase inhibitors, with the goal of identifying a CNS-penetrant compound capable of reducing glycosphingolipid accumulation often seen in Parkinson's disease.46 High-throughput screening and direct-to-biology screening of c. 1300 compounds resulted in a highly potent lead compound that suffered from high P-gp transport. Performing a scaffold modification to replace a pyrazolyl amide (C.10) with a pyrazolyl urea (C.11) dramatically decreased efflux and extended the residence time in rat studies. Further optimization efforts around the pyrazolyl urea scaffold resulted in the development of several candidates suitable for in vivo testing.
Hydrogen-bond-basicity calculations can be used to rationalize these effects (Figure 8). Switching from amide to urea reduces the hydrogen-bond basicity of the key carbonyl group by approximately an order of magnitude, consistent with the decreased efflux and increased lipophilicity observed. Strong correlations between P-gp efflux and hydrogen-bond basicity have been reported previously in the literature.5,47
Figure 8: Glycocosylceramide synthase inhibitors investigated by Roecker and co-workers.46 The basic oxygen is labeled in green, with the corresponding pKBHX values computed by Rowan nearby. For experimental assay details, refer to Roecker et al.46
In the process of developing poly-ADP ribose polymerase (PARP) inhibitors as potential anti-cancer agents,48 Chungang Gu and co-workers studied how modulating the strength of hydrogen-bond acceptors could be used to optimize the bioavailability of PARP inhibitors.49 They found that tuning the electronic properties of a key triazole ring was able to simultaneously modulate binding affinity, permeability, and efflux ratio (Figure 9). Compound D.15 maintained potency while dramatically increasing permeability and decreasing efflux (Table 2), and was selected for further in vivo studies.48
pKBHX predictions can be used to interpret these experimental results. Compound D.1, the strongest hydrogen-bond acceptor, has pKBHX values of and on the two triazole nitrogens (Table 2). Adding additional fluorines to the compound reduces the basicity of the triazole nitrogens, which makes the compound more permeable and less susceptible to efflux—compound D.15, which has a difluoroethyl group, preserves the activity of the parent compound while dramatically improving its biophysical properties. Surprisingly, moving from difluoroethyl to trifluoromethyl causes a fourfold decrease in EC50, a decrease in permeability, and an increase in efflux ratio (Table 2), contrary in all three cases to what simple trends based on pKBHX would predict. Activity cliffs like this are not uncommon in chemical structure–activity relationships, illustrating the limitations in using any single descriptor to predict compound performance.
Figure 9: General scaffold of the PARP inhibitors investigated by Gu and co-workers.49 Key basic nitrogens are labeled in green. Hydrogen-bond-acceptor-strength calculations were run on the abbreviated structure shown in the box.
Compound | R | EC50 (µM) | Papp(A→B) (10-6 cm/s) | Efflux Ratio | pKBHX N1 | pKBHX N2 |
---|---|---|---|---|---|---|
D.1 | C(H)Me2 | 0.052 | 2.0(1.5) | 44(37) | 2.38 | 2.47 |
D.14 | C(F)Me2 | 0.043 | 5.9(2.4) | 7(3.0) | 1.90 | 2.07 |
D.15 | C(Me)F2 | 0.053 | 15.6(2.7) | 2(0.8) | 1.48 | 1.71 |
D.12 | CF3 | 0.216 | 6.7(2.4) | 6(2.3) | 1.40 | 1.55 |
Table 2: Summary of R-groups, (), permeability (), efflux ratio, and pKBHX values. For experimental assay details, refer to Gu et al.49
In 2014, Walvoord and co-workers used a colorimetric molecular sensor to assay the strength of various hydrogen-bond-donor catalysts. They were able to measure binding free energy values for a variety of catalysts and demonstrated that these hydrogen-bond-donor-strength measurements correlated with observed rate acceleration in Diels–Alder reactions.50
We used the Walvoord dataset to test the ability of our computational hydrogen-bond-acidity workflow to reproduce hydrogen-bond-donor-strength values recorded under different experimental conditions. To simplify comparisons and eliminate the need to consider cooperative binding, we restricted the set of compounds studied to those containing only a single active hydrogen-bond donor. The predicted hydrogen-bond-acidity values showed excellent correlation with experimental binding free energies across a set of 16 compounds (Pearson correlation of 0.93) suggesting that the electrostatic-potential-based method employed here can be used as a generalizable probe of hydrogen-bond-donor strength (Figure 10).
Figure 10: Plot of predicted pKα values and experimental binding free energies for neutral single hydrogen-bond-donor compounds from Walvoord’s 2014 database.50 The dotted line indicates the line of best fit.
Improved scientific tools are a key driver of scientific progress.51 While computational methods for prediction of hydrogen-bond-acceptor strength were first reported over three decades ago,13 the broader utilization of these tools has been impeded by the lack of a workflow usable by non-computational scientists. Here, we address this lacuna through the development of a robust and efficient algorithm for pKBHX prediction. Our hydrogen-bond-basicity-prediction workflow predicts per-site pKBHX values directly from 2D- or 3D-representations of molecules, with no additional input required, and provides insight into otherwise opaque structural features of lead molecules, as illustrated by examples from the medicinal chemistry literature. We anticipate that this approach will be a valuable addition to the armamentarium of tools available to the modern drug hunter, and hope that this software can be more closely integrated into lead optimization programs in the future.
C.C.W. thanks Gilles Ouvry for initial discussions on this topic & for suggesting the papers discussed in case studies B and D, Ari Wagen for designing and implementing the visuals shown in Figures 2 & 4, and Ari Wagen, Jonathon Vandezande, Eugene Kwan, Hayden Sharma, Marcus Sak, Jonathan Wong, Lewis Martin, Gilles Ouvry, Harry Stern, Michael Nielsen, & an anonymous industry scientist for feedback on drafts of this manuscript.