Isothermal Analysis of ThermoFluor Data can readily provide Quantitative Binding Affinities
Differential scanning fluorimetry (DSF), also known as ThermoFluor or Thermal Shift Assay, has become a commonly used approach for detecting protein-ligand interactions, particularly in the context of fragment screening. Upon binding to a folded protein, most ligands stabilize the protein; thus, observing an increase in the temperature at which the protein unfolds as a function of ligand concentration can serve as evidence of a direct interaction. While experimental protocols for this assay are well-developed, it is not straightforward to extract binding constants from the resulting data. Because of this, DSF is often used to probe for an interaction, but not to quantify the corresponding binding constant. Here, we propose a new approach for analyzing DSF data. Using unfolding curves at varying ligand concentrations, our "isothermal" approach collects from these the fraction of protein that is folded at a single temperature (chosen to be a temperature near the unfolding transition). This greatly simplifies the subsequent analysis, because it circumvents the complicating temperature dependence of the binding constant; the resulting constant-temperature system can then be described as a pair of coupled equilibria (protein folding/unfolding and ligand binding/unbinding). The temperature at which the binding constants are determined can also be tuned, by adding chemical denaturants that shift the protein unfolding temperature. We demonstrate the application of this isothermal analysis using experimental data for maltose binding protein binding to maltose, and for two carbonic anhydrase isoforms binding to each of four inhibitors. To facilitate adoption of this new approach, we provide a free and easy-to-use Python program that analyzes thermal unfolding data and implements the isothermal approach described herein.
Differential scanning fluorimetry (DSF), also known as ThermoFluor or Thermal Shift Assay, has become an important label-free technique for biophysical ligand screening and protein engineering. Briefly, this method makes use of a dye, typically either SYPRO Orange or 1-anilino-8-naphthalenesulfonate, that is quenched in an aqueous environment but becomes strongly fluorescent when bound to exposed hydrophobic groups of a protein. By heating one's protein of interest in the presence of such a dye, the thermal unfolding transition can be monitored spectrophotometrically. Because ligands that interact with proteins typically stabilize the folded protein, this leads to a shift in the midpoint of the unfolding transition (i.e., the melting temperature, Tm).
The simplicity of this assay makes DSF very straightforward to implement using an RT-PCR thermocycler, it can be inexpensive and fast, and it requires relatively little sample: these advantages have made this approach attractive for screening applications in drug discovery - particularly for moderately-sized fragment libraries - and also for protein stability formulation. Meanwhile, the fact that this method is label-free and well-suited to detect binding over a wide range of affinities has made DSF one of the most popular approaches in drug discovery for fragment screening and for evaluating the "ligandability" of a target protein. While it would be desirable to obtain binding constants at an early stage, for example to prioritize fragment hits on the basis of their ligand efficiency, the magnitudes of the observed temperature shifts (at a given ligand concentration) have been shown to correlate only weakly with compounds' potency measured in other orthogonal assays.
Typical DSF data are shown in Figure 1A. Here, SYPRO dye is used as a reporter for the extent of unfolding of maltose binding protein, and the melting temperature from each curve is determined. Using this method, maltose binding protein is observed to have a melting temperature of approximately 52.5 °C in the absence of its ligand, maltose. Upon addition of increasing concentrations of maltose, the unfolding transition is shifted to increasingly higher temperatures: this implies that maltose stabilizes maltose binding protein, by binding to the natively folded protein.
Dose-response data in DSF experiments are typically presented by showing the temperature shift as a function of ligand concentration (Figure 1B), and there are a number of ways to determine the melting temperature from the fluorescence data. One simple method is to take the first derivative of the observed fluorescence data with respect to temperature, and to then identify the maximum value (corresponding to the steepest part of the transition). Other methods instead smoothly fit the whole melting curve, either by using a so-called Boltzmann model, or by using a more rigorous "thermodynamic model", or occasionally by using other arbitrary polynomials.
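The first-derivative method described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `tm_from_first_derivative` and the synthetic curve are assumptions made for the example, not part of the program accompanying this work, and real data would normally be smoothed before differentiation.

```python
import math

def tm_from_first_derivative(temps, fluorescence):
    """Return the temperature at which dF/dT (central difference) is largest."""
    best_temp, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp

# Synthetic, noise-free melting curve (hypothetical values, not measured data)
temps = [40.0 + 0.5 * i for i in range(61)]  # 40 to 70 °C in 0.5 °C steps
signal = [1.0 / (1.0 + math.exp((52.5 - t) / 1.5)) for t in temps]

tm_estimate = tm_from_first_derivative(temps, signal)
```

The resolution of this estimate is limited by the temperature step of the instrument, which is one reason whole-curve fitting is often preferred.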
The Boltzmann model is the most widely-used approach, in part because it is very user-friendly. The fluorescence at a given temperature is linearly related to the fraction of unfolded protein, which takes the form

$$f_u(T) = \frac{1}{1 + e^{(T_m - T)/\alpha}}$$

where $T_m$ is the melting temperature and $\alpha$ is a parameter that reflects the steepness of the thermal unfolding transition. This model is applied primarily because it provides a sigmoidal shape that can be fit quite well to experimental data, especially when additional fitting parameters are included to account for the fact that the dye itself often has some temperature dependence (Figure S1). Despite its name, however, this equation does not explicitly model the thermodynamic transition: for this reason, the Boltzmann model is not used to garner any information beyond accurately identifying the midpoint of the protein unfolding transition (the melting temperature), and studies that use this model simply report the presence/absence of binding rather than using these data to determine binding constants.
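As a concrete illustration of the Boltzmann form, the sketch below evaluates the sigmoid and maps it onto a fluorescence signal. The linear folded and unfolded baselines are an assumption chosen here to stand in for the "additional fitting parameters" mentioned above; the exact baseline form used in Figure S1 is not specified in this section, and all names are hypothetical.

```python
import math

def boltzmann_fraction_unfolded(temp, tm, alpha):
    """Boltzmann sigmoid: fraction unfolded at `temp` for midpoint `tm` and steepness `alpha`."""
    return 1.0 / (1.0 + math.exp((tm - temp) / alpha))

def boltzmann_fluorescence(temp, tm, alpha, folded=(0.0, 0.0), unfolded=(1.0, 0.0)):
    """Signal model with linear baselines (assumed form); each baseline is (intercept, slope)."""
    f_u = boltzmann_fraction_unfolded(temp, tm, alpha)
    folded_signal = folded[0] + folded[1] * temp
    unfolded_signal = unfolded[0] + unfolded[1] * temp
    # Observed signal is a population-weighted mix of the two baselines
    return folded_signal + (unfolded_signal - folded_signal) * f_u

# At T = Tm the model places exactly half of the protein in the unfolded state
midpoint = boltzmann_fraction_unfolded(52.5, tm=52.5, alpha=1.5)
```

With default baselines the signal equals the fraction unfolded, so normalized data can be compared directly against `boltzmann_fraction_unfolded`.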
In studies to date seeking quantitative binding constants, "thermodynamic models" have been used. The simplest of such models write the fraction of unfolded protein as

$$f_u(T) = \frac{1}{1 + \exp\left[\dfrac{\Delta H\left(1 - \frac{T}{T_m}\right) + \Delta C_p\left(T - T_m - T\ln\frac{T}{T_m}\right)}{RT}\right]}$$

where $\Delta H$ is the enthalpy change of protein unfolding, $\Delta C_p$ is the change in heat capacity upon protein unfolding (both assumed to be temperature-independent), and $R$ is the gas constant. Typically $\Delta C_p$ is under-determined given the available experimental data; it is therefore determined through separate complementary experiments or estimated from the buried surface area of the folded protein, and then fixed when fitting the thermal unfolding data. Though more complicated to write down, these models in fact have the same number of effective free parameters as the Boltzmann model (when $\Delta C_p$ is fixed at a pre-determined value). Further, these models also have the advantage of using physically meaningful parameters.
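The thermodynamic model can be evaluated as sketched below. This assumes the standard Gibbs-Helmholtz expression for the unfolding free energy, with temperatures in kelvin and energies in kJ/mol; the function names and the numerical values of the enthalpy and heat capacity change are hypothetical and chosen only for illustration.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def unfolding_free_energy(temp_k, tm_k, delta_h, delta_cp):
    """Gibbs-Helmholtz free energy of unfolding (kJ/mol); zero at temp_k == tm_k."""
    return (delta_h * (1.0 - temp_k / tm_k)
            + delta_cp * (temp_k - tm_k - temp_k * math.log(temp_k / tm_k)))

def thermodynamic_fraction_unfolded(temp_k, tm_k, delta_h, delta_cp):
    """Two-state fraction unfolded: 1 / (1 + exp(dG_unfold / RT))."""
    delta_g = unfolding_free_energy(temp_k, tm_k, delta_h, delta_cp)
    return 1.0 / (1.0 + math.exp(delta_g / (R_KJ * temp_k)))

# Hypothetical protein: Tm = 52.5 °C, dH = 400 kJ/mol, dCp = 8 kJ/(mol*K)
TM_K = 325.65
curve = [(t, thermodynamic_fraction_unfolded(t, TM_K, 400.0, 8.0))
         for t in (315.65, 320.65, 325.65, 330.65, 335.65)]
```

As with the Boltzmann form, the curve passes through one half at the melting temperature; the difference is that its steepness is governed by physically interpretable quantities.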
Simply determining the Tm shift as a function of ligand concentration is not sufficient to provide the binding affinity, however. Although some groups have simply fit these curves using the Hill equation, treating the Tm as an arbitrary "observable" that depends on the ligand concentration, this is not a physically reasonable approach. The Hill equation is only applicable when the observable is linearly proportional to the fraction of one of the species that is bound or unbound in solution, and Tm is not such a variable. The ΔTm data are also, by definition, drawn from different temperatures: the binding affinity cannot be assumed constant at different temperatures, further making the Hill equation inappropriate for this usage. This point is further underscored by the fact that these experimental data do not correspond to a simple saturation-based ligand titration method: rigorous thermodynamic simulations show that ΔTm should change monotonically with increasing ligand concentration, even if this behavior is not always observed in real cases due to artifacts like irreversible protein aggregation.
Instead, correct binding constants have thus far been determined using a more rigorous approach that explicitly considers the temperature-dependent enthalpy, entropy, and heat capacity of both protein folding and ligand binding. Using these thermodynamic parameters determined from the complete unfolding transitions, binding constants can subsequently be determined at the Tm. The means to do so was presented several decades ago, and also in the context of screening for ligands that bind a particular protein. In the earliest cases, these equations were formulated for the weak-binding regime, such that the free ligand concentration can be approximated by the total ligand concentration; these equations have since been extended to avoid the latter assumption. In all cases, though, the binding constant is determined at the Tm; together with the binding enthalpy, the van't Hoff equation can then be used to extrapolate binding constants at other temperatures. Because the binding enthalpy is difficult to determine from the unfolding transition data, this most commonly comes from a knowledge-based estimate or is measured directly using other techniques like isothermal titration calorimetry.
While details of the model have been iteratively improved since the original formulation, the two key elements of the "thermodynamic model" have remained unchanged: a fit of the melting curves is used to obtain multiple thermodynamic parameters, then these are used to calculate the binding constant at Tm and potentially, via extrapolation, at other temperatures. These elements of the model also remain the two key practical limitations of DSF. Because of the complexity associated with correctly replicating this analysis, it is often cited in modern studies but not frequently used: DSF is most popular as a qualitative test rather than a quantitative test, with the majority of literature reports presenting Tm shifts as shown in Figure 1B but not attempting to extract binding constants. Collectively this has led to a general consensus that the observed Tm shifts "cannot be readily transformed into binding affinities".
Here, we develop and describe a new isothermal strategy for analysis of DSF data. Rather than determine the Tm values from the raw fluorescence data at each ligand concentration, we instead select a single temperature of interest, and at this temperature we evaluate the fraction of protein that is folded or unfolded at each ligand concentration (Figure 1C). Because all of the data used correspond to the same temperature, no thermodynamic parameters are required; instead, a very simple model of coupled equilibria (protein folding/unfolding and ligand binding/unbinding) describes our system. Furthermore, because we only require the fraction of protein that is unfolded, for a given ligand concentration, at the temperature of interest, the raw data can be fit either with the simple Boltzmann model or with the more rigorous thermodynamic model. Other studies have similarly used isothermal slices of unfolding data, for example in analysis of cellular thermal shift data and other protein-ligand interactions; however, each of these stopped short of using these data to quantitatively determine binding constants. As demonstrated below, this approach leads to a very simple formulation for determining the binding affinity near the protein's unfolding temperature, and it provides values consistent with those measured in other orthogonal assays.
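The idea of an isothermal slice can be illustrated as follows. This sketch assumes each melting curve has already been fit with the Boltzmann form, and simply evaluates every fitted curve at one chosen temperature; the fitted parameters, the temperature, and the variable names are hypothetical values invented for the example.

```python
import math

def fraction_unfolded_at(temp, tm, alpha):
    """Evaluate a fitted Boltzmann curve at a single temperature."""
    return 1.0 / (1.0 + math.exp((tm - temp) / alpha))

# Hypothetical per-well fits: total ligand (µM) -> (Tm in °C, alpha in °C)
fitted_curves = {
    0.0: (52.5, 1.5),
    10.0: (53.4, 1.5),
    100.0: (55.8, 1.5),
    1000.0: (58.9, 1.5),
}

T_ISO = 53.0  # temperature of interest, chosen near the unfolding transition

# One (ligand concentration, fraction unfolded) pair per well, all at T_ISO
isothermal_slice = [
    (ligand, fraction_unfolded_at(T_ISO, tm, alpha))
    for ligand, (tm, alpha) in sorted(fitted_curves.items())
]
```

The resulting pairs are the kind of dose-response data that the coupled-equilibrium model is then fit to.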
Theory
Isothermal analysis of ThermoFluor data. DSF experiments, specifically those in which large compound collections are screened, yield melting temperatures that shift either higher or lower when various compounds are added. Most non-covalent drug-like ligands stabilize their protein target upon selective binding, and accordingly they increase the protein's Tm. Conversely, compounds that decrease the protein's Tm are thought to operate by binding the unfolded protein more tightly than the folded protein, by competing with an endogenous, stabilizing co-factor, or through potentially non-specific effects; some metal ions, like zinc, can also destabilize proteins. We have excluded from the present analysis cases in which the ligand destabilizes the protein, and we focus solely on the scenario in which the ligand exclusively binds the natively-folded protein with a one-to-one stoichiometry.
Accordingly, we write the protein folding-unfolding reaction as a competitive coupled equilibrium with ligand binding, as follows:
$$\mathrm{U} + \mathrm{L} \;\overset{K_U}{\rightleftharpoons}\; \mathrm{F} + \mathrm{L} \;\overset{K_d}{\rightleftharpoons}\; \mathrm{FL} \tag{1}$$

where [U] is the concentration of the unfolded protein, [L] is the concentration of free ligand, [F] is the concentration of the folded and unbound protein, and [FL] is the concentration of the protein-ligand complex. $K_U$ is the equilibrium constant for the protein unfolding reaction, and $K_d$ is the equilibrium (dissociation) constant for the unbinding reaction. Both $K_U$ and $K_d$ depend on temperature, but both are constant at fixed temperature and fixed buffer conditions. Intuitively from this scheme, we see that the concentration of unfolded protein goes to zero as the ligand concentration becomes large and drives the equilibrium to the right. Importantly, this scheme assumes each reaction, folding and binding, has no intermediates, and thus can be represented in this two-state manner; we will consider further the implications of this assumption in the Discussion section. We also note that the presence of the reporter dye is not included in our model.
From the conservation of mass and the definitions of these two equilibrium constants, we write the following:
$$[P]_T = [F] + [U] + [FL] \tag{2}$$

$$[L]_T = [L] + [FL] \tag{3}$$

$$K_U = \frac{[U]}{[F]} \tag{4}$$

$$K_d = \frac{[F][L]}{[FL]} \tag{5}$$

where [U] is the concentration of the unfolded protein, [L] is the concentration of free ligand, [F] is the concentration of the folded and unbound protein, and [FL] is the concentration of the protein-ligand complex. In Equation 4 we define $K_U$ as the equilibrium constant between the unbound unfolded and folded states U and F. This equilibrium constant is therefore independent of ligand concentration, and reflects the overall fraction of protein that is unfolded/folded only when no ligand is present (since inclusion of ligand shifts some of U and F into the FL state). $K_d$ is the equilibrium constant for the unbinding reaction. $[P]_T$ is the total protein concentration, and $[L]_T$ is the total ligand concentration (both of which are known). We note that the interaction between the reporter dye and the protein is not explicitly included in this model, though the presence of the dye presumably does contribute to stabilizing the unfolded protein.
Once the raw data have been normalized, fluorescence intensity in the DSF experiment is linearly related to the fraction of unfolded protein, $f_u$. Starting from the definition of $f_u$, we simplify using Equations 2 through 5 and obtain the following expression:

$$f_u = \frac{[U]}{[U] + [F] + [FL]} = \frac{1}{1 + \frac{1}{K_U}\left(1 + \frac{[L]}{K_d}\right)} \tag{6}$$
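The intermediate algebra behind this expression can be sketched as follows, using only the definitions $K_U = [U]/[F]$ and $K_d = [F][L]/[FL]$ to write [F] and [FL] in terms of [U]:

```latex
\begin{align*}
[F] &= \frac{[U]}{K_U}, \qquad
[FL] = \frac{[F][L]}{K_d} = \frac{[U][L]}{K_U K_d} \\[4pt]
f_u &= \frac{[U]}{[U] + [F] + [FL]}
     = \frac{[U]}{[U]\left(1 + \dfrac{1}{K_U} + \dfrac{[L]}{K_U K_d}\right)}
     = \frac{1}{1 + \dfrac{1}{K_U}\left(1 + \dfrac{[L]}{K_d}\right)}
\end{align*}
```

The concentration [U] cancels, so the fraction unfolded depends on the protein concentration only indirectly, through the free ligand concentration [L].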
This provides the fraction of unfolded protein in terms of the free ligand concentration [L], whereas the known quantity in this experiment is the total ligand concentration $[L]_T$. From Equations 2 through 5 we obtain the following quadratic equation for [L]:

$$[L]^2 + \left([P]_T - [L]_T + K_d(1 + K_U)\right)[L] - [L]_T K_d(1 + K_U) = 0 \tag{7}$$
Thus, [L] can be written in terms of the total ligand concentration $[L]_T$ as follows:

$$[L] = \frac{[L]_T - [P]_T - K_d(1 + K_U) + \sqrt{\left([P]_T - [L]_T + K_d(1 + K_U)\right)^2 + 4[L]_T K_d(1 + K_U)}}{2} \tag{8}$$
We note that this expression corresponds to only one root of the quadratic equation, since the other root is unphysical.
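The forward model can be sketched in a few lines of Python: the physical root of the quadratic gives the free ligand concentration, which is then inserted into the expression for the fraction unfolded. The function names and the example concentrations are hypothetical, and all concentrations are assumed to share the same units (µM here).

```python
import math

def free_ligand(l_total, p_total, k_u, k_d):
    """Physical (non-negative) root of the quadratic for the free ligand concentration."""
    k_app = k_d * (1.0 + k_u)  # K_d * (1 + K_U) appears throughout the quadratic
    b = p_total - l_total + k_app
    return 0.5 * (-b + math.sqrt(b * b + 4.0 * l_total * k_app))

def fraction_unfolded(l_total, p_total, k_u, k_d):
    """Fraction of protein unfolded at a given total ligand concentration."""
    l_free = free_ligand(l_total, p_total, k_u, k_d)
    return 1.0 / (1.0 + (1.0 / k_u) * (1.0 + l_free / k_d))

# Illustrative values: 10 µM protein, K_U = 1, K_d = 5 µM, 50 µM total ligand
l_free = free_ligand(50.0, 10.0, 1.0, 5.0)
f_u = fraction_unfolded(50.0, 10.0, 1.0, 5.0)

# Recover every species and confirm ligand mass balance: [L] + [FL] = [L]_T
unfolded = f_u * 10.0
folded = unfolded / 1.0
complex_conc = folded * l_free / 5.0
```

Checking that the recovered species sum back to the total ligand and total protein concentrations is a quick way to confirm that the correct root was selected.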
Together, Equations 6 and 8 provide a single expression to write $f_u$ in terms of $[L]_T$, $[P]_T$, $K_U$, and $K_d$. As expected for the limiting case where $[L]_T$ becomes large, we see from this set of equations that $f_u$ goes to zero. Conversely, in the limiting case when $[L]_T$ goes to zero, we see that [L] goes to zero and thus Equation 6 reduces to $f_u = 1/(1 + 1/K_U)$, the value set solely by the equilibrium constant for unfolding. Together, these two limits correspond to the endpoints of the data shown in Figure 1C.
$[L]_T$ and $[P]_T$ are experimental parameters that are known; our expression for $f_u$ therefore uses only two free parameters, $K_U$ and $K_d$. These two parameters can be fit to the normalized data at the same time, or alternatively $K_U$ can be first determined at the temperature of interest from the thermal unfolding curve in the absence of ligand; this allows fitting of the data in Figure 1C to be subsequently carried out with a single free parameter, $K_d$.
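A single-parameter fit of this kind can be sketched as below. To stay dependency-free, this example fixes $K_U$ from the ligand-free fraction unfolded and minimizes the sum of squared residuals over $\log_{10} K_d$ with a golden-section search; this optimizer is a choice made for the illustration and is not necessarily what the accompanying program uses. The function names, search bounds, and synthetic data are all hypothetical.

```python
import math

def fraction_unfolded(l_total, p_total, k_u, k_d):
    """Coupled-equilibrium model: fraction unfolded versus total ligand."""
    k_app = k_d * (1.0 + k_u)
    b = p_total - l_total + k_app
    l_free = 0.5 * (-b + math.sqrt(b * b + 4.0 * l_total * k_app))
    return 1.0 / (1.0 + (1.0 / k_u) * (1.0 + l_free / k_d))

def ku_from_fu0(fu0):
    """K_U at the chosen temperature from the ligand-free fraction unfolded."""
    return fu0 / (1.0 - fu0)

def fit_kd(l_totals, fu_observed, p_total, k_u, log10_lo=-3.0, log10_hi=5.0, tol=1e-8):
    """Least-squares K_d with K_U fixed, via golden-section search on log10(K_d)."""
    def sse(log_kd):
        k_d = 10.0 ** log_kd
        return sum((fraction_unfolded(l, p_total, k_u, k_d) - y) ** 2
                   for l, y in zip(l_totals, fu_observed))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log10_lo, log10_hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** (0.5 * (a + b))

# Synthetic, noise-free titration (µM): 10 µM protein, f_u0 = 0.5, true K_d = 5 µM
P_TOTAL, TRUE_KD = 10.0, 5.0
K_U = ku_from_fu0(0.5)
ligand = [0.0, 1.0, 3.0, 10.0, 30.0, 100.0, 300.0, 1000.0]
observed = [fraction_unfolded(l, P_TOTAL, K_U, TRUE_KD) for l in ligand]

kd_fit = fit_kd(ligand, observed, P_TOTAL, K_U)
```

Searching in $\log_{10} K_d$ keeps the parameter positive and treats weak and tight binders on an equal footing; with real, noisy data a general-purpose least-squares routine that also reports parameter uncertainties would usually be preferable.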
A simpler approximate solution. Monitoring the fraction of unfolded protein through dye binding in this competitive coupled equilibrium (Equation 1) is very much analogous to detecting the fraction of labeled probe molecule in a competitive binding assay. In the latter case, one uses increasing concentrations of the unlabeled inhibitor of interest to explore the effect on a labeled probe that binds at the same site; the concentrations of all species, as well as the binding affinity of the probe ligand, can then be used to determine the inhibition constant for the unlabeled species from its IC50 [54].
Inspired by this analogy, we explored whether the same strategy could be applied here. We summarize our solution for these equations below, and elaborate further in the Appendix.
We again start from Equations 2 through 5, but this time we solve these equations for the specific scenario in which the total ligand concentration matches the EC50. By definition, the EC50 is the ligand concentration at which the fraction of unfolded protein is half of that observed in the absence of ligand. For this special case:
$$[L]_T = [L]_{50} + [FL]_{50} = \mathrm{EC}_{50} \tag{9}$$

$$[P]_T = [F]_{50} + [U]_{50} + [FL]_{50} \tag{10}$$

$$K_U = \frac{[U]_{50}}{[F]_{50}} \tag{11}$$

$$K_d = \frac{[F]_{50}[L]_{50}}{[FL]_{50}} \tag{12}$$

where $[U]_{50}$, $[L]_{50}$, $[F]_{50}$, and $[FL]_{50}$ are the concentrations of unfolded protein, free ligand, folded unbound protein, and protein-ligand complex at the condition when $[L]_T$ equals the EC50. Recall from Equation 4 that $K_U$ is defined to be the equilibrium constant between only the unbound unfolded/folded states, not the overall fraction of protein that is unfolded/folded; for this reason Equation 11 does not include any contribution from $[FL]_{50}$.
Correspondingly, in the absence of ligand we write:
$$[P]_T = [F]_0 + [U]_0 \tag{13}$$

$$[L]_T = [L] = [FL] = 0 \tag{14}$$

$$K_U = \frac{[U]_0}{[F]_0} \tag{15}$$
From Equations 13 through 15 we can solve for the fraction unfolded in the absence of ligand, $f_{u0}$:

$$f_{u0} = \frac{[U]_0}{[U]_0 + [F]_0} = \frac{1}{1 + \frac{1}{K_U}} \tag{16}$$
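From the definition of EC50 (the fraction of unfolded protein is half of that observed in the absence of ligand, at the same total protein concentration), we write:

$$[U]_{50} = \frac{[U]_0}{2} \tag{17}$$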
From Equations 15 and 17, we can write $[F]_{50}$ in terms of $[U]_0$ and $K_U$. Substituting this into Equation 10 yields an expression for $[FL]_{50}$ in terms of $[P]_T$ and $K_U$; simplifying this with Equations 15 and 16, we find that at the ligand concentration corresponding to the EC50, half of the total protein concentration has ligand bound to it:

$$[FL]_{50} = \frac{[P]_T}{2} \tag{18}$$
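The steps summarized in words above can be written out explicitly. This sketch takes the EC50 definition to mean $[U]_{50} = [U]_0/2$ and uses the fact that $K_U$ is the same with and without ligand:

```latex
\begin{align*}
[U]_{50} &= \tfrac{1}{2}[U]_0 \\[2pt]
[F]_{50} &= \frac{[U]_{50}}{K_U} = \frac{[U]_0}{2K_U} = \tfrac{1}{2}[F]_0 \\[2pt]
[FL]_{50} &= [P]_T - [F]_{50} - [U]_{50}
           = [P]_T - \tfrac{1}{2}\left([F]_0 + [U]_0\right)
           = [P]_T - \tfrac{1}{2}[P]_T
           = \tfrac{1}{2}[P]_T
\end{align*}
```

Halving the unfolded population necessarily halves the unbound folded population as well, so exactly half of the protein must be in the ligand-bound state.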
This allows solution of Equations 12 and 16 to yield a simple expression for $[L]_{50}$ as well:

$$[L]_{50} = \frac{K_d}{1 - f_{u0}} \tag{19}$$
Combining Equations 18 and 19 back into Equation 9, we obtain a simple expression that relates the EC50 to $K_d$:

$$\mathrm{EC}_{50} = \frac{K_d}{1 - f_{u0}} + \frac{[P]_T}{2} \tag{20}$$
There are no additional assumptions required to reach this equation; for example, there is no need to assume that [L] approximately equals $[L]_T$. This expression is intuitively gratifying, and it highlights the fact that the EC50 observed in this experiment cannot be simply interpreted as the $K_d$. Most notably, in the limit where ligand binding is very tight (low $K_d$), the observed EC50 is driven essentially by stoichiometry (enough ligand must be added to match half the number of available sites on the protein); this makes the EC50 very insensitive to changes in the $K_d$ in this regime, and it suggests that our approach may not be well-suited to determining the binding affinity for very tight interactions. This implication is borne out in real experimental data, as presented at the end of the following section.
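The insensitivity in the tight-binding regime is easy to see numerically. The sketch below evaluates the relation EC50 = K_d / (1 - f_u0) + [P]_T / 2 for hypothetical conditions (10 µM protein, half unfolded without ligand); the function name and all values are illustrative assumptions.

```python
def ec50_from_kd(k_d, p_total, fu0):
    """EC50 = K_d / (1 - f_u0) + [P]_T / 2, with all concentrations in the same units."""
    return k_d / (1.0 - fu0) + p_total / 2.0

P_TOTAL_UM = 10.0  # hypothetical total protein concentration, µM
FU0 = 0.5          # hypothetical fraction unfolded in the absence of ligand

# Tenfold change in K_d in the tight regime (1 nM vs 10 nM, expressed in µM)
tight = [ec50_from_kd(k_d, P_TOTAL_UM, FU0) for k_d in (0.001, 0.01)]

# Tenfold change in K_d in the weak regime (100 µM vs 1 mM)
weak = [ec50_from_kd(k_d, P_TOTAL_UM, FU0) for k_d in (100.0, 1000.0)]
```

With these illustrative numbers, a tenfold change in $K_d$ shifts the EC50 by less than 1% in the tight regime, but by nearly tenfold in the weak regime.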
Finally, rearranging Equation 20 yields:

$$K_d = (1 - f_{u0})\left(\mathrm{EC}_{50} - \frac{[P]_T}{2}\right) \tag{21}$$
$[P]_T$ is a known experimental parameter. $f_{u0}$ corresponds to the fraction of protein that is unfolded (at the temperature of interest) in the absence of ligand, and thus it can be determined directly from the thermal unfolding curve in the absence of ligand. Even using a very simple and arbitrary fit of $f_u$ as a function of ligand concentration (e.g., the Hill equation), we can still easily estimate the midpoint of this transition (the ligand's EC50 value): thus, Equation 21 provides a rapid means to estimate the $K_d$ when it is undesirable to fit the complete curve using Equations 6 and 8. That said, fitting with the functional form presented in Equations 6 and 8 leads to the most accurate estimate of the midpoint (since the complete curve is used to determine the fitting parameters), and is thus preferred.
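The shortcut can be checked against the full model numerically. The sketch below locates the EC50 of the coupled-equilibrium curve by bisection, then converts it back to a dissociation constant with the closed-form relation K_d = (1 - f_u0)(EC50 - [P]_T / 2). The function names, the bisection bounds, and the example values are hypothetical.

```python
import math

def fraction_unfolded(l_total, p_total, k_u, k_d):
    """Coupled-equilibrium model: fraction unfolded versus total ligand."""
    k_app = k_d * (1.0 + k_u)
    b = p_total - l_total + k_app
    l_free = 0.5 * (-b + math.sqrt(b * b + 4.0 * l_total * k_app))
    return 1.0 / (1.0 + (1.0 / k_u) * (1.0 + l_free / k_d))

def ec50_numeric(p_total, k_u, k_d, upper=1e9):
    """Total ligand at which f_u falls to half its ligand-free value (bisection)."""
    target = 0.5 * k_u / (1.0 + k_u)
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if fraction_unfolded(mid, p_total, k_u, k_d) > target:
            lo = mid  # still too much unfolded protein: need more ligand
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kd_from_ec50(ec50, p_total, fu0):
    """K_d = (1 - f_u0) * (EC50 - [P]_T / 2)."""
    return (1.0 - fu0) * (ec50 - p_total / 2.0)

# Illustrative values (µM): 10 µM protein, K_U = 1 (f_u0 = 0.5), K_d = 5 µM
ec50 = ec50_numeric(10.0, 1.0, 5.0)
kd_estimate = kd_from_ec50(ec50, 10.0, 0.5)
```

For these values the numerically located EC50 is 15 µM and the recovered $K_d$ is 5 µM, matching the input; with experimental data, the accuracy of the shortcut is limited by how well the EC50 and $f_{u0}$ can be read from the curves.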