Backbone protons in proteins and peptides have a chemical shift that is partially dependent on local secondary structure. Although the observed chemical shift for protons in well-folded protein systems can be distorted by the contribution of higher-level structuring features, such as the ordering of sequence-remote aromatic sidechains in the vicinity of the backbone, chemical shifts can be a powerful tool for determining the degree of secondary-structure formation in shorter sequences. The chemical shift in such systems is the sum of the random-coil chemical shift (the shift expected in the absence of non-random secondary structure) and the deviation caused by secondary-structure formation. In order to determine an accurate chemical shift deviation (CSD), it is vital to start with a random-coil value that is matched to the conditions under which the chemical shift was observed. For example, temperature, (co)solvents, pH, and neighboring residues all play a part in changing a proton's random-coil shift. The chemical shift deviation database (CSDb) allows for the rapid application of tedious random-coil correction methodologies to chemical shift data. The calculations that determine these effects are discussed in more detail in the section called CSD Calculations.
This help file describes the overall process of entering NMR chemical shift data and also addresses specific problems. You may read the help page in its entirety for a general outline of how to use the database, or simply start using the database and click the help links throughout the site, which point to specific portions of this page. Unalterable example data is accessible from all accounts under the owner name 'Andersen' and is available to help you familiarize yourself with the database. Please note that the CSDb is still being refined and minor glitches in the user interface may still exist. Any suggestions, comments, or questions are welcome.
- Jasper C. Lin: jclin at u dot washington dot edu
- Niels H. Andersen: andersen at chem dot washington dot edu
Four CSD calculation methods are available for processing entered chemical shifts. The oldest method, v.2002, is an update of an earlier method (Andersen, 1997); the primary modification came with the application of nearest-neighbor effects. The v.2002 method includes corrections for serine, threonine, histidine, leucine, and aromatic residues preceding or following a given residue.
In 2003, the CSD algorithm was further updated and split into two methods, v.2003S and v.2003X. Updates include a mild temperature gradient for reference HA values, a finer subdivision of the solvent-correction terms to recognize that different amounts of cosolvent addition require different correction values, and the addition of lysine and arginine as residues that affect the reference shifts of neighbors. The two 2003 methods differ in the magnitude of HN, and to a lesser extent HA, temperature and solvent corrections: v.2003S is calibrated for solvent-sequestered residues, while v.2003X is calibrated for solvent-exposed positions. The need for having two different sets of correction values became apparent in our work with beta-hairpin peptides. The strands of these peptides, when folded, include residues with HA and HN protons that are alternately inwardly- and outwardly-directed, corresponding to "sequestered" and "exposed" positions respectively. The exposed positions require a steeper temperature gradient correction for the HN's. For sequestered HN positions with large positive CSD's, failure to apply the reduced gradient under-estimates the %-fold of the system at temperatures lower than 295K and over-estimates it at temperatures greater than 300K. The direction of the error in folding estimates is reversed for sequestered HN sites that display negative structuring CSD's (e.g. at turn loci). Application of the shallower sequestered-position gradient to exposed sites results in "apparent CSD's" crossing through zero over a temperature range. The v.2003S method is most appropriately used only for solvent-sequestered positions of systems >60% folded throughout the temperature range examined. The v.2003X method can be employed for solvent-exposed positions independent of the extent of folding and for all positions in systems <30% folded.
The 30%- to 60%-folded regime must be treated carefully as the temperature correction that should be applied to the reference values actually changes significantly during the course of melting experiments.
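The guidance above on choosing between the two 2003 methods can be summarized as a small decision function. This is only a sketch: the function name, its arguments, and the "ambiguous" return value are illustrative and not part of the CSDb, and it assumes you already have an estimate of the fraction folded that holds over the whole temperature range examined.

```python
def suggest_2003_method(fraction_folded: float, sequestered: bool) -> str:
    """Suggest v.2003S or v.2003X for one proton position.

    fraction_folded: estimated fraction folded (0.0 to 1.0); for a melting
        series, use the lowest value seen over the temperature range.
    sequestered: True for solvent-sequestered positions, False for exposed.
    """
    if not 0.0 <= fraction_folded <= 1.0:
        raise ValueError("fraction_folded must be between 0 and 1")
    if not sequestered:
        # Exposed positions: v.2003X regardless of the extent of folding.
        return "v.2003X"
    if fraction_folded > 0.60:
        # Sequestered position in a system that stays >60% folded.
        return "v.2003S"
    if fraction_folded < 0.30:
        # Largely unfolded systems: v.2003X for all positions.
        return "v.2003X"
    # 30% to 60% folded: the appropriate correction changes during melting.
    return "ambiguous"
```

A return value of "ambiguous" is meant as a prompt to inspect the data by hand, not as a CSDb setting.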
The separation of the CSD algorithm into two separate methods made calculation of CSD's for beta systems tedious. With CSDb2, the ability to tag a specific shift as "sequestered" was added and the algorithms 2003S and 2003X were combined to make v.2005. No additional changes were made to the calculation of HN and HA CSD's. Method v.2005 is, however, the first to include the ability to calculate CSD's of sidechain protons. The algorithm is less complex than those employed for HA and HN: a random-coil shift is simply subtracted from the observed shift. No temperature, pH, solvent, etc. corrections have yet been devised.
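The sidechain calculation described above amounts to a single subtraction, sketched below. The function name is hypothetical, the random-coil value must be supplied by the caller (the CSDb reference tables are not reproduced here), and the numbers in the example are placeholders rather than real reference shifts.

```python
def chemical_shift_deviation(observed_ppm: float, random_coil_ppm: float) -> float:
    """Return CSD = observed shift minus random-coil shift, in ppm."""
    # Round to 0.001 ppm, the precision at which the CSDb stores shifts.
    return round(observed_ppm - random_coil_ppm, 3)


# Placeholder values for illustration only; not CSDb reference data.
example_csd = chemical_shift_deviation(observed_ppm=4.85, random_coil_ppm=4.35)
```

For HA and HN protons the reference value would first have to be corrected for temperature, solvent, pH, and neighboring residues, which this sketch does not attempt.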
NH Andersen, JW Neidigh, SM Harris, GM Lee, Z Liu, and H Tong. Extracting Information from the Temperature Gradients of Polypeptide NH Chemical Shifts: I. The Importance of Conformational Averaging. Journal of the American Chemical Society, 119, 1997, 8547-8561.
Data in the CSDb are organized using a four-level system: Sequence, Sample, Data Set, and Shift. The broadest category is Sequence, which contains only the most basic information about a peptide. Each entry at the Sequence level may then be associated with multiple Sample entries, which describe specific sample conditions. Similarly, each Sample-level entry may be associated with multiple Data Set entries, each specifying a different temperature. Finally, each Data Set entry connects to Shift-level entries. Each chemical shift entered at this lowest level is cross-referenced by residue number and nucleus name (HA, HN, HB2, etc., using the IUPAC naming convention). Although D-amino acids are considered in calculations, most unnatural amino acids are not. A space is provided for storing their chemical shifts, but a CSD is not calculated. As calculation protocols are developed for unnatural amino acids, they will be added in subsequent calculation methods. In summary, each recorded chemical shift has associated with it the nucleus and residue number it is from, the temperature it was recorded at, the make-up of the sample it was in, and the sequence of the peptide or protein it is from.
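One way to picture the four-level hierarchy is as nested records. The sketch below is not the CSDb schema: the class names, field names, and the flatten helper are all assumptions made for illustration, and the example shift value is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Shift:
    residue_number: int
    nucleus: str          # e.g. "HA", "HN", "HB2"
    value_ppm: float


@dataclass
class DataSet:
    temperature_k: float  # one Data Set per temperature
    shifts: List[Shift] = field(default_factory=list)


@dataclass
class Sample:
    solvent_description: str
    ph: float
    data_sets: List[DataSet] = field(default_factory=list)


@dataclass
class PeptideSequence:
    name: str
    residues: str         # one-letter code, e.g. "DpG"
    samples: List[Sample] = field(default_factory=list)


def flatten(seq: PeptideSequence) -> List[Tuple[str, str, float, int, str, float]]:
    """List every shift together with the context it inherits from above."""
    return [
        (seq.name, sample.solvent_description, ds.temperature_k,
         s.residue_number, s.nucleus, s.value_ppm)
        for sample in seq.samples
        for ds in sample.data_sets
        for s in ds.shifts
    ]


# Placeholder data for illustration only.
peptide = PeptideSequence(
    name="example tripeptide",
    residues="DpG",
    samples=[Sample("water, pH 6", 6.0, [DataSet(280.0, [Shift(1, "HA", 4.7)])])],
)
rows = flatten(peptide)
```

Each row produced by flatten carries the sequence, sample, temperature, residue number, and nucleus alongside the shift, matching the summary at the end of the paragraph above.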
The home page lists all of the sequences owned by the current user and includes a header that is consistent throughout the site. The user can begin the process of entering a new sequence or graphing data, or navigate directly to the "home" or "about" page using the header menu. The header also includes a search function that returns a list of sequences. The results include those from text-string searches of users, sequences, sequence names, and sequence keywords. The search may be made more specific by including the modifier user:, keyword:, or sequence: before the search term (e.g. keyword:hairpin). An additional function may be accessed by searching for "similar:SEQUENCE", where "SEQUENCE" is any peptide sequence. The search will return sequences that are similar to the one entered.
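The modifier syntax can be illustrated with a short parser. This is a guess at the behavior described above, not the CSDb's own code: the function name, the "any" label for unqualified searches, and the case-insensitive handling of modifiers are assumptions.

```python
from typing import Tuple

# Modifiers named in the text above.
MODIFIERS = ("user", "keyword", "sequence", "similar")


def parse_query(query: str) -> Tuple[str, str]:
    """Split a search string into (scope, term).

    Unqualified searches get the scope "any", meaning all text fields.
    """
    prefix, separator, term = query.partition(":")
    if separator and prefix.strip().lower() in MODIFIERS:
        return prefix.strip().lower(), term.strip()
    return "any", query.strip()
```

For example, parse_query("keyword:hairpin") yields the scope "keyword" and the term "hairpin", while a bare "hairpin" is treated as a search over all fields.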
Several peptides studied in the Andersen Group are listed on the Home Page as examples. All these peptides have chemical shift data entered for them. Feel free to use these examples to help familiarize yourself with the hierarchy and functionality of the database.
Choosing "new sequence" from the header or "Edit this sequence" from the Sequence Details page opens a form for the entry or modification of sequence detail information, including the name and terminal modifications of a sequence. The name of a sequence should be chosen carefully to uniquely identify it. When entering a new sequence, be certain to enter the actual amino acid sequence with care, as you will not be able to edit, insert, or delete the amino acids later (the sequence and any associated data can, however, be deleted as a whole). The sequence is entered using the standard one-letter amino-acid code. For example, L-asparagine is entered as "N". Capital letters designate the L-form, while lowercase letters signify the less-common D-form. Any unnatural amino acid should be denoted as X or x, in which case the CSDb will provide a space for the chemical shift but ignore it during the calculations, since the CSDb does not have reasonable random-coil values for all conceivable residues. The one-letter code sequence should be entered without spaces. Thus, "Asp D-Pro Gly" is entered "DpG".
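The entry rules above can be sketched as a function that expands a one-letter sequence into three-letter names. The function name, the "Xxx" label for unnatural residues, and the decision to drop the D- prefix for achiral glycine are assumptions of this sketch; the labels the CSDb actually displays may differ.

```python
from typing import List

# Standard one-letter to three-letter amino-acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def interpret_sequence(one_letter: str) -> List[str]:
    """Expand a one-letter sequence; lowercase letters mark D-residues."""
    names = []
    for position, letter in enumerate(one_letter, start=1):
        upper = letter.upper()
        if letter.isspace():
            raise ValueError("enter the sequence without spaces")
        if upper == "X":
            # Unnatural residue: stored, but skipped in CSD calculations.
            names.append("Xxx")
        elif upper in THREE_LETTER:
            # Glycine is achiral, so no D- prefix is attached to it here.
            is_d_form = letter.islower() and upper != "G"
            names.append(("D-" if is_d_form else "") + THREE_LETTER[upper])
        else:
            raise ValueError(f"unrecognized code {letter!r} at position {position}")
    return names
```

With this sketch, interpret_sequence("DpG") gives Asp, D-Pro, Gly, the same reading as the example in the text.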
Terminal modifications can be specified so that they can be accounted for during CSD calculation. Currently, the CSDb recognizes and corrects for only free N- and C-termini, acetylated N-termini (enter "Ac"), and amidated C-termini (enter "NH2"). If the terminus is not modified, leave the space blank. The ionization state of free termini is determined from the sample pH and does not need to be indicated in this form.
The last entry blank, "First Residue", requests that you indicate the residue number of the first residue in your sequence. While this would usually be "1", other numbers may be entered in order to align the numbering of sequences truncated at the N-terminus with that of untruncated variants. For example, it is common to study only a portion of a large protein by synthesizing that portion as an independent small peptide. The CSDb allows you to number the first listed residue to correspond to the numbering of the large protein. Also, a peptide may be modified by truncating the N-terminus. Comparing the original and truncated peptides becomes trivial if you specify the first listed residue of the shortened peptide using the same number as its corresponding residue on the long peptide. For example, these two peptides:
AAKAA AKAAA KAAAA KGY   wild type
    A AKAAA KAAAA KGY   shortened mutant
would be confusing on comparison if the shortened mutant started with residue "1". By starting the residue numbering at "5", the residues line up by number. This is especially useful if one compares the two sequences using the graph utility in the CSDb, which aligns sequences by residue number.
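The effect of the "First Residue" setting can be checked with a few lines of code. The helper below is illustrative only and not part of the CSDb; the two sequences are the ones from the example above, written without spaces.

```python
from typing import Dict


def number_residues(sequence: str, first_residue: int = 1) -> Dict[int, str]:
    """Map residue numbers to one-letter codes, starting at first_residue."""
    return {first_residue + i: code for i, code in enumerate(sequence)}


wild_type = number_residues("AAKAAAKAAAKAAAAKGY")
shortened = number_residues("AAKAAAKAAAAKGY", first_residue=5)

# With numbering started at 5, every residue of the shortened peptide
# carries the same number as its counterpart in the wild type.
aligned = all(wild_type[n] == shortened[n] for n in shortened)
```

Here the shortened peptide spans residues 5 through 18, so a comparison by residue number pairs each of its residues with the matching wild-type residue.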
After submitting the sequence information, the CSDb will confirm your selections for the amino acids. A table will list the residue number, the entered one-letter code and its corresponding three-letter code (as interpreted by CSDb). A final entry column is provided for making corrections. Comments from the CSDb regarding your entry appear to the right. Upon submission, you will be directed to the newly made Sequence Detail page where you can continue the data entry process.
Details for a sequence may be viewed by clicking its name. The detail page includes the exact sequence in three-letter code form, but is primarily useful for graphing CSD data associated with the sequence and adding, editing, and viewing sample data associated with the sequence. From this page you can also delete the sequence by clicking the "X" next to "Delete Sequence". Deleting at this level will delete ALL samples and shift information associated with this sequence. A confirmation page will be displayed before the deletion is performed.
Samples associated with a sequence are listed below the sequence details. The page displays solution conditions, such as the primary solvent, co-solvent, pH, etc. and the reference, followed by a list of temperatures, or data sets, at which shifts were recorded. Action options include editing the sample details, adding a new dataset to the sample and shortcuts to commonly-used graphs. The last action is the "Sample Delete", selection of which will result in the deletion of the sample along with all associated datasets. The other samples associated with the same sequence will be left intact.
The sample parameters are essential for accurate CSD calculations. The fields are meant to be self-explanatory and some simple examples are listed. There are two fields that do not contribute to the CSD calculations. The first is 'Solvent Description', which is used by the CSDb to label the sample for easy identification. You are encouraged to keep the description short but informative. Also, the 'Reference' field is provided to help maintain a link between the entered data and its source; usually a journal reference or notebook page. The 'Primary Solvent', 'Co-solvent', '% Co-solvent', and 'pH' are all used in the calculation of the CSD. Please note that you should not type in a '%' sign when entering the '% Co-solvent'.
Selecting a data set temperature from those listed for a given sample opens the DataSet Details page. The page lists the shifts and the calculated CSDs. Links near the top of the page include access to other temperatures for the same sample, access to pages where you can modify or enter new shifts for the dataset, and a few shortcuts to commonly used graphs. In order to view how the calculations were performed, click on the method in question under 'CSD Calc. Details'. A link is also included for deleting the dataset and associated shifts.
There are two ways to enter chemical shifts. The "Edit Shifts" link on the DataSet Detail page brings up a form that prompts you to enter shifts in the appropriate fields. If you do not have a value for a particular chemical shift, leave the field blank and the CSDb will ignore it. Note that not all residues are listed as having one HN and one HA shift. Exceptions include L/D-Pro, which does not have an HN proton, and Gly, which has two HA protons (in IUPAC: HA2 and HA3). In the latter case, the CSDb treats both alpha protons identically, so a stereospecific assignment is not necessary. The database is designed to store values with 0.001-ppm precision. After entering the shifts, hitting the submit button will update the values in the database.
The "Paste Shifts" link is useful for importing data from a spreadsheet. However, the pasted values must be in the correct format: one shift per line, each in the form "residue #, nucleus, shift-value", with the values separated by a comma, space, or tab. The lines do not need to be sorted by any field. After submitting, a confirmation page lists what is about to be entered. If there are any mistakes, you may go back and change the entry before the shifts are added to the database.
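To check a block of text before pasting it, the expected format can be sketched as a small parser. This is not the CSDb's own parser: the function name is hypothetical, and skipping blank lines, upper-casing nucleus names, and rounding to 0.001 ppm are assumptions. The shift values in the example are placeholders.

```python
import re
from typing import List, Tuple


def parse_pasted_shifts(text: str) -> List[Tuple[int, str, float]]:
    """Parse lines of 'residue #, nucleus, shift-value'.

    Fields may be separated by commas, spaces, or tabs.
    """
    shifts = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        residue, nucleus, value = fields
        # Store with 0.001-ppm precision, as the database does.
        shifts.append((int(residue), nucleus.upper(), round(float(value), 3)))
    return shifts


# Placeholder values for illustration only.
pasted = "3, HA, 4.62\n3\tHN\t8.31\n4 HA2 3.95\n"
parsed = parse_pasted_shifts(pasted)
```

The three example lines use a comma, a tab, and a space as separators respectively, and all parse to the same (residue number, nucleus, shift) form.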
The graph setup allows you to determine how many data sets to graph and for each, choose which sequence, sample, temperature, nucleus, and method to use. The graph utility is flexible, allowing you to graph different data sets simultaneously. When graphing several data sets, the CSDb uses the residue number to align the sequences.
The bar graph, rendered by the graphing program Ploticus, is displayed at the top of the page. Several action options appear directly below the graph. They allow you to change the graph setup and download the Ploticus file used to graph the CSDs. The numbers used to create the graph are also listed in a table called "CSD Data Table".