Setup and equilibration of protein

Here focus is on the general setup (adding hydrogens and determining protonation state) with the maestro program. We additionally go through how maestro can be checked by Ulf Rydes programs. A complete list of the setup can be found at Ulf Ryde's page. The procedure below goes through steps 1-7.


Table of Contents


1. Characterization of the pdb

Check (use open-office chart):

  1. Is the protein a monomer, dimer ? or n-mer?
  2. Are there Cys-Cys-crosslinks (see also 4. Rename residues below)
  3. Are there missing parts of the structure (the first amino acids are often not resolved, check REMARK 465)? We may later add missing atoms automatically using tleap.
  4. Are the non-standard residues?
  5. At what pH was the structure obtained?

2. Protein setup

Protein setup with Maestro

  1. Make log-file (write down important information concerning flipped residues etc.). Maestro can write a command log file automatically: “Edit” => “Preferences...” => “General”, submenu “Quitting Maestro” => Check “Write command log file”.
  2. Open Prep. Wiz. and "Import" the pdb file.
  3. Make sure "Delete water beyond 5 Å" is unchecked and hit "preprocess".
  4. Review "Problems => Current overlapping atoms" If all close contacts are hydrogens, these will be optimized later and are likely not a problem in the end (if they are water molecules, we return to them in 7. Force field check by energy calculation).
  5. Review the "Problems => ".
    • “alternative positions". By "Next position" you can see the other position(s). By clicking "commit" you select one configuration, keep those with the highest occupancy (if close to active side, consider to calculate both). Mark down which you choose and why (with 50:50 it is basically guessing).
    • “overlapping atoms". If caused by water molecules, check again after the Simulated annealing step below. If problem remains, then note down and get back to it in step 7.
    • “missing atoms” If parts of residues are missing, note the residues down and check again visually after tleap filled the missing parts (see Step 5).
  6. Go to "Review and Modify" and then "Analyze Workspace" and delete undesired parts (e.g. co-crystallized solvent molecules that are not water or other artefacts).
  7. Save a pdb before optimization.
  8. Optimize H-positions under "Refine" (can take some minutes; usually 5-10 but might take up to several hours for large systems).
  9. Run Propka and check his residues (see also below).
    • Check His (normally close to 7). Check if Asp and Glu are above 7 (they are then protonated, otherwise not). Chek of Lys, Arg, and Tyr have pKa below 7 (then they are not protonated).
    • Note propka is a good tool, but not always reliable. Use it as guidelines (pKa calculations are very difficult).
  10. Check histidines with changepdb (see Analysis with changepdb below) - use the pdb from the pdb database.
  11. Check burried charges through visualization and with changepdb (there should be none that are not involved in ionic pairs - excpetions for metal ligands).
  12. Decide on protonation state for Asp, Glu, Lys, Arg, Tyr and His
  13. Use "Interactive optimizer => Analyze network"
  14. Note (and check) which Asn and Gln groups that are flipped. They can be flipped by hitting the arrow to the left/right side in Maestro.
  15. Check His protonation states (HIE, HID or HIP). Does it fit with analysis from changepdb? If not, then change them by hitting the arrow to the left side.
  16. Hit "Optimize" (lock the histidines by checking the "lock" box. Protonation states of other residues that should remain as they are should also be checked as "locked").
  17. Save a pdbfile (HID, HIE, HIP must be specified afterwards - this can be done with changepdb command "hi").

Analysis with changepdb

  • Histidine analysis
    • run changepdb.
    • give the input pdb.
    • command: hh (creates output to screen and files "hisX" with X = 1,2,3...).
    • copy the output to screen into a file (e.g. named "hishb").
    • grep after the relevant information (grep HHb hishb).
    • inspect the histidine files and decide on protonation state (HIE, HID, HIP). Inspect the hydrogen-bonding network around the histidines - for highly negatively charged proteins, solvent exposed histidines are generally HIP (unless there are good arguments against it). Their pKa values can also be calculated with PROPKA.
  • Burried charges
    • run changepdb.
    • give the input pdb.
    • command bc.
  • Metal centers (typically not done)
    • run changepdb.
    • give the input pdb.
    • command: des (gives file *.actX (X = 1,2,..) that contains an active site cut-out.
  • Change histidines
    • run changepdb.
    • give the input pdb.
    • run command "hi".
    • type "D" for HID, "E" for HIE, "P" for HIP.
    • type "w" (save) and then "q" (quit).

3. Non-standard residues

Non-standard residues requires determination of charges and sometimes a "dummy" force field. More advanced force fields can be constructed if necessary (see PON parametrization)

  • Run a geometry optimization on the non-standard residues [typically B3LYP/def2-SV(P) or TPSS/def2-SV(P)]. Make sure that the structure is not too different from the crystal structure or other trustworthy experimental structures (if there are large deviations, consider using a single point calculation or only optimized hydrogens to generate charges).
  • Insert "$esp_fit resp" in the turbomole "control" file and run turbomole (must be version > 7.0):
    ridft -resp -proper > logm
    
    The use of "-proper" indicates that no wave function optimization is required. Delete it if you did not do a wave function optimization. Look for "GRID" in logm to check whether the Electrostatic Potential (which will be used to fit the resp charge) is generated.

    Note: It can be convenient to ensure that the order of atoms is the same as in the pdb file. This can be ensured by using short-cut to generate "control" and "coord" files for turbomole. Make a syst1 file and use pdbtocomqum followed by comqumtoturbo as described under "Setup of ComQum Calculations" in the ComQum section.
  • Use now the script "changepot" (generates resp.pot, resp.in1, resp.in and qin). Note: changepot might give an error due to a missing line in logm. Insert this line with:

    awk '/charge for group # 1/{x++} x==2{sub(/charge for group # 1/,"writing ESP data for Amber to file esp.dat \n&")}1' logm > logm_2 && mv logm_2 logm
    

    changepot:

    1. Select program, type t for turbomole.
    2. Press enter (new); this is the name of the potential file (new is default).
    3. Name of turbomole output (logm).
    4. Do not use Boltzman weights (press enter).
    5. Press enter (special requirements if # grid pints > 99999).
    6. Press enter (output: resp.pot).
    7. Press enter (resp.in / resp.in1).
    8. Press enter (qin).

      The files resp.in and resp.in1 are input files for the fitting of resp charges. qin are initial charges. Fitting of resp charges is usually done in two steps: 1. a fit involving all atoms with loose convergence. This is done with resp.in. 2. all heavy atoms are fixed while hydrogens in methylene (CH2) and methyl (CH3) groups are constrained to be identical and the fit is done with more tight convergence (see below).

  • Check if there are any polar hydrogens bound to the same atom (e.g. water). These should be constrained to have the same charge (this is done in resp.in, see below for the format):

    Format of resp.in files:

    &cntrl  ihfree=0, qwt=0.0005, iqopt=2 # qwt = optimization parameter
    &end
    1.0
    Mol1
    2  63 # charge and tot. # of atoms
    7   0 # Atom nr. (here 7 = N) and command (-1 = freeze, 0 = optimized freely, n = fix to the same as atom #n e.g. 43 means fix to same value as atom number 43)
    6    0
    6    0
    
  • Run resp (part of the AMBER suite)

    resp -O -i resp.in -o resp.out -q qin -e resp.pot
    
  • Open resp.in1: Fix all charges except carbon atoms with more than one hydrogen (usually CH3 or CH2 groups). Fix the the hydrogens to have the same charge. qwt should be 0.001. Save qout in qin1:

    cp qout qin1
    

    and run

    resp -O -i resp.in1 -o resp.out1 -q qin1 -e resp.pot
    

    The final charges are now in resp.out1 - add this charges to the pdb file you want to start the equillibrium from (see more below).

    • Guide to adding charges to pdb: Column q(init) in resp.out1 gives the input charges. Column q(opt) are the output charges, which are the final charges. For the QM atoms (but not for the junction atoms), they replace the charges in the last column of the pdb file (make sure to copy the charges in the right order to the atoms).

    • Junction atoms: After adding charges to the QM atoms, it is likely that the total charge of the protein no longer correctly adds up (can be checked with changepdb, see next point). This is fixed by adding a uniform constant to all junction atoms (add this factor to the existing AMBER charge), ensuring that the charge adds up correctly.

  • Check the charge with changepdb (cc). Ensure that the charges add up.

4. Rename residues

  1. Cys => Cym if Cys is a metal ligand.
  2. Cys => CYX if Cys is a Cys-Cys cross link.
  3. HIS => HID, HIE or HIP if Maestro was used for the setup.
  4. HOH => WAM if water is a metal ligand. Move these waters to the top of the list of water molecules and add TER before and after them. Note: that wam.in and wam.dat will be needed for tleap (see next point, below).

5. Add non polar hydrogens, missing atoms and solvent cap with tleap

  • Solvent cap

    Data required for solvating the protein can be obtained through changepdb:

    changepdb

    1. run changepdb
    2. Press c (cap)
    3. Press m (move)
    4. Press w (write)
  • Run tleap (see below for sample leap.in file)
    tleap -f leap.in
    
    Sample tleap file
    source leaprc.ff14SB # Amber protein force field
    loadAmberPrep hic.in # special residue prep.in file 
    loadAmberPrep cu2.in # special residue prep.in file 
    loadAmberParams hic.dat # special residue data file 
    loadAmberParams cu2.dat # special residue data file 
    x=loadpdb lpmo_setup_part_3_NH2corrected.pdb # pdb input file 
    bond x.56.SG x.178.SG # Cys-Cys cross-link 
    solvatecap x TIP3PBOX {0.0 0.0 0.0} 40 # solvation (see above) 
    saveamberparm x prmtop prmcrd
    quit
    

There will likely be errors from tleap. A few examples on how to fix these are given below:

Typical errors in tleap

  • End-groups (OXT and AMT): If one wishes to have the end amino group not charged, then delete the lines in addPdbResMap. Copy file leaprc.ff03 to the local directory;

    cp $AMBERHOME/dat/leap/cmd/leaprc.ff03 .
    
  • Prep.in files: For special residues, a prep.in file must be constructed for the atom-types. A pdb file with the residue can be cut out from the pdbfile. The prep file can be constructed by (amber)

    antechamber -i hic.pdb -fi pdb -o hic.in -fo prepi -nc 0 -c bcc -rn HIC -at amber # AMBER force field
    antechamber -i hic.pdb -fi pdb -o hic.in -fo prepi -nc 0 -c bcc -rn HIC -at gaff # GAFF force field
    
  • Missing parameters: In some cases, neither the regular amber protein force field nor the GAFF force field have parameters. In this case, the parameters must be constructed.

6. Structure check and optional charge modification

Fill this out ...

7. Force field check by energy calculation

  • It is a good idea to check the resulting energies with the program changeparm

    changeparm

    1. run changeparm
    2. Select prmtop
    3. Press m (mdrest)
    4. Select prmcrd

Energies above 1000 kcal/mol should be checked.

8. Simulated annealing

  • Run a simulated annealing run with shake

    Title
    &cntrl
    irest=0,ntx=1,
    nstlim=1200000,dt=0.002,
    temp0=300.0,ntt=1,tautp=0.2,
    ntc=2,ntf=2,
    nsnb=15,cut=15.0,dielc=1.0,
    ntpr=1000,ntwx=0,ntwv=0,ntwe=0,
    ntb=0,ntp=0,taup=0.2,
    ipol=0,igb=0,
    vlimit=20.0,nmropt=1,
    ibelly=1,bellymask=":WAT | @H="
    &end
    
    &wt type='TEMP0',istep1=     0,istep2= 400000,value1=370.00,value2=370.00  &end
    &wt type='TEMP0',istep1= 400001,istep2=1200000,value1=370.00,value2=  0.00  &end
    &wt type='TAUTP',istep1=     0,istep2= 400000,value1=  0.20,value2=  0.20  &end
    &wt type='TAUTP',istep1= 400001,istep2= 800000,value1=  1.00,value2=  1.00  &end
    &wt type='TAUTP',istep1= 800001,istep2=1000000,value1=  0.50,value2=  0.50  &end
    &wt type='TAUTP',istep1=1000001,istep2=1200000,value1=  0.05,value2=  0.05  &end
    
    &wt type='END' &end
    
    &rst iat=0 &end
    
  • After this, run a minimization with 10000 steps

    Title
    &cntrl 
    irest=0,ntx=1, 
    nstlim=0,dt=0.002, 
    imin=1,maxcyc=10000,drms=0.001, 
    temp0=300.0,ntt=1,tautp=0.2, <tt>  ntc=2,ntf=2, 
    nsnb=15,cut=15.0,dielc=1.0, 
    ntpr=1000,ntwx=0,ntwv=0,ntwe=0, 
    ntb=0,ntp=0,taup=0.2 
    ipol=0,igb=0, 
    ncyc=10,ntmin=1,dx0=0.01, 
    ibelly=1,bellymask=":WAT | @H=" 
    &end
    

Known problems

Problems with SHAKE: In the case that SHAKE complains, it can often be relieved by running a short minimization followed and short MD without SHAKE (see input "sander.in00" and sander.in0) and then continue the annealing with SHAKE. If the problem persists, then continue without SHAKE

mpirun -bind-to core -np 20 sander.MPI -O -i sander.in1_no_shake -o sander.out1_no_shake -r mdrest1_no_shake -e mden1_no_shake -p prmtop -c prmcrd
mpirun -bind-to core -np 20 sander.MPI -O -i sander.in2_no_shake -o sander.out2_no_shake -r mdrest2_no_shake -e mden2_no_shake -p prmtop -c mdrest1_no_shake

The sander.in1_no_shake and sander.in2_no_shake are given below

Simulated annealing without shake (sander.in1_no_shake)

Title
 &cntrl
  irest=0,ntx=1,
  nstlim=1200000,dt=0.0005,
  temp0=300.0,ntt=1,tautp=0.2,
  ntc=1,ntf=1,
  nsnb=15,cut=15.0,dielc=1.0,
  ntpr=1000,ntwx=0,ntwv=0,ntwe=0,
  ntb=0,ntp=0,taup=0.2,
  ipol=0,igb=0,
  vlimit=20.0,nmropt=1,
  ibelly=1,bellymask=":WAT | @H="
 &end

 &wt type='TEMP0',istep1=     0,istep2= 400000,value1=370.00,value2=370.00  &end

 &wt type='TEMP0',istep1= 400001,istep2=1200000,value1=370.00,value2=  0.00  &end

 &wt type='TAUTP',istep1=     0,istep2= 400000,value1=  0.20,value2=  0.20  &end

 &wt type='TAUTP',istep1= 400001,istep2= 800000,value1=  1.00,value2=  1.00  &end

 &wt type='TAUTP',istep1= 800001,istep2=1000000,value1=  0.50,value2=  0.50  &end

 &wt type='TAUTP',istep1=1000001,istep2=1200000,value1=  0.05,value2=  0.05  &end

 &wt type='END' &end

 &rst iat=0 &end

Minimization (CHECK THIS) (sander.in2_no_shake)

Title
 &cntrl
  irest=0,ntx=1,
  nstlim=0,dt=0.0005,
  imin=1,maxcyc=10000,drms=0.001,
  temp0=300.0,ntt=1,tautp=0.2,
  ntc=1,ntf=1,
  nsnb=15,cut=15.0,dielc=1.0,
  ntpr=1000,ntwx=0,ntwv=0,ntwe=0,
  ntb=0,ntp=0,taup=0.2
  ipol=0,igb=0,
  ncyc=10,ntmin=1,dx0=0.01,
  ibelly=1,bellymask=":WAT | @H="
 &end

9. Write pdb file

results matching ""

    No results matching ""