Setup and equilibration of protein

Here focus is on the general setup (adding hydrogens and determining protonation state) with the maestro program. We additionally go through how maestro can be checked by Ulf Rydes programs. A complete list of the setup can be found under "Amber equilibrium" in Ryde methods. The procedure below goes through steps 1-7.


Table of Contents


1. Characterization of the pdb

Check (use open-office chart):

  1. Is the protein a monomer, dimer ? or n-mer?
  2. Are there Cys-Cys- crosslinks
  3. Are there missing parts of the structure (the first amino acids are often not resolved, check REMARK 465)
  4. Are the non-standard residues?
  5. At what pH was the structure obtained?

2. Protein setup

Protein setup with Maestro

  1. Make log-file (write down important information concerning flipped residues etc)
  2. Open Prep Wiz and Import the pdb file
  3. Uncheck the default "Delete water beyond 5 Å" and hit preprocess
  4. Review "Problems => Current overlapping atoms" If all close contacts are hydrogens, these will be optimized later and are likely not a problem in the end
  5. Review the "Problems => alternative positions" (This can be checked with Ulfs Progtrams, see XXX)
  6. Go to "Review and Modify" and then "Analyze Workspace" and Delete undesired (co-crystallized solvent moelcules that are not water or other artefacts).
  7. Save a pdb before optimization
  8. Optimize H-positions (can take some minutes; usually 5-10 but might take up to several hours for big systems).
  9. Run Propka and check his residues (see also below). Check if Asp and Glu, Lys, Arg, and Tyr have pKa below 7
  10. Check histidines with changepdb (see below)
  11. Check burried charges through visualization and with changepdb (there should be none that are not involved in ionic pairs - excpetions for metal ligands)
  12. Decide on protonation state for Asp, Glu, Lys, Arg, Tyr and His
  13. Use "Interactive optimizer => Analyze network"
  14. Note (and check) which Asn and Gln groups that are flipped.
  15. Check His protonation states (HIE, HID or HIP). Does it fit with analysis from changepdb? If not, then change them by hitting the arrow to the left side.
  16. Hit "optimize" (lock the histidines by checking the "lock" box. Protonation states of other residues that should remain as they are should also be checked as "locked")
  17. Save a pdbfile (HID, HIE, HIP must be specified afterwards - this can be done with changepdb command "hi")

Analysis with changepdb

  • Histidine analysis
    • run changepdb
    • give the input pdb
    • command: hh (creates output to screen and files "hisX" with X = 1,2,3...).
    • copy the output to screen into a file (e.g. named "hishb")
    • grep after the relevant information (grep HHb hishb)
    • inspect the histidine files and decide on protonation state (HIE, HID, HIP) - for highly negatively charged proteins solvent exposed histidines are generally HIP (unless there are good arguments against it). Their PKa values can also be calculated with PROPKA
  • Burried charges
    • run changepdb
    • give the input pdb
    • command bc
  • Metal centers
    • run changepdb
    • give the input pdb
    • command: des (gives file *.actX (X = 1,2,..) that contains an active site cut-out
  • Change histidines
    • run changepdb
    • give the input pdb
    • run command "hi"
    • type "D" for HID, "E" for HIE, "P" for HIP

3. Non-standard residues

Non-standard residues requires determination of charges and sometimes a "dummy" force field. More advanced force fields can be constructed if necessary (see PON parametrization)

  • Run a geometry optimization on the non-standard residues [typically B3LYP/def2-SV(P) or TPSS/def2-SV(P)]. Make sure that the structure is not too different from the crystal structure or other trustworthy experimental structures (if there are large deviations, consider using a single point calculation or only optimized hydrogens to generate charges)
  • With Turbomole (must be version > 7.0) insert "$esp_fit resp in the "control" file 'and run turbomole (Note: It can be convenient to ensure that the order of atoms is the same as in the pdb file)
    ridft -resp -proper > logm
    
    The use of "-proper" indicates that no wave function optimization is required. Look for "GRID" in logm to check whether the Electrostatic Potential (which will be used to fit the resp charge) is generated.
  • Use now the script "changepot" (generates resp.pot, resp.in1, resp.in and qin). Note: changepot might give an error due to a missing line in logm. Insert this line with:

    awk '/charge for group # 1/{x++} x==2{sub(/charge for group # 1/,"writing ESP data for Amber to file esp.dat \n&")}1' logm > logm_2 && mv logm_2 logm
    

    changepot:

    1. Select program
    2. Press enter (new)
    3. Name of turbomole output (logm)
    4. Do not use Boltzman weights (press enter)
    5. Press enter (special requirements if # grid pints > 99999)
    6. Press enter (output: resp.pot)
    7. Press enter (resp.in / resp.in1)
    8. Press enter (qin)

      The files resp.in and resp.in1 are input files for the fitting of resp charges. qin are initial charges. Fitting of resp charges are usually done in two steps, first a fit involving all atoms (check this) with loose convergence. This is done with resp.in. In a next step, all heavy atoms are fixed while hydrogens in methylene (CH2) and methyl (CH3) groups are constrained to be identical and the fit is done with more tight convergence (see below).

  • Check if there are any polar hydrogens bound to the same atom (e.g. water). These should be constrained to have the same charge (this is done in resp.in, see below for the format):

    Format of resp.in files:

    &cntrl  ihfree=0, qwt=0.0005, iqopt=2 # qwt = optimization parameter
    &end
    1.0
    Mol1
    2  63 # charge and tot. # of atoms
    7   0 # Atom nr. (here 7 = N) and command (-1 = freeze, 0 = optimized freely, n = fix to the same as atom #n e.g. 43 means fix to same value as atom number 43)
    6    0
    6    0
    
  • Run resp (part of the AMBER suite)

    resp -O -i resp.in -o resp.out -q qin -e resp.pot
    
  • Open resp.in1: Fix all charges except carbon atoms with more than one hydrogen (usually CH3 or CH2 groups). Fix the the hydrogens to have the same charge). qwt should be 0.001. Save qout in qin1:

    cp qout qin1
    

    and run

    resp -O -i resp.in1 -o resp.out1 -q qin1 -e resp.pot
    

    The final charges are now in resp.out1

4. Rename residues

  1. Cys => Cym if Cys is a metal ligand
  2. Cys => CYX if Cys is a Cys-Cys cross link
  3. HIS => HID, HIE or HIP if Maestro was used for the setup

5. Add non-polar hydrogens with tleap

  • Solvent cap

    Data required for solvating the protein can be obtained through changepdb:

    changepdb

    1. run changepdb
    2. Press c (cap)
    3. Press m (move)
    4. Press w (write)
  • Run tleap (see below for sample leap.in file)
    tleap -f leap.in
    
    Sample tleap file
    source leaprc.ff14SB # Amber protein force field
    loadAmberPrep hic.in # special residue prep.in file 
    loadAmberPrep cu2.in # special residue prep.in file 
    loadAmberParams hic.dat # special residue data file 
    loadAmberParams cu2.dat # special residue data file 
    x=loadpdb lpmo_setup_part_3_NH2corrected.pdb # pdb input file 
    bond x.56.SG x.178.SG # Cys-Cys cross-link 
    solvatecap x TIP3PBOX {0.0 0.0 0.0} 40 # solvation (see above) 
    saveamberparm x prmtop prmcrd
    quit
    

There will likely be errors from tleap. A few examples on how to fix these are given below:

Typical errors in tleap

  • End-groups (OXT and AMT): If one wishes to have the end amino group not charged, then delete the lines in addPdbResMap. Copy file leaprc.ff03 to the local directory;

    cp $AMBERHOME/dat/leap/cmd/leaprc.ff03 .
    
  • Prep.in files: For special residues, a prep.in file must be constructed for the atom-types. A pdb file with the residue can be cut out from the pdbfile. The prep file can be constructed by (amber)

    antechamber -i hic.pdb -fi pdb -o hic.in -fo prepi -nc 0 -c bcc -rn HIC -at amber # AMBER force field
    antechamber -i hic.pdb -fi pdb -o hic.in -fo prepi -nc 0 -c bcc -rn HIC -at gaff # GAFF force field
    
  • Missing parameters: In some cases, neither the regular amber protein force field nor the GAFF force field have parameters. In this case, the parameters must be constructed.

6. Force field check by energy calculation

  • It is a good idea to check the resulting energies with the program changeparm

    changeparm

    1. run changeparm
    2. Select prmtop
    3. Press m (mdrest)
    4. Select prmcrd

Energies above 1000 kcal/mol should be checked.

7. Simulated annealing

  • Run a simulated annealing run with shake

    Title
    &cntrl
    irest=0,ntx=1,
    nstlim=1200000,dt=0.002,
    temp0=300.0,ntt=1,tautp=0.2,
    ntc=2,ntf=2,
    nsnb=15,cut=15.0,dielc=1.0,
    ntpr=1000,ntwx=0,ntwv=0,ntwe=0,
    ntb=0,ntp=0,taup=0.2,
    ipol=0,igb=0,
    vlimit=20.0,nmropt=1,
    ibelly=1,bellymask=":WAT | @H="
    &end
    
    &wt type='TEMP0',istep1=     0,istep2= 400000,value1=370.00,value2=370.00  &end
    &wt type='TEMP0',istep1= 400001,istep2=1200000,value1=370.00,value2=  0.00  &end
    &wt type='TAUTP',istep1=     0,istep2= 400000,value1=  0.20,value2=  0.20  &end
    &wt type='TAUTP',istep1= 400001,istep2= 800000,value1=  1.00,value2=  1.00  &end
    &wt type='TAUTP',istep1= 800001,istep2=1000000,value1=  0.50,value2=  0.50  &end
    &wt type='TAUTP',istep1=1000001,istep2=1200000,value1=  0.05,value2=  0.05  &end
    
    &wt type='END' &end
    
    &rst iat=0 &end
    
  • After this, run a minimization with 10000 steps

    Title
    &cntrl 
    irest=0,ntx=1, 
    nstlim=0,dt=0.002, 
    imin=1,maxcyc=10000,drms=0.001, 
    temp0=300.0,ntt=1,tautp=0.2, <tt>  ntc=2,ntf=2, 
    nsnb=15,cut=15.0,dielc=1.0, 
    ntpr=1000,ntwx=0,ntwv=0,ntwe=0, 
    ntb=0,ntp=0,taup=0.2 
    ipol=0,igb=0, 
    ncyc=10,ntmin=1,dx0=0.01, 
    ibelly=1,bellymask=":WAT | @H=" 
    &end
    

Known problems

Problems with SHAKE: In the case that SHAKE complains, it can often be relieved by running a short minimization followed and short MD without SHAKE (see input "sander.in00" and sander.in0) and then continue the annealing with SHAKE. If the problem persists, then continue without SHAKE

mpirun -bind-to core -np 20 sander.MPI -O -i sander.in1_no_shake -o sander.out1_no_shake -r mdrest1_no_shake -e mden1_no_shake -p prmtop -c prmcrd
mpirun -bind-to core -np 20 sander.MPI -O -i sander.in2_no_shake -o sander.out2_no_shake -r mdrest2_no_shake -e mden2_no_shake -p prmtop -c mdrest1_no_shake

The sander.in1_no_shake and sander.in2_no_shake are given below

Simulated annealing without shake (sander.in1_no_shake)

Title
 &cntrl
  irest=0,ntx=1,
  nstlim=1200000,dt=0.0005,
  temp0=300.0,ntt=1,tautp=0.2,
  ntc=1,ntf=1,
  nsnb=15,cut=15.0,dielc=1.0,
  ntpr=1000,ntwx=0,ntwv=0,ntwe=0,
  ntb=0,ntp=0,taup=0.2,
  ipol=0,igb=0,
  vlimit=20.0,nmropt=1,
  ibelly=1,bellymask=":WAT | @H="
 &end

 &wt type='TEMP0',istep1=     0,istep2= 400000,value1=370.00,value2=370.00  &end

 &wt type='TEMP0',istep1= 400001,istep2=1200000,value1=370.00,value2=  0.00  &end

 &wt type='TAUTP',istep1=     0,istep2= 400000,value1=  0.20,value2=  0.20  &end

 &wt type='TAUTP',istep1= 400001,istep2= 800000,value1=  1.00,value2=  1.00  &end

 &wt type='TAUTP',istep1= 800001,istep2=1000000,value1=  0.50,value2=  0.50  &end

 &wt type='TAUTP',istep1=1000001,istep2=1200000,value1=  0.05,value2=  0.05  &end

 &wt type='END' &end

 &rst iat=0 &end

Minimization (CHECK THIS) (sander.in2_no_shake)

Title
 &cntrl
  irest=0,ntx=1,
  nstlim=0,dt=0.0005,
  imin=1,maxcyc=10000,drms=0.001,
  temp0=300.0,ntt=1,tautp=0.2,
  ntc=1,ntf=1,
  nsnb=15,cut=15.0,dielc=1.0,
  ntpr=1000,ntwx=0,ntwv=0,ntwe=0,
  ntb=0,ntp=0,taup=0.2
  ipol=0,igb=0,
  ncyc=10,ntmin=1,dx0=0.01,
  ibelly=1,bellymask=":WAT | @H="
 &end