Usage
=====

IDPConformerGenerator runs entirely through the command line. Follow the
explanations on this page together with the help documentation of the
command-line interfaces themselves.

Command-lines
-------------

To execute the :code:`idpconfgen` command-line, run :code:`idpconfgen` in your
terminal window after :ref:`installation`::

    idpconfgen

or::

    idpconfgen -h

Both will output the help menu.

.. note::
    All subclients have the :code:`-h` option to show help information.

:code:`idpconfgen` has several interfaces that perform different functions.
However, a specific sequence of interfaces needs to be used to prepare the
local torsion angle database and the files needed to build conformers. After
these operations are executed, you will end up with a single :code:`json` file
that you can use to feed the build calculations. The other files can safely be
removed.

IDPConfGen Small Peptide Example
--------------------------------

The :code:`example/` folder contains instructions to set up the
IDPConformerGenerator database from scratch and generate conformers for a small
peptide.

.. include:: ../example/small_peptide/README.rst
    :start-after: .. start-description
    :end-before: .. end-description

Building With Variable Bond-Geometry Strategies
-----------------------------------------------

To increase user flexibility and build parameterization, IDPConformerGenerator
has four bond geometry strategies to choose from: sampling (default), fixed,
exact, and int2cart. These strategies can be selected with the
``--bgeo-strategy`` flag during the ``build`` process. Please note that a new
database needs to be generated with the ``bgeodb`` subclient to use the
``exact`` strategy, and Int2Cart needs to be installed to use the ``int2cart``
strategy (see below).

The default ``sampling`` strategy aims to overcome the limitations of fixed
bond angles and bond lengths to increase the diversity of conformations
sampled.
The ``fixed`` strategy uses average bond geometries on a per-residue basis
derived from an extended Dunbrack PISCES database
``cull_d200611/200611/cullpdb_pc90_res1.6_R0.25_d200611_chains8807``.

The ``exact`` method uses exact bond/bend angles and bond lengths for each
residue in a fragment sampled from the database. To initialize the new,
backwards-compatible database, use the ``bgeodb`` subclient on the
``idpconfgen_database.json`` previously generated using ``torsions``. As a
reminder, this requires the ``sscalc_splitted.tar`` file generated by
``sscalc`` in the early stages of creating the database::

    idpconfgen bgeodb sscalc_splitted.tar -sc idpconfgen_database.json -o idpconfgen_extended_database.json -n

A Real Case Scenario
--------------------

.. include:: ../example/drksh3_example/README.rst
    :start-after: .. start-description

Modeling Disordered Region Tails on a Folded Domain
---------------------------------------------------

.. note::
    When modeling multi-chain complexes with the ``ldrs`` subclient, the FASTA
    file format for the ``-seq`` parameter must be as follows, with no blank
    spaces.

    | ``>A``
    | ``Sequence for chain A``
    | ``>B``
    | ``Sequence for chain B``

    If you would like to skip a chain while modeling multi-chain complexes,
    the sequence given for that chain in the ``.fasta`` file must be identical
    to the sequence of the corresponding chain in the template. Clash-checking
    will still take the skipped chains into consideration.

.. include:: ../example/cnot7_example/README.rst
    :start-after: .. start-description

Modeling Disordered Regions Within Folded Domains
-------------------------------------------------

.. include:: ../example/slc26a9_example/README.rst
    :start-after: .. start-description

Processing Low-Confidence Predicted Residues
--------------------------------------------

.. include:: ../example/AF_example/README.rst
    :start-after: .. start-description

Modeling Disordered Regions in a Multi-Chain Protein Complex
------------------------------------------------------------

.. include:: ../example/complex_example/README.rst
    :start-after: .. start-description

Exploring IDPConfGen Analysis Functions
---------------------------------------

Our vision for IDPConformerGenerator as a platform includes the analysis of
your database and the PDBs generated by IDPConfGen. To get started, the
:code:`stats` subclient is a quick way to check how many hits for different
sequence fragment matches you will find in the database for your protein system
of choice. It is also possible to include different secondary structure filters
as well as amino-acid substitutions to get a more accurate representation of
the number of hits in the database for your system::

    idpconfgen stats \
        -db idpconfgen_database.json \
        -seq drksh3.fasta \
        --dloop-off \
        --dany \
        -op drk_any \
        -of ./drk_any_dbStats

Another tool to investigate the database is the :code:`search` subclient. To
use it, you will need a tarball or folder of raw PDBs obtained with the
:code:`fetch` subclient. The :code:`search` function goes through the PDB
headers to find keywords of your choice and returns the number of hits and
their associated PDB IDs in JSON format::

    idpconfgen fetch \
        ../cull100 \
        -d ./cull100pdbs/ \
        -u \
        -n

    idpconfgen search \
        -fpdb ./cull100pdbs/ \
        -kw 'thermococcus,pro,beta' \
        -n

After generating conformer ensembles with IDPConfGen, it is possible to do some
basic plotting with the integrated plotting flags in the :code:`torsions` and
:code:`sscalc` subclients. For :code:`torsions`, you can choose to plot either
omega, phi, or psi dihedral angle distributions as a scatter plot. For
:code:`sscalc`, fractional secondary structure will be plotted in terms of DSSP
codes as well as fractions from the alpha, beta, or other regions of the
Ramachandran space for your conformers of choice.
The following example plots the psi angle distributions and the fractional
secondary structure of the :code:`drk_CSSSd2D_nosub_mcsce` ensemble generated
in the previous module::

    idpconfgen torsions \
        ./drk_CSSSd2D_nosub_mcsce \
        -deg \
        -n \
        --plot angtype=psi xlabel=drk_residues

To plot the fractional Ramachandran space information::

    idpconfgen torsions \
        ./drk_CSSSd2D_nosub_mcsce \
        -deg \
        -n \
        --ramaplot filename=fracDrkRama.png colors=['o', 'b', 'k']

To plot the fractional secondary structure information::

    idpconfgen sscalc \
        ./drk_CSSSd2D_nosub_mcsce \
        -u \
        -rd \
        -n \
        --plot filename=dssp_reduced_drk_.png

To see which plotting parameters can be modified, please refer to
:code:`src/idpconfgen/plotfuncs.py`. A short list of modifiable parameters is
given here::

    --plot
        title=<TITLE>
        title_fs=<TITLE FONT SIZE>
        xlabel=<X-AXIS LABEL>
        xlabel_fs=<X-AXIS LABEL FONT SIZE>
        colors=<LIST_OF_COLORS>

Exploring MC-SCE and Int2Cart Integrations
------------------------------------------

By integrating functions from our collaborators at the
`Head-Gordon Lab <https://thglab.berkeley.edu/>`_, IDPConformerGenerator can
build with bond geometries derived from a recurrent neural network machine
learning model, `Int2Cart <https://github.com/THGLab/int2cart>`_. Furthermore,
as the `MC-SCE <https://github.com/THGLab/MCSCE>`_ method for building
sidechains was introduced in the previous modules, we would like to provide
some examples of changing the default sidechain settings.
To use the Int2Cart method for bond geometries, the :code:`--bgeo-strategy`
flag needs to be set to ``int2cart`` during the building stage::

    idpconfgen build \
        -db idpconfgen_database.json \
        -seq drksh3.fasta \
        -etbb 100 \
        -etss 250 \
        -nc 100 \
        -csss csss_drk_d2D.json \
        --dloop-off \
        -et 'pairs' \
        -scm mcsce \
        --bgeo-strategy int2cart \
        -of ./drk_CSSSd2D_nosub_int2cart_mcsce \
        -n

To change the number of trials for MC-SCE to optimize success rate and overall
speed::

    idpconfgen build \
        -db idpconfgen_database.json \
        -seq drksh3.fasta \
        -etbb 100 \
        -etss 250 \
        -nc 100 \
        -csss csss_drk_d2D.json \
        --dloop-off \
        -et 'pairs' \
        -scm mcsce \
        --mcsce-n_trials 64 \
        -of ./drk_CSSSd2D_nosub_64_trials_mcsce \
        -n

How to Efficiently Set Jobs up for HPC Clusters
-----------------------------------------------

Using the :code:`sethpc` subclient, users can generate bash scripts for
SLURM-managed systems. Due to the architecture of Python's multiprocessing
module, IDPConformerGenerator is unable to utilize the resources of multiple
nodes on HPC clusters. However, with :code:`sethpc`, users can request multiple
nodes per job, and :code:`sethpc` will automatically generate the SBATCH
scripts needed, along with an :code:`all*.sh` and a :code:`cancel*.sh` script
to run or cancel all of the generated jobs with ease.

Please note that on many HPC resources (such as Graham) your queuing priority
will not change whether you request 5 nodes in one job or 1 node in each of
5 jobs, but this should be confirmed for your system.

If multiple nodes are requested, the :code:`merge` subclient can be run once
all jobs have finished to merge all of the generated conformers into one
folder, with the option of modifying the naming pattern for each structure.
Please see below for an example of running :code:`sethpc` and :code:`merge`.
To request 3 nodes to generate 512,000 structures of the unfolded state of the
drkN SH3 domain with 10 hours per node::

    idpconfgen sethpc \
        -des ./drk_hpc_jobs/ \
        --account def-username \
        --job-name drk_hpc \
        --nodes 3 \
        --ntasks-per-node 32 \
        --mem 16g \
        --time-per-node 0-10:00:00 \
        --mail-user your@email.com \
        -db idpconfgen_database.json \
        -seq drksh3.fasta \
        -etbb 100 \
        -etss 250 \
        -nc 512000 \
        -csss csss_drk_d2D.json \
        --dloop-off \
        -et 'pairs' \
        -scm mcsce \
        --bgeo-strategy int2cart \
        -of /scratch/user/drk/ \
        -n 32 \
        -rs 12

To merge all of the folders created by the multi-node jobs::

    idpconfgen merge \
        -tgt /scratch/user/drk/ \
        -des /scratch/user/drk/drk_CSSSd2D_nosub_multiple_mcsce \
        -pre drk_confs \
        -del

Using IDPConfGen as a Python Library
------------------------------------

To use IDPConformerGenerator in your project, import it as a library::

    import idpconfgen

From within the Python prompt you can get information on each module, class,
and function with ``help(idpconfgen)``. You can also access the whole API
documentation at :ref:`the reference page <Reference>`.
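
As a starting point for exploring the package from Python, the sketch below
lists the top-level submodules of an installed package, so that ``help()`` can
then be called on the ones of interest. It relies only on the standard library
(``importlib`` and ``pkgutil``). The helper name ``list_submodules`` is
hypothetical and is not part of the IDPConformerGenerator API, and the names it
returns depend on the installed version.

.. code-block:: python

    import importlib
    import importlib.util
    import pkgutil


    def list_submodules(package_name):
        """Return the sorted names of a package's top-level submodules.

        Hypothetical helper, not part of idpconfgen. Returns an empty list
        if the package is not installed or is a plain (non-package) module.
        """
        if importlib.util.find_spec(package_name) is None:
            return []
        package = importlib.import_module(package_name)
        search_path = getattr(package, "__path__", None)
        if search_path is None:
            return []
        return sorted(info.name for info in pkgutil.iter_modules(search_path))


    if __name__ == "__main__":
        # Prints nothing if idpconfgen is not installed in this environment.
        for name in list_submodules("idpconfgen"):
            print(name)

Each printed name can then be imported and passed to ``help()`` in the same way
as ``help(idpconfgen)``.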