Usage
=====

IDPConformerGenerator runs entirely from the command line. Follow the
explanations on this page together with the help documentation of the
command-line interfaces themselves.

Command-lines
-------------

To execute the :code:`idpconfgen` command-line interface, run :code:`idpconfgen`
in your terminal window after :ref:`installation <Installation>`::

    idpconfgen

or::

    idpconfgen -h

Both will output the help menu.

.. note::
    All subclients have the :code:`-h` option to show help information.
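
When scripting, the same help text can be requested from Python. The following
is a minimal sketch, assuming ``idpconfgen`` is available on your ``PATH``. The
helper names ``help_command`` and ``show_help`` are illustrative and are not
part of IDPConformerGenerator:

```python
import shutil
import subprocess


def help_command(subclient=None):
    """Return the argument list that prints help for idpconfgen or a subclient."""
    cmd = ["idpconfgen"]
    if subclient:
        cmd.append(subclient)
    cmd.append("-h")
    return cmd


def show_help(subclient=None):
    """Run the help command and return its text, or None if idpconfgen is not on PATH."""
    if shutil.which("idpconfgen") is None:
        return None
    result = subprocess.run(
        help_command(subclient), capture_output=True, text=True, check=False
    )
    return result.stdout


print(help_command("build"))  # ['idpconfgen', 'build', '-h']
```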

:code:`idpconfgen` has several interfaces that perform different functions.
However, there is a sequence of interfaces that needs to be used to prepare the
local torsion angle database and the files needed to build conformers. After
these operations have been executed, you will end up with a single :code:`json`
file that you can use to feed the build calculations. The other files are safe
to remove.

IDPConfGen Small Peptide Example
--------------------------------

The :code:`example/` folder contains instructions to set up the
IDPConformerGenerator database from scratch and generate conformers for a small
peptide.

.. include:: ../example/small_peptide/README.rst
   :start-after: .. start-description
   :end-before: .. end-description


Building With Variable Bond-Geometry Strategies
-----------------------------------------------

To increase user flexibility and build parameterization, IDPConformerGenerator
has 4 bond geometry strategies to choose from: sampling (default), fixed, exact,
and int2cart. A strategy can be selected with the ``--bgeo-strategy``
flag during the ``build`` process. Please note that a new database needs to be
generated with the ``bgeodb`` subclient to use the ``exact`` strategy, and Int2Cart
needs to be installed to use the ``int2cart`` strategy (see below).

The default ``sampling`` strategy aims to overcome limitations of having fixed bond
angles and bond lengths to increase the diversity of conformations sampled. The ``fixed``
strategy uses average bond geometries on a per-residue basis derived from an extended
Dunbrack PISCES database ``cull_d200611/200611/cullpdb_pc90_res1.6_R0.25_d200611_chains8807``.
The ``exact`` method uses exact bond/bend angles and bond lengths for each residue in a
fragment sampled from the database. To initialize the new, backwards-compatible database,
use the ``bgeodb`` module on the ``idpconfgen_database.json`` previously generated using
``torsions``. As a reminder, this requires the ``sscalc_splitted.tar`` file generated by
``sscalc`` in the early stages of creating the database::

    idpconfgen bgeodb sscalc_splitted.tar -sc idpconfgen_database.json -o idpconfgen_extended_database.json -n
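
If you assemble ``build`` commands from a script, it can help to validate the
strategy name before launching a long run. The sketch below only encodes the
four strategy names and the two prerequisites stated in this section. The names
``BGEO_STRATEGIES``, ``bgeo_args``, and ``prerequisite`` are hypothetical
helpers, not part of IDPConformerGenerator:

```python
# Strategy names as listed in this section.
BGEO_STRATEGIES = ("sampling", "fixed", "exact", "int2cart")

# Extra setup described in this section, keyed by strategy.
PREREQUISITES = {
    "exact": "a database extended with the bgeodb subclient",
    "int2cart": "an Int2Cart installation",
}


def bgeo_args(strategy="sampling"):
    """Return the build arguments that select a bond-geometry strategy."""
    if strategy not in BGEO_STRATEGIES:
        raise ValueError(
            f"unknown strategy {strategy!r}; choose from {BGEO_STRATEGIES}"
        )
    return ["--bgeo-strategy", strategy]


def prerequisite(strategy):
    """Return a reminder for strategies that need extra setup, else None."""
    return PREREQUISITES.get(strategy)


print(bgeo_args("exact"))     # ['--bgeo-strategy', 'exact']
print(prerequisite("exact"))  # a database extended with the bgeodb subclient
```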


A Real Case Scenario
--------------------

.. include:: ../example/drksh3_example/README.rst
   :start-after: .. start-description

Modeling Disordered Region Tails on a Folded Domain
---------------------------------------------------

.. note::
    
    When modeling multi-chain complexes with the ``ldrs`` subclient,
    the FASTA file format for the ``-seq`` parameter must be as follows, with no
    blank spaces.

    | ``>A``
    | ``Sequence for chain A``
    | ``>B``
    | ``Sequence for chain B``

    If you would like to skip a chain while modeling multi-chain complexes,
    you must have the identical sequence in the ``.fasta`` file to the chains
    in the template you would like to skip.

    Clash-checking will be done with the skipped chains taken into consideration.
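
A multi-chain FASTA file in the layout described in the note can be written with
a few lines of Python. This is a minimal sketch: it strips whitespace from each
sequence and writes no blank lines, which is one reading of the requirement
above. The function name ``write_multichain_fasta`` is hypothetical:

```python
from pathlib import Path


def write_multichain_fasta(path, chains):
    """Write {chain_id: sequence} as FASTA records with no blank lines.

    Returns the text that was written so it can be inspected.
    """
    lines = []
    for chain_id, sequence in chains.items():
        # Drop any whitespace inside the sequence so each record is one line.
        cleaned = "".join(sequence.split())
        lines.append(f">{chain_id}")
        lines.append(cleaned)
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text)
    return text
```

For example, ``write_multichain_fasta("complex.fasta", {"A": "...", "B": "..."})``
writes one header line and one sequence line per chain, in dictionary order.
Replace the placeholder strings with the sequences of your own chains.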

.. include:: ../example/cnot7_example/README.rst
    :start-after: .. start-description

Modeling Disordered Regions Within Folded Domains
-------------------------------------------------

.. include:: ../example/slc26a9_example/README.rst
    :start-after: .. start-description

Processing Low-Confidence Predicted Residues
--------------------------------------------

.. include:: ../example/AF_example/README.rst
    :start-after: .. start-description

Modeling Disordered Regions in a Multi-Chain Protein Complex
------------------------------------------------------------

.. include:: ../example/complex_example/README.rst
    :start-after: .. start-description

Exploring IDPConfGen Analysis Functions
---------------------------------------

Our vision for IDPConformerGenerator as a platform includes the analysis of your database and the PDBs generated by IDPConfGen.
To get started, the :code:`stats` subclient is a quick way to check how many hits for different sequence fragment matches you will
find in the database for your protein system of choice. It is also possible to include different secondary structure filters as well
as amino-acid substitutions to get a more accurate representation of the number of hits in the database for your system::

    idpconfgen stats \
        -db idpconfgen_database.json \
        -seq drksh3.fasta \
        --dloop-off \
        --dany \
        -op drk_any \
        -of ./drk_any_dbStats

Another tool to investigate the database is the :code:`search` subclient. To use it, you will need a tarball or folder of raw PDBs obtained
with the :code:`fetch` subclient. The :code:`search` function goes through the PDB headers to find keywords of your choice and returns the
number of hits and their associated PDB IDs in JSON format::

    idpconfgen fetch \
        ../cull100 \
        -d ./cull100pdbs/ \
        -u \
        -n

    idpconfgen search \
        -fpdb ./cull100pdbs/ \
        -kw 'thermococcus,pro,beta' \
        -n

After generating conformer ensembles with IDPConfGen, it is possible to do some basic plotting with the integrated plotting flags
in the :code:`torsions` and :code:`sscalc` subclients. For :code:`torsions`, you can choose to plot either omega, phi, or psi dihedral
angle distributions in a scatter plot format. For :code:`sscalc`, fractional secondary structure will be plotted in terms of DSSP codes
as well as fractions from the alpha, beta, or other regions of the Ramachandran space for your conformers of choice. The following example
plots the psi angle distributions and the fractional secondary structure of the :code:`drk_CSSSd2D_nosub_mcsce` ensemble generated in the previous
module::

    idpconfgen torsions \
        ./drk_CSSSd2D_nosub_mcsce \
        -deg \
        -n \
        --plot angtype=psi xlabel=drk_residues
    
To plot the fractional Ramachandran space information::

    idpconfgen torsions \
        ./drk_CSSSd2D_nosub_mcsce \
        -deg \
        -n \
        --ramaplot filename=fracDrkRama.png colors=['o', 'b', 'k']

To plot the fractional secondary structure information::

    idpconfgen sscalc \
        ./drk_CSSSd2D_nosub_mcsce \
        -u \
        -rd \
        -n \
        --plot filename=dssp_reduced_drk_.png

To see which plotting parameters can be modified, please refer to :code:`src/idpconfgen/plotfuncs.py`. A short list of modifiable parameters is given here::

    --plot title=<TITLE> title_fs=<TITLE FONT SIZE> xlabel=<X-AXIS LABEL> xlabel_fs=<X-AXIS LABEL FONT SIZE> colors=<LIST_OF_COLORS>
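
The plotting options above are passed as space-separated ``key=value`` tokens.
As an illustration of that convention only, the sketch below splits such tokens
into a dictionary. It is not the parser used by IDPConformerGenerator, the
function name ``parse_plot_params`` is hypothetical, and all values are kept as
plain strings:

```python
def parse_plot_params(tokens):
    """Turn tokens such as 'title_fs=14' into a {key: value} dict of strings."""
    params = {}
    for token in tokens:
        # partition splits on the first '=' only, so values may contain '='.
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        params[key] = value
    return params


print(parse_plot_params(["angtype=psi", "xlabel=drk_residues"]))
# {'angtype': 'psi', 'xlabel': 'drk_residues'}
```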

Exploring MC-SCE and Int2Cart Integrations
------------------------------------------

Integrating the functions from our collaborators at the `Head-Gordon Lab <https://thglab.berkeley.edu/>`_,
IDPConformerGenerator has the ability to build with bond geometries derived from a recurrent neural network
machine learning model `Int2Cart <https://github.com/THGLab/int2cart>`_. Furthermore, as we introduced
the `MC-SCE <https://github.com/THGLab/MCSCE>`_ method for building sidechains in the previous modules,
we would like to provide some examples on changing the default sidechain settings.

To use the Int2Cart method for bond geometries, the :code:`--bgeo-strategy` flag needs to be set to
``int2cart`` during the building stage::

    idpconfgen build \
        -db idpconfgen_database.json \
        -seq drksh3.fasta \
        -etbb 100 \
        -etss 250 \
        -nc 100 \
        -csss csss_drk_d2D.json \
        --dloop-off \
        -et 'pairs' \
        -scm mcsce \
        --bgeo-strategy int2cart \
        -of ./drk_CSSSd2D_nosub_int2cart_mcsce \
        -n

To change the number of trials for MC-SCE to optimize success rate and overall speed::

    idpconfgen build \
        -db idpconfgen_database.json \
        -seq drksh3.fasta \
        -etbb 100 \
        -etss 250 \
        -nc 100 \
        -csss csss_drk_d2D.json \
        --dloop-off \
        -et 'pairs' \
        -scm mcsce \
        --mcsce-n_trials 64 \
        -of ./drk_CSSSd2D_nosub_64_trials_mcsce \
        -n


How to Efficiently Set Jobs up for HPC Clusters
-----------------------------------------------

Using the :code:`sethpc` subclient, users can generate bash scripts for SLURM-managed
systems. Due to the architecture of Python's multiprocessing module, IDPConformerGenerator is
unable to utilize the resources of multiple nodes on HPC clusters. However, with :code:`sethpc`,
users are able to request multiple nodes per job, and :code:`sethpc` will automatically generate
the SBATCH scripts needed, along with an :code:`all*.sh` and a :code:`cancel*.sh` script to
run or cancel all of the generated jobs with ease.

Please note that on many HPC resources (such as Graham) your queuing priority will not differ
between requesting 5 nodes in one job and requesting 1 node in each of 5 jobs, but you should
confirm this for your own system.

If multiple nodes are requested, the :code:`merge` subclient can be run once all jobs have finished to
merge all of the generated conformers into one folder, with the option of modifying the naming pattern
for each structure. Please see below for an example of running :code:`sethpc` and :code:`merge`.

To request 3 nodes to generate 512,000 structures of the unfolded state of the drkN SH3 domain with 10 hours per node::
    
    idpconfgen sethpc \
        -des ./drk_hpc_jobs/ \
        --account def-username \
        --job-name drk_hpc \
        --nodes 3 \
        --ntasks-per-node 32 \
        --mem 16g \
        --time-per-node 0-10:00:00 \
        --mail-user your@email.com \
        -db idpconfgen_database.json \
        -seq drksh3.fasta \
        -etbb 100 \
        -etss 250 \
        -nc 512000 \
        -csss csss_drk_d2D.json \
        --dloop-off \
        -et 'pairs' \
        -scm mcsce \
        --bgeo-strategy int2cart \
        -of /scratch/user/drk/ \
        -n 32 \
        -rs 12 

To merge all of the folders created by the multi-node jobs::

    idpconfgen merge \
        -tgt /scratch/user/drk/ \
        -des /scratch/user/drk/drk_CSSSd2D_nosub_multiple_mcsce \
        -pre drk_confs \
        -del
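
To make the idea of merging concrete, the sketch below gathers ``.pdb`` files
from several job folders into one destination and renames them with a common
prefix. It is an illustration under stated assumptions, not the implementation
of the :code:`merge` subclient: the function name ``merge_conformers`` and the
``<prefix>_<n>.pdb`` naming scheme are hypothetical, and files are copied rather
than moved or deleted:

```python
import shutil
from pathlib import Path


def merge_conformers(target, destination, prefix="conformer"):
    """Copy every .pdb found anywhere under target into destination.

    Files are renamed <prefix>_<n>.pdb so that identical names coming from
    different job folders cannot collide. Returns the number of files copied.
    """
    target = Path(target).resolve()
    destination = Path(destination).resolve()
    destination.mkdir(parents=True, exist_ok=True)
    # Skip files already inside destination (it may live under target),
    # and sort for a reproducible numbering.
    sources = sorted(
        p for p in target.rglob("*.pdb") if destination not in p.parents
    )
    for count, source in enumerate(sources, start=1):
        shutil.copy2(source, destination / f"{prefix}_{count}.pdb")
    return len(sources)
```

Because the destination folder is excluded from the search, running the function
a second time on the same folders copies the same set of files again and does
not pick up its own earlier output.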

Using IDPConfGen as a Python Library
------------------------------------

To use IDPConformerGenerator in your project, import it as a library::

    import idpconfgen

From within the Python prompt you can get information on each module, class, and
function with ``help(idpconfgen)``. You can also access the whole API
documentation here at :ref:`the reference page <Reference>`.
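
In a script that may run in environments without IDPConformerGenerator, you can
check for the package before importing it. This is a minimal sketch that uses
only the Python standard library. The helper name ``load_if_installed`` is
illustrative:

```python
import importlib
import importlib.util


def load_if_installed(name="idpconfgen"):
    """Import and return the named package, or None when it is not installed."""
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)


package = load_if_installed()
if package is None:
    print("idpconfgen is not installed in this environment")
else:
    help(package)  # same information as help(idpconfgen) at the prompt
```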