Lib build

Tools for conformer building operations.

class idpconfgen.libs.libbuild.ConfLabels(atom_labels, res_nums, res_labels)

Contain label information for a protein/conformer.

Variables:

atom_labels (np.array) –
res_nums (np.array) –
res_labels (np.array) –

atom_labels: Alias for field number 0

res_labels: Alias for field number 2

res_nums: Alias for field number 1
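
Because the fields are documented as aliases for field numbers 0 to 2, ConfLabels can be read like a named tuple. The sketch below illustrates that access pattern only; ConfLabelsSketch is a hypothetical stand-in rather than the library class, and plain lists stand in for the np.array fields to keep the example dependency-free.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the documented field order.
ConfLabelsSketch = namedtuple(
    "ConfLabelsSketch",
    ["atom_labels", "res_nums", "res_labels"],
)

labels = ConfLabelsSketch(
    atom_labels=["N", "CA", "C", "O"],
    res_nums=[1, 1, 1, 1],
    res_labels=["MET", "MET", "MET", "MET"],
)

# Fields are reachable by name or by position (field number).
first_atom = labels.atom_labels[0]
same_field = labels[1] is labels.res_nums
```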

idpconfgen.libs.libbuild.ConfMasks: alias of ConfMaks

idpconfgen.libs.libbuild.are_connected(n1, n2, rn1, a1, a2, bonds_intra, bonds_inter)

Detect if a certain atom pair is bonded according to criteria.

Considers only bonds within the same residue and to the next residue.

idpconfgen.libs.libbuild.build_regex_substitutions(s, options, pre_treatment=list, post_treatment=''.join)

Build character replacements in regex string.

Example

>>> build_regex_substitutions('ASD', {'S': 'SE'})
'A[SE]D'

>>> build_regex_substitutions('ASDS', {'S': 'SE'})
'A[SE]D[SE]'

>>> build_regex_substitutions('ASDS', {})
'ASDS'

Parameters:

s (regex string)
options (dict) – Dictionary of char to multichar substitutions
pre_treatment (callable, optional) – A treatment to apply in s before substitution. pre_treatment must return a list-like object. Default: list because it expects s to be a string.
post_treatment (callable, optional) – A function to apply on the resulting list-like object before returning. Default: ‘’.join, to return a string.
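
The behaviour documented above can be reproduced with a short sketch. This is not the library's implementation; build_regex_substitutions_sketch is a hypothetical name, and the sketch assumes that each character found in options is replaced by a bracketed character class, which is what the documented examples show.

```python
def build_regex_substitutions_sketch(
        s,
        options,
        pre_treatment=list,
        post_treatment="".join,
        ):
    """Replace each char of `s` present in `options` by a regex class."""
    # pre_treatment must return a mutable, list-like object.
    chars = pre_treatment(s)
    for i, char in enumerate(chars):
        if char in options:
            # e.g. 'S' with {'S': 'SE'} becomes '[SE]'
            chars[i] = f"[{options[char]}]"
    return post_treatment(chars)


single = build_regex_substitutions_sketch("ASD", {"S": "SE"})
double = build_regex_substitutions_sketch("ASDS", {"S": "SE"})
untouched = build_regex_substitutions_sketch("ASDS", {})
```

The three results match the documented examples: 'A[SE]D', 'A[SE]D[SE]' and 'ASDS'.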

idpconfgen.libs.libbuild.create_Coulomb_params_raw(atom_labels, residue_numbers, residue_labels, force_field)

idpconfgen.libs.libbuild.create_LJ_params_raw(atom_labels, residue_numbers, residue_labels, force_field)

Create ACOEFF and BCOEFF parameters.

idpconfgen.libs.libbuild.create_bonds_apart_mask_for_ij_pairs(atom_labels, residue_numbers, residue_labels, bonds_intra, bonds_inter, base_bool=False)

Create bool mask array identifying the pairs X bonds apart in ij pairs.

Given the bonds_intra and bonds_inter criteria, identifies those ij atom pairs in the N*(N-1)/2 condition (upper triangle of the all-vs-all matrix, excluding the diagonal) that agree with the described bonds.

Inter residue bonds are only considered for consecutive residues.

Parameters:

atom_labels (iterable, list or np.ndarray) – The protein atom labels. Ex: [‘N’, ‘CA’, ‘C’, ‘O’, ‘CB’, …]
residue_numbers (iterable, list or np.ndarray) – The protein residue numbers per atom in atom_labels. Ex: [1, 1, 1, 1, 1, 2, 2, 2, 2, …]
residue_labels (iterable, list or np.ndarray) – The protein residue labels per atom in atom_labels. Ex: [‘Met’, ‘Met’, ‘Met’, …]

See also

create_conformer_labels, ConfLabels
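
To make the N*(N-1)/2 layout concrete, the sketch below enumerates the ij pairs of a toy three-residue backbone and builds a boolean mask over them. The pair ordering (that of itertools.combinations) and the selection criterion (same or consecutive residue) are illustrative assumptions; the real function applies the bonds_intra and bonds_inter criteria, which are not reproduced here.

```python
from itertools import combinations

# Toy input: backbone atoms of three residues.
atom_labels = ["N", "CA", "C"] * 3
residue_numbers = [1, 1, 1, 2, 2, 2, 3, 3, 3]

n_atoms = len(atom_labels)

# All ij pairs with i < j: the upper triangle without the diagonal.
ij_pairs = list(combinations(range(n_atoms), 2))

# Illustrative criterion only: flag pairs that sit in the same residue
# or in consecutive residues (assumes ascending residue numbers).
mask = [
    residue_numbers[j] - residue_numbers[i] in (0, 1)
    for i, j in ij_pairs
]

n_pairs = len(ij_pairs)
n_flagged = sum(mask)
```

For 9 atoms this gives 36 pairs, of which 27 are flagged; the 9 pairs between residues 1 and 3 are excluded.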

idpconfgen.libs.libbuild.init_confmasks(atom_labels)

Create a ConfMasks object (namedtuple).

ConfMasks is a named tuple whose attributes are integer masks for the respective groups.

Parameters:

atom_labels (array-like) – The atom names of the protein.

Returns:

namedtuple – ConfMasks object.

Notes

ConfMasks attributes map to the following atom groups:

bb3 : N, CA, C
bb4 : N, CA, C, O
NHs : amide protons
Hterm : N-terminal protons
OXT1 : O atom of C-terminal carboxyl group
OXT2 : OXT atom of the C-terminal carboxyl group
cterm : (OXT2, OXT1)
non_Hs : all but hydrogens
non_Hs_non_OXT : all but hydrogens and the only OXT atom
non_NHs_non_OXT : all but NHs and OXT atom
H2_N_CA_CB : these four atoms from the first residue;
             if Gly, HA3 is used in place of CB.
non_sidechains : all atoms except sidechains beyond CB
all_sidechain : all sidechain atoms including CB and HA
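
As an illustration of what an integer mask looks like, the sketch below derives three of the groups listed above from a toy list of atom labels. The helper index_mask, the toy labels, and the rule used to detect hydrogens (label starts with 'H') are assumptions made for this example; plain lists are used instead of NumPy arrays to keep it dependency-free.

```python
# Toy atom labels for a single residue (illustrative, not a real topology).
atom_labels = ["N", "H1", "H2", "H3", "CA", "HA", "CB", "C", "O", "OXT"]


def index_mask(labels, group):
    """Return the integer positions of `labels` that belong to `group`."""
    return [i for i, label in enumerate(labels) if label in group]


bb3 = index_mask(atom_labels, {"N", "CA", "C"})
bb4 = index_mask(atom_labels, {"N", "CA", "C", "O"})

# Assumption: hydrogens are the labels starting with 'H'.
non_Hs = [
    i for i, label in enumerate(atom_labels)
    if not label.startswith("H")
]

# An integer mask selects atoms by position.
bb3_labels = [atom_labels[i] for i in bb3]
```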

idpconfgen.libs.libbuild.make_combined_regex(regexes)

Make a combined regex with ORs.

To be used with re.fullmatch.
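
A minimal sketch of combining regexes with ORs for use with re.fullmatch follows. The name make_combined_regex_sketch and the choice of wrapping each alternative in a non-capturing group are assumptions for illustration; the library's exact output string may differ.

```python
import re


def make_combined_regex_sketch(regexes):
    """Join regexes with '|' so that any one of them may match."""
    # Non-capturing groups keep each alternative self-contained.
    return "|".join(f"(?:{regex})" for regex in regexes)


combined = make_combined_regex_sketch(["L+", "H+"])

all_loop = re.fullmatch(combined, "LLLL") is not None
all_helix = re.fullmatch(combined, "HH") is not None
mixed = re.fullmatch(combined, "LH") is not None
```

With re.fullmatch the whole string must satisfy a single alternative, so "LLLL" and "HH" match while the mixed string "LH" does not.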

idpconfgen.libs.libbuild.make_list_atom_labels(input_seq, atom_labels_dictionary)

Make a list of the atom labels for an input_seq.

Considers the N-terminus to be protonated (H1 to H3), or H1 only in the case of proline. Also adds the terminal ‘OXT’ label.

Parameters:

input_seq (str) – 1-letter amino-acid sequence.
atom_labels_dictionary (dict) – The ORDERED atom labels per residue.

Returns:

list – List of consecutive atom labels for the protein.

idpconfgen.libs.libbuild.populate_dict_with_database(xmers, res_tolerance, primary, secondary, combined_dssps)

Identify sampling positions.

Identifies slices in primary and secondary where the combined_dssps regexes apply. Considers also residue tolerance.

This function is used internally for prepare_slice_dict with multiprocessing.

Parameters:

xmers (list) – The list of all protein fragments we want to search for in the primary and secondary “database” strings. This list should contain only sequences of the same length.
combined_dssps (string) – A string regex prepared with all the combined DSSPs that need to be searched: see code in prepare_slice_dict and make_combined_regex.
others – Other parameters are as described in prepare_slice_dict.

Returns:

int, dict – The length of the xmers in xmers list. The dictionary with the identified slice positions.

idpconfgen.libs.libbuild.prepare_energy_function(atom_labels, residue_numbers, residue_labels, forcefield, lj_term=True, coulomb_term=False, energy_type_ij='pairs', **kwnull)

Prepare energy function.

Parameters:

lj_term (bool) – Whether to compute the Lennard-Jones term during building and validation. If false, expect a physically meaningless result.
coulomb_term (bool) – Whether to compute the Coulomb term during building and validation. If false, expect a physically meaningless result.
energy_type_ij (str) – How to calculate the energy for ij pairs. See libs.libenergyij.post_calc_options.
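
For orientation, the sketch below shows how a prepared Lennard-Jones term could be evaluated from per-pair ACOEFF and BCOEFF values such as those named in create_LJ_params_raw. The 12-6 functional form A/r^12 - B/r^6, the summation over pairs, and all names in the code are assumptions for illustration; the actual post-calculation behaviour is governed by energy_type_ij and is not reproduced here.

```python
def prepare_lj_energy_sketch(acoeff, bcoeff):
    """Return a callable computing a 12-6 LJ energy from ij distances.

    `acoeff` and `bcoeff` hold one coefficient per ij pair (assumed layout).
    """
    def calc_energy(distances_ij):
        # Sum of A/r^12 - B/r^6 over all ij pairs.
        return sum(
            a / r**12 - b / r**6
            for a, b, r in zip(acoeff, bcoeff, distances_ij)
        )
    return calc_energy


# Two hypothetical pairs, both at unit distance.
calc = prepare_lj_energy_sketch([1.0, 4.0], [2.0, 4.0])
energy = calc([1.0, 1.0])
```

Preparing the coefficients once and returning a callable keeps the per-conformer evaluation limited to the distance-dependent part.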

idpconfgen.libs.libbuild.prepare_slice_dict(primary, input_seq, csss=False, dssp_regexes=None, secondary=None, mers_size=(1, 2, 3, 4, 5), res_tolerance=None, ncores=1)

Prepare a dictionary mapping fragments to slices in primary.

Protocol:

1) The input sequence is split into all possible distinct smaller peptides according to the mers_size tuple. Let’s call these smaller peptides XMERS.

2) We search the primary string for all possible sequence matches of each XMER, considering possible residue substitutions in the XMERS according to the res_tolerance dictionary, if given.

2.1) The found positions are saved in a list of slice objects that populates the final dictionary (see the Returns section).

2.2) We repeat the process, this time considering that the XMER can be followed by a proline.

2.3) We save in the dictionary only those entries for which we find matches in primary.

3) Optional, only if csss is given. Here, we rearrange the output dictionary so it becomes compatible with the build process. The process goes as follows:

3.1) For each slice found in 2), we inspect whether the secondary string matches any of the dssp_regexes. If it matches, we assign that slice to the fragment size, the XMER identity, and the DSSP key. This allows us to specify, in the build process, which SS to sample for specific regions of the conformer.
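
Steps 1), 2), 2.1) and 2.3) of the protocol can be sketched in a few lines. This is a simplified illustration, not the library code: the function names are hypothetical, and residue tolerance, the proline case of step 2.2), the CSSS rearrangement of step 3), and multiprocessing are all left out.

```python
import re


def get_xmers_sketch(seq, size):
    """Return the distinct fragments of length `size` found in `seq`."""
    return {seq[i:i + size] for i in range(len(seq) - size + 1)}


def find_slices_sketch(primary, input_seq, mers_size=(1, 2, 3)):
    """Map fragment size -> fragment -> list of slices in `primary`."""
    slice_dict = {}
    for size in mers_size:
        for xmer in sorted(get_xmers_sketch(input_seq, size)):
            # A lookahead lets overlapping occurrences be found as well.
            pattern = f"(?={re.escape(xmer)})"
            slices = [
                slice(match.start(), match.start() + size)
                for match in re.finditer(pattern, primary)
            ]
            # Keep only fragments that occur in the database string.
            if slices:
                slice_dict.setdefault(size, {})[xmer] = slices
    return slice_dict


primary = "QWERY|IPASDF"
result = find_slices_sketch(primary, "ASDK")
found = primary[result[3]["ASD"][0]]
```

Here result[3] contains only "ASD", because "SDK" does not occur in primary, and each stored slice recovers the matched fragment when applied to primary.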

Parameters:

primary (str) – A concatenated version of all primary sequences in the database. In the form of “QWERY|IPASDF”, etc.
input_seq (str) – The 1-letter code amino-acid sequence of the conformer to construct.
csss (bool) – Whether to update the output according to the CSSS probabilities of secondary structures per amino acid residue position. Will only be used when CSSS is activated.
dssp_regexes (list-like) – List of all DSSP codes to look for in the sequence. Will only be used when csss is True.
secondary (str) – A concatenated version of secondary structure codes that correspond to primary. In the form of “LLLL|HHHH”, etc. Only needed if csss True.
mers_size (iterable) – An iterable of integers denoting the sizes of the fragments to search for. Defaults to sizes 1 to 5.
res_tolerance (dict) – A dictionary mapping residue tolerances, for example: {“A”: “AIL”}, noting Ala can be replaced by Ile and Leu in the search (this is a dummy example).
ncores (int) – The number of processors to use.

Returns:

dict –

A dict with the following mapping:

1) The first key level of the dict is the length of the fragments, hence, integers.

2) The second key level holds the residue fragments found in primary. A fragment present in input_seq but not in primary is removed from the dict.

3) Only if csss is True: adds a new layer organizing the slice objects by SS key.
