Lib build

Tools for conformer building operations.

class idpconfgen.libs.libbuild.ConfLabels(atom_labels, res_nums, res_labels)

Contain label information for a protein/conformer.

Variables:
  • atom_labels (np.array) – The atom labels.

  • res_nums (np.array) – The residue number of each atom.

  • res_labels (np.array) – The residue label of each atom.

atom_labels

Alias for field number 0

res_labels

Alias for field number 2

res_nums

Alias for field number 1
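The field aliases above describe a named tuple with the field order atom_labels (0), res_nums (1), res_labels (2). A minimal sketch, assuming a plain collections.namedtuple with those field names (the library's actual definition may differ), showing that positional and attribute access point to the same arrays:

```python
from collections import namedtuple

import numpy as np

# Assumed definition mirroring the documented field order:
# field 0 -> atom_labels, field 1 -> res_nums, field 2 -> res_labels.
ConfLabels = namedtuple('ConfLabels', ['atom_labels', 'res_nums', 'res_labels'])

labels = ConfLabels(
    atom_labels=np.array(['N', 'CA', 'C', 'O']),
    res_nums=np.array([1, 1, 1, 1]),
    res_labels=np.array(['ALA', 'ALA', 'ALA', 'ALA']),
)

# Attribute access and index access return the very same object.
print(labels.res_nums is labels[1])  # True
```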

idpconfgen.libs.libbuild.ConfMasks

alias of ConfMaks

idpconfgen.libs.libbuild.are_connected(n1, n2, rn1, a1, a2, bonds_intra, bonds_inter)[source]

Detect if a certain atom pair is bonded according to the given criteria.

Considers only the same residue and the next residue.
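A minimal sketch of this kind of check. It assumes, hypothetically, that bonds_intra maps a residue name to a dict of atom name → set of partner atoms in the same residue, and that bonds_inter maps an atom name to the set of partner atoms in the next residue; these layouts are illustrative, not the library's documented format.

```python
def are_connected_sketch(n1, n2, rn1, a1, a2, bonds_intra, bonds_inter):
    """Return True if atom a1 (residue n1, name rn1) and atom a2 (residue n2) are bonded."""
    # Same residue: look up the intra-residue table of that residue type.
    if n1 == n2:
        return a2 in bonds_intra.get(rn1, {}).get(a1, set())
    # Next residue only: look up the inter-residue table.
    if n1 + 1 == n2:
        return a2 in bonds_inter.get(a1, set())
    # Any other residue pair is never considered connected.
    return False


# Hypothetical covalent-bond tables for an Ala backbone.
bonds_intra = {'ALA': {'N': {'CA'}, 'CA': {'N', 'C', 'CB'}, 'C': {'CA', 'O'}}}
bonds_inter = {'C': {'N'}}  # peptide bond C(i) - N(i+1)

print(are_connected_sketch(1, 1, 'ALA', 'CA', 'CB', bonds_intra, bonds_inter))  # True
print(are_connected_sketch(1, 2, 'ALA', 'C', 'N', bonds_intra, bonds_inter))    # True
print(are_connected_sketch(1, 3, 'ALA', 'C', 'N', bonds_intra, bonds_inter))    # False
```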

idpconfgen.libs.libbuild.build_regex_substitutions(s, options, pre_treatment=<class 'list'>, post_treatment=<built-in method join of str object>)[source]

Build character replacements in regex string.

Example

>>> build_regex_substitutions('ASD', {'S': 'SE'})
'A[SE]D'
>>> build_regex_substitutions('ASDS', {'S': 'SE'})
'A[SE]D[SE]'
>>> build_regex_substitutions('ASDS', {})
'ASDS'

Parameters:
  • s (regex string)

  • options (dict) – Dictionary of char to multichar substitutions

  • pre_treatment (callable, optional) – A treatment to apply to s before substitution. pre_treatment must return a list-like object. Default: list, because s is expected to be a string.

  • post_treatment (callable, optional) – A function to apply on the resulting list-like object before returning. Default: ‘’.join, to return a string.
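The doctest examples above can be reproduced with a short sketch. It is consistent with those examples but is not necessarily the library's implementation:

```python
def build_regex_substitutions(s, options, pre_treatment=list, post_treatment=''.join):
    """Sketch: wrap substitutable characters in regex character classes."""
    # Turn the input into a mutable list of characters (by default).
    chars = pre_treatment(s)
    for i, char in enumerate(chars):
        if char in options:
            # 'S' with {'S': 'SE'} becomes the character class '[SE]'.
            chars[i] = f'[{options[char]}]'
    # Join the list back into a regex string (by default).
    return post_treatment(chars)


print(build_regex_substitutions('ASDS', {'S': 'SE'}))  # A[SE]D[SE]
```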

idpconfgen.libs.libbuild.create_Coulomb_params_raw(atom_labels, residue_numbers, residue_labels, force_field)[source]

Create Coulomb parameters.

idpconfgen.libs.libbuild.create_LJ_params_raw(atom_labels, residue_numbers, residue_labels, force_field)[source]

Create ACOEFF and BCOEFF parameters.
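ACOEFF and BCOEFF are the coefficients of the 12-6 Lennard-Jones form E = ACOEFF/r^12 − BCOEFF/r^6. A sketch, assuming AMBER-style per-atom parameters (epsilon and half of Rmin) combined with the Lorentz-Berthelot rules over the upper-triangle ij pairs; the parameter names and combination rules actually used by idpconfgen are assumptions here:

```python
import numpy as np


def lj_coefficients(epsilon, rmin_half):
    """Sketch: ACOEFF and BCOEFF for all i<j atom pairs.

    epsilon, rmin_half: per-atom arrays (assumed AMBER-style parameters).
    """
    epsilon = np.asarray(epsilon, dtype=float)
    rmin_half = np.asarray(rmin_half, dtype=float)
    # Upper-triangle pairs, diagonal excluded: N*(N-1)/2 pairs.
    i, j = np.triu_indices(epsilon.size, k=1)
    # Lorentz-Berthelot combination rules.
    eps_ij = np.sqrt(epsilon[i] * epsilon[j])
    rmin_ij = rmin_half[i] + rmin_half[j]
    # E(r) = eps*[(Rmin/r)**12 - 2*(Rmin/r)**6] = A/r**12 - B/r**6
    acoeff = eps_ij * rmin_ij ** 12
    bcoeff = 2 * eps_ij * rmin_ij ** 6
    return acoeff, bcoeff


a, b = lj_coefficients([0.1, 0.2, 0.15], [1.9, 1.7, 1.8])
r = 1.9 + 1.7  # distance equal to Rmin of pair (0, 1)
# At r = Rmin the energy equals minus the well depth, -sqrt(0.1 * 0.2).
print(a[0] / r**12 - b[0] / r**6)
```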

idpconfgen.libs.libbuild.create_bonds_apart_mask_for_ij_pairs(atom_labels, residue_numbers, residue_labels, bonds_intra, bonds_inter, base_bool=False)[source]

Create bool mask array identifying the pairs X bonds apart in ij pairs.

Given the bonds_intra and bonds_inter criteria, identifies the ij atom pairs, among the N*(N-1)/2 pairs of the upper triangle of the all-vs-all matrix (diagonal excluded), that agree with the described bonds.

Inter residue bonds are only considered for consecutive residues.

Parameters:
  • atom_labels (iterable, list or np.ndarray) – The protein atom labels. Ex: [‘N’, ‘CA’, ‘C’, ‘O’, ‘CB’, …]

  • residue_numbers (iterable, list or np.ndarray) – The protein residue numbers per atom in atom_labels. Ex: [1, 1, 1, 1, 1, 2, 2, 2, 2, …]

  • residue_labels (iterable, list or np.ndarray) – The protein residue labels per atom in atom_labels. Ex: [‘Met’, ‘Met’, ‘Met’, …]
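The resulting mask follows the order of the upper-triangle ij pairs. A minimal sketch, assuming a hypothetical predicate is_bonded(i, j) that stands in for the bonds_intra/bonds_inter lookup, and assuming that base_bool is the fill value while matching pairs receive the opposite value:

```python
from itertools import combinations

import numpy as np


def bonds_apart_mask_sketch(n_atoms, is_bonded, base_bool=False):
    """Sketch: bool mask over the N*(N-1)/2 upper-triangle atom pairs."""
    pairs = list(combinations(range(n_atoms), 2))  # (0, 1), (0, 2), ...
    mask = np.full(len(pairs), base_bool)
    for k, (i, j) in enumerate(pairs):
        if is_bonded(i, j):
            # Pairs matching the criterion get the opposite of base_bool.
            mask[k] = not base_bool
    return mask


# Hypothetical criterion: atoms with consecutive indexes are "bonded".
mask = bonds_apart_mask_sketch(4, lambda i, j: j == i + 1)
print(mask)  # True for pairs (0, 1), (1, 2) and (2, 3)
```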

idpconfgen.libs.libbuild.create_conformer_labels(input_seq, atom_names_definition, transfunc=<function translate_seq_to_3l>)[source]

Create all atom/residue labels model based on an input sequence.

The labels are those expected for an all-atom model PDB file. Hence, residue labels and numbers are repeated as needed so that there is one residue label and one residue number per atom.

Parameters:
  • input_seq (str) – The protein input sequence in 1-letter code format.

  • atom_names_definition (dict) – Keys are residue identity and values are list/tuple of strings identifying atoms. Atom names should be sorted by the desired order.

  • transfunc (func) – Function used to translate 1-letter input sequence to 3-letter sequence code.

Returns:

tuple (atom labels, residue numbers, residue labels) – Each is a np.ndarray of types: ‘<U4’, int, and ‘<U3’ and shape (N,) where N is the number of atoms. The three arrays have the same length.
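A sketch of how the three per-atom arrays can be derived from a sequence. It assumes a small hypothetical atom_names_definition keyed by 3-letter codes and a dict-based stand-in for translate_seq_to_3l; both are illustrative, and terminal atoms are ignored:

```python
import numpy as np

# Hypothetical, heavily reduced definitions for illustration only.
ATOM_NAMES = {'ALA': ('N', 'CA', 'C', 'O', 'CB'), 'GLY': ('N', 'CA', 'C', 'O')}
ONE_TO_THREE = {'A': 'ALA', 'G': 'GLY'}


def conformer_labels_sketch(input_seq):
    atoms, numbers, residues = [], [], []
    for resnum, one_letter in enumerate(input_seq, start=1):
        three_letter = ONE_TO_THREE[one_letter]
        for atom in ATOM_NAMES[three_letter]:
            # One entry per atom: residue number and label are repeated.
            atoms.append(atom)
            numbers.append(resnum)
            residues.append(three_letter)
    return (
        np.array(atoms, dtype='<U4'),
        np.array(numbers, dtype=int),
        np.array(residues, dtype='<U3'),
    )


atom_labels, res_nums, res_labels = conformer_labels_sketch('AG')
print(res_nums)  # [1 1 1 1 1 2 2 2 2]
```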

idpconfgen.libs.libbuild.create_sidechains_masks_per_residue(residue_numbers, atom_labels, backbone_atoms)[source]

Create a map of numeric indexing masks pointing to side-chain atoms.

Create separate masks per residue.

Parameters:
  • residue_numbers (np.ndarray, shape (N,)) – The atom residue numbers of the protein.

  • atom_labels (np.ndarray, shape (N,)) – The atom labels of the protein.

  • backbone_atoms (list or tuple) – The labels of all possible backbone atoms.

Returns:

list of tuples of length 2 – List indexes refer to protein residues: index 0 is residue 1. For each residue, a tuple of length 2 is given. Tuple index 0 holds the indexes of that residue's side-chain atoms, relative to the atom_labels and residue_numbers arrays. Tuple index 1 is an array of length M, where M is the number of side-chain atoms of that residue, filled with np.nan by default.
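A sketch of the returned structure, assuming that every atom whose label is not in backbone_atoms counts as a side-chain atom (here CB is treated as side chain; the library's choice of backbone atoms may differ):

```python
import numpy as np


def sidechain_masks_sketch(residue_numbers, atom_labels, backbone_atoms):
    """Sketch: per-residue (side-chain indexes, NaN placeholder array)."""
    residue_numbers = np.asarray(residue_numbers)
    atom_labels = np.asarray(atom_labels)
    is_sidechain = ~np.isin(atom_labels, backbone_atoms)
    masks = []
    # np.unique returns sorted residue numbers, so list index 0 is residue 1.
    for resnum in np.unique(residue_numbers):
        idx = np.flatnonzero((residue_numbers == resnum) & is_sidechain)
        masks.append((idx, np.full(idx.size, np.nan)))
    return masks


masks = sidechain_masks_sketch(
    [1, 1, 1, 1, 1, 2, 2, 2, 2],
    ['N', 'CA', 'C', 'O', 'CB', 'N', 'CA', 'C', 'O'],
    ('N', 'CA', 'C', 'O'),
)
print(masks[0][0])  # [4], the CB of residue 1
```

A Gly-like residue without side-chain atoms simply gets an empty index array.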

idpconfgen.libs.libbuild.extract_ff_params_for_seq(atom_labels, residue_numbers, residue_labels, force_field, param)[source]

Extract a parameter from forcefield dictionary for a given sequence.

Parameters:
  • atom_labels, residue_numbers, residue_labels – As returned by create_conformer_labels.

  • force_field (dict) – The force field dictionary.

  • param (str) – The parameter to extract from the force field dictionary.
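A sketch of the per-atom lookup, assuming a hypothetical nested layout force_field[residue_label][atom_label][param]; the real idpconfgen force-field dictionary may be organized differently, and the values below are placeholders, not real force-field parameters. The sketch also drops residue_numbers, which a real implementation might need (for example, to handle terminal residues):

```python
import numpy as np

# Hypothetical force-field excerpt; values are placeholders only.
force_field = {
    'ALA': {
        'N': {'charge': -0.4, 'epsilon': 0.17},
        'CA': {'charge': 0.03, 'epsilon': 0.11},
    },
}


def extract_param_sketch(atom_labels, residue_labels, force_field, param):
    """Sketch: one parameter value per atom, in atom order."""
    return np.array([
        force_field[res][atom][param]
        for atom, res in zip(atom_labels, residue_labels)
    ])


charges = extract_param_sketch(['N', 'CA'], ['ALA', 'ALA'], force_field, 'charge')
print(charges)
```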

idpconfgen.libs.libbuild.gen_3l_residue_labels_per_atom(input_seq_3letter, atom_labels)[source]

Generate residue 3-letter labels per atom.

Parameters:
  • input_seq_3letter (list of 3-letter residue codes) – Must not be a generator.

  • atom_labels (list or tuple of atom labels) – Must not be a generator.

Yields:

String of length 3 – The 3-letter residue code per atom.
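A sketch of such a generator, assuming (as gen_residue_number_per_atom does) that an 'N' label marks the start of each new residue. The residue list is indexed by position, which is one reason it cannot be a generator:

```python
def gen_3l_labels_sketch(input_seq_3letter, atom_labels):
    """Sketch: yield the 3-letter residue code for each atom label."""
    index = -1
    for atom in atom_labels:
        # Assumption: every residue starts with its backbone 'N' atom.
        if atom == 'N':
            index += 1
        # Positional indexing requires a list/tuple, not a generator.
        yield input_seq_3letter[index]


per_atom = list(gen_3l_labels_sketch(['MET', 'GLY'], ['N', 'CA', 'C', 'O', 'N', 'CA', 'C', 'O']))
print(per_atom)  # ['MET', 'MET', 'MET', 'MET', 'GLY', 'GLY', 'GLY', 'GLY']
```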

idpconfgen.libs.libbuild.gen_atom_pair_connectivity_masks(res_names_ij, res_num_ij, atom_names_ij, connectivity_intra, connectivity_inter)[source]

Generate atom pair connectivity indexes.

Given atom information for the ij pairs and connectivity criteria, yields the index of the ij pair if the pair is connected according to the connectivity criteria.

For example, if the ij pair is covalently bonded, or 3 bonds apart, etc.

Parameters:
  • res_names_ij, res_num_ij, atom_names_ij – Iterables of the same length with synchronized information for the ij pairs.

  • connectivity_intra, connectivity_inter – Dictionaries mapping atom label connectivity.

Depends:

are_connected

idpconfgen.libs.libbuild.gen_ij_pairs_upper_diagonal(data)[source]

Generate upper diagonal ij pairs in tuples.

The diagonal is not considered.

Yields:

tuple of length 2 – The ij pairs, N*(N-1)/2 in total for N elements.
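Upper-diagonal pairs without the diagonal are exactly what itertools.combinations(data, 2) produces. A sketch of the idea (the library may implement it differently), including a check that the ordering matches np.triu_indices(N, k=1), which is convenient when pairing these tuples with NumPy index arrays:

```python
from itertools import combinations

import numpy as np

data = ['N', 'CA', 'C', 'O']
pairs = list(combinations(data, 2))
# N*(N-1)/2 pairs, diagonal (i == j) excluded.
print(len(pairs))  # 6
print(pairs[:3])   # [('N', 'CA'), ('N', 'C'), ('N', 'O')]

# Same ordering as NumPy's upper-triangle indexes with k=1.
i, j = np.triu_indices(len(data), k=1)
print(pairs == [(data[a], data[b]) for a, b in zip(i, j)])  # True
```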

idpconfgen.libs.libbuild.gen_residue_number_per_atom(atom_labels, start=1)[source]

Create a list of residue numbers based on atom labels.

This is a contextualized function, not an abstracted one. Considers N to be the first atom of the residue.

Yields:

ints – The integer residue number per atom label.
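A sketch of the contextual rule described above: a counter that advances every time an 'N' label appears. It is an illustration under that assumption, not necessarily the library's code:

```python
def gen_residue_number_sketch(atom_labels, start=1):
    """Sketch: yield the residue number of each atom, starting a new residue at 'N'."""
    resnum = start - 1
    for atom in atom_labels:
        if atom == 'N':
            resnum += 1
        yield resnum


numbers = list(gen_residue_number_sketch(['N', 'CA', 'C', 'O', 'N', 'CA', 'C', 'O', 'OXT']))
print(numbers)  # [1, 1, 1, 1, 2, 2, 2, 2, 2]
```

Because the rule depends on the 'N' label, an atom listed before the first 'N' would get the number start - 1.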

idpconfgen.libs.libbuild.get_cycle_bond_type()[source]

Return an infinite iterator of the bond types.

The returned labels are synced with the bgeo library. See core.definitions.bgeo_*.

idpconfgen.libs.libbuild.get_cycle_distances_backbone()[source]

Return an infinite iterator of backbone atom distances.

Sampling, in order, distances between atom pairs:
  • N - Ca, used for OMEGA

  • Ca - C, used for PHI

  • C - N(+1), used for PSI
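Infinite, repeating sequences like this map naturally onto itertools.cycle. A sketch, assuming typical ideal backbone bond lengths in Ångström; the labels and values are illustrative and the library's own definitions may differ:

```python
from itertools import cycle, islice

# Assumed typical ideal backbone bond lengths (Angstrom).
backbone_distances = cycle((
    ('N-CA', 1.458),     # used for OMEGA
    ('CA-C', 1.525),     # used for PHI
    ('C-N(+1)', 1.329),  # used for PSI
))

# Take four items: the cycle restarts after the third.
first_four = list(islice(backbone_distances, 4))
for label, dist in first_four:
    print(label, dist)
```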

idpconfgen.libs.libbuild.get_indexes_from_primer_length(sequence, plen, current_residue)[source]

Get sequence fragment based on position and length.

idpconfgen.libs.libbuild.init_conflabels(*args, **kwargs)[source]

Create atom and residue labels from sequence.

Parameters:

*args, **kwargs – Whatever create_conformer_labels accepts.

Returns:

namedtuple – ConfLabels named tuple populated according to input sequence.

idpconfgen.libs.libbuild.init_confmasks(atom_labels)[source]

Create a ConfMasks object (namedtuple).

ConfMasks is a named tuple whose attributes are integer masks for the respective atom groups.

Parameters:

atom_labels (array-like) – The atom names of the protein.

Returns:

namedtuple – ConfMasks object.

Notes

ConfMasks attributes map to the following atom groups:

bb3 : N, CA, C
bb4 : N, CA, C, O
NHs : amide protons
Hterm : N-terminal protons
OXT1 : O atom of C-terminal carboxyl group
OXT2 : OXT atom of the C-terminal carboxyl group
cterm : (OXT2, OXT1)
non_Hs : all but hydrogens
non_Hs_non_OXT : all but hydrogens and the OXT atom
non_NHs_non_OXT : all but NHs and the OXT atom
H2_N_CA_CB : these four atoms from the first residue;
             if Gly, uses HA3.
non_sidechains : all atoms except sidechains beyond CB
all_sidechain : all sidechain atoms including CB and HA
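The integer masks can be thought of as index arrays that select atoms by label. A sketch for a few of the groups, using a hypothetical two-residue label array and assuming that hydrogens are exactly the labels starting with 'H'; the other groups need extra residue-position logic that is omitted here:

```python
import numpy as np

# Hypothetical two-residue protein (Ala-Gly) with terminal atoms.
atom_labels = np.array(['N', 'H1', 'H2', 'H3', 'CA', 'HA', 'CB', 'C', 'O',
                        'N', 'H', 'CA', 'HA2', 'HA3', 'C', 'O', 'OXT'])

# Integer (index) masks selecting atom groups.
bb3 = np.flatnonzero(np.isin(atom_labels, ('N', 'CA', 'C')))
bb4 = np.flatnonzero(np.isin(atom_labels, ('N', 'CA', 'C', 'O')))
# Assumption: hydrogens are the labels starting with 'H'.
non_Hs = np.flatnonzero(~np.char.startswith(atom_labels, 'H'))

print(atom_labels[bb4])     # backbone N, CA, C, O of both residues (no OXT)
print(atom_labels[non_Hs])  # heavy atoms only, OXT included
```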
idpconfgen.libs.libbuild.make_combined_regex(regexes)[source]

Make a combined regex with ORs.

To be used with re.fullmatch.
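A sketch of one way to combine regexes with ORs (the exact grouping used by the library is an assumption). With re.fullmatch the whole alternation is anchored, so one alternative must match the entire string:

```python
import re


def make_combined_regex_sketch(regexes):
    # Wrap each alternative in a non-capturing group so that an
    # alternation inside one regex cannot leak into the others.
    return '|'.join(f'(?:{regex})' for regex in regexes)


combined = make_combined_regex_sketch(['H+', 'E+', 'L{2,}'])
print(bool(re.fullmatch(combined, 'HHHH')))  # True
print(bool(re.fullmatch(combined, 'HHEE')))  # False, no single alternative covers it
```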

idpconfgen.libs.libbuild.make_list_atom_labels(input_seq, atom_labels_dictionary)[source]

Make a list of the atom labels for an input_seq.

Considers the N-terminus to be protonated with H1 to H3, or with H1 only in the case of proline. Also adds the ‘OXT’ terminal label.

Parameters:
  • input_seq (str) – 1-letter amino-acid sequence.

  • atom_labels_dictionary (dict) – The ORDERED atom labels per residue.

Returns:

list – List of consecutive atom labels for the protein.
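A sketch of the labeling rules described above. It assumes a hypothetical, reduced atom_labels_dictionary keyed by 1-letter codes and assumes the terminal protons are placed right after N, replacing the amide 'H' of the first residue; the library's actual ordering may differ:

```python
# Hypothetical, reduced ordered labels per residue.
ATOM_LABELS = {
    'A': ('N', 'H', 'CA', 'HA', 'CB', 'C', 'O'),
    'P': ('N', 'CD', 'CA', 'HA', 'CB', 'CG', 'C', 'O'),
}


def make_list_atom_labels_sketch(input_seq, atom_labels_dictionary):
    labels = []
    for i, res in enumerate(input_seq):
        res_labels = list(atom_labels_dictionary[res])
        if i == 0:
            # Drop the amide H of the first residue, if any.
            res_labels = [a for a in res_labels if a != 'H']
            # N-terminal protons after N: H1 only for proline, else H1-H3.
            nterm = ['H1'] if res == 'P' else ['H1', 'H2', 'H3']
            res_labels[1:1] = nterm
        labels.extend(res_labels)
    labels.append('OXT')  # C-terminal carboxyl oxygen
    return labels


result = make_list_atom_labels_sketch('AP', ATOM_LABELS)
print(result[:5], result[-1])  # ['N', 'H1', 'H2', 'H3', 'CA'] OXT
```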

idpconfgen.libs.libbuild.populate_dict_with_database(xmers, res_tolerance, primary, secondary, combined_dssps)[source]

Identify sampling positions.

Identifies slices in primary and secondary where the combined_dssps regexes apply. Also considers residue tolerance.

This function is used internally by prepare_slice_dict with multiprocessing.

Parameters:
  • xmers (list) – The list of all protein fragments we want to search in the primary and secondary “database” strings. This list should contain only sequences of the same length.

  • combined_dssps (string) – A string regex prepared with all the combined DSSPs that need to be searched: see code in prepare_slice_dict and make_combined_regex.

  • others – Other parameters are as described in prepare_slice_dict.

Returns:

int, dict – The length of the xmers in the xmers list, and the dictionary with the identified slice positions.

idpconfgen.libs.libbuild.prepare_energy_function(atom_labels, residue_numbers, residue_labels, forcefield, lj_term=True, coulomb_term=False, energy_type_ij='pairs', **kwnull)[source]

Prepare energy function.

Parameters:
  • lj_term (bool) – Whether to compute the Lennard-Jones term during building and validation. If False, expect a physically meaningless result.

  • coulomb_term (bool) – Whether to compute the Coulomb term during building and validation. If False, expect a physically meaningless result.

  • energy_type_ij (str) – How to calculate the energy for ij pairs. See libs.libenergyij.post_calc_options.

idpconfgen.libs.libbuild.prepare_slice_dict(primary, input_seq, csss=False, dssp_regexes=None, secondary=None, mers_size=(1, 2, 3, 4, 5), res_tolerance=None, ncores=1)[source]

Prepare a dictionary mapping fragments to slices in primary.

Protocol:

1) The input sequence is split into all possible smaller peptides according to the mers_size tuple. We call these smaller peptides XMERS.

2) We search the primary string for all sequence matches of each XMER, considering possible residue substitutions in the XMERS according to the res_tolerance dictionary, if given.

2.1) The found positions are saved in a list of slice objects that populates the final dictionary (see the Returns section).

2.2) We repeat the process, now considering that the XMER can be followed by a proline.

2.3) We save in the dictionary only those entries for which we find matches in the primary.

3) Optional, only if csss is True. Here, we rearrange the output dictionary so that it becomes compatible with the build process, as follows:

3.1) For each slice found in 2), we check whether the corresponding region of the secondary string matches any of the dssp_regexes. If it does, we assign that slice to its fragment size, XMER identity, and DSSP key. This allows the build process to specify which secondary structure to sample for specific regions of the conformer.

Parameters:
  • primary (str) – A concatenated version of all primary sequences in the database. In the form of “QWERY|IPASDF”, etc.

  • input_seq (str) – The 1-letter code amino-acid sequence of the conformer to construct.

  • csss (bool) – Whether to update the output according to the CSSS probabilities of secondary structures per amino acid residue position. Will only be used when CSSS is activated.

  • dssp_regexes (list-like) – List of all DSSP codes to look for in the sequence. Will only be used when csss is True.

  • secondary (str) – A concatenated version of secondary structure codes that correspond to primary. In the form of “LLLL|HHHH”, etc. Only needed if csss is True.

  • mers_size (iterable) – An iterable of integers denoting the size of the fragments to search for. Defaults from 1 to 5.

  • res_tolerance (dict) – A dictionary mapping residue tolerances, for example: {“A”: “AIL”}, noting Ala can be replaced by Ile and Leu in the search (this is a dummy example).

  • ncores (int) – The number of processors to use.

Returns:

dict

A dict with the given mapping:

1) First key-level of the dict is the length of the fragments, hence, integers.

2) The second key level are the residue fragments found in the primary. A fragment in input_seq but not in primary is removed from the dict.

3) Only if csss is True: adds a new layer organizing the slice objects by the SS keys.
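Steps 1, 2, 2.1 and 2.3 of the protocol, together with the first two key levels of the returned dict, can be sketched as follows. This is a simplified illustration, not the library's implementation: the proline step (2.2), the csss rearrangement (3) and multiprocessing are omitted, and tolerance is applied as regex character classes (the same idea as build_regex_substitutions):

```python
import re
from collections import defaultdict


def prepare_slice_dict_sketch(primary, input_seq, mers_size=(1, 2, 3), res_tolerance=None):
    res_tolerance = res_tolerance or {}
    slice_dict = defaultdict(dict)
    for size in mers_size:
        # Step 1: all fragments (XMERS) of this size, in order of appearance.
        xmers = dict.fromkeys(
            input_seq[i:i + size] for i in range(len(input_seq) - size + 1)
        )
        for xmer in xmers:
            # Step 2: regex with residue tolerance, e.g. 'A' -> '[AIL]'.
            pattern = ''.join(
                f'[{res_tolerance[c]}]' if c in res_tolerance else c
                for c in xmer
            )
            # Step 2.1: a lookahead finds overlapping matches; '|' separators
            # are never amino acids, so matches stay within one protein.
            slices = [
                slice(m.start(), m.start() + size)
                for m in re.finditer(f'(?=({pattern}))', primary)
            ]
            # Step 2.3: keep only fragments found in primary.
            if slices:
                slice_dict[size][xmer] = slices
    return dict(slice_dict)


result = prepare_slice_dict_sketch('QWERY|IPASDF', 'ASD', mers_size=(2,))
print(result)  # {2: {'AS': [slice(8, 10, None)], 'SD': [slice(9, 11, None)]}}
```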