Lib Validate

Tools to validate conformers.

Recognition of this module should be granted to:
  • @AlaaShamandy

  • @joaomcteixeira

Please see: https://github.com/julie-forman-kay-lab/IDPConformerGenerator/pull/23

idpconfgen.libs.libvalidate.eval_bb_bond_length_distribution(name, pdb_data)[source]

.

idpconfgen.libs.libvalidate.evaluate_vdw_clash_by_threshold_from_disk(name, pdb_data, atoms_to_consider, elements_to_consider, **kwargs)[source]

Evaluate clashes in a structure.

Created to evaluate conformers from disk.

idpconfgen.libs.libvalidate.report_sequential_bon_len(bond_distances, expected_bond_length, invalid_bool, labels)[source]

Generate a report from bond distances of sequential bonds.

Returns:

string

idpconfgen.libs.libvalidate.report_vdw_clash(data_array, pair1, pair2, distances, radii_sum, overlap)[source]

Prepare a report of the identified clashes.

Parameters:
  • data_array (np.ndarray, shape (N, M)) – A numpy data_array as given by idpconfgen.libs.libstructure.Structure.data_array.

  • pair1, pair2 (ordered iterable of integers) – The row indexes from which to retrieve atom information in `data_array`. pair1 and pair2 must be aligned for the report to make sense, that is, the first item of pair1 clashes with the first item of pair2.

  • distances, threshold (indexable of length equal to pair1/2 length) – Contain distances and clash thresholds for the different identified clashes.

Returns:

str – The report.

idpconfgen.libs.libvalidate.validate_bb_bond_len(coords, tolerance=0.01)[source]

Validate backbone bond lengths of coords.

Assumes coords are already sorted as (N, CA, C) per residue and that only (N, CA, C) atoms are present in coords. Evaluates against the N-CA, CA-C and C-Np1 distances used in IDPConfGen.

Parameters:

tolerance (float) – A tolerance in the same units as coords. Deviations under the tolerance are considered valid.

Returns:

np.array, dtype=bool, shape (N-1,) – True if bond length is invalid. False if bond length is valid.

np.array, dtype=np.float, shape (N-1,) – The computed bond distances.

np.array, dtype=np.float, shape (N-1,) – The expected bond lengths.
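
The check described above can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: the function name `sketch_validate_bb_bond_len` is hypothetical, and the three bond lengths are illustrative values chosen for the example, not the values IDPConfGen uses internally.

```python
import numpy as np

# Illustrative bond lengths in Angstroms (assumed for this sketch,
# NOT taken from IDPConfGen).
N_CA, CA_C, C_NP1 = 1.46, 1.52, 1.33


def sketch_validate_bb_bond_len(coords, tolerance=0.01):
    """Return (invalid, distances, expected) for sequential backbone bonds.

    Assumes coords holds only N, CA, C atoms, in that order, per residue.
    """
    coords = np.asarray(coords, dtype=float)
    # Distance between each atom and the next one: shape (N-1,).
    distances = np.linalg.norm(coords[1:] - coords[:-1], axis=1)
    # Expected lengths repeat as N-CA, CA-C, C-N(+1) along the chain.
    pattern = np.array([N_CA, CA_C, C_NP1])
    expected = np.tile(pattern, len(coords) // 3 + 1)[:len(distances)]
    # True marks an invalid bond, matching the return convention above.
    invalid = np.abs(distances - expected) > tolerance
    return invalid, distances, expected


# Two residues laid out along the x axis with ideal spacing.
x = np.cumsum([0.0, N_CA, CA_C, C_NP1, N_CA, CA_C])
coords = np.column_stack([x, np.zeros(6), np.zeros(6)])
invalid, distances, expected = sketch_validate_bb_bond_len(coords)
```

With ideal spacing every entry of `invalid` is False; stretching a single bond beyond the tolerance flips only the corresponding entry to True.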

idpconfgen.libs.libvalidate.validate_bb_bonds_len_from_disk(name=None, pdb_data=None, tolerance=0.1)[source]

Validate backbone bond lengths of a structure stored on disk.

idpconfgen.libs.libvalidate.validate_conformer_for_builder(coords, atom_labels, residue_numbers, bb_mask, carbonyl_mask, LOGICAL_NOT=<ufunc 'logical_not'>, ISNAN=<ufunc 'isnan'>)[source]

.

idpconfgen.libs.libvalidate.vdw_clash_by_threshold(coords, protein_atoms, protein_elements, atoms_to_consider, elements_to_consider, residue_numbers, residues_apart=2, vdW_radii='tsai1999', vdW_overlap=0.0)[source]

Calculate vdW clashes from XYZ coordinates and identity masks.

Parameters:
  • coords (numpy array, dtype=float, shape (N, 3)) – The atom XYZ coordinates.

  • protein_atoms (numpy array, dtype=str, shape (N,)) – The protein atom names.

  • protein_elements (numpy array, dtype=str, shape(N,)) – The protein atom elements.

  • atoms_to_consider (list-like) – The atoms in protein_atoms to consider in the vdW clash analysis.

  • elements_to_consider (list-like) – The elements in protein_elements to consider in the vdW clash analysis.

  • residue_numbers (numpy array, dtype=int, shape (N,)) – The residue number corresponding to each atom.

  • residues_apart (int, optional) – The minimum number of residues apart to consider for a clash. Defaults to 2.

  • vdW_radii (str, optional) – The VDW radii set to consider. Defaults to ‘tsai1999’.

  • vdW_overlap (float, optional) – An overlap allowance in Angstroms. Defaults to 0.0, meaning any distance less than vdW+vdW is considered a clash.

Returns:

The same values returned by vdw_clash_by_threshold_calc().

idpconfgen.libs.libvalidate.vdw_clash_by_threshold_calc(coords, atc_mask, pure_radii_sum, distances_apart, vdW_overlap=0.0)[source]

Calculate van der Waals clashes from a pure sphere overlap.

Other masks used as parameters will be applied to the result of:

scipy.spatial.distance.cdist(coords, coords, 'euclidean')

Parameters:
  • coords (np.array, dtype=np.float, shape (N, 3)) – The protein XYZ coordinates.

  • atc_mask (np.array, dtype=np.bool, shape (N, N)) – A boolean mask to filter only the atoms relevant to report. Usually this mask is prepared beforehand and can contain different considerations, such as residues apart and specific atom types. If coords contains only the coordinates desired to compute, then atc_mask should contain only True entries.

  • pure_radii_sum (np.array, dtype=float, shape (N, N)) – An all-to-all sum of the vdW radii. In other words, the threshold below which a clash is considered to exist for each atom pair, before applying the vdW_overlap allowance.

  • distances_apart (np.array, dtype=int, shape (N, N)) – An all-to-all matrix giving, for each atom pair, the residue-to-residue distance.

  • vdW_overlap (float) – The vdW overlap tolerance to apply.

Returns:

tuple – rows, cols: the indexes of the cdist result applied to coords where clashes were found; the distances found for those clashes; the computed distance threshold; and the overlap, that is, the computed difference between threshold and distance.
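
The pure sphere overlap test can be sketched as follows. This is a hedged illustration under stated assumptions, not the library code: the function name `sketch_clash_calc` and its `residues_apart` argument are hypothetical, `distances_apart` is assumed to hold integer residue separations, and the pairwise distance matrix is built with NumPy broadcasting, which gives the same (N, N) result as `scipy.spatial.distance.cdist(coords, coords, 'euclidean')`.

```python
import numpy as np


def sketch_clash_calc(coords, atc_mask, pure_radii_sum, distances_apart,
                      residues_apart=2, vdW_overlap=0.0):
    """Find atom pairs whose spheres overlap more than vdW_overlap."""
    coords = np.asarray(coords, dtype=float)
    # All-to-all Euclidean distances, shape (N, N).
    diff = coords[:, None, :] - coords[None, :, :]
    distances = np.linalg.norm(diff, axis=-1)
    # Threshold per pair after subtracting the allowed overlap.
    threshold = pure_radii_sum - vdW_overlap
    overlap = threshold - distances
    # A clash needs positive overlap, a relevant atom pair, and enough
    # residues between the two atoms (assumed semantics for this sketch).
    clashes = (overlap > 0) & atc_mask & (distances_apart >= residues_apart)
    # Keep the upper triangle so each pair is reported once.
    rows, cols = np.nonzero(np.triu(clashes, k=1))
    return (rows, cols, distances[rows, cols],
            threshold[rows, cols], overlap[rows, cols])


# Three atoms on a line; only the first two are close enough to clash.
coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
resnums = np.array([1, 4, 8])
radii = np.full(3, 1.7)  # illustrative radius in Angstroms
pure_radii_sum = radii[:, None] + radii[None, :]
distances_apart = np.abs(resnums[:, None] - resnums[None, :])
atc_mask = np.ones((3, 3), dtype=bool)

rows, cols, dists, thresholds, overlaps = sketch_clash_calc(
    coords, atc_mask, pure_radii_sum, distances_apart)
```

Raising `vdW_overlap` lowers the threshold, so mild contacts stop being reported while deep overlaps remain.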

idpconfgen.libs.libvalidate.vdw_clash_by_threshold_common_preparation(protein_atoms, protein_elements, residue_numbers, atoms_to_consider=False, elements_to_consider=False, residues_apart=2, vdW_radii='tsai1999')[source]

Prepare masks for vdW clash calculation.

Masks are prepared assuming all-to-all distances will be computed using scipy.spatial.distance.cdist, so an (N, 3) array yields an (N, N) distance result.

atoms_to_consider and elements_to_consider are evaluated with logical AND, that is, only entries that satisfy both are considered.

Parameters:
  • protein_atoms (np.array, of shape (N,), dtype string) – A sequence of the protein atom names.

  • protein_elements (np.array, of shape (N,), dtype compatible with ‘<U2’) – A sequence of the protein atom elements aligned with protein_atoms.

  • residue_numbers (np.array of shape (N,), dtype=int)

  • atoms_to_consider (sequence, optional) – A tuple of the atom names to consider in the calculation. Defaults to False, which considers all atoms.

  • elements_to_consider (sequence, optional) – A tuple of the element types to consider in the calculation. Defaults to False, which considers all elements.
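
One way to picture the mask preparation is the sketch below. It is an assumption-laden illustration rather than the library's code: the function name `sketch_prepare_masks`, the `radii` dictionary argument, and the radius values are all hypothetical (the values are illustrative and are not the 'tsai1999' set). It combines the atom and element selections with logical AND, as described above, and expands per-atom arrays into (N, N) matrices.

```python
import numpy as np

# Illustrative radii in Angstroms (assumed; NOT the 'tsai1999' set).
EXAMPLE_RADII = {'C': 1.70, 'N': 1.55, 'O': 1.52}


def sketch_prepare_masks(protein_atoms, protein_elements, residue_numbers,
                         atoms_to_consider=False, elements_to_consider=False,
                         radii=EXAMPLE_RADII):
    """Build (N, N) masks from per-atom names, elements and residue numbers."""
    atoms = np.asarray(protein_atoms)
    elements = np.asarray(protein_elements)
    resnums = np.asarray(residue_numbers, dtype=int)

    # Start by keeping every atom; each filter narrows the selection.
    keep = np.ones(len(atoms), dtype=bool)
    if atoms_to_consider:
        keep &= np.isin(atoms, atoms_to_consider)
    if elements_to_consider:
        keep &= np.isin(elements, elements_to_consider)

    # A pair is relevant only if both of its atoms are kept.
    atc_mask = keep[:, None] & keep[None, :]
    # All-to-all sum of radii: the clash threshold for each pair.
    per_atom = np.array([radii[e] for e in elements])
    pure_radii_sum = per_atom[:, None] + per_atom[None, :]
    # All-to-all residue separation.
    distances_apart = np.abs(resnums[:, None] - resnums[None, :])
    return atc_mask, pure_radii_sum, distances_apart


atoms = ['N', 'CA', 'C', 'O', 'N', 'CA']
elements = ['N', 'C', 'C', 'O', 'N', 'C']
resnums = [1, 1, 1, 1, 2, 2]
atc_mask, pure_radii_sum, distances_apart = sketch_prepare_masks(
    atoms, elements, resnums,
    atoms_to_consider=('N', 'CA', 'C'),
    elements_to_consider=('C',),
)
```

In this example the N atoms pass the atom-name filter but fail the element filter, so only the CA and C atoms remain in `atc_mask`.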