Lib Structure

Store internal protein structure representation.

Classes

Structure

The main API that represents a protein structure in IDPConfGen.

class idpconfgen.libs.libstructure.Structure(data, **kwargs)[source]

Hold structural data from PDB/mmCIF files.

Run the .build() method to read the structure.

Cases for PDB files:

  • If there are several MODELS, only the first model is considered.
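The first-model rule can be sketched as below. This is a minimal illustration rather than IDPConfGen's own parser: the helper name `first_model_lines` is hypothetical, and it assumes models are delimited by MODEL and ENDMDL records, as in PDB format v3.

```python
def first_model_lines(pdb_text):
    """Return the lines of the first model (hypothetical helper)."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # end of the first model: ignore everything after it
        if line.startswith("MODEL"):
            continue  # skip the MODEL record itself
        kept.append(line)
    return kept


two_models = (
    "MODEL        1\n"
    "ATOM first\n"
    "ENDMDL\n"
    "MODEL        2\n"
    "ATOM second\n"
    "ENDMDL\n"
)
first = first_model_lines(two_models)
```

A file without MODEL records passes through unchanged, because no ENDMDL record is ever found.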

Parameters:

data (str, bytes, Path) – Raw structural data from PDB/mmCIF formatted files. If data is a path to a file it must be a pathlib.Path object. If string or bytes, it must be the raw content of the input file.

Examples

Opens a PDB file, selects only chain 'A', and saves the selection to a file.

>>> s = Structure(Path('1ABC.pdb'))
>>> s.build()
>>> s.add_filter_chain('A')
>>> s.write_PDB('out.pdb')

Opens an mmCIF file, selects only residues above 50, and saves the selection to a file.

>>> s = Structure(Path('1ABC.cif'))
>>> s.build()
>>> s.add_filter(lambda x: int(x[col_resSeq]) > 50)
>>> s.write_PDB('out.pdb')

Reads the raw content of a PDB file and passes it as a string.

>>> with open('1ABC.pdb', 'r') as fin:
...     lines = fin.read()
>>> s = Structure(lines)
>>> s.build()
add_filter(function)[source]

Add a function as filter.

add_filter_backbone(minimal=False)[source]

Add filter to consider only backbone atoms.

add_filter_chain(chain)[source]

Add filters for chain.

add_filter_record_name(record_name)[source]

Add filter for record names.

build()[source]

Read structure raw data in rawdata.

After .build(), filters and data can be accessed.

property chain_set

All chain IDs present in the raw dataset.

clear_filters()[source]

Clear (delete) all registered filters.

property consecutive_residues

Consecutive residue groups from filtered atoms.
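One way to picture "consecutive residue groups" is to split a list of residue numbers wherever the numbering jumps. The sketch below assumes integer residue numbers that are already in order; `group_consecutive` is a hypothetical name, and the property's actual return type is not stated in this reference.

```python
def group_consecutive(residue_numbers):
    """Split ordered residue numbers into runs of consecutive values."""
    groups = []
    for number in residue_numbers:
        if groups and number == groups[-1][-1] + 1:
            groups[-1].append(number)  # continues the current run
        else:
            groups.append([number])  # a gap starts a new run
    return groups


segments = group_consecutive([1, 2, 3, 7, 8, 10])
```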

property coords

Coordinates of the filtered atoms, as floats.

property data_array

Contain structure data in the form of a Numpy array.

property fasta

FASTA sequence of the filtered_atoms lines.

HETATM residues with non-canonical codes are represented as X.
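The substitution rule can be illustrated with a plain lookup table. This is a sketch under assumptions: `THREE_TO_ONE` and `to_fasta` are hypothetical names, and the table lists only the 20 canonical residues, so anything else (for example selenomethionine, MSE) falls back to X.

```python
# Three-letter to one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_fasta(residue_names):
    """Build a one-letter sequence; unknown codes become 'X'."""
    return "".join(THREE_TO_ONE.get(name, "X") for name in residue_names)


sequence = to_fasta(["MET", "ALA", "MSE", "GLY"])
```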

property filtered_atoms

Filter data array by the selected filters.

Returns:

list – The data in PDB format after filtering.
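The filter methods (add_filter, pop_last_filter, clear_filters) and this property can be pictured as a stack of predicate functions applied to every row. The sketch below is not the Structure class: `FilterStack` and its row layout are hypothetical, and it assumes that a row is kept only when all registered filters accept it.

```python
class FilterStack:
    """Hypothetical stand-in for a filter registry over data rows."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.filters = []

    def add_filter(self, function):
        self.filters.append(function)

    def pop_last_filter(self):
        self.filters.pop()

    def clear_filters(self):
        self.filters.clear()

    @property
    def filtered_rows(self):
        # Keep a row only if every registered filter returns True for it.
        return [row for row in self.rows if all(f(row) for f in self.filters)]


# Each row is (record name, chain ID, residue number).
rows = [("ATOM", "A", 10), ("ATOM", "A", 60), ("HETATM", "B", 70)]
stack = FilterStack(rows)
stack.add_filter(lambda row: row[1] == "A")
stack.add_filter(lambda row: row[2] > 50)
selected = stack.filtered_rows
```

Because the filters are only evaluated when the property is read, removing one with pop_last_filter widens the selection again without re-reading the data.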

property filtered_residues

Filter residues according to filters.

property filters

Registered filter functions, in order of registration.

get_PDB(pdb_filters=None, renumber=True)[source]

Convert Structure to PDB format.

Considers only filtered lines.

Returns:

generator

get_sorted_minimal_backbone_coords(filtered=False)[source]

Generate a copy of the backbone coords sorted.

Atoms are sorted in the order N, CA, C.

This method was created because some PDBs may not have the backbone atoms sorted properly.

Parameters:

filtered (bool, optional) – Whether to consider the current filters or the raw data.
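The sorting idea can be sketched with plain tuples. This is an illustration only: `sort_minimal_backbone` is a hypothetical helper, atoms are assumed to be `(residue number, atom name, coordinates)` tuples from a single chain, and the real method works on the structure's data array.

```python
BACKBONE_ORDER = {"N": 0, "CA": 1, "C": 2}


def sort_minimal_backbone(atoms):
    """Keep N, CA and C atoms and order them N, CA, C within each residue."""
    backbone = [atom for atom in atoms if atom[1] in BACKBONE_ORDER]
    return sorted(backbone, key=lambda atom: (atom[0], BACKBONE_ORDER[atom[1]]))


# Backbone atoms deliberately listed out of order, plus one carbonyl oxygen.
atoms = [
    (1, "CA", (1.5, 0.0, 0.0)),
    (1, "N", (0.0, 0.0, 0.0)),
    (1, "O", (2.5, 1.2, 0.0)),
    (1, "C", (2.0, 1.0, 0.0)),
    (2, "C", (5.0, 2.0, 0.0)),
    (2, "N", (3.0, 1.5, 0.0)),
    (2, "CA", (4.0, 2.5, 0.0)),
]
sorted_atoms = sort_minimal_backbone(atoms)
```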

pop_last_filter()[source]

Pop last filter.

property residues

Residues of the structure.

Without filtering, without chain separation.

write_PDB(filename, **kwargs)[source]

Write Structure to PDB file.

idpconfgen.libs.libstructure.concatenate_residue_labels(labels)[source]

Concatenate residue labels.

This function is a generator.

Parameters:

labels (numpy array of shape (N, M)) – Where N is the number of rows, and M the number of columns with the labels to be concatenated.
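A generator of this kind might look like the sketch below. It uses nested lists instead of a NumPy array and assumes each row's M labels are joined without a separator; both choices, and the name `concatenate_rows`, are assumptions made for illustration.

```python
def concatenate_rows(labels):
    """Yield one concatenated string per row of labels."""
    for row in labels:
        yield "".join(row)


# Two rows (N = 2) with three label columns each (M = 3).
labels = [["A", "12", "GLY"], ["A", "13", "SER"]]
joined = list(concatenate_rows(labels))
```

Since the function is a generator, nothing is computed until it is iterated; wrapping it in `list()` materializes all rows.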

idpconfgen.libs.libstructure.detect_structure_type(datastr)[source]

Detect structure data parser.

Uses structure_parsers.

Returns:

parser – The parser that can parse datastr into a Structure.

idpconfgen.libs.libstructure.filter_record_lines(lines, which='both')[source]

Filter lines to get record lines only.
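A minimal sketch of record-line filtering follows. The helper name `keep_record_lines` is hypothetical, and it assumes `which` accepts the same values documented for parse_pdb_to_array ('ATOM', 'HETATM', 'both').

```python
RECORD_PREFIXES = {
    "ATOM": ("ATOM",),
    "HETATM": ("HETATM",),
    "both": ("ATOM", "HETATM"),
}


def keep_record_lines(lines, which="both"):
    """Return only the lines that start with the requested record names."""
    prefixes = RECORD_PREFIXES[which]
    return [line for line in lines if line.startswith(prefixes)]


pdb_lines = [
    "HEADER    EXAMPLE",
    "ATOM      1  N   MET A   1",
    "HETATM    2  O   HOH A 101",
    "END",
]
atoms_only = keep_record_lines(pdb_lines, which="ATOM")
both = keep_record_lines(pdb_lines)
```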

idpconfgen.libs.libstructure.gen_empty_structure_data_array(number_of_atoms)[source]

Generate an array data structure to contain structure data.

Parameters:

number_of_atoms (int) – The number of atoms in the structure. Determines the size of the axis 0 of the structure array.

Returns:

np.ndarray of shape (N, len(libpdb.atom_slicers)), dtype = '<U8' – Where N is number_of_atoms.

idpconfgen.libs.libstructure.generate_backbone_pairs_labels(da)[source]

Generate backbone atom pairs labels.

Used to create columns in report summaries.

Parameters:

da (Structure.data_array-like)

Returns:

Numpy Array of dtype str, shape (N,) – Where N is the number of minimal backbone atoms.

idpconfgen.libs.libstructure.generate_residue_labels(*residue_labels, fmt=None, delimiter=' - ')[source]

Generate residue labels column.

Concatenate labels in residue_labels using concatenate_residue_labels.

Parameters:

fmt (str, optional) – The string formatter. By default, labels are assumed to be backbone atoms of a protein with fewer than 1000 residues. Defaults to None, which uses '{:<8}' with a width of 8 or a multiple of 8, according to the length of residue_labels.
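One possible reading of the default formatter is sketched below: the labels of each row are joined with the delimiter and left-aligned in a field whose width is 8 times the number of label sets. The function `join_labels` is hypothetical and this width rule is an interpretation of the description above, not confirmed behaviour.

```python
def join_labels(*label_columns, fmt=None, delimiter=" - "):
    """Join labels row by row and pad them with a left-aligned formatter."""
    if fmt is None:
        # Assumed rule: width 8 for one label set, 16 for two, and so on.
        fmt = "{:<" + str(8 * len(label_columns)) + "}"
    return [fmt.format(delimiter.join(row)) for row in zip(*label_columns)]


single = join_labels(["1N", "1CA"])
pairs = join_labels(["1N", "1CA"], ["1CA", "1C"])
```

The `<` in the format specification left-aligns the text and pads it with spaces on the right, which keeps label columns aligned in plain-text reports.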

idpconfgen.libs.libstructure.get_datastr(data)[source]

Get data in string format.

Can parse data from several formats:

  • Path, reads file content

  • bytes, converts to str

  • str, returns the input

Returns:

str – That represents the data
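The three cases listed above amount to a type dispatch, which can be sketched as follows. The function `to_text` is a hypothetical stand-in, and the UTF-8 decoding and the TypeError for other types are assumptions.

```python
import tempfile
from pathlib import Path


def to_text(data):
    """Return structure data as a string, whatever form it arrives in."""
    if isinstance(data, Path):
        return data.read_text()  # a Path is read from disk
    if isinstance(data, bytes):
        return data.decode("utf-8")  # bytes are decoded (encoding assumed)
    if isinstance(data, str):
        return data  # a string is taken to be the raw file content
    raise TypeError(f"Unsupported data type: {type(data).__name__}")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.pdb"
    path.write_text("ATOM line\n")
    from_path = to_text(path)

from_bytes = to_text(b"ATOM line\n")
from_str = to_text("ATOM line\n")
```

This also shows why a file location must be given as a pathlib.Path: a plain string is always interpreted as file content, never as a path.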

idpconfgen.libs.libstructure.is_backbone(atom, element, minimal=False)[source]

Whether atom is a protein backbone atom or not.

Parameters:
  • atom (str) – The atom name.

  • element (str) – The element name.

  • minimal (bool) – If True, considers only C and N elements. If False, also considers O.
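A plausible version of this check is sketched below. The atom-name sets are assumptions (N, CA, C for the minimal backbone, plus O otherwise), and `is_backbone_atom` is a hypothetical name, not the library function.

```python
def is_backbone_atom(atom, element, minimal=False):
    """Return True if the atom name and element describe a backbone atom."""
    if minimal:
        names, elements = ("N", "CA", "C"), ("N", "C")
    else:
        names, elements = ("N", "CA", "C", "O"), ("N", "C", "O")
    return atom in names and element in elements


alpha_carbon = is_backbone_atom("CA", "C")
calcium_ion = is_backbone_atom("CA", "CA")
```

Checking the element as well as the atom name matters because names can collide: an alpha carbon and a calcium ion are both named CA, but only the former has element C.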

idpconfgen.libs.libstructure.parse_cif_to_array(datastr, **kwargs)[source]

Parse mmCIF protein data to array.

Array is as given by gen_empty_structure_data_array().

idpconfgen.libs.libstructure.parse_pdb_to_array(datastr, which='both')[source]

Transform PDB data into an array.

Parameters:
  • datastr (str) – String representing the PDB format v3 file.

  • which (str) – Which lines to consider [‘ATOM’, ‘HETATM’, ‘both’]. Defaults to ‘both’, considers both ‘ATOM’ and ‘HETATM’.

Returns:

numpy.ndarray of (N, len(libpdb.atom_slicers)) – Where N is the number of ATOM and/or HETATM lines, and axis=1 the number of fields in ATOM/HETATM lines according to the PDB format v3.
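Because PDB records are fixed-width, each field can be read with a column slice. The sketch below is illustrative only: `ATOM_SLICERS` is a hypothetical subset of fields whose ranges follow the ATOM record of PDB format v3, not the contents of libpdb.atom_slicers, and values are kept as strings, in line with the string dtype of the structure array.

```python
# Hypothetical subset of column slicers (0-based, end-exclusive),
# derived from the 1-based column ranges of the PDB v3 ATOM record.
ATOM_SLICERS = {
    "record": slice(0, 6),
    "serial": slice(6, 11),
    "name": slice(12, 16),
    "altLoc": slice(16, 17),
    "resName": slice(17, 20),
    "chainID": slice(21, 22),
    "resSeq": slice(22, 26),
    "x": slice(30, 38),
    "y": slice(38, 46),
    "z": slice(46, 54),
    "element": slice(76, 78),
}


def parse_atom_line(line):
    """Slice one ATOM/HETATM line into stripped string fields."""
    return {field: line[cols].strip() for field, cols in ATOM_SLICERS.items()}


# An 80-column ATOM line assembled field by field to make the columns explicit.
line = (
    "ATOM  "      # columns 1-6, record name
    "    1"       # columns 7-11, serial
    " "           # column 12
    " N  "        # columns 13-16, atom name
    " "           # column 17, altLoc
    "MET"         # columns 18-20, residue name
    " "           # column 21
    "A"           # column 22, chain ID
    "   1"        # columns 23-26, residue number
    " "           # column 27, insertion code
    "   "         # columns 28-30
    "  27.340"    # columns 31-38, x
    "  24.430"    # columns 39-46, y
    "   2.614"    # columns 47-54, z
    "  1.00"      # columns 55-60, occupancy
    "  9.67"      # columns 61-66, temperature factor
    "          "  # columns 67-76
    " N"          # columns 77-78, element
    "  "          # columns 79-80, charge
)
fields = parse_atom_line(line)
```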

idpconfgen.libs.libstructure.populate_structure_array_from_pdb(record_lines, data_array)[source]

Populate structure array from PDB lines.

Parameters:
  • record_lines (list-like) – The PDB record lines (ATOM or HETATM) to parse.

  • data_array (np.ndarray) – The array to populate.

Returns:

None – Populates array in place.

idpconfgen.libs.libstructure.save_structure_by_chains(pdb_data, pdbname, altlocs=('A', '', ' ', '1'), chains=None, record_name=('ATOM', 'HETATM'), renumber=True, **kwargs)[source]

Save PDB/mmCIF structures as separate chains (PDB format).

Logic to parse PDBs from RCSB.
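The chain-splitting step might be pictured as below. This is a sketch under assumptions: `split_by_chain` and `make_line` are hypothetical helpers, the input is a list of ATOM/HETATM lines in PDB format v3 (altLoc in column 17, chain ID in column 22), and atoms whose altLoc is not in `altlocs` are assumed to be discarded.

```python
from collections import defaultdict


def split_by_chain(record_lines, altlocs=("A", "", " ", "1")):
    """Group record lines by chain ID, keeping only the accepted altLocs."""
    chains = defaultdict(list)
    for line in record_lines:
        if line[16] in altlocs:  # column 17 holds the alternate location
            chains[line[21]].append(line)  # column 22 holds the chain ID
    return dict(chains)


def make_line(altloc, chain):
    """Build the first 26 columns of an ATOM line for the example."""
    return "ATOM  " + "    1" + " " + " CA " + altloc + "GLY" + " " + chain + "   1"


lines = [make_line(" ", "A"), make_line("B", "A"), make_line("A", "B")]
by_chain = split_by_chain(lines)
```

Each group could then be written to its own file, one PDB per chain.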

idpconfgen.libs.libstructure.structure_to_pdb(atoms)[source]

Convert table to PDB formatted lines.

Parameters:

atoms (np.ndarray, shape (N, 16) or similar data structure) – Where N is the number of atoms and 16 the number of cols.

Yields:

Formatted PDB line according to libpdb.atom_line_formatter.

idpconfgen.libs.libstructure.write_PDB(lines, filename)[source]

Write Structure data format to PDB.

Parameters:
  • lines (list or np.ndarray) – Lines containing PDB data, as given by parse_pdb_to_array.

  • filename (str or Path) – The name of the output PDB file.