Lib Structure

Store internal protein structure representation.

Classes

Structure: The main API that represents a protein structure in IDPConfGen.

class idpconfgen.libs.libstructure.Structure(data, **kwargs)

Hold structural data from PDB/mmCIF files.

Run the .build() method to read the structure.

Cases for PDB files:

* If there are several MODELs, only the first model is considered.
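Here is a minimal, stdlib-only sketch of the first-model rule. It keeps PDB lines only up to the first ENDMDL record. The helper name `first_model_lines` is hypothetical and not part of IDPConfGen, and the library's own handling of MODEL records may differ in detail.

```python
def first_model_lines(pdb_text):
    """Return the lines of the first MODEL only (hypothetical helper).

    Files without MODEL/ENDMDL records are returned unchanged.
    """
    kept = []
    for line in pdb_text.splitlines():
        # The first ENDMDL closes the first model; ignore everything after it.
        if line.startswith("ENDMDL"):
            break
        kept.append(line)
    return kept


pdb_text = "\n".join([
    "MODEL        1",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1      27.100  24.200   2.500  1.00  9.67           N",
    "ENDMDL",
])

atoms = [l for l in first_model_lines(pdb_text) if l.startswith("ATOM")]
print(len(atoms))  # 1
```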

Parameters:: data (str, bytes, Path) – Raw structural data from PDB/mmCIF formatted files. If data is a path to a file, it must be a pathlib.Path object. If it is a string or bytes, it must be the raw content of the input file.

Examples

Opens a PDB file, selects only chain 'A' and saves the selection to a file.

>>> from pathlib import Path
>>> s = Structure(Path('1ABC.pdb'))
>>> s.build()
>>> s.add_filter_chain('A')
>>> s.write_PDB('out.pdb')

Opens an mmCIF file, selects only residues numbered above 50 and saves the selection to a PDB file (col_resSeq is the index of the residue-number column).

>>> s = Structure(Path('1ABC.cif'))
>>> s.build()
>>> s.add_filter(lambda x: int(x[col_resSeq]) > 50)
>>> s.write_PDB('out.pdb')

Reads the raw content of a PDB file and builds the structure from the resulting string.

>>> with open('1ABC.pdb', 'r') as fin:
...     data = fin.read()
>>> s = Structure(data)
>>> s.build()

add_filter(function): Add a function as a filter.

add_filter_backbone(minimal=False): Add a filter to consider only backbone atoms.

add_filter_chain(chain): Add a filter for the given chain.

add_filter_record_name(record_name): Add a filter for record names.
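This page does not show how Structure stores its filters internally. The stdlib-only sketch below illustrates the behaviour described here. It models filters as predicate functions over atom rows: a row is kept only if every registered filter accepts it, and filters can be popped or cleared. The class `FilterChain` and its four-column row layout (record, atom name, chain, residue number) are hypothetical stand-ins, not IDPConfGen's data array.

```python
from typing import Callable, List, Sequence

Row = Sequence[str]


class FilterChain:
    """Hypothetical stand-in for the filter registry of Structure."""

    def __init__(self, rows: List[Row]):
        self.rows = rows
        self._filters: List[Callable[[Row], bool]] = []

    def add_filter(self, function: Callable[[Row], bool]) -> None:
        self._filters.append(function)

    def pop_last_filter(self) -> Callable[[Row], bool]:
        return self._filters.pop()

    def clear_filters(self) -> None:
        self._filters.clear()

    @property
    def filtered_rows(self) -> List[Row]:
        # A row survives only if every registered filter accepts it,
        # evaluated in registration order; no filters keeps all rows.
        return [r for r in self.rows if all(f(r) for f in self._filters)]


rows = [
    ("ATOM", "N", "A", "1"),
    ("ATOM", "CA", "A", "1"),
    ("ATOM", "N", "B", "60"),
    ("HETATM", "O", "B", "61"),
]
chain = FilterChain(rows)
chain.add_filter(lambda r: r[2] == "B")     # analogous to add_filter_chain('B')
chain.add_filter(lambda r: r[0] == "ATOM")  # analogous to add_filter_record_name('ATOM')
print(chain.filtered_rows)  # [('ATOM', 'N', 'B', '60')]
chain.pop_last_filter()
print(len(chain.filtered_rows))  # 2
```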

build()

Read the raw structure data stored in rawdata.

After .build(), filters and data can be accessed.

property chain_set: All chain IDs present in the raw dataset.

clear_filters(): Clear (delete) all registered filters.

property consecutive_residues: Consecutive residue groups from filtered atoms.
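One plausible way to form consecutive residue groups is to split the residue numbers into runs with no gaps. The sketch below assumes the input is an ordered list of unique integer residue numbers. The helper `group_consecutive` is hypothetical, and this page does not document IDPConfGen's actual criterion (for example, how it handles insertion codes).

```python
from typing import Iterable, List


def group_consecutive(residue_numbers: Iterable[int]) -> List[List[int]]:
    """Split residue numbers into runs of consecutive integers (hypothetical helper)."""
    groups: List[List[int]] = []
    for num in residue_numbers:
        # Extend the current group if num directly follows its last element;
        # otherwise start a new group (first number, or a numbering gap).
        if groups and num == groups[-1][-1] + 1:
            groups[-1].append(num)
        else:
            groups.append([num])
    return groups


print(group_consecutive([1, 2, 3, 7, 8, 10]))  # [[1, 2, 3], [7, 8], [10]]
```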

property coords

Coordinates of the filtered atoms, as floats.

property data_array: Structure data in the form of a NumPy array.

property fasta

FASTA sequence of the filtered_atoms lines.

HETATM residues with non-canonical codes are represented as X.
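The rule above (canonical residues map to one-letter codes, anything else maps to X) can be sketched with a plain dictionary lookup. `THREE_TO_ONE` and `residues_to_fasta` are hypothetical names. The sketch also takes one residue name per residue rather than raw atom lines.

```python
# Three-letter to one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residues_to_fasta(res_names):
    """Map residue names to one-letter codes; any other name becomes 'X'."""
    return "".join(THREE_TO_ONE.get(name, "X") for name in res_names)


# MSE (selenomethionine) is a non-canonical code, so it becomes X.
print(residues_to_fasta(["MET", "GLY", "MSE", "LYS"]))  # MGXK
```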

property filtered_atoms

Filter data array by the selected filters.

Returns:: list – The data in PDB format after filtering.

property filtered_residues: Residues filtered according to the registered filters.

property filters: Registered filter functions, in the order they were registered.

get_PDB(pdb_filters=None, renumber=True)

Convert Structure to PDB format.

Considers only filtered lines.

Returns:: generator

get_sorted_minimal_backbone_coords(filtered=False)

Generate a sorted copy of the backbone coordinates.

Atoms are sorted in the order N, CA, C.

This method was created because some PDBs may not have the backbone atoms sorted properly.

Parameters:: filtered (bool, optional) – Whether to consider the current filters or the raw data.
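Here is a minimal sketch of sorting backbone atoms into N, CA, C order. It assumes each atom is given as a (residue number, atom name, coordinates) tuple. The tuple layout and the name `sorted_minimal_backbone` are hypothetical. Ordering residues by their integer number is also an assumption, since the real method works on the structure's data array.

```python
from typing import List, Tuple

Atom = Tuple[int, str, Tuple[float, float, float]]  # (resSeq, atom name, xyz)

# Rank of each minimal backbone atom within a residue.
BACKBONE_ORDER = {"N": 0, "CA": 1, "C": 2}


def sorted_minimal_backbone(atoms: List[Atom]) -> List[Tuple[float, float, float]]:
    """Return N, CA, C coordinates ordered by residue, then N, CA, C (hypothetical)."""
    backbone = [a for a in atoms if a[1] in BACKBONE_ORDER]
    # sorted() builds a new list, so the input list is left untouched.
    backbone = sorted(backbone, key=lambda a: (a[0], BACKBONE_ORDER[a[1]]))
    return [a[2] for a in backbone]


atoms = [
    (1, "CA", (1.0, 0.0, 0.0)),
    (1, "N", (0.0, 0.0, 0.0)),
    (1, "O", (3.0, 0.0, 0.0)),
    (1, "C", (2.0, 0.0, 0.0)),
    (2, "N", (4.0, 0.0, 0.0)),
]
print(sorted_minimal_backbone(atoms))
# [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
```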

pop_last_filter(): Pop (remove) the last registered filter.

property residues

Residues of the structure, without filtering and without chain separation.

write_PDB(filename, **kwargs): Write the Structure to a PDB file.

idpconfgen.libs.libstructure.concatenate_residue_labels(labels)

Concatenate residue labels.

This function is a generator.

Parameters:: labels (numpy array of shape (N, M)) – Where N is the number of rows and M the number of columns holding the labels to be concatenated.
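Concatenating M label columns row by row fits naturally in a generator. This sketch joins each row with a delimiter. The name `concatenate_labels`, the `delimiter` parameter (defaulting to ' - ', the default shown for generate_residue_labels) and the label strings are all assumptions.

```python
from typing import Iterator, Sequence


def concatenate_labels(labels: Sequence[Sequence[str]], delimiter: str = " - ") -> Iterator[str]:
    """Yield one string per row by joining its M label columns (hypothetical sketch)."""
    for row in labels:
        yield delimiter.join(row)


rows = [["1MET:N", "1MET:CA"], ["1MET:CA", "1MET:C"]]
print(list(concatenate_labels(rows)))  # ['1MET:N - 1MET:CA', '1MET:CA - 1MET:C']
```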

idpconfgen.libs.libstructure.detect_structure_type(datastr)

Detect the parser for the structure data.

Uses structure_parsers.

Returns:: parser – A parser that can parse datastr into a Structure.

idpconfgen.libs.libstructure.filter_record_lines(lines, which='both'): Filter lines to get record lines only.

idpconfgen.libs.libstructure.gen_empty_structure_data_array(number_of_atoms)

Generate an array data structure to contain structure data.

Parameters:: number_of_atoms (int) – The number of atoms in the structure. Determines the size of the axis 0 of the structure array.
Returns:: np.ndarray of shape (N, len(libpdb.atom_slicers)) and dtype '<U8' – Where N is number_of_atoms.

idpconfgen.libs.libstructure.generate_backbone_pairs_labels(da)

Generate backbone atom pairs labels.

Used to create columns in report summaries.

Parameters:: da (Structure.data_array-like)
Returns:: Numpy Array of dtype str, shape (N,) – Where N is the number of minimal backbone atoms.

idpconfgen.libs.libstructure.generate_residue_labels(*residue_labels, fmt=None, delimiter=' - ')

Generate residue labels column.

Concatenates the labels in residue_labels using concatenate_residue_labels.

Parameters:: fmt (str, optional) – The string formatter. By default, it assumes backbone atoms of a protein with fewer than 1000 residues. Defaults to None, which uses '{:<8}': a width of 8, or a multiple of 8 according to the length of residue_labels.
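If the default is read as 8 characters per label, the format string could be built as follows. This width rule is an assumption, not the library's code, and `default_label_format` is a hypothetical name.

```python
def default_label_format(n_labels: int, width: int = 8) -> str:
    """Build a left-aligned format spec, `width` characters per label (assumed rule)."""
    return "{:<" + str(width * n_labels) + "}"


fmt = default_label_format(2)
print(fmt)  # {:<16}
# Shorter labels are padded with trailing spaces up to the field width.
padded = fmt.format("N - CA")
print(len(padded))  # 16
```

With a fixed width per label, the label columns in a report line up vertically no matter how long each individual label string is.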

idpconfgen.libs.libstructure.get_datastr(data)

Get data in string format.

Can parse data from several formats:

Path: reads the file content.
bytes: converts to str.
str: returns the input unchanged.

Returns:: str – The data as a string.
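The three input cases listed above can be dispatched with isinstance checks. This is a minimal sketch under stated assumptions: `to_datastr` is a hypothetical name, and decoding bytes as UTF-8 is an assumption that this page does not document.

```python
from pathlib import Path
from typing import Union


def to_datastr(data: Union[str, bytes, Path]) -> str:
    """Return structure data as str, following the three documented cases (sketch)."""
    if isinstance(data, Path):
        # A Path is treated as a file name: read its content.
        return data.read_text()
    if isinstance(data, bytes):
        # Bytes are decoded; UTF-8 is an assumption here.
        return data.decode("utf-8")
    if isinstance(data, str):
        # A str is taken to already be the raw file content.
        return data
    raise TypeError(f"unsupported data type: {type(data).__name__}")


print(to_datastr(b"ATOM"))  # ATOM
```

This also explains why a file path must be passed as a pathlib.Path. A plain str is taken to be the file content itself, not a file name.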

idpconfgen.libs.libstructure.is_backbone(atom, element, minimal=False)

Whether atom is a protein backbone atom or not.

Parameters:

atom (str) – The atom name.
element (str) – The element name.
minimal (bool) – If True, considers only C and N elements. If False, also considers O.
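Here is a sketch of the check described above. The atom-name sets (N, CA and C, plus O when minimal is False) are assumptions, and the real function may accept other names, such as a terminal OXT. The element is checked as well as the name because a calcium ion is also named CA in PDB files, with element CA rather than C. `is_backbone_sketch` is a hypothetical name.

```python
# Assumed names of the minimal backbone atoms.
MINIMAL_NAMES = {"N", "CA", "C"}


def is_backbone_sketch(atom: str, element: str, minimal: bool = False) -> bool:
    """Hypothetical re-implementation of the backbone test."""
    # N, CA and C count only if the element is really carbon or nitrogen.
    if atom in MINIMAL_NAMES and element in {"C", "N"}:
        return True
    # Outside minimal mode, the carbonyl oxygen also counts as backbone.
    return not minimal and atom == "O" and element == "O"


print(is_backbone_sketch("CA", "C"))   # True  (alpha carbon)
print(is_backbone_sketch("CA", "CA"))  # False (calcium ion)
```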

idpconfgen.libs.libstructure.parse_cif_to_array(datastr, **kwargs)

Parse mmCIF protein data to array.

Array is as given by gen_empty_structure_data_array().

idpconfgen.libs.libstructure.parse_pdb_to_array(datastr, which='both')

Transform PDB data into an array.

Parameters:

datastr (str) – String representing the PDB format v3 file.
which (str) – Which lines to consider: 'ATOM', 'HETATM' or 'both'. Defaults to 'both', which considers both 'ATOM' and 'HETATM' lines.

Returns:

numpy.ndarray of (N, len(libpdb.atom_slicers)) – Where N is the number of ATOM and/or HETATM lines, and axis 1 holds the fields of the ATOM/HETATM lines according to the PDB format v3.
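ATOM/HETATM records use fixed columns, so they map naturally to a table. The stdlib-only sketch below splits each record line into 16 stripped string fields using the standard PDB v3 column positions. It also accepts a `which` argument like the one above. `PDB_SLICES`, `RECORDS` and `parse_pdb_rows` are hypothetical names. IDPConfGen's own column definitions live in libpdb.atom_slicers and may differ in detail, and the real function returns a NumPy array rather than a list of lists.

```python
from typing import List

# 0-based slices for the 16 ATOM/HETATM fields of PDB format v3.
PDB_SLICES = [
    slice(0, 6),    # record name
    slice(6, 11),   # atom serial number
    slice(12, 16),  # atom name
    slice(16, 17),  # alternate location
    slice(17, 20),  # residue name
    slice(21, 22),  # chain ID
    slice(22, 26),  # residue sequence number
    slice(26, 27),  # insertion code
    slice(30, 38),  # x
    slice(38, 46),  # y
    slice(46, 54),  # z
    slice(54, 60),  # occupancy
    slice(60, 66),  # temperature factor
    slice(72, 76),  # segment ID
    slice(76, 78),  # element symbol
    slice(78, 80),  # charge
]

RECORDS = {"ATOM": ("ATOM",), "HETATM": ("HETATM",), "both": ("ATOM", "HETATM")}


def parse_pdb_rows(datastr: str, which: str = "both") -> List[List[str]]:
    """Split ATOM and/or HETATM lines into 16 stripped string fields (sketch)."""
    wanted = RECORDS[which]
    rows = []
    for line in datastr.splitlines():
        if line[0:6].strip() in wanted:
            # Slicing past the end of a short line yields '', so no padding is needed.
            rows.append([line[s].strip() for s in PDB_SLICES])
    return rows


pdb_text = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "HETATM    2 CA    CA A 101      10.000  10.000  10.000  1.00 20.00          CA",
])
rows = parse_pdb_rows(pdb_text)
print(len(rows), len(rows[0]))  # 2 16
print(rows[0][2], rows[0][4], rows[0][8])  # N MET 27.340
print(len(parse_pdb_rows(pdb_text, which="ATOM")))  # 1
```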

idpconfgen.libs.libstructure.populate_structure_array_from_pdb(record_lines, data_array)

Populate structure array from PDB lines.

Parameters:

record_lines (list-like) – The PDB record lines (ATOM or HETATM) to parse.
data_array (np.ndarray) – The array to populate.

Returns:

None – Populates array in place.

idpconfgen.libs.libstructure.save_structure_by_chains(pdb_data, pdbname, altlocs=('A', '', ' ', '1'), chains=None, record_name=('ATOM', 'HETATM'), renumber=True, **kwargs)

Save PDB/mmCIF structures split into separate chains (in PDB format).

Logic to parse PDBs from RCSB.
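Splitting a PDB file by chain mainly means grouping record lines on the chain ID column (column 22) and discarding unwanted alternate locations (column 17). The stdlib-only sketch below reuses the default altlocs and record_name values from the signature above. It only groups lines and does not write files. `split_by_chain` is a hypothetical name, and it ignores the chains and renumber parameters of the real function.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_by_chain(
    lines: Iterable[str],
    altlocs: Tuple[str, ...] = ("A", "", " ", "1"),
    record_name: Tuple[str, ...] = ("ATOM", "HETATM"),
) -> Dict[str, List[str]]:
    """Group PDB record lines by chain ID, keeping only accepted altLocs (sketch)."""
    chains: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if line[0:6].strip() not in record_name:
            continue
        # Column 17 is altLoc and column 22 is chain ID (0-based 16 and 21).
        if line[16:17] not in altlocs:
            continue
        chains[line[21:22]].append(line)
    return dict(chains)


lines = [
    "ATOM      1  N   MET A   1",
    "ATOM      2  CA BMET A   1",  # altLoc B is not accepted by default
    "ATOM      3  N   GLY B   2",
    "TER",
]
by_chain = split_by_chain(lines)
print(sorted(by_chain))  # ['A', 'B']
```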

idpconfgen.libs.libstructure.structure_to_pdb(atoms)

Convert table to PDB formatted lines.

Parameters:: atoms (np.ndarray, shape (N, 16) or similar data structure) – Where N is the number of atoms and 16 the number of cols.
Yields:: Formatted PDB line according to libpdb.atom_line_formatter.
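Because ATOM/HETATM records are fixed-width, a single format string with one placeholder per column can rebuild a line from 16 string fields. The format string `ATOM_LINE_FMT` below follows the PDB v3 column layout and is an assumption; it is not necessarily identical to libpdb.atom_line_formatter. `rows_to_pdb` is a hypothetical name. Atom-name alignment is simplified and is correct for one-letter elements only. Note that str.format pads but never truncates, so an overlong field would shift the later columns.

```python
from typing import Iterable, Iterator, Sequence

# PDB v3 ATOM/HETATM layout for 16 string fields (assumed, 80 columns wide).
ATOM_LINE_FMT = (
    "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
    "{:>8}{:>8}{:>8}{:>6}{:>6}      {:<4}{:>2}{:<2}"
)


def rows_to_pdb(atoms: Iterable[Sequence[str]]) -> Iterator[str]:
    """Yield one fixed-width PDB line per row of 16 string fields (sketch)."""
    for row in atoms:
        fields = list(row)
        name = fields[2]
        # Simplified atom-name alignment: names shorter than 4 characters
        # start at column 14, which is correct for one-letter elements.
        fields[2] = " " + name if len(name) < 4 else name
        yield ATOM_LINE_FMT.format(*fields)


row = ["ATOM", "1", "N", "", "MET", "A", "1", "", "27.340", "24.430",
       "2.614", "1.00", "9.67", "", "N", ""]
line = next(rows_to_pdb([row]))
print(len(line))  # 80
print(repr(line[12:16]))  # ' N  '
```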

idpconfgen.libs.libstructure.write_PDB(lines, filename)

Write Structure data to a PDB file.

Parameters:

lines (list or np.ndarray) – Lines containing PDB data in the format produced by parse_pdb_to_array.
filename (str or Path) – The name of the output PDB file.
