Lib Structure
Store internal protein structure representation.
Classes
- Structure
The main API that represents a protein structure in IDPConfGen.
- class idpconfgen.libs.libstructure.Structure(data, **kwargs)
Hold structural data from PDB/mmCIF files.
Run the .build() method to read the structure.
Cases for PDB files:
* If there are several MODELs, only the first model is considered.
- Parameters:
data (str, bytes, Path) – Raw structural data from PDB/mmCIF formatted files. If data is a path to a file it must be a pathlib.Path object. If string or bytes, it must be the raw content of the input file.
Examples
Opens a PDB file, selects only chain 'A', and saves the selection to a file.
>>> s = Structure(Path('1ABC.pdb'))
>>> s.build()
>>> s.add_filter_chain('A')
>>> s.write_PDB('out.pdb')
Opens an mmCIF file, selects only residues numbered above 50, and saves the selection to a file.
>>> s = Structure(Path('1ABC.cif'))
>>> s.build()
>>> s.add_filter(lambda x: int(x[col_resSeq]) > 50)
>>> s.write_PDB('out.pdb')
Reads the raw content of a PDB file and builds the structure from that string.
>>> with open('1ABC.pdb', 'r') as fin:
...     lines = fin.read()
>>> s = Structure(lines)
>>> s.build()
- build()
Read structure raw data in rawdata.
After .build(), filters and data can be accessed.
- property chain_set
All chain IDs present in the raw dataset.
- property consecutive_residues
Consecutive residue groups from filtered atoms.
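The return type of this property is not documented here. To illustrate what grouping residues into consecutive runs can look like, here is a minimal sketch; the helper group_consecutive is hypothetical and is not part of IDPConfGen, and it assumes residue numbers arrive as sorted integers.

```python
def group_consecutive(residue_numbers):
    """Split sorted residue numbers into runs without gaps.

    Hypothetical helper for illustration only, not the IDPConfGen code.
    """
    groups = []
    for number in residue_numbers:
        # Extend the current run when the number follows the previous one.
        if groups and number == groups[-1][-1] + 1:
            groups[-1].append(number)
        else:
            # Otherwise a gap was found (or this is the first residue).
            groups.append([number])
    return groups


runs = group_consecutive([1, 2, 3, 7, 8, 12])
print(runs)  # [[1, 2, 3], [7, 8], [12]]
```

A gap in the numbering, such as a missing loop in a crystal structure, starts a new group.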
- property coords
Coordinates of the filtered atoms, as float.
- property data_array
Contain structure data in the form of a Numpy array.
- property fasta
FASTA sequence of the filtered_atoms lines.
HETATM residues with non-canonical codes are represented as X.
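The X substitution described above can be sketched as a dictionary lookup with a fallback. This is an illustration under assumptions, not the library's implementation: THREE_TO_ONE and residues_to_fasta are hypothetical names, and the input is assumed to be one three-letter code per residue.

```python
# Canonical three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}


def residues_to_fasta(residue_names):
    """Build a one-letter sequence; unknown codes become 'X'.

    Hypothetical helper for illustration only.
    """
    return ''.join(
        THREE_TO_ONE.get(name.strip().upper(), 'X')
        for name in residue_names
    )


# MSE (selenomethionine) is not in the canonical table, so it maps to X.
sequence = residues_to_fasta(['MET', 'ALA', 'MSE', 'GLY'])
print(sequence)  # MAXG
```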
- property filtered_atoms
Filter data array by the selected filters.
- Returns:
list – The data in PDB format after filtering.
- property filters
Filter functions registered ordered by registry record.
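The idea of filters kept in registration order and applied to the data rows can be sketched as below. This is a standalone illustration, not the Structure internals: apply_filters, COL_CHAIN, and COL_RESSEQ are hypothetical names, and rows are simplified to (chain, residue number) tuples of strings.

```python
# Hypothetical column positions within each simplified row.
COL_CHAIN = 0
COL_RESSEQ = 1


def apply_filters(rows, filters):
    """Keep rows that satisfy every filter, applied in registration order."""
    for keep in filters:
        rows = [row for row in rows if keep(row)]
    return rows


rows = [('A', '10'), ('A', '60'), ('B', '75')]

# Filters are plain callables that receive one row and return a bool.
filters = [
    lambda row: row[COL_CHAIN] == 'A',
    lambda row: int(row[COL_RESSEQ]) > 50,
]

selected = apply_filters(rows, filters)
print(selected)  # [('A', '60')]
```

Because each filter is a predicate over a single row, the order of registration changes the intermediate results but not the final selection.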
- get_PDB(pdb_filters=None, renumber=True)
Convert Structure to PDB format.
Considers only filtered lines.
- Returns:
generator
- get_sorted_minimal_backbone_coords(filtered=False)
Generate a sorted copy of the backbone coordinates.
Atoms are sorted in the order N, CA, C.
This method was created because some PDBs may not have the backbone atoms sorted properly.
- Parameters:
filtered (bool, optional) – Whether to consider the current filters or the raw data.
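A minimal sketch of the N, CA, C ordering follows. It is not the method's actual code: sort_minimal_backbone and BACKBONE_ORDER are hypothetical names, atoms are simplified to (residue number, atom name, coordinates) tuples, and residues are assumed to be identified by number alone within a single chain.

```python
# Rank of each minimal backbone atom within a residue.
BACKBONE_ORDER = {'N': 0, 'CA': 1, 'C': 2}


def sort_minimal_backbone(atoms):
    """Return N, CA, C atoms per residue, keeping the residue order.

    Hypothetical helper for illustration only.
    """
    # Remember the position at which each residue first appears.
    first_seen = {}
    for residue, _name, _xyz in atoms:
        first_seen.setdefault(residue, len(first_seen))

    # Drop everything that is not a minimal backbone atom.
    backbone = [atom for atom in atoms if atom[1] in BACKBONE_ORDER]

    # Sort by residue appearance first, then by N, CA, C rank.
    return sorted(
        backbone,
        key=lambda atom: (first_seen[atom[0]], BACKBONE_ORDER[atom[1]]),
    )


atoms = [
    (1, 'CA', (1.0, 0.0, 0.0)),
    (1, 'N', (0.0, 0.0, 0.0)),
    (1, 'O', (3.0, 1.0, 0.0)),
    (1, 'C', (2.0, 0.0, 0.0)),
    (2, 'C', (5.0, 0.0, 0.0)),
    (2, 'N', (3.0, 0.0, 0.0)),
    (2, 'CA', (4.0, 0.0, 0.0)),
]
ordered = sort_minimal_backbone(atoms)
print([(residue, name) for residue, name, _ in ordered])
```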
- property residues
Residues of the structure.
Without filtering, without chain separation.
- idpconfgen.libs.libstructure.concatenate_residue_labels(labels)
Concatenate residue labels.
This function is a generator.
- Parameters:
labels (numpy array of shape (N, M)) – Where N is the number of rows, and M the number of columns with the labels to be concatenated.
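The row-wise concatenation can be sketched with a generator, as below. The sketch assumes the M labels of each row are joined without a separator, which the text above does not state; concatenate_labels is a hypothetical stand-in and plain lists replace the NumPy array.

```python
def concatenate_labels(labels):
    """Yield one concatenated string per row of labels.

    Hypothetical stand-in for illustration; joining without a separator
    is an assumption.
    """
    for row in labels:
        yield ''.join(row)


rows = [['1', 'ALA', 'N'], ['1', 'ALA', 'CA']]

# A generator is lazy, so it is consumed here with list().
joined = list(concatenate_labels(rows))
print(joined)  # ['1ALAN', '1ALACA']
```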
- idpconfgen.libs.libstructure.detect_structure_type(datastr)
Detect structure data parser.
Uses structure_parsers.
- Returns:
parser – The parser that can parse datastr into a Structure.
- idpconfgen.libs.libstructure.filter_record_lines(lines, which='both')
Filter lines to get record lines only.
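Selecting record lines amounts to a prefix test on each line. The sketch below is an illustration, not the library code: keep_record_lines and RECORDS are hypothetical names, and the accepted values of which are assumed to match those documented for parse_pdb_to_array ('ATOM', 'HETATM', 'both').

```python
# Line prefixes selected by each value of `which` (assumed values).
RECORDS = {
    'ATOM': ('ATOM',),
    'HETATM': ('HETATM',),
    'both': ('ATOM', 'HETATM'),
}


def keep_record_lines(lines, which='both'):
    """Return only the lines that start with the requested record names."""
    prefixes = RECORDS[which]
    # str.startswith accepts a tuple of candidate prefixes.
    return [line for line in lines if line.startswith(prefixes)]


lines = [
    'HEADER    EXAMPLE',
    'ATOM      1  N   MET A   1',
    'HETATM    2 CA    CA A 101',
    'END',
]
atoms_only = keep_record_lines(lines, which='ATOM')
both = keep_record_lines(lines)
print(len(atoms_only), len(both))  # 1 2
```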
- idpconfgen.libs.libstructure.gen_empty_structure_data_array(number_of_atoms)
Generate an array data structure to contain structure data.
- Parameters:
number_of_atoms (int) – The number of atoms in the structure. Determines the size of the axis 0 of the structure array.
- Returns:
np.ndarray of shape (N, len(libpdb.atom_slicers)), dtype = '<U8' – Where N is number_of_atoms.
- idpconfgen.libs.libstructure.generate_backbone_pairs_labels(da)
Generate backbone atom pairs labels.
Used to create columns in report summaries.
- Parameters:
da (Structure.data_array - like)
- Returns:
Numpy Array of dtype str, shape (N,) – Where N is the number of minimal backbone atoms.
- idpconfgen.libs.libstructure.generate_residue_labels(*residue_labels, fmt=None, delimiter=' - ')
Generate residue labels column.
Concatenate labels in residue_labels using concatenate_residue_labels.
- Parameters:
fmt (str, optional) – The string formatter. By default, backbone atoms of a protein with fewer than 1000 residues are assumed. Defaults to None, which uses '{:<8}', with a width of 8 or a multiple of 8 according to the length of residue_labels.
- idpconfgen.libs.libstructure.get_datastr(data)
Get data in string format.
Can parse data from several formats:
Path, reads the file content
bytes, converts to str
str, returns the input
- Returns:
str – The data as a string.
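The three input cases can be sketched as a type dispatch. This is an illustration rather than the library's code: to_datastr is a hypothetical name, UTF-8 decoding for bytes is an assumption, and raising TypeError for other types is a choice made for this sketch only.

```python
import tempfile
from pathlib import Path


def to_datastr(data):
    """Return structure data as str from a Path, bytes, or str input.

    Hypothetical helper for illustration only.
    """
    if isinstance(data, Path):
        # A Path is treated as a file to read.
        return data.read_text()
    if isinstance(data, bytes):
        # Decoding as UTF-8 is an assumption of this sketch.
        return data.decode('utf-8')
    if isinstance(data, str):
        # A str is taken as the raw file content, not as a file name.
        return data
    raise TypeError(f'Unsupported data type: {type(data)}')


with tempfile.TemporaryDirectory() as folder:
    pdb_path = Path(folder, 'example.pdb')
    pdb_path.write_text('END\n')
    from_path = to_datastr(pdb_path)

from_bytes = to_datastr(b'END\n')
from_str = to_datastr('END\n')
print(from_path == from_bytes == from_str)  # True
```

Note the consequence for callers: a file name passed as a plain string is interpreted as content, which is why the Structure parameters require a pathlib.Path for file paths.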
- idpconfgen.libs.libstructure.is_backbone(atom, element, minimal=False)
Whether atom is a protein backbone atom or not.
- Parameters:
atom (str) – The atom name.
element (str) – The element name.
minimal (bool) – If True, considers only the C and N elements. If False, also considers O.
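One reason to check the element alongside the atom name is that names can be ambiguous: in PDB files, 'CA' names both the alpha carbon (element C) and a calcium ion (element CA). The sketch below illustrates such a check under assumptions; is_backbone_atom is a hypothetical name and the accepted name and element sets are inferred from the parameter descriptions, not taken from the library.

```python
def is_backbone_atom(atom, element, minimal=False):
    """Tell whether an atom name/element pair is a protein backbone atom.

    Hypothetical helper for illustration only.
    """
    if minimal:
        names = ('N', 'CA', 'C')
        elements = ('N', 'C')
    else:
        names = ('N', 'CA', 'C', 'O')
        elements = ('N', 'C', 'O')
    # Fields sliced from PDB lines usually carry padding, hence strip().
    return atom.strip() in names and element.strip() in elements


print(is_backbone_atom(' CA ', ' C'))  # alpha carbon
print(is_backbone_atom('CA', 'CA'))    # calcium ion
```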
- idpconfgen.libs.libstructure.parse_cif_to_array(datastr, **kwargs)
Parse mmCIF protein data to array.
The array is as given by gen_empty_structure_data_array().
- idpconfgen.libs.libstructure.parse_pdb_to_array(datastr, which='both')
Transform PDB data into an array.
- Parameters:
datastr (str) – String representing the PDB format v3 file.
which (str) – Which lines to consider [‘ATOM’, ‘HETATM’, ‘both’]. Defaults to ‘both’, considers both ‘ATOM’ and ‘HETATM’.
- Returns:
numpy.ndarray of (N, len(libpdb.atom_slicers)) – Where N is the number of ATOM and/or HETATM lines, and axis=1 holds the fields of ATOM/HETATM lines according to the PDB format v3.
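PDB ATOM/HETATM lines are fixed-width, so each field can be taken with a slice. The sketch below shows that idea for a single line. ATOM_SLICES is a hypothetical stand-in for libpdb.atom_slicers, written from the PDB format v3 column layout; the library's own slicers may differ in naming and order.

```python
# Zero-based slices for the ATOM/HETATM fields of the PDB format v3.
ATOM_SLICES = {
    'record': slice(0, 6),
    'serial': slice(6, 11),
    'name': slice(12, 16),
    'altLoc': slice(16, 17),
    'resName': slice(17, 20),
    'chainID': slice(21, 22),
    'resSeq': slice(22, 26),
    'iCode': slice(26, 27),
    'x': slice(30, 38),
    'y': slice(38, 46),
    'z': slice(46, 54),
    'occupancy': slice(54, 60),
    'tempFactor': slice(60, 66),
    'element': slice(76, 78),
    'charge': slice(78, 80),
}


def parse_atom_line(line):
    """Split one ATOM/HETATM line into a dict of stripped string fields."""
    return {name: line[part].strip() for name, part in ATOM_SLICES.items()}


# An 80-column example line, assembled field by field for clarity.
line = ''.join([
    'ATOM  ',      # record name, columns 1-6
    '    1',       # serial, columns 7-11
    ' ',           # column 12, unused
    ' N  ',        # atom name, columns 13-16
    ' ',           # altLoc, column 17
    'MET',         # residue name, columns 18-20
    ' ',           # column 21, unused
    'A',           # chain ID, column 22
    '   1',        # residue number, columns 23-26
    ' ',           # insertion code, column 27
    '   ',         # columns 28-30, unused
    '  27.340',    # x, columns 31-38
    '  24.430',    # y, columns 39-46
    '   2.614',    # z, columns 47-54
    '  1.00',      # occupancy, columns 55-60
    '  9.67',      # temperature factor, columns 61-66
    ' ' * 10,      # columns 67-76, unused
    ' N',          # element, columns 77-78
    '  ',          # charge, columns 79-80
])

fields = parse_atom_line(line)
print(fields['name'], fields['resName'], fields['chainID'], fields['x'])
```

Keeping every field as a string mirrors the '<U8' dtype of the structure array; numeric conversion happens later, where needed.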
- idpconfgen.libs.libstructure.populate_structure_array_from_pdb(record_lines, data_array)
Populate structure array from PDB lines.
- Parameters:
record_lines (list-like) – The PDB record lines (ATOM or HETATM) to parse.
data_array (np.ndarray) – The array to populate.
- Returns:
None – Populates array in place.
- idpconfgen.libs.libstructure.save_structure_by_chains(pdb_data, pdbname, altlocs=('A', '', ' ', '1'), chains=None, record_name=('ATOM', 'HETATM'), renumber=True, **kwargs)
Save PDB/mmCIF data split into separate chains (in PDB format).
Logic to parse PDBs from RCSB.