Lib Parse
Parsing routines for different data structures.
All functions in this module receive a native Python data structure, parse the information inside, and return or yield the parsed information.
- idpconfgen.libs.libparse.convert_int_float_lines_to_dict(lines)
Convert string lines composed of putative int and float to dict.
Example
>>> convert_int_float_lines_to_dict(['1 2'])
{1: 2.0}
>>> convert_int_float_lines_to_dict(['1 2\n', '3 45.5\n'])
{1: 2.0, 3: 45.5}
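The conversion shown in the example can be sketched as follows. This is a minimal illustration, not the library's source; the function name `int_float_lines_to_dict` is hypothetical, and it assumes each line holds exactly two whitespace-separated tokens.

```python
def int_float_lines_to_dict(lines):
    """Hypothetical sketch: first token becomes an int key, second a float value."""
    result = {}
    for line in lines:
        # str.split() with no arguments also discards the trailing newline.
        key, value = line.split()
        result[int(key)] = float(value)
    return result


print(int_float_lines_to_dict(['1 2\n', '3 45.5\n']))  # {1: 2.0, 3: 45.5}
```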
- idpconfgen.libs.libparse.convert_tuples_to_lists(data)
Recursively process input data and convert it all to a list of lists.
- Parameters:
data (list of tuple)
- Returns:
result (list of list)
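A recursive conversion of this kind could look like the sketch below. The name `tuples_to_lists` is hypothetical, and the handling of non-sequence items (returned unchanged) is an assumption.

```python
def tuples_to_lists(data):
    """Hypothetical sketch: turn every nested tuple or list into a list."""
    if isinstance(data, (list, tuple)):
        return [tuples_to_lists(item) for item in data]
    # Assumption: anything that is not a list or tuple is kept as-is.
    return data


print(tuples_to_lists([(1, (2, 3)), (4,)]))  # [[1, [2, 3]], [4]]
```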
- idpconfgen.libs.libparse.fill_list(seq, fill, size)
Fill list with fill to size.
If seq is not a list, it is converted to a list.
- Returns:
list – The original list extended with fill values.
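One way to picture the padding is the sketch below. The name `pad_list` is hypothetical, and two behaviours are assumptions: a new list is always returned, and a sequence already longer than `size` is left untruncated.

```python
def pad_list(seq, fill, size):
    """Hypothetical sketch: append `fill` until the list has `size` items."""
    items = list(seq)  # converts tuples, strings, etc. to a list
    # A negative repeat count yields an empty list, so long inputs are unchanged.
    items.extend([fill] * (size - len(items)))
    return items


print(pad_list((1, 2), 0, 4))  # [1, 2, 0, 0]
```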
- idpconfgen.libs.libparse.get_diff_between_aa1l(group1, *, group2={'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y', 'd', 'e', 'p', 's', 't'})
Get difference between groups as a set.
- idpconfgen.libs.libparse.get_diff_between_groups(group1, group2)
Get difference between groups as a set.
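A set difference of two groups can be sketched as below. The name `difference_as_set` is hypothetical, and the direction of the difference (members of `group1` missing from `group2`) is an assumption, since the docstring does not state it.

```python
def difference_as_set(group1, group2):
    """Hypothetical sketch of a one-directional set difference."""
    # Assumption: keep the members of group1 that are absent from group2.
    return set(group1) - set(group2)


# Characters of the first string that are not standard one-letter residues.
unknown = difference_as_set('MEAXZ', 'ACDEFGHIKLMNPQRSTVWY')
print(sorted(unknown))  # ['X', 'Z']
```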
- idpconfgen.libs.libparse.get_mers(seq, size)
Get X-mers from seq.
Example
>>> get_mers('MEAIKHD', 3)
{'MEA', 'EAI', 'AIK', 'IKH', 'KHD'}
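Collecting X-mers is commonly done with a sliding window, as in the sketch below. The name `sliding_mers` is hypothetical, and the assumption is that windows overlap and advance one position at a time.

```python
def sliding_mers(seq, size):
    """Hypothetical sketch: the set of all overlapping substrings of length `size`."""
    return {seq[i:i + size] for i in range(len(seq) - size + 1)}


print(sorted(sliding_mers('ABCDE', 3)))  # ['ABC', 'BCD', 'CDE']
```

Because the result is a set, repeated fragments appear only once and the order is not preserved.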
- idpconfgen.libs.libparse.get_seq_chunk(seq, idx, size)
Get a fragment of the sequence, starting at idx and with length size.
- idpconfgen.libs.libparse.get_seq_next_residue(seq, idx, size)
Get the next residue after the fragment.
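The relationship between a fragment and the residue that follows it can be sketched with plain slicing. The names `seq_chunk` and `seq_next_residue` are hypothetical; returning an empty string when the fragment reaches the end of the sequence is an assumption.

```python
def seq_chunk(seq, idx, size):
    """Hypothetical sketch: fragment of length `size` starting at `idx`."""
    return seq[idx:idx + size]


def seq_next_residue(seq, idx, size):
    """Hypothetical sketch: the single residue right after that fragment."""
    # Slicing, unlike indexing, returns '' instead of raising at the sequence end.
    return seq[idx + size:idx + size + 1]


print(seq_chunk('MEAIKHD', 1, 3))         # EAI
print(seq_next_residue('MEAIKHD', 1, 3))  # K
```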
- idpconfgen.libs.libparse.group_by(data)
Group data by indexes.
- Parameters:
data (iterable) – The data to group by.
- Returns:
list ([[type, slice],])
Examples
>>> group_by('LLLLLSSSSSSEEEEE')
[['L', slice(0, 5)], ['S', slice(5, 11)], ['E', slice(11, 16)]]
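The grouping in the example can be reproduced with `itertools.groupby`, as sketched below. The name `group_by_runs` is hypothetical, and using `groupby` is a choice made for this illustration, not necessarily how the library does it.

```python
from itertools import groupby


def group_by_runs(data):
    """Hypothetical sketch: pair each run of equal items with its slice."""
    groups = []
    start = 0
    for key, run in groupby(data):
        length = sum(1 for _ in run)  # number of items in this run
        groups.append([key, slice(start, start + length)])
        start += length
    return groups


print(group_by_runs('LLLLLSSSSSSEEEEE'))
# [['L', slice(0, 5, None)], ['S', slice(5, 11, None)], ['E', slice(11, 16, None)]]
```

The returned slices can be applied directly to a parallel sequence, for example residue coordinates aligned with the secondary-structure string.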
- idpconfgen.libs.libparse.group_runs(li, tolerance=1)
Group consecutive numbers given a tolerance.
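As an illustration of grouping with a tolerance, the sketch below starts a new group whenever the gap to the previous number exceeds `tolerance`. The name `runs_within_tolerance` is hypothetical; it assumes the input is sorted in ascending order and returns a list rather than yielding groups.

```python
def runs_within_tolerance(numbers, tolerance=1):
    """Hypothetical sketch: split sorted numbers where the gap exceeds `tolerance`."""
    runs = []
    for number in numbers:
        if runs and number - runs[-1][-1] <= tolerance:
            runs[-1].append(number)  # close enough: extend the current run
        else:
            runs.append([number])    # gap too large (or first item): new run
    return runs


print(runs_within_tolerance([1, 2, 3, 7, 8, 12]))  # [[1, 2, 3], [7, 8], [12]]
```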
- idpconfgen.libs.libparse.is_valid_fasta(fasta)
Confirm string is a valid FASTA primary sequence.
Does not accept headers, just the protein sequence in a single string.
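A check of this kind can be sketched as a set comparison. The name `looks_like_protein_sequence` is hypothetical, and the accepted alphabet (the 20 standard one-letter residues, uppercase only) is an assumption; the library may accept a different set.

```python
# Assumption: only the 20 standard residues, in uppercase, are valid.
STANDARD_RESIDUES = set('ACDEFGHIKLMNPQRSTVWY')


def looks_like_protein_sequence(fasta):
    """Hypothetical sketch: True if every character is a standard residue."""
    return bool(fasta) and set(fasta) <= STANDARD_RESIDUES


print(looks_like_protein_sequence('MEAIKHD'))       # True
print(looks_like_protein_sequence('>header\nMEA'))  # False
```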
- idpconfgen.libs.libparse.mkdssp_w_split(pdb, cmd, **kwargs)
Execute mkdssp from DSSP.
Saves the data split according to backbone continuity as identified by mkdssp. Splits the input PDB into backbone-continuity segments.
- Parameters:
pdb (Path) – The path to the pdb file.
cmd (str) – The command to execute the external DSSP program.
- Yields:
from split_pdb_by_dssp
- idpconfgen.libs.libparse.parse_dssp(data, reduced=False)
Parse DSSP file data.
JSON doesn’t accept bytes, which is why data is expected as str.
- idpconfgen.libs.libparse.pop_difference_with_log(dict1, dict2, logmsg='Removing {} from the dictionary.\n')
Pop keys in dict1 that are not present in dict2.
Reports popped keys to the log at INFO level.
Operates on dict1 in place.
- Parameters:
dict1, dict2 (dict)
- Returns:
None
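The in-place behaviour can be sketched with the standard `logging` module. The name `pop_missing_keys` and the module-level logger are hypothetical, and emitting one log message per key is an assumption.

```python
import logging

log = logging.getLogger(__name__)  # hypothetical logger for this sketch


def pop_missing_keys(dict1, dict2, logmsg='Removing {} from the dictionary.\n'):
    """Hypothetical sketch: drop keys of dict1 that dict2 lacks, logging each."""
    # Build the key set first so dict1 is not modified while being iterated.
    for key in set(dict1) - set(dict2):
        log.info(logmsg.format(key))
        dict1.pop(key)


angles = {'a': 1, 'b': 2, 'c': 3}
pop_missing_keys(angles, {'a': 0})
print(angles)  # {'a': 1}
```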
- idpconfgen.libs.libparse.remap_sequence(seq, target='A', group=('P', 'G'))
Remap sequence.
- Parameters:
seq (str) – Protein primary sequence in FASTA format.
target (str (1-char)) – The residue to which all other residues will be converted.
group (tuple) – The residues that escape the mapping/conversion.
- Returns:
str – The remapped string.
Examples
>>> remap_sequence('AGTKLPHNG')
'AGAAAPAAG'
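The remapping in the example can be sketched in one expression. The name `remap` is hypothetical; the defaults mirror the documented signature.

```python
def remap(seq, target='A', group=('P', 'G')):
    """Hypothetical sketch: replace every residue outside `group` with `target`."""
    return ''.join(res if res in group else target for res in seq)


print(remap('AGTKLPHNG'))  # AGAAAPAAG
```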
- idpconfgen.libs.libparse.sample_case(input_string)
Sample all possible case combinations from a string.
Examples
>>> sample_case('A')
{'A', 'a'}
>>> sample_case('Aa')
{'AA', 'Aa', 'aA', 'aa'}
- idpconfgen.libs.libparse.split_by_ranges(seq, ranges)
Split a string into substrings based on a list of custom ranges.
- Parameters:
seq (str) – The string or sequence to be split.
ranges (list of int) – The indexes at which the splits occur. Each value is not inclusive.
- Returns:
chunks (list) – List of split strings at their desired locations.
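One reading of this description is sketched below. The name `split_at_indexes` is hypothetical. Two points are assumptions: each value in `ranges` is treated as the exclusive end of a chunk, and any remainder after the last index is kept as a final chunk.

```python
def split_at_indexes(seq, ranges):
    """Hypothetical sketch: cut `seq` at each index listed in `ranges`."""
    chunks = []
    start = 0
    for end in ranges:
        chunks.append(seq[start:end])  # `end` itself is not included
        start = end
    if start < len(seq):
        chunks.append(seq[start:])     # assumption: keep the trailing remainder
    return chunks


print(split_at_indexes('ABCDEFGH', [3, 5]))  # ['ABC', 'DE', 'FGH']
```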
- idpconfgen.libs.libparse.split_into_chunks(string, size=150)
Split a string into chunks of characters.
The last chunk may be longer or shorter.
- Parameters:
string (str) – String of characters to split.
size (int) – Chunk size. Defaults to 150.
- Returns:
chunks (list) – List of strings split into chunks of pre-determined sizes.
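The simplest form of chunking is fixed-step slicing, sketched below under the hypothetical name `fixed_size_chunks`. With this approach the final chunk can only be shorter than `size`; it does not reproduce whatever rule allows the documented function to return a longer last chunk.

```python
def fixed_size_chunks(string, size=150):
    """Hypothetical sketch: consecutive slices of `size` characters."""
    return [string[i:i + size] for i in range(0, len(string), size)]


print(fixed_size_chunks('ABCDEFG', 3))  # ['ABC', 'DEF', 'G']
```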
- idpconfgen.libs.libparse.split_pdb_by_dssp(pdbfile, dssp_text, minimum=2, reduced=False)
Split PDB file based on DSSP raw data.
- Parameters:
minimum (int) – The minimum length allowed for a segment.
reduced (bool) – Whether to reduce the DSSP nomenclature to H/E/L.
- idpconfgen.libs.libparse.translate_seq_to_3l(input_seq)
Translate 1-letter sequence to 3-letter sequence.
Currently translates ‘H’ to ‘HIP’ to accommodate double protonation. Changing it to ‘HIS’ causes issues with libbuild.
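A lookup-table translation can be sketched as below. The table `ONE_TO_THREE` and the name `to_three_letter` are hypothetical; the table covers only the 20 standard residues, maps 'H' to 'HIP' as the note above describes, and returning a list of codes is an assumption.

```python
# Hypothetical lookup table; 'H' maps to 'HIP' (doubly protonated histidine).
ONE_TO_THREE = {
    'A': 'ALA', 'C': 'CYS', 'D': 'ASP', 'E': 'GLU', 'F': 'PHE',
    'G': 'GLY', 'H': 'HIP', 'I': 'ILE', 'K': 'LYS', 'L': 'LEU',
    'M': 'MET', 'N': 'ASN', 'P': 'PRO', 'Q': 'GLN', 'R': 'ARG',
    'S': 'SER', 'T': 'THR', 'V': 'VAL', 'W': 'TRP', 'Y': 'TYR',
}


def to_three_letter(input_seq):
    """Hypothetical sketch: one 3-letter code per residue, in sequence order."""
    return [ONE_TO_THREE[res] for res in input_seq]


print(to_three_letter('MHA'))  # ['MET', 'HIP', 'ALA']
```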
- idpconfgen.libs.libparse.values_to_dict(values)
Generalization of converting parameters to dict.
Adapted from: https://github.com/joaomcteixeira/taurenmd/blob/6bf4cf5f01df206e9663bd2552343fe397ae8b8f/src/taurenmd/libs/libcli.py#L94-L138
- Parameters:
values (string) – List of values with the format “par1=1 par2='string' par3=[1,2,3]”.
- Returns:
param_dict (dictionary) – The input converted to a dictionary, with = linking each key to its value. E.g. {'par1': 1, 'par2': 'string', 'par3': [1, 2, 3]}
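Parsing `key=value` items into typed Python values can be sketched with `ast.literal_eval`. The name `parse_key_value_pairs` is hypothetical. It assumes the items arrive already split into a list of strings (as a command-line parser would deliver them) and that values which are not Python literals are kept as plain strings.

```python
import ast


def parse_key_value_pairs(values):
    """Hypothetical sketch: turn ['par1=1', ...] into {'par1': 1, ...}."""
    params = {}
    for item in values:
        key, _, raw = item.partition('=')  # split on the first '=' only
        try:
            params[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Assumption: anything that is not a Python literal stays a string.
            params[key] = raw
    return params


print(parse_key_value_pairs(['par1=1', "par2='string'", 'par3=[1,2,3]']))
# {'par1': 1, 'par2': 'string', 'par3': [1, 2, 3]}
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for user-supplied parameters.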