Lib Parse

Parsing routines for different data structures.

All functions in this module receive a native Python data structure, parse the information inside, and return or yield the parsed information.

idpconfgen.libs.libparse.convert_int_float_lines_to_dict(lines)

Convert string lines composed of putative int and float to dict.

Example

>>> convert_int_float_lines_to_dict(['1 2'])
{1: 2.0}
>>> convert_int_float_lines_to_dict(['1 2\n', '3 45.5\n'])
{1: 2.0, 3: 45.5}
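The conversion shown in the example can be sketched as follows. This is a minimal illustration, not the library's implementation: the name `lines_to_int_float_dict` is hypothetical, and the sketch assumes every line holds exactly two whitespace-separated fields.

```python
def lines_to_int_float_dict(lines):
    """Map the first field of each line (as int) to the second (as float)."""
    result = {}
    for line in lines:
        # str.split() without arguments also discards the trailing newline.
        key, value = line.split()
        result[int(key)] = float(value)
    return result


print(lines_to_int_float_dict(['1 2\n', '3 45.5\n']))  # {1: 2.0, 3: 45.5}
```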
idpconfgen.libs.libparse.convert_tuples_to_lists(data)

Recursively process the input data, converting every nested tuple into a list.

Parameters:

data (list of tuple)

Returns:

result (list of list)
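A recursive conversion of this kind might look like the sketch below. The name `tuples_to_lists` is hypothetical, and the sketch assumes that only lists and tuples are treated as containers; every other value is returned unchanged.

```python
def tuples_to_lists(data):
    """Return a copy of data where every nested tuple becomes a list."""
    if isinstance(data, (list, tuple)):
        # Recurse into each item so that inner tuples are converted too.
        return [tuples_to_lists(item) for item in data]
    return data


print(tuples_to_lists([(1, (2, 3)), (4,)]))  # [[1, [2, 3]], [4]]
```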

idpconfgen.libs.libparse.fill_list(seq, fill, size)

Fill list with fill to size.

If seq is not a list, converts it to a list.

Returns:

list – The original sequence, padded with fill values.
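One way to picture the padding is the sketch below. The name `pad_to_size` is hypothetical, and two details are assumptions rather than documented behaviour: the fill values are appended at the end, and an input already longer than size is returned unchanged.

```python
def pad_to_size(seq, fill, size):
    """Return seq as a list, padded at the end with fill up to size items."""
    items = list(seq)  # accepts any iterable, not only lists
    # Multiplying a list by a negative number yields an empty list,
    # so inputs longer than size are left as they are.
    return items + [fill] * (size - len(items))


print(pad_to_size((1, 2), 0, 5))  # [1, 2, 0, 0, 0]
```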

idpconfgen.libs.libparse.get_diff_between_aa1l(group1, *, group2={'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y', 'd', 'e', 'p', 's', 't'})

Get difference between groups as a set.

idpconfgen.libs.libparse.get_diff_between_groups(group1, group2)

Get difference between groups as a set.
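A plain set difference illustrates the idea. The name `diff_as_set` is hypothetical, and the direction of the difference (items of group1 that are missing from group2) is an assumption, since the documentation does not state it.

```python
def diff_as_set(group1, group2):
    """Return the items of group1 that do not appear in group2."""
    return set(group1).difference(group2)


# Assumed use case: find characters that are not standard residue letters.
standard = set('ACDEFGHIKLMNPQRSTVWY')
print(diff_as_set('MEAXZ', standard))
```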

idpconfgen.libs.libparse.get_mers(seq, size)

Get X-mers from seq.

Example

>>> get_mers('MEAIKHD', 3)
{'MEA', 'EAI', 'AIK', 'IKH', 'KHD'}
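Collecting X-mers is commonly done with a sliding window of width size that advances one position at a time. The sketch below assumes that reading; the name `mers_of` is hypothetical and the code is not taken from the library.

```python
def mers_of(seq, size):
    """Return the set of all contiguous substrings of seq with length size."""
    # The last valid window starts at index len(seq) - size.
    return {seq[i:i + size] for i in range(len(seq) - size + 1)}


print(mers_of('ABCD', 2) == {'AB', 'BC', 'CD'})  # True
```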
idpconfgen.libs.libparse.get_seq_chunk(seq, idx, size)

Get a fragment of seq of length size, starting at idx.

idpconfgen.libs.libparse.get_seq_next_residue(seq, idx, size)

Get the next residue after the fragment.
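Both get_seq_chunk and get_seq_next_residue can be read as simple slices over the sequence. The sketch below uses the hypothetical names `seq_chunk` and `seq_next_residue`, and it assumes that the "next residue" is the single character immediately after the fragment, with an empty string returned when the fragment reaches the end of the sequence.

```python
def seq_chunk(seq, idx, size):
    """Return the fragment of seq that starts at idx and spans size items."""
    return seq[idx:idx + size]


def seq_next_residue(seq, idx, size):
    """Return the residue right after the fragment, or '' if there is none."""
    # Slicing (rather than indexing) avoids an IndexError at the sequence end.
    return seq[idx + size:idx + size + 1]


print(seq_chunk('MEAIKHD', 1, 3))         # EAI
print(seq_next_residue('MEAIKHD', 1, 3))  # K
```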

idpconfgen.libs.libparse.get_trimer_seq(seq, idx)

Get sequence of trimer.

idpconfgen.libs.libparse.group_by(data)

Group data by indexes.

Parameters:

data (iterable) – The data to group by.

Returns:

list ([[type, slice],])

Examples

>>> group_by('LLLLLSSSSSSEEEEE')
[['L', slice(0, 5)], ['S', slice(5, 11)], ['E', slice(11, 16)]]
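The grouping in the example can be reproduced with `itertools.groupby` from the standard library. This is a sketch under that assumption, with the hypothetical name `group_by_sketch`; the library may build the slices differently.

```python
from itertools import groupby


def group_by_sketch(data):
    """Return [value, slice] pairs, one per run of equal consecutive items."""
    groups = []
    start = 0
    for value, run in groupby(data):
        length = sum(1 for _ in run)  # consume the run to measure it
        groups.append([value, slice(start, start + length)])
        start += length
    return groups


sequence = 'LLLLLSSSSSSEEEEE'
for value, region in group_by_sketch(sequence):
    # Each slice can be applied directly to the original data.
    print(value, sequence[region])
```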
idpconfgen.libs.libparse.group_runs(li, tolerance=1)

Group consecutive numbers given a tolerance.
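A possible reading of "consecutive given a tolerance" is that a number joins the current run when it differs from the previous number by no more than tolerance. The sketch below implements that reading under the hypothetical name `group_runs_sketch`; it returns a list of lists, which may differ from the return type of the library function.

```python
def group_runs_sketch(numbers, tolerance=1):
    """Group sorted numbers into runs whose neighbours differ by <= tolerance."""
    groups = []
    for number in numbers:
        if groups and number - groups[-1][-1] <= tolerance:
            groups[-1].append(number)  # continue the current run
        else:
            groups.append([number])    # gap too large: start a new run
    return groups


print(group_runs_sketch([1, 2, 3, 5, 6, 9]))     # [[1, 2, 3], [5, 6], [9]]
print(group_runs_sketch([1, 2, 3, 5, 6, 9], 2))  # [[1, 2, 3, 5, 6], [9]]
```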

idpconfgen.libs.libparse.is_valid_fasta(fasta)

Confirm string is a valid FASTA primary sequence.

Does not accept headers, just the protein sequence in a single string.
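A check of this kind could be sketched as a membership test against a residue alphabet. The name `looks_like_fasta_sequence` is hypothetical, and the accepted alphabet (the 20 standard amino acids in upper case) and the rejection of empty strings are assumptions, not documented behaviour.

```python
# Assumed alphabet: one-letter codes of the 20 standard amino acids.
STANDARD_RESIDUES = frozenset('ACDEFGHIKLMNPQRSTVWY')


def looks_like_fasta_sequence(fasta):
    """Return True if fasta is a non-empty string of standard residues only."""
    # A header line would fail here because '>' is not a residue letter.
    return bool(fasta) and all(char in STANDARD_RESIDUES for char in fasta)


print(looks_like_fasta_sequence('MEAIKHD'))       # True
print(looks_like_fasta_sequence('>sp|P12345|X'))  # False
```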

idpconfgen.libs.libparse.make_list_if_not(item)

Make a list from item.

idpconfgen.libs.libparse.mkdssp_w_split(pdb, cmd, **kwargs)

Execute mkdssp from DSSP.

Saves the data split according to backbone continuity as identified by mkdssp. Splits the input PDB into backbone (bb) continuity segments.

https://github.com/cmbi/dssp

Parameters:
  • pdb (Path) – The path to the pdb file.

  • cmd (str) – The command to execute the external DSSP program.

Yields:

The values yielded by split_pdb_by_dssp.

idpconfgen.libs.libparse.parse_dssp(data, reduced=False)

Parse DSSP file data.

JSON doesn’t accept bytes, which is why data is expected as str.

idpconfgen.libs.libparse.pop_difference_with_log(dict1, dict2, logmsg='Removing {} from the dictionary.\n')

Pop keys in dict1 that are not present in dict2.

Reports popped keys to the log at INFO level.

Operates on dict1 in place.

Parameters:

dict1, dict2 (dict)

Returns:

None
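The in-place behaviour can be illustrated with the standard `logging` module. The sketch below uses the hypothetical name `pop_difference_sketch` and logs one message per removed key, which is an assumption; the library may report all removed keys in a single message.

```python
import logging

log = logging.getLogger(__name__)


def pop_difference_sketch(dict1, dict2, logmsg='Removing {} from the dictionary.\n'):
    """Remove from dict1, in place, every key that is absent from dict2."""
    # Materialise the keys first: a dict cannot change size while iterated.
    missing = [key for key in dict1 if key not in dict2]
    for key in missing:
        log.info(logmsg.format(key))
        dict1.pop(key)


angles = {'1abc': [1.0], '2xyz': [2.0]}
pop_difference_sketch(angles, {'1abc': 'HHH'})
print(angles)  # {'1abc': [1.0]}
```

Because the function mutates dict1 and returns None, callers keep using the dictionary they passed in rather than a return value.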

idpconfgen.libs.libparse.remap_sequence(seq, target='A', group=('P', 'G'))

Remap sequence.

Parameters:
  • seq (str) – Protein primary sequence in FASTA format.

  • target (str (1-char)) – The residue to which all other residues will be converted.

  • group (tuple) – The residues that escape the conversion.

Returns:

str – The remapped string.

Examples

>>> remap_sequence('AGTKLPHNG')
'AGAAAPAAG'
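The remapping in the example amounts to a per-character substitution. The sketch below reproduces it under the hypothetical name `remap_sketch`; it is an illustration of the documented behaviour, not the library's code.

```python
def remap_sketch(seq, target='A', group=('P', 'G')):
    """Replace every residue outside group with target."""
    return ''.join(res if res in group else target for res in seq)


print(remap_sketch('AGTKLPHNG'))  # AGAAAPAAG
```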
idpconfgen.libs.libparse.remove_empty_keys(ddict)

Remove empty keys from dictionary.

idpconfgen.libs.libparse.sample_case(input_string)

Sample all possible case combinations from a string.

Examples

>>> sample_case('A')
{'A', 'a'}
>>> sample_case('Aa')
{'AA', 'Aa', 'aA', 'aa'}
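The case combinations in the examples correspond to a Cartesian product over the upper and lower case form of each character. The sketch below builds them with `itertools.product`; the name `sample_case_sketch` is hypothetical.

```python
from itertools import product


def sample_case_sketch(input_string):
    """Return the set of all upper/lower case variants of input_string."""
    # One (upper, lower) pair per character; product picks one from each pair.
    options = [(char.upper(), char.lower()) for char in input_string]
    return {''.join(combo) for combo in product(*options)}


print(sample_case_sketch('Aa') == {'AA', 'Aa', 'aA', 'aa'})  # True
```

A string of n letters yields up to 2**n variants, so this approach is only practical for short inputs.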
idpconfgen.libs.libparse.split_by_ranges(seq, ranges)

Split a string into substrings based on a list of custom ranges.

Parameters:
  • seq (str) – String or sequence to be split.

  • ranges (list of int) – Indices at which the splits occur. Each index is exclusive.

Returns:

chunks (list) – List of split strings at their desired locations.

idpconfgen.libs.libparse.split_into_chunks(string, size=150)

Split a string into chunks of characters.

The last chunk may be longer or shorter.

Parameters:
  • string (str) – String of characters to split

  • size (int) – Integer value of chunk sizes. Defaults to 150.

Returns:

chunks (list) – List of strings split into chunks of pre-determined sizes.
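Fixed-size chunking can be sketched with stepped slicing, as below. The name `chunks_of` is hypothetical. In this simplified version the last chunk can only be shorter than size; any logic that merges a short remainder into the previous chunk (which would make the last chunk longer) is not reproduced here.

```python
def chunks_of(string, size=150):
    """Split string into consecutive pieces of at most size characters."""
    return [string[i:i + size] for i in range(0, len(string), size)]


print(chunks_of('MEAIKHDLPG', size=4))  # ['MEAI', 'KHDL', 'PG']
```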

idpconfgen.libs.libparse.split_pdb_by_dssp(pdbfile, dssp_text, minimum=2, reduced=False)

Split PDB file based on DSSP raw data.

Parameters:
  • minimum (int) – The minimum length allowed for a segment.

  • reduced (bool) – Whether to reduce the DSSP nomenclature to H/E/L.

idpconfgen.libs.libparse.translate_seq_to_3l(input_seq)

Translate 1-letter sequence to 3-letter sequence.

Currently translates ‘H’ to ‘HIP’, to accommodate double protonation. Editing to ‘HIS’ causes issues with libbuild.
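A lookup table is the usual way to perform this translation. The sketch below uses the standard three-letter codes, except that 'H' maps to 'HIP' as noted above. The names `THREE_LETTER` and `to_three_letter` are hypothetical, and returning a list of codes is an assumption about the output type.

```python
# Standard one-letter to three-letter codes; 'H' maps to 'HIP' (doubly
# protonated histidine) instead of 'HIS', following the note above.
THREE_LETTER = {
    'A': 'ALA', 'R': 'ARG', 'N': 'ASN', 'D': 'ASP', 'C': 'CYS',
    'Q': 'GLN', 'E': 'GLU', 'G': 'GLY', 'H': 'HIP', 'I': 'ILE',
    'L': 'LEU', 'K': 'LYS', 'M': 'MET', 'F': 'PHE', 'P': 'PRO',
    'S': 'SER', 'T': 'THR', 'W': 'TRP', 'Y': 'TYR', 'V': 'VAL',
}


def to_three_letter(input_seq):
    """Translate a one-letter sequence into a list of three-letter codes."""
    # A KeyError is raised for any letter outside the table.
    return [THREE_LETTER[residue] for residue in input_seq]


print(to_three_letter('MHG'))  # ['MET', 'HIP', 'GLY']
```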

idpconfgen.libs.libparse.values_to_dict(values)

Generalization of converting parameters to dict.

Adapted from: https://github.com/joaomcteixeira/taurenmd/blob/6bf4cf5f01df206e9663bd2552343fe397ae8b8f/src/taurenmd/libs/libcli.py#L94-L138

Parameters:

values (string) – List of values with the format “par1=1 par2='string' par3=[1,2,3]”.

Returns:

param_dict (dictionary) – The input converted to a dictionary, with = linking each parameter name to its value, e.g. {'par1': 1, 'par2': 'string', 'par3': [1, 2, 3]}.
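A conversion of this kind can be sketched with `ast.literal_eval`, which safely evaluates Python literals such as numbers, strings and lists. The name `values_to_dict_sketch` is hypothetical, and two details are assumptions: the input arrives as a list of 'name=value' strings (as a command-line parser would provide), and values that are not valid literals are kept as plain strings.

```python
import ast


def values_to_dict_sketch(values):
    """Convert ['name=value', ...] items into a dictionary of Python values."""
    params = {}
    for item in values:
        # Split on the first '=' only, so values may contain '=' themselves.
        name, _, raw = item.partition('=')
        try:
            params[name] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            params[name] = raw  # not a Python literal: keep the raw text
    return params


print(values_to_dict_sketch(['par1=1', "par2='string'", 'par3=[1,2,3]']))
# {'par1': 1, 'par2': 'string', 'par3': [1, 2, 3]}
```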