`pymodulon.gene_util`

Utility functions for gene annotation

Module Contents

Functions

`cog2str`(cog)	Get the full description for a COG category letter
`_get_attr`(attributes, attr_id, ignore=False)	Helper function for parsing GFF annotations
`gff2pandas`(gff_file, feature='CDS', index=None)	Converts GFF file(s) to a Pandas DataFrame
`reformat_biocyc_tu`(tu)	Reformats a Biocyc-formatted transcription unit into a semicolon-separated, sorted gene list
`uniprot_id_mapping`(prot_list, input_id='ACC+ID', output_id='P_REFSEQ_AC', input_name='input_id', output_name='output_id')	Python wrapper for the UniProt ID mapping tool

pymodulon.gene_util.cog2str(cog)[source]

Get the full description for a COG category letter

Parameters: cog (str) – COG category letter
Returns: Description of COG category
Return type: str
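
To illustrate the kind of lookup this function performs, here is a minimal, self-contained sketch. It is not the pymodulon implementation: the name `cog2str_sketch`, the partial `COG_DESCRIPTIONS` table, and the choice to raise `KeyError` for unknown letters are all assumptions made for this example.

```python
# Hypothetical, partial table of COG category letters (illustration only).
COG_DESCRIPTIONS = {
    "C": "Energy production and conversion",
    "E": "Amino acid transport and metabolism",
    "J": "Translation, ribosomal structure and biogenesis",
    "K": "Transcription",
    "L": "Replication, recombination and repair",
    "S": "Function unknown",
}


def cog2str_sketch(cog):
    """Return the description for a single COG category letter.

    Raises KeyError for letters missing from the table; how the real
    function handles unknown letters is not specified here.
    """
    return COG_DESCRIPTIONS[cog]


description = cog2str_sketch("K")
```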

pymodulon.gene_util._get_attr(attributes, attr_id, ignore=False)[source]

Helper function for parsing GFF annotations

Parameters

attributes (str) – Attribute string
attr_id (str) – Attribute ID
ignore (bool) – If true, ignore errors if ID is not in attributes (default: False)

Returns

Value of attribute

Return type

str, optional
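
The behaviour described above can be sketched as follows. This is an illustrative stand-in, not the library's code: the name `get_attr_sketch`, the regular expression, and the use of `ValueError` when the ID is missing are assumptions. It treats the attribute string as the `key=value;key=value` format used in the ninth column of a GFF3 file.

```python
import re


def get_attr_sketch(attributes, attr_id, ignore=False):
    """Extract the value of attr_id from a GFF attribute string.

    Returns None when the ID is absent and ignore is True; otherwise an
    absent ID raises ValueError (exception type chosen for this sketch).
    """
    # Match "attr_id=value" at the start of the string or after a semicolon.
    pattern = r"(?:^|;)\s*" + re.escape(attr_id) + r"=([^;]*)"
    match = re.search(pattern, attributes)
    if match:
        return match.group(1)
    if ignore:
        return None
    raise ValueError(f"{attr_id} not found in attributes: {attributes}")


attrs = "ID=cds-b0001;Name=thrL;locus_tag=b0001"
locus_tag = get_attr_sketch(attrs, "locus_tag")
missing = get_attr_sketch(attrs, "gene", ignore=True)
```

Returning `None` instead of raising is what makes the documented return type optional.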

pymodulon.gene_util.gff2pandas(gff_file, feature='CDS', index=None)[source]

Converts GFF file(s) to a Pandas DataFrame

Parameters

gff_file (str or list) – Path(s) to GFF file
feature (str or list) – Name(s) of features to keep (default: “CDS”)
index (str, optional) – Column or attribute to use as index

Returns: df_gff – GFF formatted as a DataFrame
Return type: DataFrame
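
As a rough picture of the feature-filtering step, the sketch below reads GFF3 text and keeps only rows of the requested feature type. It uses only the standard library so that it runs without dependencies, and it returns a list of dicts rather than a DataFrame. The names `read_gff_rows` and `GFF_COLUMNS` are hypothetical, and the real function may parse files differently.

```python
import csv
import io

# The nine tab-separated columns defined by the GFF3 format.
GFF_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]


def read_gff_rows(handle, feature="CDS"):
    """Return GFF rows (as dicts) whose type matches the given feature(s)."""
    features = [feature] if isinstance(feature, str) else list(feature)
    # Skip blank lines and "#" comment/directive lines.
    lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(
        lines, fieldnames=GFF_COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [row for row in reader if row["type"] in features]


example = io.StringIO(
    "##gff-version 3\n"
    "NC_000913.3\tRefSeq\tgene\t190\t255\t.\t+\t.\tID=gene-b0001;Name=thrL\n"
    "NC_000913.3\tRefSeq\tCDS\t190\t255\t.\t+\t0\tID=cds-b0001;locus_tag=b0001\n"
)
rows = read_gff_rows(example)
```

Passing `rows` to `pandas.DataFrame(rows)` would give a tabular form similar in spirit to the documented return value.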

pymodulon.gene_util.reformat_biocyc_tu(tu)[source]

Parameters: tu (str) – Biocyc-formatted transcription unit (e.g. ‘thrL // thrA // thrB // thrC’)
Returns: formatted_tu – Semicolon-separated sorted gene list
Return type: str
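
The transformation described above (split on `//`, sort, join with semicolons) can be sketched in a few lines. The name `reformat_tu_sketch` is hypothetical and this is only an approximation of the documented behaviour; how the real function handles edge cases such as empty or non-string input is not covered here.

```python
def reformat_tu_sketch(tu):
    """Convert 'geneA // geneB' into a semicolon-separated sorted gene list."""
    # Split on the Biocyc separator and strip surrounding whitespace.
    genes = [gene.strip() for gene in tu.split("//") if gene.strip()]
    return ";".join(sorted(genes))


formatted_tu = reformat_tu_sketch("thrL // thrA // thrB // thrC")
```

Sorting the genes means that two transcription units listing the same genes in a different order produce the same string.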

pymodulon.gene_util.uniprot_id_mapping(prot_list, input_id='ACC+ID', output_id='P_REFSEQ_AC', input_name='input_id', output_name='output_id')[source]

Python wrapper for the UniProt ID mapping tool (see https://www.uniprot.org/uploadlists/)

Parameters

prot_list (list) – List of proteins to be mapped
input_id (str) – ID type for the mapping input (default: “ACC+ID”)
output_id (str) – ID type for the mapping output (default: “P_REFSEQ_AC”)
input_name (str) – Column name for input IDs
output_name (str) – Column name for output IDs

Returns

mapping – Table containing two columns, one listing the inputs, and one listing the mapped outputs. Column names are defined by input_name and output_name.

Return type

DataFrame
