pymodulon.gene_util

Utility functions for gene annotation

Module Contents

Functions

cog2str(cog)

Get the full description for a COG category letter

_get_attr(attributes, attr_id, ignore=False)

Helper function for parsing GFF annotations

gff2pandas(gff_file, feature='CDS', index=None)

Converts GFF file(s) to a Pandas DataFrame

reformat_biocyc_tu(tu)

Reformats a BioCyc-formatted transcription unit (e.g. ‘thrL // thrA // thrB // thrC’) into a semicolon-separated sorted gene list

uniprot_id_mapping(prot_list, input_id='ACC+ID', output_id='P_REFSEQ_AC', input_name='input_id', output_name='output_id')

Python wrapper for the UniProt ID mapping tool (see https://www.uniprot.org/uploadlists/)

pymodulon.gene_util.cog2str(cog)

Get the full description for a COG category letter

Parameters

cog (str) – COG category letter

Returns

Description of COG category

Return type

str
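
To illustrate the kind of lookup described above, here is a minimal sketch. The table `COG_DESCRIPTIONS` and the function `cog_letter_to_description` are hypothetical names used only for this example; the table covers just a few standard COG categories, and the exact strings returned by `cog2str` may differ.

```python
# Hypothetical partial lookup table (assumption): only a handful of the
# standard COG functional categories are listed, and the wording that
# pymodulon returns for each letter may not match these strings exactly.
COG_DESCRIPTIONS = {
    "C": "Energy production and conversion",
    "E": "Amino acid transport and metabolism",
    "J": "Translation, ribosomal structure and biogenesis",
    "K": "Transcription",
    "L": "Replication, recombination and repair",
}


def cog_letter_to_description(cog: str) -> str:
    """Return the description for a single COG category letter."""
    return COG_DESCRIPTIONS[cog]


print(cog_letter_to_description("K"))
```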

pymodulon.gene_util._get_attr(attributes, attr_id, ignore=False)

Helper function for parsing GFF annotations

Parameters
  • attributes (str) – Attribute string

  • attr_id (str) – Attribute ID

  • ignore (bool) – If true, ignore errors if ID is not in attributes (default: False)

Returns

Value of attribute

Return type

str, optional
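
The behaviour described above can be sketched as follows. This is not the library's implementation: `get_attr_sketch` is a hypothetical stand-in, it assumes GFF3-style `key=value` pairs separated by semicolons, and raising `ValueError` for a missing ID is an assumption about how errors are reported.

```python
import re
from typing import Optional


def get_attr_sketch(attributes: str, attr_id: str, ignore: bool = False) -> Optional[str]:
    """Extract one value from a GFF3-style attribute string (illustrative only)."""
    # Match "attr_id=value" at the start of the string or right after a semicolon.
    pattern = r"(?:^|;)\s*" + re.escape(attr_id) + r"=([^;]*)"
    match = re.search(pattern, attributes)
    if match:
        return match.group(1)
    if ignore:
        # Mirrors the documented "str, optional" return type: no value found.
        return None
    # Assumption: the real helper may raise a different exception type.
    raise ValueError(f"{attr_id!r} not found in attributes: {attributes!r}")


# Illustrative attribute string, not taken from a real annotation file.
example = "ID=cds-b0001;Name=thrL;locus_tag=b0001"
print(get_attr_sketch(example, "locus_tag"))
print(get_attr_sketch(example, "product", ignore=True))
```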

pymodulon.gene_util.gff2pandas(gff_file, feature='CDS', index=None)

Converts GFF file(s) to a Pandas DataFrame

Parameters
  • gff_file (str or list) – Path(s) to GFF file

  • feature (str or list) – Name(s) of features to keep (default: “CDS”)

  • index (str, optional) – Column or attribute to use as index

Returns

df_gff – GFF formatted as a DataFrame

Return type

DataFrame
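
As a rough picture of the conversion, the sketch below filters GFF lines by feature type using only the standard library. `gff_lines_to_records` and `GFF_COLUMNS` are hypothetical names, the sample lines are illustrative, and the columns that `gff2pandas` actually produces are not specified in this page. The resulting list of dictionaries could be passed to `pandas.DataFrame` to obtain a table.

```python
# The nine tab-separated columns defined by the GFF3 format.
GFF_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]


def gff_lines_to_records(lines, feature="CDS"):
    """Keep rows whose type matches `feature` (a name or a list of names)."""
    features = [feature] if isinstance(feature, str) else list(feature)
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and directives such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(GFF_COLUMNS):
            continue
        record = dict(zip(GFF_COLUMNS, fields))
        if record["type"] not in features:
            continue
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        records.append(record)
    return records


# Illustrative input lines (assumption), built with explicit tab separators.
sample_lines = [
    "##gff-version 3",
    "\t".join(["chr1", "example", "gene", "190", "255", ".", "+", ".", "ID=gene-b0001;Name=thrL"]),
    "\t".join(["chr1", "example", "CDS", "190", "255", ".", "+", "0", "ID=cds-b0001;locus_tag=b0001"]),
]
cds_records = gff_lines_to_records(sample_lines)
print(cds_records)
```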

pymodulon.gene_util.reformat_biocyc_tu(tu)

Reformats a BioCyc-formatted transcription unit into a semicolon-separated sorted gene list

Parameters

tu (str) – BioCyc-formatted transcription unit (e.g. ‘thrL // thrA // thrB // thrC’)

Returns

formatted_tu – Semicolon-separated sorted gene list

Return type

str
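
A minimal sketch of this transformation, assuming genes are split on `//`, sorted with Python's default string ordering, and joined with a bare semicolon. `reformat_tu_sketch` is a hypothetical name; the library's exact whitespace handling and sort order are not documented here.

```python
def reformat_tu_sketch(tu: str) -> str:
    """Turn 'geneA // geneB' into a semicolon-separated, sorted gene list."""
    # Split on the BioCyc separator and drop surrounding whitespace.
    genes = [gene.strip() for gene in tu.split("//") if gene.strip()]
    # Assumption: plain (case-sensitive) string sorting.
    return ";".join(sorted(genes))


print(reformat_tu_sketch("thrL // thrA // thrB // thrC"))
```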

pymodulon.gene_util.uniprot_id_mapping(prot_list, input_id='ACC+ID', output_id='P_REFSEQ_AC', input_name='input_id', output_name='output_id')

Python wrapper for the UniProt ID mapping tool (see https://www.uniprot.org/uploadlists/)

Parameters
  • prot_list (list) – List of proteins to be mapped

  • input_id (str) – ID type for the mapping input (default: “ACC+ID”)

  • output_id (str) – ID type for the mapping output (default: “P_REFSEQ_AC”)

  • input_name (str) – Column name for input IDs

  • output_name (str) – Column name for output IDs

Returns

mapping – Table containing two columns, one listing the inputs, and one listing the mapped outputs. Column names are defined by input_name and output_name.

Return type

DataFrame
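
To show how input_name and output_name shape the returned table, the sketch below parses a two-column, tab-delimited mapping result without contacting any service. `parse_mapping_table`, the "From"/"To" header row, and the placeholder IDs are all assumptions made for illustration; the real function returns a DataFrame rather than a list of dictionaries.

```python
def parse_mapping_table(text, input_name="input_id", output_name="output_id"):
    """Parse tab-delimited 'input<TAB>output' rows into labelled records."""
    rows = []
    lines = text.strip().splitlines()
    # Assumption: the first line is a header row and is skipped.
    for line in lines[1:]:
        source, target = line.split("\t")
        rows.append({input_name: source, output_name: target})
    return rows


# Placeholder response text (assumption), not real UniProt or RefSeq accessions.
response_text = "From\tTo\nACC_1\tREFSEQ_1\nACC_2\tREFSEQ_2\n"
mapping = parse_mapping_table(response_text, input_name="uniprot", output_name="refseq")
print(mapping)
```

Each record carries the two column names chosen by the caller, which corresponds to the documented two-column layout of the returned table.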