Parsers

Predefined parsers

class BedParser

Standard Full BED Parser of 10 Columns

class ANNParser

Annotation Parser, 6 columns

class BasicParser

Parser for Chr, Start, Stop only (no Strand)

class NarrowPeakParser

Narrow Peaks Parser. 10 columns

class RnaSeqParser

Standard Full BED Parser of 10 Columns

class BedScoreParser

Standard Full BED Parser of 10 Columns

Customizable parser

All parsers in PyGMQL extend the RegionParser class.

class RegionParser(gmql_parser=None, chrPos=None, startPos=None, stopPos=None, strandPos=None, otherPos=None, delimiter='\t', coordinate_system='0-based', schema_format='del', parser_name='parser')

Creates a custom parser for region datasets

Parameters:
  • chrPos – position of the chromosome column
  • startPos – position of the start column
  • stopPos – position of the stop column
  • strandPos – (optional) position of the strand column. Default is None
  • otherPos – (optional) list of tuples of the type [(pos, attr_name, typeFun), …]. Default is None
  • delimiter – (optional) delimiter of the columns of the file. Default is '\t' (tab)
  • coordinate_system – can be {'0-based', '1-based', 'default'}. Default is '0-based'
  • schema_format – (optional) type of file. Can be {'tab', 'gtf', 'vcf', 'del'}. Default is 'del'
  • parser_name – (optional) name of the parser. Default is 'parser'
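
To make the positional parameters concrete, the following standalone sketch applies the same kind of configuration to a single delimited line using only the Python standard library. It does not call PyGMQL. The helper `parse_line`, the sample row, the output keys, and the use of Python callables for `typeFun` are all assumptions made for illustration.

```python
import csv
import io


def parse_line(line, chrPos, startPos, stopPos, strandPos=None,
               otherPos=None, delimiter="\t"):
    """Hypothetical helper: split one region line and pick columns by position."""
    # csv.reader splits the line on the configured delimiter
    fields = next(csv.reader(io.StringIO(line), delimiter=delimiter))
    region = {
        "chr": fields[chrPos],
        "start": int(fields[startPos]),
        "stop": int(fields[stopPos]),
    }
    if strandPos is not None:
        region["strand"] = fields[strandPos]
    # otherPos follows the documented shape [(pos, attr_name, typeFun), ...]
    for pos, attr_name, type_fun in (otherPos or []):
        region[attr_name] = type_fun(fields[pos])
    return region


# Made-up sample row: chr, start, stop, name, score, strand
row = "chr1\t100\t200\tpeak_1\t0.75\t+"
region = parse_line(row, chrPos=0, startPos=1, stopPos=2, strandPos=5,
                    otherPos=[(3, "name", str), (4, "score", float)])
print(region)
```

Column positions in this sketch are zero-based list indices; whether RegionParser counts positions the same way should be checked against the library itself.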

get_gmql_parser()

Gets the Scala implementation of the parser

Returns: a Java Object

static parse_strand(strand)

Defines how to parse the strand column

Parameters: strand – a string representing the strand
Returns: the parsed result
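
The description above does not say how strand values are normalized. A minimal sketch of one plausible convention follows, assuming that '+', '-' and '*' are kept and any other value is mapped to the unstranded symbol '*'. The function name `parse_strand_sketch` and this rule are assumptions, not necessarily what RegionParser.parse_strand does.

```python
def parse_strand_sketch(strand):
    """Hypothetical normalization: keep '+', '-', '*'; map anything else to '*'."""
    strand = strand.strip()
    return strand if strand in {"+", "-", "*"} else "*"


# Made-up inputs, including a BED-style '.' and an empty field
print([parse_strand_sketch(s) for s in ["+", "-", ".", ""]])
```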

parse_regions(path)

Given a file path, loads the file into memory as a Pandas DataFrame

Parameters: path – file path
Returns: a Pandas DataFrame

get_attributes()

Returns the unordered list of attributes

Returns: list of strings

get_ordered_attributes()

Returns the ordered list of attributes

Returns: list of strings

get_types()

Returns the unordered list of data types

Returns: list of data types

get_name_type_dict()

Returns a dictionary of the type {'column_name': data_type, …}

Returns: dict
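
The accessor methods appear to describe the same schema from different angles. The sketch below uses plain lists rather than a real parser to show how a name-to-type dictionary of the shape described above can be built from an ordered attribute list and an ordered type list. The column names and types are made up, and the use of Python types as the data types is an assumption.

```python
# Made-up schema, standing in for an ordered attribute list and an ordered type list
ordered_attributes = ["chr", "start", "stop", "strand", "score"]
ordered_types = [str, int, int, str, float]

# Pair each column name with its type, as in {'column_name': data_type, ...}
name_type_dict = dict(zip(ordered_attributes, ordered_types))

# Use the dictionary to convert raw string fields into typed values
raw = {"chr": "chr1", "start": "100", "stop": "200", "strand": "+", "score": "0.75"}
typed = {name: name_type_dict[name](value) for name, value in raw.items()}
print(typed)
```

Keeping the attribute and type lists in the same order is what makes the `zip` pairing valid; an unordered view of either list could not be paired this way.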

get_ordered_types()

Returns the ordered list of data types

Returns: list of data types