Documentation read from 05/19/2023 16:24:13 version of /vol/kmer-server-prod/FIGdisk.server.rhel6/dist/releases/dev/common/lib/FigKernelPackages/SAP.pm.
This file contains the functions and utilities used by the Sapling Server (sap_server.cgi). The methods listed in the sections below are function calls made directly to the server. All have a signature similar to the following.
my $results = $sapObject->function_name($args);
where $sapObject is an object created by this module, $args is a parameter structure, and function_name is the Sapling Server function name. The output $results is a scalar, generally a hash reference, but sometimes a string or a list reference.
Several methods deal with gene locations. Location information from the Sapling server is expressed as location strings. A location string consists of a contig ID (which includes the genome ID), an underscore, a starting location, a strand indicator (+ or -), and a length. The first location on the contig is 1.
For example, 100226.1:NC_003888_3766170+612 indicates contig NC_003888 in genome 100226.1 (Streptomyces coelicolor A3(2)) beginning at location 3766170 and proceeding forward on the plus strand for 612 bases.
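The location string format described above can be taken apart with a single pattern match. The following is a minimal sketch, not part of SAP.pm: the helper name parse_location is hypothetical, and the treatment of the minus strand (start at the highest position, extending toward lower positions) is an assumption that the text above does not confirm.

```perl
use strict;
use warnings;

# Split a location string into its parts. The contig ID may itself contain
# underscores, so the pattern anchors on the trailing start/strand/length.
sub parse_location {
    my ($loc) = @_;
    my ($contig, $start, $strand, $len) = $loc =~ /^(.+)_(\d+)([+-])(\d+)$/
        or return;
    # The genome ID is the part of the contig ID before the colon.
    my ($genome) = $contig =~ /^([^:]+):/;
    # ASSUMPTION: on the minus strand the start is the highest position and
    # the region extends toward lower positions.
    my $end = $strand eq '+' ? $start + $len - 1 : $start - $len + 1;
    return { contig => $contig, genome => $genome, start => $start,
             strand => $strand, length => $len, end => $end };
}

my $parsed = parse_location('100226.1:NC_003888_3766170+612');
print "$parsed->{contig} $parsed->{start}..$parsed->{end} ($parsed->{strand})\n";
```

For the example string, the 612 bases on the plus strand run from position 3766170 through 3766781.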
Use
my $sapObject = SAPserver->new();
to create a new sapling server function object. The server function object is used to invoke the "Primary Methods" listed below. See SAPserver for more information on how to create this object and the options available.
You will not use the methods in this section very often. Some are used by the server framework for maintenance and control purposes ("methods"), while others ("query" and "get") provide access to data in the database in case you need data not available from one of the standard methods.
my $methodList = $sapObject->methods();
Return a reference to a list of the methods allowed on this object.
my $idHash = $sapObject->exists({ -type => 'Genome', -ids => [$id1, $id2, ...] });
Return a hash indicating which of the specified objects of the given type exist in the database. This method is used as a general mechanism for finding what exists and what doesn't exist when you know the ID. In particular, you can use it to check for the presence or absence of subsystems, genomes, features, or FIGfams.
The parameter should be a reference to a hash with the following keys.
The type of object whose existence is being queried. The type specification is case-insensitive: genome and Genome are treated the same. The permissible types are
Genomes, identified by taxon ID: 100226.1, 83333.1, 360108.3
Features (genes), identified by FIG ID: fig|100226.1.peg.3361, fig|360108.3.rna.4
Subsystem, identified by subsystem name: Arginine biosynthesis extended
FIGfam protein family, identified by ID: FIG000171, FIG001501
Reference to a list of identifiers for objects of the specified type.
Returns a reference to a hash keyed by ID. Each incoming ID maps to 1 if an object of the specified type with that ID exists, else 0.
$idHash = { $id1 => $flag1, $id2 => $flag2, ... };
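Because the result is a simple flag hash, finding the IDs that are absent from the database is a one-line filter. The sketch below works on a hand-built hash in the documented shape rather than a live server call; the ID 999999.9 is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical result in the documented shape: ID => 1 (exists) or 0 (absent).
my $idHash = { '100226.1' => 1, '83333.1' => 1, '999999.9' => 0 };

# Collect the IDs that were reported as absent, in sorted order.
my @missing = grep { !$idHash->{$_} } sort keys %$idHash;
print "Missing: @missing\n";
```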
my $hashList = $sapObject->get({ -objects => $objectNameString, -filter => { $label1 => $criterion1, $label2 => $criterion2, ... }, -limit => $maxRows, -fields => { $label1 => $name1, $label2 => $name2, ... }, -multiples => 'list', -firstOnly => 1 });
Query the Sapling database. This is a variant of the "query" method in which a certain amount of power is sacrificed for ease of use. Instead of a full-blown filter clause, the caller specifies a filter hash that maps field identifiers to values.
The parameter should be a reference to a hash with the following keys.
The object name string listing all the entities and relationships in the query. See "Object Name List" in ERDB for more details.
Reference to a hash that maps field identifiers in "Standard Field Name Format" in ERDB to criteria. A criterion is either an object or scalar value (which is asserted as the value of the field), a 2-tuple consisting of a relational operator and a value (which is asserted to be in the appropriate relation to the field), or a sub-list consisting of the word IN and two or more values (which asserts that the field has one of the listed values). A record satisfies the filter if it satisfies all the criteria in the hash.
Maximum number of rows to return for this query. The default is no limit.
Reference to a hash mapping field identifiers to field names. In this case, the field identifier is a field name in "Standard Field Name Format" in ERDB and the field name is the key value that will be used for the field in the returned result hashes. If this parameter is omitted, then instead of returning the results, this method will return a count of the number of records found.
Rule for handling field values in the result hashes. The default option is smart, which maps single-valued fields to scalars and multi-valued fields to list references. If primary is specified, then all fields are mapped to scalars; only the first value of a multi-valued field is retained. If list is specified, then all fields are mapped to lists.
If TRUE, only the first result will be returned. In this case, the return value will be a hash reference instead of a list of hash references. The default is FALSE.
Returns a reference to a list of hashes. Each hash represents a single record in the result set, and maps the output field names to the field values for that record. Note that if a field is multi-valued, it will be represented as a list reference.
$hashList = [{ $label1 => $row1value1, $label2 => $row1value2, ... }, { $label1 => $row2value1, $label2 => $row2value2, ... }, ... ];
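The three criterion forms accepted by the filter hash (a plain value, an operator/value 2-tuple, and an IN list) can be illustrated with a small local evaluator. This is only a sketch of the matching rule described above, run against an in-memory record; the server itself evaluates the filter against the database. The sub name satisfies, the set of supported operators, and the field names Genome(contigs) and Genome(domain) are assumptions made for the example.

```perl
use strict;
use warnings;

# ASSUMPTION: only these relational operators are handled in this sketch.
my %compare = (
    '='  => sub { $_[0] == $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
    '>=' => sub { $_[0] >= $_[1] },
);

# Return 1 if the record satisfies every criterion in the filter hash.
sub satisfies {
    my ($record, $filter) = @_;
    for my $field (keys %$filter) {
        my ($crit, $value) = ($filter->{$field}, $record->{$field});
        my $ok;
        if (ref($crit) ne 'ARRAY') {
            $ok = $value eq $crit;                  # plain value
        } elsif (uc($crit->[0]) eq 'IN') {
            my (undef, @allowed) = @$crit;          # ['IN', v1, v2, ...]
            $ok = grep { $_ eq $value } @allowed;
        } else {
            my ($op, $operand) = @$crit;            # [operator, value]
            $ok = $compare{$op}->($value, $operand);
        }
        return 0 unless $ok;
    }
    return 1;
}

# Hypothetical record and filter.
my $record = { 'Genome(contigs)' => 3, 'Genome(domain)' => 'Bacteria' };
my $filter = { 'Genome(contigs)' => ['<', 10],
               'Genome(domain)'  => ['IN', 'Archaea', 'Bacteria'] };
print satisfies($record, $filter) ? "match\n" : "no match\n";
```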
my $rowList = $sapObject->query({ -objects => $objectNameString, -filterString => $whereString, -limit => $maxRows, -parameters => [$parm1, $parm2, ...], -fields => [$name1, $name2, ...] });
This method queries the Sapling database and returns a reference to a list of lists. The query is specified in the form of an object name string, a filter string, an optional list of parameter values, and a list of desired output fields. The result document can be thought of as a two-dimensional array, with each row being a record returned by the query and each column representing an output field.
This function buys a great deal of flexibility at the cost of ease of use. Before attempting to formulate a query, you will need to look at the ERDB documentation.
The parameter should be a reference to a hash with the following keys.
The object name string listing all the entities and relationships in the query. See "Object Name List" in ERDB for more details.
The filter string for the query. It cannot contain a LIMIT clause, but can otherwise be anything described in "Filter Clause" in ERDB.
Maximum number of rows to return for this query. The default is 1000. To make an unlimited query, specify none.
Reference to a list of parameter values. These should be numbers or strings, and are substituted for any parameter marks in the query on a one-for-one basis. See also "Parameter List" in ERDB.
Reference to a list containing the names of the desired output fields.
Returns a reference to a list of lists. Each row corresponds to a database result row, and each column corresponds to one of the incoming output fields. Note that some fields contain complex PERL data structures, and fields that are multi-valued will contain sub-lists.
$rowList = [[$row1field1, $row1field2, ...], [$row2field1, $row2field2, ...], [$row3field1, $row3field2, ...], ... ];
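Since each row lines up positionally with the requested output fields, it is often convenient to turn the rows into hashes keyed by field name. The sketch below does this for a hand-built result; the field names and values are placeholders, and multi-valued fields (sub-lists) are joined with commas purely for display.

```perl
use strict;
use warnings;

# Placeholder field list and a result in the documented list-of-lists shape.
my @fields  = ('id', 'aliases');
my $rowList = [
    ['row1', ['a1', 'a2']],   # second field is multi-valued
    ['row2', []],
];

# Convert each row into a hash keyed by field name.
my @records;
for my $row (@$rowList) {
    my %rec;
    @rec{@fields} = map { ref($_) eq 'ARRAY' ? join(',', @$_) : $_ } @$row;
    push @records, \%rec;
}
print "$_->{id}: $_->{aliases}\n" for @records;
```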
my $listList = $sapObject->select({ -path => $objectNameString, -filter => { $field1 => $list1, $field2 => $list2, ... }, -fields => [$fieldA, $fieldB, ... ], -limit => $maxRows, -multiples => 'list' });
Query the Sapling database. This is a variant of the "get" method in which a further amount of power is sacrificed for ease of use. The return is a list of lists, and the criteria are always in the form of lists of possible values.
The parameter should be a reference to a hash with the following keys.
The object name string listing all the entities and relationships in the query. See "Object Name List" in ERDB for more details.
Reference to a hash that maps field identifiers in "Standard Field Name Format" in ERDB to lists of permissible values. A record matches the filter if the field value matches at least one element of the list.
Reference to a list of field names in "Standard Field Name Format" in ERDB.
Maximum number of rows to return for this query. The default is no limit.
Rule for handling field values in the result lists. The default option is smart, which maps single-valued fields to scalars and multi-valued fields to list references. If primary is specified, then all fields are mapped to scalars; only the first value of a multi-valued field is retained. If list is specified, then all fields are mapped to lists.
Returns a reference to a list of lists. Each sub-list represents a single record in the result set, and contains the field values in the order the fields were listed in the -fields parameter. Note that if a field is multi-valued, it will be represented as a list reference.
$listList = [[$row1value1, $row1value2, ... ], [$row2value1, $row2value2, ...], ... ];
my $idHash = $sapObject->equiv_precise_assertions({ -ids => [$id1, $id2, ...] });
Return the assertions for all genes in the database that match the identified gene. The gene can be specified by any prefixed gene identifier (e.g. uni|AYQ44, gi|85841784, or fig|360108.3.peg.1041).
The parameter should be a reference to a hash with the following keys.
Reference to a list of gene identifiers.
For backward compatibility, the parameter can also be a reference to a list of gene identifiers.
Returns a reference to a hash that maps each incoming ID to a list of 4-tuples. Each 4-tuple contains (0) an identifier that is for the same gene as the input identifier, (1) the asserted function of that identifier, (2) the source of the assertion, and (3) a flag that is TRUE if the assertion is by a human expert.
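$idHash = { $id1 => [[$otherID1a, $function1a, $source1a, $flag1a], [$otherID1b, $function1b, $source1b, $flag1b], ...], $id2 => [[$otherID2a, $function2a, $source2a, $flag2a], [$otherID2b, $function2b, $source2b, $flag2b], ...], ... };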
In backward-compatibility mode, returns a reference to a list of 2-tuples. Each 2-tuple consists of an incoming ID and the list of 4-tuples with the asserted function information.
my $idHash = $sapObject->equiv_sequence_assertions({ -ids => [$id1, $id2, ...] });
Return the assertions for all genes in the database that match the identified protein sequences. A protein sequence can be identified by a protein MD5 code or any prefixed gene identifier (e.g. uni|AYQ44, gi|85841784, or fig|360108.3.peg.1041).
The parameter should be a reference to a hash with the following keys.
Reference to a list of protein identifiers. Each identifier should be a prefixed gene identifier or the (optionally) prefixed MD5 of a protein sequence.
Returns a reference to a hash mapping each incoming protein identifier to a list of 5-tuples, consisting of (0) an identifier that is sequence-equivalent to the input identifier, (1) the asserted function of that identifier, (2) the source of the assertion, (3) a flag that is TRUE if the assertion is by an expert, and (4) the name of the genome relevant to the identifier (if any).
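$idHash = { $id1 => [[$otherID1a, $function1a, $source1a, $flag1a, $genomeName1a], [$otherID1b, $function1b, $source1b, $flag1b, $genomeName1b], ...], $id2 => [[$otherID2a, $function2a, $source2a, $flag2a, $genomeName2a], [$otherID2b, $function2b, $source2b, $flag2b, $genomeName2b], ...], ... };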
my $featureHash = $sapObject->feature_assignments({ -genome => $genomeID, -type => 'peg', -hypothetical => 1 });
Return all features of the specified type for the specified genome along with their assignments.
The parameter should be a reference to a hash with the following keys.
ID of the genome whose features are desired.
If specified, the type of feature desired (peg, rna, etc.). If omitted, all features will be returned.
If 1, only hypothetical genes will be returned; if 0, only non-hypothetical genes will be returned. If undefined or not specified, all genes will be returned.
Returns a hash mapping the ID of each feature in the specified genome to its assignment.
$featureHash = { $fid1 => $function1, $fid2 => $function2, ... };
my $idHash = $sapObject->ids_to_assertions({ -ids => [$id1, $id2, ...] });
Return the assertions associated with each prefixed ID.
The parameter should be a reference to a hash with the following keys.
Reference to a list of prefixed feature IDs (e.g. gi|17017961, NP_625335.1, fig|360108.3.peg.1041). The assertions associated with each particular identifier will be returned. In this case, there will be no processing for equivalent IDs. For that, you should use equiv_sequence_assertions or equiv_precise_assertions.
Returns a reference to a hash mapping every incoming ID to a list of 3-tuples, each consisting of (0) an asserted function, (1) the source of the assertion, and (2) a flag that is TRUE if the assertion was made by an expert.
$idHash = { $id1 => [[$assertion1a, $source1a, $expert1a], [$assertion1b, $source1b, $expert1b], ...], $id2 => [[$assertion2a, $source2a, $expert2a], [$assertion2b, $source2b, $expert2b], ...], ... };
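A common follow-up step is to keep only the assertions made by an expert, using the flag in position 2 of each 3-tuple. The sketch below runs on a hand-built hash in the documented shape; the function and source names are placeholders.

```perl
use strict;
use warnings;

# Placeholder result: ID => list of [function, source, expert flag].
my $idHash = {
    'fig|360108.3.peg.1041' => [
        ['function A', 'sourceX', 1],
        ['function B', 'sourceY', 0],
    ],
};

# For each ID, keep the asserted functions whose expert flag is TRUE.
my %expert;
for my $id (keys %$idHash) {
    $expert{$id} = [ map { $_->[0] } grep { $_->[2] } @{ $idHash->{$id} } ];
}
print "$_: @{ $expert{$_} }\n" for sort keys %expert;
```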
my $idHash = $sapObject->ids_to_annotations({ -ids => [$id1, $id2, ...] });
Return the annotations associated with each prefixed ID. Annotations are comments attached to each feature (gene), and include past functional assignments as well as more general information.
The parameter should be a reference to a hash with the following keys.
Reference to a list of feature IDs.
Database source of the IDs specified: SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.
Returns a reference to a hash mapping every incoming ID to a list of 3-tuples, each consisting of (0) annotation text, (1) the name of the annotator, and (2) the timestamp of the annotation (as a number of seconds since the epoch).
$idHash = { $id1 => [[$annotation1a, $name1a, $time1a], [$annotation1b, $name1b, $time1b], ...], $id2 => [[$annotation2a, $name2a, $time2a], [$annotation2b, $name2b, $time2b], ...], ... };
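The timestamps come back as seconds since the epoch, so they usually need to be converted before display. The following sketch sorts each feature's annotations chronologically and formats the time with POSIX strftime in UTC; the annotation text and annotator names are placeholders.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Placeholder result: ID => list of [text, annotator, epoch seconds].
my $idHash = {
    'fig|100226.1.peg.3361' => [
        ['second note', 'annotatorB', 86400],
        ['first note',  'annotatorA', 0],
    ],
};

# Build one display line per annotation, oldest first.
my @lines;
for my $id (sort keys %$idHash) {
    for my $ann (sort { $a->[2] <=> $b->[2] } @{ $idHash->{$id} }) {
        my ($text, $who, $time) = @$ann;
        my $stamp = strftime('%Y-%m-%d', gmtime($time));
        push @lines, "$stamp $who: $text";
    }
}
print "$_\n" for @lines;
```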
my $featureHash = $sapObject->ids_to_functions({ -ids => [$id1, $id2, ...], -source => 'CMR', -genome => $genome });
Return the functional assignment for each feature in the incoming list.
The parameter should be a reference to a hash with the following keys.
Reference to a list of feature IDs.
Database source of the IDs specified: SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.
ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return genes for all genomes.
Returns a reference to a hash mapping each feature ID to the feature's current functional assignment. Features that do not exist in the database will not be present in the hash. For IDs that correspond to multiple features, only one functional assignment will be returned.
$featureHash = { $id1 => $function1, $id2 => $function2, ...};
my $roleHash = $sapObject->occ_of_role({ -roles => [$role1, $role2, ...], -functions => [$function3, $function4, ...], -genomes => [$genome1, $genome2, ...], });
Search for features in the specified genomes with the indicated roles or functions.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the roles to search for.
Reference to a list of the functional assignments to search for.
Reference to a list of the IDs of the genomes whose genes are to be searched for the specified roles and assignments.
Returns a reference to a hash that maps each specified role ID or functional assignment to a list of the FIG IDs of genes that have that role or assignment.
$roleHash = { $role1 => [$fid1a, $fid1b, ...], $role2 => [$fid2a, $fid2b, ...], $function3 => [$fid3a, $fid3b, ...], $function4 => [$fid4a, $fid4b, ...], ... };
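The result is keyed by role or function, but it is sometimes more useful keyed by gene. The sketch below inverts a hand-built hash in the documented shape so that each FIG ID maps to the roles found for it; the role names and feature IDs are placeholders.

```perl
use strict;
use warnings;

# Placeholder result: role or function => list of FIG IDs.
my $roleHash = {
    'role one' => ['fig|1.1.peg.1', 'fig|1.1.peg.2'],
    'role two' => ['fig|1.1.peg.2'],
};

# Invert the mapping: FIG ID => list of roles (in sorted role order).
my %rolesFor;
for my $role (sort keys %$roleHash) {
    push @{ $rolesFor{$_} }, $role for @{ $roleHash->{$role} };
}
print "$_: ", join('; ', @{ $rolesFor{$_} }), "\n" for sort keys %rolesFor;
```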
my $complexList = $sapObject->all_complexes();
Return a list of all the complexes in the database.
Returns a reference to a list of complex IDs.
$complexList = [$cpx1, $cpx2, ...];
my $modelHash = $sapObject->all_models();
Return a hash of all the models in the database, mapping each one to the relevant genome.
Returns a reference to a hash that maps each model ID to a genome ID.
$modelHash = { $model1 => $genome1, $model2 => $genome2, ... };
my $reactions = $sapObject->all_reactions();
Return a list of all the reactions in the database.
Returns a reference to a list of all the reactions.
$reactions = [$rx1, $rx2, ...];
my $rolesList = $sapObject->all_roles_used_in_models();
Return a list of all the roles used in models.
Returns a reference to a list of role names. Each named role triggers a complex used in at least one reaction belonging to a model.
$rolesList = [$role1, $role2, ...];
my $complexHash = $sapObject->complex_data({ -ids => [$cpx1, $cpx2, ...], -data => [$fieldA, $fieldB, ...] });
Return the specified data items for each incoming reaction complex.
Reference to hash with the following keys.
Reference to a list of the IDs of reaction complexes of interest.
Reference to a list of the names of the data items desired for each of the specified complexes.
Name of the complex (or undef if the complex is nameless).
Reference to a list of the reactions in the complex.
Reference to a list of 2-tuples for the roles in the complex, each containing (0) the role name, and (1) a flag that is TRUE if the role is optional to trigger the complex and FALSE if it is necessary.
Returns a reference to a hash mapping each incoming complex to an n-tuple containing the desired data fields in the order specified.
$complexHash = { $cpx1 => [$data1A, $data1B, ...], $cpx2 => [$data2A, $data2B, ...], ... };
my $reactionHash = $sapObject->coupled_reactions({ -ids => [$rx1, $rx2, ...] });
For each of a set of reactions, get the adjacent reactions in the metabolic network. Two reactions are considered adjacent if they share at least one compound that is neither a cofactor nor a ubiquitous compound (like water or oxygen). The compounds that relate the adjacent reactions are called the connecting compounds. In most cases, each pair of adjacent reactions will have only one connecting compound, but this is not guaranteed to be true.
The parameter should be a reference to a hash with the following keys.
Reference to a list of reaction IDs.
Returns a reference to a hash mapping each reaction ID to a sub-hash. Each sub-hash maps adjacent reactions to the relevant connecting compounds.
$reactionHash = { $rx1 => { $rx1a => [$cpd1ax, $cpd1ay, ...], $rx1b => [$cpd1bx, $cpd1by, ...], ... }, ... };
my $modelHash = $sapObject->models_to_reactions({ -ids => [$model1, $model2, ...] });
Return the list of reactions in each specified model.
The parameter should be a reference to a hash with the following keys.
Reference to a list of model IDs, indicating the models of interest.
Returns a reference to a hash that maps each model ID to a list of the reactions in the model.
$modelHash = { $model1 => [$rx1a, $rx1b, ...], $model2 => [$rx2a, $rx2b, ...], ... };
my $reactionHash = $sapObject->reactionNeighbors({ -ids => [$rx1, $rx2, ...], -depth => 1 });
Return a list of the reactions in the immediate neighborhood of the specified reactions. A separate neighborhood list will be generated for each incoming reaction; the neighborhood will consist of reactions connected to the incoming reaction and reactions connected to those reactions up to the specified depth. (Two reactions are connected if they have a compound in common that is not a cofactor or a ubiquitous chemical like water or ATP).
The parameter should be a reference to a hash with the following keys:
Reference to a list of IDs for the reactions of interest.
Number of levels to which the neighborhood search should take place. If the depth is n, then the neighborhood will consist of the original reaction and every other reaction for which there is a sequence of n+1 or fewer reactions starting with the original and ending with the other reaction. Thus, if n is zero, the original reaction is returned as a singleton. If n is 1, then the neighborhood is the original reaction and every reaction connected to it. The default is 2.
Returns a reference to a hash mapping each incoming reaction to a sub-hash. The sub-hash maps each reaction in the neighborhood to its distance from the original reaction.
$reactionHash = { $rx1 => { $rx1a => $dist1a, $rx1b => $dist1b, ... }, $rx2 => { $rx2a => $dist2a, $rx2b => $dist2b, ... }, ... };
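The depth rule can be pictured as a breadth-first search that stops expanding once the depth limit is reached. The sketch below is a local illustration over a made-up adjacency list, not the server's implementation; the sub name neighborhood and the reaction IDs are hypothetical, and it assumes the original reaction appears in its own neighborhood at distance 0.

```perl
use strict;
use warnings;

# Breadth-first search from $start, recording the distance of each reaction
# reached within $depth steps.
sub neighborhood {
    my ($adjacent, $start, $depth) = @_;
    my %dist  = ($start => 0);
    my @queue = ($start);
    while (@queue) {
        my $rx = shift @queue;
        next if $dist{$rx} >= $depth;        # do not expand past the limit
        for my $next (@{ $adjacent->{$rx} || [] }) {
            next if exists $dist{$next};     # already reached by a shorter path
            $dist{$next} = $dist{$rx} + 1;
            push @queue, $next;
        }
    }
    return \%dist;
}

# Hypothetical chain of connected reactions: rxA - rxB - rxC - rxD.
my %adjacent = (
    rxA => ['rxB'],
    rxB => ['rxA', 'rxC'],
    rxC => ['rxB', 'rxD'],
    rxD => ['rxC'],
);
my $d0 = neighborhood(\%adjacent, 'rxA', 0);
my $d2 = neighborhood(\%adjacent, 'rxA', 2);
print join(', ', map { "$_ => $d2->{$_}" } sort keys %$d2), "\n";
```

With depth 0 only rxA is returned; with depth 2 the search reaches rxB and rxC but not rxD.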
my $reactionList = $sapObject->reaction_path({ -roles => [$role1, $role2, ...], -maxLength => 10 });
Find the shortest reaction path that represents as many of the specified roles as possible. Note that since a reaction may be associated with multiple roles, it is possible for a single role to be represented more than once in the path.
The search is artificially limited to paths under a maximum length that can be specified in the parameters.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the roles to be covered by the reaction path.
Maximum number of reactions to allow in the reaction path. The default is two more than the number of roles.
Returns a reference to a list of the best reaction paths. Each reaction path is represented by a list of lists, the sub-lists containing the reaction IDs followed by the roles represented by the reaction. The paths returned will be the shortest ones found with the minimal number of missing roles.
$reactionList = [ [[$rxn1a, $role1ax, $role1ay, ...], [$rxn1b, $role1bx, $role1by, ...], ...], [[$rxn2a, $role2ax, $role2ay, ...], [$rxn2b, $role2bx, $role2by, ...], ...], ... ];
my $reactionHash = $sapObject->reaction_strings({ -ids => [$rx1, $rx2, ...], -roles => 1, -names => 1 });
Return the display string for each reaction. The display string contains the compound IDs (as opposed to the atomic formulas) and the associated stoichiometries, with the substrates on the left of the arrow and the products on the right.
The parameter should be a reference to a hash with the following keys.
Reference to a list of IDs for the reactions of interest.
If TRUE, then each reaction string will be associated with a list of the reaction's roles in the result. The default is FALSE.
If 1, then the compound name will be included with the ID in the output. If only, the compound name will be included instead of the ID. If 0, only the ID will be included. The default is 0.
Returns a reference to a hash mapping each reaction ID to a displayable string describing the reaction. If -roles is TRUE, then instead of a string, the hash will map each reaction ID to a list consisting of the string followed by the roles associated with the reaction.
$reactionHash = { $rx1 => $string1, $rx2 => $string2, ... };
$reactionHash = { $rx1 => [$string1, $role1a, $role1b, ...], $rx2 => [$string2, $role2a, $role2b, ...], ... };
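Because the value for each reaction is either a plain string or a list, depending on -roles, calling code may want to handle both shapes uniformly. A minimal sketch follows, using a hypothetical helper named unpack_reaction and placeholder values.

```perl
use strict;
use warnings;

# Split one value from the result hash into the display string and its roles.
# With -roles FALSE the value is a plain string; with -roles TRUE it is a list
# whose first element is the string.
sub unpack_reaction {
    my ($value) = @_;
    my ($string, @roles) = ref($value) eq 'ARRAY' ? @$value : ($value);
    return ($string, \@roles);
}

# Placeholder values in both documented shapes.
my ($s1, $r1) = unpack_reaction('display string');
my ($s2, $r2) = unpack_reaction(['display string', 'role a', 'role b']);
print "$s2 [", join('; ', @$r2), "]\n";
```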
my $reactionHash = $sapObject->reactions_to_complexes({ -ids => [$rxn1, $rxn2, ...] });
Return the complexes containing each reaction. Note that most reactions are in more than one complex, so the complexes for each reaction are returned as a list.
The parameter should be a reference to a hash with the following keys.
Reference to a list of reaction IDs for the reactions of interest.
Returns a reference to a hash mapping each incoming reaction to a list of the associated complexes.
$reactionHash = { $rxn1 => [$cpx1a, $cpx1b, ...], $rxn2 => [$cpx2a, $cpx2b, ...], ... };
my $reactionHash = $sapObject->reactions_to_roles({ -ids => [$rx1, $rx2,...] });
Return the roles associated with each reaction.
The parameter should be a reference to a hash with the following keys.
Reference to a list of reaction IDs for the reactions of interest.
Returns a reference to a hash mapping each incoming reaction to a list of the associated roles.
$reactionHash = { $rx1 => [$role1a, $role1b, ...], $rx2 => [$role2a, $role2b, ...], ... };
my $roleHash = $sapObject->role_neighbors({ -ids => [$role1, $role2, ...] });
For each role, return a list of roles in the immediate chemical neighborhood. A role is in the immediate chemical neighborhood of another role if the two roles are associated with reactions that share a compound that is not ubiquitous or a cofactor.
The parameter should be a reference to a hash with the following keys:
Reference to a list of role names.
Returns a reference to a hash that maps each incoming role name to a list of the names of the neighboring roles.
$roleHash = { $role1 => [$role1a, $role1b, ...], $role2 => [$role2a, $role2b, ...], ... };
my $roleHash = $sapObject->role_reactions({ -ids => [$role1, $role2, ...], -formulas => 1 });
Return a list of all the reactions associated with each incoming role.
The parameter should be a reference to a hash with the following keys.
Reference to a list of role IDs for the roles of interest.
If TRUE, then each reaction will be associated with its formula. The default is FALSE, in which case for each role a simple list of reactions is returned.
Returns a reference to a hash, keyed by role ID. If -formulas is FALSE, then each role will map to a list of reaction IDs. If -formulas is TRUE, then each role maps to a sub-hash keyed by reaction ID. The sub-hash maps each reaction to a chemical formula string with compound IDs in place of the chemical labels.
$roleHash = { $role1 => [$rxn1a, $rxn1b, ...], $role2 => [$rxn2a, $rxn2b, ...], ... };
$roleHash = { $role1 => { $rx1a => "$s1a1*$cpd1a1 + $s1a2*$cpd1a2 + ... => $s1ax*$cpd1ax + $s1ay*$cpd1ay + ...", $rx1b => "$s1b1*$cpd1b1 + $s1b2*$cpd1b2 + ... => $s1bx*$cpd1bx + $s1by*$cpd1by + ...", ... }, $role2 => { $rx2a => "$s2a1*$cpd2a1 + $s2a2*$cpd2a2 + ... => $s2ax*$cpd2ax + $s2ay*$cpd2ay + ...", $rx2b => "$s2b1*$cpd2b1 + $s2b2*$cpd2b2 + ... => $s2bx*$cpd2bx + $s2by*$cpd2by + ...", ... }, ... };
my $roleHash = $sapObject->roles_to_complexes({ -ids => [$role1, $role2, ...], });
Return the complexes (sets of related reactions) associated with each role in the incoming list. Roles trigger many complexes, and a complex may be triggered by many roles. A given role is considered either optional or necessary to the complex, and an indication of this will be included in the output.
The parameter should be a reference to a hash with the following keys:
Reference to a list of the IDs for the roles of interest.
Returns a reference to a hash mapping each incoming role ID to a list of 2-tuples, each consisting of (0) a complex ID, and (1) a flag that is TRUE if the role is optional and FALSE if the role is necessary for the complex to trigger.
$roleHash = { $role1 => [[$complex1a, $flag1a], [$complex1b, $flag1b], ...], $role2 => [[$complex2a, $flag2a], [$complex2b, $flag2b], ...], ... };
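The optional flag in each 2-tuple makes it straightforward to separate the complexes a role must be present for from those where it is optional. The sketch below partitions a hand-built result in the documented shape; the role and complex names are placeholders.

```perl
use strict;
use warnings;

# Placeholder result: role => list of [complex ID, optional flag].
my $roleHash = {
    'role one' => [ ['cpxA', 0], ['cpxB', 1], ['cpxC', 0] ],
};

# Partition each role's complexes by whether the role is optional.
my (%necessary, %optional);
for my $role (keys %$roleHash) {
    for my $pair (@{ $roleHash->{$role} }) {
        my ($complex, $isOptional) = @$pair;
        my $bucket = $isOptional ? \%optional : \%necessary;
        push @{ $bucket->{$role} }, $complex;
    }
}
print "necessary: @{ $necessary{'role one'} }\n";
print "optional:  @{ $optional{'role one'} }\n";
```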
my $idHash = $sapObject->dlits_for_ids({ -ids => [$id1, $id2, ...], -full => 1 });
Find the PUBMED literature references for a list of proteins. The proteins can be specified either as FIG feature IDs or protein sequence MD5s.
The parameter should be a reference to a hash with the following keys.
Reference to a list of gene and protein IDs. For each gene, literature references will be returned for the feature's protein. For each protein, the literature references for the protein will be returned. Genes should be specified using FIG feature IDs and proteins using the MD5 of the protein sequence.
If TRUE, then in addition to each literature article's PUBMED ID, the article title and URL will be returned. (NOTE: these will not always be available). The default is FALSE.
Returns a reference to a hash that maps each incoming ID to a list of publications. The publications will normally be represented by PUBMED IDs, but if -full is TRUE, then each will be represented by a 3-tuple consisting of (0) the PUBMED ID, (1) the article title, and (2) the article URL.
$idHash = { $id1 => [$pubmed1a, $pubmed1b, ...], $id2 => [$pubmed2a, $pubmed2b, ...], ... };
$idHash = { $id1 => [[$pubmed1a, $title1a, $url1a], [$pubmed1b, $title1b, $url1b], ...], $id2 => [[$pubmed2a, $title2a, $url2a], [$pubmed2b, $title2b, $url2b], ...], ... };
my $labelHash = $sapObject->equiv_ids_for_sequences({ -seqs => [[$label1, $comment1, $sequence1], [$label2, $comment2, $sequence2], ...] });
Find all the identifiers in the database that produce the specified proteins.
The parameter should be a reference to a hash with the following keys.
Reference to a list of protein specifications. A protein specification can be a FASTA string, a 3-tuple consisting of (0) a label, (1) a comment, and (2) a protein sequence, OR a 2-tuple consisting of (0) a label and (1) a protein sequence. In other words, each specification can be a raw FASTA string, a parsed FASTA string, or a simple [id, sequence] pair. In every case, the protein sequence will be used to find identifiers and the label will be used to identify the results.
Returns a hash mapping each incoming label to a list of identifiers from the database that name the protein or a feature that produces the protein.
$labelHash = { $label1 => [$id1a, $id1b, ...], $label2 => [$id2a, $id2b, ...], ... };
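The three accepted specification forms all reduce to a label and a sequence. The sketch below shows one way to normalize them locally, for example to check inputs before sending them; it is an illustration of the forms described above, not the server's own parsing code. The sub name normalize_spec is hypothetical, and the FASTA handling assumes a single record whose label is the first word of the header line.

```perl
use strict;
use warnings;

# Reduce any of the three specification forms to a [label, sequence] pair.
sub normalize_spec {
    my ($spec) = @_;
    if (ref($spec) eq 'ARRAY') {
        # 3-tuple [label, comment, sequence] or 2-tuple [label, sequence]:
        # the label is first and the sequence is last in both cases.
        return [ $spec->[0], $spec->[-1] ];
    }
    # Raw FASTA string: a ">label comment" header followed by sequence lines.
    my ($header, @lines) = split /\n/, $spec;
    my ($label) = $header =~ /^>\s*(\S+)/;
    my $sequence = join('', @lines);
    $sequence =~ s/\s+//g;
    return [ $label, $sequence ];
}

my $fromFasta  = normalize_spec(">p1 some comment\nMKT\nLLV\n");
my $fromTriple = normalize_spec(['p2', 'a comment', 'MK']);
my $fromPair   = normalize_spec(['p3', 'MA']);
print "$_->[0]: $_->[1]\n" for $fromFasta, $fromTriple, $fromPair;
```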
my $nameHash = $sapObject->find_closest_genes({ -genome => $genome1, -seqs => { $name1 => $seq1, $name2 => $seq2, ... }, -protein => 1 });
Find the closest genes to the specified sequences in the specified genome.
Each indicated sequence will be converted to a DNA sequence and then the contigs of the specified genome will be searched for the sequence. The genes in closest proximity to the sequence will be returned. The sequences are named; in the return hash, the genes found will be associated with the appropriate sequence name.
The parameter should be a reference to a hash with the following keys.
ID of the genome to search.
Reference to a hash mapping names to sequences. The names will be used to associate the genes found with the incoming sequences. DNA sequences should not contain ambiguity characters.
If TRUE, the sequences will be interpreted as protein sequences. If FALSE, the sequences will be interpreted as DNA sequences.
Returns a reference to a hash mapping each sequence name to a list of 3-tuples, each consisting of (0) a gene ID, (1) the location of the gene, and (2) the location of the matching sequence.
$nameHash = { $name1 => [[$fid1a, $loc1a, $match1a], [$fid1b, $loc1b, $match1b], ...], $name2 => [[$fid2a, $loc2a, $match2a], [$fid2b, $loc2b, $match2b], ...], ... };
my $idHash = $sapObject->ids_to_sequences({ -ids => [$id1, $id2, ...], -protein => 1, -fasta => 1, -source => 'LocusTag', -genome => $genome, -comments => { $id1 => $comment1, $id2 => $comment2, ... } });
Compute a DNA or protein string for each incoming feature ID.
The parameter should be a reference to a hash with the following keys.
Reference to a list of feature IDs.
Database source of the IDs specified: SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.
ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for all genomes.
If TRUE, the output FASTA sequences will be protein sequences; otherwise, they will be DNA sequences. The default is FALSE.
If TRUE, the output sequences will be multi-line FASTA strings instead of sequences. The default is FALSE, meaning the output sequences will be ordinary strings.
Allows the user to add a label or description to each FASTA-formatted sequence. The value is a reference to a hash whose keys are the IDs and whose values are the desired labels. This parameter is only used when the -fasta
option is specified.
Returns a hash mapping the incoming IDs to sequence strings. IDs that are not found in the database will not appear in the hash.
$idHash = { $id1 => $sequence1, $id2 => $sequence2, ... };
my $locHash = $sapObject->locs_to_dna({ -locations => { $label1 => $loc1, $label2 => $loc2, ... }, -fasta => 1 });
Return the DNA sequences for the specified locations.
The parameter should be a reference to a hash with the following keys.
Reference to a hash that maps IDs to locations. A location can be in the form of a "Location String", a reference to a list of location strings, a FIG feature ID, or a contig ID.
If TRUE, the DNA sequences will be returned in FASTA format instead of raw format. The default is FALSE.
Returns a reference to a hash that maps the incoming IDs to FASTA sequences for the specified DNA locations. The FASTA ID will be the ID specified in the incoming hash.
$locHash = { $label1 => $sequence1, $label2 => $sequence2, ... };
my $roleHash = $sapObject->roles_to_proteins({ -roles => [$role1, $role2, ...] });
Return a list of the proteins associated with each of the incoming functional roles.
The parameter should be a reference to a hash with the following keys.
Reference to a list of functional roles.
Returns a reference to a hash mapping each incoming role to a list of the proteins generated by features that implement the role. The proteins will be represented by MD5 protein IDs.
$roleHash = { $role1 => [$prot1a, $prot1b, ...], $role2 => [$prot2a, $prot2b, ...], ... };
my $featureHash = $sapObject->upstream({ -ids => [$fid1, $fid2, ...], -size => 200, -skipGene => 1, -fasta => 1, -comments => { $fid1 => $comment1, $fid2 => $comment2, ...} });
Return the DNA sequences for the upstream regions of the specified features. The nucleotides inside coding regions are displayed in upper case; others are displayed in lower case.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG feature IDs of interest.
Number of upstream nucleotides to include in the output. The default is 200
.
If TRUE, only the upstream region is included. Otherwise, the content of the feature is included in the output.
If TRUE, the output sequences will be multi-line FASTA strings instead of sequences. The default is FALSE, meaning the output sequences will be ordinary strings.
Allows the user to add a label or description to each FASTA-formatted sequence. The value is a reference to a hash whose keys are the IDs and whose values are the desired labels. This parameter is only used when the -fasta
option is specified.
Returns a hash mapping each incoming feature ID to the DNA sequence of its upstream region.
$featureHash = { $fid1 => $sequence1, $fid2 => $sequence2, ... };
my $expList = $sapObject->all_experiments();
Return a list of all the experiment names.
Returns a reference to a list of experiment names.
$expList = [$exp1, $exp2, ...];
my $regulonHash = $sapObject->atomic_regulon_vectors({ -ids => [$ar1, $ar2, ...], -raw => 0 });
Return a map of the expression levels for each specified atomic regulon. The expression levels will be returned in the form of vectors with values -1
(suppressed), 1
(expressed), or 0
(unknown) in each position. The positions will correspond to the experiments in the order returned by "genome_experiments".
The parameter should be a reference to a hash with the following keys.
Reference to a list of atomic regulon IDs.
If TRUE, then the vectors will be returned in the form of strings. Each string will have the character +
, -
, or space for the values 1, -1, and 0 respectively.
Returns a reference to a hash mapping the incoming atomic regulon IDs to the desired vectors. The vectors will normally be references to lists of the values 1, 0, and -1, but they can also be represented as strings.
$regulonHash = { $ar1 => [$level1a, $level1b, ...], $ar2 => [$level2a, $level2b, ...], ... };
$regulonHash = { $ar1 => $string1, $ar2 => $string2, ... };
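The relationship between the two output forms can be sketched in plain Perl. The example below converts list-form vectors into the string form described for -raw (+ for 1, - for -1, space for 0). The regulon IDs and levels are made up for illustration, and the helper name vector_to_string is hypothetical; a real hash would come from the server.

```perl
use strict;
use warnings;

# Convert one numeric vector (values 1, -1, 0) to the string form
# described for the -raw option: '+' for 1, '-' for -1, ' ' for 0.
sub vector_to_string {
    my ($vector) = @_;
    return join '', map { $_ == 1 ? '+' : $_ == -1 ? '-' : ' ' } @$vector;
}

# Hypothetical result; a real one would be returned by atomic_regulon_vectors.
my $regulonHash = {
    'AR1' => [1, -1, 0, 1],
    'AR2' => [0, 0, -1, 1],
};

# Build a parallel hash that holds the string form of every vector.
my %asStrings = map { $_ => vector_to_string($regulonHash->{$_}) } keys %$regulonHash;

print "$_: [$asStrings{$_}]\n" for sort keys %asStrings;
```

Because each position corresponds to one experiment, the string form keeps the same length as the list form.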
my $regulonHash = $sapObject->atomic_regulons({ -id => $genome1 });
Return a map of the atomic regulons for the specified genome. Each atomic regulon is a set of genes that are always regulated together. The map will connect each regulon ID to a list of those genes. A given gene can only be in one atomic regulon.
The parameter should be a reference to a hash with the following key.
The ID of the genome of interest.
Returns a reference to a hash that maps each atomic regulon ID to a list of the FIG IDs of its constituent genes.
$regulonHash = { $regulon1 => [$fid1a, $fid1b, ...], $regulon2 => [$fid2a, $fid2b, ...], ... };
my $fidHash = $sapObject->coregulated_correspondence({ -ids => [$fid1, $fid2, ...], -pcLevel => 0.8, -genomes => [$genome1, $genome2, ...] });
Given a gene, return genes that may be coregulated because they correspond to coregulated genes in genomes for which we have expression data (an expression-analyzed genome). For each incoming gene, a corresponding gene will be found in each expression-analyzed genome. The coregulated genes for the corresponding gene will be determined, and then these will be mapped back to the original genome. The resulting genes can be considered likely candidates for coregulation in the original genome.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG feature IDs.
Minimum Pearson coefficient level for a gene to be considered coregulated. The default is 0.5
.
Reference to a list of genome IDs. If specified, only expression data from the listed genomes will be used in the analysis; otherwise, all genomes with expression data will be used.
Returns a reference to a hash that maps each incoming gene to a list of 4-tuples, each 4-tuple consisting of (0) a hypothetical coregulated gene in this genome, (1) a gene in an expression-analyzed genome corresponding to the input gene, (2) a gene in the expression-analyzed genome coregulated with it (and that corresponds to the hypothetical coregulated gene), and (3) the correlation score.
$fidHash = { $fid1 => [[$fid1a, $fid1ax, $fid1ay, $score1a], [$fid1b, $fid1bx, $fid1by, $score1b], ...], $fid2 => [[$fid2a, $fid2ax, $fid2ay, $score2a], [$fid2b, $fid2bx, $fid2by, $score2b], ...], ... };
my $fidHash = $sapObject->coregulated_fids({ -ids => [$fid1, $fid2, ...] });
Given a gene, return the coregulated genes and their Pearson coefficients. Two genes are considered coregulated if there is some experimental evidence that their expression levels are related; the Pearson coefficient indicates the strength of the relationship.
The parameter should be a reference to a hash with the following key.
Reference to a list of FIG feature IDs.
Returns a reference to a hash that maps each incoming FIG ID to a sub-hash. The sub-hash in turn maps each related feature's FIG ID to its Pearson coefficient with the incoming FIG ID.
$fidHash = { $fid1 => { $fid1a => $coeff1a, $fid1b => $coeff1b, ...}, $fid2 => { $fid2a => $coeff2a, $fid2b => $coeff2b, ...}, ... };
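As an illustration of working with this nested hash, the sketch below ranks the partners of one gene by coefficient and drops those below a caller-chosen threshold. The feature IDs and coefficients are invented, and strongest_partners is a hypothetical helper, not part of the server API.

```perl
use strict;
use warnings;

# Hypothetical result in the shape returned by coregulated_fids.
my $fidHash = {
    'fig|100226.1.peg.3361' => {
        'fig|100226.1.peg.3362' => 0.91,
        'fig|100226.1.peg.3370' => 0.62,
        'fig|100226.1.peg.1024' => 0.78,
    },
};

# Return [fid, coefficient] pairs at or above a threshold, strongest first.
# Ties are broken by feature ID so the order is deterministic.
sub strongest_partners {
    my ($partners, $minCoeff) = @_;
    my @keep = grep { $partners->{$_} >= $minCoeff } keys %$partners;
    return [
        map  { [$_, $partners->{$_}] }
        sort { $partners->{$b} <=> $partners->{$a} || $a cmp $b } @keep
    ];
}

my $ranked = strongest_partners($fidHash->{'fig|100226.1.peg.3361'}, 0.7);
printf "%s\t%.2f\n", @$_ for @$ranked;
```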
my $expHash = $sapObject->experiment_fid_levels({ -ids => [$exp1, $exp2, ...] });
Given an experiment, return the on/off levels for all genes in that experiment. An on/off level is either 1
(expressed), -1
(inhibited), or 0
(unknown).
The parameter should be a reference to a hash with the following key.
Reference to a list of experiment IDs.
Returns a reference to a hash that maps each experiment ID to a sub-hash that indicates the expression level of each gene for which the experiment showed a result.
$expHash = { $exp1 => { $fid1a => $level1a, $fid1b => $level1b, ... }, $exp2 => { $fid2a => $level2a, $fid2b => $level2b, ... }, ... };
my $expHash = $sapObject->experiment_regulon_levels({ -ids => [$exp1, $exp2, ...] });
Given an experiment, return the on/off levels for all atomic regulons affected by that experiment. An on/off level is either 1
(expressed), -1
(inhibited), or 0
(unknown).
The parameter should be a reference to a hash with the following key.
Reference to a list of experiment IDs.
Returns a reference to a hash that maps each experiment ID to a sub-hash that indicates the expression level of each atomic regulon for which the experiment showed a result.
$expHash = { $exp1 => { $regulon1a => $level1a, $regulon1b => $level1b, ... }, $exp2 => { $regulon2a => $level2a, $regulon2b => $level2b, ... }, ... };
my $genomeList = $sapObject->expressed_genomes({ -names => 1 });
List the IDs of genomes for which expression data exists in the database.
The parameter should be a reference to a hash with the following keys.
If TRUE, then the return will be a reference to a hash mapping the genome IDs to genome names; if FALSE, the return will be a reference to a list of genome IDs. The default is FALSE.
Returns a reference to a list of genome IDs or a hash mapping genome IDs to genome names.
$genomeList = [$genome1, $genome2, ...];
$genomeList = { $genome1 => $name1, $genome2 => $name2, ... };
my $fidHash = $sapObject->fid_experiments({ -ids => [$fid1, $fid2, ...], -experiments => [$exp1, $exp2, ...] });
Return the expression levels for the specified features in all experiments for which they have results.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG feature IDs.
A list of experiments. If specified, only levels from the indicated experiments will be returned.
Returns a reference to a hash mapping each incoming feature ID to a list of 3-tuples, each 3-tuple containing (0) an experiment ID, (1) the expression on/off indication (1/0/-1), and (2) the normalized rma-value.
$fidHash = { $fid1 => [[$exp1a, $level1a, $rma1a], [$exp1b, $level1b, $rma1b], ...], $fid2 => [[$exp2a, $level2a, $rma2a], [$exp2b, $level2b, $rma2b], ...], ... };
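The 3-tuples can be tallied with a few lines of Perl. The sketch below counts how often a gene is on, off, or unknown and computes the fraction of experiments in which it is on. The experiment IDs, levels, and RMA values are invented, and summarize_levels is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical result in the shape returned by fid_experiments:
# each tuple is [experiment ID, on/off level, normalized RMA value].
my $fidHash = {
    'fig|100226.1.peg.3361' => [
        ['exp1',  1, 9.4],
        ['exp2', -1, 3.1],
        ['exp3',  0, 6.0],
        ['exp4',  1, 8.8],
    ],
};

# Count how often each level occurs and compute the fraction of "on" calls.
sub summarize_levels {
    my ($tuples) = @_;
    my %count = ('1' => 0, '0' => 0, '-1' => 0);
    $count{ $_->[1] }++ for @$tuples;
    my $total = scalar @$tuples;
    return {
        on         => $count{'1'},
        off        => $count{'-1'},
        unknown    => $count{'0'},
        fractionOn => $total ? $count{'1'} / $total : 0,
    };
}

my $summary = summarize_levels($fidHash->{'fig|100226.1.peg.3361'});
printf "on=%d off=%d unknown=%d fractionOn=%.2f\n",
    @{$summary}{qw(on off unknown fractionOn)};
```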
my $regulonHash = $sapObject->fid_vectors({ -ids => [$fid1, $fid2, ...], -raw => 0 });
Return a map of the expression levels for each specified feature (gene). The expression levels will be returned in the form of vectors with values -1
(suppressed), 1
(expressed), or 0
(unknown) in each position. The positions will correspond to the experiments in the order returned by "genome_experiments".
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG feature IDs.
If TRUE, then the vectors will be returned in the form of strings. Each string will have the character +
, -
, or space for the values 1, -1, and 0 respectively.
Returns a reference to a hash mapping the incoming feature IDs to the desired vectors. The vectors will normally be references to lists of the values 1, 0, and -1, but they can also be represented as strings.
$regulonHash = { $fid1 => [$level1a, $level1b, ...], $fid2 => [$level2a, $level2b, ...], ... };
$regulonHash = { $fid1 => $string1, $fid2 => $string2, ... };
my $genomeHash = $sapObject->fids_expressed_in_range({ -ids => [$genome1, $genome2, ...], -minLevel => $min, -maxLevel => $max });
For each genome, return the genes that are expressed in a given fraction of the experiments for that genome.
The parameter should be a reference to a hash containing the following keys.
Reference to a list of IDs for the genomes of interest.
Minimum expression level. Only genes expressed at least this fraction of the time will be output. Must be between 0
and 1
(inclusive) to be meaningful. The default is 0
, which gets everything less than or equal to the maximum level.
Maximum expression level. Only genes expressed no more than this fraction of the time will be output. Must be between 0
and 1
(inclusive) to be meaningful. The default is 1
, which gets everything greater than or equal to the minimum level.
Returns a hash that maps each incoming genome ID to a sub-hash. The sub-hash maps the FIG ID of each qualifying feature to the level (as a fraction of the total experiments recorded) at which it is expressed.
$genomeHash = { $genome1 => { $fid1a => $level1a, $fid1b => $level1b, ...}, $genome2 => { $fid2a => $level2a, $fid2b => $level2b, ...}, ... };
my $fidHash = $sapObject->fids_to_regulons({ -ids => [$fid1, $fid2, ...] });
Return the atomic regulons associated with each incoming gene.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG feature IDs for the genes of interest.
Returns a reference to a hash of hashes, keyed on FIG feature ID. Each feature is mapped to a sub-hash that maps the feature's atomic regulons to the number of features in each regulon.
$fidHash = { $fid1 => { $regulon1a => $size1a, $regulon1b => $size1b, ...}, $fid2 => { $regulon2a => $size2a, $regulon2b => $size2b, ...}, ... };
my $genomeHash = $sapObject->genome_experiments({ -ids => [$genome1, $genome2, ...] });
Return a list of the experiments for each indicated genome.
The parameter should be a reference to a hash with the following keys.
Reference to a list of genome IDs. For each genome ID, a list of relevant experiments will be produced.
Returns a hash mapping each incoming genome ID to a list of experiments related to that genome ID.
$genomeHash = { $genome1 => [$exp1a, $exp1b, ...], $genome2 => [$exp2a, $exp2b, ...], ... };
my $fidHash = $sapObject->genome_experiment_levels({ -genome => $genome1, -experiments => [$exp1, $exp2, ...] });
Return the expression levels for the features of the specified genome in all experiments for which they have results.
The parameter should be a reference to a hash with the following keys.
ID of a genome for which expression data is present.
A list of experiments. If specified, only levels from the indicated experiments will be returned.
Returns a reference to a hash mapping each of the genome's feature IDs to a list of 3-tuples, each 3-tuple containing (0) an experiment ID, (1) the expression on/off indication (1/0/-1), and (2) the normalized rma-value.
$fidHash = { $fid1 => [[$exp1a, $level1a, $rma1a], [$exp1b, $level1b, $rma1b], ...], $fid2 => [[$exp2a, $level2a, $rma2a], [$exp2b, $level2b, $rma2b], ...], ... };
my $regHash = $sapObject->regulons_to_fids({ -ids => [$regulon1, $regulon2, ...] });
Return the list of genes in each specified atomic regulon.
The parameter should be a reference to a hash with the following keys.
Reference to a list of atomic regulon IDs.
Returns a reference to a hash mapping each incoming atomic regulon ID to a list of the FIG feature IDs for the genes found in the regulon.
$regHash = { $regulon1 => [$fid1a, $fid1b, ...], $regulon2 => [$fid2a, $fid2b, ...], ... };
NOTE: To get the functional assignment for a feature, see "Annotation and Assertion Data Methods".
my $result = $sapObject->compared_regions({ -focus => $fid1, -genomes => [$genome1, $genome2, ... ], -extent => 16000 });
Return information about the context of a focus gene and the corresponding genes in other genomes (known as pinned genes). The information returned can be used to create a compare-regions display.
The return information will be in the form of a reference to a list of contexts, each context containing genes in a region surrounding the pinned gene on a particular genome. The genome containing the focus gene will always be the first in the list.
The parameter should be a reference to a hash with the following keys.
The FIG ID of the focus gene.
The number of pinned genes desired. If specified, the closest genes to the focus gene will be located, at most one per genome. The default is 4
.
Reference to a list of genomes. If specified, only genes in the specified genomes will be considered pinned.
Reference to a list of FIG feature IDs. The listed genes will be used as the pinned genes. If this option is specified, it overrides -count
and -genomes
.
The number of base pairs to show in the context for each particular genome. The default is 16000
.
Returns a hash that maps each focus gene to the compared regions view for that gene.
Each compared regions view is a list of hashes, one hash per genome.
Each genome has the following keys:
genome_id => this genome's ID
genome_name => this genome's name
row_id => the row number for this genome
features => the features for this genome
The feature lists will consist of one or more 9-tuples, one per gene in the context. Each 9-tuple will contain (0) the gene's FIG feature ID, (1) its functional assignment, (2) its FIGfam ID, (3) the contig ID, (4) the start location, (5) the end location, (6) the direction (+
or -
), (7) the row index, and (8) the color index. All genes with the same color have similar functions.
$result = { focus_fid => [ { row_id => 0, genome_name => "g1name", genome_id => "g1id", features => [[$fid1a, $function1a, $figFam1a, $contig1a, $start1a, $end1a, $dir1a, 0, $color1a], [$fid1b, $function1b, $figFam1b, $contig1b, $start1b, $end1b, $dir1b, 0, $color1b], ... ], }, { row_id => 1, genome_name => "g2name", genome_id => "g2id", features => [[$fid2a, $function2a, $figFam2a, $contig2a, $start2a, $end2a, $dir2a, 1, $color2a], [$fid2b, $function2b, $figFam2b, $contig2b, $start2b, $end2b, $dir2b, 1, $color2b], ... ], }, ... ] };
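The nested structure can be walked as shown in the following sketch, which groups feature IDs by color index across all rows. All IDs, names, functions, and coordinates are invented for illustration; only the key names and tuple positions are taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical result in the documented shape (all values are made up).
my $result = {
    'fig|100226.1.peg.3361' => [
        {   row_id => 0, genome_id => '100226.1', genome_name => 'Genome A',
            features => [
                ['fig|100226.1.peg.3361', 'role X', 'FIG000001', '100226.1:NC_003888', 1000, 1611, '+', 0, 1],
                ['fig|100226.1.peg.3362', 'role Y', 'FIG000002', '100226.1:NC_003888', 1700, 2200, '-', 0, 2],
            ],
        },
        {   row_id => 1, genome_id => '99999.1', genome_name => 'Genome B',
            features => [
                ['fig|99999.1.peg.10', 'role X', 'FIG000001', '99999.1:contigB', 500, 1100, '+', 1, 1],
            ],
        },
    ],
};

# Group feature IDs by color index (tuple position 8) across all rows.
my %byColor;
for my $focus (sort keys %$result) {
    for my $row (@{ $result->{$focus} }) {
        for my $feature (@{ $row->{features} }) {
            my ($fid, $function, $figfam, $contig, $start, $end, $dir, $rowIdx, $color) = @$feature;
            push @{ $byColor{$color} }, $fid;
        }
    }
}

print "color $_: @{ $byColor{$_} }\n" for sort { $a <=> $b } keys %byColor;
```

Since genes with the same color have similar functions, each group collects the genes that a compare-regions display would draw in the same color.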
my $idHash = $sapObject->equiv_sequence_ids({ -ids => [$id1, $id2, ...], -precise => 1 });
Return all identifiers for genes in the database that are protein-sequence-equivalent to the specified identifiers. In this case, the identifiers are assumed to be in their natural form (without prefixes). For each identifier, the identified protein sequences will be found and then for each protein sequence, all identifiers for that protein sequence or for genes that produce that protein sequence will be returned.
Alternatively, you can ask for identifiers that are precisely equivalent, that is, that identify the same location on the same genome.
The parameter should be a reference to a hash with the following keys.
Reference to a list of identifiers of interest. These can be normal feature identifiers in prefixed form (e.g. cmr|NT03SD3201
, gi|90022544
, fig|100226.1.peg.3361
) or their natural, un-prefixed form (NT03SD3201
, 90022544
). In addition, they can be protein sequence IDs formed by taking the hexadecimal MD5 hash of the protein sequence with an optional md5
or gnl|md5
prefix (500009d8cf094fa4e6a1ebb15295c60f
, gnl|md5|6a00b57a9facf5056c68e5d7fe157814
).
If TRUE, then only identifiers that refer to the same location on the same genome will be returned. The default is FALSE (return all sequence-equivalent IDs). If this option is specified, identifiers that refer to proteins rather than features will return no result.
If TRUE, then instead of returning a hash of lists, this method will return a hash of sub-hashes. Each sub-hash will be keyed by the equivalent IDs, and will map each ID to a list of 3-tuples describing assertions about the ID, each 3-tuple consisting of (0) an assertion of function, (1) the source of the assertion, and (2) a flag that is TRUE for an expert assertion and FALSE otherwise. IDs in a sub-hash which are not associated with assertions will map to an empty list.
Returns a reference to a hash that maps each incoming identifier to a list of sequence-equivalent identifiers.
$idHash = { $id1 => [$id1a, $id1b, ...], $id2 => [$id2a, $id2b, ...], ... };
$idHash = { $id1 => { $id1a => [[$assert1ax, $source1ax, $flag1ax], [$assert1ay, $source1ay, $flag1ay], ...], $id1b => [[$assert1bx, $source1bx, $flag1bx], [$assert1by, $source1by, $flag1by], ...], ... }, $id2 => { $id2a => [[$assert2ax, $source2ax, $flag2ax], [$assert2ay, $source2ay, $flag2ay], ...], $id2b => [[$assert2bx, $source2bx, $flag2bx], [$assert2by, $source2by, $flag2by], ...], ... }, ... };
The output identifiers will not include protein sequence IDs: these are allowed on input only as a convenience.
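A protein sequence ID of the kind accepted on input can be computed locally with the core Digest::MD5 module. The sketch below is illustrative only: the helper name protein_md5_id is hypothetical, the sequence is invented, and the normalization (upper-casing and removing whitespace before hashing) is an assumption, since this document does not state how the server normalizes sequences.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build an MD5-based protein ID from an amino acid sequence.
# ASSUMPTION: the sequence is upper-cased and stripped of whitespace
# before hashing; the server's own normalization may differ.
sub protein_md5_id {
    my ($sequence, $prefixed) = @_;
    (my $clean = uc $sequence) =~ s/\s+//g;
    my $digest = md5_hex($clean);
    return $prefixed ? "gnl|md5|$digest" : $digest;
}

# The same (invented) sequence, once with a line break and once in lower case.
my $plain    = protein_md5_id("MKTAYIAKQR\nQISFVKSHFS");
my $prefixed = protein_md5_id('mktayiakqrqisfvkshfs', 1);

print "$plain\n$prefixed\n";
```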
my $featureHash = $sapObject->fid_correspondences({ -ids => [$fid1, $fid2, ...], -genomes => [$genome1, $genome2, ...] });
Return the corresponding genes for the specified features in the specified genomes. The correspondences are determined in the same way as used by "gene_correspondence_map", but this method returns substantially less data.
The parameter should be a reference to a hash with the following keys.
Returns a reference to a hash that maps each incoming feature ID to a list of corresponding feature IDs in the specified genomes. If no sufficiently corresponding feature is found in any of the genomes, the feature ID will map to an empty list.
$featureHash = { $fid1 => [$fid1a, $fid1b, ...], $fid2 => [$fid2a, $fid2b, ...], ... };
my $featureHash = $sapObject->fid_locations({ -ids => [$fid1, $fid2, ...], -boundaries => 1 });
Return the DNA locations for the specified features.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG feature IDs.
If TRUE, then for any multi-location feature, a single location encompassing all the location segments will be returned instead of a list of all the segments. If the segments cross between contigs, then the behavior in this mode is undefined (something will come back, but it may not be what you're expecting). The default is FALSE, in which case the locations for each feature will be presented in a list.
Returns a reference to a hash mapping each feature ID to a list of location strings representing the feature locations in sequence order.
$featureHash = { $fid1 => [$loc1a, $loc1b, ...], $fid2 => [$loc2a, $loc2b, ...], ... };
my $idHash = $sapObject->get_map_for_genome({ -idHash => { $myID1 => [$id1a, $id1b, ...], $myID2 => [$id2a, $id2b, ...], ... }, -genome => $genome1 });
Find FIG IDs corresponding to caller-provided genes in a specific genome.
In some situations you may have multiple external identifiers for various genes in a genome without knowing which ones are present in the Sapling database and which are not. The external identifiers present in the Sapling database are culled from numerous sources, but different genomes will tend to have coverage from different identifier types: some genomes are represented heavily by CMR identifiers and have no Locus Tags, others have lots of Locus Tags but no CMR identifiers, and so forth. This method allows you to throw everything you have at the database in hopes of finding a match.
The parameter should be a reference to a hash with the following keys.
Reference to a hash that maps caller-specified identifiers to lists of external identifiers in prefixed form (e.g. LocusTag:SO1103
, uni|QX8I1
, gi|4808340
). Each external identifier should be an alternate name for the same gene.
ID of a target genome. If specified, only genes in the specified target genome will be returned.
Returns a hash mapping the original caller-specified identifiers to FIG IDs in the target genome. If the identifier list is ambiguous, the first matching FIG ID will be used. If no matching FIG ID is found, an undefined value will be used.
$idHash = { $myID1 => $fid1, $myID2 => $fid2, ... };
my $featureHash = $sapObject->fid_possibly_truncated({ -ids => [$fid1, $fid2, ...], -limit => 300 });
For each specified gene, return stop
if its end is possibly truncated, start
if its beginning is possibly truncated, and an empty string otherwise. Truncation occurs if the gene is located near either edge of a contig.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG gene IDs.
The distance from the end of a contig considered to be at risk for truncation. The default is 300.
Returns a hash mapping each incoming gene ID to the appropriate value (start
if it has a possibly-truncated start, stop
if it has a possibly-truncated stop, or the empty string otherwise). Note that the empty string is expected to be the most common result.
$featureHash = { $fid1 => $note1, $fid2 => $note2, ... };
my $featureHash = $sapObject->fids_to_ids({ -ids => [$fid1, $fid2, ...], -types => [$typeA, $typeB, ...], -protein => 1 });
Find all aliases and/or synonyms for the specified FIG IDs. For each FIG ID, a hash will be returned that maps each ID type to a list of the IDs of that type.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG feature IDs of interest.
Reference to a list of permissible ID types. Only ID types in this list will be present in the output. If omitted, all ID types are permissible.
If TRUE, then IDs for features with equivalent protein sequences will be returned; otherwise, only IDs for precisely equivalent genes will be returned. The default is FALSE.
If TRUE, then the IDs will be returned in their natural form; otherwise, the IDs are returned in prefixed form. The default is FALSE.
Returns a reference to a hash that maps each feature ID to a sub-hash. Each sub-hash maps an ID type to a list of equivalent IDs of that type.
$featureHash = { $fid1 => { $typeA => [$id1A1, $id1A2, ...], $typeB => [$id1B1, $id1B2, ...], ... }, $fid2 => { $typeA => [$id2A1, $id2A2, ...], $typeB => [$id2B1, $id2B2, ...], ... }, ... };
my $fidHash = $sapObject->fids_to_proteins({ -ids => [$fid1, $fid2, ...], -sequence => 1 });
Return the ID or amino acid sequence associated with each specified gene's protein. If the gene does not produce a protein, it will not be included in the output.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG feature IDs, representing the features of interest.
If TRUE, then the output will include protein sequences; otherwise, the output will include MD5 protein IDs. The default is FALSE.
Returns a reference to a hash keyed by feature ID. If -sequence
is FALSE, then the hash maps each feature ID to the MD5 ID of the relevant gene's protein sequence. If -sequence
is TRUE, then the hash maps each feature ID to the relevant protein sequence itself.
$fidHash = { $fid1 => $sequence1, $fid2 => $sequence2, ... };
$fidHash = { $fid1 => $md5id1, $fid2 => $md5id2, ... };
my $featureHash = $sapObject->fids_with_evidence_codes({ -codes => [$code1, $code2, ...], -genomes => [$genome1, $genome2, ...] });
Return the ID, assignment, and evidence for all features having an evidence code of one of the specified types. The output can be restricted to one or more specified genomes.
The parameter should be a reference to a hash with the following keys.
Reference to a list of evidence code types. This is only the prefix, not a full-blown code. So, for example, ilit
would be used for indirect literature references, dlit
for direct literature references, and so forth.
Reference to a list of genome IDs. If no genome IDs are specified, all features in all genomes will be processed.
Returns a hash mapping each feature to a list containing the function followed by all of the feature's evidence codes.
$featureHash = { $fid1 => [$function1, $code1A, $code1B, ...], $fid2 => [$function2, $code2A, $code2B, ...], ... };
my $locHash = $sapObject->genes_in_region({ -locations => [$loc1, $loc2, ...], -includeLocation => 1 });
Return a list of the IDs for the features that overlap the specified regions on a contig.
The parameter should be a reference to a hash with the following keys.
Reference to a list of location strings (e.g. 360108.3:NZ_AANK01000002_264528_264007
or 100226.1:NC_003888_3766170+612
). A location string consists of a contig ID (which includes the genome ID), an underscore, a begin offset, and either an underscore followed by an end offset or a direction (+
or -
) followed by a length.
If TRUE, then instead of mapping each location to a list of IDs, the hash will map each location to a hash reference that maps the IDs to their locations.
Returns a reference to a hash mapping each incoming location string to a list of the IDs for the features that overlap that location.
$locHash = { $loc1 => [$fid1A, $fid1B, ...], $loc2 => [$fid2A, $fid2B, ...], ... };
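The two location string forms described above can be told apart with a pair of regular expressions. The following sketch is a hypothetical helper, not part of the server API. It assumes that coordinates are inclusive (so a begin/end pair spans abs(end - begin) + 1 bases) and that a begin offset greater than the end offset indicates the minus strand; neither assumption is stated explicitly in this document.

```perl
use strict;
use warnings;

# Parse a location string into contig, begin, direction, and length.
# Returns nothing if the string matches neither documented form.
sub parse_location {
    my ($loc) = @_;
    # Form 1: contig_begin followed by a direction and a length.
    if ($loc =~ /^(.+)_(\d+)([+-])(\d+)$/) {
        return { contig => $1, begin => $2, dir => $3, len => $4 };
    }
    # Form 2: contig_begin_end.
    if ($loc =~ /^(.+)_(\d+)_(\d+)$/) {
        my ($contig, $begin, $end) = ($1, $2, $3);
        # ASSUMPTION: begin > end means the minus strand; coordinates are inclusive.
        my $dir = $begin <= $end ? '+' : '-';
        return { contig => $contig, begin => $begin, dir => $dir, len => abs($end - $begin) + 1 };
    }
    return;
}

my $plus = parse_location('100226.1:NC_003888_3766170+612');
my $span = parse_location('360108.3:NZ_AANK01000002_264528_264007');

printf "%s %d %s %d\n", @{$_}{qw(contig begin dir len)} for $plus, $span;
```

The greedy (.+) keeps underscores that belong to the contig ID (as in NC_003888) inside the contig field, because the trailing numeric fields are matched from the end of the string.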
my $featureHash = $sapObject->ids_to_data({ -ids => [$id1, $id2, ...], -data => [$fieldA, $fieldB, ...], -source => 'UniProt' });
Return the specified data items for the specified features.
The parameter should be a reference to a hash with the following keys.
Reference to a list of gene identifiers. Normally, these would be FIG feature IDs, but other identifier types can be specified if you use the -source
option.
Reference to a list of data field names. The possible data field names are given below.
Comma-delimited list of evidence codes indicating the reason for the gene's current assignment.
The FIG ID of the gene.
Current functional assignment.
Name of the genome containing the gene.
Number of base pairs in the gene.
Comma-delimited list of location strings indicating the location of the gene in the genome. A location string consists of a contig ID, an underscore, the starting offset, the strand (+
or -
), and the number of base pairs.
Comma-delimited list of PUBMED IDs for publications related to the gene.
Database source of the IDs specified-- e.g. SEED
for FIG IDs, GENE
for standard gene identifiers, or LocusTag
for locus tags. In addition, you may specify RefSeq
, CMR
, NCBI
, Trembl
, or UniProt
for IDs from those databases. Use mixed
to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed
to allow IDs with prefixing indicating the ID type (e.g. uni|P00934
for a UniProt ID, gi|135813
for an NCBI identifier, and so forth). The default is SEED
.
ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for genes in all genomes.
Returns a hash mapping each incoming ID to a list of tuples. There will be one tuple for each feature identified by the incoming ID (because some IDs are ambiguous, there may be more than one), and each tuple will contain the specified data fields for the computed gene in the specified order.
$featureHash = { $id1 => [$tuple1A, $tuple1B, ...], $id2 => [$tuple2A, $tuple2B, ...], ... };
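Because the tuples are positional, it can be convenient to turn each one into a hash keyed by field name. The sketch below does this with a hash slice. The field names fieldA and fieldB are placeholders for whatever names were passed in -data, and the IDs and values are invented.

```perl
use strict;
use warnings;

# Placeholder field names; substitute the names passed in -data,
# in the same order.
my @fields = ('fieldA', 'fieldB');

# Hypothetical result: one tuple per feature matched by each incoming ID.
my $featureHash = {
    'id1' => [ ['valueA1', 'valueB1'], ['valueA2', 'valueB2'] ],
    'id2' => [ ['valueA3', 'valueB3'] ],
};

# Turn each positional tuple into a hash keyed by field name.
my %records;
for my $id (keys %$featureHash) {
    for my $tuple (@{ $featureHash->{$id} }) {
        my %record;
        @record{@fields} = @$tuple;
        push @{ $records{$id} }, \%record;
    }
}

print "$_ matched ", scalar @{ $records{$_} }, " feature(s)\n" for sort keys %records;
```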
my $idHash = $sapObject->ids_to_fids({ -ids => [$id1, $id2, ...], -protein => 1, -genomeName => $genusSpeciesString, -source => 'UniProt' });
Return a list of the FIG IDs corresponding to each of the specified identifiers. The correspondence can either be gene-based (same feature) or sequence-based (same protein).
The parameter should be a reference to a hash with the following keys.
Reference to a list of identifiers.
Database source of the IDs specified-- SEED
for FIG IDs, GENE
for standard gene identifiers, or LocusTag
for locus tags. In addition, you may specify RefSeq
, CMR
, NCBI
, Trembl
, or UniProt
for IDs from those databases. Use mixed
to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed
to allow IDs with prefixing indicating the ID type (e.g. uni|P00934
for a UniProt ID, gi|135813
for an NCBI identifier, and so forth).
If TRUE, then all FIG IDs for equivalent proteins will be returned. The default is FALSE, meaning that only FIG IDs for the same gene will be returned.
The full or partial name of a genome or a comma-delimited list of genome IDs. This parameter is useful for narrowing the results when a protein match is specified. If it is omitted, no genome filtering is performed.
Returns a reference to a hash mapping each incoming identifier to a list of equivalent FIG IDs.
$idHash = { $id1 => [$fid1A, $fid1B, ...], $id2 => [$fid2A, $fid2B, ...], ... };
my $featureHash = $sapObject->ids_to_genomes({ -ids => [$id1, $id2, ...], -source => 'SwissProt', -name => 1 });
Return the genome information for each incoming gene ID.
The parameter should be a reference to a hash with the following keys.
Reference to a list of gene IDs.
Database source of the IDs specified-- SEED
for FIG IDs, GENE
for standard gene identifiers, or LocusTag
for locus tags. In addition, you may specify RefSeq
, CMR
, NCBI
, Trembl
, or UniProt
for IDs from those databases. Use mixed
to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed
to allow IDs with prefixing indicating the ID type (e.g. uni|P00934
for a UniProt ID, gi|135813
for an NCBI identifier, and so forth). The default is SEED
.
If TRUE, the genome names will be returned; if FALSE, the genome IDs will be returned. The default is FALSE.
Returns a reference to a hash mapping each incoming ID to the associated genome ID, or alternatively to the associated genome name.
$featureHash = { $id1 => $genome1, $id2 => $genome2, ... };
my $geneHash = $sapObject->ids_to_lengths({ -ids => [$id1, $id2, ...], -protein => 1, -source => 'NCBI' });
Return the DNA or protein length of each specified gene.
The parameter should be a reference to a hash with the following keys.
Reference to a list of gene IDs.
Database source of the IDs specified-- SEED
for FIG IDs, GENE
for standard gene identifiers, or LocusTag
for locus tags. In addition, you may specify RefSeq
, CMR
, NCBI
, Trembl
, or UniProt
for IDs from those databases. Use mixed
to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed
to allow IDs with prefixing indicating the ID type (e.g. uni|P00934
for a UniProt ID, gi|135813
for an NCBI identifier, and so forth). The default is SEED
.
ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for genes in all genomes.
If TRUE, then the length of each gene's protein will be returned. Otherwise, the DNA length of each gene will be returned. The default is FALSE (DNA lengths).
Returns a reference to a hash mapping each incoming ID to the length of the associated gene. If no gene is found, or -protein is TRUE and the gene is not a protein-encoding gene, the ID will not be present in the return hash.
$geneHash = { $id1 => $length1, $id2 => $length2, ... };
my $groupHash = $sapObject->make_runs({ -groups => ["$fid0a, $fid0b, ...", "$fid1a, $fid1b, ...", ...], -maxGap => 200, -justFirst => 1, -operonSize => 10000 });
Look at sequences of feature IDs and separate them into operons. An operon contains features that are close together on the same contig going in the same direction.
The parameter should be a reference to a hash with the following keys.
Reference to a list of strings. Each string will contain a comma-separated list of FIG feature IDs for the features in a group. Alternatively, this can be a reference to a list of lists, in which each sub-list contains the feature IDs in a group.
Maximum number of base pairs that can be between two genes in order for them to be considered as part of the same operon. The default is 200.
If TRUE, then only the first feature in an operon will be included in the output operon strings. The default is FALSE.
Estimate of the typical size of an operon. This is a tuning parameter; the default is 10000
.
Returns a hash mapping group numbers to lists of operons. In other words, for each incoming group, the hash will map the group's (zero-based) index number to a list of operon strings. Each operon string is a comma-separated list of feature IDs in operon order.
$groupHash = { 0 => [[$fid1op1, $fid2op1, ...], [$fid1op2, $fid2op2, ...], ... ], 1 => [[$fid1opA, $fid2opA, ...], [$fid1opB, $fid2opB, ...], ... ], ... };
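Because the groups parameter accepts either comma-separated strings or sub-lists, client code that builds groups from mixed sources may want to normalize them first. The following is a minimal sketch of that idea; the helper name group_to_list is hypothetical and the feature IDs are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: accept either documented form of a group (a
# comma-separated string or a list reference) and return a list reference.
sub group_to_list {
    my ($group) = @_;
    return [@$group] if ref($group) eq 'ARRAY';
    my $text = $group;
    $text =~ s/^\s+//;    # trim leading whitespace
    $text =~ s/\s+$//;    # trim trailing whitespace
    return [ split /\s*,\s*/, $text ];
}

# Illustrative feature IDs only.
my @groups = (
    'fig|83333.1.peg.1, fig|83333.1.peg.2',
    [ 'fig|83333.1.peg.10', 'fig|83333.1.peg.11' ],
);
my @normalized = map { group_to_list($_) } @groups;
print join(' ', @$_), "\n" for @normalized;
```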
my $protHash = $sapObject->proteins_to_fids({ -prots => [$prot1, $prot2, ...] });
Return the FIG feature IDs associated with each incoming protein. The protein can be specified as an amino acid sequence or MD5 protein ID.
The parameter should be a reference to a hash with the following keys.
Reference to a list of proteins. Each protein can be specified as either an amino acid sequence or an MD5 protein ID. The method will assume a sequence of 32 hex characters is an MD5 ID and anything else is an amino acid sequence. Amino acid sequences should be in upper-case only.
Returns a hash mapping each incoming protein to a list of FIG feature IDs for the genes that produce the protein.
$protHash = { $prot1 => [$fid1a, $fid1b, ...], $prot2 => [$fid2a, $fid2b, ...], ... };
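The rule for telling MD5 protein IDs apart from amino acid sequences can be mirrored on the client side, for example to upper-case sequences before sending them. The following is a sketch under that stated rule only; the helper names protein_kind and normalize_protein are hypothetical, and the sample values are illustrative.

```perl
use strict;
use warnings;

# Mirror the documented rule: exactly 32 hex characters is treated as an MD5
# protein ID; anything else is treated as an amino acid sequence.
sub protein_kind {
    my ($prot) = @_;
    return $prot =~ /\A[0-9a-fA-F]{32}\z/ ? 'md5' : 'sequence';
}

# Upper-case amino acid sequences; leave MD5 IDs untouched.
sub normalize_protein {
    my ($prot) = @_;
    return protein_kind($prot) eq 'md5' ? $prot : uc($prot);
}

# Illustrative inputs: one 32-character hex string and one short sequence.
my @prots = map { normalize_protein($_) }
    ('d41d8cd98f00b204e9800998ecf8427e', 'mkvlaaGivr');
print join("\n", @prots), "\n";
```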
my $ffHash = $sapObject->all_figfams({ -roles => [$role1, $role2, ...], -functions => [$function1, $function2, ...] });
Return a list of all the FIGfams along with their functions. Optionally, you can specify a role or a function, and only FIGfams with that role or function will be returned.
The parameter should be a reference to a hash with the following keys.
If specified, a reference to a list of roles. Only FIGfams with one of the specified roles (or one of the functions listed in -functions
) will be returned in the hash.
If specified, a reference to a list of functions. Only FIGfams with one of the specified functions (or one of the roles listed in -roles
) will be returned in the hash.
Returns a reference to a hash mapping each qualifying FIGfam ID to its function.
$ffHash = { $ff1 => $function1, $ff2 => $function2, ... };
my $groupList = $sapObject->discriminating_figfams({ -group1 => [$genome1a, $genome1b, ...], -group2 => [$genome2a, $genome2b, ...] });
Determine the FIGfams that discriminate between two groups of genomes.
A FIGfam discriminates between genome groups if it is common in one group and uncommon in the other. The degree of discrimination is assigned a score based on statistical significance, with 0 being insignificant and 2 being extremely significant. FIGfams with a score greater than 1 are returned by this method.
The parameter should be a reference to a hash with the following keys.
Returns a reference to a 2-tuple, consisting of (0) a hash mapping FIGfam IDs to scores for FIGfams common in group 1 and (1) a hash mapping FIGfam IDs to scores for FIGfams common in group 2.
$groupList = [{ $ff1a => $score1a, $ff1b => $score1b, ... }, { $ff2a => $score2a, $ff2b => $score2b, ... }];
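One way to work with the returned 2-tuple is to unpack it and rank each group's FIGfams by score. This sketch uses made-up FIGfam IDs and scores in the documented shape; the helper name by_score is hypothetical.

```perl
use strict;
use warnings;

# Sample 2-tuple in the documented shape; IDs and scores are made up.
my $groupList = [
    { 'FIG00000001' => 1.8, 'FIG00000002' => 1.2 },
    { 'FIG00000003' => 1.5 },
];

# Hypothetical helper: FIGfam IDs of one score hash, highest score first,
# with ties broken alphabetically so the order is stable.
sub by_score {
    my ($scores) = @_;
    return sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b } keys %$scores;
}

my ($group1, $group2) = @$groupList;
my @top1 = by_score($group1);
my @top2 = by_score($group2);
print "Group 1: @top1\nGroup 2: @top2\n";
```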
my $fidList = $sapObject->figfam_fids({ -id => $figFam1, -fasta => 1 });
Return a list of all the protein encoding genes in a FIGfam. The genes can be returned as IDs or as FASTA strings.
The parameter should be a reference to a hash with the following keys.
Returns a reference to a list of genes in the form of FIG feature IDs or protein FASTA strings.
$fidList = [$fid1, $fid2, ...];
$fidList = [$fasta1, $fasta2, ...];
my $fidHash = $sapObject->figfam_fids_batch({ -ids => [$ff1, $ff2, ...], -genomeFilter => $genome1 });
Return a list of all the protein encoding genes in one or more FIGfams. This method is an alternative to "figfam_fids" that is faster when you need the feature IDs but not the protein sequences.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the IDs of the desired FIGfams.
The ID of a genome. If specified, then only feature IDs from the specified genome will be returned.
Returns a hash mapping each incoming FIGfam ID to a list of the IDs for the features in that FIGfam.
$fidHash = { $ff1 => [$fid1a, $fid1b, ...], $ff2 => [$fid2a, $fid2b, ...], ... };
my $ffHash = $sapObject->figfam_function({ -ids => [$ff1, $ff2, ...] });
For each incoming FIGfam ID, return its function, that is, the common functional assignment of all its members.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIGfam IDs.
Returns a hash mapping each incoming FIGfam ID to its function string.
$ffHash = { $ff1 => $function1, $ff2 => $function2, ... };
my $genomeHash = $sapObject->genome_figfams({ -ids => [$genome1, $genome2, ...] });
Compute the list of FIGfams represented in each specific genome.
The parameter should be a reference to a hash with the following keys.
Reference to a list of genome identifiers.
Returns a reference to a hash mapping each incoming genome ID to a list of the IDs of the FIGfams represented in that genome.
$genomeHash = { $genome1 => [$ff1a, $ff1b, ...], $genome2 => [$ff2a, $ff2b, ...], ... };
my $featureHash = $sapObject->ids_to_figfams({ -ids => [$id1, $id2, ...], -functions => 1, -source => 'RefSeq' });
This method returns a hash mapping each incoming feature to its FIGfam.
The parameter should be a reference to a hash with the following keys.
Reference to a list of feature identifiers.
If TRUE, the family function will be returned in addition to the list of FIGfam IDs. In this case, instead of a list of FIGfam IDs, each feature ID will point to a list of 2-tuples, each consisting of (0) a FIGfam ID followed by (1) a function string. The default is FALSE.
Database source of the IDs specified-- SEED
for FIG IDs, GENE
for standard gene identifiers, or LocusTag
for locus tags. In addition, you may specify RefSeq
, CMR
, NCBI
, Trembl
, or UniProt
for IDs from those databases. Use mixed
to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed
to allow IDs with prefixing indicating the ID type (e.g. uni|P00934
for a UniProt ID, gi|135813
for an NCBI identifier, and so forth). The default is SEED
.
ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for genes in all genomes.
Returns a reference to a hash mapping each incoming feature ID to a list of the IDs of the FIGfams that contain it. (In general the list will be a singleton unless the feature ID corresponds to multiple actual features.) Features not in FIGfams will be omitted from the hash.
$featureHash = { $id1 => [$ff1a, $ff1b, ...], $id2 => [$ff2a, $ff2b, ...], ... };
my $ffHash = $sapObject->related_figfams({ -ids => [$ff1, $ff2, ...], -expscore => 1, -all => 1 });
This method takes a list of FIGfam IDs. For each FIGfam, it returns a list of FIGfams related to it by functional coupling.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIGfam IDs.
If TRUE, then the score returned will be the co-expression score. If FALSE, the score returned will be the co-occurrence score. This option is ignored if -all
is specified. The default is FALSE.
If TRUE, then both scores will be returned. The default is FALSE, meaning only one score is returned.
Returns a reference to a hash mapping each incoming FIGfam ID to a list of 2-tuples for other FIGfams. The 2-tuples each consist of (0) a related FIGfam's ID followed by (1) a 2-tuple containing a coupling score and the related FIGfam's function.
$ffHash = { $ff1 => [[$ff1a, [$score1a, $function1a]], [$ff1b, [$score1b, $function1b]], ...], $ff2 => [[$ff2a, [$score2a, $function2a]], [$ff2b, [$score2b, $function2b]], ...], ... };
Returns a reference to a hash mapping each incoming FIGfam ID to a list of 2-tuples for other FIGfams. The 2-tuples each consist of (0) a related FIGfam's ID followed by (1) a 3-tuple containing the co-occurrence coupling score, the co-expression coupling score, and the related FIGfam's function.
$ffHash = { $ff1 => [[$ff1a, [$score1ax, $score1ay, $function1a]], [$ff1b, [$score1bx, $score1by, $function1b]], ...], $ff2 => [[$ff2a, [$score2ax, $score2ay, $function2a]], [$ff2b, [$score2bx, $score2by, $function2b]], ...], ... };
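Since the inner tuple has two elements normally and three when both scores are requested, calling code has to know which shape it asked for. The following sketch converts either shape into a hash record; the helper name coupling_record and the record keys are hypothetical, and the sample values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one [$ffId, [...]] entry into a hash record.
# $all should match the -all flag that was passed on the request.
sub coupling_record {
    my ($tuple, $all) = @_;
    my ($ffId, $data) = @$tuple;
    if ($all) {
        my ($cooc, $coexp, $function) = @$data;
        return { id => $ffId, cooccurrence => $cooc,
                 coexpression => $coexp, function => $function };
    }
    my ($score, $function) = @$data;
    return { id => $ffId, score => $score, function => $function };
}

# Sample data in both documented shapes; values are illustrative only.
my $single = { 'FIG00000001' => [ [ 'FIG00000002', [ 0.9, 'sample function' ] ] ] };
my $both   = { 'FIG00000001' => [ [ 'FIG00000002', [ 0.9, 0.4, 'sample function' ] ] ] };

my $rec1 = coupling_record($single->{'FIG00000001'}[0], 0);
my $rec2 = coupling_record($both->{'FIG00000001'}[0], 1);
print "$rec1->{id}: $rec1->{score}\n";
print "$rec2->{id}: $rec2->{cooccurrence} / $rec2->{coexpression}\n";
```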
my $roleHash = $sapObject->roles_to_figfams({ -roles => [$role1, $role2, ...] });
For each incoming role, return a list of the FIGfams that implement the role, that is, whose functional assignments include the role.
The parameter should be a reference to a hash with the following keys.
Reference to a list of role names.
Returns a reference to a hash mapping each incoming role to a list of FIGfam IDs for the FIGfams that implement the role.
$roleHash = { $role1 => [$ff1a, $ff1b, ...], $role2 => [$ff2a, $ff2b, ...], ... };
my $featureHash = $sapObject->clusters_containing({ -ids => [$fid1, $fid2, ...] });
This method takes as input a list of FIG feature IDs. For each feature, it returns the IDs and functions of other features in the same cluster of functionally-coupled features.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG feature IDs.
For backward compatibility, this method can also take as input a reference to a list of FIG feature IDs.
Returns a reference to a hash. The hash maps each incoming feature ID to a 2-tuple containing (0) the feature's functional assignment and (1) a reference to a hash that maps each clustered feature to its functional assignment.
$featureHash = { $fid1 => [$function1, { $fid1a => $function1a, $fid1b => $function1b, ...}], $fid2 => [$function2, { $fid2a => $function2a, $fid2b => $function2b, ...}], ... };
In backward-compatibility mode, this method returns a reference to a list. For each incoming feature, there is a list entry containing the feature ID, the feature's functional assignment, and a sub-list of 2-tuples. Each 2-tuple contains the ID of another feature in the same cluster and its functional assignment.
my $pairHash = $sapObject->co_occurrence_evidence({ -pairs => ["$fid1:$fid2", "$fid3:$fid4", ...] });
For each specified pair of genes, this method returns the evidence that the genes are functionally coupled (if any); that is, it returns a list of the physically close homologs for the pair.
The parameter should be a reference to a hash with the following keys.
Reference to a list of functionally-coupled pairs. Each pair is represented by two FIG gene IDs, either in the form of a 2-tuple or as a string with the two gene IDs separated by a colon.
Returns a hash mapping each incoming gene pair to a list of 2-tuples. Each 2-tuple contains a pair of physically close genes, the first of which is similar to the first gene in the input pair, and the second of which is similar to the second gene in the input pair. The hash keys will consist of the two gene IDs separated by a colon (e.g. fig|273035.4.peg.1016:fig|273035.4.peg.1018
).
$pairHash = { "$fid1:$fid2" => [[$fid1a, $fid2a], [$fid1b, $fid2b], ...], "$fid3:$fid4" => [[$fid3a, $fid4a], [$fid3b, $fid4b], ...], ... };
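The colon-separated hash keys can be built and taken apart with a pair of small helpers. This is a sketch that assumes FIG gene IDs never contain a colon themselves; the helper names pair_key and split_pair_key are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build the colon-separated key for a gene pair.
sub pair_key {
    my ($fid1, $fid2) = @_;
    return join(':', $fid1, $fid2);
}

# Hypothetical helper: recover the two gene IDs from a key.
# Assumes the IDs themselves contain no colon.
sub split_pair_key {
    my ($key) = @_;
    return split /:/, $key, 2;
}

my $key = pair_key('fig|273035.4.peg.1016', 'fig|273035.4.peg.1018');
my ($first, $second) = split_pair_key($key);
print "$key\n$first\n$second\n";
```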
my $featureHash = $sapObject->conserved_in_neighborhood({ -ids => [$fid1, $fid2, ...] });
This method takes a list of feature IDs. For each feature ID, it will return the set of other features to which it is functionally coupled, along with the appropriate score.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG feature IDs.
For backward compatibility, this method can also take as input a reference to a list of FIG feature IDs.
Returns a reference to a hash mapping each incoming feature ID to a list of 4-tuples, one 4-tuple for each feature coupled to the incoming feature. Each 4-tuple contains (0) the coupling score, (1) the FIG ID of the coupled feature, (2) the coupled feature's current functional assignment, and (3) the ID of the pair set to which the coupling belongs.
$featureHash = { $fid1 => [[$score1A, $fid1A, $function1A, $psID1A], [$score1B, $fid1B, $function1B, $psID1B], ...], $fid2 => [[$score2A, $fid2A, $function2A, $psID2A], [$score2B, $fid2B, $function2B, $psID2B], ...], ... };
In backward compatibility mode, returns a list of sub-lists, each sub-list corresponding to the value that would be found in the hash for the feature in the specified position of the input list.
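As an illustration of walking the 4-tuples in the hash form of the result, the sketch below picks the highest-scoring coupled feature for each input. The data is hand-written in the documented shape with made-up IDs, scores, and pair set IDs, and the helper name best_coupling is hypothetical.

```perl
use strict;
use warnings;

# Sample data in the documented shape; all values are illustrative only.
my $featureHash = {
    'fig|83333.1.peg.1' => [
        [ 12, 'fig|83333.1.peg.2', 'hypothetical protein', 101 ],
        [ 30, 'fig|83333.1.peg.3', 'hypothetical protein', 102 ],
    ],
    'fig|83333.1.peg.9' => [],
};

# Hypothetical helper: return the 4-tuple with the highest coupling score
# (element 0), or undef when the feature has no couplings.
sub best_coupling {
    my ($tuples) = @_;
    my $best;
    for my $tuple (@$tuples) {
        $best = $tuple if !defined($best) || $tuple->[0] > $best->[0];
    }
    return $best;
}

for my $fid (sort keys %$featureHash) {
    my $best = best_coupling($featureHash->{$fid});
    print "$fid: ", (defined($best) ? "$best->[1] ($best->[0])" : 'none'), "\n";
}
```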
my $psHash = $sapObject->pairsets({ -ids => [$psID1, $psID2, ...] });
This method takes as input a list of functional-coupling pair set IDs (such as those returned in the output of "conserved_in_neighborhood"). For each pair set, it returns the set's score (number of significant couplings) and a list of the coupled pairs in the set.
The parameter should be a reference to a hash with the following keys.
Reference to a list of functional-coupling pair set IDs.
For backward compatibility, you may also specify a reference to a list of pair set IDs.
Returns a reference to a hash that maps each incoming pair-set ID to a 2-tuple that consists of (0) the set's score and (1) a reference to a list of 2-tuples containing the pairs in the set.
$psHash = { $psID1 => [$score1, [[$fid1A, $fid1B], [$fid1C, $fid1D], ...]], $psID2 => [$score2, [[$fid2A, $fid2B], [$fid2C, $fid2D], ...]], ... };
In backward-compatibility mode, returns a reference to a list of 2-tuples, each consisting of (0) an incoming pair-set ID, and (1) the 2-tuple that would be its hash value in the normal output.
my $featureHash = $sapObject->related_clusters({ -ids => [$fid1, $fid2, ...] });
This method returns the functional-coupling clusters related to the specified input features. Each cluster contains features on a single genome that are related by functional coupling.
The parameter should be a reference to a hash with the following keys.
Reference to a list of FIG feature IDs.
Returns a reference to a hash that maps each incoming feature ID to a list of clusters. Each cluster in the list is a 3-tuple consisting of (0) the ID of a feature similar to the incoming feature, (1) the similarity P-score, and (2) a reference to a list of 2-tuples containing clustered features and their functional assignments.
$featureHash = { $fid1 => [[$fid1A, $score1A, [[$fid1Ax, $function1Ax], [$fid1Ay, $function1Ay], ...]], [$fid1B, $score1B, [[$fid1Bx, $function1Bx], [$fid1By, $function1By], ...]], ...], $fid2 => [[$fid2A, $score2A, [[$fid2Ax, $function2Ax], [$fid2Ay, $function2Ay], ...]], [$fid2B, $score2B, [[$fid2Bx, $function2Bx], [$fid2By, $function2By], ...]], ...], ... };
my $genomeHash = $sapObject->all_features({ -ids => [$genome1, $genome2, ...], -type => [$type1, $type2, ...], });
Return a list of the IDs for all features of a specified type in a specified genome.
The parameter should be a reference to a hash with the following keys.
Reference to a list of genome IDs.
Type of feature desired (e.g. peg
, rna
), or a reference to a list of desired feature types. If omitted, all features regardless of type are returned.
Returns a reference to a hash that maps each incoming genome ID to a list of the desired feature IDs for that genome. If a genome does not exist or has no features of the desired type, its ID will map to an empty list.
$genomeHash = { $genome1 => [$fid1a, $fid1b, ...], $genome2 => [$fid2a, $fid2b, ...], ... };
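When all feature types are requested, the type can be read back out of each FIG feature ID. The sketch below assumes the ID layout seen in examples such as fig|100226.1.peg.3361 (genome ID, type, sequence number); the helper names fid_type and count_types are hypothetical and the sample IDs are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the type token (e.g. peg, rna) from a FIG
# feature ID, or return undef if the ID does not have the assumed layout.
sub fid_type {
    my ($fid) = @_;
    return $fid =~ /\Afig\|\d+\.\d+\.([^.]+)\.\d+\z/ ? $1 : undef;
}

# Hypothetical helper: count the features of each type in a list of IDs.
sub count_types {
    my ($fids) = @_;
    my %count;
    for my $fid (@$fids) {
        my $type = fid_type($fid);
        next unless defined $type;
        $count{$type}++;
    }
    return \%count;
}

# Sample data in the documented shape; the second genome has no features.
my $genomeHash = {
    '100226.1' => [ 'fig|100226.1.peg.1', 'fig|100226.1.peg.2', 'fig|100226.1.rna.1' ],
    '99999.1'  => [],
};

my %typeCounts;
$typeCounts{$_} = count_types($genomeHash->{$_}) for keys %$genomeHash;
for my $genome (sort keys %typeCounts) {
    my $counts = $typeCounts{$genome};
    print "$genome: ", join(', ', map { "$_=$counts->{$_}" } sort keys %$counts), "\n";
}
```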
my $genomeHash = $sapObject->all_genomes({ -complete => 1, -prokaryotic => 1 });
Return a list of the IDs for all the genomes in the system.
Reference to a hash containing the following keys.
If TRUE, only complete genomes will be returned. The default is FALSE (return all genomes).
If TRUE, only prokaryotic genomes will be returned. The default is FALSE (return all genomes).
Returns a reference to a hash mapping genome IDs to genome names.
$genomeHash = { $genome1 => $name1, $genome2 => $name2, ... };
my $fidHash = $sapObject->all_proteins({ -id => $genome1 });
Return the protein sequences for all protein-encoding genes in the specified genome.
The parameter should be a reference to a hash with the following keys.
A single genome ID. All of the protein sequences for genes in the specified genome will be extracted.
Returns a reference to a hash that maps the FIG ID of each protein-encoding gene in the specified genome to its protein sequence.
$fidHash = { $fid1 => $protein1, $fid2 => $protein2, ... };
my $genomeHash = $sapObject->close_genomes({ -ids => [$genome1, $genome2, ...], -count => 10, });
Find the genomes functionally close to the input genomes.
Functional closeness is determined by the number of FIGfams in common. As a result, this method will not produce good results for genomes that do not have good FIGfam coverage.
The parameter should be a reference to a hash with the following keys.
Reference to a list of genome IDs for the genomes whose close neighbors are desired.
Maximum number of close genomes to return for each input genome. The default is 10
.
Returns a reference to a hash mapping each incoming genome ID to a list of 2-tuples. Each 2-tuple consists of (0) the ID of a close genome and (1) the score (from 0 to 1) for the match. The list will be sorted from closest to furthest.
my $contigHash = $sapObject->contig_sequences({ -ids => [$contig1, $contig2, ...] });
Return the DNA sequences for the specified contigs.
The parameter should be a reference to a hash with the following keys.
Reference to a list of contig IDs. Note that the contig ID contains the genome ID as a prefix (e.g. 100226.1:NC_003888
).
Returns a reference to a hash that maps each contig ID to its DNA sequence.
$contigHash = { $contig1 => $dna1, $contig2 => $dna2, ... };
my $contigHash = $sapObject->contig_lengths({ -ids => [$contig1, $contig2, ...] });
Return the lengths for the specified contigs.
The parameter should be a reference to a hash with the following keys.
Reference to a list of contig IDs. Note that the contig ID contains the genome ID as a prefix (e.g. 100226.1:NC_003888
).
Returns a reference to a hash that maps each contig ID to its length in base pairs.
$contigHash = { $contig1 => $len1, $contig2 => $len2, ... };
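Because each contig ID carries its genome ID as a prefix, results keyed by contig can be regrouped by genome on the client side. The following sketch totals contig lengths per genome; it assumes the genome ID and contig name are separated by the first colon, uses made-up lengths, and the helper name split_contig_id is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a contig ID such as 100226.1:NC_003888 into
# its genome ID and contig name at the first colon.
sub split_contig_id {
    my ($contigId) = @_;
    return split /:/, $contigId, 2;
}

# Sample data in the documented shape; the lengths are made up.
my $contigHash = {
    '100226.1:NC_003888' => 1000,
    '100226.1:NC_003903' => 250,
    '83333.1:NC_000913'  => 500,
};

# Total the contig lengths for each genome.
my %genomeTotal;
for my $contigId (keys %$contigHash) {
    my ($genome) = split_contig_id($contigId);
    $genomeTotal{$genome} += $contigHash->{$contigId};
}
print "$_: $genomeTotal{$_}\n" for sort keys %genomeTotal;
```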
my $geneHash = $sapObject->gene_correspondence_map({ -genome1 => $genome1, -genome2 => $genome2, -fullOutput => 1, -passive => 0 });
Return a map of genes in the specified second genome that correspond to genes in the specified first genome.
The parameter should be a reference to a hash with the following keys.
ID of the first genome of interest.
ID of the second genome of interest.
If 1
, then instead of a simple hash map, a list of lists will be returned. If 2
, then the list will contain unidirectional correspondences from the target back to the source as well as bidirectional correspondences and unidirectional correspondences from the source to the target. The default is 0
, which returns the hash map.
If TRUE, then an undefined value will be returned if no correspondence file exists. If FALSE, a correspondence file will be created and cached on the server if one does not already exist. This is an expensive operation, so set the flag to TRUE if you are worried about performance. The default is FALSE.
This method will return an undefined value if either of the genome IDs is missing, not found, or incomplete.
Returns a hash that maps each gene in the first genome to a corresponding gene in the second genome. The correspondence is determined by examining factors such as functional role, conserved neighborhood, and similarity.
$geneHash = { $g1gene1 => $g2gene1, $g1gene2 => $g2gene2, $g1gene3 => $g2gene3, ... };
Returns a reference to a list of sub-lists. Each sub-list contains 18 data items, as detailed in "Gene Correspondence List" in ServerThing.
my $genomeHash = $sapObject->genome_contig_md5s({ -ids => [$genome1, $genome2, ...] });
For each incoming genome, return a hash mapping its contigs to their MD5 identifiers.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the genome IDs.
Returns a hash that maps each incoming genome ID to a sub-hash that maps its contig IDs to their MD5 identifiers. The MD5 identifiers are computed directly from the contig DNA sequences.
$genomeHash = { $genome1 => {$contig1a => $md5id1a, $contig1b => $md5id1b, ... }, $genome2 => {$contig2a => $md5id2a, $contig2b => $md5id2b, ... }, ... };
my $genomeHash = $sapObject->genome_contigs({ -ids => [$genome1, $genome2, ...] });
For each incoming genome, return a list of its contigs.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the genome IDs.
Returns a hash that maps each incoming genome ID to a list of its contig IDs.
$genomeHash = { $genome1 => [$contig1a, $contig1b, ...], $genome2 => [$contig2a, $contig2b, ...], ... };
my $genomeHash = $sapObject->genome_data({ -ids => [$genome1, $genome2, ...], -data => [$fieldA, $fieldB, ...] });
Return the specified data items for the specified genomes.
The parameter should be a reference to a hash with the following keys.
Reference to a list of genome IDs.
Reference to a list of data field names. The possible data field names are given below.
1
if the genome is more or less complete, else 0
.
The number of contigs for the genome
The number of base pairs in the genome
The domain of the genome (Archaea, Bacteria, ...).
The amount of GC base pairs in the genome, expressed as a percentage of the genome's DNA.
The genetic code used by this genome.
The number of protein encoding genes in the genome.
The number of RNAs in the genome.
The scientific name of the genome.
The genome's full taxonomy as a comma-separated string.
The MD5 identifier computed from the genome's DNA sequences.
Returns a hash mapping each incoming genome ID to an n-tuple. Each tuple will contain the specified data fields for the genome in the specified order.
$genomeHash = { $id1 => [$data1A, $data1B, ...], $id2 => [$data2A, $data2B, ...], ... };
my $genomeHash = $sapObject->genome_domain({ -ids => [$genome1, $genome2, ...] });
Return the domain for each specified genome (e.g. Archaea
, Bacteria
, Plasmid
).
The parameter should be a reference to a hash with the following keys.
Reference to a list of the genome IDs.
Returns a hash that maps each incoming genome ID to its taxonomic domain.
my $genomeHash = $sapObject->genome_fid_md5s({ -ids => [$genome1, $genome2, ...] });
For each incoming genome, return a hash mapping its genes to their MD5 identifiers.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the genome IDs.
Returns a hash that maps each incoming genome ID to a sub-hash that maps its FIG feature IDs to their MD5 identifiers. The MD5 identifiers are computed from the genome's MD5 identifier and the gene's location in the genome.
$genomeHash = { $genome1 => {$fid1a => $md5id1a, $fid1b => $md5id1b, ... }, $genome2 => {$fid2a => $md5id2a, $fid2b => $md5id2b, ... }, ... };
my $genomeHash = $sapObject->genome_ids({ -names => [$name1, $name2, ...], -taxons => [$tax1, $tax2, ...] });
Find the specific genome ID for each specified genome name or taxonomic number. This method helps to find the correct version of a given genome when only the species and strain are known.
The parameter should be a reference to a hash with the following keys.
Reference to a list of genome scientific names, including genus, species, and strain (e.g. Streptomyces coelicolor A3(2)
). A genome ID will be found (if any) for each specified name.
Reference to a list of genome taxonomic numbers. These are essentially genome IDs without an associated version number (e.g. 100226
). A specific matching genome ID will be found; the one chosen will be the one with the highest version number that is not a plasmid.
Returns a hash mapping each incoming name or taxonomic number to the corresponding genome ID.
$genomeHash = { $name1 => $genome1, $name2 => $genome2, ..., $tax1 => $genome3, $tax2 => $genome4, ... };
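To illustrate the relationship between taxonomic numbers and genome IDs, the sketch below splits a genome ID into its taxonomic number and version and picks the highest version for a given number from a candidate list. It is only an illustration of the "highest version" rule and does not model the plasmid check, which needs server-side data. The helper names split_genome_id and highest_version are hypothetical, and the candidate IDs are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split a genome ID such as 100226.1 into its
# taxonomic number and version; returns an empty list on other input.
sub split_genome_id {
    my ($genomeId) = @_;
    return $genomeId =~ /\A(\d+)\.(\d+)\z/ ? ($1, $2) : ();
}

# Hypothetical helper: among the candidate genome IDs, return the one with
# the given taxonomic number and the highest version (plasmids not checked).
sub highest_version {
    my ($taxon, @genomeIds) = @_;
    my ($bestId, $bestVersion);
    for my $id (@genomeIds) {
        my ($tax, $version) = split_genome_id($id);
        next unless defined($tax) && $tax eq $taxon;
        if (!defined($bestVersion) || $version > $bestVersion) {
            ($bestId, $bestVersion) = ($id, $version);
        }
    }
    return $bestId;
}

# Illustrative candidate list only.
my $chosen = highest_version('100226', '100226.1', '100226.12', '83333.1');
print "$chosen\n";
```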
my $genomeHash = $sapObject->genome_metrics({ -ids => [$genome1, $genome2, ...] });
For each incoming genome ID, returns the number of contigs, the total number of base pairs in the genome's DNA, and the genome's default genetic code.
The parameter should be a reference to a hash with the following keys.
Reference to a list of genome IDs.
Returns a hash mapping each incoming genome ID to a 3-tuple consisting of (0) the number of contigs, (1) the total DNA size, and (2) the genome's default genetic code.
$genomeHash = { $genome1 => [$contigCount1, $baseCount1, $geneticCode1], $genome2 => [$contigCount2, $baseCount2, $geneticCode2], ... };
my $idHash = $sapObject->genome_names({ -ids => [$id1, $id2, ...], -numbers => 1 });
Return the name of the genome containing each specified feature or genome.
The parameter should be a reference to a hash with the following keys.
Reference to a list of identifiers. Each identifier can be a prefixed feature ID (e.g. fig|100226.1.peg.3361
, uni|P0AC98
) or a genome ID (83333.1
, 360108.3
).
If TRUE, the genome ID number will be returned instead of the name. Note that this facility is only useful when the incoming identifiers are feature IDs, as genome IDs would be mapped to themselves.
Returns a reference to a hash mapping each incoming feature ID to the scientific name of its parent genome. If an ID refers to more than one real feature, only the first feature's genome is returned.
$idHash = { $id1 => $genomeName1, $id2 => $genomeName2, ... };
my $md5Hash = $sapObject->genomes_by_md5({ -ids => [$md5id1, $md5id2, ...], -names => 1 });
Find the genomes associated with each specified MD5 genome identifier. The MD5 genome identifier is computed from the DNA sequences of the genome's contigs; as a result, two genomes with identical sequences arranged in identical contigs will have the same MD5 identifier even if they have different genome IDs.
The parameter should be a reference to a hash with the following keys.
Reference to a list of MD5 genome identifiers.
If TRUE, then both genome IDs and their associated names will be returned; otherwise, only the genome IDs will be returned. The default is FALSE.
Returns a reference to a hash keyed by incoming MD5 identifier. Each identifier maps to a list of genomes. If -names
is FALSE, then the list is of genome IDs; if -names
is TRUE, then the list is of 2-tuples, each consisting of (0) a genome ID and (1) the associated genome's scientific name.
-names = TRUE
$md5Hash = { $md5id1 => [[$genome1a, $name1a], [$genome1b, $name1b], ...], $md5id2 => [[$genome2a, $name2a], [$genome2b, $name2b], ...], ... };
-names = FALSE
$md5Hash = { $md5id1 => [$genome1a, $genome1b, ...], $md5id2 => [$genome2a, $genome2b, ...], ... };
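Code that may receive either output shape can reduce both to plain genome ID lists. The following is a minimal sketch using made-up MD5 keys and genome IDs; the helper name genome_ids_only is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return just the genome IDs from one hash value,
# whether its entries are plain IDs or [ID, name] 2-tuples.
sub genome_ids_only {
    my ($entries) = @_;
    return [ map { ref($_) eq 'ARRAY' ? $_->[0] : $_ } @$entries ];
}

# Sample data in both documented shapes; keys and values are illustrative.
my $withNames    = { 'md5-placeholder' => [ [ '83333.1', 'Sample name A' ],
                                            [ '511145.12', 'Sample name B' ] ] };
my $withoutNames = { 'md5-placeholder' => [ '83333.1', '511145.12' ] };

my $idsA = genome_ids_only($withNames->{'md5-placeholder'});
my $idsB = genome_ids_only($withoutNames->{'md5-placeholder'});
print "@$idsA\n@$idsB\n";
```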
my $locList = $sapObject->intergenic_regions({ -genome => $genome1, -type => ['peg', 'rna'] });
Return a list of "Location Strings" for the regions in the specified genome that are not occupied by genes of the specified types. All of these will be construed to be on the forward strand, and sorted by contig ID and start location within contig.
The parameter should be a reference to a hash with the following keys.
ID of the genome whose intergenic regions are to be returned.
Reference to a list of gene types. Only genes of the specified type will be considered to be occupying space on the contigs. Typically, this parameter will either be peg
or a list consisting of peg
and rna
. The default is to allow all gene types, but this will not generally produce a good result.
Returns a reference to a list of location strings, indicating the intergenic region locations for the genome.
$locList = [$loc1, $loc2, ...];
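The location strings returned here follow the format of contig ID, underscore, start, strand, and length, so they can be taken apart with a regular expression. The sketch below parses one and computes the last position of a forward-strand region, counting from 1. The helper names parse_location and forward_end are hypothetical, and minus-strand arithmetic is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "<contigID>_<start><strand><length>". The
# contig ID may itself contain underscores, so the pattern relies on the
# final underscore being followed by digits, a strand sign, and digits.
sub parse_location {
    my ($loc) = @_;
    my ($contig, $start, $strand, $length) =
        $loc =~ /\A(.+)_(\d+)([+-])(\d+)\z/
        or return;
    return { contig => $contig, start => $start,
             strand => $strand, length => $length };
}

# Hypothetical helper: last position covered by a forward-strand location.
sub forward_end {
    my ($parsed) = @_;
    return $parsed->{start} + $parsed->{length} - 1;
}

my $region = parse_location('100226.1:NC_003888_3766170+612');
print "$region->{contig} $region->{start}..", forward_end($region), "\n";
```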
my $genomeHash = $sapObject->is_prokaryotic({ -ids => [$genome1, $genome2, ...] });
For each incoming genome ID, returns 1 if it is prokaryotic and 0 otherwise.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the relevant genome IDs.
Returns a reference to a hash that maps each incoming genome ID to 1
if it is a prokaryotic genome and 0
otherwise.
$genomeHash = { $genome1 => $flag1, $genome2 => $flag2, ... };
my $genomeHash = $sapObject->mapped_genomes({ -ids => [$genome1, $genome2, ...] });
For each incoming genome, return a list of the genomes that have an existing gene correspondence map (see "Gene Correspondence List" in ServerThing). Gene correspondence maps indicate which genes in the target genome are the best hit of each gene in the source genome. If a correspondence map does not yet exist, it will be created when you ask for it, but this is an expensive process and it is sometimes useful to find an alternate genome that will give you a faster result.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the IDs for the genomes of interest. A (possibly empty) result list will be returned for each one.
Returns a reference to a hash mapping each incoming genome ID to a list of the IDs for the genomes which have existing correspondence maps on the server.
$genomeHash = { $genome1 => [$genome1a, $genome1b, ...], $genome2 => [$genome2a, $genome2b, ...], ... };
my $genomeHash = $sapObject->otu_members({ -ids => [$genome1, $genome2, ...] });
For each incoming genome, return the name and ID of each other genome in the same OTU.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the IDs for the genomes of interest.
Returns a reference to a hash mapping each incoming genome ID to a sub-hash. The sub-hash is keyed by genome ID, and maps the ID of each genome in the same OTU to its name.
$genomeHash = { $genome1 => { $genome1a => $name1a, $genome1b => $name1b, ... }, $genome2 => { $genome2a => $name2a, $genome2b => $name2b, ... }, ... };
my $genomeHash = $sapObject->representative({ -ids => [$genome1, $genome2, ...] });
Return the representative genome for each specified incoming genome ID. Genomes with the same representative are considered closely related, while genomes with a different representative would be considered different enough that similarities between them have evolutionary significance.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the IDs for the genomes of interest.
Returns a reference to a hash mapping each incoming genome ID to the ID of its representative genome.
$genomeHash = { $genome1 => $genome1R, $genome2 => $genome2R, ... };
my $mappings = $sapObject->representative_genomes();
Compute mappings for the genome sets (OTUs) in the database. This method will return a mapping from each genome to its genome set ID and from each genome set ID to a list of the genomes in the set. For the second mapping, the first genome in the set will be the representative.
This method does not require any parameters.
Returns a reference to a 2-tuple. The first element is a reference to a hash mapping genome IDs to genome set IDs; the second element is a reference to a hash mapping each genome set ID to a list of the genomes in the set. The first genome in each of these lists will be the set's representative.
$mappings = [ { $genome1 => $set1, $genome2 => $set2, ... }, { $set1 => [$genome1R, $genome1a, $genome1b, ...], $set2 => [$genome2R, $genome2a, $genome2b, ...], ... } ];
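As an illustration of how the two mappings fit together, here is a minimal sketch that looks up a genome's representative by going from genome to set and then taking the first member of the set. The data is a hand-built stand-in for a server response, the genome and set IDs are placeholders, and `representative_of` is a hypothetical helper.

```perl
use strict;
use warnings;

# Stand-in for: my $mappings = $sapObject->representative_genomes();
# Genome and set IDs below are placeholders.
my $mappings = [
    { 'gA' => 'set1', 'gB' => 'set1', 'gC' => 'set2' },
    { 'set1' => ['gA', 'gB'], 'set2' => ['gC'] },
];

# Unpack the 2-tuple into its two hashes.
my ($genomeToSet, $setToGenomes) = @$mappings;

# Find the genome's set, then return the first member of that set,
# which is documented to be the representative.
sub representative_of {
    my ($genome) = @_;
    my $set = $genomeToSet->{$genome};
    return unless defined $set;
    return $setToGenomes->{$set}[0];
}

print "Representative of gB: ", representative_of('gB'), "\n";
```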
my $statusCode = $sapObject->submit_gene_correspondence({ -genome1 => $genome1, -genome2 => $genome2, -correspondences => $corrList, -passive => 1 });
Submit a set of gene correspondences to be stored on the server.
The parameter should be a reference to a hash with the following keys.
ID of the source genome for the correspondence.
ID of the target genome for the correspondence.
Reference to a list of lists containing the correspondence data (see "Gene Correspondence List" in ServerThing).
If TRUE, then the file will not be stored if one already exists. If FALSE, an existing correspondence file will be overwritten. The default is FALSE.
Returns TRUE (1
) if the correspondences were successfully stored, FALSE (0
) if they were rejected or an error occurred.
my $genomeHash = $sapObject->taxonomy_of({ -ids => [$genome1, $genome2, ...], -format => 'numbers' });
Return the taxonomy of each specified genome. The taxonomy will start at the domain level and move down to the node where the genome is attached.
The parameter should be a reference to a hash with the following keys.
Reference to a list of genome IDs. A taxonomy will be generated for each specified genome.
Format for the elements of the taxonomy string. If numbers
, then each taxonomy element will be represented by its number; if names
, then each taxonomy element will be represented by its primary name; if both
, then each taxonomy element will be represented by a number followed by the name. The default is names
.
Returns a reference to a hash mapping incoming genome IDs to taxonomies. Each taxonomy will be a list of strings, starting from the domain and ending with the genome.
$genomeHash = { $genome1 => [$name1a, $name1b, ...], $genome2 => [$name2a, $name2b, ...], ... };
$genomeHash = { $genome1 => [$num1a, $num1b, ...], $genome2 => [$num2a, $num2b, ...], ... };
$genomeHash = { $genome1 => ["$num1a $name1a", "$num1b $name1b", ...], $genome2 => ["$num2a $name2a", "$num2b $name2b", ...], ... };
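When the combined format is requested, each element carries both the number and the name in one string. The sketch below separates them again. It assumes that a single space follows the number, as the third sample layout above suggests. The taxonomy values are placeholders, not real data.

```perl
use strict;
use warnings;

# Stand-in for:
#   my $genomeHash = $sapObject->taxonomy_of({ -ids => [...], -format => 'both' });
# The numbers and names below are placeholders, not real taxonomy data.
my $genomeHash = {
    'genomeA' => ['1 Domain name', '20 Phylum name', '300 Genome name'],
};

my %lineage;
for my $genome (sort keys %$genomeHash) {
    for my $element (@{ $genomeHash->{$genome} }) {
        # Assumption: a single space separates the number from the name.
        # The limit of 2 keeps any further spaces inside the name.
        my ($number, $name) = split / /, $element, 2;
        push @{ $lineage{$genome} }, { number => $number, name => $name };
    }
}

print join(' > ', map { $_->{name} } @{ $lineage{'genomeA'} }), "\n";
```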
my $scenarioHash = $sapObject->scenario_names({ -subsystem => $subsys1 });
Return the names of all the scenarios for the specified subsystem. Each scenario has an internal ID number and a common name. This method returns both.
The parameter should be a reference to a hash with the following keys.
Name of the subsystem whose scenarios are desired.
Returns a hash mapping the ID numbers of the subsystem's scenarios to their common names.
$scenarioHash = { $id1 => $name1, $id2 => $name2, ... };
my $subsysHash = $sapObject->all_subsystems({ -usable => 1, -exclude => [$type1, $type2, ...], -aux => 1 });
Return all subsystems in the database. For each subsystem, this method returns the ID, the curator, the classifications, and the roles.
The parameter should be a reference to a hash with the following keys, all of which are optional. Because all of the keys are optional, it is permissible to pass an empty hash or no parameters at all.
If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.
Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based
and experimental
. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable
option is turned off.
If TRUE, then auxiliary roles will be included in the output. The default is FALSE, meaning they will be excluded.
Returns a hash mapping each subsystem ID to a 3-tuple consisting of (0) the name of the curator, (1) a reference to a list of the subsystem classifications, and (2) a reference to a list of the subsystem's roles.
$subsysHash = { $sub1 => [$curator1, [$class1a, $class1b, ...], [$role1a, $role1b, ...]], $sub2 => [$curator2, [$class2a, $class2b, ...], [$role2a, $role2b, ...]], ... };
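The 3-tuples are easiest to work with once unpacked into named variables. The following sketch groups subsystems by curator and counts roles. The input is a hand-built stand-in for a server response, and all names in it are placeholders.

```perl
use strict;
use warnings;

# Stand-in for: my $subsysHash = $sapObject->all_subsystems({ -usable => 1 });
# Subsystem names, curators, classes, and roles are placeholders.
my $subsysHash = {
    'Subsystem A' => ['curatorX', ['Class 1', 'Class 1a'], ['Role 1', 'Role 2']],
    'Subsystem B' => ['curatorY', ['Class 2'],             ['Role 3']],
    'Subsystem C' => ['curatorX', [],                      ['Role 1', 'Role 4']],
};

my %subsByCurator;
my %roleCount;
for my $sub (sort keys %$subsysHash) {
    # Each value is a 3-tuple: curator, classification list, role list.
    my ($curator, $classes, $roles) = @{ $subsysHash->{$sub} };
    push @{ $subsByCurator{$curator} }, $sub;
    $roleCount{$sub} = scalar @$roles;
    print join("\t", $sub, $curator, join(' / ', @$classes), $roleCount{$sub}), "\n";
}
```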
my $subsysHash = $sapObject->classification_of({ -ids => [$sub1, $sub2, ...] });
Return the classification for each specified subsystem.
Reference to a hash of parameters with the following possible keys.
Reference to a list of subsystem IDs.
Returns a hash mapping each incoming subsystem ID to a list reference. Each list contains the classification names in order from the largest classification to the most detailed.
$subsysHash = { $sub1 => [$class1a, $class1b, ...], $sub2 => [$class2a, $class2b, ...], ... };
my $genomeHash = $sapObject->genomes_to_subsystems({ -ids => [$genome1, $genome2, ...], -all => 1, -usable => 0, -exclude => ['cluster-based', 'experimental', ...] });
Return a list of the subsystems participated in by each of the specified genomes.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the genome IDs.
If TRUE, all subsystems will be returned, including those in which the genome does not appear to implement the subsystem and those in which the subsystem implementation is incomplete. The default is FALSE, in which case only subsystems that are completely implemented by the genome will be returned.
If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.
Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based
and experimental
. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable
option is turned off.
Returns a hash mapping each genome ID to a list of 2-tuples. Each 2-tuple will contain a subsystem name followed by a variant code.
$genomeHash = { $genome1 => [[$sub1a, $variantCode1a], [$sub1b, $variantCode1b], ...], $genome2 => [[$sub2a, $variantCode2a], [$sub2b, $variantCode2b], ...], ... };
my $subsysHash = $sapObject->get_subsystems({ -ids => [$sub1, $sub2, ...] });
Get a complete description of each specified subsystem. This will include the basic subsystem properties, the list of roles, and the spreadsheet.
The parameter should be a reference to a hash with the following keys.
Reference to a list of subsystem IDs.
Returns a reference to a hash mapping each incoming subsystem ID to a sub-hash that completely describes the subsystem. The keys for the sub-hash are as follows.
The name of the subsystem's curator.
The subsystem's current version number.
The text of the subsystem notes.
The description of the subsystem.
Reference to a list of 3-tuples, one for each role in the subsystem. Each 3-tuple will contain (0) the role abbreviation, (1) 1
if the role is auxiliary and 0
otherwise, and (2) the ID (name) of the role.
Reference to a list of 5-tuples. For each molecular machine implementing the subsystem, there is a 5-tuple containing (0) the target genome ID, (1) the relevant region string, (2) 1
if the molecular machine is curated and 0
if it was computer-assigned, (3) the variant code for the implemented variant, and (4) a reference to a list of sub-lists, one per role (in order), with each sub-list containing the IDs of all features performing that role.
$subsysHash = { $sub1 => { curator => $curator1, version => $version1, notes => $notes1, desc => $desc1, roles => [[$abbr1a, $aux1a, $role1a], [$abbr1b, $aux1b, $role1b], ... ], spreadsheet => [ [$genome1x, $region1x, $curated1x, $variant1x, [[$fid1xa1, $fid1xa2, ...], [$fid1xb1, $fid1xb2, ...], ...]], [$genome1y, $region1y, $curated1y, $variant1y, [[$fid1ya1, $fid1ya2, ...], [$fid1yb1, $fid1yb2, ...], ...]], ... ] }, $sub2 => { curator => $curator2, version => $version2, notes => $notes2, desc => $desc2, roles => [[$abbr2a, $aux2a, $role2a], [$abbr2b, $aux2b, $role2b], ... ], spreadsheet => [ [$genome2x, $region2x, $curated2x, $variant2x, [[$fid2xa1, $fid2xa2, ...], [$fid2xb1, $fid2xb2, ...], ...]], [$genome2y, $region2y, $curated2y, $variant2y, [[$fid2ya1, $fid2ya2, ...], [$fid2yb1, $fid2yb2, ...], ...]], ... ] }, ... };
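Because the per-role feature lists in each spreadsheet row are positional, they have to be matched against the roles list by index. The following sketch counts features per role across all rows of a subsystem. The input is a hand-built stand-in for a server response, and the region strings, variant codes, and peg numbers are placeholders.

```perl
use strict;
use warnings;

# Stand-in for: my $subsysHash = $sapObject->get_subsystems({ -ids => [...] });
my $subsysHash = {
    'Subsystem A' => {
        curator => 'curatorX', version => 3, notes => '', desc => 'placeholder',
        roles => [['R1', 0, 'Role 1'], ['R2', 1, 'Role 2']],
        spreadsheet => [
            ['100226.1', 'regionX', 1, '1',
                [['fig|100226.1.peg.1', 'fig|100226.1.peg.2'], []]],
            ['360108.3', 'regionY', 0, '2',
                [['fig|360108.3.peg.7'], ['fig|360108.3.peg.9']]],
        ],
    },
};

my %countByRole;
for my $sub (sort keys %$subsysHash) {
    my $info = $subsysHash->{$sub};
    # Element 2 of each role 3-tuple is the role ID (name).
    my @roleNames = map { $_->[2] } @{ $info->{roles} };
    for my $row (@{ $info->{spreadsheet} }) {
        my ($genome, $region, $curated, $variant, $cells) = @$row;
        # The cells are in the same order as the roles list.
        for my $i (0 .. $#roleNames) {
            $countByRole{$sub}{ $roleNames[$i] } += scalar @{ $cells->[$i] || [] };
        }
    }
}

for my $role (sort keys %{ $countByRole{'Subsystem A'} }) {
    print "$role: $countByRole{'Subsystem A'}{$role}\n";
}
```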
my $subsysHash = $sapObject->ids_in_subsystems({ -subsystems => [$sub1, $sub2, ...], -genome => $genome1, -grouped => 1, -roleForm => 1, -source => 'UniProt' });
Return the features of each specified subsystem in the specified genome, or alternatively, return all features of each specified subsystem.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the IDs for the desired subsystems.
ID of the relevant genome, or all
to return the genes in all genomes for the subsystem. The default is all
.
If specified, then instead of being represented in a list, the feature IDs will be represented in a comma-delimited string.
If abbr
, then roles will be represented by the role abbreviation; if full
, then the role will be represented by its full name; if none
, then roles will not be included and there will only be a single level of hashing-- by subsystem ID. The default is abbr
.
Database source for the output IDs-- SEED
for FIG IDs, GENE
for standard gene identifiers, or LocusTag
for locus tags. In addition, you may specify RefSeq
, CMR
, NCBI
, Trembl
, or UniProt
for IDs from those databases. The default is SEED
.
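Returns a hash mapping each subsystem ID to a sub-hash. Each sub-hash maps the roles of the subsystem to lists of feature IDs. By default, the roles are recorded as role abbreviations. The three layouts below correspond to the abbr, full, and none role forms, respectively.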
$subsysHash = { $sub1 => { $roleAbbr1A => [$fid1Ax, $fid1Ay, ...], $roleAbbr1B => [$fid1Bx, $fid1By, ...], ... }, $sub2 => { $roleAbbr2A => [$fid2Ax, $fid2Ay, ...], $roleAbbr2B => [$fid2Bx, $fid2By, ...], ... }, ... };
$subsysHash = { $sub1 => { $role1A => [$fid1Ax, $fid1Ay, ...], $role1B => [$fid1Bx, $fid1By, ...], ... }, $sub2 => { $role2A => [$fid2Ax, $fid2Ay, ...], $role2B => [$fid2Bx, $fid2By, ...], ... }, ... };
$subsysHash = { $sub1 => [$fid1a, $fid1b, ...], $sub2 => [$fid2a, $fid2b, ...], ... };
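Since the shape of each entry depends on the role form and grouping options, calling code may want one routine that copes with all of them. The following is a sketch of such a routine. The helper `all_features` is hypothetical, the peg numbers are placeholders, and the exact delimiter used in grouped output (a comma, optionally followed by whitespace) is an assumption.

```perl
use strict;
use warnings;

# Return a sorted list of the feature IDs in one subsystem's entry,
# whatever its shape: a role-keyed hash, a plain list, or grouped strings.
sub all_features {
    my ($entry) = @_;
    my @lists = (ref($entry) eq 'HASH') ? values %$entry : ($entry);
    # Assumption: grouped output is a comma-delimited string in place of
    # each list reference.
    my @fids = map { (ref($_) eq 'ARRAY') ? @$_ : split(/,\s*/, $_) } @lists;
    return sort @fids;
}

# Hand-built stand-ins for the different output shapes.
my $byRole = { 'SubA' => { 'R1' => ['fig|100226.1.peg.2', 'fig|100226.1.peg.1'],
                           'R2' => ['fig|100226.1.peg.3'] } };
my $flat    = { 'SubA' => ['fig|100226.1.peg.3', 'fig|100226.1.peg.1'] };
my $grouped = { 'SubA' => { 'R1' => 'fig|100226.1.peg.2,fig|100226.1.peg.1' } };

for my $result ($byRole, $flat, $grouped) {
    print join(' ', all_features($result->{'SubA'})), "\n";
}
```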
my $featureHash = $sapObject->ids_to_publications({ -ids => [$id1, $id2, ...], -source => 'UniProt' });
Return the PUBMED ID and title of each publication relevant to the specified feature IDs.
The parameter should be a reference to a hash with the following keys.
Reference to a list of feature IDs. Normally, these are FIG feature IDs (e.g. fig|100226.1.peg.3361
, fig|360108.3.peg.1041
), but other ID types are permissible if the source
parameter is overridden.
Database source of the IDs specified-- SEED
for FIG IDs, GENE
for standard gene identifiers, or LocusTag
for locus tags. In addition, you may specify RefSeq
, CMR
, NCBI
, Trembl
, or UniProt
for IDs from those databases. Use mixed
to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed
to allow IDs with prefixing indicating the ID type (e.g. uni|P00934
for a UniProt ID, gi|135813
for an NCBI identifier, and so forth). The default is SEED
.
ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for genes in all genomes.
Returns a reference to a hash mapping feature IDs to lists of 2-tuples. Each 2-tuple consists of a PUBMED ID followed by a publication title.
$featureHash = { $id1 => [[$pub1a, $title1a], [$pub1b, $title1b], ...], $id2 => [[$pub2a, $title2a], [$pub2b, $title2b], ...], ... };
my $featureHash = $sapObject->ids_to_subsystems({ -ids => [$id1, $id2, ...], -usable => 0, -exclude => ['cluster-based', 'private', ...], -source => 'RefSeq', -subsOnly => 1 });
Return the subsystem and role for each feature in the incoming list. A feature may have multiple roles in a subsystem and may belong to multiple subsystems, so the role/subsystem information is returned in the form of a list of ordered pairs for each feature.
The parameter should be a reference to a hash with the following keys.
Reference to a list of feature IDs. Normally, these are FIG feature IDs (e.g. fig|100226.1.peg.3361
, fig|360108.3.peg.1041
), but other ID types are permissible if the source
parameter is overridden.
If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.
Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based
and experimental
. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable
option is turned off.
Database source of the IDs specified-- SEED
for FIG IDs, GENE
for standard gene identifiers, or LocusTag
for locus tags. In addition, you may specify RefSeq
, CMR
, NCBI
, Trembl
, or UniProt
for IDs from those databases. Use mixed
to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed
to allow IDs with prefixing indicating the ID type (e.g. uni|P00934
for a UniProt ID, gi|135813
for an NCBI identifier, and so forth). The default is SEED
.
ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for genes in all genomes.
If TRUE, instead of a list of (role, subsystem) 2-tuples, each feature ID will be mapped to a simple list of subsystem names. The default is FALSE.
Returns a reference to a hash mapping feature IDs to lists of 2-tuples. Each 2-tuple consists of a role name followed by a subsystem name. If a feature is not in a subsystem, it will not be present in the return hash.
$featureHash = { $id1 => [[$role1a, $sub1a], [$role1b, $sub1b], ...], $id2 => [[$role2a, $sub2a], [$role2b, $sub2b], ...], ... };
$featureHash = { $id1 => [$sub1a, $sub1b, ...], $id2 => [$sub2a, $sub2b, ...], ... };
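A common follow-up is to invert the result so that it is keyed by subsystem instead of by feature. The sketch below does this for the default (role, subsystem) output. Note the tuple order: the role comes first here. The data is a hand-built stand-in for a server response, with placeholder role and subsystem names.

```perl
use strict;
use warnings;

# Stand-in for: my $featureHash = $sapObject->ids_to_subsystems({ -ids => [...] });
my $featureHash = {
    'fig|100226.1.peg.3361' => [['Role 1', 'Subsystem A'], ['Role 2', 'Subsystem B']],
    'fig|360108.3.peg.1041' => [['Role 1', 'Subsystem A']],
};

my %fidsBySubsystem;
for my $fid (sort keys %$featureHash) {
    for my $pair (@{ $featureHash->{$fid} }) {
        # Each 2-tuple is a role name followed by a subsystem name.
        my ($role, $sub) = @$pair;
        push @{ $fidsBySubsystem{$sub} }, $fid;
    }
}

for my $sub (sort keys %fidsBySubsystem) {
    print "$sub: @{ $fidsBySubsystem{$sub} }\n";
}
```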
my $featureHash = $sapObject->is_in_subsystem({ -ids => [$fid1, $fid2, ...], -usable => 0, -exclude => [$type1, $type2, ...] });
Return the subsystem and role for each specified feature.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the FIG feature IDs for the features of interest.
If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.
Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based
and experimental
. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable
option is turned off.
For backward compatibility, the parameter may also be a reference to a list of FIG feature IDs.
Returns a reference to a hash that maps each incoming feature ID to a list of 2-tuples, each 2-tuple consisting of (0) the ID of a subsystem containing the feature and (1) the feature's role in that subsystem. If an incoming feature is not in any subsystem, its ID will be mapped to an empty list.
$featureHash = { $fid1 => [[$sub1a, $role1a], [$sub1b, $role1b], ...], $fid2 => [[$sub2a, $role2a], [$sub2b, $role2b], ...], ... };
In backward-compatible mode, returns a reference to a list of 3-tuples, each 3-tuple consisting of (0) a subsystem ID, (1) a role ID, and (2) the ID of a feature from the input list.
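To show how the two output shapes relate, the following sketch flattens the hash form into a list of (subsystem, role, feature) 3-tuples and also collects the features that are in no subsystem. It is only an illustration on hand-built data with placeholder subsystem and role names. The server's own ordering of the backward-compatible list is not specified here.

```perl
use strict;
use warnings;

# Stand-in for: my $featureHash = $sapObject->is_in_subsystem({ -ids => [...] });
my $featureHash = {
    'fig|100226.1.peg.3361' => [['Subsystem A', 'Role 1'], ['Subsystem B', 'Role 2']],
    'fig|360108.3.peg.1041' => [],
};

my @tuples;
for my $fid (sort keys %$featureHash) {
    # Each 2-tuple is (subsystem, role); append the feature ID to make a 3-tuple.
    push @tuples, map { [ $_->[0], $_->[1], $fid ] } @{ $featureHash->{$fid} };
}

# Features mapped to an empty list are not in any subsystem.
my @notInAny = grep { !@{ $featureHash->{$_} } } sort keys %$featureHash;

print join("\t", @$_), "\n" for @tuples;
print "Not in any subsystem: @notInAny\n";
```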
my $featureHash = $sapObject->is_in_subsystem_with({ -ids => [$fid1, $fid2, ...], -usable => 0, -exclude => [$type1, $type2, ...] });
For each incoming feature, returns a list of the features in the same genome that are part of the same subsystem. For each other feature returned, its role, functional assignment, subsystem variant, and subsystem ID will be returned as well.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the FIG feature IDs for the features of interest.
If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.
Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based
and experimental
. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable
option is turned off.
For backward compatibility, the parameter may also be a reference to a list of FIG feature IDs.
Returns a reference to a hash that maps each incoming feature ID to a list of 5-tuples relating to features in the same subsystem. Each 5-tuple contains (0) a subsystem ID, (1) a variant ID, (2) the related feature ID, (3) the related feature's functional assignment, and (4) the related feature's role in the subsystem.
$featureHash = { $fid1 => [[$sub1a, $variant1a, $fid1a, $function1a, $role1a], [$sub1b, $variant1b, $fid1b, $function1b, $role1b], ...], $fid2 => [[$sub2a, $variant2a, $fid2a, $function2a, $role2a], [$sub2b, $variant2b, $fid2b, $function2b, $role2b], ...], ... };
In backward-compatibility mode, returns a reference to a list of lists. Each sub-list contains 6-tuples relating to a single incoming feature ID. Each 6-tuple consists of a subsystem ID, a variant ID, the incoming feature ID, the other feature ID, the other feature's functional assignment, and the other feature's role in the subsystem.
my $roleHash = $sapObject->pegs_implementing_roles({ -subsystem => $subsysID, -roles => [$role1, $role2, ...] });
Given a subsystem and a list of roles, return a list of the subsystem's features for each role.
The parameter should be a reference to a hash with the following keys.
ID of a subsystem.
Reference to a list of roles.
For backward compatibility, the parameter can also be a reference to a 2-tuple consisting of (0) a subsystem ID and (1) a reference to a list of roles.
Returns a hash that maps each role ID to a list of the IDs for the features that perform the role in that subsystem.
$roleHash = { $role1 => [$fid1a, $fid1b, ...], $role2 => [$fid2a, $fid2b, ...], ... };
In backward-compatibility mode, returns a list of 2-tuples. Each tuple consists of a role and a reference to a list of the features in that role.
my $subsysHash = $sapObject->pegs_in_subsystems({ -genomes => [$genome1, $genome2, ...], -subsystems => [$sub1, $sub2, ...] });
This method takes a list of genomes and a list of subsystems and returns a list of the roles represented in each genome/subsystem pair.
Reference to a hash of parameter values with the following possible keys.
Reference to a list of genome IDs.
Reference to a list of subsystem IDs.
For backward compatibility, the parameter may also be a reference to a 2-tuple, the first element of which is a list of genome IDs and the second of which is a list of subsystem IDs.
Returns a reference to a hash of hashes. The main hash is keyed by subsystem ID. Each subsystem's hash is keyed by role ID and maps the role to a list of the feature IDs for that role in the subsystem that belong to the specified genomes.
$subsysHash = { $sub1 => { $role1A => [$fid1Ax, $fid1Ay, ...], $role1B => [$fid1Bx, $fid1By, ...], ... }, $sub2 => { $role2A => [$fid2Ax, $fid2Ay, ...], $role2B => [$fid2Bx, $fid2By, ...], ... }, ... };
In backward-compatibility mode, returns a list of 2-tuples. Each tuple consists of a subsystem ID and a second 2-tuple that contains a role ID and a reference to a list of the feature IDs for that role that belong to the specified genomes.
my $subsysHash = $sapObject->pegs_in_variants({ -genomes => [$genomeA, $genomeB, ...], -subsystems => [$sub1, $sub2, ...] });
This method takes a list of genomes and a list of subsystems and returns a list of the pegs represented in each genome/subsystem pair.
The main difference between this method and "pegs_in_subsystems" is in the organization of the output, which is more like a subsystem spreadsheet.
Reference to a hash of parameter values with the following possible keys.
Reference to a list of genome IDs. If the list is omitted, all genomes will be included in the output (which will be rather large in most cases).
Reference to a list of subsystem IDs.
Returns a reference to a hash mapping subsystem IDs to sub-hashes. Each sub-hash is keyed by genome ID and maps the genome ID to a list containing the variant code and one or more n-tuples, each n-tuple containing a role ID followed by a list of the genes in the genome having that role in the subsystem.
$subsysHash = { $sub1 => { $genomeA => [$vc1A, [$role1Ax, $fid1Ax1, $fid1Ax2, ...], [$role1Ay, $fid1Ay1, $fid1Ay2, ...], ...], $genomeB => [$vc1B, [$role1Bx, $fid1Bx1, $fid1Bx2, ...], [$role1By, $fid1By1, $fid1By2, ...], ...], ... }, $sub2 => { $genomeA => [$vc2A, [$role2Ax, $fid2Ax1, $fid2Ax2, ...], [$role2Ay, $fid2Ay1, $fid2Ay2, ...], ...], $genomeB => [$vc2B, [$role2Bx, $fid2Bx1, $fid2Bx2, ...], [$role2By, $fid2By1, $fid2By2, ...], ...], ... }, ... };
Note that in some cases the genome ID will include a region string. This happens when the subsystem has multiple occurrences in the genome.
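Each genome's list mixes a leading variant code with the role n-tuples that follow it, so it helps to peel the first element off before looping. The sketch below turns the result into one record per spreadsheet row. The data is a hand-built stand-in for a server response, and the variant codes, roles, and peg numbers are placeholders.

```perl
use strict;
use warnings;

# Stand-in for: my $subsysHash = $sapObject->pegs_in_variants({ ... });
my $subsysHash = {
    'Subsystem A' => {
        '100226.1' => ['1',
            ['Role 1', 'fig|100226.1.peg.10', 'fig|100226.1.peg.11'],
            ['Role 2', 'fig|100226.1.peg.20']],
        '360108.3' => ['2',
            ['Role 1', 'fig|360108.3.peg.5']],
    },
};

my @rows;
for my $sub (sort keys %$subsysHash) {
    for my $genome (sort keys %{ $subsysHash->{$sub} }) {
        # First element is the variant code; the rest are role n-tuples.
        my ($variant, @roleTuples) = @{ $subsysHash->{$sub}{$genome} };
        my %cells;
        for my $tuple (@roleTuples) {
            # Each n-tuple is a role ID followed by that role's feature IDs.
            my ($role, @fids) = @$tuple;
            $cells{$role} = \@fids;
        }
        push @rows, { subsystem => $sub, genome => $genome,
                      variant => $variant, cells => \%cells };
    }
}

for my $row (@rows) {
    print "$row->{subsystem}\t$row->{genome}\tvariant $row->{variant}\n";
}
```

Since the genome key may carry a region string, code that needs the bare genome ID should not assume the key is a plain ID.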
my $rolesHash = $sapObject->roles_exist_in_subsystem({ -subsystem => $sub1, -roles => [$role1, $role2, ...] });
Indicate which roles in a given list belong to a specified subsystem.
The parameter should be a reference to a hash with the following keys.
The name of the subsystem of interest.
A reference to a list of role IDs.
Returns a reference to a hash mapping each incoming role ID to 1
if it exists in the specified subsystem and 0
otherwise.
$rolesHash = { $role1 => $flag1, $role2 => $flag2, ... };
my $roleHash = $sapObject->({ -roles => [$role1, $role2, ...], -usable => 0 });
Return the subsystems containing each specified role.
The parameter should be a reference to a hash with the following keys.
Reference to a list of role names.
If TRUE, only usable subsystems will be returned. If FALSE, all subsystems will be returned. The default is TRUE.
Returns a reference to a hash mapping each incoming role to a list of the names of subsystems containing that role.
$roleHash = { $role1 => [$sub1a, $sub1b, ...], $role2 => [$sub2a, $sub2b, ...], ... };
my $subHash = $sapObject->({ -subs => [$sub1, $sub2, ...], -genomes => [$genomeA, $genomeB, ...], ... });
Return the subsystem row for each subsystem/genome pair. A row in this case consists of a reference to a hash mapping role names to a list of the FIG feature IDs for the features in the genome performing that role.
In the Sapling database, a subsystem row is represented by the MolecularMachine entity. The strategy of this method is therefore to find the molecular machine for each subsystem/genome pair, and then use its ID to get the roles and features.
The parameter should be a reference to a hash with the following keys.
Returns a reference to a hash mapping each incoming subsystem ID to a sub-hash keyed by genome ID. In the sub-hash, each genome ID will map to a sub-sub-hash that maps role names to lists of feature IDs.
$subHash = { $sub1 => { $genomeA => { $role1Aa => [$fid1Aax, $fid1Aay, ... ], $role1Ab => [$fid1Abx, $fid1Aby, ... ], ... }, $genomeB => { $role1Ba => [$fid1Bax, $fid1Bay, ... ], $role1Bb => [$fid1Bbx, $fid1Bby, ... ], ... }, ... }, $sub2 => { $genomeA => { $role2Aa => [$fid2Aax, $fid2Aay, ... ], $role2Ab => [$fid2Abx, $fid2Aby, ... ], ... }, $genomeB => { $role2Ba => [$fid2Bax, $fid2Bay, ... ], $role2Bb => [$fid2Bbx, $fid2Bby, ... ], ... }, ... }, ... };
my $subsysHash = $sapObject->subsystem_data({ -ids => [$sub1, $sub2, ...], -field => 'version' });
For each incoming subsystem ID, return the specified data field. This method can be used to find the curator, description, notes, or version of the specified subsystems.
The parameter should be a reference to a hash with the following keys.
Reference to a list of subsystem IDs.
Name of the desired data field-- curator
to retrieve the name of each subsystem's curator, version
to get the subsystem's version number, description
to get the subsystem's description, or notes
to get the subsystem's notes. The default is description
.
Returns a hash mapping each incoming subsystem ID to the associated data value.
$subsysHash = { $sub1 => $value1, $sub2 => $value2, ... };
my $subHash = $sapObject->subsystem_genomes({ -ids => [$sub1, $sub2, ...], -all => 1 });
For each subsystem, return the genomes that participate in it and their associated variant codes.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the names of the subsystems whose genome information is desired.
If TRUE, then all genomes associated with the subsystem will be listed. The default is FALSE, meaning that only genomes that completely implement the subsystem will be listed.
Returns a reference to a hash that maps each subsystem ID to a sub-hash. Each sub-hash in turn maps the ID of each genome that participates in the subsystem to its variant code.
$subHash = { $sub1 => { $genome1a => $code1a, $genome1b => $code1b, ...}, $sub2 => { $genome2a => $code2a, $genome2b => $code2b, ...}, ... };
my $nameList = $sapObject->subsystem_names({ -usable => 0, -exclude => ['cluster-based', ...] });
Return a list of all subsystems in the database.
The parameter should be a reference to a hash with the following keys.
If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.
Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based
and experimental
. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable
option is turned off.
Returns a reference to a list of subsystem names.
$nameList = [$sub1, $sub2, ...];
my $subHash = $sapObject->subsystem_roles({ -ids => [$sub1, $sub2, ...], -aux => 1 });
Return the list of roles for each subsystem, in order.
Reference to a hash of parameters with the following possible keys.
Reference to a list of subsystem IDs.
If TRUE, auxiliary roles will be included. The default is FALSE, which excludes auxiliary roles.
If TRUE, then the role abbreviations will be included in the results. In this case, each subsystem name will be mapped to a list of 2-tuples, with each 2-tuple consisting of (0) the role name and (1) the role abbreviation. The default is FALSE (normal output).
Return a hash mapping each subsystem ID to a list of roles (normal) or a list of role/abbreviation pairs (extended output).
$subHash = { $sub1 => [$role1a, $role1b, ...], $sub2 => [$role2a, $role2b, ...], ... };
$subHash = { $sub1 => [[$role1a, $abbr1a], [$role1b, $abbr1b], ...], $sub2 => [[$role2a, $abbr2a], [$role2b, $abbr2b], ...], ... };
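Code that has to accept either output shape can check whether each list element is itself a list reference. The following is a small sketch of that idea. The helper `role_names` is hypothetical, and the inputs are hand-built stand-ins with placeholder role names and abbreviations.

```perl
use strict;
use warnings;

# Return plain role names from either the normal output (list of names)
# or the extended output (list of [name, abbreviation] pairs).
sub role_names {
    my ($roleList) = @_;
    return map { (ref($_) eq 'ARRAY') ? $_->[0] : $_ } @$roleList;
}

# Hand-built stand-ins for the two output shapes.
my $normal   = { 'Subsystem A' => ['Role 1', 'Role 2'] };
my $extended = { 'Subsystem A' => [['Role 1', 'R1'], ['Role 2', 'R2']] };

print join(', ', role_names($normal->{'Subsystem A'})), "\n";
print join(', ', role_names($extended->{'Subsystem A'})), "\n";
```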
my $subsysHash = $sapObject->subsystem_spreadsheet({ -ids => [$sub1, $sub2, ...] });
This method takes a list of subsystem IDs, and for each one returns a list of the features in the subsystem. For each feature, it will include the feature's functional assignment, the subsystem name and variant (spreadsheet row), and its role (spreadsheet column).
Reference to a hash of parameters with the following possible keys.
Reference to a list of subsystem IDs.
For backward compatibility, this method can also accept a reference to a list of subsystem IDs.
Returns a hash mapping each incoming subsystem ID to a list of 4-tuples. Each tuple contains (0) a variant ID, (1) a feature ID, (2) the feature's functional assignment, and (3) the feature's role in the subsystem.
$subsysHash = { $sub1 => [[$variant1a, $fid1a, $function1a, $role1a], [$variant1b, $fid1b, $function1b, $role1b], ...], $sub2 => [[$variant2a, $fid2a, $function2a, $role2a], [$variant2b, $fid2b, $function2b, $role2b], ...], ... };
In backward-compatibility mode, returns a list of 5-tuples. Each tuple contains (0) a subsystem ID, (1) a variant ID, (2) a feature ID, (3) the feature's functional assignment, and (4) the feature's role in the subsystem.
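The flat list of 4-tuples can be folded back into a spreadsheet-like layout keyed by genome and role. The sketch below does so by reading the genome ID out of each feature ID. It assumes FIG feature IDs of the form fig|GENOME.TYPE.NUMBER, as in the examples elsewhere in this document. Features with any other ID form are skipped. The data is a hand-built stand-in with placeholder variants, functions, roles, and peg numbers.

```perl
use strict;
use warnings;

# Stand-in for: my $subsysHash = $sapObject->subsystem_spreadsheet({ -ids => [...] });
my $subsysHash = {
    'Subsystem A' => [
        ['1', 'fig|100226.1.peg.10', 'Function X', 'Role 1'],
        ['1', 'fig|100226.1.peg.20', 'Function Y', 'Role 2'],
        ['2', 'fig|360108.3.peg.5',  'Function X', 'Role 1'],
    ],
};

my %sheet;
for my $sub (sort keys %$subsysHash) {
    for my $tuple (@{ $subsysHash->{$sub} }) {
        my ($variant, $fid, $function, $role) = @$tuple;
        # Assumption: the genome ID is the NNN.N part right after "fig|".
        my ($genome) = $fid =~ /^fig\|(\d+\.\d+)\./ or next;
        push @{ $sheet{$sub}{$genome}{$role} }, $fid;
    }
}

for my $genome (sort keys %{ $sheet{'Subsystem A'} }) {
    for my $role (sort keys %{ $sheet{'Subsystem A'}{$genome} }) {
        print "$genome\t$role\t@{ $sheet{'Subsystem A'}{$genome}{$role} }\n";
    }
}
```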
my $subsysHash = $sapObject->subsystem_type({ -ids => [$sub1, $sub2, ...], -type => 'cluster-based' });
For each incoming subsystem, return TRUE if it has the specified characteristic, else FALSE.
The parameter should be a reference to a hash with the following keys.
Reference to a list of subsystem names.
Name of the subsystem characteristic of interest. The default is usable
. The possible characteristics are
A cluster-based subsystem is one in which there is functional-coupling evidence that genes belong together, but we do not yet know what they do.
An experimental subsystem is designed for investigation and is not yet ready to be used in comparative analysis and annotation.
A private subsystem has valid data, but is not considered ready for general distribution.
An unusable subsystem is one that is experimental or is of such low quality that it can negatively affect analysis. A usable subsystem is one that is not unusable.
Returns a hash mapping the incoming subsystem names to TRUE/FALSE flags indicating the value of the specified characteristic.
$subsysHash = { $sub1 => $flag1, $sub2 => $flag2, ... };
my $roleHash = $sapObject->subsystems_for_role({ -ids => [$role1, $role2, ...], -usable => 1, -exclude => ['cluster-based', ...] });
For each role, return a list of the subsystems containing that role. The results can be filtered to include unusable subsystems or exclude subsystems of certain exotic types.
The parameter should be a reference to a hash with the following keys.
Reference to a list of the IDs of the roles of interest.
If TRUE, then subsystems in which the role is auxiliary will be included. The default is not to include such subsystems.
If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.
Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based
and experimental
. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable
option is turned off.
Returns a reference to a hash that maps each incoming role ID to a list of subsystem names.
$roleHash = { $role1 => [$ss1a, $ss1b, ...], $role2 => [$ss2a, $ss2b, ...], ... };