Documentation read from 05/19/2023 16:24:13 version of /vol/kmer-server-prod/FIGdisk.server.rhel6/dist/releases/dev/common/lib/FigKernelPackages/SAP.pm.

Sapling Server Function Object

This file contains the functions and utilities used by the Sapling Server (sap_server.cgi). The various methods listed in the sections below represent function calls made directly to the server. These all have a signature similar to the following.

    my $results = $sapObject->function_name($args);

where $sapObject is an object created by this module, $args is a parameter structure, and function_name is the Sapling Server function name. The output $results is a scalar, generally a hash reference, but sometimes a string or a list reference.

Location Strings

Several methods deal with gene locations. Location information from the Sapling server is expressed as location strings. A location string consists of a contig ID (which includes the genome ID), an underscore, a starting location, a strand indicator (+ or -), and a length. The first location on the contig is 1.

For example, 100226.1:NC_003888_3766170+612 indicates contig NC_003888 in genome 100226.1 (Streptomyces coelicolor A3(2)) beginning at location 3766170 and proceeding forward on the plus strand for 612 bases.
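The layout described above can be picked apart with a single regular expression. The following is a minimal sketch, not part of the Sapling code: the helper name parse_location is hypothetical, and it assumes that the genome ID is the part of the contig ID before the colon, as in the example.

```perl
use strict;
use warnings;

# Split a Sapling location string into its parts.
# Returns a hash reference, or nothing if the string does not match the
# documented layout: contig ID, underscore, start, strand, length.
sub parse_location {
    my ($loc) = @_;
    # The contig ID may itself contain underscores (e.g. NC_003888), so the
    # greedy (.+) leaves only the final underscore as the separator.
    my ($contig, $start, $strand, $len) = $loc =~ /^(.+)_(\d+)([+-])(\d+)$/
        or return;
    # Assumption: the genome ID is the part of the contig ID before the colon.
    my ($genome) = $contig =~ /^([^:]+):/;
    return { contig => $contig, genome => $genome, start => $start,
             strand => $strand, length => $len };
}

my $parsed = parse_location('100226.1:NC_003888_3766170+612');
print join("\t", @{$parsed}{qw(genome contig start strand length)}), "\n";
```

The sketch only separates the fields; it does not attempt to compute an end position, since the text above describes the direction of travel for the plus strand only.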

Constructor

Use

    my $sapObject = SAPserver->new();

to create a new sapling server function object. The server function object is used to invoke the "Primary Methods" listed below. See SAPserver for more information on how to create this object and the options available.

Primary Methods

Server Utility Methods

You will not use the methods in this section very often. Some are used by the server framework for maintenance and control purposes ("methods"), while others ("query" and "get") provide access to data in the database in case you need data not available from one of the standard methods.

methods

    my $methodList =        $sapObject->methods();

Return a reference to a list of the methods allowed on this object.

exists

    my $idHash =            $sapObject->exists({
                                -type => 'Genome',
                                -ids => [$id1, $id2, ...]
                            });

Return a hash indicating which of the specified objects of the given type exist in the database. This method is used as a general mechanism for finding what exists and what doesn't exist when you know the ID. In particular, you can use it to check for the presence or absence of subsystems, genomes, features, or FIGfams.

parameter

The parameter should be a reference to a hash with the following keys.

-type

The type of object whose existence is being queried. The type specification is case-insensitive: genome and Genome are treated the same. The permissible types are

Genome

Genomes, identified by taxon ID: 100226.1, 83333.1, 360108.3

Feature

Features (genes), identified by FIG ID: fig|100226.1.peg.3361, fig|360108.3.rna.4

Subsystem

Subsystem, identified by subsystem name: Arginine biosynthesis extended

FIGfam

FIGfam protein family, identified by ID: FIG000171, FIG001501

-ids

Reference to a list of identifiers for objects of the specified type.

RETURN

Returns a reference to a hash keyed by ID. For each incoming ID, it maps to 1 if an object of the specified type with that ID exists, else 0.

    $idHash = { $id1 => $flag1, $id2 => $flag2, ... };
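Because the return hash carries a 1 or 0 for every incoming ID, a common follow-up is to separate the IDs that were found from those that were not. The sketch below works on sample data rather than a live server call; the helper name partition_existing and the ID 999999.9 are hypothetical.

```perl
use strict;
use warnings;

# Split the IDs of an "exists" result into those found and those missing.
# $idHash is the documented return value: { $id => 1 or 0, ... }.
sub partition_existing {
    my ($idHash) = @_;
    my (@found, @missing);
    for my $id (sort keys %$idHash) {
        if ($idHash->{$id}) { push @found, $id } else { push @missing, $id }
    }
    return (\@found, \@missing);
}

# Sample data standing in for the result of:
#   my $idHash = $sapObject->exists({ -type => 'Genome', -ids => [...] });
# The flag values here are invented for illustration.
my $idHash = { '100226.1' => 1, '83333.1' => 1, '999999.9' => 0 };
my ($found, $missing) = partition_existing($idHash);
print "Found: @$found\nMissing: @$missing\n";
```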

get

    my $hashList =          $sapObject->get({
                                -objects => $objectNameString,
                                -filter => { $label1 => $criterion1, $label2 => $criterion2, ... },
                                -limit => $maxRows,
                                -fields => { $label1 => $name1, $label2 => $name2, ... },
                                -multiples => 'list',
                                -firstOnly => 1
                            });

Query the Sapling database. This is a variant of the "query" method in which a certain amount of power is sacrificed for ease of use. Instead of a full-blown filter clause, the caller specifies a filter hash that maps field identifiers to values.

parameter

The parameter should be a reference to a hash with the following keys.

-objects

The object name string listing all the entities and relationships in the query. See "Object Name List" in ERDB for more details.

-filter (optional)

Reference to a hash that maps field identifiers in "Standard Field Name Format" in ERDB to criteria. A criterion is either an object or scalar value (which is asserted as the value of the field), a 2-tuple consisting of a relational operator and a value (which is asserted to be in the appropriate relation to the field), or a sub-list consisting of the word IN and two or more values (which asserts that the field has one of the listed values). A record satisfies the filter if it satisfies all the criteria in the hash.

-limit (optional)

Maximum number of rows to return for this query. The default is no limit.

-fields (optional)

Reference to a hash mapping field identifiers to field names. In this case, the field identifier is a field name in "Standard Field Name Format" in ERDB and the field name is the key value that will be used for the field in the returned result hashes. If this parameter is omitted, then instead of returning the results, this method will return a count of the number of records found.

-multiples (optional)

Rule for handling field values in the result hashes. The default option is smart, which maps single-valued fields to scalars and multi-valued fields to list references. If primary is specified, then all fields are mapped to scalars-- only the first value of a multi-valued field is retained. If list is specified, then all fields are mapped to lists.

-firstOnly (optional)

If TRUE, only the first result will be returned. In this case, the return value will be a hash reference instead of a list of hash references. The default is FALSE.

RETURN

Returns a reference to a list of hashes. Each hash represents a single record in the result set, and maps the output field names to the field values for that record. Note that if a field is multi-valued, it will be represented as a list reference.

    $hashList = [{ $label1 => $row1value1, $label2 => $row1value2, ... },
                 { $label1 => $row2value1, $label2 => $row2value2, ... },
                 ... ];
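With the default smart handling of -multiples, a given field may arrive as a scalar in one context and as a list reference in another. The sketch below shows one way a caller might treat both uniformly. The helper name as_list, the labels id and names, and all sample values are hypothetical.

```perl
use strict;
use warnings;

# Coerce a field value to a list reference, so the caller does not need to
# know whether the field came back as a scalar or as a list reference.
sub as_list {
    my ($value) = @_;
    return [] unless defined $value;
    return ref($value) eq 'ARRAY' ? $value : [$value];
}

# Hypothetical records in the documented shape; the labels and values are
# invented for illustration.
my $hashList = [
    { id => 'X1', names => 'alpha' },
    { id => 'X2', names => ['beta', 'gamma'] },
];
for my $record (@$hashList) {
    my $names = as_list($record->{names});
    print "$record->{id}: ", join(', ', @$names), "\n";
}
```

Specifying list for -multiples would make this step unnecessary, at the cost of wrapping every single-valued field in a list.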

query

    my $rowList =           $sapObject->query({
                                -objects => $objectNameString,
                                -filterString => $whereString,
                                -limit => $maxRows,
                                -parameters => [$parm1, $parm2, ...],
                                -fields => [$name1, $name2, ...]
                            });

This method queries the Sapling database and returns a reference to a list of lists. The query is specified in the form of an object name string, a filter string, an optional list of parameter values, and a list of desired output fields. The result document can be thought of as a two-dimensional array, with each row being a record returned by the query and each column representing an output field.

This function buys a great deal of flexibility at the cost of ease of use. Before attempting to formulate a query, you will need to look at the ERDB documentation.

parameter

The parameter should be a reference to a hash with the following keys.

-objects

The object name string listing all the entities and relationships in the query. See "Object Name List" in ERDB for more details.

-filterString

The filter string for the query. It cannot contain a LIMIT clause, but can otherwise be anything described in "Filter Clause" in ERDB.

-limit (optional)

Maximum number of rows to return for this query. The default is 1000. To make an unlimited query, specify none.

-parameters (optional)

Reference to a list of parameter values. These should be numbers or strings, and are substituted for any parameter marks in the query on a one-for-one basis. See also "Parameter List" in ERDB.

-fields

Reference to a list containing the names of the desired output fields.

RETURN

Returns a reference to a list of lists. Each row corresponds to a database result row, and each column corresponds to one of the incoming output fields. Note that some fields contain complex PERL data structures, and fields that are multi-valued will contain sub-lists.

    $rowList = [[$row1field1, $row1field2, ...],
                [$row2field1, $row2field2, ...],
                [$row3field1, $row3field2, ...],
                ... ];
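Since the columns of each row follow the order of the -fields list, the two-dimensional result can be turned into a list of hashes by pairing positions. This is a minimal sketch on sample data; the helper name rows_to_hashes and the field names id and name are hypothetical, and real field names must follow the ERDB conventions.

```perl
use strict;
use warnings;

# Convert query-style rows (a list of lists) into a list of hashes, pairing
# each column with the field name at the same position.
sub rows_to_hashes {
    my ($fields, $rowList) = @_;
    my @records;
    for my $row (@$rowList) {
        my %record;
        @record{@$fields} = @$row;    # hash slice: one key per column
        push @records, \%record;
    }
    return \@records;
}

# Hypothetical field names and rows, invented for illustration.
my $fields  = ['id', 'name'];
my $rowList = [['g1', 'First genome'], ['g2', 'Second genome']];
my $records = rows_to_hashes($fields, $rowList);
print "$_->{id} => $_->{name}\n" for @$records;
```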

select

    my $listList =          $sapObject->select({
                                -path => $objectNameString,
                                -filter => { $field1 => $list1, $field2 => $list2, ... },
                                -fields => [$fieldA, $fieldB, ... ],
                                -limit => $maxRows,
                                -multiples => 'list'
                            });

Query the Sapling database. This is a variant of the "get" method in which a further amount of power is sacrificed for ease of use. The return is a list of lists, and the criteria are always in the form of lists of possible values.

parameter

The parameter should be a reference to a hash with the following keys.

-path

The object name string listing all the entities and relationships in the query. See "Object Name List" in ERDB for more details.

-filter (optional)

Reference to a hash that maps field identifiers in "Standard Field Name Format" in ERDB to lists of permissible values. A record matches the filter if the field value matches at least one element of the list.

-fields

Reference to a list of field names in "Standard Field Name Format" in ERDB.

-limit (optional)

Maximum number of rows to return for this query. The default is no limit.

-multiples (optional)

Rule for handling field values in the result hashes. The default option is smart, which maps single-valued fields to scalars and multi-valued fields to list references. If primary is specified, then all fields are mapped to scalars-- only the first value of a multi-valued field is retained. If list is specified, then all fields are mapped to lists.

RETURN

Returns a reference to a list of lists. Each sub-list represents a single record in the result set, and contains the field values in the order the fields were listed in the -fields parameter. Note that if a field is multi-valued, it will be represented as a list reference.

    $listList = [[$row1value1, $row1value2, ... ], [$row2value1, $row2value2, ...], ... ];

Annotation and Assertion Data Methods

equiv_precise_assertions

    my $idHash =            $sapObject->equiv_precise_assertions({
                                -ids => [$id1, $id2, ...]
                            });

Return the assertions for all genes in the database that match the identified gene. The gene can be specified by any prefixed gene identifier (e.g. uni|AYQ44, gi|85841784, or fig|360108.3.peg.1041).

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of gene identifiers.

For backward compatibility, the parameter can also be a reference to a list of gene identifiers.

RETURN

Returns a reference to a hash that maps each incoming ID to a list of 4-tuples. Each 4-tuple contains (0) an identifier that is for the same gene as the input identifier, (1) the asserted function of that identifier, (2) the source of the assertion, and (3) a flag that is TRUE if the assertion is by a human expert.

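    $idHash = { $id1 => [[$otherID1a, $function1a, $source1a, $flag1a],
                         [$otherID1b, $function1b, $source1b, $flag1b], ...],
                $id2 => [[$otherID2a, $function2a, $source2a, $flag2a],
                         [$otherID2b, $function2b, $source2b, $flag2b], ...],
                ... };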

In backward-compatibility mode, returns a reference to a list of 2-tuples. Each 2-tuple consists of an incoming ID and the list of 4-tuples with the asserted function information.

equiv_sequence_assertions

    my $idHash =            $sapObject->equiv_sequence_assertions({
                                -ids => [$id1, $id2, ...]
                            });

Return the assertions for all genes in the database that match the identified protein sequences. A protein sequence can be identified by a protein MD5 code or any prefixed gene identifier (e.g. uni|AYQ44, gi|85841784, or fig|360108.3.peg.1041).

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of protein identifiers. Each identifier should be a prefixed gene identifier or the (optionally) prefixed MD5 of a protein sequence.

RETURN

Returns a reference to a hash mapping each incoming protein identifier to a list of 5-tuples, consisting of (0) an identifier that is sequence-equivalent to the input identifier, (1) the asserted function of that identifier, (2) the source of the assertion, (3) a flag that is TRUE if the assertion is by an expert, and (4) the name of the genome relevant to the identifier (if any).

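    $idHash = { $id1 => [[$otherID1a, $function1a, $source1a, $flag1a, $genome1a],
                         [$otherID1b, $function1b, $source1b, $flag1b, $genome1b], ...],
                $id2 => [[$otherID2a, $function2a, $source2a, $flag2a, $genome2a],
                         [$otherID2b, $function2b, $source2b, $flag2b, $genome2b], ...],
                ... };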

feature_assignments

    my $featureHash =       $sapObject->feature_assignments({
                                -genome => $genomeID,
                                -type => 'peg',
                                -hypothetical => 1
                            });

Return all features of the specified type for the specified genome along with their assignments.

parameter

The parameter should be a reference to a hash with the following keys.

-genome

ID of the genome whose features are desired.

-type (optional)

If specified, the type of feature desired (peg, rna, etc.). If omitted, all features will be returned.

-hypothetical (optional)

If 1, only hypothetical genes will be returned; if 0, only non-hypothetical genes will be returned. If undefined or not specified, all genes will be returned.

RETURN

Returns a hash mapping the ID of each feature in the specified genome to its assignment.

    $featureHash = { $fid1 => $function1, $fid2 => $function2, ... };

ids_to_assertions

    my $idHash =            $sapObject->ids_to_assertions({
                                -ids => [$id1, $id2, ...]
                            });

Return the assertions associated with each prefixed ID.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of prefixed feature IDs (e.g. gi|17017961, NP_625335.1, fig|360108.3.peg.1041). The assertions associated with each particular identifier will be returned. In this case, there will be no processing for equivalent IDs. For that, you should use equiv_sequence_assertions or equiv_precise_assertions.

RETURN

Returns a reference to a hash mapping every incoming ID to a list of 3-tuples, each consisting of (0) an asserted function, (1) the source of the assertion, and (2) a flag that is TRUE if the assertion was made by an expert.

    $idHash = { $id1 => [[$assertion1a, $source1a, $expert1a],
                         [$assertion1b, $source1b, $expert1b], ...],
                $id2 => [[$assertion2a, $source2a, $expert2a],
                         [$assertion2b, $source2b, $expert2b], ...],
                ... };
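The expert flag in position 2 of each tuple makes it straightforward to discard assertions that were not made by an expert. The following sketch operates on sample data; the helper name expert_assertions, the function text, and the source names are hypothetical.

```perl
use strict;
use warnings;

# Keep only the assertions whose expert flag (tuple element 2) is true.
# IDs with no expert assertions map to an empty list.
sub expert_assertions {
    my ($idHash) = @_;
    my %result;
    for my $id (keys %$idHash) {
        $result{$id} = [ grep { $_->[2] } @{ $idHash->{$id} } ];
    }
    return \%result;
}

# Sample data; the functions and sources are invented for illustration.
my $idHash = {
    'fig|360108.3.peg.1041' => [
        ['hypothetical protein', 'SourceA', 0],
        ['Example kinase',       'SourceB', 1],
    ],
};
my $experts = expert_assertions($idHash);
for my $id (sort keys %$experts) {
    print "$id\t$_->[0]\t$_->[1]\n" for @{ $experts->{$id} };
}
```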

ids_to_annotations

    my $idHash =            $sapObject->ids_to_annotations({
                                -ids => [$id1, $id2, ...]
                            });

Return the annotations associated with each prefixed ID. Annotations are comments attached to each feature (gene), and include past functional assignments as well as more general information.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of feature IDs.

-source (optional)

Database source of the IDs specified-- SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.

RETURN

Returns a reference to a hash mapping every incoming ID to a list of 3-tuples, each consisting of (0) annotation text, (1) the name of the annotator, and (2) the timestamp of the annotation (as a number of seconds since the epoch).

    $idHash = { $id1 => [[$annotation1a, $name1a, $time1a],
                         [$annotation1b, $name1b, $time1b], ...],
                $id2 => [[$annotation2a, $name2a, $time2a],
                         [$annotation2b, $name2b, $time2b], ...],
                ... };
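Because the timestamp is a number of seconds since the epoch, the tuples can be sorted numerically and rendered with the standard POSIX strftime function. The sketch below picks the most recent annotation for each ID from sample data; the helper names latest_annotation and format_time and all annotation values are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Return the most recent annotation tuple for one ID, or undef if there
# are none. Tuples are [text, annotator, epoch seconds].
sub latest_annotation {
    my ($tuples) = @_;
    my ($latest) = sort { $b->[2] <=> $a->[2] } @{ $tuples || [] };
    return $latest;
}

# Render an epoch timestamp as a UTC date string.
sub format_time {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d %H:%M:%S', gmtime($epoch));
}

# Sample data; the annotation text, names, and times are invented.
my $idHash = {
    'fig|100226.1.peg.3361' => [
        ['First note',  'annotatorA', 1000000000],
        ['Second note', 'annotatorB', 1200000000],
    ],
};
for my $id (sort keys %$idHash) {
    my $latest = latest_annotation($idHash->{$id}) or next;
    print "$id\t$latest->[1]\t", format_time($latest->[2]), "\t$latest->[0]\n";
}
```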

ids_to_functions

    my $featureHash =       $sapObject->ids_to_functions({
                                -ids => [$id1, $id2, ...],
                                -source => 'CMR',
                                -genome => $genome
                            });

Return the functional assignment for each feature in the incoming list.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of feature IDs.

-source (optional)

Database source of the IDs specified-- SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.

-genome (optional)

ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return genes for all genomes.

RETURN

Returns a reference to a hash mapping each feature ID to the feature's current functional assignment. Features that do not exist in the database will not be present in the hash. For IDs that correspond to multiple features, only one functional assignment will be returned.

    $featureHash = { $id1 => $function1,
                     $id2 => $function2,
                     ...};

occ_of_role

    my $roleHash =          $sapObject->occ_of_role({
                                -roles => [$role1, $role2, ...],
                                -functions => [$function3, $function4, ...],
                                -genomes => [$genome1, $genome2, ...],
                            });

Search for features in the specified genomes with the indicated roles or functions.

parameter

The parameter should be a reference to a hash with the following keys.

-roles (optional)

Reference to a list of the roles to search for.

-functions (optional)

Reference to a list of the functional assignments to search for.

-genomes (optional)

Reference to a list of the IDs of the genomes whose genes are to be searched for the specified roles and assignments.

RETURN

Returns a reference to a hash that maps each specified role ID or functional assignment to a list of the FIG IDs of genes that have that role or assignment.

    $roleHash = { $role1 => [$fid1a, $fid1b, ...],
                  $role2 => [$fid2a, $fid2b, ...],
                  $function3 => [$fid3a, $fid3b, ...],
                  $function4 => [$fid4a, $fid4b, ...],
                  ... };

Chemistry Methods

all_complexes

    my $complexList =       $sapObject->all_complexes();

Return a list of all the complexes in the database.

RETURN

Returns a reference to a list of complex IDs.

    $complexList = [$cpx1, $cpx2, ...]

all_models

    my $modelHash =         $sapObject->all_models();

Return a hash of all the models in the database, mapping each one to the relevant genome.

RETURN

Returns a reference to a hash that maps each model ID to a genome ID.

    $modelHash = { $model1 => $genome1, $model2 => $genome2, ... };

all_reactions

    my $reactions =         $sapObject->all_reactions();

Return a list of all the reactions in the database.

RETURN

Returns a reference to a list of all the reactions.

    $reactions = [$rx1, $rx2, ...];

all_roles_used_in_models

    my $rolesList =         $sapObject->all_roles_used_in_models();

Return a list of all the roles used in models.

RETURN

Returns a reference to a list of role names. Each named role triggers a complex used in at least one reaction belonging to a model.

    $rolesList = [$role1, $role2, ...]

complex_data

    my $complexHash =       $sapObject->complex_data({
                                -ids => [$cpx1, $cpx2, ...],
                                -data => [$fieldA, $fieldB, ...]
                            });

Return the specified data items for each incoming reaction complex.

parameter

Reference to hash with the following keys.

-ids

Reference to a list of the IDs of reaction complexes of interest.

-data

Reference to a list of the names of the data items desired for each of the specified complexes. The data item names include the following.

name

Name of the complex (or undef if the complex is nameless).

reactions

Reference to a list of the reactions in the complex.

roles

Reference to a list of 2-tuples for the roles in the complex, each containing (0) the role name, and (1) a flag that is TRUE if the role is optional to trigger the complex and FALSE if it is necessary.

RETURN

Returns a reference to a hash mapping each incoming complex to an n-tuple containing the desired data fields in the order specified.

    $complexHash = { $cpx1 => [$data1A, $data1B, ...],
                     $cpx2 => [$data2A, $data2B, ...]
                     ... };
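Since each n-tuple follows the order of the -data list, the values can be labeled by pairing them with that list. This is a minimal sketch on sample data; the helper name label_complex_data, the complex ID, the reaction IDs, and the role names are hypothetical.

```perl
use strict;
use warnings;

# Pair each complex's n-tuple with the -data names that requested it,
# giving { $complexID => { $fieldName => $value, ... }, ... }.
sub label_complex_data {
    my ($dataNames, $complexHash) = @_;
    my %labeled;
    for my $cpx (keys %$complexHash) {
        my %fields;
        @fields{@$dataNames} = @{ $complexHash->{$cpx} };
        $labeled{$cpx} = \%fields;
    }
    return \%labeled;
}

# The same list would be passed as -data in the call. The complex ID,
# reaction IDs, and role names below are invented for illustration.
my @dataNames   = ('name', 'reactions', 'roles');
my $complexHash = {
    'cpxA' => [undef, ['rxnA1', 'rxnA2'], [['Role one', 0], ['Role two', 1]]],
};
my $labeled = label_complex_data(\@dataNames, $complexHash);
for my $cpx (sort keys %$labeled) {
    # A false flag marks a role that is necessary to trigger the complex.
    my @necessary = map { $_->[0] } grep { !$_->[1] } @{ $labeled->{$cpx}{roles} };
    print "$cpx needs: @necessary\n";
}
```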

coupled_reactions

    my $reactionHash =      $sapObject->coupled_reactions({
                                -ids => [$rx1, $rx2, ...]
                            });

For each of a set of reactions, get the adjacent reactions in the metabolic network. Two reactions are considered adjacent if they share at least one compound that is neither a cofactor nor a ubiquitous compound (like water or oxygen). The compounds that relate the adjacent reactions are called the connecting compounds. In most cases, each pair of adjacent reactions will have only one connecting compound, but this is not guaranteed to be true.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of reaction IDs.

RETURN

Returns a reference to a hash mapping each reaction ID to a sub-hash. Each sub-hash maps adjacent reactions to the relevant connecting compounds.

    $reactionHash = { $rx1 => { $rx1a => [$cpd1ax, $cpd1ay, ...],
                                $rx1b => [$cpd1bx, $cpd1by, ...],
                                ... },
                      ... };

models_to_reactions

    my $modelHash =         $sapObject->models_to_reactions({
                                -ids => [$model1, $model2, ...]
                            });

Return the list of reactions in each specified model.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of model IDs, indicating the models of interest.

RETURN

Returns a reference to a hash that maps each model ID to a list of the reactions in the model.

    $modelHash = { $model1 => [$rx1a, $rx1b, ...],
                   $model2 => [$rx2a, $rx2b, ...],
                   ... };

reaction_neighbors

    my $reactionHash =      $sapObject->reaction_neighbors({
                                -ids => [$rx1, $rx2, ...],
                                -depth => 1
                            });

Return a list of the reactions in the immediate neighborhood of the specified reactions. A separate neighborhood list will be generated for each incoming reaction; the neighborhood will consist of reactions connected to the incoming reaction and reactions connected to those reactions up to the specified depth. (Two reactions are connected if they have a compound in common that is not a cofactor or a ubiquitous chemical like water or ATP).

parameter

The parameter should be a reference to a hash with the following keys:

-ids

Reference to a list of IDs for the reactions of interest.

-depth (optional)

Number of levels to which the neighborhood search should take place. If the depth is n, then the neighborhood will consist of the original reaction and every other reaction for which there is a sequence of n+1 or fewer reactions starting with the original and ending with the other reaction. Thus, if n is zero, the original reaction is returned as a singleton. If n is 1, then the neighborhood is the original reaction and every reaction connected to it. The default is 2.

RETURN

Returns a reference to a hash mapping each incoming reaction to a sub-hash. The sub-hash maps each reaction in the neighborhood to its distance from the original reaction.

    $reactionHash = { $rx1 => { $rx1a => $dist1a, $rx1b => $dist1b, ... },
                      $rx2 => { $rx2a => $dist2a, $rx2b => $dist2b, ... },
                      ... };
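The distance values in each sub-hash make it easy to view a neighborhood one level at a time. The sketch below groups a sample neighborhood by distance; the helper name group_by_distance, the reaction IDs, and the distances are hypothetical.

```perl
use strict;
use warnings;

# Group one reaction's neighborhood by distance:
# { $distance => [sorted reaction IDs], ... }.
sub group_by_distance {
    my ($neighborhood) = @_;
    my %byDistance;
    for my $rx (sort keys %$neighborhood) {
        push @{ $byDistance{ $neighborhood->{$rx} } }, $rx;
    }
    return \%byDistance;
}

# Sample neighborhood for a hypothetical reaction; IDs and distances are
# invented for illustration.
my $reactionHash = { 'rxnA' => { 'rxnB' => 1, 'rxnC' => 1, 'rxnD' => 2 } };
for my $rx (sort keys %$reactionHash) {
    my $groups = group_by_distance($reactionHash->{$rx});
    for my $dist (sort { $a <=> $b } keys %$groups) {
        print "$rx distance $dist: @{ $groups->{$dist} }\n";
    }
}
```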

reaction_path

    my $reactionList =      $sapObject->reaction_path({
                                -roles => [$role1, $role2, ...],
                                -maxLength => 10
                            });

Find the shortest reaction path that represents as many of the specified roles as possible. Note that since a reaction may be associated with multiple roles, it is possible for a single role to be represented more than once in the path.

The search is artificially limited to paths under a maximum length that can be specified in the parameters.

parameter

The parameter should be a reference to a hash with the following keys.

-roles

Reference to a list of the roles to be covered by the reaction path.

-maxLength (optional)

Maximum number of reactions to allow in the reaction path. The default is two more than the number of roles.

RETURN

Returns a reference to a list of the best reaction paths. Each reaction path is represented by a list of lists, the sub-lists containing the reaction IDs followed by the roles represented by the reaction. The paths returned will be the shortest ones found with the minimal number of missing roles.

    $reactionList = [
                     [[$rxn1a, $role1ax, $role1ay, ...], [$rxn1b, $role1bx, $role1by, ...], ...],
                     [[$rxn2a, $role2ax, $role2ay, ...], [$rxn2b, $role2bx, $role2by, ...], ...],
                     ... ];

reaction_strings

    my $reactionHash =      $sapObject->reaction_strings({
                                -ids => [$rx1, $rx2, ...],
                                -roles => 1,
                                -names => 1
                            });

Return the display string for each reaction. The display string contains the compound IDs (as opposed to the atomic formulas) and the associated stoichiometries, with the substrates on the left of the arrow and the products on the right.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of IDs for the reactions of interest.

-roles (optional)

If TRUE, then each reaction string will be associated with a list of the reaction's roles in the result. The default is FALSE.

-names (optional)

If 1, then the compound name will be included with the ID in the output. If only, the compound name will be included instead of the ID. If 0, only the ID will be included. The default is 0.

RETURN

Returns a reference to a hash mapping each reaction ID to a displayable string describing the reaction. If -roles is TRUE, then instead of a string, the hash will map each reaction ID to a list consisting of the string followed by the roles associated with the reaction.

-roles FALSE
    $reactionHash = { $rx1 => $string1, $rx2 => $string2, ... }
-roles TRUE
    $reactionHash = { $rx1 => [$string1, $role1a, $role1b, ...],
                      $rx2 => [$string2, $role2a, $role2b, ...],
                      ...
                    }
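Code that may be called with either setting of -roles has to cope with both return shapes. The sketch below normalizes a single value to a string plus a (possibly empty) list of roles; the helper name split_reaction_value and the sample strings are hypothetical placeholders, not real display strings.

```perl
use strict;
use warnings;

# Normalize one reaction_strings value to (string, \@roles), whichever
# shape was returned for it.
sub split_reaction_value {
    my ($value) = @_;
    if (ref($value) eq 'ARRAY') {
        my ($string, @roles) = @$value;
        return ($string, \@roles);
    }
    return ($value, []);
}

# Sample values in both documented shapes, invented for illustration.
my %samples = (
    'rxnA' => 'display string A',
    'rxnB' => ['display string B', 'Role one', 'Role two'],
);
for my $rx (sort keys %samples) {
    my ($string, $roles) = split_reaction_value($samples{$rx});
    print "$rx: $string [", join('; ', @$roles), "]\n";
}
```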

reactions_to_complexes

    my $reactionHash =      $sapObject->reactions_to_complexes({
                                -ids => [$rxn1, $rxn2, ...]
                            });

Return the complexes containing each reaction. Note that most reactions are in more than one complex, so the complexes for each reaction are returned as a list.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of reaction IDs for the reactions of interest.

RETURN

Returns a reference to a hash mapping each incoming reaction to a list of the associated complexes.

    $reactionHash = { $rxn1 => [$cpx1a, $cpx1b, ...],
                      $rxn2 => [$cpx2a, $cpx2b, ...],
                      ...
                    };

reactions_to_roles

    my $reactionHash =      $sapObject->reactions_to_roles({
                                -ids => [$rx1, $rx2,...]
                            });

Return the roles associated with each reaction.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of reaction IDs for the reactions of interest.

RETURN

Returns a reference to a hash mapping each incoming reaction to a list of the associated roles.

    $reactionHash = { $rx1 => [$role1a, $role1b, ...],
                      $rx2 => [$role2a, $role2b, ...],
                      ...
                    };

role_neighbors

    my $roleHash =          $sapObject->role_neighbors({
                                -ids => [$role1, $role2, ...]
                            });

For each role, return a list of roles in the immediate chemical neighborhood. A role is in the immediate chemical neighborhood of another role if the two roles are associated with reactions that share a compound that is not ubiquitous or a cofactor.

parameter

The parameter should be a reference to a hash with the following keys:

-ids

Reference to a list of role names.

RETURN

Returns a reference to a hash that maps each incoming role name to a list of the names of the neighboring roles.

    $roleHash = { $role1 => [$role1a, $role1b, ...],
                  $role2 => [$role2a, $role2b, ...],
                  ... };

role_reactions

    my $roleHash =          $sapObject->role_reactions({
                                -ids => [$role1, $role2, ...],
                                -formulas => 1
                            });

Return a list of all the reactions associated with each incoming role.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of role IDs for the roles of interest.

-formulas (optional)

If TRUE, then each reaction will be associated with its formula. The default is FALSE, in which case for each role a simple list of reactions is returned.

RETURN

Returns a reference to a hash, keyed by role ID. If -formulas is FALSE, then each role will map to a list of reaction IDs. If -formulas is TRUE, then each role maps to a sub-hash keyed by reaction ID. The sub-hash maps each reaction to a chemical formula string with compound IDs in place of the chemical labels.

-formulas FALSE
    $roleHash = { $role1 => [$rxn1a, $rxn1b, ...],
                  $role2 => [$rxn2a, $rxn2b, ...],
                  ... };
-formulas TRUE
    $roleHash = { $role1 => { $rx1a => "$s1a1*$cpd1a1 + $s1a2*$cpd1a2 + ... => $s1ax*$cpd1ax + $s1ay*$cpd1ay + ...",
                              $rx1b => "$s1b1*$cpd1b1 + $s1b2*$cpd1b2 + ... => $s1bx*$cpd1bx + $s1by*$cpd1by + ...",
                              ... },
                  $role2 => { $rx2a => "$s2a1*$cpd2a1 + $s2a2*$cpd2a2 + ... => $s2ax*$cpd2ax + $s2ay*$cpd2ay + ...",
                              $rx2b => "$s2b1*$cpd2b1 + $s2b2*$cpd2b2 + ... => $s2bx*$cpd2bx + $s2by*$cpd2by + ...",
                              ... },
                 ... };
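
Because the shape of the result depends on -formulas, calling code may need to accept both forms. The following is a minimal sketch, not part of SAP.pm: the helper name reactions_for_role and the sample data are hypothetical, and no server call is made.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SAP.pm): return the sorted reaction IDs for
# one role, whichever way role_reactions was called.
sub reactions_for_role {
    my ($roleHash, $role) = @_;
    my $entry = $roleHash->{$role};
    return () if ! defined $entry;
    # With -formulas TRUE the entry is a sub-hash keyed by reaction ID;
    # otherwise it is a plain list of reaction IDs.
    my @rxns = (ref $entry eq 'HASH' ? keys %$entry : @$entry);
    my @sorted = sort @rxns;
    return @sorted;
}

# Made-up sample data in the two documented shapes.
my $plain = { roleA => ['rxn2', 'rxn1'] };
my $withFormulas = { roleA => { rxn1 => '1*cpdA + 1*cpdB => 1*cpdC',
                                rxn2 => '2*cpdC => 1*cpdD' } };

my @fromPlain = reactions_for_role($plain, 'roleA');
my @fromFormulas = reactions_for_role($withFormulas, 'roleA');
print "Plain form: @fromPlain\n";
print "Formula form: @fromFormulas\n";
```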

roles_to_complexes

    my $roleHash =          $sapObject->roles_to_complexes({
                                -ids => [$role1, $role2, ...],
                            });

Return the complexes (sets of related reactions) associated with each role in the incoming list. A role may trigger many complexes, and a complex may be triggered by many roles. For each complex, a given role is either optional or necessary, and the output indicates which.

parameter

The parameter should be a reference to a hash with the following keys:

-ids

Reference to a list of the IDs for the roles of interest.

RETURN

Returns a reference to a hash mapping each incoming role ID to a list of 2-tuples, each consisting of (0) a complex ID, and (1) a flag that is TRUE if the role is optional and FALSE if the role is necessary for the complex to trigger.

    $roleHash = { $role1 => [[$complex1a, $flag1a], [$complex1b, $flag1b], ...],
                  $role2 => [[$complex2a, $flag2a], [$complex2b, $flag2b], ...],
                  ... };
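
As an illustration of working with the 2-tuples, the sketch below regroups a result by complex, separating necessary roles from optional ones. The sample data and the variable names are made up for the example; in real use the hash would come from roles_to_complexes.

```perl
use strict;
use warnings;

# Made-up sample data in the documented shape:
# role => [[complexID, optionalFlag], ...]
my $roleHash = { roleA => [['cpx1', 0], ['cpx2', 1]],
                 roleB => [['cpx1', 1]] };

# Build complex => { necessary => [roles], optional => [roles] }.
my %complexRoles;
for my $role (sort keys %$roleHash) {
    for my $tuple (@{$roleHash->{$role}}) {
        my ($complex, $optional) = @$tuple;
        # The flag is TRUE for an optional role and FALSE for a necessary one.
        my $kind = ($optional ? 'optional' : 'necessary');
        push @{$complexRoles{$complex}{$kind}}, $role;
    }
}

for my $complex (sort keys %complexRoles) {
    my @needed = @{$complexRoles{$complex}{necessary} || []};
    my @extra = @{$complexRoles{$complex}{optional} || []};
    print "$complex: necessary=[@needed] optional=[@extra]\n";
}
```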

DNA and Protein Sequence Methods

dlits_for_ids

    my $idHash =            $sapObject->dlits_for_ids({
                                -ids => [$id1, $id2, ...],
                                -full => 1
                            });

Find the PUBMED literature references for a list of proteins. The proteins can be specified either as FIG feature IDs or as protein sequence MD5s.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of gene and protein IDs. For each gene, literature references will be returned for the feature's protein. For each protein, the literature references for the protein will be returned. Genes should be specified using FIG feature IDs and proteins using the MD5 of the protein sequence.

-full (optional)

If TRUE, then in addition to each literature article's PUBMED ID, the article title and URL will be returned. (NOTE: these will not always be available). The default is FALSE.

RETURN

Returns a reference to a hash that maps each incoming ID to a list of publications. The publications will normally be represented by PUBMED IDs, but if -full is TRUE, then each will be represented by a 3-tuple consisting of (0) the PUBMED ID, (1) the article title, and (2) the article URL.

-full = FALSE
    $idHash = { $id1 => [$pubmed1a, $pubmed1b, ...],
                $id2 => [$pubmed2a, $pubmed2b, ...],
                ...
    };
-full = TRUE
    $idHash = { $id1 => [[$pubmed1a, $title1a, $url1a],
                         [$pubmed1b, $title1b, $url1b], ...],
                $id2 => [[$pubmed2a, $title2a, $url2a],
                         [$pubmed2b, $title2b, $url2b], ...],
                ...
    };
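
Code that only needs the PUBMED IDs can treat both output forms uniformly. A minimal sketch follows; the helper pubmed_ids and the sample values are hypothetical and are not part of SAP.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the PUBMED IDs from one publication list,
# whether each entry is a bare ID or a [pubmedID, title, url] 3-tuple.
sub pubmed_ids {
    my ($pubs) = @_;
    return map { ref $_ eq 'ARRAY' ? $_->[0] : $_ } @$pubs;
}

# Made-up sample lists. The title and URL may be missing in the -full form.
my @short = pubmed_ids([111, 222]);
my @full = pubmed_ids([[111, 'Some title', 'http://example.org/111'],
                       [222, undef, undef]]);
print "Short form: @short\n";
print "Full form: @full\n";
```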

equiv_ids_for_sequences

    my $labelHash =         $sapObject->equiv_ids_for_sequences({
                                -seqs => [[$label1, $comment1, $sequence1],
                                          [$label2, $comment2, $sequence2], ...]
                            });

Find all the identifiers in the database that produce the specified proteins.

parameter

The parameter should be a reference to a hash with the following keys.

-seqs

Reference to a list of protein specifications. A protein specification can be a FASTA string, a 3-tuple consisting of (0) a label, (1) a comment, and (2) a protein sequence, OR a 2-tuple consisting of (0) a label and (1) a protein sequence. In other words, each specification can be a raw FASTA string, a parsed FASTA string, or a simple [id, sequence] pair. In every case, the protein sequence will be used to find identifiers and the label will be used to identify the results.

RETURN

Returns a hash mapping each incoming label to a list of identifiers from the database that name the protein or a feature that produces the protein.

    $labelHash = { $label1 => [$id1a, $id1b, ...],
                   $label2 => [$id2a, $id2b, ...],
                   ... };

find_closest_genes

    my $nameHash =          $sapObject->find_closest_genes({
                                -genome => $genome1,
                                -seqs => { $name1 => $seq1,
                                           $name2 => $seq2,
                                           ... },
                                -protein => 1
                            });

Find the closest genes to the specified sequences in the specified genome.

Each indicated sequence will be converted to a DNA sequence and then the contigs of the specified genome will be searched for the sequence. The genes in closest proximity to the sequence will be returned. The sequences are named; in the return hash, the genes found will be associated with the appropriate sequence name.

parameter

The parameter should be a reference to a hash with the following keys.

-genome

ID of the genome to search.

-seqs

Reference to a hash mapping names to sequences. The names will be used to associate the genes found with the incoming sequences. DNA sequences should not contain ambiguity characters.

-protein (optional)

If TRUE, the sequences will be interpreted as protein sequences. If FALSE, the sequences will be interpreted as DNA sequences.

RETURN

Returns a reference to a hash mapping each sequence name to a list of 3-tuples, each consisting of (0) a gene ID, (1) the location of the gene, and (2) the location of the matching sequence.

    $nameHash = { $name1 => [[$fid1a, $loc1a, $match1a],
                             [$fid1b, $loc1b, $match1b], ...],
                  $name2 => [[$fid2a, $loc2a, $match2a],
                             [$fid2b, $loc2b, $match2b], ...],
                  ... }
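
The gene and match locations are location strings in the format described under "Location Strings". The sketch below shows one way to split such a string into its parts. The helper parse_location is hypothetical and not part of SAP.pm, and the regular expression assumes that the start, strand, and length always follow the last underscore.

```perl
use strict;
use warnings;

# Hypothetical helper: split a location string into contig, start, strand,
# and length. Returns nothing if the string does not match the format.
sub parse_location {
    my ($loc) = @_;
    # The contig ID may itself contain underscores, so the greedy (.+) leaves
    # only the final underscore as the separator before the start position.
    if ($loc =~ /^(.+)_(\d+)([+-])(\d+)$/) {
        return { contig => $1, start => $2, strand => $3, length => $4 };
    }
    return;
}

# The example location string from the "Location Strings" section.
my $parsed = parse_location('100226.1:NC_003888_3766170+612');
printf "%s starts at %d on the %s strand for %d bases\n",
    @{$parsed}{qw(contig start strand length)};
```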

ids_to_sequences

    my $idHash =            $sapObject->ids_to_sequences({
                                -ids => [$id1, $id2, ...],
                                -protein => 1,
                                -fasta => 1,
                                -source => 'LocusTag',
                                -genome => $genome,
                                -comments => { $id1 => $comment1,
                                               $id2 => $comment2,
                                               ... }
                            });

Compute a DNA or protein string for each incoming feature ID.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of feature IDs.

-source (optional)

Database source of the IDs specified: SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixes indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.

-genome (optional)

ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for all genomes.

-protein (optional)

If TRUE, the output FASTA sequences will be protein sequences; otherwise, they will be DNA sequences. The default is FALSE.

-fasta (optional)

If TRUE, the output sequences will be multi-line FASTA strings instead of sequences. The default is FALSE, meaning the output sequences will be ordinary strings.

-comments (optional)

Allows the user to add a label or description to each FASTA-formatted sequence. The value is a reference to a hash whose keys are the IDs and whose values are the desired labels. This parameter is only used when the -fasta option is specified.

RETURN

Returns a hash mapping the incoming IDs to sequence strings. IDs that are not found in the database will not appear in the hash.

    $idHash = { $id1 => $sequence1, $id2 => $sequence2, ... };

locs_to_dna

    my $locHash =           $sapObject->locs_to_dna({
                                -locations => {
                                    $label1 => $loc1,
                                    $label2 => $loc2,
                                    ... },
                                -fasta => 1
                            });

Return the DNA sequences for the specified locations.

parameter

The parameter should be a reference to a hash with the following keys.

-locations

Reference to a hash that maps IDs to locations. A location can be in the form of a "Location String", a reference to a list of location strings, a FIG feature ID, or a contig ID.

-fasta (optional)

If TRUE, the DNA sequences will be returned in FASTA format instead of raw format. The default is FALSE.

RETURN

Returns a reference to a hash that maps the incoming IDs to FASTA sequences for the specified DNA locations. The FASTA ID will be the ID specified in the incoming hash.

    $locHash = { $label1 => $sequence1,
                 $label2 => $sequence2,
                 ... };

roles_to_proteins

    my $roleHash =          $sapObject->roles_to_proteins({
                                -roles => [$role1, $role2, ...]
                            });

Return a list of the proteins associated with each of the incoming functional roles.

parameter

The parameter should be a reference to a hash with the following keys.

-roles

Reference to a list of functional roles.

RETURN

Returns a reference to a hash mapping each incoming role to a list of the proteins generated by features that implement the role. The proteins will be represented by MD5 protein IDs.

    $roleHash = { $role1 => [$prot1a, $prot1b, ...],
                  $role2 => [$prot2a, $prot2b, ...],
                  ... };

upstream

    my $featureHash =       $sapObject->upstream({
                                -ids => [$fid1, $fid2, ...],
                                -size => 200,
                                -skipGene => 1,
                                -fasta => 1,
                                -comments => { $fid1 => $comment1,
                                               $fid2 => $comment2, ...}
                            });

Return the DNA sequences for the upstream regions of the specified features. The nucleotides inside coding regions are displayed in upper case; others are displayed in lower case.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs of interest.

-size (optional)

Number of upstream nucleotides to include in the output. The default is 200.

-skipGene (optional)

If TRUE, only the upstream region is included. Otherwise, the content of the feature is included in the output.

-fasta (optional)

If TRUE, the output sequences will be multi-line FASTA strings instead of sequences. The default is FALSE, meaning the output sequences will be ordinary strings.

-comments (optional)

Allows the user to add a label or description to each FASTA-formatted sequence. The value is a reference to a hash whose keys are the IDs and whose values are the desired labels. This parameter is only used when the -fasta option is specified.

RETURN

Returns a hash mapping each incoming feature ID to the DNA sequence of its upstream region.

    $featureHash = { $fid1 => $sequence1, $fid2 => $sequence2, ... };

Expression Data Methods

all_experiments

    my $expList =           $sapObject->all_experiments();

Return a list of all the experiment names.

RETURN

Returns a reference to a list of experiment names.

    $expList = [$exp1, $exp2, ...];

atomic_regulon_vectors

    my $regulonHash =       $sapObject->atomic_regulon_vectors({
                                -ids => [$ar1, $ar2, ...],
                                -raw => 0
                            });

Return a map of the expression levels for each specified atomic regulon. The expression levels will be returned in the form of vectors with values -1 (suppressed), 1 (expressed), or 0 (unknown) in each position. The positions will correspond to the experiments in the order returned by "genome_experiments".

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of atomic regulon IDs.

-raw (optional)

If TRUE, then the vectors will be returned in the form of strings. Each string will have the character +, -, or space for the values 1, -1, and 0 respectively.

RETURN

Returns a reference to a hash mapping the incoming atomic regulon IDs to the desired vectors. The vectors will normally be references to lists of values of 1, 0, and -1, but they can also be represented as strings.

Normal Output
    $regulonHash = { $ar1 => [$level1a, $level1b, ...],
                     $ar2 => [$level2a, $level2b, ...],
                     ... };
Output if -raw is TRUE
    $regulonHash = { $ar1 => $string1, $ar2 => $string2, ... };
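
If the vectors were requested with -raw, they can be converted back to numeric form using the character mapping described above. This is a minimal sketch; the helper raw_to_levels and the sample string are hypothetical and are not part of SAP.pm.

```perl
use strict;
use warnings;

# Character-to-level mapping taken from the -raw description:
# + is 1 (expressed), - is -1 (suppressed), space is 0 (unknown).
my %LEVEL_FOR = ('+' => 1, '-' => -1, ' ' => 0);

# Hypothetical helper: convert one raw vector string to a reference to a list
# of levels. Any other character would map to undef.
sub raw_to_levels {
    my ($raw) = @_;
    return [ map { $LEVEL_FOR{$_} } split //, $raw ];
}

# Made-up raw vector covering four experiments.
my $levels = raw_to_levels('+- +');
print "Levels: @$levels\n";
```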

atomic_regulons

    my $regulonHash =       $sapObject->atomic_regulons({
                                -id => $genome1
                            });

Return a map of the atomic regulons for the specified genome. Each atomic regulon is a set of genes that are always regulated together. The map will connect each regulon ID to a list of those genes. A given gene can only be in one atomic regulon.

parameter

The parameter should be a reference to a hash with the following key.

-id

The ID of the genome of interest.

RETURN

Returns a reference to a hash that maps each atomic regulon ID to a list of the FIG IDs of its constituent genes.

    $regulonHash = { $regulon1 => [$fid1a, $fid1b, ...],
                     $regulon2 => [$fid2a, $fid2b, ...],
                     ... };

coregulated_correspondence

    my $fidHash =           $sapObject->coregulated_correspondence({
                                -ids => [$fid1, $fid2, ...],
                                -pcLevel => 0.8,
                                -genomes => [$genome1, $genome2, ...]
                            });

Given a gene, return genes that may be coregulated because they correspond to coregulated genes in genomes for which we have expression data (an expression-analyzed genome). For each incoming gene, a corresponding gene will be found in each expression-analyzed genome. The coregulated genes for the corresponding gene will be determined, and then these will be mapped back to the original genome. The resulting genes can be considered likely candidates for coregulation in the original genome.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs.

-pcLevel (optional)

Minimum Pearson coefficient level for a gene to be considered coregulated. The default is 0.5.

-genomes (optional)

Reference to a list of genome IDs. If specified, only expression data from the listed genomes will be used in the analysis; otherwise, all genomes with expression data will be used.

RETURN

Returns a reference to a hash that maps each incoming gene to a list of 4-tuples, each 4-tuple consisting of (0) a hypothetical coregulated gene in this genome, (1) a gene in an expression-analyzed genome corresponding to the input gene, (2) a gene in the expression-analyzed genome coregulated with it (and that corresponds to the hypothetical coregulated gene), and (3) the correlation score.

    $fidHash = { $fid1 => [[$fid1a, $fid1ax, $fid1ay, $score1a],
                           [$fid1b, $fid1bx, $fid1by, $score1b],
                           ...],
                 $fid2 => [[$fid2a, $fid2ax, $fid2ay, $score2a],
                           [$fid2b, $fid2bx, $fid2by, $score2b],
                           ...],
                 ... };

coregulated_fids

    my $fidHash =           $sapObject->coregulated_fids({
                                -ids => [$fid1, $fid2, ...]
                            });

Given a gene, return the coregulated genes and their Pearson coefficients. Two genes are considered coregulated if there is some experimental evidence that their expression levels are related; the Pearson coefficient indicates the strength of the relationship.

parameter

The parameter should be a reference to a hash with the following key.

-ids

Reference to a list of FIG feature IDs.

RETURN

Returns a reference to a hash that maps each incoming FIG ID to a sub-hash. The sub-hash in turn maps each related feature's FIG ID to its Pearson coefficient with the incoming FIG ID.

    $fidHash = { $fid1 => { $fid1a => $coeff1a, $fid1b => $coeff1b, ...},
                 $fid2 => { $fid2a => $coeff2a, $fid2b => $coeff2b, ...},
                 ... };
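
A common follow-up is to keep only the more strongly correlated partners and list them strongest first. The sketch below does this on made-up data; the feature IDs, the coefficients, and the 0.7 cutoff are all invented for the example.

```perl
use strict;
use warnings;

# Made-up sample data in the documented shape:
# feature => { partner feature => Pearson coefficient }
my $fidHash = {
    'fig|100226.1.peg.1' => { 'fig|100226.1.peg.2' => 0.91,
                              'fig|100226.1.peg.3' => 0.62,
                              'fig|100226.1.peg.4' => 0.75 },
};

# Hypothetical cutoff chosen for this example only.
my $minCoeff = 0.7;

my %strongPartners;
for my $fid (sort keys %$fidHash) {
    my $partners = $fidHash->{$fid};
    my @kept = grep { $partners->{$_} >= $minCoeff } keys %$partners;
    # Strongest correlation first.
    $strongPartners{$fid} = [ sort { $partners->{$b} <=> $partners->{$a} } @kept ];
}

for my $fid (sort keys %strongPartners) {
    print "$fid: @{$strongPartners{$fid}}\n";
}
```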

experiment_fid_levels

    my $expHash =           $sapObject->experiment_fid_levels({
                                -ids => [$exp1, $exp2, ...]
                            });

Given an experiment, return the on/off levels for all genes in that experiment. An on/off level is either 1 (expressed), -1 (inhibited), or 0 (unknown).

parameter

The parameter should be a reference to a hash with the following key.

-ids

Reference to a list of experiment IDs.

RETURN

Returns a reference to a hash that maps each experiment ID to a sub-hash that indicates the expression level of each gene for which the experiment showed a result.

    $expHash = { $exp1 => { $fid1a => $level1a, $fid1b => $level1b, ... },
                 $exp2 => { $fid2a => $level2a, $fid2b => $level2b, ... },
                 ... };

experiment_regulon_levels

    my $expHash =           $sapObject->experiment_regulon_levels({
                                -ids => [$exp1, $exp2, ...]
                            });

Given an experiment, return the on/off levels for all atomic regulons affected by that experiment. An on/off level is either 1 (expressed), -1 (inhibited), or 0 (unknown).

parameter

The parameter should be a reference to a hash with the following key.

-ids

Reference to a list of experiment IDs.

RETURN

Returns a reference to a hash that maps each experiment ID to a sub-hash that indicates the expression level of each atomic regulon for which the experiment showed a result.

    $expHash = { $exp1 => { $regulon1a => $level1a, $regulon1b => $level1b, ... },
                 $exp2 => { $regulon2a => $level2a, $regulon2b => $level2b, ... },
                 ... };

expressed_genomes

    my $genomeList =        $sapObject->expressed_genomes({
                                -names => 1
                            });

List the IDs of genomes for which expression data exists in the database.

parameter

The parameter should be a reference to a hash with the following keys.

-names (optional)

If TRUE, then the return will be a reference to a hash mapping the genome IDs to genome names; if FALSE, the return will be a reference to a list of genome IDs. The default is FALSE.

RETURN

Returns a reference to a list of genome IDs or a hash mapping genome IDs to genome names.

-names FALSE
    $genomeList = [$genome1, $genome2, ...];
-names TRUE
    $genomeList = { $genome1 => $name1, $genome2 => $name2, ... };

fid_experiments

    my $fidHash =           $sapObject->fid_experiments({
                                -ids => [$fid1, $fid2, ...],
                                -experiments => [$exp1, $exp2, ...]
                            });

Return the expression levels for the specified features in all experiments for which they have results.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs.

-experiments (optional)

A list of experiments. If specified, only levels from the indicated experiments will be returned.

RETURN

Returns a reference to a hash mapping each incoming feature ID to a list of 3-tuples, each 3-tuple containing (0) an experiment ID, (1) the expression on/off indication (1/0/-1), and (2) the normalized rma-value.

    $fidHash =  { $fid1 => [[$exp1a, $level1a, $rma1a],
                            [$exp1b, $level1b, $rma1b], ...],
                  $fid2 => [[$exp2a, $level2a, $rma2a],
                            [$exp2b, $level2b, $rma2b], ...],
                     ... };
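
As an illustration of reading the 3-tuples, the sketch below computes, for each feature, the fraction of its experiments in which it was on (level 1). The sample data is made up, and this client-side calculation is only an example; it is not claimed to match how the server computes the fractions used by "fids_expressed_in_range".

```perl
use strict;
use warnings;

# Made-up sample data in the documented shape:
# feature => [[experiment ID, on/off level, RMA value], ...]
my $fidHash = { fidA => [['exp1', 1, 8.2], ['exp2', -1, 3.1], ['exp3', 1, 7.9]],
                fidB => [] };

my %onFraction;
for my $fid (sort keys %$fidHash) {
    my @tuples = @{$fidHash->{$fid}};
    # Skip features with no results to avoid dividing by zero.
    next if ! @tuples;
    my $onCount = grep { $_->[1] == 1 } @tuples;
    $onFraction{$fid} = $onCount / scalar(@tuples);
}

for my $fid (sort keys %onFraction) {
    printf "%s was on in %.2f of its experiments\n", $fid, $onFraction{$fid};
}
```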

fid_vectors

    my $regulonHash =       $sapObject->fid_vectors({
                                -ids => [$fid1, $fid2, ...],
                                -raw => 0
                            });

Return a map of the expression levels for each specified feature (gene). The expression levels will be returned in the form of vectors with values -1 (suppressed), 1 (expressed), or 0 (unknown) in each position. The positions will correspond to the experiments in the order returned by "genome_experiments".

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs.

-raw (optional)

If TRUE, then the vectors will be returned in the form of strings. Each string will have the character +, -, or space for the values 1, -1, and 0 respectively.

RETURN

Returns a reference to a hash mapping the incoming feature IDs to the desired vectors. The vectors will normally be references to lists of values of 1, 0, and -1, but they can also be represented as strings.

Normal Output
    $regulonHash = { $fid1 => [$level1a, $level1b, ...],
                     $fid2 => [$level2a, $level2b, ...],
                     ... };
Output if -raw is TRUE
    $regulonHash = { $fid1 => $string1, $fid2 => $string2, ... };
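
Since the vector positions follow the experiment order returned by "genome_experiments", the two lists can be paired by index. The sketch below uses made-up data in place of the two server results.

```perl
use strict;
use warnings;

# Made-up sample data: the experiment list for a genome, as it might be taken
# from a genome_experiments result, and one level vector from fid_vectors.
my @experiments = ('exp1', 'exp2', 'exp3');
my $vector = [1, 0, -1];

# Pair each position in the vector with the experiment at the same index.
my %levelFor;
@levelFor{@experiments} = @$vector;

# List the experiments in which the feature was expressed (level 1).
my @expressedIn = grep { $levelFor{$_} == 1 } @experiments;
print "Expressed in: @expressedIn\n";
```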

fids_expressed_in_range

    my $genomeHash =        $sapObject->fids_expressed_in_range({
                                -ids => [$genome1, $genome2, ...],
                                -minLevel => $min,
                                -maxLevel => $max
                            });

Return for each genome the genes that are expressed in a given fraction of the experiments for that genome.

parameter

The parameter should be a reference to a hash containing the following keys.

-ids

Reference to a list of IDs for the genomes of interest.

-minLevel (optional)

Minimum expression level. Only genes expressed at least this fraction of the time will be output. Must be between 0 and 1 (inclusive) to be meaningful. The default is 0, which gets everything less than or equal to the maximum level.

-maxLevel (optional)

Maximum expression level. Only genes expressed no more than this fraction of the time will be output. Must be between 0 and 1 (inclusive) to be meaningful. The default is 1, which gets everything greater than or equal to the minimum level.

RETURN

Returns a hash that maps each incoming genome ID to a sub-hash. The sub-hash maps the FIG ID for each qualifying feature to the level (as a fraction of the total experiments recorded) that it is expressed.

    $genomeHash = { $genome1 => { $fid1a => $level1a, $fid1b => $level1b, ...},
                    $genome2 => { $fid2a => $level2a, $fid2b => $level2b, ...},
                    ... };

fids_to_regulons

    my $fidHash =           $sapObject->fids_to_regulons({
                                -ids => [$fid1, $fid2, ...]
                            });

Return the atomic regulons associated with each incoming gene.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs for the genes of interest.

RETURN

Returns a reference to a hash of hashes, keyed on FIG feature ID. Each feature is mapped to a sub-hash that maps the feature's atomic regulons to the number of features in each regulon.

    $fidHash = { $fid1 => { $regulon1a => $size1a, $regulon1b => $size1b, ...},
                 $fid2 => { $regulon2a => $size2a, $regulon2b => $size2b, ...},
                 ... };

genome_experiments

    my $genomeHash =        $sapObject->genome_experiments({
                                -ids => [$genome1, $genome2, ...]
                            });

Return a list of the experiments for each indicated genome.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of genome IDs. For each genome ID, a list of relevant experiments will be produced.

RETURN

Returns a hash mapping each incoming genome ID to a list of experiments related to that genome ID.

    $genomeHash = { $id1 => [$exp1a, $exp1b, ...],
                    $id2 => [$exp2a, $exp2b, ...],
                    ... };

genome_experiment_levels

    my $fidHash =           $sapObject->genome_experiment_levels({
                                -genome => $genome1,
                                -experiments => [$exp1, $exp2, ...]
                            });

Return the expression levels for the features of the specified genome in all experiments for which they have results.

parameter

The parameter should be a reference to a hash with the following keys.

-genome

ID of a genome for which expression data is present.

-experiments (optional)

A list of experiments. If specified, only levels from the indicated experiments will be returned.

RETURN

Returns a reference to a hash mapping each of the genome's feature IDs to a list of 3-tuples, each 3-tuple containing (0) an experiment ID, (1) the expression on/off indication (1/0/-1), and (2) the normalized rma-value.

    $fidHash =  { $fid1 => [[$exp1a, $level1a, $rma1a],
                            [$exp1b, $level1b, $rma1b], ...],
                  $fid2 => [[$exp2a, $level2a, $rma2a],
                            [$exp2b, $level2b, $rma2b], ...],
                     ... };

regulons_to_fids

    my $regHash =           $sapObject->regulons_to_fids({
                                -ids => [$regulon1, $regulon2, ...]
                            });

Return the list of genes in each specified atomic regulon.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of atomic regulon IDs.

RETURN

Returns a reference to a hash mapping each incoming atomic regulon ID to a list of the FIG feature IDs for the genes found in the regulon.

    $regHash = { $regulon1 => [$fid1a, $fid1b, ...],
                 $regulon2 => [$fid2a, $fid2b, ...],
                 ... };
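
Because a given gene can only be in one atomic regulon, a result of this form can be inverted into a simple gene-to-regulon lookup. The sketch below uses made-up regulon and feature IDs.

```perl
use strict;
use warnings;

# Made-up sample data in the documented shape: regulon => [feature IDs].
my $regHash = { ar1 => ['fidA', 'fidB'],
                ar2 => ['fidC'] };

# Invert to feature => regulon. Each feature appears in one regulon only,
# so no entry is overwritten.
my %regulonOf;
for my $regulon (sort keys %$regHash) {
    $regulonOf{$_} = $regulon for @{$regHash->{$regulon}};
}

for my $fid (sort keys %regulonOf) {
    print "$fid is in $regulonOf{$fid}\n";
}
```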

Feature (Gene) Data Methods

NOTE: To get the functional assignment for a feature, see "Annotation and Assertion Data Methods".

compared_regions

    my $result =            $sapObject->compared_regions({
                                -focus => $fid1,
                                -genomes => [$genome1, $genome2, ... ],
                                -extent => 16000
                            });

Return information about the context of a focus gene and the corresponding genes in other genomes (known as pinned genes). The information returned can be used to create a compare-regions display.

The return information will be in the form of a reference to a list of contexts, each context containing genes in a region surrounding the pinned gene on a particular genome. The genome containing the focus gene will always be the first in the list.

parameter

The parameter should be a reference to a hash with the following keys.

-focus

The FIG ID of the focus gene.

-count (optional)

The number of pinned genes desired. If specified, the closest genes to the focus gene will be located, at most one per genome. The default is 4.

-genomes (optional)

Reference to a list of genomes. If specified, only genes in the specified genomes will be considered pinned.

-pins (optional)

Reference to a list of FIG feature IDs. The listed genes will be used as the pinned genes. If this option is specified, it overrides -count and -genomes.

-extent (optional)

The number of base pairs to show in the context for each particular genome. The default is 16000.

RETURN

Returns a hash that maps each focus gene to the compared regions view for that gene.

Each compared regions view is a list of hashes, one hash per genome.

Each genome hash has the following keys:

    genome_id => this genome's id
    genome_name => this genome's name
    row_id => the row number for this genome
    features => the features for this genome.

The features list will consist of one or more 9-tuples, one per gene in the context. Each 9-tuple will contain (0) the gene's FIG feature ID, (1) its functional assignment, (2) its FIGfam ID, (3) the contig ID, (4) the start location, (5) the end location, (6) the direction (+ or -), (7) the row index, and (8) the color index. All genes with the same color have similar functions.

    $result = { focus_fid =>
                [
                  { row_id => 0, genome_name => "g1name", genome_id => "g1id",
                    features => [[$fid1a, $function1a, $figFam1a, $contig1a, $start1a, $end1a, $dir1a, 0, $color1a],
                                 [$fid1b, $function1b, $figFam1b, $contig1b, $start1b, $end1b, $dir1b, 0, $color1b],
                                 ... ],
                  },
                  { row_id => 1, genome_name => "g2name", genome_id => "g2id",
                    features => [[$fid2a, $function2a, $figFam2a, $contig2a, $start2a, $end2a, $dir2a, 1, $color2a],
                                 [$fid2b, $function2b, $figFam2b, $contig2b, $start2b, $end2b, $dir2b, 1, $color2b],
                                 ... ],
                  },
                  ...
                ]
              };
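
Since genes with the same color have similar functions, one way to use the result is to group the feature IDs by color index across all genome rows. The sketch below does this on made-up data that follows the documented shape; every ID, name, and coordinate in it is invented.

```perl
use strict;
use warnings;

# Made-up sample data: one focus gene with two genome rows.
my $result = {
    focus_fid => [
        { row_id => 0, genome_name => 'g1name', genome_id => 'g1id',
          features => [
              ['fid1a', 'Function A', 'figFamA', 'contig1', 100, 400, '+', 0, 1],
              ['fid1b', 'Function B', 'figFamB', 'contig1', 500, 900, '+', 0, 2],
          ] },
        { row_id => 1, genome_name => 'g2name', genome_id => 'g2id',
          features => [
              ['fid2a', 'Function A', 'figFamA', 'contig2', 50, 350, '+', 1, 1],
          ] },
    ],
};

# Group feature IDs by color index (element 8 of each 9-tuple).
my %fidsByColor;
for my $focus (sort keys %$result) {
    for my $row (@{$result->{$focus}}) {
        for my $feature (@{$row->{features}}) {
            my ($fid, $function, $figFam, $contig, $start, $end,
                $dir, $rowIndex, $color) = @$feature;
            push @{$fidsByColor{$color}}, $fid;
        }
    }
}

for my $color (sort { $a <=> $b } keys %fidsByColor) {
    print "Color $color: @{$fidsByColor{$color}}\n";
}
```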

equiv_sequence_ids

    my $idHash =            $sapObject->equiv_sequence_ids({
                                -ids => [$id1, $id2, ...],
                                -precise => 1
                            });

Return all identifiers for genes in the database that are protein-sequence-equivalent to the specified identifiers. In this case, the identifiers are assumed to be in their natural form (without prefixes). For each identifier, the identified protein sequences will be found and then for each protein sequence, all identifiers for that protein sequence or for genes that produce that protein sequence will be returned.

Alternatively, you can ask for identifiers that are precisely equivalent, that is, that identify the same location on the same genome.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of identifiers of interest. These can be normal feature identifiers in prefixed form (e.g. cmr|NT03SD3201, gi|90022544, fig|100226.1.peg.3361) or their natural, un-prefixed form (NT03SD3201, 90022544). In addition, they can be protein sequence IDs formed by taking the hexadecimal MD5 hash of the protein sequence with an optional md5 or gnl|md5 prefix (500009d8cf094fa4e6a1ebb15295c60f, gnl|md5|6a00b57a9facf5056c68e5d7fe157814).

-precise (optional)

If TRUE, then only identifiers that refer to the same location on the same genome will be returned. The default is FALSE (return all sequence-equivalent IDs). If this option is specified, identifiers that refer to proteins rather than features will return no result.

-assertions (optional)

If TRUE, then instead of returning a hash of lists, this method will return a hash of sub-hashes. Each sub-hash will be keyed by the equivalent IDs, and will map each ID to a list of 3-tuples describing assertions about the ID, each 3-tuple consisting of (0) an assertion of function, (1) the source of the assertion, and (2) a flag that is TRUE for an expert assertion and FALSE otherwise. IDs in a sub-hash which are not associated with assertions will map to an empty list.

RETURN

Returns a reference to a hash that maps each incoming identifier to a list of sequence-equivalent identifiers.

Normal Output
    $idHash = { $id1 => [$id1a, $id1b, ...],
                $id2 => [$id2a, $id2b, ...],
                ... };
Output with -assertions = 1
    $idHash = { $id1 => { $id1a => [[$assert1ax, $source1ax, $flag1ax],
                                    [$assert1ay, $source1ay, $flag1ay], ...],
                          $id1b => [[$assert1bx, $source1bx, $flag1bx],
                                    [$assert1by, $source1by, $flag1by], ...],
                          ... },
                $id2 => { $id2a => [[$assert2ax, $source2ax, $flag2ax],
                                    [$assert2ay, $source2ay, $flag2ay], ...],
                          $id2b => [[$assert2bx, $source2bx, $flag2bx],
                                    [$assert2by, $source2by, $flag2by], ...],
                          ... },
                ... };

The output identifiers will not include protein sequence IDs: these are allowed on input only as a convenience.
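
As a rough illustration of walking the -assertions output, the sketch below collects the expert assertions for each incoming identifier. The helper name expert_assertions and the sample data are hypothetical; in real use the hash would come from the server call with -assertions set to 1.

```perl
use strict;
use warnings;

# Hypothetical sample shaped like the "-assertions = 1" output; the
# functions and sources below are invented for illustration.
my $idHash = {
    'gi|90022544' => {
        'fig|100226.1.peg.3361' => [
            ['example function A', 'SourceX', 1],
            ['example function B', 'SourceY', 0],
        ],
        'cmr|NT03SD3201' => [],
    },
};

# Return, for each incoming ID, the [equivalent ID, function, source]
# triples whose expert flag is TRUE.
sub expert_assertions {
    my ($idHash) = @_;
    my %retVal;
    for my $id (sort keys %$idHash) {
        my $subHash = $idHash->{$id};
        my @found;
        for my $equivID (sort keys %$subHash) {
            for my $tuple (@{$subHash->{$equivID}}) {
                my ($function, $source, $expert) = @$tuple;
                push @found, [$equivID, $function, $source] if $expert;
            }
        }
        $retVal{$id} = \@found;
    }
    return \%retVal;
}

my $expertHash = expert_assertions($idHash);
for my $id (sort keys %$expertHash) {
    print join("\t", $id, @$_), "\n" for @{$expertHash->{$id}};
}
```

Equivalent IDs with no assertions map to an empty list, so they simply contribute nothing to the result.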

fid_correspondences

    my $featureHash =       $sapObject->fid_correspondences({
                                -ids => [$fid1, $fid2, ...],
                                -genomes => [$genome1, $genome2, ...]
                            });

Return the corresponding genes for the specified features in the specified genomes. The correspondences are determined in the same way as used by "gene_correspondence_map", but this method returns substantially less data.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs.

-genomes

Reference to a list of genome IDs. For each incoming feature ID, the corresponding features in the specified genomes will be returned.

RETURN

Returns a reference to a hash that maps each incoming feature ID to a list of corresponding feature IDs in the specified genomes. If no sufficiently corresponding feature is found in any of the genomes, the feature ID will map to an empty list.

    $featureHash = { $fid1 => [$fid1a, $fid1b, ...],
                     $fid2 => [$fid2a, $fid2b, ...],
                     ... };

fid_locations

    my $featureHash =       $sapObject->fid_locations({
                                -ids => [$fid1, $fid2, ...],
                                -boundaries => 1
                            });

Return the DNA locations for the specified features.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs.

-boundaries (optional)

If TRUE, then for any multi-location feature, a single location encompassing all the location segments will be returned instead of a list of all the segments. If the segments cross between contigs, then the behavior in this mode is undefined (something will come back, but it may not be what you're expecting). The default is FALSE, in which case the locations for each feature will be presented in a list.

RETURN

Returns a reference to a hash mapping each feature ID to a list of location strings representing the feature locations in sequence order.

    $featureHash = { $fid1 => [$loc1a, $loc1b, ...],
                     $fid2 => [$loc2a, $loc2b, ...],
                     ... };
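
The location strings in the returned lists can be taken apart with a regular expression. The following is a minimal sketch, not part of the SAP interface; the helper name parse_location is hypothetical.

```perl
use strict;
use warnings;

# Split a location string of the form contigID_start+length (or
# contigID_start-length) into its parts. Returns an empty list if the string
# does not match. The greedy (.+) keeps any underscores inside the contig ID.
sub parse_location {
    my ($loc) = @_;
    if ($loc =~ /^(.+)_(\d+)([+-])(\d+)$/) {
        return ($1, $2, $3, $4);
    }
    return;
}

my ($contig, $start, $strand, $len) =
    parse_location('100226.1:NC_003888_3766170+612');
print "$contig starts at $start on the $strand strand for $len bases\n";
```

The contig ID returned here still carries the genome ID prefix, as described under "Location Strings".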

fid_map_for_genome

    my $idHash =            $sapObject->fid_map_for_genome({
                                -idHash => { $myID1 => [$id1a, $id1b, ...],
                                             $myID2 => [$id2a, $id2b, ...],
                                             ... },
                                -genome => $genome1
                            });

Find FIG IDs corresponding to caller-provided genes in a specific genome.

In some situations you may have multiple external identifiers for various genes in a genome without knowing which ones are present in the Sapling database and which are not. The external identifiers present in the Sapling database are culled from numerous sources, but different genomes will tend to have coverage from different identifier types: some genomes are represented heavily by CMR identifiers and have no Locus Tags, others have lots of Locus Tags but no CMR identifiers, and so forth. This method allows you to throw everything you have at the database in hopes of finding a match.

parameter

The parameter should be a reference to a hash with the following keys.

-idHash

Reference to a hash that maps caller-specified identifiers to lists of external identifiers in prefixed form (e.g. LocusTag:SO1103, uni|QX8I1, gi|4808340). Each external identifier should be an alternate name for the same gene.

-genome (optional)

ID of a target genome. If specified, only genes in the specified target genome will be returned.

RETURN

Returns a hash mapping the original caller-specified identifiers to FIG IDs in the target genome. If the identifier list is ambiguous, the first matching FIG ID will be used. If no matching FIG ID is found, an undefined value will be used.

    $idHash = { $myID1 => $fid1, $myID2 => $fid2, ... };
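
Because unmatched identifiers come back with an undefined value, callers will usually want to separate hits from misses. The sketch below shows one way to do that; the sample hash is invented for illustration and stands in for a real server result.

```perl
use strict;
use warnings;

# Hypothetical result shaped like the RETURN value above; the keys and FIG
# IDs are invented for illustration.
my $idHash = {
    geneA => 'fig|100226.1.peg.3361',
    geneB => undef,
    geneC => 'fig|100226.1.peg.12',
};

# Separate the caller-specified IDs that were matched from those that were not.
my (%found, @missing);
for my $myID (sort keys %$idHash) {
    my $fid = $idHash->{$myID};
    if (defined $fid) {
        $found{$myID} = $fid;
    } else {
        push @missing, $myID;
    }
}
print "Matched: ", scalar(keys %found), "; unmatched: @missing\n";
```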

fid_possibly_truncated

    my $featureHash =       $sapObject->fid_possibly_truncated({
                                -ids => [$fid1, $fid2, ...],
                                -limit => 300
                            });

For each specified gene, return stop if its end is possibly truncated, start if its beginning is possibly truncated, and an empty string otherwise. Truncation occurs if the gene is located near either edge of a contig.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG gene IDs.

-limit (optional)

The distance from the end of a contig considered to be at risk for truncation. The default is 300.

RETURN

Returns a hash mapping each incoming gene ID to the appropriate value (start if it has a possibly-truncated start, stop if it has a possibly-truncated stop, or the empty string otherwise). Note that the empty string is expected to be the most common result.

    $featureHash = { $fid1 => $note1, $fid2 => $note2, ... };

fids_to_ids

    my $featureHash =       $sapObject->fids_to_ids({
                                -ids => [$fid1, $fid2, ...],
                                -types => [$typeA, $typeB, ...],
                                -protein => 1
                            });

Find all aliases and/or synonyms for the specified FIG IDs. For each FIG ID, a hash will be returned that maps each ID type to a list of the IDs of that type.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs of interest.

-types (optional)

Reference to a list of permissible ID types. Only ID types in this list will be present in the output. If omitted, all ID types are permissible.

-protein (optional)

If TRUE, then IDs for features with equivalent protein sequences will be returned; otherwise, only IDs for precisely equivalent genes will be returned. The default is FALSE.

-natural (optional)

If TRUE, then the IDs will be returned in their natural form; otherwise, the IDs are returned in prefixed form. The default is FALSE.

RETURN

Returns a reference to a hash that maps each feature ID to a sub-hash. Each sub-hash maps an ID type to a list of equivalent IDs of that type.

    $featureHash = { $fid1 => { $typeA => [$id1A1, $id1A2, ...],
                                $typeB => [$id1B1, $id1B2, ...],
                                ... },
                     $fid2 => { $typeA => [$id2A1, $id2A2, ...],
                                $typeB => [$id2B1, $id2B2, ...],
                                ... },
                     ... };

fids_to_proteins

    my $fidHash =           $sapObject->fids_to_proteins({
                                -ids => [$fid1, $fid2, ...],
                                -sequence => 1
                            });

Return the ID or amino acid sequence associated with each specified gene's protein. If the gene does not produce a protein, it will not be included in the output.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs, representing the features of interest.

-sequence (optional)

If TRUE, then the output will include protein sequences; otherwise, the output will include MD5 protein IDs. The default is FALSE.

RETURN

Returns a reference to a hash keyed by feature ID. If -sequence is FALSE, then the hash maps each feature ID to the MD5 ID of the relevant gene's protein sequence. If -sequence is TRUE, then the hash maps each feature ID to the relevant protein sequence itself.

-sequence TRUE
    $fidHash = { $fid1 => $sequence1, $fid2 => $sequence2, ... };
-sequence FALSE
    $fidHash = { $fid1 => $md5id1, $fid2 => $md5id2, ... };

fids_with_evidence_codes

    my $featureHash =       $sapObject->fids_with_evidence_codes({
                                -codes => [$code1, $code2, ...],
                                -genomes => [$genome1, $genome2, ...]
                            });

Return the ID, assignment, and evidence for all features having an evidence code of one of the specified types. The output can be restricted to one or more specified genomes.

parameter

The parameter should be a reference to a hash with the following keys.

-codes

Reference to a list of evidence code types. This is only the prefix, not a full-blown code. So, for example, ilit would be used for indirect literature references, dlit for direct literature references, and so forth.

-genomes (optional)

Reference to a list of genome IDs. If no genome IDs are specified, all features in all genomes will be processed.

RETURN

Returns a hash mapping each feature to a list containing the function followed by all of the feature's evidence codes.

    $featureHash = { $fid1 => [$function1, $code1A, $code1B, ...],
                     $fid2 => [$function2, $code2A, $code2B, ...],
                     ... };

genes_in_region

    my $locHash =           $sapObject->genes_in_region({
                                -locations => [$loc1, $loc2, ...],
                                -includeLocation => 1
                            });

Return a list of the IDs for the features that overlap the specified regions on a contig.

parameter

The parameter should be a reference to a hash with the following keys.

-locations

Reference to a list of location strings (e.g. 360108.3:NZ_AANK01000002_264528_264007 or 100226.1:NC_003888_3766170+612). A location string consists of a contig ID (which includes the genome ID), an underscore, a begin offset, and either an underscore followed by an end offset or a direction (+ or -) followed by a length.

-includeLocation

If TRUE, then instead of mapping each location to a list of IDs, the hash will map each location to a hash reference that maps the IDs to their locations.

RETURN

Returns a reference to a hash mapping each incoming location string to a list of the IDs for the features that overlap that location.

    $locHash = { $loc1 => [$fid1A, $fid1B, ...],
                 $loc2 => [$fid2A, $fid2B, ...],
                 ... };

ids_to_data

    my $featureHash =       $sapObject->ids_to_data({
                                -ids => [$id1, $id2, ...],
                                -data => [$fieldA, $fieldB, ...],
                                -source => 'UniProt'
                            });

Return the specified data items for the specified features.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of gene identifiers. Normally, these would be FIG feature IDs, but other identifier types can be specified if you use the -source option.

-data

Reference to a list of data field names. The possible data field names are given below.

evidence

Comma-delimited list of evidence codes indicating the reason for the gene's current assignment.

fig-id

The FIG ID of the gene.

function

Current functional assignment.

genome-name

Name of the genome containing the gene.

length

Number of base pairs in the gene.

location

Comma-delimited list of location strings indicating the location of the gene in the genome. A location string consists of a contig ID, an underscore, the starting offset, the strand (+ or -), and the number of base pairs.

publications

Comma-delimited list of PUBMED IDs for publications related to the gene.

-source (optional)

Database source of the IDs specified-- e.g. SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.

-genome (optional)

ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for genes in all genomes.

RETURN

Returns a hash mapping each incoming ID to a list of tuples. There will be one tuple for each feature identified by the incoming ID (because some IDs are ambiguous, there may be more than one), and each tuple will contain the specified data fields for that feature in the specified order.

    $featureHash = { $id1 => [$tuple1A, $tuple1B, ...],
                     $id2 => [$tuple2A, $tuple2B, ...],
                     ... };
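
The tuples are positional, so their meaning depends on the order of the -data list. One way to make the results easier to handle is to convert each tuple to a hash keyed by field name, as in the sketch below. The helper name tuples_to_records and the sample values are hypothetical.

```perl
use strict;
use warnings;

# The field list passed as -data; tuples come back in this same order.
my @fields = ('fig-id', 'function', 'length');

# Hypothetical result for illustration; the values are invented.
my $featureHash = {
    'uni|P00934' => [
        ['fig|100226.1.peg.3361', 'example function', 1287],
    ],
};

# Turn each positional tuple into a hash keyed by field name.
sub tuples_to_records {
    my ($featureHash, $fields) = @_;
    my %retVal;
    for my $id (keys %$featureHash) {
        $retVal{$id} = [ map {
            my %record;
            @record{@$fields} = @$_;
            \%record;
        } @{$featureHash->{$id}} ];
    }
    return \%retVal;
}

my $records = tuples_to_records($featureHash, \@fields);
print $records->{'uni|P00934'}[0]{function}, "\n";
```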

ids_to_fids

    my $idHash =            $sapObject->ids_to_fids({
                                -ids => [$id1, $id2, ...],
                                -protein => 1,
                                -genomeName => $genusSpeciesString,
                                -source => 'UniProt'
                            });

Return a list of the FIG IDs corresponding to each of the specified identifiers. The correspondence can either be gene-based (same feature) or sequence-based (same protein).

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of identifiers.

-source

Database source of the IDs specified-- SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth).

-protein (optional)

If TRUE, then all FIG IDs for equivalent proteins will be returned. The default is FALSE, meaning that only FIG IDs for the same gene will be returned.

-genomeName (optional)

The full or partial name of a genome or a comma-delimited list of genome IDs. This parameter is useful for narrowing the results when a protein match is specified. If it is omitted, no genome filtering is performed.

RETURN

Returns a reference to a hash mapping each incoming identifier to a list of equivalent FIG IDs.

    $idHash = { $id1 => [$fid1A, $fid1B, ...],
                $id2 => [$fid2A, $fid2B, ...],
                ... };

ids_to_genomes

    my $featureHash =       $sapObject->ids_to_genomes({
                                -ids => [$id1, $id2, ...],
                                -source => 'SwissProt',
                                -name => 1
                            });

Return the genome information for each incoming gene ID.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of gene IDs.

-source (optional)

Database source of the IDs specified-- SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.

-name (optional)

If TRUE, the genome names will be returned; if FALSE, the genome IDs will be returned. The default is FALSE.

RETURN

Returns a reference to a hash mapping each incoming ID to the associated genome ID, or alternatively to the associated genome name.

    $featureHash = { $id1 => $genome1, $id2 => $genome2, ... };

ids_to_lengths

    my $geneHash =          $sapObject->ids_to_lengths({
                                -ids => [$id1, $id2, ...],
                                -protein => 1,
                                -source => 'NCBI'
                            });

Return the DNA or protein length of each specified gene.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of gene IDs.

-source (optional)

Database source of the IDs specified-- SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.

-genome (optional)

ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for genes in all genomes.

-protein (optional)

If TRUE, then the length of each gene's protein will be returned. Otherwise, the DNA length of each gene will be returned. The default is FALSE (DNA lengths).

RETURN

Returns a reference to a hash mapping each incoming ID to the length of the associated gene. If no gene is found, or -protein is TRUE and the gene is not a protein-encoding gene, the ID will not be present in the return hash.

    $geneHash = { $id1 => $length1, $id2 => $length2, ... };

make_runs

    my $groupHash =         $sapObject->make_runs({
                                -groups => ["$fid0a, $fid0b, ...",
                                            "$fid1a, $fid1b, ...",
                                            ...],
                                -maxGap => 200,
                                -justFirst => 1,
                                -operonSize => 10000
                            });

Look at sequences of feature IDs and separate them into operons. An operon contains features that are close together on the same contig going in the same direction.

parameter

The parameter should be a reference to a hash with the following keys.

-groups

Reference to a list of strings. Each string will contain a comma-separated list of FIG feature IDs for the features in a group. Alternatively, this can be a reference to a list of lists, in which each sub-list contains the feature IDs in a group.

-maxGap (optional)

Maximum number of base pairs that can be between two genes in order for them to be considered as part of the same operon. The default is 200.

-justFirst (optional)

If TRUE, then only the first feature in an operon will be included in the output operon strings. The default is FALSE.

-operonSize (optional)

Estimate of the typical size of an operon. This is a tuning parameter; the default is 10000.

RETURN

Returns a hash mapping group numbers to lists of operons. In other words, for each incoming group, the hash will map the group's (zero-based) index number to a list of operon strings. Each operon string is a comma-separated list of feature IDs in operon order.

    $groupHash = { 0 => [[$fid1op1, $fid2op1, ...],
                         [$fid1op2, $fid2op2, ...], ... ],
                   1 => [[$fid1opA, $fid2opA, ...],
                         [$fid1opB, $fid2opB, ...], ... ],
                   ... };
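
The description above speaks of comma-separated operon strings, while the sample output shows each operon as a list. Client code that wants to be safe against either form could normalize each operon as in the following sketch. The helper name operon_fids and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Return the feature IDs of one operon as a list, accepting either a
# comma-separated string or a reference to a list of IDs.
sub operon_fids {
    my ($operon) = @_;
    return @$operon if ref $operon eq 'ARRAY';
    return split /\s*,\s*/, $operon;
}

# Hypothetical result with one operon in each form; the IDs are invented.
my $groupHash = {
    0 => [ 'fig|100226.1.peg.1,fig|100226.1.peg.2',
           ['fig|100226.1.peg.7'] ],
};

for my $group (sort { $a <=> $b } keys %$groupHash) {
    for my $operon (@{$groupHash->{$group}}) {
        my @fids = operon_fids($operon);
        print "group $group: ", scalar(@fids), " feature(s)\n";
    }
}
```

The same helper works for the -groups input, which is likewise documented as accepting either strings or sub-lists.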

proteins_to_fids

    my $protHash =          $sapObject->proteins_to_fids({
                                -prots => [$prot1, $prot2, ...]
                            });

Return the FIG feature IDs associated with each incoming protein. The protein can be specified as an amino acid sequence or MD5 protein ID.

parameter

The parameter should be a reference to a hash with the following keys.

-prots

Reference to a list of proteins. Each protein can be specified as either an amino acid sequence or an MD5 protein ID. The method will assume a sequence of 32 hex characters is an MD5 ID and anything else is an amino acid sequence. Amino acid sequences should be in upper-case only.

RETURN

Returns a hash mapping each incoming protein to a list of FIG feature IDs for the genes that produce the protein.

    $protHash = { $prot1 => [$fid1a, $fid1b, ...],
                  $prot2 => [$fid2a, $fid2b, ...],
                  ... };
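
The rule for telling MD5 IDs from sequences can be mirrored on the client side, for example to check input before sending it. The sketch below uses the core Digest::MD5 module. The helper names are hypothetical, and it is an assumption that hashing the plain upper-case residue string reproduces the protein ID used by the server.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Mirror the rule stated above: 32 hex characters are taken to be an MD5
# protein ID, anything else an amino acid sequence.
sub looks_like_md5 {
    my ($prot) = @_;
    return $prot =~ /^[0-9a-fA-F]{32}$/ ? 1 : 0;
}

# Upper-case a sequence and compute its hexadecimal MD5 digest.
# ASSUMPTION: this matches how the server derives MD5 protein IDs.
sub sequence_md5 {
    my ($sequence) = @_;
    return md5_hex(uc $sequence);
}

my $md5 = sequence_md5('mktayiakqr');    # invented fragment
print "$md5 is ", (looks_like_md5($md5) ? "an MD5 ID" : "a sequence"), "\n";
```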

FIGfam Data Methods

all_figfams

    my $ffHash =            $sapObject->all_figfams({
                                -roles => [$role1, $role2, ...],
                                -functions => [$function1, $function2, ...]
                            });

Return a list of all the FIGfams along with their functions. Optionally, you can specify a role or a function, and only FIGfams with that role or function will be returned.

parameter

The parameter should be a reference to a hash with the following keys.

-roles (optional)

If specified, a reference to a list of roles. Only FIGfams with one of the specified roles (or one of the functions listed in -functions) will be returned in the hash.

-functions (optional)

If specified, a reference to a list of functions. Only FIGfams with one of the specified functions (or one of the roles listed in -roles) will be returned in the hash.

RETURN

Returns a reference to a hash mapping each qualifying FIGfam ID to its function.

    $ffHash = { $ff1 => $function1, $ff2 => $function2, ... };

discriminating_figfams

    my $groupList =         $sapObject->discriminating_figfams({
                                -group1 => [$genome1a, $genome1b, ...],
                                -group2 => [$genome2a, $genome2b, ...]
                            });

Determine the FIGfams that discriminate between two groups of genomes.

A FIGfam discriminates between genome groups if it is common in one group and uncommon in the other. The degree of discrimination is assigned a score based on statistical significance, with 0 being insignificant and 2 being extremely significant. FIGfams with a score greater than 1 are returned by this method.

parameter

The parameter should be a reference to a hash with the following keys.

-group1

Reference to a list of genome IDs for the genomes in the first group.

-group2

Reference to a list of genome IDs for the genomes in the second group.

RETURN

Returns a reference to a 2-tuple, consisting of (0) a hash mapping FIGfam IDs to scores for FIGfams common in group 1 and (1) a hash mapping FIGfam IDs to scores for FIGfams common in group 2.

    $groupList = [{ $ff1a => $score1a, $ff1b => $score1b, ... },
                  { $ff2a => $score2a, $ff2b => $score2b, ... }];
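
A typical next step is to rank the FIGfams in each hash by score. The following is a minimal sketch; the helper name ranked_figfams is hypothetical, and the FIGfam IDs and scores are invented.

```perl
use strict;
use warnings;

# Hypothetical result; the FIGfam IDs and scores are invented.
my $groupList = [
    { FIG00000001 => 1.8, FIG00000002 => 1.2 },
    { FIG00000003 => 1.5 },
];

# List FIGfam IDs from one score hash, highest score first, breaking ties
# by ID so the order is repeatable.
sub ranked_figfams {
    my ($scoreHash) = @_;
    return sort { $scoreHash->{$b} <=> $scoreHash->{$a} or $a cmp $b }
           keys %$scoreHash;
}

my ($group1Hash, $group2Hash) = @$groupList;
my @top1 = ranked_figfams($group1Hash);
my @top2 = ranked_figfams($group2Hash);
print "Group 1: @top1\nGroup 2: @top2\n";
```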

figfam_fids

    my $fidList =           $sapObject->figfam_fids({
                                -id => $figFam1,
                                -fasta => 1
                            });

Return a list of all the protein encoding genes in a FIGfam. The genes can be returned as IDs or as FASTA strings.

parameter

The parameter should be a reference to a hash with the following keys.

-id

ID of the desired FIGfam.

-fasta

If TRUE, then the output will be in the form of FASTA strings; otherwise it will be in the form of FIG IDs.

RETURN

Returns a reference to a list of genes in the form of FIG feature IDs or protein FASTA strings.

Normal Output
    $fidList = [$fid1, $fid2, ...];
Output When -fasta = 1
    $fidList = [$fasta1, $fasta2, ...];

figfam_fids_batch

    my $fidHash =           $sapObject->figfam_fids_batch({
                                -ids => [$ff1, $ff2, ...],
                                -genomeFilter => $genome1
                            });

Return a list of all the protein encoding genes in one or more FIGfams. This method is an alternative to "figfam_fids" that is faster when you need the feature IDs but not the protein sequences.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the IDs of the desired FIGfams.

-genomeFilter (optional)

The ID of a genome. If specified, then only feature IDs from the specified genome will be returned.

RETURN

Returns a hash mapping each incoming FIGfam ID to a list of the IDs for the features in that FIGfam.

    $fidHash = { $ff1 => [$fid1a, $fid1b, ...],
                 $ff2 => [$fid2a, $fid2b, ...],
                 ... };

figfam_function

    my $ffHash =            $sapObject->figfam_function({
                                -ids => [$ff1, $ff2, ...]
                            });

For each incoming FIGfam ID, return its function, that is, the common functional assignment of all its members.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIGfam IDs.

RETURN

Returns a hash mapping each incoming FIGfam ID to its function string.

    $ffHash = { $ff1 => $function1, $ff2 => $function2, ... };

genome_figfams

    my $genomeHash =        $sapObject->genome_figfams({
                                -ids => [$genome1, $genome2, ...]
                            });

Compute the list of FIGfams represented in each specific genome.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of genome identifiers.

RETURN

Returns a reference to a hash mapping each incoming genome ID to a list of the IDs of the FIGfams represented in that genome.

    $genomeHash = { $genome1 => [$ff1a, $ff1b, ...],
                    $genome2 => [$ff2a, $ff2b, ...],
                    ... };

ids_to_figfams

    my $featureHash =       $sapObject->ids_to_figfams({
                                -ids => [$id1, $id2, ...],
                                -functions => 1,
                                -source => 'RefSeq'
                            });

This method returns a hash mapping each incoming feature to its FIGfam.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of feature identifiers.

-functions (optional)

If TRUE, the family function will be returned in addition to the list of FIGfam IDs. In this case, instead of a list of FIGfam IDs, each feature ID will point to a list of 2-tuples, each consisting of (0) a FIGfam ID followed by (1) a function string. The default is FALSE.

-source (optional)

Database source of the IDs specified-- SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.

-genome (optional)

ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for genes in all genomes.

RETURN

Returns a reference to a hash mapping each incoming feature ID to a list of the IDs of the FIGfams that contain it. (In general the list will be a singleton unless the feature ID corresponds to multiple actual features.) Features not in FIGfams will be omitted from the hash.

    $featureHash = { $id1 => [$ff1a, $ff1b, ...],
                     $id2 => [$ff2a, $ff2b, ...],
                     ... };

related_figfams

    my $ffHash =            $sapObject->related_figfams({
                                -ids => [$ff1, $ff2, ...],
                                -expscore => 1,
                                -all => 1
                            });

This method takes a list of FIGfam IDs. For each FIGfam, it returns a list of FIGfams related to it by functional coupling.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIGfam IDs.

-expscore (optional)

If TRUE, then the score returned will be the co-expression score. If FALSE, the score returned will be the co-occurrence score. This option is ignored if -all is specified. The default is FALSE.

-all (optional)

If TRUE, then both scores will be returned. The default is FALSE, meaning only one score is returned.

RETURN
Normal Output

Returns a reference to a hash mapping each incoming FIGfam ID to a list of 2-tuples for other FIGfams. The 2-tuples each consist of (0) a related FIGfam's ID followed by (1) a 2-tuple containing a coupling score and the related FIGfam's function.

    $ffHash = { $ff1 => [[$ff1a, [$score1a, $function1a]],
                         [$ff1b, [$score1b, $function1b]], ...],
                $ff2 => [[$ff2a, [$score2a, $function2a]],
                         [$ff2b, [$score2b, $function2b]], ...],
                ... };
Output with -all = 1

Returns a reference to a hash mapping each incoming FIGfam ID to a list of 2-tuples for other FIGfams. The 2-tuples each consist of (0) a related FIGfam's ID followed by (1) a 3-tuple containing the co-occurrence coupling score, the co-expression coupling score, and the related FIGfam's function.

    $ffHash = { $ff1 => [[$ff1a, [$score1ax, $score1ay, $function1a]],
                         [$ff1b, [$score1bx, $score1by, $function1b]], ...],
                $ff2 => [[$ff2a, [$score2ax, $score2ay, $function2a]],
                         [$ff2b, [$score2bx, $score2by, $function2b]], ...],
                ... };
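
Because the inner tuple has two or three elements depending on -all, code that consumes the result needs to handle both shapes. The sketch below flattens one entry into a hash in either case. The helper name unpack_related and the sample entries are hypothetical.

```perl
use strict;
use warnings;

# Flatten one entry of the result into a hash. With -all the inner tuple has
# three elements (co-occurrence score, co-expression score, function);
# otherwise it has two (score, function).
sub unpack_related {
    my ($entry) = @_;
    my ($ffID, $data) = @$entry;
    my @data = @$data;
    my $function = pop @data;
    my %retVal = (id => $ffID, function => $function);
    if (@data == 2) {
        @retVal{qw(cooccurrence coexpression)} = @data;
    } else {
        $retVal{score} = $data[0];
    }
    return \%retVal;
}

# Hypothetical entries, invented for illustration.
my $single = unpack_related(['FIG00000002', [25, 'example function']]);
my $both   = unpack_related(['FIG00000002', [25, 0.9, 'example function']]);
print "$single->{id}: $single->{score}\n";
print "$both->{id}: $both->{cooccurrence} / $both->{coexpression}\n";
```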

roles_to_figfams

    my $roleHash =          $sapObject->roles_to_figfams({
                                -roles => [$role1, $role2, ...]
                            });

For each incoming role, return a list of the FIGfams that implement the role, that is, whose functional assignments include the role.

parameter

The parameter should be a reference to a hash with the following keys.

-roles

Reference to a list of role names.

RETURN

Returns a reference to a hash mapping each incoming role to a list of FIGfam IDs for the FIGfams that implement the role.

    $roleHash = { $role1 => [$ff1a, $ff1b, ...],
                  $role2 => [$ff2a, $ff2b, ...],
                  ... };

Functional Coupling Data Methods

clusters_containing

    my $featureHash =       $sapObject->clusters_containing({
                                -ids => [$fid1, $fid2, ...]
                            });

This method takes as input a list of FIG feature IDs. For each feature, it returns the IDs and functions of other features in the same cluster of functionally-coupled features.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs.

For backward compatibility, this method can also take as input a reference to a list of FIG feature IDs.

RETURN

Returns a reference to a hash. The hash maps each incoming feature ID to a 2-tuple containing (0) the feature's functional assignment and (1) a reference to a hash that maps each clustered feature to its functional assignment.

    $featureHash = { $fid1 => [$function1, { $fid1a => $function1a,
                                             $fid1b => $function1b,
                                             ...}],
                     $fid2 => [$function2, { $fid2a => $function2a,
                                             $fid2b => $function2b,
                                             ...}],
                     ... };

In backward-compatibility mode, this method returns a reference to a list. For each incoming feature, there is a list entry containing the feature ID, the feature's functional assignment, and a sub-list of 2-tuples. Each 2-tuple contains the ID of another feature in the same cluster and its functional assignment.

co_occurrence_evidence

    my $pairHash =          $sapObject->co_occurrence_evidence({
                                -pairs => ["$fid1:$fid2", "$fid3:$fid4", ...]
                            });

For each specified pair of genes, this method returns the evidence that the genes are functionally coupled (if any); that is, it returns a list of the physically close homologs for the pair.

parameter

The parameter should be a reference to a hash with the following keys.

-pairs

Reference to a list of functionally-coupled pairs. Each pair is represented by two FIG gene IDs, either in the form of a 2-tuple or as a string with the two gene IDs separated by a colon.

RETURN

Returns a hash mapping each incoming gene pair to a list of 2-tuples. Each 2-tuple contains a pair of physically close genes, the first of which is similar to the first gene in the input pair, and the second of which is similar to the second gene in the input pair. The hash keys will consist of the two gene IDs separated by a colon (e.g. fig|273035.4.peg.1016:fig|273035.4.peg.1018).

    $pairHash = { "$fid1:$fid2" => [[$fid1a, $fid2a], [$fid1b, $fid2b], ...],
                  "$fid3:$fid4" => [[$fid3a, $fid4a], [$fid3b, $fid4b], ...],
                  ... };
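Because the input pairs may be given in either form but the output is always keyed by the colon-separated string, a caller will usually want to normalize a pair before looking it up. The sketch below assumes that FIG gene IDs never contain a colon; pair_key is a hypothetical helper and the evidence pairs are made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a pair given as a 2-tuple or as a
# colon-separated string into the colon-separated key form.
sub pair_key {
    my ($pair) = @_;
    return ref($pair) eq 'ARRAY' ? join(':', @$pair) : $pair;
}

# Made-up sample data in the documented shape; not real server output.
my $pairHash = {
    'fig|273035.4.peg.1016:fig|273035.4.peg.1018' => [
        ['fig|100226.1.peg.10', 'fig|100226.1.peg.11'],
        ['fig|83333.1.peg.20',  'fig|83333.1.peg.22'],
    ],
};

my $key = pair_key(['fig|273035.4.peg.1016', 'fig|273035.4.peg.1018']);
my $evidence = $pairHash->{$key} || [];
print "$key has ", scalar(@$evidence), " supporting pair(s)\n";
for my $tuple (@$evidence) {
    my ($fidA, $fidB) = @$tuple;
    print "  $fidA is physically close to $fidB\n";
}
```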

conserved_in_neighborhood

    my $featureHash =       $sapObject->conserved_in_neighborhood({
                                -ids => [$fid1, $fid2, ...]
                            });

This method takes a list of feature IDs. For each feature ID, it will return the set of other features to which it is functionally coupled, along with the appropriate score.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs.

For backward compatibility, this method can also take as input a reference to a list of FIG feature IDs.

RETURN

Returns a reference to a hash mapping each incoming feature ID to a list of 4-tuples, one 4-tuple for each feature coupled to the incoming feature. Each 4-tuple contains (0) the coupling score, (1) the FIG ID of the coupled feature, (2) the coupled feature's current functional assignment, and (3) the ID of the pair set to which the coupling belongs.

    $featureHash = { $fid1 => [[$score1A, $fid1A, $function1A, $psID1A],
                               [$score1B, $fid1B, $function1B, $psID1B], ...],
                     $fid2 => [[$score2A, $fid2A, $function2A, $psID2A],
                               [$score2B, $fid2B, $function2B, $psID2B], ...],
                     ... };

In backward compatibility mode, returns a list of sub-lists, each sub-list corresponding to the value that would be found in the hash for the feature in the specified position of the input list.
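The documentation does not state the order of the 4-tuples, so a caller who wants the strongest couplings first may need to sort them. The following sketch does this on made-up sample data; by_score is a hypothetical helper, and the scores and pair set IDs are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 4-tuples for $fid ordered from the
# highest coupling score to the lowest (empty list if $fid is absent).
sub by_score {
    my ($featureHash, $fid) = @_;
    my $tuples = $featureHash->{$fid} || [];
    my @sorted = sort { $b->[0] <=> $a->[0] } @$tuples;
    return @sorted;
}

# Made-up sample data in the documented shape; not real server output.
my $featureHash = {
    'fig|100226.1.peg.1' => [
        [12, 'fig|100226.1.peg.2', 'function B', 101],
        [47, 'fig|100226.1.peg.3', 'function C', 102],
    ],
};

for my $tuple (by_score($featureHash, 'fig|100226.1.peg.1')) {
    my ($score, $fid, $function, $psID) = @$tuple;
    print "$score\t$fid\t$function\tpair set $psID\n";
}
```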

pairsets

    my $psHash =            $sapObject->pairsets({
                                -ids => [$psID1, $psID2, ...]
                            });

This method takes as input a list of functional-coupling pair set IDs (such as those returned in the output of "conserved_in_neighborhood"). For each pair set, it returns the set's score (number of significant couplings) and a list of the coupled pairs in the set.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of functional-coupling pair set IDs.

For backward compatibility, you may also specify a reference to a list of pair set IDs.

RETURN

Returns a reference to a hash that maps each incoming pair-set ID to a 2-tuple that consists of (0) the set's score and (1) a reference to a list of 2-tuples containing the pairs in the set.

    $psHash = { $psID1 => [$score1, [[$fid1A, $fid1B],
                                     [$fid1C, $fid1D], ...]],
                $psID2 => [$score2, [[$fid2A, $fid2B],
                                     [$fid2C, $fid2D], ...]],
                ... };

In backward-compatibility mode, returns a reference to a list of 2-tuples, each consisting of (0) an incoming pair-set ID, and (1) the 2-tuple that would be its hash value in the normal output.
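Since the pair set IDs typically come from the output of "conserved_in_neighborhood", it can be convenient to collect them into a parameter structure for this method. The sketch below is one way to do that; pairset_ids is a hypothetical helper and the input is made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: gather the distinct pair set IDs (element 3 of
# each 4-tuple) from a conserved_in_neighborhood-style hash.
sub pairset_ids {
    my ($featureHash) = @_;
    my %seen;
    for my $tuples (values %$featureHash) {
        $seen{ $_->[3] } = 1 for @$tuples;
    }
    return [ sort keys %seen ];
}

# Made-up sample data in the documented shape; not real server output.
my $featureHash = {
    'fig|100226.1.peg.1' => [[12, 'fig|100226.1.peg.2', 'function B', 'ps101'],
                             [47, 'fig|100226.1.peg.3', 'function C', 'ps102']],
    'fig|100226.1.peg.2' => [[12, 'fig|100226.1.peg.1', 'function A', 'ps101']],
};

# Parameter structure that could then be passed to pairsets.
my $args = { -ids => pairset_ids($featureHash) };
print "Pair sets to request: @{ $args->{-ids} }\n";
```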

related_clusters

    my $featureHash =       $sapObject->related_clusters({
                                -ids => [$fid1, $fid2, ...]
                            });

This method returns the functional-coupling clusters related to the specified input features. Each cluster contains features on a single genome that are related by functional coupling.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of FIG feature IDs.

RETURN

Returns a reference to a hash that maps each incoming feature ID to a list of clusters. Each cluster in the list is a 3-tuple consisting of (0) the ID of a feature similar to the incoming feature, (1) the similarity P-score, and (2) a reference to a list of 2-tuples containing clustered features and their functional assignments.

    $featureHash = { $fid1 => [[$fid1A, $score1A, [[$fid1Ax, $function1Ax],
                                                   [$fid1Ay, $function1Ay],
                                                   ...]],
                               [$fid1B, $score1B, [[$fid1Bx, $function1Bx],
                                                   [$fid1By, $function1By],
                                                   ...]],
                               ...],
                     $fid2 => [[$fid2A, $score2A, [[$fid2Ax, $function2Ax],
                                                   [$fid2Ay, $function2Ay],
                                                   ...]],
                               [$fid2B, $score2B, [[$fid2Bx, $function2Bx],
                                                   [$fid2By, $function2By],
                                                   ...]],
                               ...],
                     ... };

Genome Data Methods

all_features

    my $genomeHash =        $sapObject->all_features({
                                -ids => [$genome1, $genome2, ...],
                                -type => [$type1, $type2, ...],
                            });

Return a list of the IDs for all features of a specified type in a specified genome.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of genome IDs.

-type (optional)

Type of feature desired (e.g. peg, rna), or a reference to a list of desired feature types. If omitted, all features regardless of type are returned.

RETURN

Returns a reference to a hash that maps each incoming genome ID to a list of the desired feature IDs for that genome. If a genome does not exist or has no features of the desired type, its ID will map to an empty list.

    $genomeHash = { $genome1 => [$fid1a, $fid1b, ...],
                    $genome2 => [$fid2a, $fid2b, ...],
                    ... };
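When -type is omitted, the returned lists mix feature types. A caller could tally them from the IDs themselves, as in the sketch below. It assumes the fig|genome.type.number layout seen in IDs such as fig|100226.1.peg.3361; count_by_type is a hypothetical helper and the data is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: tally feature IDs by the type embedded in the
# FIG ID. IDs that do not match the assumed layout count as "unknown".
sub count_by_type {
    my ($fids) = @_;
    my %counts;
    for my $fid (@$fids) {
        my ($type) = $fid =~ /^fig\|\d+\.\d+\.([^.]+)\.\d+$/;
        $counts{ defined $type ? $type : 'unknown' }++;
    }
    return \%counts;
}

# Made-up sample data in the documented shape; not real server output.
my $genomeHash = {
    '100226.1' => ['fig|100226.1.peg.1', 'fig|100226.1.peg.2',
                   'fig|100226.1.rna.1'],
    '83333.1'  => [],
};

for my $genome (sort keys %$genomeHash) {
    my $counts = count_by_type($genomeHash->{$genome});
    my @parts = map { "$_=$counts->{$_}" } sort keys %$counts;
    print "$genome: ", (@parts ? join(', ', @parts) : 'no features'), "\n";
}
```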

all_genomes

    my $genomeHash = $sapObject->all_genomes({
                            -complete => 1,
                            -prokaryotic => 1
                        });

Return a list of the IDs for all the genomes in the system.

parameter

Reference to a hash containing the following keys.

-complete (optional)

If TRUE, only complete genomes will be returned. The default is FALSE (return all genomes).

-prokaryotic (optional)

If TRUE, only prokaryotic genomes will be returned. The default is FALSE (return all genomes).

RETURN

Returns a reference to a hash mapping genome IDs to genome names.

    $genomeHash = { $genome1 => $name1, $genome2 => $name2, ... };

all_proteins

    my $fidHash = $sapObject->all_proteins({
                        -id => $genome1
                    });

Return the protein sequences for all protein-encoding genes in the specified genome.

parameter

The parameter should be a reference to a hash with the following keys.

-id

A single genome ID. All of the protein sequences for genes in the specified genome will be extracted.

RETURN

Returns a reference to a hash that maps the FIG ID of each protein-encoding gene in the specified genome to its protein sequence.

    $fidHash = { $fid1 => $protein1, $fid2 => $protein2, ... };
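A common next step is to write the returned sequences out in FASTA form. The following is a minimal sketch of that conversion; to_fasta is a hypothetical helper, the 60-column line width is an arbitrary choice, and the sequences are made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: render a fid => sequence hash as FASTA text,
# with records in sorted ID order and sequences wrapped at $width.
sub to_fasta {
    my ($fidHash, $width) = @_;
    $width ||= 60;
    my $out = '';
    for my $fid (sort keys %$fidHash) {
        my $seq = $fidHash->{$fid};
        $out .= ">$fid\n";
        $out .= "$1\n" while $seq =~ /(.{1,$width})/g;
    }
    return $out;
}

# Made-up sample data in the documented shape; not real server output.
my $fidHash = {
    'fig|100226.1.peg.1' => 'MKTAYIAKQR',
    'fig|100226.1.peg.2' => 'MSDNELLK',
};
print to_fasta($fidHash);
```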

close_genomes

    my $genomeHash = $sapObject->close_genomes({
                        -ids => [$genome1, $genome2, ...],
                        -count => 10,
                    });

Find the genomes functionally close to the input genomes.

Functional closeness is determined by the number of FIGfams in common. As a result, this method will not produce good results for genomes that do not have good FIGfam coverage.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of genome IDs for the genomes whose close neighbors are desired.

-count (optional)

Maximum number of close genomes to return for each input genome. The default is 10.

RETURN

Returns a reference to a hash mapping each incoming genome ID to a list of 2-tuples. Each 2-tuple consists of (0) the ID of a close genome and (1) the score (from 0 to 1) for the match. The list will be sorted from closest to furthest.
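To illustrate the shape of this result, the sketch below builds a hand-made hash of 2-tuples and keeps only the close genomes at or above a chosen score. The genome IDs and scores are made up, close_above is a hypothetical helper, and the 0.5 cutoff is an arbitrary example value.

```perl
use strict;
use warnings;

# Hypothetical helper: IDs of the close genomes for $genome whose score
# is at least $minScore, in the order returned (closest first).
sub close_above {
    my ($genomeHash, $genome, $minScore) = @_;
    my $tuples = $genomeHash->{$genome} || [];
    return map { $_->[0] } grep { $_->[1] >= $minScore } @$tuples;
}

# Made-up sample data in the documented shape; not real server output.
my $genomeHash = {
    '100226.1' => [['227882.1', 0.91], ['1902.1', 0.74], ['83333.1', 0.12]],
};

print join(', ', close_above($genomeHash, '100226.1', 0.5)), "\n";
```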

contig_sequences

    my $contigHash = $sapObject->contig_sequences({
                        -ids => [$contig1, $contig2, ...]
                    });

Return the DNA sequences for the specified contigs.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of contig IDs. Note that the contig ID contains the genome ID as a prefix (e.g. 100226.1:NC_003888).

RETURN

Returns a reference to a hash that maps each contig ID to its DNA sequence.

    $contigHash = { $contig1 => $dna1, $contig2 => $dna2, ... };

contig_lengths

    my $contigHash = $sapObject->contig_lengths({
                        -ids => [$contig1, $contig2, ...]
                    });

Return the lengths for the specified contigs.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of contig IDs. Note that the contig ID contains the genome ID as a prefix (e.g. 100226.1:NC_003888).

RETURN

Returns a reference to a hash that maps each contig ID to its length in base pairs.

    $contigHash = { $contig1 => $len1, $contig2 => $len2, ... };

gene_correspondence_map

    my $geneHash =          $sapObject->gene_correspondence_map({
                                -genome1 => $genome1,
                                -genome2 => $genome2,
                                -fullOutput => 1,
                                -passive => 0
                            });

Return a map of genes in the specified second genome that correspond to genes in the specified first genome.

parameter

The parameter should be a reference to a hash with the following keys.

-genome1

ID of the first genome of interest.

-genome2

ID of the second genome of interest.

-fullOutput (optional)

If 1, then instead of a simple hash map, a list of lists will be returned. If 2, then the list will contain unidirectional correspondences from the target back to the source as well as bidirectional correspondences and unidirectional correspondences from the source to the target. The default is 0, which returns the hash map.

-passive (optional)

If TRUE, then an undefined value will be returned if no correspondence file exists. If FALSE, a correspondence file will be created and cached on the server if one does not already exist. This is an expensive operation, so set the flag to TRUE if you are worried about performance. The default is FALSE.

RETURN

This method will return an undefined value if either of the genome IDs is missing, not found, or incomplete.

Normal Output

Returns a hash that maps each gene in the first genome to a corresponding gene in the second genome. The correspondence is determined by examining factors such as functional role, conserved neighborhood, and similarity.

    $geneHash = { $g1gene1 => $g2gene1, $g1gene2 => $g2gene2,
                  $g1gene3 => $g2gene3, ... };

Output with -fullOutput >= 1

Returns a reference to a list of sub-lists. Each sub-list contains 18 data items, as detailed in "Gene Correspondence List" in ServerThing.
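The normal output maps first-genome genes to second-genome genes. If the reverse lookup is needed, the hash can be inverted on the client side, as in this sketch. It allows for several source genes mapping to the same target gene; invert_map is a hypothetical helper and the gene IDs are made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: invert a gene1 => gene2 map into
# gene2 => [gene1, ...], keeping every source gene that hits a target.
sub invert_map {
    my ($geneHash) = @_;
    my %inverse;
    for my $gene1 (sort keys %$geneHash) {
        push @{ $inverse{ $geneHash->{$gene1} } }, $gene1;
    }
    return \%inverse;
}

# Made-up sample data in the documented shape; not real server output.
my $geneHash = {
    'fig|100226.1.peg.1' => 'fig|83333.1.peg.7',
    'fig|100226.1.peg.2' => 'fig|83333.1.peg.9',
};

# The method may return an undefined value, so check before use.
if (defined $geneHash) {
    my $inverse = invert_map($geneHash);
    for my $gene2 (sort keys %$inverse) {
        print "$gene2 <= @{ $inverse->{$gene2} }\n";
    }
}
```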

genome_contig_md5s

    my $genomeHash =        $sapObject->genome_contig_md5s({
                                -ids => [$genome1, $genome2, ...]
                            });

For each incoming genome, return a hash mapping its contigs to their MD5 identifiers.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the genome IDs.

RETURN

Returns a hash that maps each incoming genome ID to a sub-hash that maps its contig IDs to their MD5 identifiers. The MD5 identifiers are computed directly from the contig DNA sequences.

    $genomeHash = { $genome1 => {$contig1a => $md5id1a, $contig1b => $md5id1b, ... },
                    $genome2 => {$contig2a => $md5id2a, $contig2b => $md5id2b, ... },
                    ... };

genome_contigs

    my $genomeHash =        $sapObject->genome_contigs({
                                -ids => [$genome1, $genome2, ...]
                            });

For each incoming genome, return a list of its contigs.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the genome IDs.

RETURN

Returns a hash that maps each incoming genome ID to a list of its contig IDs.

    $genomeHash = { $genome1 => [$contig1a, $contig1b, ...],
                    $genome2 => [$contig2a, $contig2b, ...],
                    ... };

genome_data

    my $genomeHash =        $sapObject->genome_data({
                                -ids => [$genome1, $genome2, ...],
                                -data => [$fieldA, $fieldB, ...]
                            });

Return the specified data items for the specified genomes.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of genome IDs.

-data

Reference to a list of data field names. The possible data field names are given below.

complete

1 if the genome is more or less complete, else 0.

contigs

The number of contigs for the genome.

dna-size

The number of base pairs in the genome.

domain

The domain of the genome (Archaea, Bacteria, ...).

gc-content

The amount of GC base pairs in the genome, expressed as a percentage of the genome's DNA.

genetic-code

The genetic code used by this genome.

pegs

The number of protein encoding genes in the genome.

rnas

The number of RNAs in the genome.

name

The scientific name of the genome.

taxonomy

The genome's full taxonomy as a comma-separated string.

md5

The MD5 identifier computed from the genome's DNA sequences.

RETURN

Returns a hash mapping each incoming genome ID to an n-tuple. Each tuple will contain the specified data fields for the corresponding genome, in the order specified in -data.

    $genomeHash =  { $id1 => [$data1A, $data1B, ...],
                     $id2 => [$data2A, $data2B, ...],
                     ... };
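Because the tuples are positional, it can be easier to work with them after pairing each value with its field name. The sketch below does this with a hash slice, assuming the same field list that was passed in -data; label_fields is a hypothetical helper and the values are made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: convert each positional tuple into a hash keyed
# by the field names that were requested in -data (same order).
sub label_fields {
    my ($genomeHash, $fields) = @_;
    my %labeled;
    for my $id (keys %$genomeHash) {
        my %row;
        @row{@$fields} = @{ $genomeHash->{$id} };
        $labeled{$id} = \%row;
    }
    return \%labeled;
}

# Made-up sample data in the documented shape; not real server output.
my @fields = ('name', 'contigs', 'domain');
my $genomeHash = {
    '100226.1' => ['Streptomyces coelicolor A3(2)', 3, 'Bacteria'],
};

my $labeled = label_fields($genomeHash, \@fields);
for my $id (sort keys %$labeled) {
    print "$id: $labeled->{$id}{name} ($labeled->{$id}{domain}), ",
        "$labeled->{$id}{contigs} contig(s)\n";
}
```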

genome_domain

    my $genomeHash =        $sapObject->genome_domain({
                                -ids => [$genome1, $genome2, ...]
                            });

Return the domain for each specified genome (e.g. Archaea, Bacteria, Plasmid).

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the genome IDs.

RETURN

Returns a hash that maps each incoming genome ID to its taxonomic domain.

genome_fid_md5s

    my $genomeHash =        $sapObject->genome_fid_md5s({
                                -ids => [$genome1, $genome2, ...]
                            });

For each incoming genome, return a hash mapping its genes to their MD5 identifiers.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the genome IDs.

RETURN

Returns a hash that maps each incoming genome ID to a sub-hash that maps its FIG feature IDs to their MD5 identifiers. The MD5 identifiers are computed from the genome's MD5 identifier and the gene's location in the genome.

    $genomeHash = { $genome1 => {$fid1a => $md5id1a, $fid1b => $md5id1b, ... },
                    $genome2 => {$fid2a => $md5id2a, $fid2b => $md5id2b, ... },
                    ... };

genome_ids

    my $genomeHash =        $sapObject->genome_ids({
                                -names => [$name1, $name2, ...],
                                -taxons => [$tax1, $tax2, ...]
                            });

Find the specific genome ID for each specified genome name or taxonomic number. This method helps to find the correct version of a given genome when only the species and strain are known.

parameter

The parameter should be a reference to a hash with the following keys.

-names (optional)

Reference to a list of genome scientific names, including genus, species, and strain (e.g. Streptomyces coelicolor A3(2)). A genome ID will be found (if any) for each specified name.

-taxons (optional)

Reference to a list of genome taxonomic numbers. These are essentially genome IDs without an associated version number (e.g. 100226). A specific matching genome ID will be found; the one chosen will be the one with the highest version number that is not a plasmid.

RETURN

Returns a hash mapping each incoming name or taxonomic number to the corresponding genome ID.

    $genomeHash = { $name1 => $genome1, $name2 => $genome2, ...
                    $tax1 => $genome3, $tax2 => $genome4, ... };

genome_metrics

    my $genomeHash =        $sapObject->genome_metrics({
                                -ids => [$genome1, $genome2, ...]
                            });

For each incoming genome ID, returns the number of contigs, the total number of base pairs in the genome's DNA, and the genome's default genetic code.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of genome IDs.

RETURN

Returns a hash mapping each incoming genome ID to a 3-tuple consisting of (0) the number of contigs, (1) the total DNA size, and (2) the genome's default genetic code.

    $genomeHash = { $genome1 => [$contigCount1, $baseCount1, $geneticCode1],
                    $genome2 => [$contigCount2, $baseCount2, $geneticCode2],
                    ... };

genome_names

    my $idHash =            $sapObject->genome_names({
                                -ids => [$id1, $id2, ...],
                                -numbers => 1
                            });

Return the name of the genome containing each specified feature or genome.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of identifiers. Each identifier can be a prefixed feature ID (e.g. fig|100226.1.peg.3361, uni|P0AC98) or a genome ID (83333.1, 360108.3).

-numbers (optional)

If TRUE, the genome ID number will be returned instead of the name. Note that this facility is only useful when the incoming identifiers are feature IDs, as genome IDs would be mapped to themselves.

RETURN

Returns a reference to a hash mapping each incoming feature ID to the scientific name of its parent genome. If an ID refers to more than one real feature, only the first feature's genome is returned.

    $idHash = { $id1 => $genomeName1, $id2 => $genomeName2, ... };

genomes_by_md5

    my $md5Hash =           $sapObject->genomes_by_md5({
                                -ids => [$md5id1, $md5id2, ...],
                                -names => 1
                            });

Find the genomes associated with each specified MD5 genome identifier. The MD5 genome identifier is computed from the DNA sequences of the genome's contigs; as a result, two genomes with identical sequences arranged in identical contigs will have the same MD5 identifier even if they have different genome IDs.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of MD5 genome identifiers.

-names (optional)

If TRUE, then both genome IDs and their associated names will be returned; otherwise, only the genome IDs will be returned. The default is FALSE.

RETURN

Returns a reference to a hash keyed by incoming MD5 identifier. Each identifier maps to a list of genomes. If -names is FALSE, then the list is of genome IDs; if -names is TRUE, then the list is of 2-tuples, each consisting of (0) a genome ID and (1) the associated genome's scientific name.

if -names = TRUE
    $md5Hash = { $md5id1 => [[$genome1a, $name1a], [$genome1b, $name1b], ...],
                 $md5id2 => [[$genome2a, $name2a], [$genome2b, $name2b], ...],
                 ... };
if -names = FALSE
    $md5Hash = { $md5id1 => [$genome1a, $genome1b, ...],
                 $md5id2 => [$genome2a, $genome2b, ...],
                 ... };
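Since the list elements are either plain genome IDs or 2-tuples depending on -names, client code that must handle both can check each element's type. The following sketch shows one way; genome_ids_for is a hypothetical helper, and the MD5 strings and genome IDs are made-up placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: return the genome IDs for $md5 whether the list
# holds plain IDs (-names FALSE) or [ID, name] 2-tuples (-names TRUE).
sub genome_ids_for {
    my ($md5Hash, $md5) = @_;
    my $list = $md5Hash->{$md5} || [];
    return map { ref($_) eq 'ARRAY' ? $_->[0] : $_ } @$list;
}

# Made-up sample data in both documented shapes; not real server output.
my $withNames    = { 'md5-placeholder-1' => [['100226.1', 'Genome A'],
                                             ['100226.9', 'Genome B']] };
my $withoutNames = { 'md5-placeholder-1' => ['100226.1', '100226.9'] };

print join(', ', genome_ids_for($withNames,    'md5-placeholder-1')), "\n";
print join(', ', genome_ids_for($withoutNames, 'md5-placeholder-1')), "\n";
```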

intergenic_regions

    my $locList =           $sapObject->intergenic_regions({
                                -genome => $genome1,
                                -type => ['peg', 'rna']
                            });

Return a list of "Location Strings" for the regions in the specified genome that are not occupied by genes of the specified types. All of these will be construed to be on the forward strand, and sorted by contig ID and start location within contig.

parameter

The parameter should be a reference to a hash with the following keys.

-genome

ID of the genome whose intergenic regions are to be returned.

-type (optional)

Reference to a list of gene types. Only genes of the specified type will be considered to be occupying space on the contigs. Typically, this parameter will either be peg or a list consisting of peg and rna. The default is to allow all gene types, but this will not generally produce a good result.

RETURN

Returns a reference to a list of location strings, indicating the intergenic region locations for the genome.

    $locList = [$loc1, $loc2, ...]
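The returned strings follow the format described under "Location Strings": contig ID, underscore, start, strand, length. The sketch below splits such a string into its parts and, for forward-strand locations such as these, derives the last covered position as start + length - 1. parse_location is a hypothetical helper, not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: split a location string into its parts. Returns
# a hash reference, or nothing if the string does not match the format.
sub parse_location {
    my ($loc) = @_;
    my ($contig, $start, $strand, $len) =
        $loc =~ /^(.+)_(\d+)([+-])(\d+)$/
        or return;
    my %parsed = (contig => $contig, start => $start,
                  strand => $strand, length => $len);
    # On the plus strand the region covers start .. start + length - 1.
    $parsed{end} = $start + $len - 1 if $strand eq '+';
    return \%parsed;
}

# Location string taken from the "Location Strings" example.
my $locList = ['100226.1:NC_003888_3766170+612'];
for my $loc (@$locList) {
    my $p = parse_location($loc) or next;
    print "$p->{contig}: $p->{start}..$p->{end} ($p->{length} bases)\n";
}
```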

is_prokaryotic

    my $genomeHash =        $sapObject->is_prokaryotic({
                                -ids => [$genome1, $genome2, ...]
                            });

For each incoming genome ID, returns 1 if it is prokaryotic and 0 otherwise.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the relevant genome IDs.

RETURN

Returns a reference to a hash that maps each incoming genome ID to 1 if it is a prokaryotic genome and 0 otherwise.

    $genomeHash = { $genome1 => $flag1, $genome2 => $flag2, ... };

mapped_genomes

    my $genomeHash =        $sapObject->mapped_genomes({
                                -ids => [$genome1, $genome2, ...]
                            });

For each incoming genome, return a list of the genomes that have an existing gene correspondence map (see "Gene Correspondence List" in ServerThing). Gene correspondence maps indicate which genes in the target genome are the best hit of each gene in the source genome. If a correspondence map does not yet exist, it will be created when you ask for it, but this is an expensive process and it is sometimes useful to find an alternate genome that will give you a faster result.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the IDs for the genomes of interest. A (possibly empty) result list will be returned for each one.

RETURN

Returns a reference to a hash mapping each incoming genome ID to a list of the IDs for the genomes which have existing correspondence maps on the server.

    $genomeHash = { $genome1 => [$genome1a, $genome1b, ...],
                    $genome2 => [$genome2a, $genome2b, ...],
                    ... };

otu_members

    my $genomeHash =        $sapObject->otu_members({
                                -ids => [$genome1, $genome2, ...]
                            });

For each incoming genome, return the name and ID of each other genome in the same OTU.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the IDs for the genomes of interest.

RETURN

Returns a reference to a hash mapping each incoming genome ID to a sub-hash. The sub-hash is keyed by genome ID, and maps the ID of each genome in the same OTU to its name.

    $genomeHash = { $genome1 => { $genome1a => $name1a, $genome1b => $name1b, ... },
                    $genome2 => { $genome2a => $name2a, $genome2b => $name2b, ... },
                    ... };

representative

    my $genomeHash =        $sapObject->representative({
                                -ids => [$genome1, $genome2, ...]
                            });

Return the representative genome for each specified incoming genome ID. Genomes with the same representative are considered closely related, while genomes with a different representative would be considered different enough that similarities between them have evolutionary significance.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the IDs for the genomes of interest.

RETURN

Returns a reference to a hash mapping each incoming genome ID to the ID of its representative genome.

    $genomeHash = { $genome1 => $genome1R, $genome2 => $genome2R, ... };

representative_genomes

    my $mappings =          $sapObject->representative_genomes();

Compute mappings for the genome sets (OTUs) in the database. This method will return a mapping from each genome to its genome set ID and from each genome set ID to a list of the genomes in the set. For the second mapping, the first genome in the set will be the representative.

This method does not require any parameters.

RETURN

Returns a reference to a 2-tuple. The first element is a reference to a hash mapping genome IDs to genome set IDs; the second element is a reference to a hash mapping each genome set ID to a list of the genomes in the set. The first genome in each of these lists will be the set's representative.

    $mappings = [ { $genome1 => $set1, $genome2 => $set2, ... },
                  { $set1 => [$genome1R, $genome1a, $genome1b, ...],
                    $set2 => [$genome2R, $genome2a, $genome2b, ...],
                    ... }
                ];
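The two mappings can be combined to find the representative of any genome: look up its set ID in the first hash, then take the first genome in that set's list. The sketch below shows this on made-up sample data; representative_of is a hypothetical helper, and the set and genome IDs are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: return the representative genome for $genome,
# or undef if the genome is not in any genome set.
sub representative_of {
    my ($mappings, $genome) = @_;
    my ($genomeToSet, $setToGenomes) = @$mappings;
    my $set = $genomeToSet->{$genome};
    return undef unless defined $set;
    return $setToGenomes->{$set}[0];
}

# Made-up sample data in the documented shape; not real server output.
my $mappings = [
    { '100226.1' => 'set1', '227882.1' => 'set1', '83333.1' => 'set2' },
    { 'set1' => ['227882.1', '100226.1'],
      'set2' => ['83333.1'] },
];

for my $genome ('100226.1', '83333.1') {
    print "$genome is represented by ",
        representative_of($mappings, $genome), "\n";
}
```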

submit_gene_correspondence

    my $statusCode =    $sapObject->submit_gene_correspondence({
                            -genome1 => $genome1,
                            -genome2 => $genome2,
                            -correspondences => $corrList,
                            -passive => 1
                        });

Submit a set of gene correspondences to be stored on the server.

parameter

The parameter should be a reference to a hash with the following keys.

-genome1

ID of the source genome for the correspondence.

-genome2

ID of the target genome for the correspondence.

-correspondences

Reference to a list of lists containing the correspondence data (see "Gene Correspondence List" in ServerThing).

-passive (optional)

If TRUE, then the file will not be stored if one already exists. If FALSE, an existing correspondence file will be overwritten. The default is FALSE.

RETURN

Returns TRUE (1) if the correspondences were successfully stored, FALSE (0) if they were rejected or an error occurred.

taxonomy_of

    my $genomeHash =    $sapObject->taxonomy_of({
                            -ids => [$genome1, $genome2, ...],
                            -format => 'numbers'
                        });

Return the taxonomy of each specified genome. The taxonomy will start at the domain level and move down to the node where the genome is attached.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of genome IDs. A taxonomy will be generated for each specified genome.

-format (optional)

Format for the elements of the taxonomy string. If numbers, then each taxonomy element will be represented by its number; if names, then each taxonomy element will be represented by its primary name; if both, then each taxonomy element will be represented by a number followed by the name. The default is names.

RETURN

Returns a reference to a hash mapping incoming genome IDs to taxonomies. Each taxonomy will be a list of strings, starting from the domain and ending with the genome.

Normal Output
    $genomeHash = { $genome1 => [$name1a, $name1b, ...],
                    $genome2 => [$name2a, $name2b, ...],
                    ... };
Output if -format = numbers
    $genomeHash = { $genome1 => [$num1a, $num1b, ...],
                    $genome2 => [$num2a, $num2b, ...],
                    ... };
Output if -format = both
    $genomeHash = { $genome1 => ["$num1a $name1a", "$num1b $name1b", ...],
                    $genome2 => ["$num2a $name2a", "$num2b $name2b", ...],
                    ... };
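With -format set to both, each element combines the number and the name in one string. The sketch below separates them again, assuming a single space follows the number as in the layout above; split_taxonomy is a hypothetical helper and the taxonomy shown is illustrative sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: split each "number name" element into a
# [number, name] pair. Only the first space is used as the separator,
# so names containing spaces stay intact.
sub split_taxonomy {
    my ($elements) = @_;
    my @pairs;
    for my $element (@$elements) {
        my ($num, $name) = split(/ /, $element, 2);
        push @pairs, [$num, $name];
    }
    return \@pairs;
}

# Illustrative sample data in the documented shape; not real server output.
my $genomeHash = {
    '100226.1' => ['2 Bacteria', '1883 Streptomyces',
                   '100226 Streptomyces coelicolor A3(2)'],
};

for my $pair (@{ split_taxonomy($genomeHash->{'100226.1'}) }) {
    my ($num, $name) = @$pair;
    print "$num\t$name\n";
}
```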

Scenario Data Methods

scenario_names

    my $scenarioHash =      $sapObject->scenario_names({
                                -subsystem => $subsys1
                            });

Return the names of all the scenarios for the specified subsystem. Each scenario has an internal ID number and a common name. This method returns both.

parameter

The parameter should be a reference to a hash with the following keys.

-subsystem

Name of the subsystem whose scenarios are desired.

RETURN

Returns a hash mapping the ID numbers of the subsystem's scenarios to their common names.

    $scenarioHash = { $id1 => $name1, $id2 => $name2, ... };

Subsystem Data Methods

all_subsystems

    my $subsysHash =        $sapObject->all_subsystems({
                                -usable => 1,
                                -exclude => [$type1, $type2, ...],
                                -aux => 1
                            });

Return a list of all subsystems in the system. For each subsystem, this method will return the ID, curator, the classifications, and roles.

parameter

The parameter should be a reference to a hash with the following keys, all of which are optional. Because all of the keys are optional, it is permissible to pass an empty hash or no parameters at all.

-usable (optional)

If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.

-exclude (optional)

Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based and experimental. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable option is turned off.

-aux (optional)

If TRUE, then auxiliary roles will be included in the output. The default is FALSE, meaning they will be excluded.

RETURN

Returns a hash mapping each subsystem ID to a 3-tuple consisting of (0) the name of the curator, (1) a reference to a list of the subsystem classifications, and (2) a reference to a list of the subsystem's roles.

    $subsysHash = { $sub1 => [$curator1, [$class1a, $class1b, ...], [$role1a, $role1b, ...]],
                    $sub2 => [$curator2, [$class2a, $class2b, ...], [$role2a, $role2b, ...]],
                    ... };

classification_of

    my $subsysHash =        $sapObject->classification_of({
                                -ids => [$sub1, $sub2, ...]
                            });

Return the classification for each specified subsystem.

parameter

Reference to a hash of parameters with the following possible keys.

-ids

Reference to a list of subsystem IDs.

RETURN

Returns a hash mapping each incoming subsystem ID to a list reference. Each list contains the classification names in order from the largest classification to the most detailed.

    $subsysHash = { $sub1 => [$class1a, $class1b, ...],
                    $sub2 => [$class2a, $class2b, ...],
                    ... };

genomes_to_subsystems

    my $genomeHash =        $sapObject->genomes_to_subsystems({
                                -ids => [$genome1, $genome2, ...],
                                -all => 1,
                                -usable => 0,
                                -exclude => ['cluster-based', 'experimental', ...]
                            });

Return a list of the subsystems participated in by each of the specified genomes.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the genome IDs.

-all (optional)

If TRUE, all subsystems will be returned, including those in which the genome does not appear to implement the subsystem and those in which the subsystem implementation is incomplete. The default is FALSE, in which case only subsystems that are completely implemented by the genome will be returned.

-usable (optional)

If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.

-exclude (optional)

Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based and experimental. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable option is turned off.

RETURN

Returns a hash mapping each genome ID to a list of 2-tuples. Each 2-tuple will contain a subsystem name followed by a variant code.

    $genomeHash = { $genome1 => [[$sub1a, $variantCode1a], [$sub1b, $variantCode1b], ...],
                    $genome2 => [[$sub2a, $variantCode2a], [$sub2b, $variantCode2b], ...],
                    ... };
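The result is keyed by genome, but it is often more convenient to look things up by subsystem. The following sketch inverts the structure into a hash of subsystem name to genome ID to variant code, which resembles the shape documented for subsystem_genomes. The helper name genomes_by_subsystem, the subsystem names, and the variant codes are assumptions made for the example.

```perl
use strict;
use warnings;

# Made-up sample in the shape returned by genomes_to_subsystems.
my $genomeHash = {
    '100226.1' => [['Sample subsystem A', '1'], ['Sample subsystem B', '2']],
    '360108.3' => [['Sample subsystem A', '1']],
};

# Invert the mapping: subsystem name => { genome ID => variant code }.
sub genomes_by_subsystem {
    my ($genomeHash) = @_;
    my %bySub;
    for my $genome (keys %$genomeHash) {
        for my $pair (@{ $genomeHash->{$genome} }) {
            my ($sub, $variantCode) = @$pair;
            $bySub{$sub}{$genome} = $variantCode;
        }
    }
    return \%bySub;
}

my $bySub = genomes_by_subsystem($genomeHash);
for my $sub (sort keys %$bySub) {
    print "$sub: ", join(', ', sort keys %{ $bySub->{$sub} }), "\n";
}
```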

get_subsystems

    my $subsysHash =        $sapObject->get_subsystems({
                                -ids => [$sub1, $sub2, ...]
                            });

Get a complete description of each specified subsystem. This will include the basic subsystem properties, the list of roles, and the spreadsheet.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of subsystem IDs.

RETURN

Returns a reference to a hash mapping each incoming subsystem ID to a sub-hash that completely describes the subsystem. The keys for the sub-hash are as follows.

curator

The name of the subsystem's curator.

version

The subsystem's current version number.

notes

The text of the subsystem notes.

desc

The description of the subsystem.

roles

Reference to a list of 3-tuples, one for each role in the subsystem. Each 3-tuple will contain (0) the role abbreviation, (1) 1 if the role is auxiliary and 0 otherwise, and (2) the ID (name) of the role.

spreadsheet

Reference to a list of 5-tuples. For each molecular machine implementing the subsystem, there is a 5-tuple containing (0) the target genome ID, (1) the relevant region string, (2) 1 if the molecular machine is curated and 0 if it was computer-assigned, (3) the variant code for the implemented variant, and (4) a reference to a list of sub-lists, one per role (in order), with each sub-list containing the IDs of all features performing that role.

    $subsysHash = { $sub1 =>
                        { curator => $curator1,
                          version => $version1,
                          notes => $notes1,
                          desc => $desc1,
                          roles => [[$abbr1a, $aux1a, $role1a],
                                    [$abbr1b, $aux1b, $role1b], ... ],
                          spreadsheet => [
                            [$genome1x, $region1x, $curated1x, $variant1x,
                                [[$fid1xa1, $fid1xa2, ...], [$fid1xb1, $fid1xb2, ...], ...]],
                            [$genome1y, $region1y, $curated1y, $variant1y,
                                [[$fid1ya1, $fid1ya2, ...], [$fid1yb1, $fid1yb2, ...], ...]],
                            ... ]
                        },
                    $sub2 =>
                        { curator => $curator2,
                          version => $version2,
                          notes => $notes2,
                          desc => $desc2,
                          roles => [[$abbr2a, $aux2a, $role2a],
                                    [$abbr2b, $aux2b, $role2b], ... ],
                          spreadsheet => [
                            [$genome2x, $region2x, $curated2x, $variant2x,
                                [[$fid2xa1, $fid2xa2, ...], [$fid2xb1, $fid2xb2, ...], ...]],
                            [$genome2y, $region2y, $curated2y, $variant2y,
                                [[$fid2ya1, $fid2ya2, ...], [$fid2yb1, $fid2yb2, ...], ...]],
                            ... ]
                        },
                    ... };
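Because the last element of each spreadsheet 5-tuple lists its feature sub-lists in role order, the roles list is needed to label them. The sketch below pairs each role name with its sub-list by position. The helper name rows_by_role and all sample values (including the region placeholder) are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample for one subsystem, in the shape described above.
my $subsysHash = {
    'Sample subsystem A' => {
        curator => 'someone', version => 3, notes => '', desc => '',
        roles => [['RoA', 0, 'Sample role A'], ['RoB', 1, 'Sample role B']],
        spreadsheet => [
            ['100226.1', 'region-placeholder', 1, '1',
                [['fig|100226.1.peg.1', 'fig|100226.1.peg.2'], []]],
        ],
    },
};

# For each spreadsheet row, pair every role name with its cell by position.
sub rows_by_role {
    my ($subData) = @_;
    my @roleNames = map { $_->[2] } @{ $subData->{roles} };
    my @rows;
    for my $row (@{ $subData->{spreadsheet} }) {
        my ($genome, $region, $curated, $variant, $cells) = @$row;
        my %byRole;
        @byRole{@roleNames} = @$cells;
        push @rows, { genome => $genome, variant => $variant, roles => \%byRole };
    }
    return \@rows;
}

for my $row (@{ rows_by_role($subsysHash->{'Sample subsystem A'}) }) {
    for my $role (sort keys %{ $row->{roles} }) {
        my $fids = $row->{roles}{$role} || [];
        print "$row->{genome} (variant $row->{variant}) $role: @$fids\n";
    }
}
```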

ids_in_subsystems

    my $subsysHash =        $sapObject->ids_in_subsystems({
                                -subsystems => [$sub1, $sub2, ...],
                                -genome => $genome1,
                                -grouped => 1,
                                -roleForm => 'full',
                                -source => 'UniProt'
                            });

Return the features of each specified subsystem in the specified genome, or alternatively, return all features of each specified subsystem.

parameter

The parameter should be a reference to a hash with the following keys.

-subsystems

Reference to a list of the IDs for the desired subsystems.

-genome (optional)

ID of the relevant genome, or all to return the genes in all genomes for the subsystem. The default is all.

-grouped (optional)

If specified, then instead of being represented in a list, the feature IDs will be represented in a comma-delimited string.

-roleForm (optional)

If abbr, then roles will be represented by the role abbreviation; if full, then the role will be represented by its full name; if none, then roles will not be included and there will only be a single level of hashing-- by subsystem ID. The default is abbr.

-source (optional)

Database source for the output IDs-- SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. The default is SEED.

RETURN

Returns a hash mapping each subsystem ID to a sub-hash. Each sub-hash maps the roles of the subsystem to lists of feature IDs. By default, the roles are recorded as role abbreviations.

Normal Output
    $subsysHash = { $sub1 => { $roleAbbr1A => [$fid1Ax, $fid1Ay, ...],
                               $roleAbbr1B => [$fid1Bx, $fid1By, ...],
                               ... },
                    $sub2 => { $roleAbbr2A => [$fid2Ax, $fid2Ay, ...],
                               $roleAbbr2B => [$fid2Bx, $fid2By, ...],
                               ... },
                    ... };
Output if -roleForm = full
    $subsysHash = { $sub1 => { $role1A => [$fid1Ax, $fid1Ay, ...],
                               $role1B => [$fid1Bx, $fid1By, ...],
                               ... },
                    $sub2 => { $role2A => [$fid2Ax, $fid2Ay, ...],
                               $role2B => [$fid2Bx, $fid2By, ...],
                               ... },
                    ... };
Output if -roleForm = none
    $subsysHash = { $sub1 => [$fid1a, $fid1b, ...],
                    $sub2 => [$fid2a, $fid2b, ...],
                    ... };
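Since the shape of the result depends on -roleForm and -grouped, client code may want one routine that copes with every variant. The sketch below collects the distinct feature IDs for a subsystem whether the value is a role hash, a plain list, or a grouped string. The helper name feature_ids and the sample data are made up, and the exact delimiter of a grouped string (commas with optional spaces) is an assumption.

```perl
use strict;
use warnings;

# Made-up samples covering the documented output variants.
my $byRole  = { 'Sample subsystem A' => { RoA => ['fig|100226.1.peg.2', 'fig|100226.1.peg.1'],
                                          RoB => ['fig|100226.1.peg.1'] } };
my $noRoles = { 'Sample subsystem A' => ['fig|100226.1.peg.1', 'fig|100226.1.peg.2'] };
my $grouped = { 'Sample subsystem A' => { RoA => 'fig|100226.1.peg.2, fig|100226.1.peg.1' } };

# Collect the distinct feature IDs found anywhere under one subsystem's value.
sub feature_ids {
    my ($value) = @_;
    my @fids;
    if (ref($value) eq 'HASH') {
        push @fids, feature_ids($_) for values %$value;
    } elsif (ref($value) eq 'ARRAY') {
        @fids = @$value;
    } elsif (defined $value) {
        # Assumption: a grouped string separates IDs with commas and optional spaces.
        @fids = split /\s*,\s*/, $value;
    }
    my %seen;
    my @unique = sort grep { !$seen{$_}++ } @fids;
    return @unique;
}

for my $result ($byRole, $noRoles, $grouped) {
    for my $sub (sort keys %$result) {
        print "$sub: ", join(' ', feature_ids($result->{$sub})), "\n";
    }
}
```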

ids_to_publications

    my $featureHash =       $sapObject->ids_to_publications({
                                -ids => [$id1, $id2, ...],
                                -source => 'UniProt'
                            });

Return the PUBMED ID and title of each publication relevant to the specified feature IDs.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of feature IDs. Normally, these are FIG feature IDs (e.g. fig|100226.1.peg.3361, fig|360108.3.peg.1041), but other ID types are permissible if the source parameter is overridden.

-source (optional)

Database source of the IDs specified-- SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.

-genome (optional)

ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for genes in all genomes.

RETURN

Returns a reference to a hash mapping feature IDs to lists of 2-tuples. Each 2-tuple consists of a PUBMED ID followed by a publication title.

    $featureHash = { $id1 => [[$pub1a, $title1a], [$pub1b, $title1b], ...],
                     $id2 => [[$pub2a, $title2a], [$pub2b, $title2b], ...],
                     ... };

ids_to_subsystems

    my $featureHash =       $sapObject->ids_to_subsystems({
                                -ids => [$id1, $id2, ...],
                                -usable => 0,
                                -exclude => ['cluster-based', 'experimental', ...],
                                -source => 'RefSeq',
                                -subsOnly => 1
                            });

Return the subsystem and role for each feature in the incoming list. A feature may have multiple roles in a subsystem and may belong to multiple subsystems, so the role/subsystem information is returned in the form of a list of ordered pairs for each feature.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of feature IDs. Normally, these are FIG feature IDs (e.g. fig|100226.1.peg.3361, fig|360108.3.peg.1041), but other ID types are permissible if the source parameter is overridden.

-usable (optional)

If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.

-exclude (optional)

Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based and experimental. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable option is turned off.

-source (optional)

Database source of the IDs specified-- SEED for FIG IDs, GENE for standard gene identifiers, or LocusTag for locus tags. In addition, you may specify RefSeq, CMR, NCBI, Trembl, or UniProt for IDs from those databases. Use mixed to allow mixed ID types (though this may cause problems when the same ID has different meanings in different databases). Use prefixed to allow IDs with prefixing indicating the ID type (e.g. uni|P00934 for a UniProt ID, gi|135813 for an NCBI identifier, and so forth). The default is SEED.

-genome (optional)

ID of a specific genome. If specified, results will only be returned for genes in the specified genome. The default is to return results for genes in all genomes.

-subsOnly (optional)

If TRUE, instead of a list of (role, subsystem) 2-tuples, each feature ID will be mapped to a simple list of subsystem names. The default is FALSE.

RETURN

Returns a reference to a hash mapping feature IDs to lists of 2-tuples. Each 2-tuple consists of a role name followed by a subsystem name. If a feature is not in a subsystem, it will not be present in the return hash.

Normal Output
    $featureHash = { $id1 => [[$role1a, $sub1a], [$role1b, $sub1b], ...],
                     $id2 => [[$role2a, $sub2a], [$role2b, $sub2b], ...],
                     ... };
Output if -subsOnly = 1
    $featureHash = { $id1 => [$sub1a, $sub1b, ...],
                     $id2 => [$sub2a, $sub2b, ...],
                     ... };
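The two output forms differ only in whether each entry is a (role, subsystem) 2-tuple or a bare subsystem name. The sketch below groups features by subsystem and accepts either form. The helper name features_by_subsystem and the role and subsystem names are made up for the example.

```perl
use strict;
use warnings;

# Made-up samples: normal output and -subsOnly output for the same feature.
my $normal = { 'fig|100226.1.peg.3361' => [['Sample role A', 'Sample subsystem A'],
                                           ['Sample role B', 'Sample subsystem A']] };
my $subsOnly = { 'fig|100226.1.peg.3361' => ['Sample subsystem A'] };

# Map each subsystem name to the sorted list of features that belong to it.
sub features_by_subsystem {
    my ($featureHash) = @_;
    my %bySub;
    for my $fid (keys %$featureHash) {
        for my $entry (@{ $featureHash->{$fid} }) {
            # A 2-tuple is [role, subsystem]; with -subsOnly the entry is the name itself.
            my $sub = ref($entry) ? $entry->[1] : $entry;
            $bySub{$sub}{$fid} = 1;
        }
    }
    my %result;
    $result{$_} = [sort keys %{ $bySub{$_} }] for keys %bySub;
    return \%result;
}

for my $result ($normal, $subsOnly) {
    my $bySub = features_by_subsystem($result);
    print "$_: @{ $bySub->{$_} }\n" for sort keys %$bySub;
}
```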

is_in_subsystem

    my $featureHash =       $sapObject->is_in_subsystem({
                                -ids => [$fid1, $fid2, ...],
                                -usable => 0,
                                -exclude => [$type1, $type2, ...]
                            });

Return the subsystem and role for each specified feature.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the FIG feature IDs for the features of interest.

-usable (optional)

If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.

-exclude (optional)

Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based and experimental. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable option is turned off.

For backward compatibility, the parameter may also be a reference to a list of FIG feature IDs.

RETURN

Returns a reference to a hash that maps each incoming feature ID to a list of 2-tuples, each 2-tuple consisting of (0) the ID of a subsystem containing the feature and (1) the feature's role in that subsystem. If an incoming feature is not in any subsystem, its ID will be mapped to an empty list.

    $featureHash = { $fid1 => [[$sub1a, $role1a], [$sub1b, $role1b], ...],
                     $fid2 => [[$sub2a, $role2a], [$sub2b, $role2b], ...],
                     ... };

In backward-compatibility mode, returns a reference to a list of 3-tuples, each 3-tuple consisting of (0) a subsystem ID, (1) a role ID, and (2) the ID of a feature from the input list.
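Code that still receives the list-of-3-tuples form can convert it to the hash form described above. The sketch below does this and maps every requested feature with no tuples to an empty list, mirroring the hash-form behavior. The helper name tuples_to_hash and the sample subsystem and role names are made up for the example.

```perl
use strict;
use warnings;

# Made-up sample: the requested IDs and a list of (subsystem, role, feature) 3-tuples.
my @ids = ('fig|100226.1.peg.3361', 'fig|360108.3.peg.1041');
my $tuples = [
    ['Sample subsystem A', 'Sample role A', 'fig|100226.1.peg.3361'],
    ['Sample subsystem B', 'Sample role B', 'fig|100226.1.peg.3361'],
];

# Build feature ID => [[subsystem, role], ...], with an empty list for
# features that appear in no tuple.
sub tuples_to_hash {
    my ($tuples, $ids) = @_;
    my %featureHash = map { $_ => [] } @$ids;
    for my $tuple (@$tuples) {
        my ($sub, $role, $fid) = @$tuple;
        push @{ $featureHash{$fid} }, [$sub, $role];
    }
    return \%featureHash;
}

my $featureHash = tuples_to_hash($tuples, \@ids);
for my $fid (@ids) {
    my @subs = map { $_->[0] } @{ $featureHash->{$fid} };
    print "$fid: ", (@subs ? join(', ', @subs) : '(none)'), "\n";
}
```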

is_in_subsystem_with

    my $featureHash =       $sapObject->is_in_subsystem_with({
                                -ids => [$fid1, $fid2, ...],
                                -usable => 0,
                                -exclude => [$type1, $type2, ...]
                            });

For each incoming feature, returns a list of the features in the same genome that are part of the same subsystem. For each other feature returned, its role, functional assignment, subsystem variant, and subsystem ID will be returned as well.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the FIG feature IDs for the features of interest.

-usable (optional)

If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.

-exclude (optional)

Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based and experimental. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable option is turned off.

For backward compatibility, the parameter may also be a reference to a list of FIG feature IDs.

RETURN

Returns a reference to a hash that maps each incoming feature ID to a list of 5-tuples relating to features in the same subsystem. Each 5-tuple contains (0) a subsystem ID, (1) a variant ID, (2) the related feature ID, (3) the related feature's functional assignment, and (4) the related feature's role in the subsystem.

    $featureHash = { $fid1 => [[$sub1a, $variant1a, $fid1a, $function1a, $role1a],
                               [$sub1b, $variant1b, $fid1b, $function1b, $role1b], ...],
                     $fid2 => [[$sub2a, $variant2a, $fid2a, $function2a, $role2a],
                               [$sub2b, $variant2b, $fid2b, $function2b, $role2b], ...],
                     ... };

In backward-compatibility mode, returns a reference to a list of lists. Each sub-list contains 6-tuples relating to a single incoming feature ID. Each 6-tuple consists of a subsystem ID, a variant ID, the incoming feature ID, the other feature ID, the other feature's functional assignment, and the other feature's role in the subsystem.

pegs_implementing_roles

    my $roleHash =          $sapObject->pegs_implementing_roles({
                                -subsystem => $subsysID,
                                -roles => [$role1, $role2, ...]
                            });

Given a subsystem and a list of roles, return a list of the subsystem's features for each role.

parameter

The parameter should be a reference to a hash with the following keys.

-subsystem

ID of a subsystem.

-roles

Reference to a list of roles.

For backward compatibility, the parameter can also be a reference to a 2-tuple consisting of (0) a subsystem ID and (1) a reference to a list of roles.

RETURN

Returns a hash that maps each role ID to a list of the IDs for the features that perform the role in that subsystem.

    $roleHash = { $role1 => [$fid1a, $fid1b, ...],
                  $role2 => [$fid2a, $fid2b, ...],
                  ... };

In backward-compatibility mode, returns a list of 2-tuples. Each tuple consists of a role and a reference to a list of the features in that role.

pegs_in_subsystems

    my $subsysHash =        $sapObject->pegs_in_subsystems({
                                -genomes => [$genome1, $genome2, ...],
                                -subsystems => [$sub1, $sub2, ...]
                            });

This method takes a list of genomes and a list of subsystems and returns a list of the roles represented in each genome/subsystem pair.

parameter

Reference to a hash of parameter values with the following possible keys.

-genomes

Reference to a list of genome IDs.

-subsystems

Reference to a list of subsystem IDs.

For backward compatibility, the parameter may also be a reference to a 2-tuple, the first element of which is a list of genome IDs and the second of which is a list of subsystem IDs.

RETURN

Returns a reference to a hash of hashes. The main hash is keyed by subsystem ID. Each subsystem's hash is keyed by role ID and maps the role to a list of the feature IDs for that role in the subsystem that belong to the specified genomes.

    $subsysHash = { $sub1 => { $role1A => [$fid1Ax, $fid1Ay, ...],
                               $role1B => [$fid1Bx, $fid1By, ...],
                               ... },
                    $sub2 => { $role2A => [$fid2Ax, $fid2Ay, ...],
                               $role2B => [$fid2Bx, $fid2By, ...],
                               ... },
                    ... };

In backward-compatibility mode, returns a list of 2-tuples. Each tuple consists of a subsystem ID and a second 2-tuple that contains a role ID and a reference to a list of the feature IDs for that role that belong to the specified genomes.

pegs_in_variants

    my $subsysHash =        $sapObject->pegs_in_variants({
                                -genomes => [$genomeA, $genomeB, ...],
                                -subsystems => [$sub1, $sub2, ...]
                            });

This method takes a list of genomes and a list of subsystems and returns a list of the pegs represented in each genome/subsystem pair.

The main difference between this method and "pegs_in_subsystems" is in the organization of the output, which is more like a subsystem spreadsheet.

parameter

Reference to a hash of parameter values with the following possible keys.

-genomes (optional)

Reference to a list of genome IDs. If the list is omitted, all genomes will be included in the output (which will be rather large in most cases).

-subsystems

Reference to a list of subsystem IDs.

RETURN

Returns a reference to a hash mapping subsystem IDs to sub-hashes. Each sub-hash is keyed by genome ID and maps the genome ID to a list containing the variant code and one or more n-tuples, each n-tuple containing a role ID followed by a list of the genes in the genome having that role in the subsystem.

    $subsysHash = { $sub1 => { $genomeA => [$vc1A,
                                            [$role1Ax, $fid1Ax1, $fid1Ax2, ...],
                                            [$role1Ay, $fid1Ay1, $fid1Ay2, ...],
                                            ...],
                               $genomeB => [$vc1B,
                                            [$role1Bx, $fid1Bx1, $fid1Bx2, ...],
                                            [$role1By, $fid1By1, $fid1By2, ...],
                                            ...],
                               ... },
                    $sub2 => { $genomeA => [$vc2A,
                                            [$role2Ax, $fid2Ax1, $fid2Ax2, ...],
                                            [$role2Ay, $fid2Ay1, $fid2Ay2, ...],
                                            ...],
                               $genomeB => [$vc2B,
                                            [$role2Bx, $fid2Bx1, $fid2Bx2, ...],
                                            [$role2By, $fid2By1, $fid2By2, ...],
                                            ...],
                               ... },
                    ... };

Note that in some cases the genome ID will include a region string. This happens when the subsystem has multiple occurrences in the genome.
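Each genome's list mixes a leading variant code with the role tuples that follow it, so it usually has to be taken apart before use. The sketch below splits one such list into the variant code and a hash of role to feature IDs. The helper name unpack_variant_row and the sample values are made up for the example; the genome key is treated as an opaque string, so a key that includes a region string needs no special handling here.

```perl
use strict;
use warnings;

# Made-up sample for one subsystem and one genome.
my $subsysHash = {
    'Sample subsystem A' => {
        '100226.1' => ['1',
                       ['Sample role A', 'fig|100226.1.peg.1', 'fig|100226.1.peg.2'],
                       ['Sample role B', 'fig|100226.1.peg.3']],
    },
};

# Split one genome's list into its variant code and a role => [features] hash.
sub unpack_variant_row {
    my ($row) = @_;
    my ($variantCode, @roleTuples) = @$row;
    my %byRole;
    for my $tuple (@roleTuples) {
        my ($role, @fids) = @$tuple;
        $byRole{$role} = [@fids];
    }
    return ($variantCode, \%byRole);
}

for my $sub (sort keys %$subsysHash) {
    for my $genome (sort keys %{ $subsysHash->{$sub} }) {
        my ($code, $byRole) = unpack_variant_row($subsysHash->{$sub}{$genome});
        print "$sub / $genome (variant $code)\n";
        print "  $_: @{ $byRole->{$_} }\n" for sort keys %$byRole;
    }
}
```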

roles_exist_in_subsystem

    my $roleHash =          $sapObject->roles_exist_in_subsystem({
                                -subsystem => $sub1,
                                -roles => [$role1, $role2, ...]
                            });

Indicate which roles in a given list belong to a specified subsystem.

parameter

The parameter should be a reference to a hash with the following keys.

-subsystem

The name of the subsystem of interest.

-roles

A reference to a list of role IDs.

RETURN

Returns a reference to a hash mapping each incoming role ID to 1 if it exists in the specified subsystem and 0 otherwise.

    $roleHash = { $role1 => $flag1, $role2 => $flag2, ... };

roles_to_subsystems

    my $roleHash =              $sapObject->roles_to_subsystems({
                                    -roles => [$role1, $role2, ...],
                                    -usable => 0
                                });

Return the subsystems containing each specified role.

parameter

The parameter should be a reference to a hash with the following keys.

-roles

Reference to a list of role names.

-usable (optional)

If TRUE, only usable subsystems will be returned. If FALSE, all subsystems will be returned. The default is TRUE.

RETURN

Returns a reference to a hash mapping each incoming role to a list of the names of subsystems containing that role.

    $roleHash = { $role1 => [$sub1a, $sub1b, ...],
                  $role2 => [$sub2a, $sub2b, ...],
                  ... };

rows_of_subsystem

    my $subHash =               $sapObject->rows_of_subsystem({
                                    -subs => [$sub1, $sub2, ...],
                                    -genomes => [$genomeA, $genomeB, ...],
                                    ...
                                });

Return the subsystem row for each subsystem/genome pair. A row in this case consists of a reference to a hash mapping role names to a list of the FIG feature IDs for the features in the genome performing that role.

In the Sapling database, a subsystem row is represented by the MolecularMachine entity. The strategy of this method is therefore to find the molecular machine for each subsystem/genome pair, and then use its ID to get the roles and features.

parameter

The parameter should be a reference to a hash with the following keys.

-subs

Reference to a list of subsystem IDs.

-genomes

Reference to a list of genome IDs.

RETURN

Returns a reference to a hash mapping each incoming subsystem ID to a sub-hash keyed by genome ID. In the sub-hash, each genome ID will map to a sub-sub-hash that maps role names to lists of feature IDs.

    $subHash = { $sub1 => { $genomeA => { $role1Aa => [$fid1Aax, $fid1Aay, ... ],
                                          $role1Ab => [$fid1Abx, $fid1Aby, ... ],
                                          ... },
                            $genomeB => { $role1Ba => [$fid1Bax, $fid1Bay, ... ],
                                          $role1Bb => [$fid1Bbx, $fid1Bby, ... ],
                                          ... },
                            ... },
                 $sub2 => { $genomeA => { $role2Aa => [$fid2Aax, $fid2Aay, ... ],
                                          $role2Ab => [$fid2Abx, $fid2Aby, ... ],
                                          ... },
                            $genomeB => { $role2Ba => [$fid2Bax, $fid2Bay, ... ],
                                          $role2Bb => [$fid2Bbx, $fid2Bby, ... ],
                                          ... },
                            ... },
                 ... };
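A three-level result like this is typically walked with nested loops. As a small example, the sketch below counts how many feature IDs each subsystem/genome row contains. The helper name row_sizes and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample in the shape returned by rows_of_subsystem.
my $subHash = {
    'Sample subsystem A' => {
        '100226.1' => { 'Sample role A' => ['fig|100226.1.peg.1', 'fig|100226.1.peg.2'],
                        'Sample role B' => ['fig|100226.1.peg.3'] },
        '360108.3' => { 'Sample role A' => ['fig|360108.3.peg.9'] },
    },
};

# Count the feature IDs in each subsystem/genome row.
sub row_sizes {
    my ($subHash) = @_;
    my %sizes;
    for my $sub (keys %$subHash) {
        for my $genome (keys %{ $subHash->{$sub} }) {
            my $row = $subHash->{$sub}{$genome};
            my $count = 0;
            $count += scalar(@$_) for values %$row;
            $sizes{$sub}{$genome} = $count;
        }
    }
    return \%sizes;
}

my $sizes = row_sizes($subHash);
for my $sub (sort keys %$sizes) {
    print "$sub / $_: $sizes->{$sub}{$_} feature(s)\n" for sort keys %{ $sizes->{$sub} };
}
```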

subsystem_data

    my $subsysHash =        $sapObject->subsystem_data({
                                -ids => [$sub1, $sub2, ...],
                                -field => 'version'
                            });

For each incoming subsystem ID, return the specified data field. This method can be used to find the curator, description, notes, or version of the specified subsystems.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of subsystem IDs.

-field (optional)

Name of the desired data field-- curator to retrieve the name of each subsystem's curator, version to get the subsystem's version number, description to get the subsystem's description, or notes to get the subsystem's notes. The default is description.

RETURN

Returns a hash mapping each incoming subsystem ID to the associated data value.

    $subsysHash = { $sub1 => $value1, $sub2 => $value2, ... };

subsystem_genomes

    my $subHash =           $sapObject->subsystem_genomes({
                                -ids => [$sub1, $sub2, ...],
                                -all => 1
                            });

For each subsystem, return the genomes that participate in it and their associated variant codes.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the names of the subsystems whose genome information is desired.

-all (optional)

If TRUE, then all genomes associated with the subsystem will be listed. The default is FALSE, meaning that only genomes that completely implement the subsystem will be listed.

RETURN

Returns a reference to a hash that maps each subsystem ID to a sub-hash. Each sub-hash in turn maps the ID of each genome that participates in the subsystem to its variant code.

    $subHash = { $sub1 => { $genome1a => $code1a, $genome1b => $code1b, ...},
                 $sub2 => { $genome2a => $code2a, $genome2b => $code2b, ...},
                 ... };
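One common use of this result is to see which genomes share a variant. The sketch below regroups one subsystem's sub-hash by variant code. The helper name genomes_by_variant, the variant codes, and the genome ID 99999.9 are made up for the example.

```perl
use strict;
use warnings;

# Made-up sample in the shape returned by subsystem_genomes.
my $subHash = {
    'Sample subsystem A' => { '100226.1' => '1', '360108.3' => '1', '99999.9' => '2' },
};

# Regroup one subsystem's genomes: variant code => [genome IDs in sorted order].
sub genomes_by_variant {
    my ($genomes) = @_;
    my %byCode;
    push @{ $byCode{ $genomes->{$_} } }, $_ for sort keys %$genomes;
    return \%byCode;
}

for my $sub (sort keys %$subHash) {
    my $byCode = genomes_by_variant($subHash->{$sub});
    print "$sub variant $_: @{ $byCode->{$_} }\n" for sort keys %$byCode;
}
```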

subsystem_names

    my $nameList =          $sapObject->subsystem_names({
                                -usable => 0,
                                -exclude => ['cluster-based', ...]
                            });

Return a list of all subsystems in the database.

parameter

The parameter should be a reference to a hash with the following keys.

-usable (optional)

If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.

-exclude (optional)

Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based and experimental. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable option is turned off.

RETURN

Returns a reference to a list of subsystem names.

    $nameList = [$sub1, $sub2, ...];

subsystem_roles

    my $subHash =           $sapObject->subsystem_roles({
                                -ids => [$sub1, $sub2, ...],
                                -aux => 1
                            });

Return the list of roles for each subsystem, in order.

parameter

Reference to a hash of parameters with the following possible keys.

-ids

Reference to a list of subsystem IDs.

-aux (optional)

If TRUE, auxiliary roles will be included. The default is FALSE, which excludes auxiliary roles.

-abbr (optional)

If TRUE, then the role abbreviations will be included in the results. In this case, each subsystem name will be mapped to a list of 2-tuples, with each 2-tuple consisting of (0) the role name and (1) the role abbreviation. The default is FALSE (normal output).

RETURN

Returns a hash mapping each subsystem ID to a list of roles (normal) or a list of role/abbreviation pairs (extended output).

Output if -abbr is FALSE
    $subHash = { $sub1 => [$role1a, $role1b, ...],
                 $sub2 => [$role2a, $role2b, ...],
                 ... };
Output if -abbr is TRUE
    $subHash = { $sub1 => [[$role1a, $abbr1a],
                           [$role1b, $abbr1b], ...],
                 $sub2 => [[$role2a, $abbr2a],
                           [$role2b, $abbr2b], ...],
                 ... };
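Code that only needs the role names can treat both output forms uniformly. The sketch below extracts the role names, in order, from either a plain list or a list of role/abbreviation pairs. The helper name role_names and the sample roles are made up for the example.

```perl
use strict;
use warnings;

# Made-up samples: the same subsystem with -abbr FALSE and with -abbr TRUE.
my $plain    = { 'Sample subsystem A' => ['Sample role A', 'Sample role B'] };
my $withAbbr = { 'Sample subsystem A' => [['Sample role A', 'RoA'],
                                          ['Sample role B', 'RoB']] };

# Return the role names in order, whichever output form was requested.
sub role_names {
    my ($roleList) = @_;
    my @names = map { ref($_) ? $_->[0] : $_ } @$roleList;
    return @names;
}

for my $result ($plain, $withAbbr) {
    for my $sub (sort keys %$result) {
        print "$sub: ", join('; ', role_names($result->{$sub})), "\n";
    }
}
```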

subsystem_spreadsheet

    my $subsysHash =        $sapObject->subsystem_spreadsheet({
                                -ids => [$sub1, $sub2, ...]
                            });

This method takes a list of subsystem IDs, and for each one returns a list of the features in the subsystem. For each feature, it will include the feature's functional assignment, the subsystem name and variant (spreadsheet row), and its role (spreadsheet column).

parameter

Reference to a hash of parameters with the following possible keys.

-ids

Reference to a list of subsystem IDs.

For backward compatibility, this method can also accept a reference to a list of subsystem IDs.

RETURN

Returns a hash mapping each incoming subsystem ID to a list of 4-tuples. Each tuple contains (0) a variant ID, (1) a feature ID, (2) the feature's functional assignment, and (3) the feature's role in the subsystem.

    $subsysHash = { $sub1 => [[$variant1a, $fid1a, $function1a, $role1a],
                              [$variant1b, $fid1b, $function1b, $role1b], ...],
                    $sub2 => [[$variant2a, $fid2a, $function2a, $role2a],
                              [$variant2b, $fid2b, $function2b, $role2b], ...],
                    ... };

In backward-compatibility mode, returns a list of 5-tuples. Each tuple contains (0) a subsystem ID, (1) a variant ID, (2) a feature ID, (3) the feature's functional assignment, and (4) the feature's role in the subsystem.
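The hash form and the list form carry the same information; the list form simply prefixes each tuple with its subsystem ID. The sketch below flattens the hash form into a list of 5-tuples of that shape, for example to write one row per line. The helper name flatten_spreadsheet and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample in the hash form returned by subsystem_spreadsheet.
my $subsysHash = {
    'Sample subsystem A' => [
        ['1', 'fig|100226.1.peg.1', 'Sample function A', 'Sample role A'],
        ['1', 'fig|100226.1.peg.2', 'Sample function B', 'Sample role B'],
    ],
};

# Prefix every 4-tuple with its subsystem ID, producing a flat list of 5-tuples.
sub flatten_spreadsheet {
    my ($subsysHash) = @_;
    my @tuples;
    for my $sub (sort keys %$subsysHash) {
        push @tuples, map { [$sub, @$_] } @{ $subsysHash->{$sub} };
    }
    return \@tuples;
}

my $flat = flatten_spreadsheet($subsysHash);
print join("\t", @$_), "\n" for @$flat;
```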

subsystem_type

    my $subsysHash =        $sapObject->subsystem_type({
                                -ids => [$sub1, $sub2, ...],
                                -type => 'cluster-based'
                            });

For each incoming subsystem, return TRUE if it has the specified characteristic, else FALSE.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of subsystem names.

-type

Name of the subsystem characteristic of interest. The default is usable. The possible characteristics are

cluster-based

A cluster-based subsystem is one in which there is functional-coupling evidence that genes belong together, but we do not yet know what they do.

experimental

An experimental subsystem is designed for investigation and is not yet ready to be used in comparative analysis and annotation.

private

A private subsystem has valid data, but is not considered ready for general distribution.

usable

An unusable subsystem is one that is experimental or is of such low quality that it can negatively affect analysis. A usable subsystem is one that is not unusable.

RETURN

Returns a hash mapping the incoming subsystem names to TRUE/FALSE flags indicating the value of the specified characteristic.

    $subsysHash = { $sub1 => $flag1, $sub2 => $flag2, ... };

subsystems_for_role

    my $roleHash =          $sapObject->subsystems_for_role({
                                -ids => [$role1, $role2, ...],
                                -usable => 1,
                                -exclude => ['cluster-based', ...]
                            });

For each role, return a list of the subsystems containing that role. The results can be filtered to include unusable subsystems or exclude subsystems of certain exotic types.

parameter

The parameter should be a reference to a hash with the following keys.

-ids

Reference to a list of the IDs of the roles of interest.

-aux (optional)

If TRUE, then subsystems in which the role is auxiliary will be included. The default is not to include such subsystems.

-usable (optional)

If TRUE, then only results from usable subsystems will be included. If FALSE, then results from all subsystems will be included. The default is TRUE.

-exclude (optional)

Reference to a list of special subsystem types that should be excluded from the result list. The permissible types are cluster-based and experimental. Normally cluster-based subsystems are included, but experimental subsystems are only included if the -usable option is turned off.

RETURN

Returns a reference to a hash that maps each incoming role ID to a list of subsystem names.

    $roleHash = { $role1 => [$ss1a, $ss1b, ...],
                  $role2 => [$ss2a, $ss2b, ...],
                  ... };
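As an example of working with this result, the sketch below picks out the roles that occur in more than one subsystem. The helper name shared_roles and the sample role and subsystem names are made up for the example.

```perl
use strict;
use warnings;

# Made-up sample in the shape returned by subsystems_for_role.
my $roleHash = {
    'Sample role A' => ['Sample subsystem A', 'Sample subsystem B'],
    'Sample role B' => ['Sample subsystem A'],
    'Sample role C' => [],
};

# Return the sorted names of the roles found in two or more subsystems.
sub shared_roles {
    my ($roleHash) = @_;
    my @shared = sort grep { @{ $roleHash->{$_} } > 1 } keys %$roleHash;
    return @shared;
}

print "Shared: $_\n" for shared_roles($roleHash);
```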