Server Scripts
All scripts read from the standard input and write to the standard output. File names are never specified on the command line. In general, they accept as input a tab-delimited file and operate on the last column of the input. This allows multiple commands to be strung together using a pipe.
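The convention described above can be sketched as a small filter. This is only an illustration in Python, not the implementation of any svr_ script: the function name `append_column` and the toy lookup tables are hypothetical stand-ins for a real server query.

```python
def append_column(lines, lookup):
    """Yield each tab-delimited line with one extra column appended.

    The key is read from the LAST column of the input line, mirroring the
    convention that the scripts operate on the last input column.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key = line.split("\t")[-1]
        # Unknown keys get an empty value in this toy version.
        yield line + "\t" + lookup.get(key, "")


# Two toy stages chained the way two piped commands would be:
# the column added by the first stage is the input of the second.
ids = ["id1", "id2"]
stage1 = append_column(ids, {"id1": "A", "id2": "B"})
stage2 = append_column(stage1, {"A": "alpha", "B": "beta"})
result = list(stage2)
```

Because each stage only appends a column and reads the last one, the earlier columns are carried along unchanged, which is what makes chaining with a pipe practical.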
- svr_NCBI_taxonomy: Get taxonomy information from NCBI
- svr_ach_lookup: Find protein assertions from the Annotation Clearinghouse.
- svr_add_lengths_to_blast: Add query and contig lengths to m8 blast output
- svr_ali_to_html: Convert a FASTA alignment to HTML
- svr_aliases_of: Return all identifiers for genes in the database that are protein-sequence-equivalent to the specified identifiers. In this case, the identifiers are assumed to be in their natural form (without prefixes). For each identifier, the identified protein sequences will be found and then for each protein sequence, all identifiers for that protein sequence or for genes that produce that protein sequence will be returned.
- svr_aliases_to_pegs: Convert aliases to PEGs
- svr_align_seqs: This script takes a FASTA file from the standard input, aligns the
sequences using Clustal, MUSCLE or MAFFT, and writes the alignment in
the FASTA format to the standard output.
- svr_all_complexes: List the reaction complexes in the database.
- svr_all_compounds: svr_all_compounds [options] [< modelIDFile] > output
- svr_all_experiments: List all the experiments present for a specified genome or for all genomes listed
in a file.
- svr_all_features: Get a list of Feature IDs for all features of a given type in a given genome
or a list of genomes.
- svr_all_figfams: List all the features in each FIGfam.
- svr_all_genomes: List the names and IDs of all the (complete) genomes.
- svr_all_models: List the existing metabolic models (and the genomes for which
they were built).
- svr_all_reactions: List the reaction IDs
- svr_all_roles_used_in_models: List the roles used in the existing metabolic models.
- svr_all_subsystems: Get all the subsystem names.
- svr_assign_to_dna_using_figfams: Assign Using the FIGfams Server
- svr_assign_using_figfams: Assign Using the FIGfams Server
- svr_atomic_regulons: List all the atomic regulons computed for a specified genome or for all genomes listed
in a file.
- svr_big_repeats: Find regions that appear to be big repeats (at the DNA level). This can be done by looking for multiple copies of
identical DNA within a single genome or looking for instances of large repeats maintained as a Blast DB.
- svr_blast: Run blast locally
- svr_by_taxonomy: Separate a list by taxonomy
- svr_call_pegs: Call Genes using Annotation server.
- svr_call_rnas: Call RNAs using Annotation server.
- svr_cdd_scan: Scans protein sequences for conserved domain hits in NCBI CD-Database.
- svr_close_genomes: List the IDs of the genomes that are functionally close to the input genomes.
- svr_closest_genes: Locate genes in a specified genome containing the specified protein or DNA sequence.
- svr_cluster_pegs: Cluster PEGs that are close on the contig
- svr_co_occurrence_evidence: Display instances in which homologs of the two specified PEGs co-occur or members of
two distinct FIGfams tend to co-occur.
- svr_cohesion_groups: This script classifies tips of a newick tree into cohesion groups
based on bootstrap values of tree branches.
- svr_compare_feature_tables: usage: svr_compare_feature_tables old_features.tab new_features.tab [summary.yaml] > comparison.tab 2> summary.txt
- svr_complex_to_reaction: Extend a set of complexes to include the associated reactions
- svr_contigs_in_genome: For each incoming genome ID, return the IDs of its contigs.
- svr_coregulated_by_correspondence: Get genes that have evidence of coexpression indirectly (i.e.,
it seems to exist between corresponding genes in one or more
other genomes with expression data).
- svr_corr_by_exp: Get genes that have similar expression profiles.
- svr_coupled_reactions: Takes as input a table containing reaction IDs and
adds a column giving the "adjacent" reactions.
- svr_create_set: Create a persistent set owned by owner
- svr_current_annotation: Get the functional role, annotator and timestamp of the current
annotation for a protein-encoding gene or RNA.
- svr_cut_domain: Clip domains out of a set of protein sequences
- svr_delete_from_set: Delete entries from a persistent set owned by owner
- svr_delete_set: Delete a persistent set owned by owner
- svr_determine_sets_of_related_contigs: This takes as input a list of contig IDs. What is the output?
- svr_discriminating_functions: Analyze two groups of genomes and return a list of the functions that discriminate
between them.
- svr_distinct_otus: Classify the incoming genome IDs into organism taxonomic units.
- svr_dlits: Get the list of publications associated with each specified protein or
gene.
- svr_dna_seq: Produce DNA strings for contigs, FIG feature IDs, and/or locations.
- svr_enumerate_sets_by_owner: List all the persistent sets owned by owner
- svr_evidence: Get evidence codes for protein-encoding genes
- svr_exp_genomes: List the names and IDs of all the genomes with expression data.
- svr_export_as_seed_dir: Export one or more genomes as SEED format directories.
- svr_expressed_genes_in_range: Compute the genes in the specified genomes that are expressed a particular fraction of the
time, where the fraction is a number between 0 and 1. The fraction is specified as a
range from a minimum to a maximum value. If the minimum is 1, then only genes expressed
all the time are returned. If the maximum is 0, then only genes that are never expressed
are returned.
- svr_fasta: Produce DNA or protein strings for genes.
- svr_fasta_to_md5: This script takes a FASTA file of protein sequences from the standard input and
writes a tab-delimited file of protein IDs to the standard output. Each output record will
correspond to a single FASTA input record and will contain the incoming ID
in the first column and the MD5 protein ID in the second column.
- svr_fc_figfams: Output the functionally coupled FIGfams. By specifying a MinSc, you restrict the
output to functionally coupled FIGfams that co-occur in at least n OTUs.
- svr_fids_for_md5: Given a set of md5 protein IDs, compute the FIG IDs of features that produce each
protein. This script takes as input a table containing md5 protein IDs and
adds a column containing the associated FIG feature IDs.
- svr_fids_to_regulons: Return all the atomic regulons each feature belongs to.
- svr_figfam_fasta: Produce FASTA strings for FIGfams.
- svr_figfam_functions: Output the functions for the specified FIGfams.
- svr_figfams_to_ids: List the PEGs for each specified FIGfam ID on STDOUT.
- svr_file_to_spreadsheet: Writes the contents of the tab separated file on STDIN to a spreadsheet
- svr_find_clusters_relevant_to_reaction: Find clusters potentially relevant to a search for a "missing gene"
- svr_find_fused_genes: Find genes that are homologous to the query genes and print a list
of fusions among them.
- svr_find_hypos_for_cluster: Get candidates for a specific role by finding genes with no real
assignment of function yet that are connected to a cluster. We will
consider a hypothetical "connected to a cluster" iff
- svr_find_protein: Output the FIG IDs and functional assignments of all features that produce a
specific protein. This script takes as input a single protein sequence on the
command-line or a FASTA file. It outputs a tab-delimited file connecting each
specified protein to its features.
- svr_find_regulatory_proteins: Find potential regulatory proteins
- svr_function_of: Get functions of protein-encoding genes
- svr_function_to_role: Convert functions to roles.
- svr_functionally_coupled: Get functionally coupled neighbors (neighbors that tend to co-occur).
- svr_gap_filled_reactions_and_roles: Get the reactions and functional roles that were predicted by gap-filling
- svr_gene_data: Get one or more pieces of data about each specified gene.
- svr_genome_functions: List the location and functional assignment for each gene in a specified genome.
- svr_genome_of: Get genome of feature
- svr_genome_statistics: Get one or more pieces of data about each specified genome.
- svr_get_ali_and_tree: Get alignments and trees based on a
set of gene IDs.
- svr_get_compound_data: svr_get_compound_data [options] [< compoundIdFile] > output
- svr_get_coupling_data: Get functional coupling data for genes in a genome
- svr_get_default_dataset: Return the name of the default dataset currently installed in the figfams annotation server.
- svr_get_model_data: svr_get_model_data [options] [< modelIdFile] > output
- svr_get_reaction_data: svr_get_reaction_data [options] [< reactionIdFile] > output
- svr_get_set: List all the entries in a persistent set owned by owner
- svr_ids_to_figfams: List the FIGfams for each specified gene ID on STDOUT. List on STDERR those lines where the ID does not have a FIGfam.
- svr_ids_to_subsystems: List the subsystems for each specified gene ID on STDOUT. List on STDERR those lines where the ID does not have a subsystem.
- svr_img_analysis: Read an IMG genome directory and compare it to the corresponding Sapling genomes
(if any). The single positional parameter is the IMG genome directory name. Note
that the last level of the directory name must also be the IMG genome number.
In other words, if the directory name is ~/genomes/IMG/637000001, then
the genome name must be 637000001.
- svr_in_runs: Make sequences of genes into operons.
- svr_inconsistent_sets: Separate out inconsistent sets
- svr_inherit_aliases: Cause a new genome to inherit aliases from an existing
genome for protein-encoding genes that are unique within each
genome and that have identical translations.
- svr_inherit_annotations: Cause a new genome to inherit annotations from an existing
genome for protein-encoding genes that are unique within each
genome and that have identical translations.
- svr_insert_seqs_into_alignment: This script takes a protein or DNA alignment in FASTA format from the
standard input and, from the command line, the name of a FASTA file of
sequences to be inserted, and writes the resulting alignment to
the standard output. If the insertion is not possible, a message is written to
the standard error output.
- svr_intergenic_regions: List all the intergenic regions in the contigs for a specified genome.
- svr_is_hypo: Keep just hypotheticals
- svr_just_ends: Clip off the ends of a set of contigs
- svr_link_to_compare_regions: Get link to compare regions organized to show co-occurring genes
- svr_location_of: Get physical locations of genes.
- svr_make_pan_genome_prot_families: Construct the protein families needed to study Pan Genomes
- svr_mapped_genomes: Get maps between a reference genome and a set of genomes to which
you wish to compare the reference genome.
- svr_md5_of_prot: Get md5s of protein-encoding genes
- svr_members_of_otu: For each incoming genome ID, return the genome ID and name of each genome in the
same organism taxonomic unit.
- svr_metabolic_reconstruction: Get a metabolic reconstruction from a set of functional roles.
- svr_missing_roles: It is assumed that -r is used to specify a column in the input file.
The column should contain reaction IDs for which "missing roles" might exist.
The -g argument is used to specify the column containing the genome ID.
- svr_model_build: svr_model_build genomeId [options]
- svr_model_data: svr_model_data [options] model_id > output
- svr_model_diff: svr_model_diff [options] modelOne modelTwo > output
- svr_model_stats: svr_model_stats [options] [model_id ...] > output
- svr_model_status: svr_model_status [options] [model_id ...] > output
- svr_motif: This script identifies the conserved regions from a set of aligned sequences.
- svr_my_models: svr_my_models [options] > output
- svr_neighborhood_of_role: Find roles in metabolic-function neighborhood
- svr_neighboring_reactions: Takes as input a table containing reaction IDs and
adds 2 columns giving the distance and the connected reaction.
- svr_neighbors_of: Get neighbors of protein-encoding genes (PEGs)
- svr_oligomer_similarity: This command goes through an alignment and computes the pairwise fractions
of n-character identities, producing a matrix of values for each string length
specified in the -min to -max parameters.
- svr_otus: List the names and IDs of all the representative genomes for the organism taxonomic units in
the system.
- svr_pegs_in_subsystems: Return all genes in one or more subsystems found in one or more genomes.
- svr_pegs_with_evcode: Get pegs that have a given evcode (along with all of the peg's evcodes).
- svr_possible_joins: Given kmer hits on ends of contigs, just group the hits having the same functions
to support finding cases in which genes might span contigs.
- svr_protein_assertions: Get a list of Annotation Clearinghouse assertions for the specified proteins.
- svr_psiblast_search: This script takes a trimmed protein sequence alignment in FASTA format,
uses PSIBLAST to search against the protein database of complete
genomes, and writes the extracted regions of hits to the standard output.
- svr_put_set: Add entries to a persistent set owned by owner
- svr_rRNA: Get 16S rRNAs of genomes
- svr_reaction_description: This simple utility gives the reaction associated with reaction IDs.
- svr_reactions_in_model: Takes as input a table containing model IDs and
adds a column giving a reaction in the model. Since each model contains
hundreds of reactions, the output file will be extremely large compared to the
input file.
- svr_reactions_to_roles: Takes as input a table containing reaction IDs and
adds a column giving the roles that implement the reactions
- svr_regulons_to_fids: Return all the atomic regulons each feature belongs to.
- svr_representative_sequences: usage: representative_sequences [ opts ] [ rep_seqs_0 ] < new_seqs > rep_seqs
- svr_reroot_tree: Reroot a tree at a different node or a point on an internal arc.
- svr_retreive_model: Retrieves reaction data for the selected metabolic model
- svr_role_to_complex: Extend a set of roles to include the triggered complexes
- svr_role_to_pegs: Get PEGs that implement a given set of functional roles
- svr_roles_to_reactions: Extend a set of roles to include the associated reactions
- svr_roles_to_subsys: Extend a set of roles to include the subsystems and category data
- svr_run_gene_activity_simulation: Identify gene calls that are inconsistent with model simulations.
------
Example: svr_run_gene_activity_simulation -infile GeneCalls-158878.1.txt -media Complete -output Results-158878.1.txt
------
Produces a file called "Results-158878.1.txt" with biomass, fluxes, and gene call consistency:
Label Experiment 1
Description pH7
Media Complete
Model Seed158878.1
Genome 158878.1
Fluxes rxn00001:1.321;rxn00002:-1.321....
Biomass 4.581
peg.1 (call status)/(model status)
peg.2 on/nonmetabolic
...
------
Called options: on, off, unknown
Model options: on, off, nonmetabolic, essential, inactive, nonfunctional
------
Command-Line Options
url: The URL for the Sapling server, if it is to be different from the default.
complete: If TRUE, only complete genomes will be returned. The default is FALSE (return all genomes).
------
Output Format
The standard output is a file where each line contains a genome name and a genome ID.
------
usage: svr_run_gene_activity_simulation --jobid 1549 --infile GeneCalls-158878.1.txt --genome 158878.1 --model Seed158878.1 --outfile Results-83333.1.txt --url http://www.theseed.org/ --user reviewer --password reviewer
- svr_seed_to_table: Extract a tab-separated feature table from a SEED Genome Directory.
Table format is:
- svr_sets_by_owner: List all the persistent sets owned by owner
- svr_similar_to: Get similarities for a PEG
- svr_sims2html: Build an HTML page or table from one or more tables of pairwise similarities.
- svr_sketch_tree: This little utility invokes a tree "printing" utility Gary Olsen wrote.
It has a rich set of options. We suggest a default usage of -m and -a.
- svr_split_fams: Detect cases in which BBHs have led to cycles
- svr_spreadsheet_to_file: Writes the contents of the spreadsheet given in filename to a tab separated file on STDOUT.
- svr_ss_classes: List the classifications for a specified subsystem or for all subsystems in a file.
- svr_subsystem_classification: Extend a set of subsystems names to classifications
- svr_subsystem_genome_data: Output the features, variants, and roles for one or more subsystems, optionally
filtered by genome ID.
- svr_subsystem_genomes: Output the genomes of a subsystem.
- svr_subsystem_roles: Output the roles of a subsystem.
- svr_subsystem_spreadsheet: Output a subsystem's spreadsheet.
- svr_summarize_MG_output: This simple program produces two summaries: one of the functions identified
and one of the OTUs identified. We represent OTUs with a representative
organism. The function summary is sent to stdout, while the OTU summary
is sent to stderr.
- svr_summarize_contigs: For each incoming genome ID, return statistics about its contigs.
- svr_summarize_protein_families: Write out three simple reports relating to a proposed set of protein families.
- svr_taxonomically_related_genomes: Get a list of genomes that are taxonomically related to the input genomes.
- svr_taxonomy: Get taxonomy of genomes
- svr_translations_of: Get translations from ids
- svr_tree: This script uses fasttree, PhyML or RAxML to build a
maximum-likelihood tree from a FASTA alignment, or evaluates the
likelihood of an input tree against a given alignment.
- svr_tree_to_html: This script converts a newick tree into an HTML page. It has a rich
set of options.
- svr_trim_ali: This script takes a FASTA file of aligned sequences, trims the
alignment by running PSIBLAST against the sequences themselves, and
writes the trimmed alignment to the standard output.
- svr_upstream: Retrieve upstream regions from the Sapling Server.
- svr_with_close_blast_hits: Determine which of the input PEGs have "close" blastX hits to a given DB.
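
As an illustration of one entry above, the two-column table that svr_fasta_to_md5 is described as producing (incoming ID in the first column, MD5 protein ID in the second) can be sketched as follows. This is a minimal Python sketch, not the script's own code. The function names are hypothetical, and it assumes the MD5 protein ID is the hex MD5 digest of the upper-cased sequence, which the description here does not confirm.

```python
import hashlib


def read_fasta(lines):
    """Yield (id, sequence) pairs from FASTA-formatted lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The ID is the first whitespace-delimited token after '>'.
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif line and seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def fasta_to_md5_rows(lines):
    """Yield one 'id<TAB>md5' row per FASTA record.

    Assumption: the digest is taken over the upper-cased sequence.
    """
    for seq_id, seq in read_fasta(lines):
        digest = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        yield seq_id + "\t" + digest
```

Each input record yields exactly one output row, so identical sequences under different IDs end up with the same value in the second column.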