Server Scripts
Unless a description below says otherwise, scripts read from the standard input and write to the standard output, and file names are not specified on the command line. In general, they accept a tab-delimited file as input and operate on its last column. This allows multiple commands to be strung together using a pipe.
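The last-column convention can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of any svr_ script: the function names and the two toy stages (`stage_upper`, `stage_length`) are hypothetical. Each stage reads tab-delimited rows, looks only at the last column, and appends one new column, so the next stage in the pipe automatically works on the value just added.

```python
from typing import Callable, Iterable, Iterator, TextIO


def extend_last_column(lines: Iterable[str],
                       annotate: Callable[[str], str]) -> Iterator[str]:
    """Yield each tab-delimited row with annotate(last column) appended."""
    for line in lines:
        row = line.rstrip("\r\n")
        if not row:
            continue  # ignore blank lines
        last = row.split("\t")[-1]
        yield row + "\t" + annotate(last)


def run(stream_in: TextIO, stream_out: TextIO,
        annotate: Callable[[str], str]) -> None:
    """Filter-style driver: read rows from stream_in, write extended rows."""
    for row in extend_last_column(stream_in, annotate):
        stream_out.write(row + "\n")


# Two hypothetical stages, standing in for two scripts joined by a pipe.
def stage_upper(value: str) -> str:
    return value.upper()


def stage_length(value: str) -> str:
    return str(len(value))


def two_stage_pipeline(lines: Iterable[str]) -> Iterator[str]:
    """Equivalent of `stage_upper | stage_length` on the command line."""
    return extend_last_column(extend_last_column(lines, stage_upper),
                              stage_length)
```

As a command-line filter, a stage would be started with something like `run(sys.stdin, sys.stdout, stage_upper)`. Because every stage appends rather than replaces, the original columns survive to the end of the pipe.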
- svr_CS_pipeline: Generate data needed to support close-strain analysis.
- svr_NCBI_taxonomy: Get taxonomy information from NCBI
- svr_ach_lookup: Find protein assertions from the Annotation Clearinghouse.
- svr_add_lengths_to_blast: Add query and contig lengths to m8 blast output
- svr_add_reaction_descriptions: Add a column giving a text reaction description (the reaction equation).
- svr_ali_to_html: Convert a FASTA alignment to HTML
- svr_aliases_of: Return all identifiers for genes in the database that are protein-sequence-equivalent to the specified identifiers. In this case, the identifiers are assumed to be in their natural form (without prefixes). For each identifier, the identified protein sequences will be found and then for each protein sequence, all identifiers for that protein sequence or for genes that produce that protein sequence will be returned.
- svr_aliases_to_pegs: Convert aliases to PEGs
- svr_align_seqs: This script takes a FASTA file from the standard input, aligns the
sequences using Clustal, MUSCLE or MAFFT, and writes the alignment in
the FASTA format to the standard output.
- svr_aligns_with_prot: Get the list of alignments associated with each specified protein or fid.
- svr_all_complexes: List the reaction complexes in the database.
- svr_all_experiments: List all the experiments present for a specified genome or for all genomes listed
in a file.
- svr_all_features: Get a list of Feature IDs for all features of a given type in a given genome
or a list of genomes.
- svr_all_figfams: List all the features in each FIGfam.
- svr_all_genomes: List the names and IDs of all the (complete) genomes.
- svr_all_models: List the existing metabolic models (and the genomes for which
they were built).
- svr_all_reactions: List the reaction IDs
- svr_all_roles_used_in_models: List the roles used in the existing metabolic models.
- svr_all_subsystems: Get all the subsystem names.
- svr_assign_to_dna_using_figfams: Assign Using the FIGfams Server
- svr_assign_using_figfams: Assign Using the FIGfams Server
- svr_atomic_reg_coexp: Get functions of protein-encoding genes
- svr_atomic_regulons: List all the atomic regulons computed for a specified genome or for all genomes listed
in a file.
- svr_big_repeats: Find regions that appear to be big repeats (at the DNA level). This can be done by looking for multiple copies of
identical DNA within a single genome or looking for instances of large repeats maintained as a Blast DB.
- svr_blast: Run blast locally
- svr_by_taxonomy: Separate a list by taxonomy
- svr_call_pegs: Call Genes using Annotation server.
- svr_call_rnas: Call RNAs using Annotation server.
- svr_cdd_scan: Scans protein sequences for conserved domain hits in NCBI CD-Database.
- svr_close_genomes: List the IDs of the genomes that are functionally close to the input genomes.
- svr_closest_genes: Locate genes in a specified genome containing the specified protein or DNA sequence.
- svr_cluster_locations: Cluster locations on the chromosome
- svr_cluster_pegs: Cluster PEGs that are close on the contig
- svr_co_occurrence_evidence: Displays instances in which homologs of the two specified PEGs co-occur or members of
two distinct FIGfams tend to co-occur.
- svr_cohesion_groups: This script classifies tips of a newick tree into cohesion groups
based on bootstrap values of tree branches.
- svr_compare_feature_tables: usage: svr_compare_feature_tables old_features.tab new_features.tab [summary.yaml] > comparison.tab 2> summary.txt
- svr_complex_to_reaction: Extend a set of complexes to include the associated reactions
- svr_contigs_in_genome: For each incoming genome ID, return the IDs of its contigs.
- svr_coregulated_by_correspondence: Get genes that have evidence of coexpression indirectly (i.e.,
it seems to exist between corresponding genes in one or more
other genomes with expression data).
- svr_corr_by_exp: Get genes that have similar expression profiles.
- svr_coupled_reactions: Takes as input a table containing reaction IDs and
adds a column giving the "adjacent" reactions.
- svr_create_set: Create a persistent set owned by owner
- svr_current_annotation: Get the functional role, annotator and timestamp of the current
annotation for a protein-encoding gene or RNA.
- svr_cut_domain: Clip domains out of a set of protein sequences
- svr_delete_from_set: Delete entries from a persistent set owned by owner
- svr_delete_set: Delete a persistent set owned by owner
- svr_determine_sets_of_related_contigs: This takes as input a list of contig IDs. What is the output?
- svr_discriminating_functions: Analyze two groups of genomes and return a list of the functions that discriminate
between them.
- svr_distinct_otus: Classify the incoming genome IDs into organism taxonomic units.
- svr_dlits: Get the list of publications associated with each specified protein or
gene.
- svr_dna_seq: Produce DNA strings for contigs, FIG feature IDs, and/or locations.
- svr_enumerate_sets_by_owner: List all the persistent sets owned by owner
- svr_evidence: Get evidence codes for protein-encoding genes
- svr_exp_genomes: List the names and IDs of all the genomes with expression data.
- svr_export_as_seed_dir: Export one or more genomes as SEED format directories.
- svr_expressed_genes_in_range: Compute the genes in the specified genomes that are expressed a particular fraction of the
time, where the fraction is a number between 0 and 1. The fraction is specified as a
range from a minimum to a maximum value. If the minimum is 1, then only genes expressed
all the time are returned. If the maximum is 0, then only genes that are never expressed
are returned.
- svr_fasta: Produce DNA or protein strings for genes.
- svr_fasta_to_md5: This script takes a FASTA file of protein sequences from the standard input and
writes a tab-delimited file of protein IDs to the standard output. Each output record will
correspond to a single FASTA input record and will contain the incoming ID
in the first column and the MD5 protein ID in the second column.
- svr_fc_figfams: Output the functionally coupled FIGfams. By specifying a MinSc of n, you restrict the
output to functionally coupled FIGfams that co-occur in at least n OTUs.
- svr_fids_for_md5: Given a set of md5 protein IDs, compute the FIG IDs of features that produce each
protein. This script takes as input a table containing md5 protein IDs and
adds a column containing the associated FIG feature IDs.
- svr_fids_to_ids: svr_fids_to_ids
- svr_fids_to_locations: Clusters from protein-encoding genes
- svr_fids_to_regulons: Return all the atomic regulons each feature belongs to.
- svr_figfam_fasta: Produce FASTA strings for FIGfams.
- svr_figfam_functions: Output the functions for the specified FIGfams.
- svr_figfams_to_ids: List the PEGs for each specified FIGfam ID on STDOUT.
- svr_file_to_spreadsheet: Writes the contents of the tab separated file on STDIN to a spreadsheet
- svr_find_clusters_relevant_to_reaction: Find clusters potentially relevant to a search for a "missing gene"
- svr_find_fused_genes: Find genes that are homologous to the query genes and print a list
of fusions among them.
- svr_find_hypos_for_cluster: Get candidates for a specific role by finding genes with no real
assignment of function yet that are connected to a cluster.
- svr_find_protein: Output the FIG IDs and functional assignments of all features that produce a
specific protein. This script takes as input a single protein sequence on the
command-line or a FASTA file. It outputs a tab-delimited file connecting each
specified protein to its features.
- svr_find_regulatory_proteins: Find potential regulatory proteins
- svr_function_of: Get functions of protein-encoding genes
- svr_function_to_role: Convert functions to roles.
- svr_functionally_coupled: Get functionally_coupled neighbors (neighbors that tend to co-occur).
- svr_gap_filled_reactions_and_roles: Get the reactions and functional roles that were predicted by gap-filling
- svr_gene_data: Get one or more pieces of data about each specified gene.
- svr_generate_dna_samples: Create random samples of DNA from known genomes
- svr_genome_functions: List the location and functional assignment for each gene in a specified genome.
- svr_genome_of: Get genome of feature
- svr_genome_statistics: Get one or more pieces of data about each specified genome.
- svr_get_ali_and_tree: Get alignments and trees based on a
set of gene IDs.
- svr_get_all: Process a general query against the Sapling database.
- svr_get_coupling_data: Get functional coupling data for genes in a genome
- svr_get_default_dataset: Return the name of the default dataset currently installed in the figfams annotation server.
- svr_get_rep_genomes: Get a set of representative genomes using heuristics and the NCBI taxonomy
- svr_get_set: List all the entries in a persistent set owned by owner
- svr_identical_genomes: For each incoming genome ID, return the genome ID and name of each genome that has
an identical md5 hash.
- svr_ids_to_figfams: List the FIGfams for each specified gene ID on STDOUT. List on STDERR those lines where the ID does not have a FIGfam.
- svr_ids_to_subsystems: List the subsystems for each specified gene ID on STDOUT. List on STDERR those lines where the ID does not have a subsystem.
- svr_img_analysis: Read an IMG genome directory and compare it to the corresponding Sapling genomes
(if any). The single positional parameter is the IMG genome directory name. Note
that the last level of the directory name must also be the IMG genome number.
In other words, if the directory name is ~/genomes/IMG/637000001, then
the genome number must be 637000001.
- svr_in_fasta: This little script just takes a fasta file of PEGs as input and
outputs a 3-column table [PEG,function,sequence].
- svr_in_runs: Make sequences of genes into operons.
- svr_inconsistent_sets: Separate out inconsistent sets
- svr_inherit_aliases: Cause a new genome to inherit aliases from an existing
genome for protein-encoding genes that are unique within each
genome and that have identical translations.
- svr_inherit_annotations: Cause a new genome to inherit annotations from an existing
genome for protein-encoding genes that are unique within each
genome and that have identical translations.
- svr_insert_seqs_into_alignment: This script takes a FASTA file containing a protein or DNA alignment from the
standard input and the name of a FASTA file of sequences to be
inserted from the command line, and writes the resulting alignment to
the standard output. When insertion is not possible, a message is written to
the standard error output.
- svr_intergenic_regions: List all the intergenic regions in the contigs for a specified genome.
- svr_is_hypo: Keep just hypotheticals
- svr_just_ends: Clip off the ends of a set of contigs
- svr_link_to_compare_regions: Get link to compare regions organized to show co-occurring genes
- svr_location_of: Get physical locations of genes.
- svr_make_pan_genome_prot_families: Construct the protein families needed to study Pan Genomes
- svr_mapped_genomes: Get maps between a reference genome and a set of genomes to which
you wish to compare the reference genome.
- svr_md5_of_prot: Get md5s of protein-encoding genes
- svr_members_of_otu: For each incoming genome ID, return the genome ID and name of each genome in the
same organism taxonomic unit.
- svr_metabolic_reconstruction: Get a metabolic reconstruction from a set of functional roles.
- svr_missing_roles: It is assumed that -r is used to specify a column in the input file.
The column should contain reaction IDs for which "missing roles" might exist.
The -g argument is used to specify the column containing the genome ID.
- svr_motif: This script identifies the conserved regions from a set of aligned sequences.
- svr_neighborhood_of_role: Find roles in metabolic-function neighborhood
- svr_neighboring_reactions: Takes as input a table containing reaction IDs and
adds 2 columns giving the distance and the connected reaction.
- svr_neighbors_of: Get neighbors of protein-encoding genes (PEGs)
- svr_oligomer_similarity: This command goes through an alignment and computes the pairwise fractions
of n-character identities, producing a matrix of values for each string length
specified in the -min to -max parameters.
- svr_otus: List the names and IDs of all the representative genomes for the organism taxonomic units in
the system.
- svr_pegs_in_subsystems: Return all genes in one or more subsystems found in one or more genomes.
- svr_pegs_with_evcode: Get pegs that have a given evcode (along with all of the peg's evcodes).
- svr_possible_joins: Given kmer hits on ends of contigs, just group the hits having the same functions
to support finding cases in which genes might span contigs.
- svr_preferred_roles: Get preferred roles, translating functions into more
acceptable forms
- svr_project_by_sr: Get corresponding genes.
- svr_project_model: Project atomic regulons from model to new genome.
- svr_protein_assertions: Get a list of Annotation Clearinghouse assertions for the specified proteins.
- svr_psiblast_search: This script takes a FASTA file of trimmed protein sequence alignment,
uses PSIBLAST to search against the protein database of complete
genomes, and writes the extracted regions of hits to the standard output.
- svr_put_set: Add entries to a persistent set owned by owner
- svr_quick_assign: Use staged-kmers (new, followed by old on hypotheticals)
- svr_rRNA: Get 16S rRNAs of genomes
- svr_reaction_description: This simple utility gives the reaction associated with reaction IDs.
- svr_reactions_in_model: Takes as input a table containing model IDs and
adds a column giving a reaction in the model. Since each model contains
hundreds of reactions, the output file will be extremely large compared to the
input file.
- svr_reactions_to_roles: Takes as input a table containing reaction IDs and
adds a column giving the roles that implement the reactions
- svr_regulons_to_fids: Return all the features that belong to each atomic regulon.
- svr_representative_sequences: usage: representative_sequences [ opts ] [ rep_seqs_0 ] < new_seqs > rep_seqs
- svr_reroot_tree: Reroot a tree at a different node or a point on an internal arc.
- svr_role_to_complex: Extend a set of roles to include the triggered complexes
- svr_role_to_pegs: Get PEGs that implement a given set of functional roles
- svr_roles_to_reactions: Get the reactions potentially supported by a set of functional roles
- svr_roles_to_subsys: Extend a set of roles to include the subsystems and category data
- svr_seed_to_table: Extract a tab-separated feature table from a SEED Genome Directory.
- svr_sets_by_owner: List all the persistent sets owned by owner
- svr_similar_to: Get similarities for a PEG
- svr_sims2html: Build an HTML page or table from one or more tables of pairwise similarities.
- svr_sketch_tree: This little utility invokes a tree "printing" utility Gary Olsen wrote.
It has a rich set of options. We suggest a default usage of -m and -a.
- svr_sphinx_indexing: Use sphinx indexes to match a keyword query (returning a table [peg,weight,annotation]).
By default it returns PEGs from pubSEED. The -c option supports coreSEED instead.
- svr_split_fams: Detect cases in which BBHs have led to cycles
- svr_spreadsheet_to_file: Writes the contents of the spreadsheet given in filename to a tab separated file on STDOUT.
- svr_ss_classes: List the classifications for a specified subsystem or for all subsystems in a file.
- svr_subsystem_classification: Extend a set of subsystems names to classifications
- svr_subsystem_genome_data: Output the features, variants, and roles for one or more subsystems, optionally
filtered by genome ID.
- svr_subsystem_genomes: Output the genomes of a subsystem.
- svr_subsystem_roles: Output the roles of a subsystem.
- svr_subsystem_spreadsheet: Output a subsystem's spreadsheet.
- svr_subsystems_to_roles: This takes a table as input. One of the columns contains subsystem names. For
each subsystem, a set of lines is output.
- svr_summarize_MG_output: This simple program produces two summaries: one of the functions identified
and one of the OTUs identified. We represent OTUs with a representative
organism. The function summary is sent to stdout, while the OTU summary
is sent to stderr.
- svr_summarize_contigs: For each incoming genome ID, return statistics about its contigs.
- svr_summarize_protein_families: Write out three simple reports relating to a proposed set of protein families.
- svr_taxonomically_related_genomes: Get a list of genomes that are taxonomically related to the input genomes.
- svr_taxonomy: Get taxonomy of genomes
- svr_tmpred_predictions: Estimate transmembrane domains
- svr_translations_of: Get translations from ids
- svr_tree: This script uses fasttree, PhyML or RAxML to build a
maximum-likelihood tree from a FASTA alignment, or evaluates the
likelihood of an input tree against a given alignment.
- svr_tree_to_html: This script converts a newick tree into an HTML page. It has a rich
set of options.
- svr_trim_ali: This script takes a FASTA file of aligned sequences, trims the
alignment by running PSIBLAST against the sequences themselves, and
writes the trimmed alignment to the standard output.
- svr_upstream: Retrieve upstream regions from the Sapling Server.
- svr_which_genus_species: Try to identify genus and species of DNA fragments
- svr_with_close_blast_hits: Determine which of the input PEGs have "close" blastX hits to a given DB.
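The input/output shape described for svr_fasta_to_md5 (one output record per FASTA record, incoming ID in the first column, MD5 protein ID in the second) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the script's actual code: the function names are hypothetical, and hashing the upper-cased sequence is an assumption about normalization that the real script may not share.

```python
import hashlib
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word after '>'."""
    seq_id: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            words = line[1:].split()
            seq_id = words[0] if words else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def fasta_to_md5_rows(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'incoming_id<TAB>md5' rows, one per FASTA record."""
    for seq_id, seq in read_fasta(lines):
        # Assumption: the sequence is upper-cased before hashing.
        digest = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        yield seq_id + "\t" + digest
```

Because the output is tab-delimited with the MD5 ID in the last column, it has the shape that a script such as svr_fids_for_md5 is described as consuming.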