Server Scripts

All scripts read from the standard input and write to the standard output. File names are never specified on the command line. In general, they accept as input a tab-delimited file and operate on the last column of the input. This allows multiple commands to be strung together using a pipe.
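The convention described above (read a tab-delimited table, operate on the last column, append the result as a new column) can be sketched in a few lines of Python. This is only an illustration of the convention, not the code of any actual server script: `append_column`, `genome_part`, and `taxon_part` are hypothetical names, and the sample IDs assume a feature ID shaped like `fig|83333.1.peg.4`.

```python
import io
import re
from typing import Callable, Iterable, Iterator


def append_column(lines: Iterable[str], lookup: Callable[[str], str]) -> Iterator[str]:
    """Yield each tab-delimited line with lookup(last column) appended as a new column."""
    for line in lines:
        row = line.rstrip("\n")
        if not row:
            continue  # skip blank lines
        key = row.split("\t")[-1]  # the last column is the one operated on
        yield f"{row}\t{lookup(key)}"


def genome_part(fid: str) -> str:
    """Hypothetical lookup: assumes IDs shaped like 'fig|83333.1.peg.4'."""
    match = re.match(r"fig\|(\d+\.\d+)\.", fid)
    return match.group(1) if match else ""


def taxon_part(genome_id: str) -> str:
    """Hypothetical lookup: the text before the first dot of a genome ID."""
    return genome_id.split(".")[0]


# Two stages chained like commands in a pipe: the second stage sees the
# column added by the first stage as its last column.
sample = io.StringIO("geneA\tfig|83333.1.peg.4\ngeneB\tfig|83333.1.peg.5\n")
stage1 = append_column(sample, genome_part)
stage2 = append_column(stage1, taxon_part)
result = list(stage2)
print("\n".join(result))
```

Because each stage only looks at the last column and leaves the earlier columns untouched, stages written this way can be composed in any order that makes sense for the data, which is the property the pipe-based usage relies on.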

svr_CS_pipeline: Generate data needed to support close-strain analysis.
svr_NCBI_taxonomy: Get taxonomy information from NCBI
svr_ach_lookup: Find protein assertions from the Annotation Clearinghouse.
svr_add_lengths_to_blast: Add query and contig lengths to m8 blast output
svr_add_reaction_descriptions: Add a column giving a text reaction description (the reaction equation).
svr_ali_to_html: Convert a FASTA alignment to HTML
svr_aliases_of: Return all identifiers for genes in the database that are protein-sequence-equivalent to the specified identifiers. In this case, the identifiers are assumed to be in their natural form (without prefixes). For each identifier, the identified protein sequences will be found and then for each protein sequence, all identifiers for that protein sequence or for genes that produce that protein sequence will be returned.
svr_aliases_to_pegs: Convert aliases to PEGs
svr_align_seqs: This script takes a FASTA file from the standard input, aligns the sequences using Clustal, MUSCLE or MAFFT, and writes the alignment in the FASTA format to the standard output.
svr_aligns_with_prot: Get the list of alignments associated with each specified protein or fid.
svr_all_complexes: List the reaction complexes in the database.
svr_all_experiments: List all the experiments present for a specified genome or for all genomes listed in a file.
svr_all_features: Get a list of Feature IDs for all features of a given type in a given genome or a list of genomes.
svr_all_figfams: List all the features in each FIGfam.
svr_all_genomes: List the names and IDs of all the (complete) genomes.
svr_all_models: List the existing metabolic models (and the genomes for which they were built).
svr_all_reactions: List the reaction IDs.
svr_all_roles_used_in_models: List the roles used in the existing metabolic models.
svr_all_subsystems: Get all the subsystem names.
svr_assign_to_dna_using_figfams: Assign Using the FIGfams Server
svr_assign_using_figfams: Assign Using the FIGfams Server
svr_atomic_reg_coexp: Get functions of protein-encoding genes
svr_atomic_regulons: List all the atomic regulons computed for a specified genome or for all genomes listed in a file.
svr_big_repeats: Find regions that appear to be big repeats (at the DNA level). This can be done by looking for multiple copies of identical DNA within a single genome or looking for instances of large repeats maintained as a Blast DB.
svr_blast: Run blast locally
svr_by_taxonomy: Separate a list by taxonomy
svr_call_pegs: Call Genes using Annotation server.
svr_call_rnas: Call RNAs using Annotation server.
svr_cdd_scan: Scans protein sequences for conserved domain hits in NCBI CD-Database.
svr_close_genomes: List the IDs of the genomes that are functionally close to the input genomes.
svr_closest_genes: Locate genes in a specified genome containing the specified protein or DNA sequence.
svr_cluster_locations: Cluster locations on the chromosome
svr_cluster_pegs: Cluster PEGs that are close on the contig
svr_co_occurrence_evidence: Displays instances in which homologs of the two specified PEGs co-occur or members of two distinct FIGfams tend to co-occur.
svr_cohesion_groups: This script classifies tips of a newick tree into cohesion groups based on bootstrap values of tree branches.
svr_compare_feature_tables: usage: svr_compare_feature_tables old_features.tab new_features.tab [summary.yaml] > comparison.tab 2> summary.txt
svr_complex_to_reaction: Extend a set of complexes to include the associated reactions
svr_contigs_in_genome: For each incoming genome ID, return the IDs of its contigs.
svr_coregulated_by_correspondence: Get genes that have evidence of coexpression indirectly (i.e., it seems to exist between corresponding genes in one or more other genomes with expression data).
svr_corr_by_exp: Get genes that have similar expression profiles.
svr_coupled_reactions: Takes as input a table containing reaction IDs and adds a column giving the "adjacent" reactions.
svr_create_set: Create a persistent set owned by owner
svr_current_annotation: Get the functional role, annotator and timestamp of the current annotation for a protein-encoding gene or RNA.
svr_cut_domain: Clip domains out of a set of protein sequences
svr_delete_from_set: Delete entries from a persistent set owned by owner
svr_delete_set: Delete a persistent set owned by owner
svr_determine_sets_of_related_contigs: This takes as input a list of contig IDs. What is the output?
svr_discriminating_functions: Analyze two groups of genomes and return a list of the functions that discriminate between them.
svr_distinct_otus: Classify the incoming genome IDs into organism taxonomic units.
svr_dlits: Get the list of publications associated with each specified protein or gene.
svr_dna_seq: Produce DNA strings for contigs, FIG feature IDs, and/or locations.
svr_enumerate_sets_by_owner: List all the persistent sets owned by owner
svr_evidence: Get evidence codes for protein-encoding genes
svr_exp_genomes: List the names and IDs of all the genomes with expression data.
svr_export_as_seed_dir: Export one or more genomes as SEED format directories.
svr_expressed_genes_in_range: Compute the genes in the specified genomes that are expressed a particular fraction of the time, where the fraction is a number between 0 and 1. The fraction is specified as a range from a minimum to a maximum value. If the minimum is 1, then only genes expressed all the time are returned. If the maximum is 0, then only genes that are never expressed are returned.
svr_fasta: Produce DNA or protein strings for genes.
svr_fasta_to_md5: This script takes a FASTA file of protein sequences from the standard input and writes a tab-delimited file of protein IDs to the standard output. Each output record will correspond to a single FASTA input record and will contain the incoming ID in the first column and the MD5 protein ID in the second column.
svr_fc_figfams: Output the functionally coupled FIGfams. By specifying a MinSc, you restrict the output to functionally coupled FIGfams that co-occur in at least n OTUs.
svr_fids_for_md5: Given a set of md5 protein IDs, compute the FIG IDs of features that produce each protein. This script takes as input a table containing md5 protein IDs and adds a column containing the associated FIG feature IDs.
svr_fids_to_ids: svr_fids_to_ids
svr_fids_to_locations: Clusters from protein-encoding genes
svr_fids_to_regulons: Return all the atomic regulons each feature belongs to.
svr_figfam_fasta: Produce FASTA strings for FIGfams.
svr_figfam_functions: Output the functions for the specified FIGfams.
svr_figfams_to_ids: List the PEGs for each specified FIGfam ID on STDOUT.
svr_file_to_spreadsheet: Writes the contents of the tab separated file on STDIN to a spreadsheet
svr_find_clusters_relevant_to_reaction: Find clusters potentially relevant to a search for a "missing gene"
svr_find_fused_genes: Find genes that are homologous to the query genes and print a list of fusions among them.
svr_find_hypos_for_cluster: Get candidates for a specific role by finding genes with no real assignment of function yet that are connected to a cluster. We will consider a hypothetical "connected to a cluster" iff
svr_find_protein: Output the FIG IDs and functional assignments of all features that produce a specific protein. This script takes as input a single protein sequence on the command-line or a FASTA file. It outputs a tab-delimited file connecting each specified protein to its features.
svr_find_regulatory_proteins: Find potential regulatory proteins
svr_function_of: Get functions of protein-encoding genes
svr_function_to_role: Convert functions to roles.
svr_functionally_coupled: Get functionally_coupled neighbors (neighbors that tend to co-occur).
svr_gap_filled_reactions_and_roles: Get the reactions and functional roles that were predicted by gap-filling
svr_gene_data: Get one or more pieces of data about each specified gene.
svr_generate_dna_samples: Create random samples of DNA from known genomes
svr_genome_functions: List the location and functional assignment for each gene in a specified genome.
svr_genome_of: Get genome of feature
svr_genome_statistics: Get one or more pieces of data about each specified genome.
svr_get_ali_and_tree: Get alignments and trees based on a set of gene IDs.
svr_get_all: Process a general query against the Sapling database.
svr_get_coupling_data: Get functional coupling data for genes in a genome
svr_get_default_dataset: Return the name of the default dataset currently installed in the figfams annotation server.
svr_get_rep_genomes: Get a set of representative genomes using heuristics and the NCBI taxonomy
svr_get_set: List all the entries in a persistent set owned by owner
svr_identical_genomes: For each incoming genome ID, return the genome ID and name of each that has an identical md5 hash.
svr_ids_to_figfams: List the FIGfams for each specified gene ID on STDOUT. List on STDERR those lines where the ID does not have a FIGfam.
svr_ids_to_subsystems: List the subsystems for each specified gene ID on STDOUT. List on STDERR those lines where the ID does not have a subsystem.
svr_img_analysis: Read an IMG genome directory and compare it to the corresponding Sapling genomes (if any). The single positional parameter is the IMG genome directory name. Note that the last level of the directory name must also be the IMG genome number. In other words, if the directory name is ~/genomes/IMG/637000001, then the genome name must be 637000001.
svr_in_fasta: This little script just takes a fasta file of PEGs as input and outputs a 3-column table [PEG,function,sequence].
svr_in_runs: Make sequences of genes into operons.
svr_inconsistent_sets: Separate out inconsistent sets
svr_inherit_aliases: Cause a new genome to inherit aliases from an existing genome for protein-encoding genes that are unique within each genome and that have identical translations.
svr_inherit_annotations: Cause a new genome to inherit annotations from an existing genome for protein-encoding genes that are unique within each genome and that have identical translations.
svr_insert_seqs_into_alignment: This script takes a FASTA file of protein/DNA alignment from the standard input and the name of a FASTA file of sequences to be inserted from the command line and writes the resulting alignment to the standard output. When not possible, a message will be written to the standard error output.
svr_intergenic_regions: List all the intergenic regions in the contigs for a specified genome.
svr_is_hypo: Keep just hypotheticals
svr_just_ends: Clip off the ends of a set of contigs
svr_link_to_compare_regions: Get link to compare regions organized to show co-occurring genes
svr_location_of: Get physical locations of genes.
svr_make_pan_genome_prot_families: Construct the protein families needed to study Pan Genomes
svr_mapped_genomes: Get maps between a reference genome and a set of genomes to which you wish to compare the reference genome.
svr_md5_of_prot: Get md5s of protein-encoding genes
svr_members_of_otu: For each incoming genome ID, return the genome ID and name of each genome in the same organism taxonomic unit.
svr_metabolic_reconstruction: Get a metabolic reconstruction from a set of functional roles.
svr_missing_roles: It is assumed that -r is used to specify a column in the input file. The column should contain reaction IDs for which "missing roles" might exist. The -g argument is used to specify the column containing the genome ID.
svr_motif: This script identifies the conserved regions from a set of aligned sequences.
svr_neighborhood_of_role: Find roles in metabolic-function neighborhood
svr_neighboring_reactions: Takes as input a table containing reaction IDs and adds 2 columns giving the distance and the connected reaction.
svr_neighbors_of: Get neighbors of protein-encoding genes (PEGs)
svr_oligomer_similarity: This command goes through an alignment and computes the pairwise fractions of n-character identities, producing a matrix of values for each string length specified in the -min to -max parameters.
svr_otus: List the names and IDs of all the representative genomes for the organism taxonomic units in the system.
svr_pegs_in_subsystems: Return all genes in one or more subsystems found in one or more genomes.
svr_pegs_with_evcode: Get pegs that have a given evcode (along with all of the peg's evcodes).
svr_possible_joins: Given kmer hits on ends of contigs, just group the hits having the same functions to support finding cases in which genes might span contigs.
svr_preferred_roles: Get preferred roles, translating functions into more acceptable forms
svr_project_by_sr: Get corresponding genes.
svr_project_model: Project atomic regulons from model to new genome.
svr_protein_assertions: Get a list of Annotation Clearinghouse assertions for the specified proteins.
svr_psiblast_search: This script takes a FASTA file of trimmed protein sequence alignment, uses PSIBLAST to search against the protein database of complete genomes, and writes the extracted regions of hits to the standard output.
svr_put_set: Add entries to a persistent set owned by owner
svr_quick_assign: Use staged-kmers (new, followed by old on hypotheticals)
svr_rRNA: Get 16S rRNAs of genomes
svr_reaction_description: This simple utility gives the reaction associated with reaction IDs.
svr_reactions_in_model: Takes as input a table containing model IDs and adds a column giving a reaction in the model. Since each model contains hundreds of reactions, the output file will be extremely large compared to the input file.
svr_reactions_to_roles: Takes as input a table containing reaction IDs and adds a column giving the roles that implement the reactions
svr_regulons_to_fids: Return all the atomic regulons each feature belongs to.
svr_representative_sequences: usage: representative_sequences [ opts ] [ rep_seqs_0 ] < new_seqs > rep_seqs
svr_reroot_tree: Reroot a tree at a different node or a point on an internal arc.
svr_role_to_complex: Extend a set of roles to include the triggered complexes
svr_role_to_pegs: Get PEGs that implement a given set of functional roles
svr_roles_to_reactions: Get the reactions potentially supported by a set of functional roles
svr_roles_to_subsys: Extend a set of roles to include the subsystems and category data
svr_seed_to_table: Extract a tab-separated feature table from a SEED Genome Directory. Table format is:
svr_sets_by_owner: List all the persistent sets owned by owner
svr_similar_to: Get similarities for a PEG
svr_sims2html: Build an HTML page or table from one or more tables of pairwise similarities.
svr_sketch_tree: This little utility invokes a tree "printing" utility Gary Olsen wrote. It has a rich set of options. We suggest a default usage of -m and -a.
svr_sphinx_indexing: Use sphinx indexes to match a keyword query (returning a table [peg,weight,annotation]). By default it returns PEGs from pubSEED. The -c option supports coreSEED instead.
svr_split_fams: Detect cases in which BBHs have led to cycles
svr_spreadsheet_to_file: Writes the contents of the spreadsheet given in filename to a tab separated file on STDOUT.
svr_ss_classes: List the classifications for a specified subsystem or for all subsystems in a file.
svr_subsystem_classification: Extend a set of subsystems names to classifications
svr_subsystem_genome_data: Output the features, variants, and roles for one or more subsystems, optionally filtered by genome ID.
svr_subsystem_genomes: Output the genomes of a subsystem.
svr_subsystem_roles: Output the roles of a subsystem.
svr_subsystem_spreadsheet: Output a subsystem's spreadsheet.
svr_subsystems_to_roles: This takes a table as input. One of the columns contains subsystem names. For each subsystem, a set of lines is output. The set will be
svr_summarize_MG_output: This simple program produces two summaries: one of the functions identified and one of the OTUs identified. We represent OTUs with a representative organism. The function summary is sent to stdout, while the OTU summary is sent to stderr.
svr_summarize_contigs: For each incoming genome ID, return statistics about its contigs.
svr_summarize_protein_families: Write out three simple reports relating to a proposed set of protein families.
svr_taxonomically_related_genomes: Get a list of genomes that are taxonomically related to the input genomes.
svr_taxonomy: Get taxonomy of genomes
svr_tmpred_predictions: Estimate transmembrane domains
svr_translations_of: Get translations from ids
svr_tree: This script uses fasttree, PhyML or RAxML to build a maximum-likelihood tree from a FASTA alignment, or evaluates the likelihood of an input tree against a given alignment.
svr_tree_to_html: This script converts a newick tree into an HTML page. It has a rich set of options.
svr_trim_ali: This script takes a FASTA file of aligned sequences, trims the alignment by running PSIBLAST against the sequences themselves, and writes the trimmed alignment to the standard output.
svr_upstream: Retrieve upstream regions from the Sapling Server.
svr_which_genus_species: Try to identify genus and species of DNA fragments
svr_with_close_blast_hits: Determine which of the input PEGs have blastX hits to a given DB "close".
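
As an illustration of the kind of transformation described for svr_fasta_to_md5 (one output record per FASTA input record, with the incoming ID in the first column and an MD5-based ID in the second), here is a minimal Python sketch. It is not the script's actual implementation: the function names are hypothetical, and hashing the upper-cased sequence and emitting the hex digest is an assumption about how the MD5 protein ID is formed.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word after '>'."""
    seq_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif line and seq_id is not None:
            chunks.append(line)  # sequence data may span several lines
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def fasta_to_md5_rows(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'id<TAB>md5' rows, one per FASTA record."""
    for seq_id, seq in read_fasta(lines):
        # Assumption: the digest is taken over the upper-cased sequence.
        digest = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        yield f"{seq_id}\t{digest}"


fasta = [">p1 first protein\n", "MKT\n", "AY\n", ">p2\n", "mktay\n"]
rows = list(fasta_to_md5_rows(fasta))
print("\n".join(rows))
```

Under the upper-casing assumption, the two sample records receive the same digest, which is the property that lets an MD5-based ID stand in for "the same protein sequence" regardless of how the input file is wrapped or cased.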