geWorkbench – software platform for integrated genomic data analysis

geWorkbench (genomics Workbench) is a Java-based open-source platform for integrated genomics. Using a component architecture, it allows individually developed plug-ins to be configured into complex bioinformatic applications.

At present there are more than 70 available plug-ins supporting the visualization and analysis of gene expression and sequence data.

geWorkbench is the bioinformatics platform of MAGNet, the National Center for the Multi-scale Analysis of Genomic and Cellular Networks, one of the eight National Centers for Biomedical Computing.

Features include:

  • Computational analysis tools such as t-test, hierarchical clustering, self-organizing maps, regulatory network reconstruction, BLAST searches, pattern-motif discovery, protein structure prediction, and structure-based protein annotation.
  • Visualization of gene expression (heat maps, volcano plots), molecular interaction networks (through Cytoscape), protein sequence and protein structure data (e.g., MarkUs).
  • Integration of gene and pathway annotation information from curated sources as well as through Gene Ontology enrichment analysis.
  • Component integration through platform management of inputs and outputs. Data that can be shared between components include expression datasets, interaction networks, sample and marker (gene) sets, and sequences.
  • Dataset history tracking – a complete record of the datasets used and input settings.
  • Integration with 3rd party tools such as GenePattern, Cytoscape, and GenomeSpace.
  • Provides an environment that supports moving seamlessly from one data type to another, e.g. from gene expression to sequences to patterns.
  • Provides access to a variety of external data sources, including:
    • Microarray gene expression repositories (caArray).
    • BLAST (NCBI).
    • Gene annotation pages (via bioDBNet).
    • Protein and DNA sequence retrieval (UC Santa Cruz and EBI).
    • Pathway diagrams (BioCarta).
  • Provides a gateway to several computational services currently hosted on Columbia servers and clusters, including:
    • Pattern Discovery.
    • Pudge – protein structure modeling.
    • SkyBase – database of molecular models.
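
To make the first analysis tool in the list concrete, here is a minimal sketch of a per-marker t-test between two sample sets. It uses Welch's unequal-variance form, which is an assumption: the text does not say which t-test variant geWorkbench implements, and this is not geWorkbench code. The function name `welch_t`, the gene names and the expression values are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    na, nb = len(group_a), len(group_b)
    # Sample variance divided by group size, i.e. the variance of each mean.
    va = variance(group_a) / na
    vb = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical log2 expression values: (case samples, control samples).
expression = {
    "GENE_A": ([8.1, 8.3, 7.9, 8.2], [6.0, 6.2, 5.9, 6.1]),
    "GENE_B": ([5.0, 5.4, 4.8, 5.1], [5.1, 4.9, 5.3, 5.0]),
}

# One (t, df) pair per marker; a large |t| suggests differential expression.
scores = {gene: welch_t(case, control)
          for gene, (case, control) in expression.items()}
```

In this made-up data GENE_A separates the two sample sets clearly while GENE_B does not. Turning the statistic into a p-value would additionally need the t distribution, which is left out of the sketch.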

Specific types of data supported include:

  • Microarray Gene Expression:
    • GEO SOFT: Series, Series Matrix, and Annotated Matrix (GDS).
    • MAGE-TAB data matrix.
    • Affymetrix GCOS/MAS5.
    • Matrix format (geWorkbench).
    • Tab-delimited (e.g. RMAExpress).
    • GenePix.
  • Microarray Gene Expression Annotation file support:
    • Affymetrix 3′ Expression.
    • Affymetrix WT Gene/Exon ST (transcript-level) including Gene Array 1.0/2.0 ST and Exon 1.0 ST.
  • DNA and Protein Sequences:
    • FASTA.
  • Pathways:
    • BioCarta.
  • Molecular structure – prediction, annotation and display.
  • Sequence Patterns:
    • Regular Expressions.
  • Gene Ontology.
  • Regulatory Networks.
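
As an illustration of two of the data types above, FASTA sequences and regular-expression sequence patterns, the following sketch reads FASTA text and reports where a pattern matches in each sequence. It is a simplified stand-alone example, not geWorkbench's own parser or pattern discovery service; the function names `parse_fasta` and `find_pattern`, the sequences and the pattern are hypothetical.

```python
import re


def parse_fasta(text):
    """Parse FASTA text into a dict mapping record id to sequence."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The id is the first whitespace-delimited token after '>'.
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            # Sequence lines are joined later, so wrapping is ignored.
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def find_pattern(records, pattern):
    """Return the 0-based start positions of non-overlapping matches."""
    compiled = re.compile(pattern)
    return {name: [m.start() for m in compiled.finditer(seq)]
            for name, seq in records.items()}


# Hypothetical input: two short DNA records, the first wrapped over two lines.
fasta_text = ">seq1 example\nACGTATAAAG\nGCTATAAT\n>seq2\nGGGCCC\n"
records = parse_fasta(fasta_text)
hits = find_pattern(records, r"TATAA[AT]")
# hits == {"seq1": [3, 12], "seq2": []}
```

Note that `finditer` reports non-overlapping matches only; a motif that can overlap itself would need a lookahead pattern instead.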

Support: Documentation, QuickStart
Developer: Columbia University, First Genetic Trust, National Cancer Institute
License: BSD-like

