GeneShelf: A Web-based Visual Interface for Spinal Cord Injury Study

Motivation: The widespread use of high-throughput gene expression analysis techniques has enabled the biomedical research community to share a huge body of gene expression datasets in many public databases on the web. However, current gene expression data repositories provide static representations of the data and support only limited interactions, which hinders biologists from effectively exploring shared gene expression datasets. Responding to the growing need for better interfaces to improve the utility of public datasets, we designed and developed a new web-based visual interface called GeneShelf. It builds upon a zoomable grid display to represent two categorical dimensions. It also incorporates an augmented timeline with expandable time points, which embeds bar charts to better show multiple data values for the focused time point. We applied GeneShelf to one of the largest microarray datasets generated to study the progression of, and recovery from, spinal cord injuries in mice and rats. To account for differences among analysis methods, the entire dataset was processed with three probe set algorithms (PLIER, GC-RMA, and dChip), yielding nearly 10,000 microarray data files. SpinalCordLink provides researchers with a resource to interactively investigate one of the world's largest microarray datasets.

Support: This work was supported by NIH NINDS-01 (NS-1-2339) and by NIH NCMRR/NINDS 5R24 HD 050846 (Integrated Molecular Core for Rehabilitation Medicine).

Publications:

  • S.M. Knoblach, E.P. Hoffman and J. Seo, "SpinalCordLink: A New Web-based Visual Interface for Analysis of a Large Spinal Cord Injury Expression Profiling Dataset," in Abstracts from The 26th Annual National Neurotrauma Society Symposium, Orlando, FL (July 27-30, 2008), Journal of Neurotrauma, 25(7): 853-935, 2008.

GOTreePlus: Interactive GO Visualization for Proteomics Projects

Motivation: We developed an interactive gene ontology visualization tool named GOTreePlus that can superimpose annotation information over gene ontology structures. GOTreePlus can facilitate the identification of important GO terms while visualizing them in the gene ontology structure. The interactive pie chart summary for a selected gene ontology term provides users with a succinct overview of their experimental results.

Support: This work was supported by NIH 5R24HD050846-02 (Integrated Molecular Core for Rehabilitation Medicine) and NIH 1P30HD40677-01 (MRDDRC Genetics Core).

Publications:

ConSet: Visualizing set concordance with permutation matrices and fan diagrams

Motivation: Scientific problem solving often involves concordance (or discordance) analysis among the result sets from different approaches. For example, different scientific analysis methods applied to the same samples often lead to different or even conflicting conclusions. To reach a more judicious conclusion, it is crucial to consider different perspectives by checking concordance among the result sets produced by different methods. In this paper, we present an interactive visualization tool called ConSet, with which users can effectively examine relationships among multiple sets at once. ConSet provides an overview using an improved permutation matrix that lets users easily identify relationships among sets with a large number of elements. In addition to a standard Venn diagram, we introduce a new diagram, the fan diagram, that allows users to compare two or three sets without the inconsistencies that may arise in Venn diagrams. We conducted a qualitative user study to evaluate how our tool performs in comparison with a traditional set visualization tool based on a Venn diagram. We observed that ConSet enabled users to complete more tasks with fewer errors than the traditional interface did, and most users preferred ConSet.
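
The kind of data such a concordance view is built on can be sketched in a few lines of Python. This is only an illustration, not ConSet's implementation: the function names, the row-ordering rule, and the example gene lists are all hypothetical. It builds an element-by-set membership matrix whose rows are ordered so that elements with the same membership pattern sit together, and it computes the exclusive regions (elements found in exactly a given combination of sets) that a Venn-style diagram of two or three sets would display.

```python
from itertools import combinations


def membership_matrix(sets):
    """Return (elements, names, rows); rows[i][j] is True when element i is in set j.

    Rows are ordered by membership pattern so that elements sharing the same
    pattern end up adjacent (an assumed ordering, chosen for illustration).
    """
    names = sorted(sets)
    universe = set().union(*sets.values())
    elements = sorted(
        universe,
        key=lambda e: (tuple(e not in sets[n] for n in names), e),
    )
    rows = [[e in sets[n] for n in names] for e in elements]
    return elements, names, rows


def exclusive_regions(sets):
    """Map each combination of set names to the elements found in exactly those sets."""
    names = sorted(sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(sets[n] for n in combo))
            others = [sets[n] for n in names if n not in combo]
            outside = set().union(*others)
            regions[combo] = inside - outside
    return regions


# Hypothetical result sets from three analysis methods applied to the same samples.
results = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}

elements, names, rows = membership_matrix(results)
regions = exclusive_regions(results)
for combo, members in regions.items():
    print("+".join(combo), sorted(members))
```

Every element falls into exactly one exclusive region, so the region sizes add up to the size of the union. That property makes the regions a convenient basis for checking concordance among methods.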

Support: This work was supported by NIH 5R24HD050846-02 (Integrated Molecular Core for Rehabilitation Medicine) and NIH 1P30HD40677-01 (MRDDRC Genetics Core).

Publications:

Interactive Power Analysis for Microarray Hypothesis Testing and Generation

Motivation: Human clinical projects typically require a priori statistical power analysis. To this end, we sought to build a flexible and interactive power analysis tool for microarray studies, integrated into our public domain HCE 3.5 software package. We then sought to determine whether probe set algorithms or organism type strongly influenced power analysis results.
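
As a rough illustration of what an a priori power analysis computes, the following Python sketch uses the standard normal approximation for a two-sided, two-group comparison of means. It is not the computation performed by HCE, which is not described here. The function names and the example effect size, standard deviation, significance level, and power are all assumptions chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def samples_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate samples per group needed to detect a mean difference `delta`.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2,
    where `sigma` is the assumed common standard deviation of the two groups.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def achieved_power(n, delta, sigma, alpha=0.05):
    """Approximate power with `n` samples per group (far tail of the test ignored)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(delta) / (sigma * sqrt(2 / n)) - z_alpha)


# Hypothetical probe set: a difference of 1.0 on the log2 scale with sd 1.0.
n_needed = samples_per_group(delta=1.0, sigma=1.0)
print(n_needed, round(achieved_power(n_needed, 1.0, 1.0), 3))

# A stricter significance level, as often used when many probe sets are tested
# at once, raises the required sample size.
n_strict = samples_per_group(delta=1.0, sigma=1.0, alpha=0.001)
print(n_strict)
```

Because the required sample size depends on the per-probe-set variance, the choice of probe set algorithm can change the result, which is the question raised above.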

Availability: HCE 3.5 or later

Support: This work was supported by Department of Defense W81XWH-04-01-0081 and NIH 1P30HD40677-01 (MRDDRC Genetics Core).

Publications:

Interactive Optimization of Signal-to-Noise Ratios for Affymetrix Microarray Projects

Motivation: The most commonly used microarrays for mRNA profiling (Affymetrix) include ‘probe sets’, each a series of perfect match and mismatch probes (typically 22 oligonucleotides per probe set). There is an increasing number of reported ‘probe set algorithms’ that differ in how they interpret a probe set to derive a single normalized ‘signal’ representing the expression of each mRNA. These algorithms are known to differ in accuracy and sensitivity, and optimization has been done using a small set of standardized control microarray data. We hypothesized that different mRNA profiling projects have varying sources and degrees of confounding noise, and that these should alter the choice of a specific probe set algorithm. We also hypothesized that using the Microarray Suite (MAS) 5.0 probe set detection p-value as a weighting function would improve the performance of all probe set algorithms.

Availability: HCE 3.0 or later

Support: This work was supported by N01 NS-1-2339 from the NIH.

Publications:

PEPR: Public Expression Profiling Resources

Our Center in DC became the first academic "Affymetrix Center of Excellence" in the early 1990s, and we have continued our interest in QC/SOP protocols, experimental process design and implementation as an international core facility, and bioinformatic methods development. While there are excellent public data repositories for microarray data, there are also chronic concerns regarding the accuracy of the metadata collection process, accessibility, and appropriate data formats. We hypothesized that a new system could be developed that included a complete prospective data collection process, through to APIs for conversion of projects to many signal outputs (probe set algorithms), and finally simple interfaces for either on-the-fly dynamic web based [...] The design enabled rich metadata search functions (e.g., search by experiment design type or by an animal model's age or sex), including a web-interface data input system to capture experiment information prospectively (and remotely, so that an investigator in Sweden can begin entering metadata and design data before the profiling project starts). Unlike other currently used profiling packages, our web-interface data submission process offers great flexibility in obtaining the desired experiment metadata (e.g., addition of experiment design type) for analysis and visualization. It provides a mechanism to enforce data input consistency and validation, and eliminates the current accessory tables and batch process for filtering data. This data consistency expands the search and visualization capabilities.

Support: Funding for PEPR has been provided by the National Institutes of Health (NINDS, NHGRI, NHLBI), the Department of Defense (Congressionally-directed Medical Research Office) and the following foundations and individuals: Erynn Godla Fund, Parsons Family Foundation, Dining Away Duchenne (DAD) (Wood Family, Washington DC), Serving Up a Cure (Dallas, TX; MDA), Jarvis Family (Norfolk, VA; MDA) and Muscular Dystrophy Association.

Publications:

Knowledge Discovery in High-Dimensional Data: Case Studies and a User Survey for the Rank-by-Feature Framework

Motivation: Knowledge discovery in high-dimensional data is a challenging enterprise, but new visual analytic tools appear to offer users remarkable powers if they are ready to learn new concepts and interfaces. Our three-year effort to develop versions of the Hierarchical Clustering Explorer (HCE) began with building an interactive tool for exploring clustering results. It expanded, based on user needs, to include other potent analytic and visualization tools for multivariate data, especially the rank-by-feature framework. Our own successes using HCE provided some testimonial evidence of its utility, but we felt it necessary to get beyond our subjective impressions. We present an evaluation of HCE using three case studies and an e-mail user survey (n = 57), focusing on skill acquisition with the novel concepts and interface of the rank-by-feature framework. In the case studies, knowledgeable and motivated users in diverse fields provided multiple perspectives that refined our understanding of the tool's strengths and weaknesses. The user survey confirmed the benefits of HCE, but gave less guidance about improvements. Both evaluations suggested improved training methods.

Availability: HCE 3.0 or later

Support: This work was supported by Department of Defense W81XWH-04-01-0081, NIH 1P30HD40677-01 (MRDDRC Genetics Core) and NSF EIA 0129978.

Publications:

Hierarchical Clustering Explorer for Interactive Exploration of Multidimensional Data

The Hierarchical Clustering Explorer (HCE) is an interactive knowledge discovery tool for multivariate data, especially microarray data sets. Its unique visualization interface and powerful analytic tools, based on more than three years of effort, have led to more than 7,000 downloads from more than 20 different countries since April 2002. In addition to our genomic research papers with biologist partners and our information visualization publications, we are encouraged that at least six scientific papers by authors unknown to us have been published since 2004 that describe using HCE in their analysis.

For more information about HCE, visit www.cs.umd.edu/hcil/hce/.