Infrastructure

Information Commons provides tools and data on several infrastructure options: AWS cloud computing infrastructure, leveraging capabilities such as Elastic MapReduce (EMR), and on-premise high-performance computing (HPC) with UCSF’s Wynton HPC cluster and the Information Commons Application Server.

Platforms

For de-identified data, shared spaces are available on the following platforms:

 

  • Info Commons AWS Cluster

    Amazon Web Services (AWS) Elastic MapReduce (EMR) cluster for shared use among multiple researchers. A private data storage area is available for each user.

  • Wynton HPC

    Wynton HPC is a large, shared high-performance compute (HPC) cluster underlying UCSF’s Research Computing Capability. It is available to all UCSF researchers and their collaborators.

  • Research Analysis Environment (RAE)

    A secure data hosting and compute service for UCSF researchers and their internal or external collaborators. Formerly known as MyResearch, RAE provides compute, storage, and tools in a secure, compliant platform across three product tiers to meet the community's diverse research needs.

  • Information Commons Application Server

    The IC Wynton App Server is an on-premise server with 600 GB of memory and 28 computing cores. It also provides access to computing resources in the Wynton HPC cluster, which contains more than 12,000 additional computing cores.

 

For PHI data under an IRB-approved study, private spaces are available on the following platforms:

Computational Environments

Within the Information Commons data science platform, there are two main computational environments:

  • IC AWS: UCSF Information Commons AWS Cluster is an AWS Apache Spark computing cluster that is pre-configured for data science research.
  • IC Wynton: Wynton-based IC App Server is an on-premise computational environment that supports interactive data science workflows requiring parallel processing (GPU) and/or PHI compliance.

The table below lists the key features of the Information Commons and some other computational environments available at UCSF.

 

|  | IC AWS Shared Cluster | IC AWS SEC* | New! IC Wynton App Server | RAE Premium | Wynton HPC | UCSF AWS SEC |
|---|---|---|---|---|---|---|
| Configuration | Shared auto-scalable Spark EMR cluster (CPU) | On-demand auto-scalable Spark EMR cluster | Powerful on-premise app server, access to Wynton HPC (GPU); Spark, Dask | On-prem server with custom CPU, RAM configuration | Scalable on-prem HPC cluster (GPU) | Scalable, secure, customizable HPC cluster on UCSF Enterprise AWS cloud |
| PHI Support |  | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ |
| Storage for user data** | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ |
| Access to de-identified UCSF research data assets (No IRB) |  |  |  |  |  |  |
| EHR (structured) | ✔︎ | ✔︎ | ✔︎ | ✔︎* |  |  |
| Clinical Notes | ✔︎ | ✔︎ | ✔︎ | ✔︎* |  |  |
| New! Radiology Images |  |  | ✔︎ |  |  |  |
| Interactive Tools | Hue, Jupyter, RStudio | Hue, Jupyter, RStudio | Jupyter | Azure Data Studio, SAS, RStudio, Spyder, MATLAB, Jupyter, SPSS, STATA |  |  |
| UCSF Enterprise GitHub for Collaboration |  | ✔︎ | ✔︎ | ✔︎ | ✔︎ |  |

* IC AWS will be moving to UCSF Secure Enterprise AWS in the near future, which will enable PHI support.
** Storage for user data is PHI-compliant in environments marked with "PHI Support".
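
The feature comparison above can also be treated as a small lookup when deciding which environment fits a project. The following is a minimal sketch in Python, not an official tool: the dictionary keys (`phi`, `deid_ehr`, `notes`, `images`) and the function `matching_environments` are hypothetical names, and the feature values are transcribed from the table as read here, so they should be checked against the current table before use.

```python
# Feature flags transcribed from the comparison table. These values are
# assumptions for illustration; confirm them against the current table.
ENVIRONMENTS = {
    "IC AWS Shared Cluster": {
        "phi": False, "deid_ehr": True, "notes": True, "images": False,
        "tools": {"Hue", "Jupyter", "RStudio"},
    },
    "IC AWS SEC": {
        "phi": True, "deid_ehr": True, "notes": True, "images": False,
        "tools": {"Hue", "Jupyter", "RStudio"},
    },
    "IC Wynton App Server": {
        "phi": True, "deid_ehr": True, "notes": True, "images": True,
        "tools": {"Jupyter"},
    },
    "RAE Premium": {
        "phi": True, "deid_ehr": True, "notes": True, "images": False,
        "tools": {"Azure Data Studio", "SAS", "RStudio", "Spyder",
                  "MATLAB", "Jupyter", "SPSS", "STATA"},
    },
    "Wynton HPC": {
        "phi": True, "deid_ehr": False, "notes": False, "images": False,
        "tools": set(),
    },
    "UCSF AWS SEC": {
        "phi": True, "deid_ehr": False, "notes": False, "images": False,
        "tools": set(),
    },
}


def matching_environments(phi=False, deid_ehr=False, notes=False,
                          images=False, tool=None):
    """Return the names of environments that meet every requested requirement."""
    required = {"phi": phi, "deid_ehr": deid_ehr, "notes": notes, "images": images}
    matches = []
    for name, features in ENVIRONMENTS.items():
        # Skip environments missing any feature that was asked for.
        if any(needed and not features[key] for key, needed in required.items()):
            continue
        # Skip environments that do not list the requested interactive tool.
        if tool is not None and tool not in features["tools"]:
            continue
        matches.append(name)
    return matches


if __name__ == "__main__":
    print(matching_environments(phi=True, images=True))
    print(matching_environments(deid_ehr=True, tool="RStudio"))
```

A lookup like this only narrows the options; access, cost, and compliance questions still need to be confirmed with the relevant service.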

As the table above shows, both Information Commons environments offer:

  • Jupyter environment for interactive data science
  • Availability of structured, de-identified electronic health records data (DeID CDW and OMOP)
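
To give a sense of what working with structured OMOP data in a Jupyter session can look like, here is a hedged sketch using Python's built-in `sqlite3` module and a tiny in-memory database. The table and column names (`person`, `condition_occurrence`, `condition_concept_id`) follow the public OMOP Common Data Model; the rows are synthetic, the concept IDs are placeholders, and the helper `count_patients_with_condition` is hypothetical. How the Information Commons environments actually expose the data (database engine, schema names, connection method) is not described here and is not assumed.

```python
import sqlite3

# In-memory stand-in for an OMOP-style database; all rows are synthetic.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        year_of_birth INTEGER
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),
        condition_concept_id INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, 1970), (2, 1985), (3, 1992)],
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
    [(10, 1, 201826), (11, 1, 201826), (12, 3, 201826), (13, 2, 316866)],
)


def count_patients_with_condition(connection, concept_id):
    """Count distinct patients with at least one record for the concept."""
    row = connection.execute(
        "SELECT COUNT(DISTINCT person_id) "
        "FROM condition_occurrence "
        "WHERE condition_concept_id = ?",
        (concept_id,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    print(count_patients_with_condition(conn, 201826))
```

Counting distinct `person_id` values rather than rows matters because a patient can have several occurrence records for the same condition.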

System Requirements

If you are unsure about the security requirements for your data set, please review the information on data classification.

If you would like to set up your own infrastructure, please review the information on minimum security standards and on the UCSF IT Security Assessment.