ADSP Data Portal User Guide


November 25, 2013

I. Introduction

Prerequisites

Feedback

II. ADSP Data Portal Overview

III. Step-by-Step Tutorial

Log in to the ADSP Data Portal

Using the Study Info Tab

Using the Pedigrees Tab

Using the Select Sequencing Data (SRA) Tab

Using the Select VCF/Supporting Files Tab

Using the Checkout Tab

Using the User Logout Tab

IV. Step-By-Step Instructions on How to Use the SRA Toolkit

Overview: Steps required to run the SRA Toolkit

System Requirements

An example session

Installation

Configure the SRA Toolkit environment

Create the ADSP Repository on Your Local Computer

Download and Decrypt ADSP Files Using Prefetch and Shopping Cart File


I. Introduction

The ADSP Data Portal enables users to explore ADSP project data archived at dbGaP and to identify the samples and files they would like to download.

A dbGaP repository key and the SRA Toolkit are required to download and decrypt the files.  

This user guide provides an introduction to the ADSP Data Portal and step-by-step instructions on how to use the portal and the SRA Toolkit.

Download the latest version of the User Guide

To obtain the latest version of this guide, go to the ADSP website or follow this URL.

The ADSP Study design documents are located here.

Prerequisites

Feedback

The ADSP Data Portal is continually evolving. As more data become available, we will expand the capabilities of the portal. Please let us know if you have any suggestions for the portal or would like to request new features.

Please contact adsp-help@niagads.org if you have any questions or difficulties on:

If you encounter any problems when using the ADSP Data Portal (sometimes refreshing your browser helps), please let us know right away. It would help us fix the problem more efficiently if you could provide the following information:


II. ADSP Data Portal Overview

The ADSP Data Portal can be accessed from the ADSP website, https://www.niagads.org/adsp.  Please use this URL to access the ADSP data, as other URLs may be changed as the portal development continues.

The content of the ADSP Data Portal is organized into tabs (see below). The tabs are ordered from left to right to follow the common workflow, so users can simply move from one tab to the next.

The following summarizes the content of each tab.

Instructions on how to use the ADSP Data Portal.

Study Info: This tab provides the latest sample counts by category, such as Consortium, Study, and status of sample data, along with other important information such as phenotype codings.

Pedigrees: The PDF version of the family sample pedigrees can be downloaded here.

Select Sequencing Data (SRA): This tab allows you to select sequencing read data (BAM files stored in SRA format) by sample using a number of filters.

Select VCF/Supporting Files: This tab allows you to download a number of related files such as VCF and pedigrees.

Checkout: This tab displays the contents of your “shopping cart” and allows you to remove selected items. The “shopping cart list” can be downloaded here.

User Logout: This tab allows the user to log out and end the current session.

III. Step-by-Step Tutorial

The following is a step-by-step tutorial on how to use the ADSP portal.

Log in to the ADSP Data Portal

First, go to the ADSP website at https://www.niagads.org/adsp and click the Access Data button.

You will first be redirected to the NIH iTrust website to authenticate your eRA Commons login. Click NIH Login when the login dialog appears.

Enter your eRA Commons User Name and Password.

If you have permission, either as a PI or as a downloader, for an approved dbGaP project that uses ADSP data, you will be redirected to the ADSP Data Portal web page.

Using the Study Info Tab

Navigate to the “Study Info” tab, where you can find more information about the ADSP data.

Select the buttons on the left, and the content of the right panel will change accordingly. We will add more information as the ADSP progresses.

Using the Pedigrees Tab

Navigate to the “Pedigrees” tab. You can download pedigree plots for the 111 families selected for the ADSP whole-genome sequencing study. To access pedigree data as text files, please go to the “Select VCF/Supporting Files” tab.

To download files, follow either of these two steps:

Using the Select Sequencing Data (SRA) Tab

Navigate to the “Select Sequencing Data (SRA)” tab where you can select SRA files to download.

Please note that items may be left in the shopping cart from previous sessions. Check the cart contents by navigating to the “Checkout” tab. Use “Remove selected” or “Empty Cart” to remove items before selecting samples for the current session.

The right panel first shows all available samples without any filtering.  

  1. To apply filters, use the “Filter Builder” on the left panel.
  2. Five filters are readily available as drop-down menus in the Filter Builder.
  3. Choose the desired categories by clicking on the drop-down menus. Grayed-out items are not available.
  4. Additional filters can be added by clicking on the “Add Filter” drop-down menu. Once a filter is chosen, click on the newly added drop-down menu to choose the desired categories. “Add Filter” can be used more than once.
  5. Once all filters are set, click on “Apply Filters” at the bottom. Samples fulfilling all the criteria will be displayed on the right panel.
  6. Click on “Reset” to clear or modify filters.
  7. The column headers of the sample table are clickable; click a header to sort the table by that column.
  8. Now select the samples you would like to add to your cart:
     1. Samples can be selected individually by clicking the checkbox next to the “Family ID” of each sample and then clicking “Add to Cart”.
     2. Samples can be added in bulk by clicking the checkbox above the samples pane (next to the field descriptions at the upper left of the table, beside the Family ID header) and then clicking “Add to Cart”.
     3. You can click “Add All to Cart” and then confirm this action. This will add ALL samples in the table. We suggest using this option only after the table has been filtered as you want.

Download Sample Information. You can click the Export to Excel button to save all information for the selected samples as a spreadsheet that can be opened in Microsoft Excel. This also allows you to filter the samples using your own preferred programs. Some browsers do not detect the correct format of the downloaded file; in that case, add the .xls suffix to the file name.

Upload SRS ID list into the shopping cart. You can upload a list of SRS IDs into the shopping cart by following these steps:

  1. Save the list of SRS IDs into a text file, one ID per row. Currently only SRS IDs are supported.
  2. Drag and drop the text file onto the ADSP Data Portal web page; any location will do. A dialog will appear asking you to confirm this operation.
  3. Click Yes to continue adding sample IDs to the shopping cart. Sample IDs already in the shopping cart will not be duplicated.
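
If your sample list lives in a larger file, the upload list can be prepared at an OSX or Linux shell prompt. The following is only a sketch: the file names samples.tsv and srs_ids.txt are hypothetical, the SRS numbers are placeholders rather than real ADSP samples, and it assumes every ID has the form SRS followed by digits.

```shell
# Hypothetical input: any text file that mentions SRS IDs somewhere on each line.
# The IDs below are placeholders, not real ADSP samples.
printf 'SRS000001\tfamilyA\nSRS000002\tfamilyB\nSRS000001\tfamilyA\n' > samples.tsv

# Keep only the SRS IDs, one per row, with duplicates removed.
grep -oE 'SRS[0-9]+' samples.tsv | sort -u > srs_ids.txt

# Inspect the result before dragging srs_ids.txt onto the portal page.
cat srs_ids.txt
```

The resulting srs_ids.txt contains one ID per row, which is the layout described in step 1.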

Using the Select VCF/Supporting Files Tab

Navigate to the “Select VCF/Supporting Files” tab. Here you can download other types of files such as pedigree information, phenotype data, and called genotype data in VCF format.

  1. Select files from the file type tables.
  2. Files are divided into sections by file type. You can click the upper-left checkbox of a section to add all files in that section.
  3. Click the “Add files to cart” button.

Using the Checkout Tab

Navigate to the “Checkout” tab to download the shopping cart file and retrieve data from dbGaP/SRA.

This tab is split into two panels.

Check your shopping cart before you proceed.  Follow these steps to download the shopping cart file:  

  1. You can remove samples using the same selection process as in the “Select Sequencing Data (SRA)” tab. Confirm the contents of your cart.
  2. Click the “Check Out” button to download the cart file. The default filename is “cart_0.krt”.
  3. The cart file can now be used with the SRA Toolkit to download files. Please see Section IV on how to use the SRA Toolkit, or the documentation provided by SRA.
  4. Note: items may be left in the shopping cart from previous sessions. Click “Empty Cart” before selecting samples for the current session.

Using the User Logout Tab

Click the Logout button to exit the current session.


IV. Step-By-Step Instructions on How to Use the SRA Toolkit

The following instructions explain how to set up SRA Toolkit on your computer and download files from dbGaP using shopping cart files generated by the ADSP Data Portal.  If this guide does not apply to your computing needs/environment, please refer to the SRA Toolkit readme file for more details on how to install and use SRA Toolkit.

Overview: Steps required to run the SRA Toolkit

Here is a brief overview of how to set up and run SRA Toolkit for IT experts who do not need step-by-step instructions.  You can refer to the SRA Toolkit readme file for more details.

  1. Download the dbGaP repository key file (file suffix is .ngc) from your approved dbGaP project.
  2. Use the ADSP Data Portal to generate the shopping cart file (say cart_0.krt).
  3. Download SRA Toolkit for your computing environment from here. Decompress the file in your home directory (a new directory will be created). Executables can be found in the bin/ subdirectory.
  4. Run sratoolkit.jar (a Java program) to set up the SRA Toolkit environment and start the SRA Toolkit console.
  5. Choose File -> Import Repository and import the dbGaP repository key file. Note the path to the new repository directory: it should look like dbGaP-####.
  6. At the Windows command prompt, OSX Terminal, or Linux shell prompt, change to the new repository directory created in step 5, and run prefetch as follows. This will download all files in cart_0.krt to the new repository directory.

prefetch -X 1000G cart_0.krt

  7. Run vdb-decrypt to decrypt non-SRA files as follows:

vdb-decrypt .

System Requirements

  1. Running SRA Toolkit requires one of the following operating systems:
     1. Mac OSX 64-bit
     2. Windows 64-bit
     3. Linux (Ubuntu or CentOS) 64-bit
     4. You can try compiling the source code if you plan to run the code on other *IX operating systems that support 64-bit architecture.
  2. Configuration of SRA Toolkit is done by sratoolkit.jar, which requires Java.
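
On OSX or Linux, both requirements can be checked from a shell prompt before installing anything. This is a minimal sketch using the standard uname and java commands; the exact architecture string reported (for example x86_64) varies by system and is not specified in this guide.

```shell
# Print the machine architecture; 64-bit Intel/AMD systems usually report x86_64.
arch=$(uname -m)
echo "Architecture: $arch"

# java -version writes to stderr, so redirect it; report clearly if Java is missing.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "Java not found on PATH"
fi
```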

An example session

We now describe a simple example to demonstrate how to configure and run SRA Toolkit.  You may need to change the user ID, project number, and paths according to your own project and computing environment.

Installation

  1. Install SRA Toolkit by downloading the correct file from here. At the time of writing, the latest version is 2.3.4. You can choose the pre-compiled binary files or compile the source code yourself.
  2. If you choose to compile the supplied source code and are not familiar with compiling source code, please contact an IT specialist.
  3. If you plan to download the pre-compiled binary files (recommended), decompress the downloaded file (the file name should end with .tar.gz or .zip):
     1. On Mac OSX, double-click the file to decompress it automatically.
     2. On Windows, double-click the zip file to decompress it automatically.
     3. On Linux, use the following command to decompress the .tar.gz file (replace sratoolkit.2.3.4-centos_linux64.tar.gz with the name of your downloaded file):

tar xvzf sratoolkit.2.3.4-centos_linux64.tar.gz

  4. We recommend you put the downloaded file under your home directory (/home/userid on most Linux environments, and /Users/userid on OSX; replace userid with your own user ID). In this example the file is located at /Users/lisanwang/.
  5. Take note of the path (sratoolkitpath) created when you decompress the file. All executable files are in the bin/ subdirectory. For example:
     1. On OSX, the executables will reside in sratoolkit.2.3.4-mac64/bin/
     2. On CentOS Linux, the executables are in sratoolkit.2.3.4-centos_linux64/bin/
  6. In this example, the SRA Toolkit files are located at /Users/lisanwang/sratoolkit.2.3.4-mac64/, and the executables are in /Users/lisanwang/sratoolkit.2.3.4-mac64/bin/.
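
As an optional convenience on OSX or Linux, the bin/ directory can be added to the shell search path so that the toolkit programs can be run by name instead of by full path. This is a sketch only: the variable name SRATOOLKIT_BIN is hypothetical, the path is the one from this example, and the setting lasts only for the current shell session.

```shell
# Install location assumed from this example; replace with your own sratoolkitpath.
SRATOOLKIT_BIN="/Users/lisanwang/sratoolkit.2.3.4-mac64/bin"

# Put the toolkit executables at the front of the search path for this session.
export PATH="$SRATOOLKIT_BIN:$PATH"

# Show where prefetch was found, or say so if it is not there.
command -v prefetch || echo "prefetch not found in $SRATOOLKIT_BIN"
```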

Configure the SRA Toolkit environment

  1. You will need Java to run sratoolkit.jar and configure the SRA Toolkit environment. Java should already be installed on most modern operating systems.
  2. Open a command prompt/console window.
     1. On OSX the program is called Terminal. You can use Spotlight to locate this program.
  3. Run the following command (replace /Users/lisanwang/sratoolkit.2.3.4-mac64 with your own sratoolkitpath):

java -jar /Users/lisanwang/sratoolkit.2.3.4-mac64/bin/sratoolkit.jar

  4. A configuration wizard will appear. You can use the default settings.
  5. The first window asks where the dbGaP/SRA file archive will be located on your computer. If you want to use the default setting, just click Next. Note: the volume for the repository should have enough free space. BAM files are typically in the range of 30 to 100 GB per WES/WGS sample.
  6. If the path has not been created, click Yes at the next dialog window to create the new path.
  7. On the next dialog window, make sure the Disable Repository checkbox is unchecked and click OK.
  8. Click OK in the next dialog to continue into the SRA Toolkit console.
  9. The SRA-Toolkit console displays remote files at NCBI SRA and local files on your computer.
  10. To exit the console, select the SRA-Toolkit menu and click Quit. You can run sratoolkit.jar again to re-enter the console.
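
Because the repository volume needs enough free space, it can help to check the available space before accepting a repository location in the wizard. The sketch below assumes an OSX or Linux shell and assumes the repository will live under your home directory; the variable names REPO_PARENT, avail_kb, and avail_gb are hypothetical.

```shell
# Directory that will hold the repository; the home directory is only an assumption.
REPO_PARENT="${HOME:-.}"

# df -P gives one line per file system in a fixed layout; with -k, column 4
# of the second line is the available space in 1 KB blocks.
avail_kb=$(df -Pk "$REPO_PARENT" | awk 'NR==2 {print $4}')

# Convert to whole gigabytes (rounded down).
avail_gb=$((avail_kb / 1024 / 1024))
echo "Free space under $REPO_PARENT: ${avail_gb} GB"
```

Compare the reported figure with the 30 to 100 GB per BAM file mentioned above, multiplied by the number of samples in your cart.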


Create the ADSP Repository on Your Local Computer

  1. Enter the console again (run sratoolkit.jar) if you have exited the console.
  2. Select File -> Import Repository.
  3. Select the ADSP project key you downloaded from dbGaP (the file ends with .ngc).
  4. At the next dialog, click DEFAULT to create a local repository of the ADSP project data.  In this example, the dbGaP project ID is 4570.  Your approved project will have a different number.

Download and Decrypt ADSP Files Using Prefetch and Shopping Cart File

  1. Enter the console again (run sratoolkit.jar) if you have exited the console.
  2. Open the command prompt window and change your working directory to the newly created ADSP project archive using the cd command. In this example, enter the following command:

cd /Users/lisanwang/ncbi/dbGaP-4570

  3. Now run prefetch to download files using the shopping cart file as input. Assume the full path to the cart file is /Users/lisanwang/adsp_key/cart.krt. You can use the following command to view available options:

prefetch -h

  4. Enter the following command to list the files in the shopping cart file (replace prefetch with its full path, e.g. /Users/lisanwang/sratoolkit.2.3.4-mac64/bin/prefetch):

prefetch --list /Users/lisanwang/adsp_key/cart.krt

  5. Enter the following command to retrieve the files (replace prefetch with its full path). The -X option sets the file size limit; change its value from 1000G (which corresponds to 1 TB) to another number to change the limit. The default setting is 20 GB.

prefetch -X 1000G /Users/lisanwang/adsp_key/cart.krt

  6. Prefetch will report the download progress and decrypt SRA files automatically.
  7. For non-SRA files, an additional step is required to decrypt the files. In the same directory, enter the following command to decrypt all files (replace vdb-decrypt with its full path, e.g. ~/sratoolkit.2.3.4-mac64/bin/vdb-decrypt):

vdb-decrypt .
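
The download and decryption steps can be combined into one small shell sketch for OSX or Linux. The function name check_ready and the variable CART are hypothetical helpers introduced here for illustration; the prefetch and vdb-decrypt commands are the same ones shown in the steps above, and the cart path is the one assumed in this example.

```shell
# Hypothetical helper: confirm the cart file and the SRA Toolkit programs
# are available before starting a long download.
check_ready() {
  cart="$1"
  if [ ! -f "$cart" ]; then
    echo "cart file not found: $cart" >&2
    return 1
  fi
  for tool in prefetch vdb-decrypt; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool is not on PATH; use its full path instead" >&2
      return 2
    fi
  done
  return 0
}

# Cart path assumed in this example; replace with your own.
CART="/Users/lisanwang/adsp_key/cart.krt"

# Run this from inside the repository directory (e.g. dbGaP-4570).
# Decryption of non-SRA files starts only if the download succeeds.
if check_ready "$CART"; then
  prefetch -X 1000G "$CART" && vdb-decrypt .
fi
```

If the cart file or either program is missing, the sketch prints a message and does nothing else, so it is safe to run while you are still setting up.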