Skip to content

Setting up Samplesheet

There are two ways to generate a samplesheet for the pipeline depending on the use case. If the goal is to process patient fastq files through the pipeline to use the results for secondary analysis, you can follow these steps to build your own samplesheet. Along with processing the data, if you want to visualize the results on ClinOmics data portal then follow steps to build samplesheet from mastersheet. This is highly recommended.

Build samplesheet from mastersheet

For khanlab purposes, pipeline is always launched using the information in the mastersheets on biowulf under /data/khanlab space. The script samplesheet_builder.py queries the mastersheets to build a samplesheet for the pipeline. A copy of this script is available in the pipeline git repo. This script takes two inputs PatientID and casename. By default, it queries all mastersheets found in the /data/khanlab/projects/DATA/Sequencing_Tracking_Master directory and uses /data/khanlab/projects/DATA as the default input directory.

When using a non-Khanlab master sheet, ensure the following columns are included:

  • Patient ID: PatientID
  • Library ID: LibraryID
  • Enrichment Step: Capture kit name
  • Matched RNA-seq Library: Matching RNA lib for the Exome library (can be left empty)
  • Matched Normal: Matching normal lib for the Exome library (can be left empty)
  • Diagnosis: Diagnosis
  • Case Name: casename for website
  • Type: Data type information
  • FCID: flowcell ID (optional)
  • Project: Project name

Read1, Read2 Construction: The script uses information from the Input directory and following columns to build the file paths for read1 and read2.

  • Library ID
  • FCID (optional) If FCID is provided, it will be used to build the paths; otherwise, the paths will be constructed using only the Input Path and Library ID.
Usage: python ./samplesheet_builder.py <patient_id> <case_name>
Default Samplesheet Directory: /data/khanlab/projects/DATA/Sequencing_Tracking_Master
Default Input Directory: /data/khanlab/projects/DATA
To use custom directories, modify the script:
   - Change 'DEFAULT_SAMPLESHEET_DIR' to your samplesheet directory path
   - Change 'DEFAULT_INPUT_DIR' to your input directory path

python ./samplesheet_builder.py Test_Patient casename will output a file Test_Patient_casename.csv in the same folder.

Error Handling

The script includes the following error handling mechanisms:

  • Invalid read1 and read2 Paths: If the paths for read1 and read2 are invalid, the script will output an error message. This message will prompt you to check and verify the input paths.

Build your own samplesheet

Alternately, we can build custom samplesheet without mastersheet. These are the required columns.

Column name Notes Example
sample Patient name NCI-Test1
library Name of the sample library Test1_T1D_E
read1 Full path to the read1 /data/khanlab/DATA/Sample_Test1_T1D_E/Sample_Test1_T1D_E.R1.fastq.gz
read2 Full path to the read2 /data/khanlab/DATA/Sample_Test1_T1D_E/Sample_Test1_T1D_E.R2.fastq.gz
sample_captures Name of the capture kit used List of supported capture kits are here
Matched_RNA Matched RNA library for the tumor library. This includes cell_line_RNA and tumor_RNA Test1_T1R_T
Matched_normal Matched exome normal library for the tumor library. This includes panel, blood DNA, cell_line_DNA Test1_N1D_E
Diagnosis Diagnosis of the patient Glioma
casename Casename for the patient NCI-Test1
type Data type example: tumor_RNA, tumor_DNA, normal_DNA, blood_DNA, cell_line_DNA, cell_line_RNA
FCID Flowcell ID ACJ678349

Example samplesheet:

sample,library,read1,read2,sample_captures,Diagnosis,Matched_RNA,Matched_normal,casename,type,FCID,Project Test8,Test5_T1D_E,/data/khanlab/projects/fastq/Test5_T1D_E_R1.fastq.gz,/data/khanlab/projects/fastq/Test5_T1D_E_R2.fastq.gz,clin.ex.v1,Osteosarcoma,,Test8_N2D_E,NFtest0523,tumor_DNA,AWXYNH2,Test Test8,Test8_N2D_E,/data/khanlab/projects/fastq/Test8_N2D_E_R1.fastq.gz,/data/khanlab/projects/fastq/Test8_N2D_E_R2.fastq.gz,clin.ex.v1,Osteosarcoma,,,NFtest0523,normal_DNA,AWXYNH2,Test