fragmented notes and Q&As - work in progress

infrastructure, ‘stations’

pipeline

‘junction classes’ and anchors

ARResT/Interrogate can annotate and identify rearrangements of all IG/TR loci, organised in what we call ‘junction classes’. These include complete rearrangements, e.g. IG’s VJ:Vh-(Dh)-Jh; incomplete ones, e.g. TR’s DJ:Db-Jb; and others, e.g. IG’s Vk-Kde or intron-Kde. For junction classes with no biologically relevant junctional anchors (the residues that define the CDR3 region, as per IMGT), we introduced our own. This enables consistent and informative results across all junction classes and helps the user focus on the most variable part of the rearrangement. For the D genes in the DJ, VD and DD incomplete junction classes, we use the recombination signal sequence (RSS) heptamers: the last triplet of the heptamer in 5’, and the first triplet of the heptamer in 3’. For the intron, we use a CCC triplet between the EuroClonality-NGS primer and the RSS heptamer; for Kde, we use the final triplet after the RSS heptamer and before the EuroClonality-NGS primer. In the majority of cases, these anchors are far enough from the junctional points to survive nucleotide trimming, but ARResT/Interrogate can report rearrangements even when the anchors are trimmed or mutated. This also holds for the normal anchors in complete rearrangements.
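As a rough illustration of the D-gene anchors described above, the sketch below takes the two RSS heptamers flanking a D gene and picks the last triplet of the 5’ one and the first triplet of the 3’ one. It is not ARResT/Interrogate code: the function name is hypothetical, and the heptamers used are the textbook consensus (CACAGTG and its reverse complement), whereas real germline heptamers can deviate from it.

```python
from __future__ import annotations

# Assumption: textbook consensus heptamers, used here for illustration only.
# Real germline RSS heptamers vary and are not taken from ARResT/Interrogate.
CONSENSUS_HEPTAMER_3P = "CACAGTG"  # starts the RSS downstream (3') of a D gene
CONSENSUS_HEPTAMER_5P = "CACTGTG"  # reverse complement, ends the RSS upstream (5')


def d_gene_anchors(heptamer_5p: str, heptamer_3p: str) -> tuple[str, str]:
    """Return (5' anchor, 3' anchor) for a D gene flanked by two RSS heptamers.

    Hypothetical helper: the 5' anchor is the last triplet of the 5' heptamer,
    the 3' anchor is the first triplet of the 3' heptamer.
    """
    for side, heptamer in (("5'", heptamer_5p), ("3'", heptamer_3p)):
        if len(heptamer) != 7:
            raise ValueError(f"{side} heptamer must be 7 nt, got {len(heptamer)}")
    anchor_5p = heptamer_5p[-3:]  # last triplet of the heptamer in 5'
    anchor_3p = heptamer_3p[:3]   # first triplet of the heptamer in 3'
    return anchor_5p, anchor_3p


print(d_gene_anchors(CONSENSUS_HEPTAMER_5P, CONSENSUS_HEPTAMER_3P))
```

With the consensus heptamers, both anchors sit directly next to the D gene, one on each side.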

Q: What does it mean when some of my samples "QC-failed"? What does it mean for my results, and can I trust them?

A: It means that the sample failed our QC tests, but that doesn't necessarily make it unusable, which is why you can bring it back into the analysis (in the 'questions' panel you can include QC-failed samples again, as the browser itself tells you). The final decision is yours, based on context (kind of sample, DNA quality, purpose) - we just want to draw your attention to potential issues.

Q: Can you tell me something about the specificity and sensitivity of the results?

A: Probably not - in general, it's very tricky to assess this fully, given the complexity and variability of the underlying data. We have a gold standard sequence set that we test on, and we’re quite confident we don't miss much - but if you're worried that we missed a marker, check the postmortem section of the sample log in the 'file' panel. We do also report some artefacts that follow our rules and escape our filters, but these are usually of very low abundance.

Q: Do I need a samplesheet?

A: A samplesheet is not strictly required, but it is extremely useful for providing the pipeline and the browser (and you!) with metadata. Such metadata can identify run quality control samples to be specifically checked, specify the primer set used in each sample (e.g. to check that no mixup has occurred), split samples into runs (again for quality control), and of course help you select/filter/order/rename/identify your samples in the browser.

Q: "(pre-)filtered"?

A: Two filters can apply: pre-filtering, if you load the pre-filtered results in 'file', and filtering based on the widgets in 'questions'. Everything that either filter leaves out counts towards this (pre-)filtered %. Be careful, though: depending on what you're showing, the percentage refers to the whole sample, not to a specific V-D-J rearrangement.

Q: Sequence in the ‘minitable’?

A: The minitable shows the most popular sequence for a clonotype, not necessarily the longest. You can retrieve all sequences in 'forensics' in case there's a less popular version that is longer because of a different primer.
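To make the "most popular, not necessarily the longest" distinction concrete, here is a minimal sketch with made-up reads for one clonotype. The function names and sequences are hypothetical and say nothing about how the browser actually stores or ranks sequences.

```python
from collections import Counter


def most_popular_sequence(reads):
    """Return the sequence seen most often among a clonotype's reads (hypothetical helper)."""
    sequence, _count = Counter(reads).most_common(1)[0]
    return sequence


def longest_sequence(reads):
    """Return the longest sequence among the reads, regardless of how often it occurs."""
    return max(reads, key=len)


# Made-up reads: a short version seen three times, and a longer version
# (e.g. amplified by a different primer) seen once.
reads = ["TGTGCGAGAGATTGG"] * 3 + ["GCCGTGTATTACTGTGCGAGAGATTGG"]

print(most_popular_sequence(reads))  # the short, frequent version
print(longest_sequence(reads))       # the long, rare version
```

In this toy case the two answers differ, which is the situation where looking at all sequences in 'forensics' pays off.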

Q: What is the easiest way to check whether all the primers (and combinations) did their jobs? How can we check in the polyclonal control if all the sequences are in?

A: First of all, you need to run 'processing' using a scenario with primers. Then go to the 'questions' panel (if you cannot see it, switch to a generic user mode at the top left, e.g. 'simple') and select primers as feature types with the "select|combine feature types {A}|{B}" widget - you can play with different combinations here.

 

Q: What is “w:wo junction” in reports?

A: If you use primers, we can see how many sequences were amplified by primers of a specific set and how many of those had a junction. That is how we can tell you how many reads for that primer set were w(ith):w(ith)o(ut) junction.
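The counting behind a "w:wo" figure can be sketched as a simple tally per primer set. This is only an illustration under assumed inputs: the record layout (primer set label plus a junction flag), the function name and the primer set labels are all hypothetical, not the pipeline's actual data model.

```python
from collections import defaultdict


def with_without_junction(reads):
    """Tally reads per primer set as (with junction, without junction).

    `reads` is an iterable of (primer_set, has_junction) pairs; this layout
    is an assumption made for the example.
    """
    tally = defaultdict(lambda: [0, 0])
    for primer_set, has_junction in reads:
        tally[primer_set][0 if has_junction else 1] += 1
    return {primer_set: tuple(counts) for primer_set, counts in tally.items()}


# Made-up reads with placeholder primer set labels.
reads = [
    ("primer_set_A", True),
    ("primer_set_A", True),
    ("primer_set_A", False),
    ("primer_set_B", False),
]

for primer_set, (w, wo) in sorted(with_without_junction(reads).items()):
    print(f"{primer_set}\t{w}:{wo}")
```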

Q: What is “R1/R2 missing” in ‘processing’?

A: If we believe the sample name represents Illumina's R1 or R2 reads, and one of those files is missing, we report it as such.
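A check of this kind could look roughly like the sketch below, which assumes file names following the usual Illumina pattern with an `_R1` or `_R2` token (e.g. `..._R1_001.fastq.gz`). The regular expression and function name are assumptions for illustration; the actual rule used in 'processing' is not documented here.

```python
import re

# Assumption: mate files differ only in the digit after "_R", optionally
# followed by a numeric chunk such as "_001", then a FASTQ extension.
MATE_RE = re.compile(
    r"^(?P<prefix>.*_R)(?P<mate>[12])(?P<suffix>(?:_\d+)?\.f(?:ast)?q(?:\.gz)?)$"
)


def missing_mates(filenames):
    """Return the expected R1/R2 file names that are absent from `filenames`."""
    present = set(filenames)
    missing = []
    for name in sorted(present):
        match = MATE_RE.match(name)
        if match is None:
            continue  # name does not look like an R1/R2 file, so nothing to pair
        other = "2" if match["mate"] == "1" else "1"
        mate_name = f"{match['prefix']}{other}{match['suffix']}"
        if mate_name not in present:
            missing.append(mate_name)
    return missing


# Made-up file names: s1 is complete, s2 lacks R2, s3 is not paired-style.
files = [
    "s1_S1_L001_R1_001.fastq.gz",
    "s1_S1_L001_R2_001.fastq.gz",
    "s2_S2_L001_R1_001.fastq.gz",
    "s3.fastq.gz",
]
print(missing_mates(files))
```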