fragmented notes and Q&As - work in progress

infrastructure, ‘stations’

We use the excellent compute and data infrastructure of the MetaCentrum - Virtual Organization [http://metavo.metacentrum.cz/en/] of CESNET [http://www.cesnet.cz/?lang=en] in the Czech Republic.
By visiting arrest.tools/interrogate you’re redirected to a free server, or ‘station’ - do not bookmark the final address.
Data and results, as well as user information, are shared between ‘stations’.
We’ll frown upon you launching 1000s of samples to be analysed, especially without first telling us what and why.
There are no absolute guarantees for this service, including uptime, support, storage permanence or security - please anonymise your data and keep them safe on your side (store your FASTQ files and download ARResT/Interrogate results).

pipeline

The 'processing' panel is the interface to the ARResT/Interrogate pipeline. You need an account to use it - ask an administrator, or for public 'stations' contact us and we’ll get back to you.
The pipeline is validated against hundreds of diverse sequences, from TP53 to non-rearranged germlines to examples of all junction classes. Of course, due to the immense variability of these rearrangements, the intricacies and biases of the underlying laboratory technologies, and our ever-developing code, we cannot guarantee that there won’t be issues - trust, but verify, and let us know.
Pipeline options are organized in user-editable ‘scenarios’. Samples and their FASTQ files (paired-end and multi-lane files are supported, and files can and should be gzipped) can be uploaded, organized in ‘analyses’, annotated, and selected to be processed. The pipeline produces comprehensive reports, available in the ‘file’ panel.
The pipeline is able to annotate reads with primer sequences and keep this annotation for downstream analysis, e.g. run quality control. Primers can also be used to safeguard the completeness of the amplicon, by requiring that both 5’ and 3’ primers are located on each read.
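As a rough illustration of the both-primers requirement, here is a minimal Python sketch. It is not the pipeline's code: the primer names and sequences are made up, matching is exact (no mismatches allowed), and both primers are assumed to be given in the orientation of the read.

```python
# Hypothetical primers; real primer sets are defined in a 'scenario'.
FIVE_PRIME = {"V-fwd-1": "ACGTAC", "V-fwd-2": "TTGACC"}
THREE_PRIME = {"J-rev-1": "GGATCC"}


def find_primer(read, primers):
    """Return (name, position) of the first primer found by exact match, else None."""
    for name, seq in primers.items():
        pos = read.find(seq)
        if pos != -1:
            return name, pos
    return None


def annotate(read):
    """Annotate a read with its primers and flag whether the amplicon looks complete."""
    five = find_primer(read, FIVE_PRIME)
    three = find_primer(read, THREE_PRIME)
    return {
        "five_prime": five[0] if five else None,
        "three_prime": three[0] if three else None,
        # complete only if both primers are present, 5' before 3'
        "complete": five is not None and three is not None and five[1] < three[1],
    }


reads = ["ACGTACAAAAAAGGATCC", "ACGTACAAAAAA"]
annotations = [annotate(r) for r in reads]
```

The annotation is kept alongside the read, so a later step (e.g. run quality control) can still ask which primer produced it, even for reads that are not complete.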
After launching the pipeline it's OK to log out (or just close the browser tab) and go home - you should receive an email (if you provided an address) when it’s finished.
You should be able to upload new samples to an existing ‘analysis’, then select and run only those. Be careful when uploading a sample with the same name as one already uploaded: the old sample is overwritten with the new one.

‘junction classes’ and anchors

ARResT/Interrogate is able to annotate and identify rearrangements of all IG/TR loci, organised in what we call ‘junction classes’. They include complete, e.g. IG’s VJ:Vh-(Dh)-Jh; incomplete, e.g. TR’s DJ:Db-Jb; and other, e.g. IG’s Vk-Kde or intron-Kde. For junction classes with no biologically-relevant junctional anchors (residues that define the CDR3 region, as per IMGT), we introduced our own – this enables consistent and informative results across all junction classes, helping the user to focus on the most variable part of the rearrangement. For the D genes in DJ, VD and DD incomplete junction classes, we use recombination signal sequence (RSS) heptamers: the last triplet of the heptamer in 5’, and the first triplet of the heptamer in 3’. For the intron, we use a CCC triplet between the EuroClonality-NGS primer and the RSS heptamer, while for Kde we use the final triplet after the RSS heptamer and before the EuroClonality-NGS primer. In the majority of cases, these anchors are far enough from the junctional points to allow for nucleotide trimming without affecting their presence, but ARResT/Interrogate is in any case able to report rearrangements even with the anchors trimmed or mutated – this is also true for the normal anchors in complete rearrangements.
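To make the D-gene anchors concrete, here is a small Python sketch, not the pipeline's implementation. It assumes the consensus heptamers CACTGTG (upstream) and CACAGTG (downstream) on a toy germline string; real germline heptamers vary, and the function name is hypothetical.

```python
# Consensus heptamers, used here as an assumption; real germline heptamers vary.
HEPTAMER_5 = "CACTGTG"  # heptamer upstream (5') of the D gene
HEPTAMER_3 = "CACAGTG"  # heptamer downstream (3') of the D gene


def d_gene_anchors(germline):
    """Locate the two heptamer-derived anchors flanking a D gene, or return None."""
    start5 = germline.find(HEPTAMER_5)
    if start5 == -1:
        return None
    start3 = germline.find(HEPTAMER_3, start5 + len(HEPTAMER_5))
    if start3 == -1:
        return None
    anchor5_pos = start5 + len(HEPTAMER_5) - 3  # last triplet of the 5' heptamer
    anchor3_pos = start3                        # first triplet of the 3' heptamer
    return {
        "anchor5": germline[anchor5_pos:anchor5_pos + 3],
        "anchor3": germline[anchor3_pos:anchor3_pos + 3],
        "between": germline[anchor5_pos + 3:anchor3_pos],
    }


# Toy germline: flank + 5' heptamer + D gene + 3' heptamer + flank
germline = "TTTT" + HEPTAMER_5 + "GGGACAGGGGGC" + HEPTAMER_3 + "AAAA"
anchors = d_gene_anchors(germline)
```

In this toy example the anchors sit directly next to the D gene, so the region between them is the D gene itself; in a rearranged read it would hold the junction.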

Q: What does it mean when some of my samples "QC-failed"? What does it mean for my results, and can I trust them?

A: It means that the sample failed our QC tests, but that doesn't necessarily mean that it's unusable. That's why you can include QC-failed samples back into the analysis in the 'questions' panel, as the browser itself tells you. The final decision is yours, based on context (kind of sample, DNA quality, purpose) - we just want to draw your attention to potential issues.

Q: Can you tell me something about the specificity and sensitivity of the results?

A: Probably not - in general, it's very tricky to assess this fully, given the complexity and variability of the underlying data. We have a gold standard sequence set that we test on, and we’re quite confident we don't miss much - but if you're worried that we missed a marker, check the postmortem section of the sample log in the 'file' panel. We also do report some artefacts that follow our rules and escape our filters, but these usually have very low abundance.

Q: Do I need a samplesheet?

A: A samplesheet is not absolutely essential, but it is extremely useful to provide the pipeline and the browser (and you!) with metadata - such metadata can identify run quality control samples to be specifically checked, specify the primer set used in each sample in order to e.g. specifically check that no mixup has occurred, split samples into runs again for quality control, and of course help you select/filter/order/rename/identify your samples in the browser.

Q: "(pre-)filtered"?

A: Pre-filtering applies if you load the pre-filtered results in 'file', and filtering is based on the widgets in 'questions'. Everything that is left out becomes part of this (pre-)filtered %. But be careful, depending on what you're showing: the percentage is about the whole sample, not about a specific V-D-J rearrangement.

Q: Sequence in the ‘minitable’?

A: The minitable shows the most popular sequence for a clonotype, not necessarily the longest. You can retrieve all sequences in 'forensics' in case there's a less popular version that is longer because of a different primer.
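The difference between "most popular" and "longest" can be shown with a short Python sketch. The sequences and the function name are made up for illustration, and this is not how the browser itself is implemented.

```python
from collections import Counter


def representative(sequences):
    """Return (most popular, longest) sequence among the reads of one clonotype."""
    counts = Counter(sequences)
    most_popular, _ = counts.most_common(1)[0]
    longest = max(counts, key=len)
    return most_popular, longest


# Toy clonotype: 5 reads of a shorter version, 2 reads of a longer one
# (e.g. amplified by a different primer).
reads = ["TGTGCGAGA"] * 5 + ["GGTGTGCGAGA"] * 2
most_popular, longest = representative(reads)
```

Here the shorter sequence would be the one shown, while the longer version is only found by going through all sequences.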

Q: What is the easiest way to check whether all the primers (and combinations) did their jobs? How can we check in the polyclonal control if all the sequences are in?

A: First of all you need to do 'processing' using a scenario with primers. Go to the 'questions' panel (if you cannot see it switch to a generic user mode at the top left, e.g. 'simple'), select primers as feature types with the "select|combine feature types {A}|{B}" widget - you can play with different combinations here.

Q: What is “w:wo junction” in reports?

A: If you use primers, we can see how many sequences were amplified by primers of a specific set, and how many of those had a junction. That's how we can tell you how many reads of that primer were w(ith):w(ith)o(out) junction.
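As a sketch of the counting behind such a number, assuming each read has already been annotated with a primer and with whether a junction was found (the input format, the output format and the function name are assumptions, not the actual report layout):

```python
from collections import defaultdict


def with_without_junction(reads):
    """Count reads with and without a junction per primer.

    reads: iterable of (primer_name, has_junction) pairs.
    Returns {primer_name: "with:without"}.
    """
    counts = defaultdict(lambda: [0, 0])  # [with junction, without junction]
    for primer, has_junction in reads:
        counts[primer][0 if has_junction else 1] += 1
    return {primer: f"{w}:{wo}" for primer, (w, wo) in counts.items()}


toy_reads = [("primer-A", True), ("primer-A", True), ("primer-A", False), ("primer-B", False)]
summary = with_without_junction(toy_reads)
```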

Q: What is “R1/R2 missing” in ‘processing’?

A: If we believe the sample name represents Illumina's R1 or R2 reads, and one of those files is missing, we report it as such.
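A minimal Python sketch of such a check, assuming Illumina-style file names like sample_S1_L001_R1_001.fastq.gz. The regular expression and the function name are our illustration here, and the actual rule used in 'processing' may differ.

```python
import re

# Assumed pattern: an _R1 or _R2 tag followed by "_" or "."
READ_TAG = re.compile(r"_(R[12])(?=[_.])")


def missing_mates(filenames):
    """Return {file name with the read tag masked: missing tag} for unpaired files."""
    seen = {}
    for name in filenames:
        match = READ_TAG.search(name)
        if not match:
            continue  # not recognisable as R1/R2, so nothing to pair
        key = name[:match.start()] + "_R?" + name[match.end():]
        seen.setdefault(key, set()).add(match.group(1))
    return {
        key: ({"R1", "R2"} - tags).pop()
        for key, tags in seen.items()
        if len(tags) == 1
    }


files = [
    "s1_S1_L001_R1_001.fastq.gz",
    "s1_S1_L001_R2_001.fastq.gz",
    "s2_S2_L001_R1_001.fastq.gz",
]
report = missing_mates(files)
```

Files without a recognisable R1/R2 tag are simply skipped, which matches the "if we believe" wording above: nothing is reported unless the name looks like one half of a pair.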