GA4GH BED v1 - PRC Feedback

	A	B	C	D	E	F	G	H	I	J	K
1	Element	Comment	Suggested Action	Raised/Supported By	Relevant Github Issues	Required, Recommended or Optional Change?	BED Dev Team Response	PRC Resolution	BED Dev Team Response Round 2	Planned change round 1	Planned change round 2
2	Scope	There is no comment or statement of what the scope of the specification is. This has also hurt PRC as it is hard to know what is and is not in scope. What is intended to be captured in the specification? Is it the formalization of BED or an intention to advance the spec?	Clarify scope within documentation	AY,JZ,TW			[First, we would like to thank the Peer Review Committee for the time and energy they have put into this effort. Thank you for helping us make this better.] Adding information about the scope of the specification is warranted; we will add a subsection of "Specification" before "Typographic conventions". The scope of this specification leans much more towards the side of formalizing the BED file format. The scope should mention that the specification is meant to formalize reasonable interpretations of the previous description of the format, and to make clear interoperability issues that currently cause confusion and can be better specified in a future specification.	Resolved - Contingent on review of change		DONE	-
3	Future Directions	The future direction of the specification isn't being iterated upon here so this has caused problems with reviewing the spec e.g. metadata payloads	Clarify the future directions for the specification	AY,TW			We feel the scope of what is described above is self-contained, while other improvements are much more open-ended and may even rely on best practices from elsewhere in GA4GH that we would like to use here (such as how best to specify a genome assembly) that are not yet fully established. Metadata payloads and other advancements in the BED specification should be left for a future specification. This could be a new version ("BEDv2") but would probably be best thought of as an overlay onto BEDv1 such as "BEDx". For example, all BEDxv1 files might be valid BEDv1 files but BEDx compliance would require certain strict choices that simplify parsing (such as requiring that a single tab be the only field separator), and specification of some out-of-band information into header comments. We don't think future directions fit well into the scope of the specification proper given how much they might change, but are happy to discuss this publicly.	Resolved - Contingent on review of change (except for thickStart see row 17)		-	-
4	Schemas	The spec mentions other ways to define a schema for a BED file but only mentions AutoSQL. Are there others? Certainly AutoSQL to my knowledge is the main one	Specify the other schemas for BED files	AY			We are only aware of AutoSQL as being designed for BED. Of course there are a wide variety of schemas designed for other purposes that might be adaptable here (much in the way that AutoSQL is itself a bit of an adaptation of SQL table schemas to this format).	Resolved		-	-
5	1. Browser Extensible Data (BED) is a whitespace-delimited file format, where each file consists of one or more lines.	Within a single BED file, can there be a mix of different whitespace characters used as delimiters?	Make it explicit whether the same delimiter must be used for the entire BED file	JZ			Yes, there can be a mix. For example, one line uses tab to separate chrom and chromStart and four spaces to separate chromStart and chromEnd. We will add a sentence to explicitly state this.	Resolved - Contingent on review of change		DONE	-
6	1.3.1: The whitespace must match the regex [[:space:]]+	It seems like consecutive whitespaces are regarded as a single whitespace, is that understanding correct? If this is allowed, it would not be possible to express an empty-valued field unambiguously.	Only allow a single-character delimiter	JZ,AY	570		The understanding is correct, except in the edge case where a file's field separator is only a single tab and the name field is empty. In this case, two tabs will be recognized as two separate field separators. Using any sequence of arbitrary-length whitespace as a field separator is clearly allowed by the previous description and we think it is too late to invalidate that. It's quite likely that people have been using multiple spaces to emulate tabs. We recommend in this specification that only a single tab be used as field separator. In the next major version of the specification (including an overlay such as "BEDx"), requiring this would be very sensible. Additional note: following public comments we have made it clear in the revised draft that the field separator is horizontal whitespace only, that is the regex `[ \t]+`	Resolved - In the documentation would like to see a sentence to the effect of "We recommend in this specification that only a single tab be used as field separator. In the next major version of the specification (including an overlay such as "BEDx"), requiring this would be very sensible."	Added.	-	DONE
7	Whitespace/blank lines	There is a mention about how blank lines can appear anywhere in a BED file, but this seems at times at odds with the bigBed/sorted BED file comments. If a file is sorted, shouldn't all of these elements appear ideally in the same place? Can you index a BED file into bigbed with blank lines?	Clarify use of comments and blank lines if possible	AY			As far as we know, the bedToBigBed conversion tool does allow blank lines. With regards to sorting, the order in which features are listed should follow the recommendations we made, but you can have a blank line or comments between features and they still count as sorted. The reason for requiring the sorting of features is that the implementation of a lot of downstream operations is disproportionately easier if 1) features on one chromosome are all together, regardless of any sort order of the names of chromosomes 2) chromStart is non-decreasing within a chromosome, regardless of other fields. While other optimizations would be possible with further restrictions on where one can see blank lines, comment lines, and sorting of other fields like chromEnd, they are far less important. We will add: - mention that blank lines and comments are not required to be sorted, only data lines, despite the suggested sorting command - discuss the rationale for requiring sorting of features	Resolved		DONE	-
8	same as above (line 6)	Is it feasible to exclusively use tab as the only supported delimiter, same as VCF, SAM and GFF/GTF etc.? It would make a lot of things easier.		JZ (also brought up by others in GitHub PR)			It is our recommendation in this specification to only use a single tab. In this version, requiring a single tab as the field separator seems undesirable due to the massive existing uses, as explained above. In future versions, you should expect this to be the only allowed field separator.	Resolved		DONE	-
9	general comment	Should tables be numbered and given short names? It would be useful for referencing them in the main text.		JZ			Yes, we will add a name to each table and a bolded short description	Resolved - Contingent on review of change		DONE	-
10	1.2 Terminology and concepts	Where and how should the information about BEDn, BEDn+, BEDn+m be kept? Without such information explicitly defined, a parser may not know how to handle different fields, for example a 9-field BED could be BED3+6 or BED6+3?		JZ,TW	570		A header at the top of the BED file is possible. An idea could be to have a type of header line (with some number of #'s starting the header line) that states the BED type (BEDn or BEDn+m) followed by m lines describing the custom fields if applicable. This should be within the scope of a future BED specification, not this one. More discussion on how file metadata should be defined can occur at that stage. In response to public comments we have made clearer what is out-of-band information for this version of the specification, which includes whether something is BED3+6 vs. BED6+3	Resolved		-	-
11	1.2 BEDn+: A file that has n fields of BED format	should change to: A file that has first n fields of BED format	update	JZ,TW			Okay, will clarify that it is the first n fields	Resolved - Contingent on review of change		DONE	-
12	1.2 BEDn+m: A file that has a custom tab-delimited format starting ...	Why is only tab-delimited explicitly mentioned here? Can BEDn+m not be whitespace delimited? What's special about BEDn+m compared to BEDn+ or BED?		JZ	570		This was because we cannot control whether custom fields can be empty. If a custom field is empty, then single tabs must be used to recognize an empty field. In response to public comments, to avoid confusion we have removed this mention here.	Resolved		DONE	-
13	1.2 field: Data stored as non-tab text. All fields are 7-bit US ASCII.	Why only tab is excluded from field text? Should newline, linefeed, carriage-return be excluded as well? Maybe other non-printing characters should be excluded as well? Or ask the question differently, are there use cases where non-printing characters need to be used in a field text?		JZ (also brought up by others in GitHub PR)	570		In response to public comments, we have specified that fields are 7-bit US ASCII printable characters\footnote{Characters in the range '\x20' to '\x7e', therefore not including any control characters}	Resolved		DONE	-
14	1.3.2 Comment lines and blank lines	The BED specification does not make use of comment lines. Would it be feasible to introduce a header concept, similar to VCF and SAM format? header could be a good place to specify BED version, delimiter, number of BED standard fields, eg, BED5, BED8+ etc. It could also be used to add details about user-defined fields (if a user chooses to do so)		JZ,TW			Yes, see our response in cell G10	Resolved		-	-
15	Table on the top of page 3, regex for 'name' field: [^\t]{0,255}	The regex seems to be too broad. In addition to tab, shouldn't all non-printing characters be excluded?		JZ			Yes, we will exclude other non-printing characters	Resolved		DONE	-
16	1.6 Simple attributes	Should 'name' field be unique? Maybe as a recommendation for some scenarios?	JZ: after first PRC discussion, I withdraw this comment.	JZ			We will add that the `name` need not be unique	Resolved		DONE	-
17	1.7 Display attributes 7. thickStart	Should it be explicitly mentioned that thickStart can be empty?		JZ			We will change this so that the default value is 0.	1) Clarify what default 0 means. Respond to the comment in the column to the left. 2) Making default value 0 may make the BED file format too specific/restrictive. Beyond column 6 (including 7 and beyond) make the columns more general so that it is more inclusive of current uses. 3) For a first version, trying to define the core specification with the option to extend or add additional things is great, but the initial case should be kept to generalizable and compatible components.	We have removed the word "default" from the specification. Instead, we now recommend values for name, score, thickStart, thickEnd, itemRgb when the field is uninformative throughout the file but the field must be provided due to the desire to have other standard BED fields. This is not necessary for chrom, chromStart, chromEnd because they are mandatory, it's not necessary for strand because there is a required uninformative setting, and not necessary for the blocks fields because they are the right-most of the 12 BED fields.	DONE	DONE
18	1.7 Display attributes 8. thickEnd	It is not clear whether it's permitted to have thickEnd specified but thickStart is empty.		JZ			We will change this so that the default value is `chromEnd`-`chromStart`	Please clarify the meaning of default.	We have removed "default"; see I17 above.	DONE	DONE
19	3.3 User-defined fields Each custom field should contain either one of the following data types or a comma-separated list of values of the same type	For String type, comma is allowed. How can it be distinguished between commas used as part of a String value and commas used as separator? Should String values be quoted? like in CSV format?		JZ			To avoid making this section very complicated in this version, we will exclude Strings from the comma-separated list	Resolved		DONE	-
20	3.5 Whitespace Though lines may use any kind of whitespace as a delimiter between fields, a single tab (\t) should be used.	Why not make this recommendation part of the BED specification? As mentioned before, it will make a lot of things easier. If other delimiters are supported for backward compatibility reasons, is it possible to address this by specifying the BED format version? For practical reasons, in order for parsers to work with different delimiters, there could be a legacy mode and a v1 mode. If a tab is detected in the first data line, then the parser enters v1 mode, otherwise legacy mode. The same delimiter should be used throughout the entire BED file. The other thought: v1 is probably the best time to introduce breaking changes. According to semver, breaking changes pre-v1 are allowed.		JZ (also brought up by others in GitHub PR)			For this specification, we are still trying to stick to describing what is allowable due to the peculiarities of the current UCSC definition. But this is definitely something that will be touched on in the next iteration of BED.	Resolved		-	-
21	Whitespace delimiter	Tabs are certainly the best separator. Should the spec go further than it does to push their use?	Strengthen the use of tabs as a separator	AY (JunJun raised this too in his 1.3.1 comment and 3.5 whitespace comment)			Same response as above, cell G20	Resolved		-	-
22
23	Visualisation centric	A number of comments/fields call out visualisation recommendations specific to the UCSC genome browser. Would these be better supported by grouping the genome browser concerns together? Such as 50MB files and the RGB colour space	Consider linking all genome browser issues/considerations together	AY,TW			We could, but we don't feel they are bad recommendations even for people not using the UCSC Genome Browser so would rather leave them where they are	PRC respects the history behind BED and the reasoning is understood. Would appreciate that in future specifications, the flexibility of BED is made clear in the specification to showcase that there are many different ways to use BED.		-	-
24	Cave entrance	Is there a plan for a more formal cave entrance into the spec? Will UCSC link to the spec? Will they retire their own documentation?	Detail how this spec becomes "the dominant spec" for BED	AY			UCSC are currently favorably disposed to this effort but have not always been—this is one reason why we are trying to keep backwards compatibility to the extent possible. We will ask them to do this once the specification draft has a stable home URL	Resolved		-	-
25	Guide/Best practices
26	Examples of encoding transcripts	One thing the spec does not look at is how key data types should be serialised. Perhaps this is beyond the scope of the specification, but from personal experience, while it's very clear how to encode a protein coding transcript, it is less clear how to encode a non-coding transcript, and an implementor is left looking at actual data from UCSC to discover this.	Consider writing an appendix which describes common patterns of data encoding/semantics that go beyond the basic specification	AY,TW			This is a good idea but seems like a big increase in scope for this specification and as you probably have seen it is harder to get agreement on these things than one might initially imagine. We could do this but I think the effort would be better spent on test suites or future stricter versions of the specification, and it is also something that one could easily do without it being part of this specification.	Resolved		-	-
27	Other usage examples	BED is heavily used in many contexts, including regulation data. Many groups build off of BED to provide ways of showing multiple data types. Examples, or links to existing documentation, would be useful.	Add examples, or links to existing documentation.	TW			Same response as above, cell G26	Resolved		-	-