BetterUp CANDOR Corpus Data Dictionary

	A	B	C
1	file	data release flavor	description
2	cb18ac98-a58c-4599-8bc8-461dab667322	--	The folder name is a random unique ID used to identify a specific conversation.
3	├── audio_video_features.csv	no_media	Acoustic and visual features calculated at a 1-second frequency throughout the conversation.
4	├── metadata.json	no_media	Metadata for the conversation including speaker IDs, audio channels, and raw recording information.
5	├── processed	processed_media	Folder containing processed video files, audio files, and metadata.
6	│ ├── 57272517e386b9000e3e2b72.mp4	processed_media	Processed video for a single speaker (file name is <user_id>.mp4).
7	│ ├── 5cb60f1eefd4130001bbfdb7.mp4	processed_media	Processed video for the other speaker (file name is <user_id>.mp4).
8	│ ├── cb18ac98-a58c-4599-8bc8-461dab667322.mp3	processed_media	Processed audio file of the conversation containing both speakers, one in the left channel and the other in the right channel (see channel_map.json). File name is <conversation_id>.mp3.
9	│ ├── cb18ac98-a58c-4599-8bc8-461dab667322.mp4	processed_media	Processed video file of the conversation containing both speakers side-by-side; audio for one in the left channel and the other in the right channel (see channel_map.json). File name is <conversation_id>.mp4.
10	│ ├── channel_map.json	processed_media	Metadata that maps each user ID to the left or right audio channel of the processed video and audio files, and to the side of the frame on which that speaker appears in the side-by-side processed video.
11	│ └── thumbnail.png	processed_media	An image thumbnail of the processed video.
12	├── raw	raw_media	Folder containing the raw recordings of the conversation (a separate video file for each speaker); there may be more than two video files if a speaker disconnected and rejoined the conversation.
13	│ ├── 1c5b8b8e-4bf3-47b7-bb02-b20132edb41e.mkv	raw_media	Raw recording stream file of a single speaker. Associated metadata (user ID and start/stop times) is in raw/<conversation_id>.json. File name is <stream_id>.mkv.
14	│ ├── 97ee4cbb-b70b-4fb4-8d64-a468efc560e0.mkv	raw_media	Raw recording stream file of a single speaker. If there are two files, this is the other speaker. If there are more than two, it could be either speaker. Associated metadata (user ID and start/stop times) is in raw/<conversation_id>.json. File name is <stream_id>.mkv.
15	│ └── cb18ac98-a58c-4599-8bc8-461dab667322.json	raw_media	Raw metadata for each of the stream files in the raw folder.
16	├── survey.csv	no_media	Answers to pre- and post-conversation survey questions by the participants in the conversation (if available).
17	├── transcription	no_media	Transcription of the conversation.
18	│ ├── transcribe_output.json	no_media	The raw JSON output returned from the AWS Transcribe service for this conversation; contains token-level timings and confidences.
19	│ ├── transcript_audiophile.csv	no_media	Transcript of the conversation compiled from transcribe_output.json into a CSV with a row for each turn in the conversation.
20	│ ├── transcript_backbiter.csv	no_media	Transcript of the conversation using the 'backbiter' method as described in the paper.
21	│ └── transcript_cliffhanger.csv	no_media	Transcript of the conversation using the 'cliffhanger' method as described in the paper.
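
The layout above can be checked programmatically. The following is a minimal Python sketch, not part of the corpus tooling: the names EXPECTED_FILES and missing_files are hypothetical, and it assumes only the fixed file names listed in the table. Per-speaker files (<user_id>.mp4, <stream_id>.mkv) have variable names and are not checked, and survey.csv may legitimately be absent because it is provided only if available.

```python
from pathlib import Path

# Fixed-name files from the table above, grouped by data release flavor.
# "{conversation_id}" is filled in from the folder name.
EXPECTED_FILES = {
    "no_media": [
        "audio_video_features.csv",
        "metadata.json",
        "survey.csv",  # only present "if available"
        "transcription/transcribe_output.json",
        "transcription/transcript_audiophile.csv",
        "transcription/transcript_backbiter.csv",
        "transcription/transcript_cliffhanger.csv",
    ],
    "processed_media": [
        "processed/channel_map.json",
        "processed/thumbnail.png",
        "processed/{conversation_id}.mp3",
        "processed/{conversation_id}.mp4",
    ],
    "raw_media": [
        "raw/{conversation_id}.json",
    ],
}


def missing_files(conversation_dir):
    """Return {flavor: [relative paths not found]} for one conversation folder."""
    root = Path(conversation_dir)
    conversation_id = root.name  # the folder name is the conversation ID
    report = {}
    for flavor, templates in EXPECTED_FILES.items():
        paths = [t.format(conversation_id=conversation_id) for t in templates]
        report[flavor] = [p for p in paths if not (root / p).is_file()]
    return report
```

Calling missing_files("cb18ac98-a58c-4599-8bc8-461dab667322") on a no_media download would be expected to report every processed_media and raw_media path as missing, which is a quick way to confirm which flavor a local copy contains.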