What we have now: CNN+RNN (LSTM). Paper list: zhjohnchan/awesome-radiology-report-generation
Each entry below lists: task/approach, dataset, code, paper.
1. Co-attention (CNN+RNN)
   Approach: a CNN (VGG-19) extracts image features and predicts tags; co-attention then matches the image features with the semantic (tag) features. The words of each sentence are generated by an LSTM.
   Dataset: chest X-ray images; approximately 3,955 patients' text reports and 7,471 lateral and frontal X-ray views. Most reports have lengths of 0-5, with a smaller number over 10.
   Code: a PyTorch implementation of "On the Automatic Generation of Medical Imaging Reports".
   Paper: https://arxiv.org/pdf/1711.08195.pdf
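The co-attention idea above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: visual region features and tag-embedding (semantic) features are each attended over using the decoder state, and the two contexts are fused. All names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(V, S, h):
    """Fuse visual (V: n_regions x d) and semantic (S: n_tags x d)
    features into one context vector, conditioned on a decoder
    hidden state h of shape (d,)."""
    a_v = softmax(V @ h)   # attention weights over image regions
    a_s = softmax(S @ h)   # attention weights over predicted tags
    ctx_v = a_v @ V        # (d,) visual context
    ctx_s = a_s @ S        # (d,) semantic context
    return ctx_v + ctx_s   # joint context fed to the sentence LSTM

rng = np.random.default_rng(0)
V = rng.normal(size=(49, 16))  # e.g. a 7x7 grid of CNN features
S = rng.normal(size=(5, 16))   # embeddings of 5 predicted tags
h = rng.normal(size=16)
ctx = co_attention(V, S, h)
print(ctx.shape)  # (16,)
```

In the actual model the two contexts are combined by learned layers rather than a plain sum; the key point is that both modalities contribute to each decoding step.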
2. Transformer decoder (visual attention)
   Approach: features are extracted from the image and passed to the cross-attention layers of a Transformer decoder.
   Dataset: Flickr8k
   Code: "Image captioning with visual attention" tutorial
   Paper: https://arxiv.org/pdf/1502.03044.pdf
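The cross-attention step that connects the image features to the decoder can be sketched as scaled dot-product attention where the queries come from text tokens and the keys/values from image regions. A NumPy sketch with illustrative shapes, not the tutorial's code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Q, K, V):
    """Q: (n_tokens, d) queries from the text being generated;
    K, V: (n_regions, d) keys/values from image features.
    Returns one attended vector per text token."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n_tokens, n_regions)
    weights = softmax(scores, axis=-1)   # each token attends over regions
    return weights @ V                   # (n_tokens, d)

rng = np.random.default_rng(1)
Q = rng.normal(size=(7, 32))    # 7 partially generated tokens
K = rng.normal(size=(49, 32))   # 7x7 grid of image features
out = cross_attention(Q, K, K)  # V = K here for simplicity
print(out.shape)  # (7, 32)
```

A real decoder adds learned query/key/value projections, multiple heads, and causal self-attention before this step.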
3. RATCHET (Transformer decoder)
   Approach: RATCHET (RAdiological Text Captioning for Human-Examined Thoraxes) is a medical Transformer for chest X-ray diagnosis and reporting, based on the architecture in "Attention Is All You Need"; the model is a transformer-based CNN-RNN.
   Dataset: trained and validated on the MIMIC-CXR v2.0.0 dataset.
   Code: farrell236/RATCHET
   Paper: [2107.02104] RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting
4. Weakly Supervised Contrastive Learning (WCL)
   Approach: a ResNet-50 convolutional neural network is used as the feature extractor. The embedding of each report is extracted with CheXbert, and K-Means clusters the reports into K groups; the clusters guide the contrastive-learning process during training. The memory-driven Transformer proposed in (Chen et al., 2020b) is used as the backbone model.
   Code: zzxslp/WCL (EMNLP 2021)
   Paper: Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation
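The clustering step above can be sketched as follows, assuming each report has already been embedded into a fixed-size vector (e.g. by CheXbert). A plain-NumPy K-Means stands in for a library implementation; cluster ids then define which reports count as positives for the contrastive loss. Everything here is an illustrative assumption, not the WCL code.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Toy K-Means: X is (n_reports, dim) report embeddings."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each embedding to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # recompute centers; keep the old center if a cluster is empty
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(0)
# two well-separated blobs standing in for report embeddings
X = np.vstack([rng.normal(0, 0.1, (10, 8)), rng.normal(5, 0.1, (10, 8))])
labels = kmeans(X, k=2)
# reports sharing a cluster id are treated as positive pairs in training
```

The weak supervision comes from these pseudo-labels: no radiologist annotation is needed to decide which report pairs should be pulled together.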
5. Memory-driven Transformer decoder (R2Gen)
   Approach: a ResNet is used as the feature extractor, and a standard Transformer encoder as the encoder. The main contribution is the memory-driven decoder: memory-driven conditional layer normalization and a relational memory are added on top of the base Transformer.
   Datasets: MIMIC-CXR and IU chest X-ray
   Code: zhjohnchan/R2Gen (EMNLP 2020)
   Paper: Generating Radiology Reports via Memory-driven Transformer
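The memory-driven conditional layer normalization (MCLN) idea can be sketched as layer norm whose scale and shift are predicted from a memory vector instead of being fixed learned constants. The weight matrices below are random stand-ins for learned parameters; this is an illustrative sketch, not R2Gen's code.

```python
import numpy as np

def mcln(x, memory, W_gamma, W_beta, eps=1e-5):
    """x: (d,) hidden state; memory: (m,) relational-memory readout.
    gamma/beta are predicted from the memory, so the normalization is
    conditioned on patterns the memory has stored across reports."""
    gamma = 1.0 + memory @ W_gamma   # (d,) conditional scale
    beta = memory @ W_beta           # (d,) conditional shift
    mu, var = x.mean(), x.var()
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(2)
d, m = 16, 8
x = rng.normal(size=d)
memory = rng.normal(size=m)
W_gamma = 0.01 * rng.normal(size=(m, d))
W_beta = 0.01 * rng.normal(size=(m, d))
y = mcln(x, memory, W_gamma, W_beta)
print(y.shape)  # (16,)
```

With zero projection weights this reduces to plain layer normalization, which makes the memory conditioning an additive refinement rather than a structural change.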
6. Multi-Attention and Incorporating Background Information Model
   Approach: the architecture consists of four sub-modules: a multi-attention module, a sentence-RNN module, a background-information fusion module, and a word-RNN module. The attention layers operate on the image features.
   Dataset: IU chest X-ray
   Paper: Multi-Attention and Incorporating Background Information Model for Chest X-Ray Image Report Generation (IEEE Journals & Magazine)
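The sentence-RNN / word-RNN split is the standard hierarchical decoding pattern: the sentence RNN emits one topic vector per sentence from the fused image context, and a word RNN expands each topic into words. A toy NumPy sketch with random matrices standing in for trained weights; dimensions and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
W_topic = rng.normal(size=(d, d))  # stand-in for sentence-RNN weights
W_word = rng.normal(size=(d, d))   # stand-in for word-RNN weights

def sentence_rnn(ctx, n_sentences):
    """Unroll a toy recurrence over the fused image context `ctx`,
    emitting one topic vector per sentence."""
    h, topics = np.zeros(d), []
    for _ in range(n_sentences):
        h = np.tanh(W_topic @ (h + ctx))
        topics.append(h)
    return topics

def word_rnn(topic, n_words):
    """Expand one topic vector into a sequence of word states."""
    h, words = topic, []
    for _ in range(n_words):
        h = np.tanh(W_word @ h)
        words.append(h)
    return words

ctx = rng.normal(size=d)  # fused multi-attention + background context
report = [word_rnn(t, n_words=3) for t in sentence_rnn(ctx, n_sentences=2)]
print(len(report), len(report[0]))  # 2 3
```

In the paper's model the background-information fusion module would modify `ctx` before the sentence RNN runs, and each word state would be projected to vocabulary logits.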
7. Lesion-Centric Feature Extractor
   Approach: extracts features from bounded lesion regions using an R-CNN. Lexical embeddings of novel diseases guide the visual-feature learning; the embeddings come from a pretrained word-embedding model such as BioBERT, which covers both seen and novel diseases. For generation, the lesion-guided visual features are merged with the semantic features; the purpose of merging in semantic features is to handle novel diseases.
   Dataset: FFA-IR, with annotations for 46 lesion categories covering 315 cases and 12,166 lesion regions. (An annotated dataset; we may try labeling the IU chest data via Weakly Supervised Contrastive Learning.)
   Paper: https://arxiv.org/abs/2210.02270
8. PubMedBERT decoder
   Approach: a ViT encoder with a PubMedBERT decoder; beam search is also used. PubMedBERT is pretrained exclusively on medical-domain resources. The model is first pretrained on ROCO, then fine-tuned on the task dataset with Teacher Forcing and Self-Critical Sequence Training. (We cannot train the model from scratch; consider this fine-tuning approach.)
   Dataset: ImageCLEFmed Caption task 2021
   Paper: https://ceur-ws.org/Vol-2936/paper-109.pdf
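The beam-search decoding mentioned above can be sketched in plain Python. The scoring function here is a hypothetical stand-in for the per-step token log-probabilities a ViT + PubMedBERT decoder would produce; only the search procedure itself is the point.

```python
import math

def step_log_probs(prefix, vocab_size=4):
    """Toy deterministic 'model': token (last + 1) % vocab_size is most
    likely. A real decoder would condition on image features too."""
    last = prefix[-1]
    scores = [1.0 if t == (last + 1) % vocab_size else 0.1
              for t in range(vocab_size)]
    z = sum(scores)
    return [math.log(s / z) for s in scores]

def beam_search(start_token, steps, beam_width=2, vocab_size=4):
    beams = [([start_token], 0.0)]  # (prefix, cumulative log-prob)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for tok, lp in enumerate(step_log_probs(prefix, vocab_size)):
                candidates.append((prefix + [tok], score + lp))
        # keep only the best `beam_width` partial sequences
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

print(beam_search(0, steps=3))  # [0, 1, 2, 3]
```

Beam width trades decoding cost against how many alternative report phrasings are kept alive at each step; width 1 reduces to greedy decoding.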
9. Transformer
10. Learning to Generate Clinically Coherent Chest X-Ray Reports
    Approach: image features come from DenseNet-121 CNN layers; word embeddings from Word2Vec. A report is differentiably sampled from the model and clinical observations are extracted from it; these observations serve as an additional learning objective used for fine-tuning.
    Dataset: MIMIC-CXR
    Code: justinlovelace/coherent-xray-report-generation
    Paper: Learning to Generate Clinically Coherent Chest X-Ray Reports (ACL Anthology)
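Sampling a report "differentiably" is commonly done with the Gumbel-softmax trick: add Gumbel noise to the next-token logits and take a temperature-controlled softmax, giving a nearly one-hot but differentiable sample that gradients can flow through. This is a generic NumPy sketch of that trick, not the paper's implementation.

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Return a soft one-hot sample over the vocabulary. Lower tau
    pushes the output closer to a hard one-hot vector."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))        # Gumbel(0, 1) noise
    y = (logits + g) / tau
    e = np.exp(y - y.max())
    return e / e.sum()             # valid probability vector

rng = np.random.default_rng(3)
logits = np.array([2.0, 0.5, 0.1, -1.0])  # toy next-word logits
sample = gumbel_softmax(logits, tau=0.5, rng=rng)
```

In the clinical-coherence setup, the sampled report can then be scored by an observation extractor and the resulting loss backpropagated through the soft sample.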