AI Projects #5 / Project Ideas (Public)

1		General Topic	Specific Topic	Comments	Paper Links	Team Member

2	Uras Mutlu (EXAMPLE PROJECT)	Computer Vision	Image to Image Translation with GANs	GANs are widely used in image-to-image translation tasks and there are many successful works. We can reproduce the results of the related papers and we can try to improve the results by applying novel methods.
3	EXAMPLE PROJECT FROM AI PROJECTS #3	Anomaly Detection	Anomaly detection using generative models like Autoencoders, Variational Autoencoders and GANs.	Generative models are mostly used in computer vision applications, but they are also good at detecting anomalies. They can be trained in an unsupervised or semi-supervised way. By learning the distribution of the system's normal behavior, they can flag anomalous behavior through its high reconstruction loss. In this project, we will create an anomaly detection model that detects cyber attacks such as DoS, Probing, and U2R. Cyber attacks are the anomalous data in this setting, and we will try to detect them using generative models (cyber security knowledge is not required). Generative models have shown promising results in anomaly detection and are currently state-of-the-art models in this field. We will also compare the performance metrics of these generative models with those of traditional ones. Training autoencoder models in semi-supervised and unsupervised ways and comparing their results is also within the scope of this project. GAN models will be tried if we have enough time. The NSL-KDD dataset may be used for this project. P.S.: other types of anomalous data besides cyber security data can also be used.	https://pdfs.semanticscholar.org/0611/46b1d7938d7a8dae70e3531a00fceb3c78e8.pdf https://ieeexplore.ieee.org/abstract/document/8386760
4		Natural Language Processing	Question Answering and Named Entity Recognition on academic papers using BERT and ELECTRA models + Medical data	Question Answering (QA) is a field within Natural Language Processing (NLP) and Information Retrieval (IR). It aims to quickly answer questions asked by users using the given data. In this context, the data will be the embeddings of the academic papers. We will use BERT and ELECTRA models for paper embeddings and question answering. The same models will also be used for named entity recognition on the academic papers, so the resulting project will also show the named entities found in the papers.	https://arxiv.org/pdf/2003.10555.pdf https://arxiv.org/pdf/1810.04805.pdf https://arxiv.org/pdf/1606.05250.pdf
5		Unsupervised learning and Statistical Models	Predicting the next location, music piece, text etc. using Variable Order Markov Models (VOMM)	Markov chains are used in a broad variety of academic fields, ranging from biology to economics. When predicting the value of an asset, Markov chains can be used to model the randomness. Some works on VOMM are listed in the paper section. The first paper predicts the next human location using clustering techniques and the VOMM algorithm. We can implement this paper, reproduce its results, and try to improve the scores.	- https://dl.acm.org/doi/pdf/10.1145/2676552.2676557 - Dataset for the above paper: https://www.microsoft.com/en-us/download/details.aspx?id=52367&from=https%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2Fb16d359d-d164-469e-9fd4-daa38f2b2e13%2Fdefault.aspx - https://www.researchgate.net/publication/51914402_On_Prediction_Using_Variable_Order_Markov_Models - https://ieeexplore.ieee.org/abstract/document/8104881/
6		Natural Language Processing and Image generation	Logogram language generation using multilingual embedding and pre-trained sketcher model (or maybe pre-trained concept learner - Bayesian program learning)	The main purpose is to build a system that takes a word or primitive sentence as input and generates a sketch (hopefully, like ancient logogram languages or like the alien language in the movie Arrival). Since the embedding is multilingual, equivalents of the input in different languages should (hopefully) map to the same output.	- https://arxiv.org/abs/1808.08933 - https://github.com/brendenlake/omniglot - https://arxiv.org/abs/1904.04399 - https://github.com/brendenlake/BPL - https://arxiv.org/abs/1502.04623
7		Object Detection	Anchor-free approaches in 2D Object Detection.	Object recognition is the task of recognizing and labeling the objects in an image. The main goal of this project is to present a comprehensive study of, and improvements to, 2D object recognition. An object is recognized by extracting its features, such as color, texture, or shape. Based on these features, objects are then classified into various classes, and each class is assigned a name.	CornerNet CenterNet FreeAnchor CornerNet-Lite EfficientDet
8		Music Generation with Attention-based Networks	Music generation from MIDI datasets (may be polyphonic), using models like LSTM, Temporal Convolutional Networks, and Temporal Convolutional Attention-based Networks, and a comparison of them.	Music generation is a natural fit for sequence modeling. Traditional sequential models like Bi-LSTMs have been used for music modeling [1], and Temporal Convolutional Networks and attention-based Temporal Convolutional Networks have been proposed for sequence modeling [2,3]. We can reproduce these sequential attention models for music modeling on polyphonic/homophonic datasets and compare their results.	[1] https://arxiv.org/pdf/1804.07300.pdf - https://arxiv.org/pdf/2002.03854.pdf [2] https://arxiv.org/pdf/1803.01271.pdf [3] https://arxiv.org/pdf/2002.12530.pdf
9		Robot Navigation by Following Natural Language Directions with Deep Reinforcement Learning	By learning the state representation, the attention function, and control policies, a deep reinforcement learning agent learns to execute previously unseen instructions described with a similar vocabulary, and successfully navigates along paths not encountered during training.	Understanding and following directions provided by humans can enable robots to navigate effectively in unknown situations. We present FollowNet, an end-to-end differentiable neural architecture for learning multi-modal navigation policies. FollowNet maps natural language instructions as well as visual and depth inputs to locomotion primitives. FollowNet processes instructions using an attention mechanism conditioned on its visual and depth input to focus on the relevant parts of the command while performing the navigation task. We can try to implement the paper and reproduce its results.	FollowNet: https://arxiv.org/pdf/1805.06150.pdf
10		Speaker Identification	The main topic is recognizing a person's voice. We will investigate whether it can be used as a password and whether it is safe enough for that purpose. We will run further experiments on the achieved success rate.	Speaker identification is one of the main problems in speech processing. It tries to identify a person from voice characteristics. As with many other problems in signal processing, deep learning has accelerated the progress of speaker identification models. We will try to implement a robust speaker identification system and use it to build a user authentication application. We will use public datasets and try to beat state-of-the-art models.	http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ https://arxiv.org/abs/1706.08612 https://ieeexplore.ieee.org/abstract/document/8995509 https://arxiv.org/abs/2005.07817 https://ieeexplore.ieee.org/abstract/document/8721628/
11		Unsupervised Machine Translation	Extending the TransCoder model to create a decompiler	TransCoder translates code from one programming language to another. The authors trained the model to translate between C++, Python and Java, and they state that the model can be generalized to any programming language. We will try to translate assembly to C++ and make improvements to get better results on this task.	TransCoder: https://arxiv.org/pdf/2006.03511v2.pdf
12		Natural Language Processing	Deep learning based speech synthesis	Speech synthesis, also known as text-to-speech (TTS), has attracted increasing attention. Recent advances in speech synthesis are overwhelmingly driven by deep learning, including end-to-end techniques, which have been used to enhance a wide range of application scenarios such as intelligent speech interaction, chatbots, and conversational artificial intelligence.	https://www.researchgate.net/publication/336113314_A_Review_of_Deep_Learning_Based_Speech_Synthesis
13		Computer Vision/Federated Learning	Federated Generative Adversarial Learning	Generative adversarial networks (GANs) have achieved advances in various real-world applications, such as image editing, style transfer, and scene generation. However, like other deep learning models, GANs also suffer from data limitation problems in real cases. To boost the performance of GANs in target tasks, collecting as many images as possible from different sources becomes not only important but essential. Following the configuration of federated learning, we conduct model training and aggregation on one center and a group of clients. Specifically, our method learns the distributed generative models in the clients, while the models trained in each client are fused into one unified and versatile model in the center.	https://arxiv.org/pdf/2005.03793.pdf https://yonetaniryo.github.io/papers/YTHU-CVPRW2019_nologo.pdf https://blog.openmined.org/how-gans-can-cause-a-privacy-breach-in-federated-deep-learning/ https://inst.eecs.berkeley.edu/~cs294-163/fa19/slides/federated-learning-in-practice.pdf
14		Computer Vision (Social Good Project)	Traffic Anomaly Detection	AI City Challenge Track #4: Detecting anomalies in traffic, including wrong turns, wrong driving direction, lane change errors, and all other anomalies, based on video feeds available from multiple cameras at intersections and along highways.
15		Natural Language Processing	Medical Question Answering	Question Answering (QA) is a field within Natural Language Processing (NLP) and Information Retrieval (IR). The QA task aims to give precise and quick answers to questions posed in natural language by using given data or databases. Medical QA is a domain-specific task that remains largely unexplored. We will explore medical QA datasets and other datasets used in Clinical Decision Support Systems (CDSS), and try to propose new methods for giving precise answers to medical questions.	https://www.sciencedirect.com/science/article/abs/pii/S0306457315000515 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530755/ https://www.aclweb.org/anthology/D18-1258/ https://github.com/abachaa/MedQuAD
17		Computer Vision	Plant Pathology Detection	Misdiagnosis of the many diseases impacting agricultural crops can lead to misuse of chemicals leading to the emergence of resistant pathogen strains, increased input costs, and more outbreaks with significant economic loss and environmental impacts. Current disease diagnosis based on human scouting is time-consuming and expensive, and although computer-vision based models have the promise to increase efficiency, the great variance in symptoms due to age of infected tissues, genetic variations, and light conditions within trees decreases the accuracy of detection.	https://arxiv.org/abs/2004.11958 https://www.kaggle.com/c/plant-pathology-2020-fgvc7
18		Computer Vision	Kinship Image Generation	Synthesizing a realistic photo of a child from photos of their parents. GANs are widely used in this area. The Families in the Wild dataset can be used for this purpose. Links to related papers are in the next cell.	https://arxiv.org/pdf/1806.08600.pdf https://link.springer.com/content/pdf/10.1007/s42452-020-1949-3.pdf Another Dataset: http://parnec.nuaa.edu.cn/xtan/data/TSKinFace.html
19		Computer Vision	Action Recognition in videos	We will explore deep learning techniques for video tasks, specifically action recognition, using the UCF101 dataset. The aim is to build a model that recognizes actions in videos. A CNN combined with an LSTM will be the main approach. This project will be based on the book Hands-On Computer Vision with TensorFlow 2, for which a GitHub repo is available. We can then try to improve the model.	UCF101 dataset - https://www.crcv.ucf.edu/data/UCF101.php which was put together by K. Soomro et al. (refer to UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild, CRCV-TR-12-01, 2012) GitHub repo - https://github.com/PacktPublishing/Hands-On-Computer-Vision-with-TensorFlow-2
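
Note on row 3 (Anomaly Detection): the idea of flagging anomalies by their reconstruction loss can be illustrated with a very small sketch. This is only an assumption-laden toy, not the project's method: PCA (computed with NumPy's SVD) stands in for a trained autoencoder, the data is synthetic rather than NSL-KDD, and the function names and the 99th-percentile threshold are hypothetical choices made for illustration.

```python
import numpy as np


def fit_linear_autoencoder(normal_data, n_components):
    """Fit a PCA projection (a linear stand-in for an autoencoder) on normal data only."""
    mean = normal_data.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(normal_data - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(data, mean, components):
    """Mean squared error between each sample and its low-dimensional reconstruction."""
    centered = data - mean
    reconstructed = centered @ components.T @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)


rng = np.random.default_rng(0)

# Synthetic "normal" behavior: 2 latent factors mapped to 10 features, plus small noise.
mixing = rng.normal(size=(2, 10))
normal_train = rng.normal(size=(500, 2)) @ mixing + 0.05 * rng.normal(size=(500, 10))
normal_test = rng.normal(size=(100, 2)) @ mixing + 0.05 * rng.normal(size=(100, 10))

# Synthetic "anomalies": samples that do not follow the low-rank structure.
anomalies = rng.normal(scale=3.0, size=(100, 10))

mean, components = fit_linear_autoencoder(normal_train, n_components=2)

# Hypothetical rule: flag anything above the 99th percentile of the training errors.
threshold = np.percentile(reconstruction_error(normal_train, mean, components), 99)

normal_flags = reconstruction_error(normal_test, mean, components) > threshold
anomaly_flags = reconstruction_error(anomalies, mean, components) > threshold

print(f"false positive rate: {normal_flags.mean():.2f}")
print(f"detection rate: {anomaly_flags.mean():.2f}")
```

In the actual project, the PCA step would be replaced by an autoencoder, VAE, or GAN trained on normal traffic, and the threshold would be tuned on validation data.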