|Presentation content now available for download:||https://monash.figshare.com/projects/Monash_eResearch_Machine_Learning_Symposium_2018/30788|
|Session chair||Paul Bonnington|
|9:00 AM||Keynote||Geoff Webb||Monash University Centre for Data Science, Director||Machine Learning: A Foundational Tool for Twenty-First Century Research||Machine Learning is revolutionising business, industry, government and social interaction. It is also revolutionising research, becoming a foundational tool in the age of Big Data. This talk introduces machine learning, and provides examples of how it can support advanced research.|
Geoff Webb is a Professor in the Monash Faculty of Information Technology, Director of the Monash Centre for Data Science and Scientific Director of the Monash eResearch Centre. He is a technical advisor to data science startups BigML and FROMMLE. He was Editor-in-Chief of the premier data mining journal, Data Mining and Knowledge Discovery (2005 to 2014), and Program Committee Chair of the two top data mining conferences, ACM SIGKDD (2015) and IEEE ICDM (2010), as well as General Chair of ICDM (2012). Many of his learning algorithms are included in the widely used BigML, R and Weka machine learning workbenches. He is an IEEE Fellow, and his many awards include the inaugural Eureka Prize for Excellence in Data Science in 2017.
|9:40 AM||Language & Text||Gholamreza (Reza) Haffari||Monash University, Senior Lecturer in the Faculty of Information Technology||Training Deep Neural Networks with Minimal Supervision: The case of Language Understanding and Generation||Language Technology is being revolutionised by deep neural networks. In this talk, I cover some of our recent work in multilingual text understanding and generation. More specifically, we will revisit deep encoder-decoder architectures for translation, whereby the encoder 'reads' text in a source language, and the decoder 'generates' its translation in the target language. Although these deep neural architectures are powerful tools for modality transformation, they are notorious for their data hungriness. In this talk, I will describe our efforts in addressing the challenges of learning these architectures in scarce data scenarios.|
|10:00 AM||Language & Text||Wray Buntine||Monash University, Professor in the Faculty of Information Technology||Probabilistic Models for Learning||Text, emails, social media content, links and other so-called semi-structured data is a significant part of commercial data. An example application is to recommend items for purchase based on your purchase history, your profile, item descriptions, etc. We develop probabilistic models for both prediction and summarisation of semi-structured data.|
|10:20 AM||Industry||Jonathan Chang||Silverpond, Managing Director||Deep Learning: Applications for Human Perception and how to get started||Jonathan will be presenting on deep learning applications across various industries including asset management, power, sports and medical. He will dive into a case study on how Deep Learning is being applied to automate human perception and explore the difference it's made to the organisation.|
|10:40 AM||HPC Applications||Chris Watkins||CSIRO, Applications Support Specialist||A Classification Pipeline for Protein Crystallisation Imaging||The hullabaloo surrounding the recent successes of deep learning in image processing is often not matched by the results obtained on real datasets. Functional accuracies and reliable predictions are possible, but not without a mindful approach to pipeline development.|
Building machine learning pipelines that combine the various technologies available to today’s data scientist in a robust and repeatable manner is the core requirement when deploying automated image processing software. But with so many options, how can we ensure our data pipeline is accurate and that our deep tech is reliable?
Chris will give an overview of the development of a protein crystal image classification pipeline that has been autonomously deployed at CSIRO’s Collaborative Crystallisation Centre. Chris will speak about the practicalities of training data and touch briefly on the problem of continuous learning for improving model accuracy over time.
|11:00 AM||Morning Tea|
|Session chair||Lance Wilson|
|Jason Rigby||Monash University, ASPREE Data Science Coordinator, School of Public Health and Preventive Medicine||Making health data reportable: using AI to encode free-text drug records||Concomitant medications (conmeds) are routinely collected as part of a clinical trial, bringing greater context to observed outcomes and potentially revealing events of interest that could be missed through other means. This type of data is often collected in free-text form without reference to a master drug list. Consequently, the captured drug data is nonstandard and contains spelling errors, extraneous information and a mixture of trade and generic drug names. We show how a simple neural network model trained on freely available drug data can assist humans in rapidly coding manually entered drug names to Anatomical Therapeutic Chemical (ATC) classification codes, an accepted standard for encoding a drug ingredient in terms of its target organ or system and therapeutic effect. In its present form, this system will be applied to aid the manual drug coding process and finalise our conmed dataset for the ASPirin in Reducing Events in the Elderly (ASPREE) clinical trial.|
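As an illustrative sketch only (not the ASPREE team's actual neural network), the core task of coding noisy free-text entries against a master drug list can be shown with a character-bigram similarity matcher; the three-drug master list below is hypothetical in scope, though the ATC codes shown are real WHO codes.

```python
from collections import Counter

def char_bigrams(s):
    """Character-bigram profile of a string, with boundary markers."""
    s = f"#{s.lower().strip()}#"
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def similarity(a, b):
    """Cosine similarity between two character-bigram profiles."""
    ca, cb = char_bigrams(a), char_bigrams(b)
    dot = sum(ca[g] * cb[g] for g in ca)
    norm = (sum(v * v for v in ca.values()) ** 0.5) * \
           (sum(v * v for v in cb.values()) ** 0.5)
    return dot / norm if norm else 0.0

# Illustrative master list: canonical drug name -> WHO ATC code
MASTER = {
    "aspirin": "B01AC06",
    "paracetamol": "N02BE01",
    "atorvastatin": "C10AA05",
}

def code_drug(free_text):
    """Match a noisy free-text entry to the closest canonical drug name."""
    best = max(MASTER, key=lambda name: similarity(free_text, name))
    return best, MASTER[best]

print(code_drug("asprin 100mg"))  # tolerates the misspelling and dose suffix
```

A neural model learns a richer version of this mapping from data, but the bigram matcher already tolerates the spelling errors and extraneous dose information described above.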
|Dhanya Nambiar||Monash University, Research Fellow in the Pre-hospital Emergency Care Australia & New Zealand Centre for Research Excellence||Machine learning: Patterns and pathways to link administrative health records||Linking routinely collected administrative health records has the potential to significantly increase our understanding of diseases, health outcomes, health service utilisation and health expenditure. State and national databases contain health records from general practice visits, emergency department presentations, hospital admissions, outpatient care and elective surgery lists. To protect patient privacy and ensure information security, these databases are created and stored in silos, with no unique patient identifiers to link them together and describe the patient's journey through the health system.|
There are no standard protocols for linking disparate databases. While some of this gap is due to limitations in the quality and comparability of datasets, there is also a lack of automated processes for data cleaning and sequencing once patient pathways have been identified. Artificial intelligence provides an opportunity to maximise the use and capacity of administrative health records in research.
|11:25 AM||Health||Melita Giummarra||Monash University, ARC DECRA Research Fellow in the Department of Epidemiology and Preventive Medicine||The utility of machine learning and text mining to expedite systematic reviews in injury recovery research||Systematic reviews are an enormously valuable method for understanding the level of scientific evidence for a specific problem. However, the exponential rate of publication poses a major barrier to our capacity to conduct and update high-quality systematic reviews in a timely manner. Several machine learning text mining tools have been developed to address this problem. Abstrackr, hosted at Brown University, USA, is one such tool: a free web-based platform that uses an active learning algorithm to generate predictions of relevance from the words in citation titles, abstracts and keywords (using unigrams and bigrams). Abstrackr then sorts citations according to relevance, allowing researchers to quickly identify relevant articles and reducing the need to screen articles with very low relevance. Previous studies have shown that Abstrackr reduces the burden of conducting and updating systematic reviews in specific health topics (e.g., genetics) without compromising sensitivity and specificity in identifying relevant citations for full-text review.|
We used Abstrackr to support screening in a systematic review examining the role of fault attributions in recovery from transport injury. A comprehensive search of five databases identified 10,559 citations. Two reviewers screened citations: one screened every citation for relevance (the “gold standard” method), and the second rated citations until a stopping prediction rule was met (no new predictions in Abstrackr). An overview of our experience using Abstrackr and text mining for health research will be discussed, focusing on the workload efficiencies, precision and false positives observed with machine-learning-assisted screening.
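A minimal sketch of the active-learning idea behind tools like Abstrackr (not its actual algorithm): a simple classifier is fitted on the citations the reviewer has already labelled, then the unscreened pool is re-ranked so likely-relevant items surface first. The citations and vocabulary below are hypothetical.

```python
import math

def featurise(title, vocab):
    """Bag-of-words features over a tiny fixed vocabulary."""
    words = title.lower().split()
    return [1.0 if w in words else 0.0 for w in vocab]

def train(X, y, epochs=200, lr=0.5):
    """Plain logistic regression fitted by gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, xi)))))
            g = p - yi  # gradient of the log loss
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(w, b, x):
    return 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, x)))))

# Hypothetical screening state: citations the reviewer has already labelled
# (1 = relevant to the review, 0 = irrelevant) and an unscreened pool.
vocab = ["injury", "recovery", "fault", "genetics", "crystal"]
labelled = [("fault attribution and injury recovery", 1),
            ("recovery after transport injury", 1),
            ("protein crystal imaging", 0),
            ("genetics of crop yield", 0)]
pool = ["fault and recovery outcomes", "crystal growth kinetics"]

w, b = train([featurise(t, vocab) for t, _ in labelled],
             [lab for _, lab in labelled])

# Re-rank the unscreened pool so likely-relevant citations surface first,
# mirroring how active-learning screeners cut reviewer workload.
ranked = sorted(pool, key=lambda t: -predict(w, b, featurise(t, vocab)))
print(ranked)
```

In a real screener this retrain-and-rerank loop repeats after every batch of reviewer decisions, which is what allows a stopping rule like "no new predicted-relevant citations" to be applied.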
|11:45 AM||Health||Michelle Ananda-Rajah||Alfred Health, Consultant Physician General Medicine & Infectious Diseases||Closing the gap in the detection and diagnosis of fungal infections in patients with blood cancers using a machine learning based platform technology||Invasive fungal infections cause a life-threatening pneumonia in patients with impaired immunity. Hospitals spend millions of dollars on drugs to manage these infections but are unaware of the types of infections affecting their patients and their outcomes. Surveillance of fungal infections is not occurring in hospitals because fungi infrequently grow in the laboratory and manual surveillance is onerous. As a result, clinicians and hospitals cannot evaluate the effectiveness of preventative efforts, outbreaks may go unnoticed and tailoring therapy according to risk is hampered by the lack of large datasets for a rare disease. Variation is common in radiologist reporting affecting patient care and clinical trials. Our machine learning based platform technology incorporating natural language processing, deep learning based image recognition and the integration of clinical data in an expert system can address these performance gaps with benefits to patients, hospitals, clinicians and trial sponsors.|
|12:05 PM||Health||Richard Bassed||Monash University, Associate Professor of Forensic Medicine||Optimising image and PMCT databases for research at the Victorian Institute of Forensic Medicine||The Victorian Institute of Forensic Medicine (VIFM) is tasked with performing the medical investigations and ancillary tests relating to all deaths reported to the Victorian State Coroner. As a part of this investigation process various data is collected, including photographic imagery and CT scan data for every case. Part of the remit for the VIFM is to conduct forensic medical research: learning from the dead to benefit the living.|
The VIFM now holds one of the world's largest post-mortem computed tomography (PM-CT) databases, containing more than 70,000 full-body CT scans, with multiple high-resolution optical photographs associated with each case. This PM-CT database sits alongside a case management system (iCMS) that can currently be searched for keyword causes of death.
The opportunities for applying techniques such as deep learning to answer important research questions are extensive and have important medico-legal consequences.
The VIFM is looking for expressions of interest from research groups that can partner in grant applications to provide PhD students and post-doctoral candidates to help use artificial intelligence to solve image analysis, classification and measurement problems associated with our post-mortem databases.
|Session chair||Wojtek Goscinski|
|Pouya Faridi||Monash University, Immunoproteomics Laboratory, School of Biomedical Sciences||Solving the spliced-peptides mystery by using machine learning techniques||Peptides bound to Human Leukocyte Antigens (p-HLA) and presented on the cell surface act as messengers, reporting what is happening inside the cell to the immune system. p-HLA derive from the digestion of intracellular proteins. If a peptide sequence is presented from a mutated region of the proteome (which could be the cause of a cancer), the immune system recognises the peptide on the cell surface and kills the tumour cell. For a long time, scientists believed that HLA peptides derive simply from proteins being cut into small peptides by an enzyme complex called the proteasome. However, it has recently been discovered that the proteasome not only cuts proteins into peptides but also pastes the resulting small peptides together, creating new peptides known as “spliced peptides” that have no template in the proteome. Understanding the rules of this cutting and pasting, and predicting spliced-peptide sequences, is a critical challenge in cancer vaccine design. We believe that, using our empirical data and machine learning techniques, it is possible to discover the splicing rules and ultimately predict spliced-peptide sequences.|
|Liah Clark||Monash University, School of Biomedical Sciences||Identification and validation of public neoantigen targets||In a survey of the human mutation database, we have identified a mechanism, which we term ‘structural capacitance’, that results in the de novo generation of microstructure in previously disordered regions. These elements of protein microstructure are implicated in the pathogenesis of a wide range of human diseases, including cancer, and can generate neoantigens: epitopes that break immune tolerance. The aim of this project is to use machine learning to identify novel neoantigens from known somatic mutations and characterise their biochemical, biophysical and structural properties, paving the way for the development of neoantigen-focused immunotherapies, including targeted antibodies and cellular therapies.|
|1:05 PM||Medical Imaging|
|Sazzad Hossain||Monash University, Faculty of Information Technology||Application of Transfer Learning in Medical Imaging||State-of-the-art prostate boundary identification relies on manual segmentation of the prostate gland in MRI images by skilled radiologists, which is painstaking and time-consuming as they inspect the images slice by slice. In recent years, deep neural networks have been used to segment images automatically into particular classes, a task known as semantic segmentation. The literature suggests that a properly trained deep neural network can yield higher accuracy than other algorithms for semantic segmentation. This research therefore adopts a pre-trained deep neural network (VGG19) to perform semantic segmentation of MRI slices, automatically localising the prostate gland and potentially building a 3D model of it for better diagnosis.|
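As a hedged illustration of the transfer-learning idea (a toy stand-in, not VGG19 or this project's pipeline): the pretrained encoder's filters stay frozen, and only a small segmentation head is fitted on top of the extracted features; here least squares stands in for backpropagation, for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained encoder (e.g. VGG19's convolutional layers):
# fixed, *frozen* filters mapping each pixel's 3x3 neighbourhood to 9 features.
FROZEN_FILTERS = rng.normal(size=(9, 9))

def encode(img):
    """Extract per-pixel features with the frozen encoder (never trained here)."""
    h, w = img.shape
    feats = np.zeros((h - 2, w - 2, 9))
    for i in range(h - 2):
        for j in range(w - 2):
            feats[i, j] = img[i:i + 3, j:j + 3].reshape(9) @ FROZEN_FILTERS
    return feats

# Toy "MRI slice": a bright square (the structure to segment) on a dark background
img = np.zeros((12, 12))
img[4:9, 4:9] = 1.0
mask = (img[1:-1, 1:-1] > 0.5).astype(float)  # per-pixel ground-truth labels

# Transfer-learning step: fit only the lightweight segmentation head on top of
# the frozen features (least squares in place of backprop, for brevity).
F = encode(img).reshape(-1, 9)
head, *_ = np.linalg.lstsq(F, mask.reshape(-1), rcond=None)

pred = (F @ head > 0.5).reshape(mask.shape)
print("pixel accuracy:", (pred == mask.astype(bool)).mean())
```

The appeal for small medical datasets is exactly this split: the expensive representation is inherited from a model trained elsewhere, and only the small task-specific head needs the scarce labelled MRI slices.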
|1:10 PM||Medical Imaging||Zhaolin Chen||Monash Biomedical Imaging, Head of Imaging Analysis Team||Deep learning in Magnetic Resonance and PET imaging||Recently, machine learning algorithms have become the state-of-the-art methods in image classification. These methods are mostly driven by advances in a specific type of artificial neural network: convolutional neural networks. Medical image analysis is an emerging area for the application of deep learning. In this talk, I will introduce some applications of deep learning in medical imaging, specifically Magnetic Resonance (MR) imaging. The talk will cover applications of deep learning to brain tumour segmentation, MR/PET attenuation correction and MR image artefact correction.|
|1:30 PM||Physiology||Hsin-Hao Yu||Monash University, Research Fellow in the Department of Physiology||The visual cortex as a deep neural network||I will discuss two developments in my lab in Physiology that aim to bring neuroscience and machine learning closer. I will describe an effort to examine the coding of visual information in the brain, and the computational methods used to estimate the "convolutional filters" of higher-order areas of the visual cortex. In addition, I will present initial results from a project that models the visual cortex as a deep neural network, using an architecture in which feedforward and feedback connections interact to achieve efficient information processing.|
|1:50 PM||Computer Vision||Tom Drummond||Monash University, Professor in the Department of Electrical and Computer Systems Engineering||Machine Learning for Robotic Vision||Machine learning is a crucial enabling technology for robotics, in particular for unlocking the capabilities afforded by visual sensing. This talk will present research within Prof Drummond’s lab that explores how machine learning can be developed and used within the context of Robotic Vision.|
|2:10 PM||Computer Vision||Roby Santoro||Monash University, Institute of Transport Studies||Autonomous Car Map-Building and Localisation||State-of-the-art autonomous vehicles build a detailed 3D map of the environment in order to localise themselves and track other objects. By using TensorFlow’s object detection API in combination with a stereo camera, a variety of objects including traffic signs, cars and pedestrians can be detected in 2D images and placed within the 3D environment. This presentation will demonstrate the flexibility of TensorFlow’s object detection API, and how it can easily be integrated into almost any robotics project using the ROS framework.|
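A minimal sketch of the geometry that lets a 2D detection be placed in the 3D environment: once stereo matching gives a disparity for a detected box, depth follows from z = f·B/d under the standard pinhole stereo model, and the pixel back-projects to a camera-frame 3D point. All camera parameters below are illustrative, not from the presenter's system.

```python
def box_to_3d(cx, cy, disparity, fx, fy, ppx, ppy, baseline):
    """Back-project the centre of a detected 2D bounding box into camera-frame
    3D coordinates, using the standard pinhole stereo model z = f*B/d."""
    z = fx * baseline / disparity   # depth (m) from disparity (px)
    x = (cx - ppx) * z / fx         # back-project pixel offsets to metres
    y = (cy - ppy) * z / fy
    return x, y, z

# A detection centred at pixel (700, 380) with 20 px disparity, for a camera
# with 700 px focal length, principal point (640, 360) and 0.12 m baseline
point = box_to_3d(cx=700, cy=380, disparity=20,
                  fx=700, fy=700, ppx=640, ppy=360, baseline=0.12)
print(point)  # z = 700 * 0.12 / 20 = 4.2 m in front of the camera
```

In a ROS pipeline, a point like this would typically be published in the camera frame and transformed into the map frame for tracking.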
|2:30 PM||Afternoon Tea|
|Session chair||Steve Quenette|
|2:40 PM||Computer Vision||Matt Kelcey||ThoughtWorks, Machine Learning Principal||Efficient robotic grasping using simulation and domain adaptation||Data collection for training robotic grasping controllers is expensive in both time and money. Methods that make use of simulated data are very appealing as they reduce this expense dramatically, but often fail to generalise to a real-world environment. GraspGAN is an application of pixel-level domain adaptation that can generate synthetic training data good enough to reduce the amount of real-world data required by a factor of 50.|
|3:00 PM||Industry||Simon See||NVIDIA AI Technology Centre, Director||Asia Pacific AI Initiatives|
|3:20 PM||Engineering||Richard Sandberg||The University of Melbourne, Chair of Computational Mechanics in the Department of Mechanical Engineering||Gene expression programming for improving turbulence models||Computational fluid dynamics (CFD) is becoming increasingly important in the design of gas turbines because correlation-based methods are unable to further improve efficiency and laboratory experiments are prohibitively expensive. As first-principles CFD is too computationally costly in a design context, Reynolds-averaged Navier–Stokes (RANS) CFD, in which turbulence is modelled, is typically used. However, the inaccuracies introduced by RANS limit the impact CFD can have on technology development.|
In this presentation, a novel machine-learning based approach is introduced that uses high-fidelity data to improve turbulence closures. It will be shown that closure models developed using the gene-expression programming approach outperform traditional models both for cases they were trained on and for cases not seen before.
|3:40 PM||Engineering||Will Nash||Monash University, Department of Materials Science and Engineering||Teaching a computer to detect corrosion||Corrosion costs roughly 3.4% of GDP, representing US$2.5 trillion worldwide per annum. The energy invested in converting ores into alloys is constantly released as corrosion drives metals back to their oxidised state. Currently the most common corrosion assessment technique is visual inspection, whose limitations include the requirement for expert opinion, the subjectivity of that opinion, human error, and potentially hazardous access for inspectors. Our research focuses on using deep learning to automate the detection of corrosion. This presentation addresses the challenges of deep learning for detecting rust, which has no defined form.|
|Ingrid McCarroll||Sydney University, Australian Centre for Microscopy & Microanalysis||Machine Learning for Atom Probe Tomography?||Atom probe tomography (APT) is an atomic-scale materials characterisation technique. Utilising high-field emission, APT works by applying an electric field (between 10 and 60 V/nm), assisted by a pulsed laser for semi- and non-conductive samples, to a small needle-shaped specimen. Ionised atoms at the sample tip are propelled through the electric field towards a multi-channel plate detector, where time-of-flight and x- and y-coordinates are recorded. Z-coordinates are calculated post-experiment during the reconstruction process, based on the sequence of events recorded at the detector. The end product is an atomic-scale 3D reconstruction of the sample, from which valuable information about the distribution of minor constituents, grain boundary chemistry and more can be obtained. As the assortment of APT samples expands to include heterogeneous materials with complex field behaviour, traditional reconstruction methods are no longer sufficient to produce highly accurate representations of the original samples. As such, the APT community is currently searching for new and novel methods of handling increasingly complex atom probe datasets.|
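Species identification in APT rests on the time-of-flight relation implied above: the accelerating field gives an ion of charge state n a kinetic energy n·e·V, so the mass-to-charge ratio follows from the flight time via n·e·V = ½mv². A minimal sketch with illustrative numbers, omitting the flight-path and voltage corrections real reconstructions apply:

```python
E = 1.602176634e-19      # elementary charge, C
AMU = 1.66053906660e-27  # atomic mass unit, kg

def mass_to_charge(t, L, V):
    """Mass-to-charge ratio (Da per charge state) from APT time-of-flight.
    Simplified model: full acceleration over voltage V and a straight flight
    path of length L, so n*e*V = (1/2) m v^2 with v = L/t."""
    v = L / t                        # ion velocity, m/s
    return 2 * E * V / (AMU * v**2)  # m/n in daltons per charge state

# Illustrative numbers: 5 kV specimen voltage, 10 cm flight path, 529 ns flight
print(mass_to_charge(t=529e-9, L=0.1, V=5000))  # ~27 Da per charge (e.g. Al+)
```

Binning many such events gives the mass spectrum from which peaks are assigned to species; the reconstruction step described above then adds the spatial coordinates.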
|Session chair||Amr Hassan|
|Andrew Fowlie||Monash University, Research Fellow in the School of Physics and Astronomy||Potential applications of machine learning in particle physics||We, the GAMBIT collaboration, perform statistical analyses of models in particle physics. We must determine whether points in a high-dimensional parameter space are forbidden or allowed by a variety of experiments, including searches at the Large Hadron Collider. This is a computationally expensive calculation that could benefit from classification algorithms in machine learning. We must also visualise and understand our high-dimensional parameter space, which could benefit from clustering and dimensionality reduction.|
|4:15 PM||Industry||Shenjun Zhong||Monash University, eResearch Centre; Telstra Big Data, Machine Learning Engineer||Deep Learning in Industry||In this presentation, I will talk about industry use cases of deep learning, particularly in production pipelines. Rather than focusing on model building and training, this talk will concentrate on serving trained models and on scaling serving up and down using frameworks like TensorFlow. The other part of the talk will cover AI trends (e.g. deep learning applications) in one particular domain: medical imaging.|
|4:35 PM||Business & Economics||Zahraa Abdallah||Monash University, Machine Learning and Data Science Research Fellow in the Faculty of Information Technology||Machine learning for accurate estimation of electrical device usage from smart meters data||Smart meters give us valuable insight into how electricity is used. However, the value is greatly increased if the information can be “disaggregated” into the consumption of each device or activity. This information is of value to customers, retailers, distribution companies and market operators. Providing customers with timely disaggregation information helps them alter their behaviour to reduce their total energy consumption. Retailers and customers benefit from reduced “bill shock” when customers are told the reasons for consumption spikes. Accurate disaggregation information will help retailers design new tariffs, such as time-of-use tariffs and dynamic pricing. Another aim of energy disaggregation is to find trends in electricity usage. This information will support decision making in distribution companies, by providing a better understanding of customers' patterns, and in market operators, by providing more accurate demand forecasts.|
A major challenge in disaggregation is the scarcity of labelled data. Sub-metering all devices is a straightforward yet expensive way to collect labels, and visual inspection by domain experts is time-consuming and hence also expensive. In this research project, we develop and deploy artificial intelligence algorithms for disaggregation of smart meter data using unlabelled or sparsely labelled data. The approach combines domain expert knowledge and machine learning for continuous learning and incremental adaptation, gradually decreasing the amount of manual intervention per house while increasing the overall accuracy and efficiency of the disaggregation.
|4:55 PM||Art & Design|
|Hannah Korsmeyer||Monash Art Design & Architecture (XYX Lab), Assistant Lecturer||GenderTron: exploring implicit gender bias||How can we use machine learning to provoke new conversations about gender? |
This talk will discuss a project that questioned the role of technology and design within complex social issues like gender inequality. Drawing on feminist theory and linguistics algorithms, the project developed a research device that monitored spoken ‘gendered language’ within a space and revealed these patterns back to users in real time. The device was implemented in a variety of settings and assessed through interviews and documentary film. In addition to serving as a heuristic learning tool for generating complex discussions about gender identity, the device explored the additional research value that can be gained when turning quantitative data into a qualitative experience.
|5:00 PM||Art & Design|
|Pamela Salen||Monash Art Design & Architecture (XYX Lab), Lecturer||Reporting Gender||News media plays a key role in shaping public discourse and can have a profound influence on people's attitudes, beliefs and behaviours. This research has collected all news reports of violence against women and men, and stories about gender ranging from sexism to role models, from ABC News Online in 2016. By collating and reviewing these stories, this study will contribute both a quantitative content analysis and a critical comparative visual and linguistic analysis, aiming to identify patterns and tendencies that reveal factors which may perpetuate violence, gender stereotypes and inequity in news media and beyond.|
|5:05 PM||Art & Design||Jon McCormack, Patrick Hutchings, Dilpreet Singh||Monash University, SensiLab||Creative AI at SensiLab||In this talk we'll give an overview of some of the research we're undertaking at SensiLab in the area of creative applications of AI. Recently there has been a lot of interest in applications of neural networks to creative tasks and applying machine intelligence to artistic and creative processes. We'll illustrate with some work we've been doing in visual arts and in music generation and talk about our future plans in this area.|
|5:25 PM||Closing||Paul Bonnington||Monash eResearch Centre, Director|