|Session Title||Speakers||Description||Presenter's Bio/LinkedIn||Session Type||Level||Topic Categories||Session Length|
|Estimators - Managing and Versioning Machine Learning Models in Python||Simon Frid||Open-source libraries like scikit-learn, StatsModels and TensorFlow have made it very easy for developers and data scientists to implement cutting-edge algorithms in sandbox environments. However, most companies have production environments with a constant flow of new data, breaking changes with new product releases, and a need to use the best model for business purposes. Over the years, developers have built sophisticated continuous integration systems to version and manage their demanding production environments, but comparable systems for data science are still emerging. In this session, we will explore some open-source tooling and strategies that can help manage your real-world machine learning needs.||https://www.linkedin.com/in/simonfrid||Presentation||Intermediate||Platforms, Tools, Packages or Languages, Deployment & Model Management||1 Session (50 minutes)|
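The versioning problem this abstract describes can be illustrated with a minimal sketch (this is not the speaker's tooling, and all names here are hypothetical): an in-memory registry that keys each pickled model by a content hash, so a retrained model automatically gets a new, reproducible version id.

```python
import hashlib
import pickle


class ModelRegistry:
    """Toy in-memory model store keyed by a content-based version id."""

    def __init__(self):
        self._store = {}

    def publish(self, model, metadata):
        """Serialize the model; the hash of its bytes becomes its version."""
        blob = pickle.dumps(model)
        version = hashlib.sha256(blob).hexdigest()[:12]
        self._store[version] = {"blob": blob, "metadata": metadata}
        return version

    def load(self, version):
        """Return the deserialized model and its training metadata."""
        entry = self._store[version]
        return pickle.loads(entry["blob"]), entry["metadata"]
```

Because the version is derived from the serialized bytes, publishing an identical model yields the same id, while any retrain with new data produces a new one - the property a CI-style model pipeline needs.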
|An NLP Tool for Efficient Content Compilation||Boris Galitsky||We built a tool to assist in content creation by mining the web for information relevant to a given topic. The tool imitates how humans write essays: searching the web for topics, selecting content fragments from the documents found, and then compiling those fragments into a coherent text. Writing starts with the automated construction of a table of contents, obtained from the list of key entities for the given topic extracted from web resources such as Wikipedia. Once the table of contents is formed, each item serves as a seed for web mining. The tool builds a full-featured, structured Word document with a table of contents, section structure, images with captions, and web references for all mined text fragments.|
Two linguistic technologies are employed. For relevance verification, we compute similarity between the parse trees of the seed and of a candidate text fragment. For text coherence, we measure agreement between a given paragraph and the one that follows it by tree kernel learning over their discourse trees.
#naturalLanguageProcessing #webMining #wikipediaMining #textGeneration #parseTrees #treeKernelLearning #discourseTrees
Boris Galitsky has been contributing natural language technologies to Silicon Valley start-ups for the last two decades. In 1999 he co-founded iAskWeb, which provided tax and investment recommendations to customers of several Fortune 500 companies. He contributed his linguistic technology to Xoopit (acquired by Yahoo), Uptake (acquired by Groupon), LogLogic (acquired by Tibco), Zvents (acquired by eBay), and Elastica (acquired by Symantec).
He received his PhD in natural language understanding in 1994 and an ANECA/EU Associate Professorship degree in 2011. Boris has authored more than 80 publications, two books, and multiple patents in the field of search and natural language understanding. He is a contributor to OpenNLP, where he leads the development of the Similarity component, implementing machine learning of syntactic parse trees for search, text classification, and generation.
|Presentation||Advanced||Data Mining, Machine Learning, Natural Language Processing||1 Session (50 minutes)|
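The tree-kernel idea behind the relevance check above can be sketched in a few lines (an illustration only, not the OpenNLP Similarity component): represent parse trees as nested `(label, child, ...)` tuples and count pairs of identical subtrees between the two trees.

```python
def subtrees(tree):
    """Yield every subtree of a parse tree given as (label, child, child, ...)."""
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def tree_kernel(t1, t2):
    """Count pairs of identical subtrees -- a crude stand-in for a parse-tree kernel."""
    subs2 = list(subtrees(t2))
    return sum(1 for a in subtrees(t1) for b in subs2 if a == b)
```

Two sentences sharing a noun phrase ("the cat sat" vs. "the cat ran") score higher than unrelated ones, which is the intuition the relevance-verification step exploits; real tree kernels weight subtrees by size and use dynamic programming rather than this brute-force comparison.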
|Lifecycle of Model Management||Greg Makowski||A discussion of the model lifecycle: selecting and describing the best model, putting it in production, recognizing when it needs replacement, and deciding how much effort to invest in that replacement. You don't have to trade off accuracy, generalization, and description; you can have all three. For model description, sensitivity analysis and LIME (from KDD 2016) will be discussed.|
#modelGeneralization #modelDescription #retrain #refresh
Greg Makowski has been deploying data mining models since 1992.
|Presentation||Intermediate||Data Science, Modeling, Deployment & Model Management||1 Session (50 minutes)|
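Sensitivity analysis, one of the description methods mentioned in this session, can be sketched with a simple finite-difference probe: nudge each input feature and measure how much the prediction moves. (LIME itself goes further, fitting an interpretable local surrogate model around the point being explained.)

```python
def sensitivity(predict, x, delta=1e-3):
    """Estimate how much each input feature moves the prediction.

    predict: a black-box function mapping a feature list to a number.
    Returns one finite-difference slope per feature.
    """
    base = predict(x)
    scores = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += delta           # perturb one feature at a time
        scores.append((predict(bumped) - base) / delta)
    return scores
```

For a linear model the slopes recover the coefficients exactly; for a nonlinear model they describe behavior only near the probed point, which is why per-prediction explanations (rather than one global description) are the theme of methods like LIME.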
|Scalable computing in R using Spark||Vanja Paunić|
|My colleague Mario Inchiosa and I (Vanja Paunic) will present on scalable computing in R. A rough outline of what we have planned:|
- R Server vs. base R comparison
- Demo 1:
  - Data wrangling with SparkR
  - Modeling with ScaleR
  - Deployment and operationalization using Azure ML
- Demo 2:
  - Parameter optimization in grouped time-series forecasting with R Server on an HDInsight cluster
|www.Linkedin.com/in/VanjaPaunic||Software Demo||Beginner, Intermediate||Data Science, Platforms, Tools, Packages or Languages, Scaling||1 Session (50 minutes)|
|IIoT Analytics (aka Digital Twin)||Robert Benson|
|Robert Benson of Mitek Analytics will provide an overview of the Industrial Internet of Things (IIoT) field, IIoT analytics, and the digital twin. Rich Dost of General Electric Digital will provide an overview of developing digital twin analytics and of the General Electric Predix platform for IIoT. "Digital twin" is a relatively new term that has become roughly synonymous with IIoT analytics. We will also provide an opportunity for people interested in IIoT and IIoT analytics to meet and learn from each other.||For Richard Dost of GE: I've been a developer for thirty years now, and I've improved. I'm better now than a couple of years ago. Then I was better than a few years earlier. And so on, back thirty years. At GE I've helped develop Predix Asset. First I rewrote the entire UI in the evenings, garnering much applause, some enmity too, and praise from THEM. Our reference customer said "Awesome!". A peer declared me "GE's coolest new hire ever!" And I received a special GE award for "...ability to think outside the box, solve hard problems... ideas and delivery set you apart...". Nice stuff. The next year I completed 25% of the stories, as one of 18 in the group, while leading the successful UI effort and meeting a schedule the principal UI architect had declared impossibly demanding. That was recognized too. "You are an army!" stated one fellow developer. Last year, things of quality I produced included the Asset Scripting Engine<https://github.com/PredixDev/predix-asset-scripting-engine>, inspired by lambda, and patented by GE. It was the best. Yet Spotlight, a whiz-bang, quick-and-dirty D3-based tool for GEL query visualization in asset networks, gained the most notice. Now I'm one of 23 Predix Builders. How did that happen?|
For Robert Benson: https://www.linkedin.com/in/rbensonpaloalto
|Presentation||Beginner, Intermediate, Advanced||Data Science, Modeling, Deployment & Model Management, Application||1 Session (50 minutes)|
|Personalizing Education Using AI and Deep Learning||Athanasios Ladopoulos||Building an intelligent learning platform that learns how students learn, then teaches them in the way they learn best and fastest.||https://ch.linkedin.com/in/athanasiosladopoulos||Discussion||Beginner||Machine Learning, Deep Learning, Competitions, Product Demo||Lightning Talk (5-10 minutes)|
|Running Models and their Applications in Production||Dr. Iman Saleh||In this session, Dr. Iman Saleh of Intel will lead a discussion about how models are created, deployed to production, and made available to the applications that consume them. As part of the discussion, Iman will give an overview of the Trusted Analytics Platform (TAP), an open-source platform for building and deploying analytics solutions. The platform hosts machine learning scripts, models, and the applications that use them. This session will include both presentation and discussion.||https://www.linkedin.com/in/iman-saleh-ph-d-53a2413||Discussion||Intermediate||Machine Learning, Modeling, Platforms, Tools, Packages or Languages, Deployment & Model Management, Application, Product Demo||1 Session (50 minutes)|
|Open Source Model Dev, Train & Production||Ling Yao||In this session, Ling Yao of Intel will give an overview of the open-source frameworks and libraries that Intel uses to develop and train models and to put them into production. Included will be a discussion of how Jupyter Notebooks and other code environments can be used to develop and test models using data-processing frameworks such as Spark and GearPump, the HDFS file system, databases such as Postgres, Cassandra, and Redis, and algorithms such as ARIMA for time series analysis, while exposing the resulting model to production applications. This session will include a 15-minute presentation followed by a 35-minute discussion.||Ling Yao is a product manager for the Trusted Analytics Platform at Intel. She has a background in large-scale enterprise apps (web/mobile) and data analytics. Ling has an MBA and is completing an MS in Computer Science with a specialization in Data Science (Johns Hopkins).||Discussion||Intermediate, Advanced||Machine Learning, Modeling, Platforms, Tools, Packages or Languages, Deployment & Model Management, Scaling, Application||1 Session (50 minutes)|
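As a toy illustration of the ARIMA step this session mentions (production work would use a real library such as statsmodels rather than this sketch), the autoregressive core of an ARIMA(1,0,0) model is just a least-squares fit of each value against its predecessor:

```python
def fit_ar1(series):
    """Fit x[t] = phi * x[t-1] by least squares.

    This is the AR part of an ARIMA(1,0,0) with zero mean; full ARIMA
    adds differencing (I) and moving-average (MA) terms.
    """
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den
```

Fitting thousands of such per-entity series independently is exactly the embarrassingly parallel workload that platforms like Spark distribute.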
|Classification vs. Information Extraction in AI: Different Approaches in the Quest for Language Understanding||Michal Wroczynski||CEO & Co-founder of fido.ai||Michal Wroczynski is the CEO and co-founder of Fido AI. He is a futurologist, MD, cognitive behavioral therapist, and entrepreneur. Michal has founded several startups, including Fido Interactive, Fido Intelligence, Intermed, and Medical Web Design, building over 40 AI systems for Fortune 500 companies and governments over the last 20 years. He is currently CEO of Fido AI, an AI lab that has pioneered a new approach to language understanding, enabling information extraction from any unstructured text without the need for human labeling or system training.|
|Presentation||Beginner, Intermediate, Advanced||Machine Learning, Deep Learning, NLP / NLU, AI||1 Session (50 minutes)|
|Traffic Signs Recognition with Tensorflow||Waleed Abdulla||A tutorial for those who know the basics of machine learning but want to learn how to apply it to a practical problem. The chosen problem here is to recognize traffic signs in images taken from a moving car.||https://www.linkedin.com/in/waleedka||Workshop||Intermediate||Deep Learning||1 Session (50 minutes)|
|Effective Graphs With R||Sujee Maniyam||R is a fantastic language for data analytics, and it also has pretty amazing graphing capabilities. This session will introduce the graphing capabilities of the R language.|
Come prepared with RStudio installed, so you can practice along.
|https://www.linkedin.com/in/sujeemaniyam||Workshop||Intermediate||Data Science, Platforms, Tools, Packages or Languages, Visualization||2 Sessions (100 minutes)|
|Introduction to ROC curves||Robert M. Horton||"Receiver Operating Characteristic" or "ROC" curves are an important and widely used tool for characterizing the performance of binary classifiers. This introductory/intermediate-level presentation will help you build a better intuitive understanding of the ROC curve as a way to visualize the tradeoffs between sensitivity and specificity. Using examples and simulations, we will show how these curves are constructed and develop a better understanding of what they are telling us. We'll examine a variety of "funny looking" ROC curves to see what they reveal about the relationships between model predictions and observed outcomes, contemplate AUC (the area under the curve) as a way to summarize an ROC curve in a single number, and explore some cases where AUC turns out to be a poor metric for model selection. We'll also look at some related alternative visualizations.||https://www.linkedin.com/in/robertmhortonphd||Presentation||Beginner, Intermediate||Data Science, Machine Learning||1 Session (50 minutes)|
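The construction this session describes - sweeping a decision threshold and tracing (false positive rate, true positive rate) points - fits in a few lines of plain Python. This minimal sketch ignores score ties, which a production implementation such as sklearn.metrics.roc_curve handles by grouping equal scores:

```python
def roc_points(labels, scores):
    """Walk the examples from highest to lowest score, collecting (FPR, TPR)."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, label in pairs:
        if label:
            tp += 1   # this threshold now admits one more true positive
        else:
            fp += 1   # ... or one more false positive
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

A classifier that ranks every positive above every negative traces the curve through (0, 1) and scores AUC = 1.0; a random ranking hugs the diagonal with AUC near 0.5.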
|Where Is the "Real" Machine Learning?||Jessie||This is not to volunteer myself, but to express interest in hearing how people actually do large-scale machine learning in real production environments. Machine learning is a hot buzzword these days, but there has not been much information about how serious machine learning is done outside giant companies like Google, Apple, Facebook, and Amazon - and information from those companies is often sketchy as well. I am happy to help moderate, but the key to making this session a success is for 3-4 people to step up and share how they do large-scale machine learning for real applications and systems.||n/a||Idea/Suggested Speaker||Intermediate, Advanced||Machine Learning, Deep Learning, Modeling, Application||1 Session (50 minutes)|
|Deep Learning for Sensing and Understanding||Dr. Omar U. Florez||Given its impressive results in supervised learning, deep learning has created building blocks for more sophisticated problems in pursuit of the ultimate goal of imitating higher-order human capabilities: reasoning, communication, and anticipation. We call this process understanding, to distinguish it from sensing. In this talk we will discuss the future of, and existing challenges in, developing understanding with deep learning.||https://www.linkedin.com/in/omar-u-florez-35338015||Presentation||Beginner, Intermediate||Machine Learning, Deep Learning, NLP / NLU, Application||1 Session (50 minutes)|
|Graph processing with Spark GraphX||Sujee Maniyam||Graphs are ever more important now - think Facebook friends, LinkedIn connections. The latest big data technologies enable processing massive graphs in a distributed fashion.|
This session introduces students to Spark's graph processing library, GraphX. We will show how to get started with GraphX and how to solve real-world graph problems.
To get the most out of this class, please come with Spark pre-installed on your laptop.
|https://www.linkedin.com/in/sujeemaniyam||Workshop||Intermediate||Data Science, Data Mining, Platforms, Tools, Packages or Languages||2 Sessions (100 minutes)|
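To give a feel for the kind of computation GraphX distributes, here is a single-machine toy PageRank over an edge list. This is purely illustrative - GraphX's actual API is Scala/Spark based and runs such iterations over partitioned RDDs:

```python
def pagerank(edges, damping=0.85, iters=50):
    """Iterative PageRank over a list of (source, dest) edges."""
    nodes = {n for e in edges for n in e}
    out = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for d in out[n]:     # each node splits its rank among out-links
                    new[d] += share
            else:                    # dangling node: spread its rank evenly
                for d in nodes:
                    new[d] += damping * rank[n] / len(nodes)
        rank = new
    return rank
```

On a three-node cycle every node ends up with equal rank; the interesting (and expensive) cases are web- or social-scale graphs, which is where a distributed engine earns its keep.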
|Data Science and Quantitative Finance||Bill Paseman||What tools does data science provide the quant? Can a neural network help calculate parameters for established valuation methods? Funds based on sentiment analysis have had little general success but have made money in special situations (e.g., Brexit). What other examples are there? In this session participants will discuss the success of data-science-based quant approaches they have used or heard about.||https://www.linkedin.com/in/paseman||Discussion||Intermediate||Data Science, Deep Learning, NLP / NLU, Finance||1 Session (50 minutes)|
|Knowledge Representation Goals and Requirements||Robert (Bob) Kirby||Knowledge Representation choices drive the general-purpose computing needed for Artificial Intelligence. Classes of use cases derive requirements for Knowledge Representation. A normative evaluation of Knowledge Representation considers goals and those requirements regardless of the inspiration for its client algorithms, such as studying the brain or noticing what smart people do. Knowledge Representation is evaluated for associative (sub-symbolic) approaches and logic-based approaches. Issues for crowdsourcing, natural language, and uncertainty get added attention.||Bob Kirby received a Ph.D. degree in Computer Science from the University of Maryland, College Park. He continues an independent project in Knowledge Representation using his commercial background in Artificial Intelligence (Image Understanding), Expert Systems, and software development. He researches and develops an explicit representation for common sense knowledge and natural language syntax and semantics using primitive Curried predicates in higher-order logic expressions for conditional probabilities.||Presentation||Intermediate||Data Science, AI||1 Session (50 minutes)|