1 of 23

Data & Intelligent Systems (DAIS)

Professors Abdu Alawini, Arindam Banerjee, George Chacko, Kevin Chang, Pablo Robles-Granda, Jiawei Han, Jingrui He, Heng Ji, Daniel Kang, Yongjoo Park, Jimeng Sun, Hari Sundaram, Hanghang Tong, Jiaxuan You, ChengXiang Zhai

2 of 23

Who are we? (incomplete list of DAIS faculty)

Kevin Chang

Jiawei Han

ChengXiang Zhai

Knowledge Acquisition, Integration, & Mining

Data Mining

Machine Learning & Health Informatics

Hari Sundaram

Social Net. Analysis, Crowdsourcing

Hanghang Tong

Data Mining

Network Mining

Yongjoo Park

Data-intensive Systems

Learning from Data Heterogeneity

Jingrui He

Machine Learning

Spatio-temporal Data

Arindam Banerjee

Jimeng Sun

AI for healthcare

Heng Ji

Multi-Source Information Extraction

George Chacko

Scientometrics

Daniel Kang

Big Data & Machine Learning Systems

Pablo R. Granda

Intelligent Information Systems

Abdu Alawini

Educational Data Mining

CS Education

Jiaxuan You

Graph-powered data and AI

+ Lots of students!

3 of 23

What are we working on? Data to Intelligence (“Big Data”)

Scalability

Intelligence

Application impact

Health/Medical/Biology

Education

Productivity (Web, email, …)

Decision making (government, business, personal)

Data/Information Access

Information analysis & Data mining

Browsing

Recommendation

Decision/Task support

Intelligent information agent

Gigabytes

Terabytes

Petabytes

Storage

Artificial Intelligence

Statistics

Systems & Networking

Parallel Computing

Human-Computer Interaction, Graphics

Bioinformatics

Theory & Algorithms

4 of 23

Big Data-Enabled Artificial Intelligence

Big Data

Training Data

(Supervised) Machine Learning

Autonomous AI: Intelligent systems to replace humans

Observations of World

(Unsupervised) Machine Learning

Generative Models, Data Mining

Intelligence

Human

Assistive AI: Intelligent systems to assist humans

In particular, DAIS is closely related to AI, with faculty from both areas collaborating regularly. This slide and the next explain the connection between the two areas. This slide shows that the recent rise of AI has been mostly due to the growth of "big data", which enables AI in two somewhat different ways: 1) The data serves as training data, enabling supervised machine learning to train intelligent systems that can potentially replace humans and automate many tasks ("Autonomous AI"). 2) The data serves as comprehensive observations of the real world, enabling data mining, unsupervised learning, and generative models to derive intelligence and discover knowledge from the data, augmenting human intelligence ("Assistive AI"). The area of AI tends to focus more on Autonomous AI, whereas DAIS tends to focus more on Assistive AI, though there is no real boundary between the two.

5 of 23

Autonomous AI vs. Assistive AI

Simple tasks, Big training data available, Limited domain

Intelligence (Machine) ≤ Intelligence (Human) [Upper Bound]

Assistive AI

Autonomous AI

Any domain, All kinds of data

“Data Scope” to enhance human perception, Complex tasks

Intelligence (Machine + Human) > Intelligence (Human) [Lower Bound]

Tasks that humans can’t (easily) do

(Augmentation of human intelligence)

Human in the Loop

To see the difference between DAIS and AI, it is useful to contrast autonomous AI with assistive AI. Successful applications of autonomous AI require two conditions to be satisfied: 1) The task is relatively simple (otherwise machines would not be able to entirely replace humans). 2) A large amount of training data is either available or easy to acquire. Thus the application impact is generally restricted to those domains where both conditions are satisfied. Moreover, leaving aside computation power, the intelligence acquired by machines in this way generally does not exceed the intelligence of humans (e.g., a trained chatbot for answering customer questions would not be able to outperform the best human representatives). In contrast, assistive AI can make use of all kinds of data, both labeled and unlabeled, and is applicable to any domain to augment human intelligence, which is especially valuable for optimizing decision making. Interestingly, although an assistive AI system may not appear to be intelligent by itself, such tools, when combined with humans, enable the "combined intelligence" of machines and humans to exceed human intelligence (e.g., a user with access to a tool such as a search engine would be able to make more intelligent decisions than one without access). The two characteristics of assistive AI are the focus on tasks that humans can't easily do and the inherent involvement of humans in the loop (emphasizing human-AI collaboration). However, because autonomous AI techniques are the building blocks of assistive AI, and assistive AI enables the incorporation of human intelligence into autonomous AI, DAIS and AI naturally interact with each other closely.

6 of 23

Strong presence in multiple research communities: Data Mining, Information Retrieval, Databases, Web, …

ACM SIGKDD

ACM SIGIR

ACM SIGMOD

Data Source: ACM Digital Library: https://dl.acm.org/sigs, Retrieved September 28, 2025

ACM SIGWEB

Name	Count
Tsinghua Univ.	683
Univ. of Amsterdam	611
UIUC	573

Name	Count
Microsoft Research	634
Microsoft Corporation	605
Tsinghua Univ.	587
UIUC	574

Name	Count
Tsinghua Univ.	675
UIUC	567

Name	Count
Tsinghua	492
Carnegie Mellon Univ.	398
Google LLC	379
UIUC	352

7 of 23

Prominent awards

Multiple ACM SIGKDD/SIGMOD Ph.D. Dissertation Awards/Runners-Up
10-year highest impact paper awards
Best student paper awards, best paper awards, outstanding paper awards, best posters/demos, …
IBM/Microsoft/Google/Yahoo/NSF/NDSEG Ph.D. Fellowships, ...

After graduation

Professors at U. Washington, UCLA (2), U. Michigan (2), Georgia Tech, Northwestern U., UCSD, Purdue, UCSB, U. Virginia, Ohio St. U., Penn. St. U., Oregon St. U., NC State U., Notre Dame, UC Davis, HKUST, Delaware, Emory, Florida St. U., UT Arlington, Virginia Tech (2), Wayne St. U. (2), SMU (2), NUS, SFU, U. Alberta, Chinese U. HK, Seoul National U., Yonsei U., POSTECH, Zhejiang U., N. Taiwan U., U. Rochester…
Researchers at Amazon, Meta, Salesforce, IBM, MSR, Google Research/Google, Apple, Yahoo! Lab, Facebook, Twitter, LinkedIn, NEC Lab, Pinterest, Caterpillar, Boson AI…
Leaders in startups & IT industry (CTO of iPinYou, VP of Goldman Sachs, ...)

Come and Join a Strong Body of DAIS Students!

8 of 23

Want to know more? Visit the DAIS website at

https://dais.cs.illinois.edu/

Data and Intelligent Systems (DAIS)

9 of 23

Abdu Alawini: Educational Data Mining and AI in Education

Current Interests:

Educational data mining and modeling
The applications of AI in education
Data provenance
Scientific data management

Echelon: An AI Tool for Clustering SQL Queries

TriQL: A tool for learning relational, graph and document-oriented database programming

A database approach to identifying collaborative learning behaviors

10 of 23

Arindam Banerjee: Spatio-Temporal Data Analysis

Current Interests:

Spatiotemporal, high-dimensional predictive models
Deep learning geometry, theory, deep generative models
Sequential decision making, smoothed analysis
Applications: climate science, ecology, recommendation systems

Ecology: Modeling plant traits, biodiversity

Climate Science: Predicting climate variables

Deep learning geometry

- Sparse, low-rank gradients

- Optimization, Generalization

Deep generative models

- Normalizing flows

- Variational inference

Sequential Decision Making

- Contextual bandits

- Smoothed analysis

11 of 23

George Chacko: Scientometrics and Networks

Current Interests:

The structure of the global scientific enterprise
The life cycle of research communities
Community detection in very large networks
Agent-based models and synthetic data

The Emergence of Scientific Ideas

Center-periphery Structure in Communities

12 of 23

Kevin Chang: Knowledge acquisition and information integration over structured and unstructured data

General Problems:

How to discover, extract, and synthesize knowledge for everyday work and life?
How to integrate information from structured/unstructured data in the world?

Techniques: large language models, natural language processing, information retrieval, data mining, machine learning, and large-scale data analytics.

Current Projects:

LLM-based Knowledge Agent: To automate knowledge-intensive tasks with LLMs.
World-Scale Information Integration: Discover, access, and integrate structured and unstructured data from the world, towards building online digital communities.
Next-Generation End-to-End Search: To help people find and acquire knowledge by searching large corpora such as the Web


13 of 23

Pablo Robles-Granda: Machine Learning and Health Informatics

General Problems:

ML Applications to human performance and health
Probabilistic and Statistical Machine Learning Modeling and methodology

Techniques: Data Mining/ML, Graph Mining, Graphical Models, Nonlinear Dynamical Systems

Current Interests:

Graph Mining, Complex Systems
Probabilistic Models, Deep Learning
Applications to Cognitive/Brain Science, Physical Performance, Well-being
Digital Health, Causal Models in Health, Network Science

14 of 23

Jiawei Han: LLM for Text Mining and Science Applications

Text Mining & Knowledge Graph Construction
LLM for Text Structuring and Science Applications

LLM: Knowledge & Guidance

Scientific Text Corpora

Knowledge & Intelligence

General and domain-specific KB

Theme-focused Information Retrieval

Knowledge Graphs

PLMs + LLMs

Selected and Distilled Documents & Passages

Structured Passages

Theme-based Information Extraction

& Knowledge Graph Construction

Feedback & Refinement

15 of 23

Jingrui He: Learning from Data Heterogeneity

Research Theme

Traditional and modern AI centering around the modeling of the heterogeneous nature of real-world data via rich foundation models

Evaluation of Agentic Multimodal RAG

Heterogeneous MAS

Transformer Copilot for LLM Fine-tuning

Climate and Sustainability, National Security, Agriculture, Healthcare, etc.

16 of 23

Heng Ji: Multimodal Knowledge Extraction / Knowledgeable Foundation Models / Science-Inspired AI

Knowledgeable and Truthful Large Language Models and Vision-Language Models
Understanding and Controlling Attention and Hidden States
More fine-grained representations for Explainable Multimodal Understanding
LLM Knowledge Updating and Editing
Tool Learning and Creation
Never-Ending Information Extraction and Knowledge Base Population with LLM Web agents
Agentic LLMs to Reason, Plan and Act
Science-Inspired AI for Drug Discovery and Material Discovery
Creative Intelligence and Social Intelligence via Multi-agent and Human teaming
Founding Director of Amazon-Illinois AI Center (AICE) and CapitalOne-Illinois AI Center (ASKS)
Recent PhDs → Professors:

Qingyun Wang (2025, William & Mary)
May Fung (2024, HKUST)
Manling Li (2023, Northwestern U.)
Lifu Huang (2020, UC Davis)

17 of 23

Yongjoo Park: Systems for Data-Intensive Artificial Intelligence (AI)

Exploratory AI

Kishu: World's First Undoable Notebook

● Git-like versioning for exploratory AI

● Not just code: checkpoints code+data

LLM for Data (Retrieval-Augmented Gen)

LazyAttention for Efficient RAG

● enables position-oblivious caching, for the first time

● avoids repetitive processing (inside transformer blocks)

● allows extremely fast response time (time-to-first-token)

[Diagram labels: large database; query; any documents in any order; LLM with LazyAttention; answer]

Also, building new vector databases for

● extremely parallel indexing/retrieval

● new forms of approximate nearest neighbor

No processing of these docs → fast response

18 of 23

Daniel Kang: ML + Data systems

We are building AI agents to solve data problems

20 of 23

Hari Sundaram: Social Network Dynamics

http://sundaram.cs.illinois.edu

How can social networks influence behavior at large scale?

sustainability

public health

exercising

traffic

In my group we develop theory, design algorithms, build systems and run experiments

algorithmic game theory, equilibria

large-scale network analysis

message synthesis, spatial codes

21 of 23

Hanghang Tong: Network Learning & Mining

Research Theme: Understand and Utilize Graphs & Networks
Research Scope

Recent Foci:

Network-as-a-Context, Network Alignment, Diffusion, Anomaly Detection
Optimal Graph Neural Networks (optimization, construction & generation)
Trustworthy Network Learning (safety, explainability, robustness and fairness)
Graphs & LLMs (graphRAG, agentic reasoning, moral alignment, post training)

22 of 23

Jiaxuan You: Graph AI and AI Agents

Data

Model

Human

Multi-modal data as graphs

Connect ubiquitous interdisciplinary data with graphs

Knowledge sharing with graphs

Organize knowledge from humans and AI agents

Graph data in AI

Neural networks, connected datasets & tasks

Foundational models

Foundation models with relational data and graphs

AI agent

Structured memory, Reasoning, and planning
Multi-agent systems
Autonomous research agent

Democratize AI

Open software and community
Accessible AI research

Human AI Collaboration

AI as copilots for humans
Aligning AI to human values

23 of 23

ChengXiang (“Cheng”) Zhai: Intelligent Information Systems

Applications

Medical decision support

Personalized nutrition

Affordable personalized learning at scale

Acceleration of scientific discovery

Especially Big Text Data

Intelligent Task Agent

1. Big data analytics

2. Human-Like NLP

3. Human-AI collaboration

4. Computational user modeling

5. Augmented Intelligence
