1 of 23

Data & Intelligent Systems (DAIS)

Professors Abdu Alawini, Arindam Banerjee, George Chacko, Kevin Chang, Pablo Robles-Granda, Jiawei Han, Jingrui He, Heng Ji, Daniel Kang, Yongjoo Park, Jimeng Sun, Hari Sundaram, Hanghang Tong, Jiaxuan You, ChengXiang Zhai

2 of 23

Who are we? (incomplete list of DAIS faculty)

Kevin Chang

Jiawei Han

ChengXiang Zhai

Knowledge Acquisition, Integration, & Mining

Data Mining

Machine Learning & Health Informatics

Hari Sundaram

Social Net. Analysis, Crowdsourcing

Hanghang Tong

Data Mining

Network Mining

Yongjoo Park

Data-intensive Systems

Learning from Data Heterogeneity

Jingrui He

Machine Learning

Spatio-temporal Data

Arindam Banerjee

Jimeng Sun

AI for healthcare

Heng Ji

Multi-Source Information Extraction

George Chacko

Scientometrics

Daniel Kang

Big Data & Machine Learning Systems

Pablo Robles-Granda

Intelligent Information Systems

Abdu Alawini

Educational Data Mining

CS Education

Jiaxuan You

Graph-powered data and AI

+ Lots of students!

3 of 23

What are we working on? Data to Intelligence (“Big Data”)

Scalability

Intelligence

Application impact

Health/Medical/Biology

Education

Productivity (Web, email, …)

Decision making (government, business, personal)

Search

Data/Information Access

Information analysis & Data mining

Browsing

Recommendation

Decision/Task support

Intelligent information agent

Gigabytes

Terabytes

Petabytes

Storage

Artificial Intelligence

Statistics

Systems & Networking

Parallel Computing

Human-Computer Interaction, Graphics

Bioinformatics

Theory & Algorithms

4 of 23

Big Data-Enabled Artificial Intelligence

Big Data

Training Data

(Supervised) Machine Learning

Autonomous AI: Intelligent systems to replace humans

Observations of World

(Unsupervised) Machine Learning

Generative Models, Data Mining

Intelligence

Human

Assistive AI: Intelligent systems to assist humans

5 of 23

Autonomous AI vs. Assistive AI

Simple tasks, Big training data available, Limited domain

Intelligence(Machine) ≤ Intelligence(Human) [Upper Bound]

Assistive AI

Autonomous AI

Any domain, All kinds of data

“Data Scope” to enhance human perception, Complex tasks

Intelligence(Machine + Human) > Intelligence(Human) [Lower Bound]

Tasks that humans can’t (easily) do

(Augmentation of human intelligence)

Human in the Loop

6 of 23

Strong presence in multiple research communities: Data Mining, Information Retrieval, Databases, Web, …

ACM SIGKDD

ACM SIGIR

ACM SIGMOD

Data Source: ACM Digital Library: https://dl.acm.org/sigs, Retrieved September 28, 2025

ACM SIGWEB

Name                 Count
Tsinghua Univ.       683
Univ. of Amsterdam   611
UIUC                 573

Name                   Count
Microsoft Research     634
Microsoft Corporation  605
Tsinghua Univ.         587
UIUC                   574

Name             Count
Tsinghua Univ.   675
UIUC             567

Name                    Count
Tsinghua Univ.          492
Carnegie Mellon Univ.   398
Google LLC              379
UIUC                    352

7 of 23

  • Prominent awards
    • Multiple ACM SIGKDD/SIGMOD Ph.D. Dissertation Awards and Runner-Up Awards
    • 10-year highest impact paper awards
    • Best student paper awards, best paper awards, outstanding paper awards, best posters/demos, …
    • IBM/Microsoft/Google/Yahoo/NSF/NDSEG Ph.D. Fellowships, ...
  • After graduation
    • Professors at U. Washington, UCLA (2), U. Michigan (2), Georgia Tech, NorthwesternU, UCSD, Purdue, UCSB, U. Virginia, Ohio St. U., Penn. St. U., Oregon St. U., NC State U., Notre Dame, UC Davis, HKUST, Delaware, Emory, Florida St. U., UT Arlington, Virginia Tech (2), Wayne St. U. (2), SMU (2), NUS, SFU, U. Alberta, Chinese U HK, Seoul National U., Yonsei U., POSTECH, Zhejiang U, N. Taiwan U, U Rochester
    • Researchers at Amazon, Meta, Salesforce, IBM, MSR, Google Research/Google, Apple, Yahoo! Labs, Facebook, Twitter, LinkedIn, NEC Labs, Pinterest, Caterpillar, Boson AI…
    • Leaders in startups & IT industry (CTO of iPinYou, VP of Goldman Sachs, ...)

Come and Join a Strong Body of DAIS Students!

8 of 23

Want to know more? Visit the DAIS website at

https://dais.cs.illinois.edu/

Data and Intelligent Systems (DAIS)

9 of 23

Abdu Alawini: Educational Data Mining and AI in Education

Current Interests: 

  • Educational data mining and modeling
  • The applications of AI in education
  • Data provenance
  • Scientific data management

Echelon: An AI Tool for Clustering SQL Queries

TriQL: A tool for learning relational, graph and document-oriented database programming

A database approach to identifying collaborative learning behaviors

10 of 23

Arindam Banerjee: Spatio-Temporal Data Analysis

Current Interests: 

  • Spatiotemporal, high-dimensional predictive models  
  • Deep learning geometry, theory, deep generative models
  • Sequential decision making, smoothed analysis
  • Applications: climate science, ecology, recommendation systems 

Ecology: Modeling plant traits, biodiversity

Climate Science: Predicting climate variables

Deep learning geometry

- Sparse, low-rank gradients

- Optimization, Generalization

Deep generative models

- Normalizing flows

- Variational inference

Sequential Decision Making

- Contextual bandits

- Smoothed analysis

11 of 23

George Chacko: Scientometrics and Networks

Current Interests: 

  • The structure of the global scientific enterprise
  • The life cycle of research communities
  • Community detection in very large networks
  • Agent-based models and synthetic data

The Emergence of Scientific Ideas

Center-periphery Structure in Communities

12 of 23

Kevin Chang: Knowledge acquisition and information integration over structured and unstructured data

General Problems:

  • How to discover, extract, and synthesize knowledge for everyday work and life?
  • How to integrate information from structured/unstructured data in the world?

Techniques: large language models, natural language processing, information retrieval, data mining, machine learning, and large-scale data analytics.

Current Projects:

  • LLM-based Knowledge Agent: To automate knowledge-intensive tasks with LLMs.
  • World-Scale Information Integration: Discover, access, and integrate structured and unstructured data from the world, towards building online digital communities.
  • Next-Generation End-to-End Search: To help people find and acquire knowledge by searching large corpora such as the Web.

13 of 23

Pablo Robles-Granda: Machine Learning and Health Informatics

General Problems:

  • ML Applications to human performance and health
  • Probabilistic and Statistical Machine Learning Modeling and methodology

Techniques: Data Mining/ML, Graph Mining, Graphical Models, Nonlinear Dynamical Systems

Current Interests: 

  • Graph Mining, Complex Systems  
  • Probabilistic Models, Deep Learning
  • Applications to Cognitive/Brain Science, Physical Performance, Well-being
  • Digital Health, Causal Models in Health, Network Science 

14 of 23

Jiawei Han: LLM for Text Mining and Science Applications

  • Text Mining & Knowledge Graph Construction
  • LLM for Text Structuring and Science Applications

LLM: Knowledge & Guidance

Scientific Text Corpora

Knowledge & Intelligence

General and domain-specific KB

Theme-focused Information Retrieval

Knowledge Graphs

PLMs + LLMs

Selected and Distilled Documents & Passages

Structured Passages

Theme-based Information Extraction & Knowledge Graph Construction

Feedback & Refinement

15 of 23

Jingrui He: Learning from Data Heterogeneity

  • Research Theme
    • Traditional and modern AI centering around the modeling of the heterogeneous nature of real-world data via rich foundation models

Evaluation of Agentic Multimodal RAG

Heterogeneous MAS

Transformer Copilot for LLM Fine-tuning

Climate and Sustainability, National Security, Agriculture, Healthcare, etc.

16 of 23

Heng Ji: Multimodal Knowledge Extraction / Knowledgeable Foundation Models / Science-Inspired AI

  • Knowledgeable and Truthful Large Language Models and Vision-Language Models
  • Understanding and Controlling Attention and Hidden States
  • More fine-grained representations for Explainable Multimodal Understanding
  • LLM Knowledge Updating and Editing
  • Tool Learning and Creation
  • Never-Ending Information Extraction and Knowledge Base Population with LLM Web agents
  • Agentic LLMs to Reason, Plan and Act
  • Science-Inspired AI for Drug Discovery and Material Discovery
  • Creative Intelligence and Social Intelligence via Multi-agent and Human teaming
  • Founding Director of Amazon-Illinois AI Center (AICE) and CapitalOne-Illinois AI Center (ASKS)
  • Recent PhDs → Professors:
    • Qingyun Wang (2025, William & Mary)
    • May Fung (2024, HKUST)
    • Manling Li (2023, NorthwesternU)
    • Lifu Huang (2020, UCDavis)

17 of 23

Yongjoo Park: Systems for Data-Intensive Artificial Intelligence (AI)

Exploratory AI

Kishu: World's First Undoable Notebook

Git-like versioning for exploratory AI

● Not just code: checkpoints code+data

LLM for Data (Retrieval-Augmented Gen)

LazyAttention for Efficient RAG

● enables position-oblivious caching, for the first time

● avoids repetitive processing (inside transformer blocks)

● allows extremely fast response time (time-to-first-token)

large database

LLM with LazyAttention

query

answer

any documents in any order

Also, building new vector databases for

● extremely parallel indexing/retrieval

● new forms of approximate nearest neighbor

No processing of these docs: fast response

18 of 23

Daniel Kang: ML + Data systems

We are building AI agents to solve data problems

19 of 23

Jimeng Sun

20 of 23

Hari Sundaram: Social Network Dynamics

http://sundaram.cs.illinois.edu

How can social networks influence behavior at large scale?

sustainability

public health

exercising

traffic

In my group we develop theory, design algorithms, build systems and run experiments

algorithmic game theory, equilibria

large-scale network analysis

message synthesis, spatial codes

21 of 23

  • Research Theme: Understand and Utilize Graphs & Networks
  • Research Scope

  • Recent Foci:
    • Network-as-a-Context, Network Alignment, Diffusion, Anomaly Detection
    • Optimal Graph Neural Networks (optimization, construction & generation)
    • Trustworthy Network Learning (safety, explainability, robustness and fairness)
    • Graphs & LLMs (graphRAG, agentic reasoning, moral alignment, post-training)

Hanghang Tong: Network Learning & Mining

22 of 23

Jiaxuan You: Graph AI and AI Agents

Data

Model

Human

Multi-modal data as graphs

  • Connect ubiquitous interdisciplinary data with graphs

Knowledge sharing with graphs

  • Organize knowledge from humans and AI agents

Graph data in AI

  • Neural networks, connected datasets & tasks

Foundational models

  • Foundation models with relational data and graphs

AI agent

  • Structured memory, Reasoning, and planning
  • Multi-agent systems
  • Autonomous research agent

Democratize AI

  • Open software and community
  • Accessible AI research

Human AI Collaboration

  • AI as copilots for humans
  • Aligning AI to human values

23 of 23

Applications

ChengXiang (“Cheng”) Zhai: Intelligent Information Systems

Medical decision support

Personalized nutrition

Affordable personalized learning at scale

Acceleration of scientific discovery

Especially Big Text Data

Intelligent Task Agent

1. Big data analytics

2. Human-Like NLP

3. Human-AI collaboration

4. Computational user modeling

5. Augmented Intelligence