Visualizing ML Embeddings in Physical Space
Understanding high-dimensional data using passthrough AR
The Challenge: Understanding High-Dimensional Data
Machine learning models generate embeddings in hundreds or thousands of dimensions. Traditional visualization flattens these into 2D plots using PCA, t-SNE, or UMAP, which loses critical spatial relationships and structural information that exist in the original high-dimensional space.
(Figure: traditional 2D projection of an embedding)
Why This Matters
• Cluster boundaries are ambiguous in 2D
• Distance relationships are distorted (the sketch after this list makes that concrete)
• Outliers may appear grouped with a cluster, and distinct clusters may appear merged
• Cannot walk around data to see structure from different angles
• Hard to build intuition about the embedding space
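To make the distortion concrete, here is a minimal sketch (our own illustration, assuming scikit-learn and SciPy are available) that projects synthetic high-dimensional clusters to 2D with PCA and measures how well pairwise distances survive the flattening:

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Synthetic stand-in for real embeddings: three Gaussian clusters in 512-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(100, 512))
               for c in (0.0, 0.5, 1.0)])

X2 = PCA(n_components=2).fit_transform(X)  # the usual 2-D flattening

# Rank correlation between original and projected pairwise distances;
# values well below 1.0 quantify the distortion listed above.
rho, _ = spearmanr(pdist(X), pdist(X2))
print(f"distance rank correlation after 2-D PCA: {rho:.2f}")
```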
The Solution: Passthrough AR for 3D Embeddings
Passthrough AR provides metric grounding in physical space. Users can walk through 3D embeddings, use their body as a reference frame, and interrogate spatial structure naturally. We hypothesize this improves understanding over 2D plots; the comparative study below is designed to test that claim.
(Figure: current 3D embedding visualization)
AR Advantages Over 2D
• Physical Grounding: room scale provides natural distance metrics
• Natural Navigation: walk, turn, and crouch to explore from any angle
• Spatial Memory: physical location aids recall of data structure
• True 3D Perception: depth perception reveals structure lost in 2D
Schedule and Milestones
1. 2/10-2/19: Generate Real ML Embeddings
Use a real ML model (a CNN for image embeddings or a transformer for sentence embeddings) to generate high-dimensional vectors from public datasets. Validate embedding quality and select representative samples.
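A minimal sketch of the image branch of this step, assuming torchvision; the `embed_images` helper and its file-path inputs are placeholder names of ours, not part of any deliverable yet:

```python
import torch
from torchvision import models
from PIL import Image

# Pre-trained ResNet-50 with the classifier head removed, so the forward
# pass returns the 2048-D penultimate features as the embedding.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.fc = torch.nn.Identity()
model.eval()

preprocess = weights.transforms()  # the matching resize/crop/normalize

@torch.no_grad()
def embed_images(paths):
    """Return an (N, 2048) tensor of embeddings for a list of image paths."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return model(batch)
```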
2. 2/19-3/03: Project & Render in Passthrough AR
Apply dimensionality reduction (PCA/UMAP/custom projection) to map embeddings to 3D coordinates. Render in Unity with Meta XR SDK passthrough, allowing users to see embeddings overlaid on their physical environment.
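A sketch of the projection-and-handoff half of this step, assuming the umap-learn package; the JSON schema and the `room_size` scaling are our own assumptions about how the Unity client will consume the points (the C# loader is not shown):

```python
import json
import umap  # umap-learn package

def export_for_unity(embeddings, labels, room_size=2.5, path="embeddings_3d.json"):
    """Reduce (N, D) embeddings to 3-D and rescale into a room_size-metre cube."""
    coords = umap.UMAP(n_components=3, random_state=42).fit_transform(embeddings)
    coords = coords - coords.min(axis=0)                # shift into positive octant
    coords = coords * (room_size / coords.max(axis=0))  # metres, per axis
    points = [{"x": float(x), "y": float(y), "z": float(z), "label": int(l)}
              for (x, y, z), l in zip(coords, labels)]
    with open(path, "w") as f:
        json.dump(points, f)
```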
3. 3/03-3/05: Comparative User Study
Run a controlled study comparing 2D plots vs passthrough AR on tasks like cluster identification and similarity judgment. Measure accuracy, time, and subjective understanding.
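As a sketch of how the quantitative comparison could be scored, the snippet below runs a paired t-test on per-participant accuracies. The numbers are placeholder values for illustration only, not results, and SciPy is an assumed dependency:

```python
import numpy as np
from scipy.stats import ttest_rel

# Placeholder per-participant cluster-identification accuracies; the real
# values would come from the study logs for the 2D and AR conditions.
acc_2d = np.array([0.60, 0.55, 0.70, 0.65, 0.50])
acc_ar = np.array([0.75, 0.70, 0.80, 0.72, 0.68])

# Paired test: the same participant performs both conditions.
t_stat, p_value = ttest_rel(acc_ar, acc_2d)
print(f"2D mean={acc_2d.mean():.2f}  AR mean={acc_ar.mean():.2f}  "
      f"t={t_stat:.2f}  p={p_value:.3f}")
```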
In-Class Activity
Comparative Visualization Experiment
Students will attempt to identify clusters in the same embedding dataset using two different visualization techniques, then compare their accuracy and experience.
• 2D Scatter Plot: traditional screen-based visualization
• Passthrough AR: embeddings rendered in physical space
Measured Outcomes
• Cluster identification accuracy (scored as sketched after this list)
• Time to complete task
• Subjective preference ratings
• Understanding of spatial relationships
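One way to make the accuracy outcome concrete is to score a participant's cluster assignment against the dataset's known labels with the adjusted Rand index. A sketch assuming scikit-learn; both label arrays are hypothetical values of ours for illustration:

```python
from sklearn.metrics import adjusted_rand_score

# Ground-truth cluster labels for the plotted points, and the clusters a
# participant assigned to the same points during the activity.
true_labels        = [0, 0, 0, 1, 1, 1, 2, 2, 2]
participant_labels = [0, 0, 1, 1, 1, 1, 2, 2, 0]

# 1.0 = perfect agreement with the true clustering, ~0.0 = chance level.
score = adjusted_rand_score(true_labels, participant_labels)
print(f"cluster identification accuracy (ARI): {score:.2f}")
```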
Wiki Contributions & Deliverables
Wiki Location: VR Visualization Software → Scientific Visualization → ML Embeddings in AR
Tutorial Document
• Step-by-step guide for visualizing ML embeddings in passthrough AR
• Unity setup with Meta XR SDK integration
• Code examples for embedding generation and 3D projection
Comparative Analysis Table
• Quantitative results: accuracy, completion time, user ratings
• Qualitative insights on spatial understanding
• Recommendations for when each method is most effective
Design Guidelines Page
• When AR helps vs when dimensionality overwhelms
• Best practices for embedding visualization in physical space
• Limitations and future directions
Technology Stack
• PyTorch (ML framework): generate embeddings from pre-trained models (ResNet, BERT, etc.)
• NumPy / Scikit-learn (data processing): array handling and dimensionality reduction (PCA, UMAP)
• Unity (AR platform): 3D rendering engine for the AR visualization
• Meta XR SDK (AR SDK): passthrough AR capabilities for Quest headsets