Visualizing Word Embeddings as Discrete and Continuous Structures in VR
Understanding high-dimensional data using passthrough AR
The Challenge: Understanding High-Dimensional Data
Machine learning models generate embeddings with hundreds or thousands of dimensions. Traditional visualization flattens these into 2D plots using PCA, t-SNE, or UMAP, losing spatial relationships and structural information that exist in the original high-dimensional space.
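As a minimal sketch of the flattening step described above, the snippet below projects placeholder embedding vectors to 2D and 3D with PCA via the singular value decomposition. The data and dimensions are illustrative stand-ins, not the embeddings used in the projects.

```python
# Hedged sketch: projecting high-dimensional embeddings to 2D/3D with PCA.
# The 300-dimensional vectors are random placeholders for real word embeddings.
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(64, 300))  # 64 words x 300 dims (placeholder data)

def pca_project(X, n_components):
    """Project rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)                       # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # coordinates in component space

points_2d = pca_project(embeddings, 2)  # for a flat scatter plot
points_3d = pca_project(embeddings, 3)  # for a VR point cloud

print(points_2d.shape, points_3d.shape)  # (64, 2) (64, 3)
```

Any projection of this kind keeps only the top few directions of variance, which is exactly why structure in the remaining dimensions is lost.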
I study how people understand word embeddings in VR.
In my first project, I visualized 64 word embeddings as 2D and 3D point clouds across four categories, such as emotions and professions.
I found that 2D plots gave clearer clusters, while VR improved spatial intuition but still treated meaning as discrete groups.
Transition + Project 2 system
1. But this revealed a limitation: point clouds show where clusters are, not how meaning changes between them.
2. So in my second project, I reframed embeddings as a continuous semantic field using density and gradients.
3. I built a VR system where flow lines show semantic direction and users can probe the local category composition.
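One way to sketch the field-and-flow idea in step 3 is a Gaussian kernel density over the projected points, with flow directions taken from the density gradient and a probe that reports nearby category fractions. All function names, parameters, and data here are illustrative assumptions, not the actual system.

```python
# Hedged sketch of a continuous semantic field: a Gaussian kernel density
# over 3D-projected embeddings, with flow directions from its gradient and
# a simple radius-based probe of category composition. Placeholder data.
import numpy as np

rng = np.random.default_rng(1)
points = rng.normal(size=(64, 3))          # projected embedding positions
labels = rng.integers(0, 4, size=64)       # four semantic categories

def density(p, pts, bandwidth=0.5):
    """Gaussian kernel density of the point set at probe position p."""
    d2 = ((pts - p) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * bandwidth**2)).sum()

def flow_direction(p, pts, eps=1e-3):
    """Unit gradient of the density field: the local 'semantic flow'."""
    grad = np.array([
        (density(p + eps * e, pts) - density(p - eps * e, pts)) / (2 * eps)
        for e in np.eye(3)
    ])
    norm = np.linalg.norm(grad)
    return grad / norm if norm > 0 else grad

def probe_composition(p, pts, lbls, radius=1.0):
    """Fraction of each category among points within `radius` of the probe."""
    near = lbls[((pts - p) ** 2).sum(axis=1) < radius**2]
    return np.bincount(near, minlength=4) / max(len(near), 1)

probe = np.zeros(3)
print(flow_direction(probe, points))       # local semantic direction
print(probe_composition(probe, points, labels))  # local category mix
```

Tracing flow_direction from seed points yields the flow lines; evaluating probe_composition at the controller position yields the local category readout.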
Results and Insights
I found that VR fields and flows reveal smoother transitions and more overlap between categories, especially for abstract concepts.
Compared to point clouds, this exposes semantic continuity that clustering hides.
Overall, the projects show that moving from discrete to continuous representations changes not just visualization, but how people interpret meaning in embedding space.
Come Try Them Out!