1 of 27

26-Winter C&A Lab Seminar

Subspace Collision: An Efficient and Accurate Framework for High-dimensional Approximate Nearest Neighbor Search

Jiuqi Wei et al. (accepted to SIGMOD 2025)

Presenter: Yunki Kim

2 of 27

Approximate Nearest Neighbor Search

3 of 27

 

Nearest Neighbor (NN) Search (Informal)
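The formal statement on this slide did not survive extraction; as a standard formulation (a generic definition, not copied from the paper):

```latex
% Exact NN: given a dataset $\mathcal{D} \subset \mathbb{R}^d$ and a query $q$,
% return the closest point under Euclidean distance:
o^{*} = \arg\min_{o \in \mathcal{D}} \lVert o - q \rVert_2

% c-approximate NN (ANN) relaxation, for some $c > 1$:
% return any $o \in \mathcal{D}$ such that
\lVert o - q \rVert_2 \le c \cdot \lVert o^{*} - q \rVert_2
```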


4 of 27

Subspace Collision: An Efficient and Accurate Framework for High-dimensional Approximate Nearest Neighbor Search

5 of 27

Contribution

  • They designed the SC-score, which acts as a proxy for the Euclidean distance between data points.

  • They proposed a novel ANN search framework called Subspace Collision (SC), based on the SC-score, and performed a rigorous mathematical analysis showing that the framework provides theoretical guarantees on the quality of its results.

  • They designed SuCo, which achieves efficient and accurate ANN search based on the subspace collision framework.

  • They conducted extensive experiments, demonstrating:
    1. 1-2 orders of magnitude faster query answering with only up to one-tenth of the index memory footprint, compared to state-of-the-art ANN methods that provide theoretical guarantees.
    2. Top performance even when compared to methods that do not provide theoretical guarantees.

6 of 27

Concept

  • Requirement 1: combine multiple subspaces to mitigate the error of any single subspace.
  • Requirement 2: reduce the impact of the distance ranking within a single subspace on the final result.

7 of 27

Definitions 1-4

8 of 27

SC-Score
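The body of this slide was lost in extraction. As a rough, hypothetical illustration of the idea (my paraphrase, not the paper's exact definition): the SC-score can be read as a count of "collisions" between a data point and the query across subspaces, so that points colliding in more subspaces are more likely to be true near neighbors in the full space.

```python
import numpy as np

def sc_score(assignments, query_clusters):
    """Count, for every data point, in how many subspaces it lands in
    the same cluster as the query (a "collision").

    assignments:    (num_subspaces, n) array - each point's cluster id
                    per subspace.
    query_clusters: (num_subspaces,) array - the query's cluster id per
                    subspace.

    Illustrative sketch only; SuCo's actual SC-score is defined in the paper.
    """
    # Broadcast-compare each subspace's assignments against the query's
    # cluster in that subspace, then sum collisions over subspaces.
    return (assignments == query_clusters[:, None]).sum(axis=0)
```

Counting over multiple subspaces is what Requirement 1 on the Concept slide asks for: no single subspace's ranking decides the result on its own.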

9 of 27

 

SC-Linear

[Figure: SC-Linear example, showing the 1st data point and the query.]

10 of 27

SC-Linear

11 of 27

Theoretical Guarantee

 

12 of 27

The SuCo Method

13 of 27

Index Construction

14 of 27

K-means Clustering
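The slides carry no code; below is a minimal sketch (NumPy, with illustrative parameter names of my choosing) of the indexing idea described here: partition the dimensions into subspaces and run k-means clustering independently in each one.

```python
import numpy as np

def build_subspace_index(data, num_subspaces=2, k=16, iters=10, seed=0):
    """Partition dimensions into subspaces and run k-means in each one.

    Returns one dict per subspace holding its dimensions, centroids, and
    each point's cluster assignment. Sketch only: SuCo's real index
    construction has more machinery on top of this.
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    index = []
    for sub_dims in np.array_split(np.arange(d), num_subspaces):
        sub = data[:, sub_dims]
        # Lloyd's algorithm: seed centroids from random points, then iterate.
        centroids = sub[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(iters):
            dists = np.linalg.norm(sub[:, None, :] - centroids[None, :, :], axis=2)
            assign = dists.argmin(axis=1)
            for c in range(k):
                members = sub[assign == c]
                if len(members):
                    centroids[c] = members.mean(axis=0)
        index.append({"dims": sub_dims, "centroids": centroids, "assign": assign})
    return index
```

Clustering each subspace independently keeps the per-subspace index small, which is one way to arrive at the small memory footprint reported in the experiments.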

15 of 27

Create Index

16 of 27

Query Answering

17 of 27

 

Query Answering

dim1\dim2 |   0    |   1    |  2  |  3  |  4
        0 | 0.7(1) | 1.0(3) | 1.5 |     |
        1 | 0.9(2) | 1.2(4) | 1.7 |     |
        2 | 1.3(5) | 1.6    |     |     |
        3 | 1.8    |        |     |     |
        4 |        |        |     |     |

(Parenthesized numbers mark the order in which cells are visited.)
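The parenthesized visit order in the table is what a multi-sequence traversal produces: with each subspace's candidate distances sorted ascending, a min-heap enumerates index pairs in increasing order of summed distance. A minimal two-subspace sketch (the distance lists below are hypothetical values chosen to be consistent with the table):

```python
import heapq

def multi_sequence(a, b):
    """Yield ((i, j), a[i] + b[j]) in increasing order of the sum.

    a and b must be sorted ascending, like the per-subspace distance
    lists behind the table above.
    """
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    while heap:
        s, i, j = heapq.heappop(heap)
        yield (i, j), s
        # Expand the two grid neighbors; dedupe with `seen`.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))

a = [0.3, 0.5, 0.9, 1.4]  # hypothetical sorted distances in dim1
b = [0.4, 0.7, 1.2]       # hypothetical sorted distances in dim2
first5 = [pair for pair, _ in list(multi_sequence(a, b))[:5]]
# first5 == [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)], i.e. the cells
# labeled 0.7(1), 0.9(2), 1.0(3), 1.2(4), 1.3(5) in the table.
```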

18 of 27

Dynamic Activation

19 of 27

k-ANN Query

20 of 27

Experimental Evaluation

21 of 27

Previous Works 1

  • DET-LSH, DB-LSH, PM-LSH, and LCCS-LSH are state-of-the-art LSH-based methods that provide theoretical guarantees. All are single-threaded, as their authors did not design parallel versions.

  • DET-LSH
    • Jiuqi Wei, Botao Peng, Xiaodong Lee, and Themis Palpanas. 2024. DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search. Proceedings of the VLDB Endowment 17, 9 (2024), 2241-2254.
  • DB-LSH
    • Yao Tian, Xi Zhao, and Xiaofang Zhou. 2023. DB-LSH 2.0: Locality-Sensitive Hashing With Query-Based Dynamic Bucketing. IEEE Transactions on Knowledge and Data Engineering (2023).
  • PM-LSH
    • Bolong Zheng, Xi Zhao, Lianggui Weng, Nguyen Quoc Viet Hung, Hang Liu, and Christian S. Jensen. 2020. PM-LSH: A Fast and Accurate LSH Framework for High-dimensional Approximate NN Search. Proceedings of the VLDB Endowment 13, 5 (2020), 643-655.
  • LCCS-LSH
    • Yifan Lei, Qiang Huang, Mohan Kankanhalli, and Anthony K. H. Tung. 2020. Locality-Sensitive Hashing Scheme Based on Longest Circular Co-substring. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 2589-2599.

22 of 27

Previous Works 2

  • OPQ, Annoy, HNSW, and SPTAG are the state-of-the-art VQ-based, tree-based, graph-based, and tree-graph hybrid methods, respectively; they do not provide theoretical guarantees.

  • OPQ
    • Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2013. Optimized Product Quantization. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 4 (2013), 744-755.
  • Annoy
    • Erik Bernhardsson. 2015. Approximate Nearest Neighbors in C++/Python Optimized for Memory Usage and Loading/Saving to Disk. https://github.com/spotify/annoy
  • HNSW
    • Yu A. Malkov and Dmitry A. Yashunin. 2018. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2018), 824-836.
  • SPTAG
    • Qi Chen, Haidong Wang, Mingqin Li, Gang Ren, Scarlett Li, Jeffery Zhu, Jason Li, Chuanjie Liu, Lintao Zhang, and Jingdong Wang. 2018. SPTAG: A Library for Fast Approximate Nearest Neighbor Search.

23 of 27

Multi-Sequence Algorithm vs. Dynamic Activation

24 of 27

SuCo vs. Previous Works (Indexing)

25 of 27

SuCo vs. Previous Works (QPS)

26 of 27

Discussion

  • Pros
    • Extensive experiments demonstrate:
      1. 1-2 orders of magnitude faster query answering with only up to one-tenth of the index memory footprint, compared to state-of-the-art ANN methods that provide theoretical guarantees.
      2. Top performance even when compared to methods that do not provide theoretical guarantees.

  • Cons
    • SuCo's theoretical guarantee depends on the data distribution, whereas LSH guarantees do not.

27 of 27

Thank you.