## Inverted File Index (IVF)

IVF is based on the idea of Dirichlet tessellation (also known as a Voronoi diagram).

Given a set of points $P_1, \dots, P_n$ (the centroids) in a space $X$ with distance function $d$, the region $R_k$ associated with $P_k$ is

$$
R_k = \{x \in X \mid d(x, P_k) \le d(x, P_j) \text{ for all } j \ne k\}
$$

Basically, each point is assigned to the closest centroid. The centroids are used as the "seeds" of the regions (cells): for each data point, we find the closest centroid and assign the point to that centroid's cell. We use this to limit the search for the $k$-nearest neighbors to the region of a centroid: the query is compared only with the points in the cell it falls in.

Of course, this has a limit: the query could fall near the boundary of a cell, so some of its true nearest neighbors may lie in an adjacent cell. To mitigate this, we define the number of cells to check (probe) as a hyperparameter. To find the centroids we use $k$-means.

## Graph Search

A proximity graph is a graph $G(V, E)$ where $V = X$ are the data points and $E$ are edges between similar nodes. Graph indexes exploit beam search to greedily explore the graph. So, you take the nodes near the query, then the neighbors of those nodes that are also near the query, and so on.
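
The greedy exploration of a proximity graph can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`build_knn_graph`, `beam_search`) and the parameters (`degree`, `entry`, `beam_width`, `top_k`) are hypothetical, and the graph is built by brute force (each node linked to its nearest nodes), which is only one possible way to obtain "edges between similar nodes". The search keeps a beam of the best nodes seen so far and repeatedly expands the closest unexpanded one, stopping when nothing left can improve the beam.

```python
import heapq
import math
import random


def build_knn_graph(points, degree):
    """Brute-force proximity graph: link every node to its `degree` nearest nodes."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:degree]
    return graph


def beam_search(points, graph, query, entry, beam_width, top_k):
    """Greedy best-first exploration that keeps at most `beam_width` nodes."""
    d_entry = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d_entry, entry)]  # min-heap: closest unexpanded node first
    beam = [(-d_entry, entry)]       # max-heap via negation: farthest kept node first
    while candidates:
        dist, node = heapq.heappop(candidates)
        # Stop once the closest unexpanded node is worse than the worst node
        # of a full beam: expanding it cannot improve the result.
        if len(beam) >= beam_width and dist > -beam[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = math.dist(query, points[nb])
            if len(beam) < beam_width or d_nb < -beam[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(beam, (-d_nb, nb))
                if len(beam) > beam_width:
                    heapq.heappop(beam)  # drop the farthest node in the beam
    best = sorted((-neg_d, node) for neg_d, node in beam)
    return [node for _, node in best[:top_k]]


# Example usage on random 2-D points (illustrative data only).
random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
graph = build_knn_graph(points, degree=8)
result = beam_search(points, graph, query=(0.5, 0.5), entry=0,
                     beam_width=16, top_k=5)
print(result)
```

A wider beam explores more nodes and is less likely to get stuck in a poor region of the graph, at the cost of more distance computations. The result is approximate: nothing guarantees that the true nearest neighbors are reachable from the entry node.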