[P] EVōC: Embedding Vector Oriented Clustering
I have written a new library specifically targeting the problem of clustering for embedding vectors. This is often a challenging task, as embedding vectors are very high dimensional, and classical clustering algorithms can struggle to perform well (either in terms of cluster quality, or compute time performance) because of that.
EVōC builds from foundations such as UMAP and HDBSCAN, redesigned, tuned and optimized specifically to the task of clustering embedding vectors. If you use UMAP + HDBSCAN for embedding vector clustering now, EVōC can provide better quality results in a fraction of the time. In fact EVōC is performance competitive in scaling with sklearn's MiniBatchKMeans.
Github: https://github.com/TutteInstitute/evoc
[link] [comments]
Want to read more?
Check out the full article on the original site