
· 2 min read
ChatGPT
Joe

Some people with experience in ANN (Approximate Nearest Neighbor) search may wonder why we use HNSW, since SQ, PQ, or even brute-force search can be faster than HNSW on small datasets. We thought we'd write a quick blog post about it.

Before getting to the main topic, here is a brief explanation of "SQ" and "PQ" for those unfamiliar with the terms. "PQ" refers to Product Quantization, a technique that splits each high-dimensional vector into sub-vectors and replaces each sub-vector with the index of its nearest centroid in a learned codebook, producing a compact code that can be used for approximate nearest neighbor search in the compressed domain. "SQ" refers to Scalar Quantization, which keeps the number of dimensions but stores each component at lower precision (for example, an 8-bit integer instead of a 32-bit float), also aiming to make similarity search more efficient.
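To make the two terms concrete, here is a minimal sketch of both encodings. It is an illustration only, not CloseVector code: the names `sqEncode`, `sqDecode`, and `pqEncode` are made up for this example, and the PQ codebooks are assumed to have been trained elsewhere (for example with k-means).

```typescript
// Scalar quantization: keep every dimension, but store each component
// as one byte (0..255) using the vector's own min/max range.
interface SqCode {
  codes: Uint8Array;
  min: number;
  scale: number; // width of one quantization step
}

function sqEncode(vector: number[]): SqCode {
  const min = Math.min(...vector);
  const max = Math.max(...vector);
  // Avoid dividing by zero when all components are equal.
  const scale = max > min ? (max - min) / 255 : 1;
  const codes = Uint8Array.from(vector, (x) => Math.round((x - min) / scale));
  return { codes, min, scale };
}

function sqDecode(code: SqCode): number[] {
  return Array.from(code.codes, (c) => code.min + c * code.scale);
}

function squaredL2(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Product quantization: split the vector into equal-length sub-vectors and
// replace each one with the index of its nearest centroid.
// codebooks[m][k] is centroid k for sub-vector m (assumed already trained).
function pqEncode(vector: number[], codebooks: number[][][]): number[] {
  const subCount = codebooks.length;
  const subDim = vector.length / subCount;
  if (!Number.isInteger(subDim)) {
    throw new Error("vector length must be divisible by the number of codebooks");
  }
  const codes: number[] = [];
  for (let m = 0; m < subCount; m++) {
    const sub = vector.slice(m * subDim, (m + 1) * subDim);
    let best = 0;
    let bestDist = Infinity;
    for (let k = 0; k < codebooks[m].length; k++) {
      const d = squaredL2(sub, codebooks[m][k]);
      if (d < bestDist) {
        bestDist = d;
        best = k;
      }
    }
    codes.push(best);
  }
  return codes;
}
```

With SQ the decoded vector differs from the original by at most half a quantization step per component. With PQ the code length depends only on the number of sub-vectors, not on the original dimensionality.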

Yes, depending on the data scale, HNSW might not be the optimal algorithm. The core value of CloseVector is not to achieve the best computational performance at every scale, but to provide a solution for running a vector database locally. This approach might suit scenarios that are data-sensitive and need to scale well, since CloseVector relies only on local storage or CDN storage. Such scenarios do not necessarily need a server-side vector database, for instance when you need to index all local images or all local documents, as long as the local device can accept the compute, storage, and transmission costs.

As for why CloseVector chooses HNSW: first, HNSW performs acceptably across different data scales; second, the algorithm is simple enough, and mature open-source libraries are available, which makes it convenient to support future versions of CloseVector in languages such as Python, Swift, and Kotlin.
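To give a feel for that simplicity, the sketch below shows the greedy walk at the heart of HNSW on a single graph layer. It is a deliberately simplified illustration, not the full algorithm (no layer hierarchy, no candidate list, no index construction) and not code from hnswlib or CloseVector; the function name and the adjacency-list input are assumptions made for this example.

```typescript
function squaredL2(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Greedy walk over a proximity graph: starting at `entry`, repeatedly move to
// the neighbor closest to the query, and stop when no neighbor is closer than
// the current node. neighbors[i] lists the node ids linked to node i.
function greedySearch(
  vectors: number[][],
  neighbors: number[][],
  query: number[],
  entry: number
): number {
  let current = entry;
  let currentDist = squaredL2(vectors[current], query);
  while (true) {
    let best = current;
    let bestDist = currentDist;
    for (const candidate of neighbors[current]) {
      const d = squaredL2(vectors[candidate], query);
      if (d < bestDist) {
        best = candidate;
        bestDist = d;
      }
    }
    if (best === current) {
      return current; // local minimum: no neighbor improves on this node
    }
    current = best;
    currentDist = bestDist;
  }
}
```

The walk only visits nodes along one path through the graph instead of scanning every vector, which is where the speed-up on larger datasets comes from. The result is approximate because the walk can stop at a local minimum.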

At smaller scales, any performance difference between HNSW and the alternatives should be too small for users to notice. If CloseVector needs to optimize operational efficiency in the future, it can optimize the serialization structure and then adopt different algorithms at different scales.
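As a rough sketch of what "different algorithms at different scales" could look like, the example below pairs an exact brute-force search with a size-based selector. Everything here is hypothetical: `bruteForceKnn`, `chooseStrategy`, and the default threshold of 10000 are illustrative choices, not part of CloseVector.

```typescript
interface Neighbor {
  index: number;
  distance: number; // squared Euclidean distance to the query
}

function squaredL2(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Exact k-nearest-neighbor search: score every vector, sort, keep the top k.
// The cost grows linearly with the number of vectors, which is fine when
// that number is small.
function bruteForceKnn(vectors: number[][], query: number[], k: number): Neighbor[] {
  return vectors
    .map((vector, index) => ({ index, distance: squaredL2(vector, query) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}

type Strategy = "brute-force" | "hnsw";

// Hypothetical policy: scan everything below the threshold, use the graph
// index above it. The threshold is an assumed value and would need measuring.
function chooseStrategy(vectorCount: number, threshold: number = 10000): Strategy {
  return vectorCount < threshold ? "brute-force" : "hnsw";
}
```

A real cut-over point would depend on vector dimensionality and the target device, so it is something to benchmark rather than hard-code.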

· 2 min read
ChatGPT
Joe

We're pleased to present the alpha release of CloseVector, a portable vector database designed with machine learning applications in mind. If you've been considering building an app that interacts with articles or PDFs, or if you need a recommendation system over a limited set of candidates, we hope CloseVector can be a potential solution for you.

What is CloseVector?

CloseVector is fundamentally a vector database. We have made dedicated libraries available for both browsers and Node.js, aiming for easy integration no matter your platform. One feature we've been working on is scalability: instead of being bound by server limitations, CloseVector's vector index runs directly on the user's machine, so each user's queries are served by their own device.

CloseVector is built on the HNSW algorithm. We've integrated hnswlib to ensure compatibility across platforms, from browsers to Node.js. We're continuously looking to enhance and expand the capabilities of CloseVector.

Starting with CloseVector

For those interested, we've put together a tutorial. This guide provides a step-by-step overview of using closevector-node for text embeddings and vector storage. Some of the tutorial highlights include:

  • How to install closevector-node using npm.
  • Generating your first access key for authentication.
  • TypeScript code samples for various tasks.
  • An overview of fetching Hacker News stories, formatting them, and initializing a vector store.
  • Creating an index.
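As a rough idea of the "fetching and formatting" step, here is a sketch that pulls top stories from the public Hacker News API and turns each one into a text document with metadata. It does not reproduce the tutorial's code and does not call closevector-node; the `StoryDocument` shape and the function names are assumptions made for this example, and it relies on a runtime with a global `fetch` (such as Node.js 18 or later).

```typescript
// Subset of the fields the Hacker News API returns for a story item.
interface HnStory {
  id: number;
  title: string;
  by: string;
  score: number;
  url?: string; // missing for text posts such as "Ask HN"
}

// Hypothetical document shape for illustration; not CloseVector's actual type.
interface StoryDocument {
  pageContent: string;
  metadata: { id: number; url: string };
}

const HN_API = "https://hacker-news.firebaseio.com/v0";

// Fetch the ids of the current top stories, then load the first `limit` items.
async function fetchTopStories(limit: number): Promise<HnStory[]> {
  const idsResponse = await fetch(`${HN_API}/topstories.json`);
  const ids = (await idsResponse.json()) as number[];
  return Promise.all(
    ids.slice(0, limit).map(async (id) => {
      const itemResponse = await fetch(`${HN_API}/item/${id}.json`);
      return (await itemResponse.json()) as HnStory;
    })
  );
}

// Turn a story into the text that would be embedded, plus metadata for lookup.
function formatStory(story: HnStory): StoryDocument {
  // Fall back to the Hacker News discussion page when there is no external link.
  const url = story.url ?? `https://news.ycombinator.com/item?id=${story.id}`;
  return {
    pageContent: `${story.title} (by ${story.by}, ${story.score} points)`,
    metadata: { id: story.id, url },
  };
}
```

The formatted documents would then be passed to whatever embedding and indexing calls the tutorial describes.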

You can access the detailed tutorial here.

Demo and Source Code

To provide a glimpse of CloseVector in action, we have a demo. It combines closevector-web, closevector-node, and Docusaurus to create a documentation website with search powered by CloseVector.

The demo includes:

  1. Injecting a document into the CloseVector index using closevector-node.
  2. Experiencing the search function with closevector-web.

If you're keen to see how it works behind the scenes, the source code for this demo is available at closevector-doc. You can also try the demo by using the search button at the top right.

Conclusion

As we share CloseVector's alpha release, we understand that this is just the starting point. We see potential in CloseVector, but also know there's room for growth and improvement. Your feedback and suggestions will be crucial in shaping its evolution. We invite you to test out CloseVector and let us know your thoughts. With your input, we hope to better align CloseVector with user needs. Thank you for considering joining us in this phase.