Why Vector Databases Are Getting Popular During the AI Hype Boom

Vector databases are gaining popularity, with many startups diving into the field and investors showing keen interest. This surge is fueled by the rise of large language models (LLMs) and generative AI (GenAI) technology, which have paved the way for vector database innovations to thrive.

Traditional databases like Postgres or MySQL are good for organizing structured data, which is data that fits neatly into rows and columns with specific types like numbers or text. But they're not so great for handling unstructured data, like pictures, videos, emails, or social media posts, where there's no clear format or organization.

Vector databases are like smart organizers for data. Instead of storing information in its original form, like words or pictures, they convert it into lists of numbers, called embeddings, that capture what the data means and how it relates to other data. Similar items end up with similar numbers, which makes it easy for computers to find related information. It's like putting related things closer together on a shelf so you can find them easily.
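The "closer together on a shelf" idea can be measured with cosine similarity, the standard way to compare two embeddings. Here is a minimal sketch using made-up 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: values near 1.0 mean similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" for illustration only; a real model would generate these.
cat = np.array([0.9, 0.1, 0.0])
kitten = np.array([0.85, 0.15, 0.05])
car = np.array([0.0, 0.2, 0.95])

print(cosine_similarity(cat, kitten))  # high score: related concepts
print(cosine_similarity(cat, car))     # low score: unrelated concepts
```

A vector database's job is to run this kind of comparison efficiently across millions of stored vectors, usually with approximate-nearest-neighbor indexes rather than exhaustive comparison.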

This is especially useful for AI chatbots like OpenAI's GPT-4, which can retrieve similar past conversations to better understand the current one. And it's not just for chatbots: the same similarity search powers recommendations on social media and shopping apps, quickly surfacing items like the ones you've been looking at.
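A recommendation lookup of this kind boils down to a top-k nearest-neighbor search. This brute-force sketch (with invented catalogue embeddings) shows the idea; production systems replace the exhaustive scan with an index:

```python
import numpy as np

def top_k(query: np.ndarray, items: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k stored vectors most similar to the query."""
    # Normalize rows so a plain dot product equals cosine similarity.
    items_n = items / np.linalg.norm(items, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = items_n @ query_n
    return [int(i) for i in np.argsort(-scores)[:k]]

# Hypothetical catalogue: rows 0 and 1 are shoes, row 2 is a laptop.
catalogue = np.array([
    [0.9, 0.1],   # running shoe
    [0.8, 0.2],   # hiking boot
    [0.1, 0.9],   # laptop
])
viewed = np.array([0.85, 0.15])  # embedding of what the user browsed

print(top_k(viewed, catalogue))  # the two shoe items rank first
```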

Using vector search can also make LLM applications better by supplying them with extra context at query time, including details that weren't in the model's training data. This pattern is commonly known as retrieval-augmented generation (RAG).
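The retrieval step can be sketched as follows: find the stored documents closest to the question's embedding and prepend them to the prompt. The documents and embeddings below are made up for illustration; in practice the embeddings come from a model and the search runs inside a vector database:

```python
import numpy as np

docs = [
    "Qdrant raised $28M in January.",
    "Vector databases store embeddings.",
    "MySQL stores rows and columns.",
]
doc_vecs = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]])  # pretend embeddings

def build_prompt(question: str, query_vec: np.ndarray, k: int = 2) -> str:
    """Fetch the k most relevant docs and prepend them to the question."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    scores = doc_vecs @ query_vec / norms
    best = np.argsort(-scores)[:k]
    context = "\n".join(docs[i] for i in best)
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What do vector databases store?", np.array([0.15, 0.85]))
print(prompt)
```

The assembled prompt would then be sent to an LLM, which answers from the retrieved context rather than from its training data alone.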

"Even if you don't use vector similarity search, you can still create AI and machine learning apps," says Peter Zaitsev, founder of database services company Percona. "But without it, you'll have to spend more time training and adjusting the models. Vector databases are helpful when you have a big dataset and need a tool to manage vector embeddings quickly and easily."

In January, Qdrant raised $28 million to fund its growth, and it now ranks among the top 10 fastest-growing open source companies. But it's not alone: rivals like Vespa, Weaviate, Pinecone, and Chroma collectively raised $200 million last year for their vector databases.

And the momentum hasn't slowed. Index Ventures recently led a $9.5 million investment in Superlinked, a platform that turns complex data into vectors. And with Y Combinator's Winter '24 cohort came Lantern, a startup selling a vector search engine for Postgres databases.

Elsewhere, Marqo raised $4.4 million last year, followed by another $12.5 million in February. Marqo's platform provides a range of ready-to-use vector tools covering generation, storage, and retrieval, so users don't need to pair it with third-party offerings from companies like OpenAI or Hugging Face. Everything is accessible through a single API.

Marqo's founders, Tom Hamer and Jesse N. Clark, previously worked at Amazon, where they noticed a big gap: it was hard to search across different formats, such as text and images. So in 2021 they left Amazon and started Marqo to solve that problem.

Enter the enterprise:

Although vector databases are getting a lot of attention because of ChatGPT and the GenAI trend, they can't solve every search problem for businesses. Specialized databases excel at certain tasks but rarely at everything, which is why established databases like Elastic, Redis, OpenSearch, Cassandra, Oracle, and MongoDB are now adding vector search capabilities of their own. Even cloud providers like Microsoft Azure, Amazon AWS, and Cloudflare are joining the trend.

Zaitsev compares this new trend to what happened with JSON over ten years ago. Back then, as web apps became more common, developers needed a data format that worked with any programming language and was easy for people to understand. This led to the rise of document databases like MongoDB, alongside traditional databases that started supporting JSON.

"People working on complex and large AI projects will use specialized vector search databases," he predicts. "Meanwhile, those who just need a little AI feature for their current app will probably use the vector search feature in the databases they already use."

But Qdrant CEO Andre Zayarni and his colleagues are betting that native solutions built entirely around vectors will provide the "speed, memory safety, and scale" needed as vector data explodes, compared to companies bolting vector search on as an afterthought.







