This is generally very context- and use-case-specific. In general, if a document is a `Dict[str, Any]`, you need one or more vectors per field, unless you want to combine vectors across fields (and it's not self-evident how you'd best do that). That said, here are some specific reasons to do this (or at least why I've done it in the past):
1. Chunking long text fields in documents gets you a better semantic vector for each part (also, you can only fit so much into an LLM).
2. Separately from 1, chunking long text fields (or even images, audio, etc.) is one way to perform highlighting. It helps answer the question: for a given document, what about it caused it to be returned? You can then point to the area of the image/text/audio that was most relevant.
3. You may want to run different LLMs on different fields (perhaps a separate multi-modal LLM vs. a standard text LLM) or, as another comment said, have different transforms/representations of the same field.
Perhaps 100 vectors is non-standard, but it's definitely not unheard of.
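To make points 1 and 2 concrete, here is a minimal sketch of chunking a long text field and using the best-matching chunk as the highlight. Everything in it is an assumption for illustration: `chunk_words`, `toy_embed`, and `best_chunk` are hypothetical names, and the bag-of-words "embedding" is only a stand-in for whatever real model you would run on each chunk.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping windows of `size` words (requires size > overlap)."""
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector, NOT a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_chunk(query: str, text: str, size: int = 8, overlap: int = 2):
    """Return (chunk index, chunk text, score) for the chunk closest to the query."""
    q = toy_embed(query)
    chunks = chunk_words(text, size, overlap)
    scored = [(cosine(q, toy_embed(c)), i, c) for i, c in enumerate(chunks)]
    score, idx, chunk = max(scored)
    return idx, chunk, score


if __name__ == "__main__":
    body = (
        "The quarterly report covers revenue and costs in detail. "
        "Later sections describe the new vector search engine and its latency. "
        "The appendix lists office locations."
    )
    idx, chunk, score = best_chunk("vector search latency", body)
    # The returned chunk is the span you would highlight for this document.
    print(idx, repr(chunk), round(score, 3))
```

The same shape applies to images or audio: swap the word windows for tiles or time slices, and the index of the best chunk tells you which region to point at.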
Only Vespa allows you to index multiple vectors per schema field, which avoids duplicating all of the document's metadata into each "chunk" and avoids maintaining the document-to-chunk fan-out. See https://blog.vespa.ai/semantic-search-with-multi-vector-inde...
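As a rough illustration of the layout being described (this is plain Python, not Vespa's API or schema language), a document can carry its metadata once alongside a list of vectors, and be scored by its closest vector. `Doc` and `rank` are hypothetical names, and the dot-product scoring is an assumption; a real engine would use an approximate nearest-neighbor index rather than a linear scan.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    metadata: dict                      # stored once per document, not per chunk
    vectors: list[list[float]] = field(default_factory=list)  # one per chunk


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank(query_vec: list[float], docs: list[Doc]) -> list[tuple[str, int, float]]:
    """Score each document by its best-matching vector.

    Returns (doc_id, index of best vector, score), highest score first.
    """
    results = []
    for d in docs:
        if not d.vectors:
            continue
        scores = [dot(query_vec, v) for v in d.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        results.append((d.doc_id, best, scores[best]))
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    docs = [
        Doc("a", {"title": "A"}, [[1.0, 0.0], [0.0, 1.0]]),
        Doc("b", {"title": "B"}, [[0.5, 0.5]]),
    ]
    print(rank([0.0, 1.0], docs))
```

The alternative in most other stores is one record per chunk, each with a copy of the metadata and a parent id, which is the fan-out you then have to keep in sync on every update.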