What are 𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹 𝗔𝘂𝗴𝗺𝗲𝗻𝘁𝗲𝗱 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 (𝗥𝗔𝗚) 𝗦𝘆𝘀𝘁𝗲𝗺𝘀?
Here is an example of a simple RAG-based Chatbot that queries your Private Knowledge Base.
The first step is to store the knowledge from your internal documents in a format suitable for querying. We do so by embedding it with an embedding model:
𝟭: Split the text corpus of the entire knowledge base into chunks - each chunk represents a single piece of context available to be retrieved. The data of interest can come from multiple sources, e.g. documentation in Confluence supplemented by PDF reports.
𝟮: Use the Embedding Model to transform each of the chunks into a vector embedding.
𝟯: Store all vector embeddings in a Vector Database.
𝟰: Save the text that each embedding represents separately, together with a pointer to the embedding (we will need this in step 8). A minimal sketch of these four steps is shown below.
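Here is one way steps 1 to 4 could look in practice - a minimal sketch, assuming the sentence-transformers package and a naive fixed-size chunker; a plain Python list stands in for a real Vector Database:

```python
# Indexing sketch (steps 1-4). The model name and chunk size are
# illustrative choices, not requirements.
from sentence_transformers import SentenceTransformer

# 1: Split the corpus into chunks (naive fixed-size split for illustration;
# real systems usually split on semantic or structural boundaries).
def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

corpus = "...your Confluence pages, PDF reports, etc., as plain text..."
chunks = chunk_text(corpus)

# 2: Use the Embedding Model to turn each chunk into a vector embedding.
model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works
embeddings = model.encode(chunks)

# 3 + 4: Store each embedding together with a pointer back to its text chunk,
# so retrieved vectors can later be mapped back to readable context.
index = [{"id": i, "embedding": emb, "text": chunk}
         for i, (emb, chunk) in enumerate(zip(embeddings, chunks))]
```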
Next, we can start constructing the answer to a question/query of interest:
𝟱: Embed a question/query you want to ask using the same Embedding Model that was used to embed the knowledge base itself.
𝟲: Use the resulting Vector Embedding to run a query against the index in the Vector Database. Choose how many vectors to retrieve - this determines the amount of context that will be retrieved and eventually used to answer the question.
𝟳: The Vector DB performs an Approximate Nearest Neighbour (ANN) search for the provided vector embedding against the index and returns the previously chosen number of context vectors - those that are most similar in the given embedding/latent space.
𝟴: Map the returned Vector Embeddings to the text chunks that represent them.
𝟵: Pass the question together with the retrieved context text chunks to the LLM via the prompt. Instruct the LLM to use only the provided context to answer the question. This does not mean that no Prompt Engineering is needed - you will want to ensure that the answers returned by the LLM fall within expected boundaries, e.g. if the retrieved context contains no usable data, make sure no made-up answer is provided. A sketch of steps 5 to 9 follows below.
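A minimal sketch of steps 5 to 9, continuing from the indexing sketch above (it reuses `model` and `index`). Brute-force cosine similarity stands in here for the ANN search a real Vector Database would run; the prompt wording is just one possible way to constrain the LLM to the provided context:

```python
import numpy as np

def retrieve(question: str, top_k: int = 3) -> list[str]:
    # 5: Embed the question with the same Embedding Model used for indexing.
    q = model.encode([question])[0]
    # 6 + 7: Score every stored vector; keep the top_k most similar ones
    # (a real Vector DB would do this with an ANN index instead).
    sims = [np.dot(q, item["embedding"]) /
            (np.linalg.norm(q) * np.linalg.norm(item["embedding"]))
            for item in index]
    top = np.argsort(sims)[::-1][:top_k]
    # 8: Map the winning vectors back to the text chunks they represent.
    return [index[i]["text"] for i in top]

def build_prompt(question: str) -> str:
    # 9: Combine the question with the retrieved context, instructing the
    # LLM to answer from the provided context only.
    context = "\n\n".join(retrieve(question))
    return ("Answer the question using ONLY the context below. "
            "If the context does not contain the answer, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```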
To make it a real Chatbot - front the entire application with a Web UI that exposes a text input box as the chat interface (see the sketch below). After running the provided question through steps 1 to 9, return and display the generated answer. This is how most chatbots built on one or more internal knowledge base sources are actually built nowadays.
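A hypothetical front-end wrapper, assuming the gradio package; `call_llm` below is a placeholder for whatever LLM client you use, not a real API:

```python
import gradio as gr

def answer(question: str) -> str:
    prompt = build_prompt(question)  # from the query sketch above
    return call_llm(prompt)          # placeholder: call your LLM of choice

# Expose a text input box as the chat interface and display the answer.
gr.Interface(fn=answer, inputs="text", outputs="text",
             title="Private Knowledge Base Chatbot").launch()
```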
As described, the system is really just a naive RAG that is usually not fit for production-grade applications. You need to understand all of the moving pieces in the system in order to tune them by applying advanced techniques, consequently transforming the Naive RAG into an Advanced RAG fit for production. More on this in upcoming posts, so stay tuned!
#LLM #GenAI #LLMOps #MachineLearning