Leveraging AI in SAP with CAP and HANA Cloud’s Vector Engine

Artificial Intelligence (AI) is the most significant technology trend of 2024, and for good reason. The possibilities for users leveraging AI are immense: generating code, AI-supported debugging, image and video generation, text generation, and process automation. Popular offerings like ChatGPT and DALL·E 3 showcase these capabilities. The real question for developers is how to harness the power of the underlying models effectively. These models are not typically trained on business-specific domains and may produce inaccuracies or “hallucinations,” but understanding and mitigating these limitations can unlock their full potential.

Understanding AI Hallucinations

AI hallucinations are incorrect or misleading results generated by AI models. These errors can be caused by a variety of factors, including insufficient training data, incorrect assumptions made by the model, or biases in the data used to train the model. According to Google Cloud, these hallucinations can be mitigated by extending the knowledge of the Large Language Model (LLM) with information about the context of our business domain.

One effective way to do this is by using vector embeddings.

Vector Embeddings and SAP HANA Cloud’s Vector Engine

Vector embeddings are mathematical representations that encode objects as points in a multi-dimensional vector space, capturing the relationships and similarities between those objects. The SAP HANA Cloud Vector Engine stores and analyzes complex, unstructured data in the form of such embeddings, in a format that can be seamlessly processed, compared, and used to build intelligent data applications and to add context in GenAI scenarios.
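
To make “similarity” concrete: embeddings of related texts point in similar directions in vector space, which can be measured with metrics such as cosine similarity. A minimal, self-contained sketch in plain JavaScript (not part of the plugin, just for intuition):

// Cosine similarity between two embedding vectors:
// 1 means identical direction, 0 unrelated, -1 opposite.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Toy 3-dimensional examples; real embeddings have hundreds of dimensions
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])) // 1 (same direction)
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])) // 0 (orthogonal)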

Using the SAP HANA Cloud Vector Engine is a convenient way for SAP developers to create context for the AI models provided through SAP AI Core. SAP HANA Cloud is already part of the SAP ecosystem, making integration easy. SAP offers LLMs through its partner foundation models, providing all the tools needed to add context to an LLM like GPT-4.

With an LLM and the right contextual embeddings, a developer can build an application that consumes the model’s capabilities and exposes features or APIs, enabling meaningful business software that improves the user and developer experience.

Steps to Implement

  1. Create an instance of SAP AI Core.
  2. Create deployments for a model supporting ChatCompletion (e.g., gpt-35-turbo or gpt-4) and for an embedding model (text-embedding-ada-002).
  3. Establish a connection to SAP AI Core via Destination Services (an illustrative destination configuration follows this list).
  4. Create an SAP HANA Cloud instance with the Vector Engine (QRC 1/2024 or later).
  5. Implement the CAP service using the CAP LLM Plugin (Beta).
  6. Provide an input document containing the contextual data that will be turned into vector embeddings.
  7. Create the vector embeddings and store them in the SAP HANA Cloud Vector Engine.
  8. Send the RAG request with the needed vector embeddings to the AI model within SAP AI Core.
  9. Enjoy the response!
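
For step 3, the destination in the BTP cockpit typically points at the AI API of your SAP AI Core instance and authenticates with the OAuth client credentials from the AI Core service key. An illustrative sketch of such a destination (all values come from your own service key; field names follow the BTP cockpit labels):

Name:              <destination name>
Type:              HTTP
URL:               <AI_API_URL from the service key>
Authentication:    OAuth2ClientCredentials
Client ID:         <clientid from the service key>
Client Secret:     <clientsecret from the service key>
Token Service URL: <url from the service key>/oauth/token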

Architecture Overview

A CAP application is connected to an SAP HANA Cloud instance on the SAP Business Technology Platform (BTP) and interacts with the SAP HANA Cloud Vector Engine. The CAP LLM Plugin used within the application executes AI-specific tasks on the SAP AI Core services; the connection to SAP AI Core goes through BTP’s destination service. SAP AI Core routes the requests to the partner foundation models, such as the Azure OpenAI Service, and returns the response.

CAP LLM Plugin

Within the CAP service, the CAP LLM Plugin can be used not only to connect to SAP AI Core or the SAP HANA Cloud Vector Engine but also to execute operations such as data anonymization, embedding creation, similarity search, chat completion, and RAG responses. The plugin is available as an npm package, with documentation and samples on GitHub (currently in beta and not yet suitable for production use).

Configuration and Connection

The .cdsrc.json file is used for configuring which embedding and chat models the CAP LLM Plugin can connect to via the SAP AI Core.

{
  "GENERATIVE_AI_HUB": {
    "CHAT_MODEL_DESTINATION_NAME": "",
    "CHAT_MODEL_DEPLOYMENT_URL": "",
    "CHAT_MODEL_RESOURCE_GROUP": "",
    "CHAT_MODEL_API_VERSION": "",
    "EMBEDDING_MODEL_DESTINATION_NAME": "",
    "EMBEDDING_MODEL_DEPLOYMENT_URL": "",
    "EMBEDDING_MODEL_RESOURCE_GROUP": "",
    "EMBEDDING_MODEL_API_VERSION": ""
  },
  "AICoreAzureOpenAIDestination": {
    "kind": "rest",
    "credentials": {
      "destination": "<destination name>",
      "requestTimeout": "300000"
    }
  }
}
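
For orientation, a filled-in GENERATIVE_AI_HUB block might look like the following; the deployment paths, resource group, and API version are illustrative placeholders that depend on your own SAP AI Core deployments:

"GENERATIVE_AI_HUB": {
  "CHAT_MODEL_DESTINATION_NAME": "AICoreAzureOpenAIDestination",
  "CHAT_MODEL_DEPLOYMENT_URL": "/v2/inference/deployments/<chat-deployment-id>",
  "CHAT_MODEL_RESOURCE_GROUP": "default",
  "CHAT_MODEL_API_VERSION": "2023-05-15",
  "EMBEDDING_MODEL_DESTINATION_NAME": "AICoreAzureOpenAIDestination",
  "EMBEDDING_MODEL_DEPLOYMENT_URL": "/v2/inference/deployments/<embedding-deployment-id>",
  "EMBEDDING_MODEL_RESOURCE_GROUP": "default",
  "EMBEDDING_MODEL_API_VERSION": "2023-05-15"
}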

Preparing Input Data for Vector Embeddings

By default, none of the partner foundation models know anything about our specific business domain. Rather than retraining the model, we supply the relevant context at query time: the information is fed to the model in the form of vector embeddings so it can ground its response in the business case. The information can be provided as a simple text document, which gets chunked and sent to SAP AI Core to create the needed vector embeddings. These are then stored in the vector engine of an SAP HANA Cloud instance.

Before creating the embeddings, it is worth checking what the model returns when asked about a recent event, say one from May 2024. The response will most likely be a hallucination, or the model will simply say it cannot answer:

{
  "@odata.context": "$metadata#Edm.String",
  "value": {
    "completion": {
      "content": "I'm sorry, but I cannot provide the recent info. If you have any other questions, feel free to ask!",
      "role": "assistant"
    },
    "additionalContents": []
  }
}

In the CAP application code, the LangChain Text Loader and Recursive Character Text Splitter are used to read and chunk the text file:

const path = require('path')
const { TextLoader } = require('langchain/document_loaders/fs/text')
const { RecursiveCharacterTextSplitter } = require('langchain/text_splitter')

let textChunkEntries = []

// Load the input text file and split it into chunks of 500 characters
const loader = new TextLoader(path.resolve('path/input.txt'))
const document = await loader.load()
const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 500,
    chunkOverlap: 0,
    addStartIndex: true
})

const textChunks = await splitter.splitDocuments(document)

The input text file is an excerpt describing the recent event. The resulting text chunks are sent to SAP AI Core via the CAP LLM Plugin: its embedding API returns an embedding for each chunk, which is converted into a vector buffer and inserted into the database together with the chunk text:

try {
    const vectorPlugin = await cds.connect.to('cap-llm-plugin')
    // Request an embedding for each text chunk and collect the table entries
    for (const chunk of textChunks) {
        const embedding = await vectorPlugin.getEmbedding(chunk.pageContent)
        const entry = {
            "text_chunk": chunk.pageContent,
            "metadata_column": loader.filePath,
            "embedding": array2VectorBuffer(embedding)
        }
        textChunkEntries.push(entry)
    }
    // Persist all entries at once; DocumentChunk is the entity from the service's model
    const insertStatus = await INSERT.into(DocumentChunk).entries(textChunkEntries)
    if (!insertStatus) {
        throw new Error("Insertion of text chunks into db failed!")
    }
    return `Embeddings stored successfully to db.`
} catch (error) {
    console.log('Error while generating and storing vector embeddings:', error)
    throw error
}
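
The array2VectorBuffer helper used above converts the number array returned by the embedding API into the binary format of the vector column. A minimal sketch, assuming a wire format of a 4-byte little-endian dimension count followed by 4-byte floats:

// Convert an array of floats into the binary buffer format expected by the
// SAP HANA vector column: 4-byte dimension count, then 4-byte LE floats
const array2VectorBuffer = (data) => {
    const sizeFloat = 4
    const sizeDimensions = 4
    const bufferSize = data.length * sizeFloat + sizeDimensions
    const buffer = Buffer.allocUnsafe(bufferSize)
    buffer.writeUInt32LE(data.length, 0)
    data.forEach((value, index) => {
        buffer.writeFloatLE(value, index * sizeFloat + sizeDimensions)
    })
    return buffer
}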

The database table containing the text chunks will look something like this:

{
  "@odata.context": "$metadata#DocumentChunk",
  "value": [
    {
      "text_chunk": "Welcome to the SAP Innovation Day, an extraordinary convergence of cutting-edge technology and visionary insights set against the backdrop of Dubai's awe-inspiring Museum of the Future.",
      "metadata_column": null
    },
    {
      "text_chunk": "At SAP, we believe in harnessing the power of AI to drive better business outcomes, and this event is dedicated to showcasing how AI can be seamlessly integrated into your business solutions",
      "metadata_column": null
    },
    {
      "text_chunk": ", empowering you to unlock unprecedented levels of innovation, efficiency, and growth.",
      "metadata_column": null
    },
    {
      "text_chunk": "Under the theme - AI Everywhere For Better Business Outcomes",
      "metadata_column": null
    }
  ]
}
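
The underlying CDS entity is not shown in this post; a minimal sketch of what db.DocumentChunk could look like, assuming the 1536 dimensions produced by text-embedding-ada-002 (the namespace is derived from the table name used later):

namespace sap.advocates.demo;

entity DocumentChunk {
    key ID          : UUID;
    text_chunk      : LargeString;
    metadata_column : LargeString;
    embedding       : Vector(1536); // stored as a vector column in SAP HANA Cloud
}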

It’s helpful to implement a helper service within your CAP application to store and delete embeddings, which makes experimentation easier:

service EmbeddingStorageService {
    entity DocumentChunk as projection on db.DocumentChunk excluding { embedding };

    function storeEmbeddings() returns String;
    function deleteEmbeddings() returns String;
}
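
The handler implementation for this helper service is not shown in full here; a minimal sketch, reusing the chunking and embedding logic from above, could look like this:

const cds = require('@sap/cds')

module.exports = function () {
    const { DocumentChunk } = this.entities

    this.on('storeEmbeddings', async () => {
        // ... load, chunk, embed, and INSERT as shown in the snippet above ...
        return `Embeddings stored successfully to db.`
    })

    this.on('deleteEmbeddings', async () => {
        // Clear the table so the experiment can be rerun from scratch
        await DELETE.from(DocumentChunk)
        return `Embeddings successfully deleted from db.`
    })
}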

Using the CAP LLM Plugin for the RAG Response

The embedding vectors are stored, and the connection to the LLM model is set. The last piece is to implement the application serving the RAG response to the user’s question.

The CAP application defines an OData function that can be called via the service’s API:

service RoadshowService {
    function getRagResponse() returns String;
    function executeSimilaritySearch() returns String;
}
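
Once the service is running (for example locally via cds watch), these functions can be invoked over HTTP; the path below assumes CAP’s default OData serving conventions and the default development port:

GET http://localhost:4004/odata/v4/roadshow/getRagResponse()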

The implementation uses the CAP LLM Plugin’s APIs to execute the requests:

const cds = require('@sap/cds')

// HANA table and columns holding the text chunks and their embeddings
const tableName = 'SAP_ADVOCATES_DEMO_DOCUMENTCHUNK'
const embeddingColumn = 'EMBEDDING'
const contentColumn = 'TEXT_CHUNK'
const userQuery = 'In which city is the SAP Innovation Day held?'

module.exports = function() {
    this.on('getRagResponse', async () => {
        try {
            const vectorplugin = await cds.connect.to('cap-llm-plugin')
            const ragResponse = await vectorplugin.getRagResponse(
                userQuery,
                tableName,
                embeddingColumn,
                contentColumn
            )
            return ragResponse
        } catch (error) {
            console.log('Error while generating response for user query:', error)
            throw error;
        }
    })

    this.on('executeSimilaritySearch', async () => {
        const vectorplugin = await cds.connect.to('cap-llm-plugin')
        const embeddings = await vectorplugin.getEmbedding(userQuery)
        // Retrieve the three closest chunks, measured by Euclidean (L2) distance
        const similaritySearchResults = await vectorplugin.similaritySearch(
            tableName,
            embeddingColumn,
            contentColumn,
            embeddings,
            'L2DISTANCE',
            3
        )
        return similaritySearchResults
    })
}

The call for the RAG response expects the user query, the table name for the text chunks, the column where the embeddings are stored, and the content column; the response contains the LLM’s answer to the user’s question. The similarity search additionally takes the distance metric ('L2DISTANCE' here; SAP HANA’s vector engine also offers cosine similarity) and the number of matches to return. Once the model has been given the event’s context, it will most likely answer with the correct information.
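
For illustration, a grounded response might then look like this (hypothetical output, mirroring the format of the earlier hallucinated reply and the stored chunks):

{
  "@odata.context": "$metadata#Edm.String",
  "value": {
    "completion": {
      "content": "The SAP Innovation Day is held in Dubai, at the Museum of the Future.",
      "role": "assistant"
    },
    "additionalContents": []
  }
}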

Conclusion

I hope this blog post helps you understand how to enhance the AI models behind SAP AI Core with embeddings using the SAP HANA Cloud Vector Engine and the CAP LLM Plugin. Visit the CAP LLM Plugin repository on GitHub for more samples.

Happy coding!