Takeaway: Custom models solve the AI-specific problems for the long term: data remains private, the model complies with internal policy, there is no dependency on external updates, responses are better and more relevant, and everything can be customized to your use cases. Try out the ChatGPT/Azure APIs to see if Gen AI is a good fit for your use case, and if the results are encouraging, go for a full-fledged custom model implementation.
We have covered this partially in previous posts. You can read them here and here.
This post is based on a series of conversations with large companies and enterprises where people wanted to implement Gen AI models like GPT-4 on their data, but were not getting the right outputs consistently. They tried many of the techniques out there, like finetuning and Retrieval Augmented Generation (RAG), where you pass context within the prompt. They were mildly successful, but eventually struggled to reach the level of consistency and predictability needed to deploy all this into production.
With the speed and growth of Generative AI as a phenomenon over the last year or so, every enterprise needs an AI strategy in place. With such a fast-changing landscape and the pace of tech progress, that is easier said than done. It becomes even trickier when you have folks reaching out from everywhere pitching an AI solution for just about everything: better writing, better sales, better coding, better revops, better product management, better CX, and so on. I am one of them, so I empathize.
The capabilities are clear and extremely useful, as evidenced by the viral usage of ChatGPT at nearly 100M Weekly Active Users. For the purpose of this post, it helps to think of an LLM as a text compression engine. There is some initial knowledge, compressed and stored in a very small space. When asked a question, or asked to complete a sequence, the model draws on that initial knowledge and predicts the next word, one word at a time. If something is part of the training data, it can give accurate answers; if something is totally tangential, it can hallucinate and still produce text that is grammatically correct and sounds about right.
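To make that intuition concrete, here is a minimal sketch of next-word prediction using GPT-2 via Hugging Face transformers. The model choice is just for illustration; any causal LLM works the same way.

```python
# A minimal sketch of next-word prediction. GPT-2 is used purely for
# illustration -- larger models do the same thing at a bigger scale.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every token in the vocabulary

# The model's "answer" is just the highest-probability next token.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode(next_token_id))  # likely " Paris" -- it is in the training data
```

Ask the same model about your internal pricing policy and it will happily predict plausible-sounding tokens with no grounding at all: that is the hallucination case.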
When the domain knowledge is absent from the training data, how do you make an LLM respond accurately? That is the problem many AI startups are trying to solve.
They have been tackling it in multiple ways.
Prompt Engineering: Just add all of the context to the prompt directly, so that the model generates an output in line with that context. Some of this is common sense and some of it is based on training data. For example, for a writing task you will get a considerably better output if you start the prompt with "You are a top 1% copywriter in the world. Write a simple conversational title for this post" vs. a plain "write a title for this post".
Sometimes it helps to put the entire context in the prompt and just ask a question based on that context. With 100K context windows and file uploads, you can pass almost anything in a simple prompt.
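As an illustration, here is what plain prompt engineering looks like with the OpenAI Python SDK; the persona line and the pasted context are the whole technique. The file name is hypothetical.

```python
# A sketch of plain prompt engineering with the OpenAI Python SDK (v1.x).
# The persona framing plus the pasted context is the entire technique.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

context = open("post_draft.txt").read()  # hypothetical file holding your context

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a top 1% copywriter in the world."},
        {"role": "user", "content": f"Write a simple conversational title for this post:\n\n{context}"},
    ],
)
print(response.choices[0].message.content)
```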
Few-Shot Learning: This is similar to prompt engineering, but you pass examples of the expected output in the prompt itself to teach the LLM the result you are expecting. See the OpenAI Cookbook on how they do it here. It's a variant of prompt engineering, but I am listing it separately to distinguish it from the more hackish ways prompt engineering usually happens.
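A sketch of the same idea in code, with made-up classification examples: each user/assistant pair in the message history is one demonstration of the expected output.

```python
# Few-shot learning: worked examples go directly into the message history
# so the model imitates the pattern. The examples below are invented.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Classify support tickets as 'billing', 'bug', or 'other'."},
        # Each pair below is one demonstration of the expected output.
        {"role": "user", "content": "I was charged twice this month."},
        {"role": "assistant", "content": "billing"},
        {"role": "user", "content": "The export button crashes the app."},
        {"role": "assistant", "content": "bug"},
        # The real query comes last; the model follows the demonstrated format.
        {"role": "user", "content": "Can I get an invoice for last quarter?"},
    ],
)
print(response.choices[0].message.content)  # expected: "billing"
```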
Retrieval Augmented Generation (RAG): What happens when your data is large and you don't want to pass everything in a single prompt? RAG involves chunking your data into bits and creating embeddings for each chunk. When you query, it looks up the relevant chunks and passes them along with your question to generate a well-informed response. It works for smaller data sets, but behind the scenes it will always be a large prompt being passed to the LLM. Call it dynamically generated prompt engineering.
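Here is a bare-bones sketch of that pipeline, with naive chunking and an in-memory index; real systems use a vector database, but the shape is the same. The corpus file and question are hypothetical.

```python
# A bare-bones RAG sketch: chunk, embed, retrieve by cosine similarity,
# then assemble a dynamically generated prompt.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

# 1. Chunk your corpus (here: naive fixed-size chunks of a hypothetical doc).
corpus = open("company_handbook.txt").read()
chunks = [corpus[i:i + 1000] for i in range(0, len(corpus), 1000)]
chunk_vectors = embed(chunks)

# 2. At query time, embed the question and find the most similar chunks.
question = "What is our parental leave policy?"
q_vec = embed([question])[0]
sims = chunk_vectors @ q_vec / (np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q_vec))
top_chunks = [chunks[i] for i in sims.argsort()[-3:]]

# 3. Behind the scenes it is still just one large prompt.
prompt = ("Answer using only this context:\n" + "\n---\n".join(top_chunks)
          + f"\n\nQuestion: {question}")
answer = client.chat.completions.create(
    model="gpt-4", messages=[{"role": "user", "content": prompt}])
print(answer.choices[0].message.content)
```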
Finetuning: This is used when you want a model to output in a certain pattern, like capturing the format or the tone of a specific blog post. You collect 100-10,000 examples of the prompt and expected answer, and run a finetuning job on a model. Then, when you query the finetuned model with a similar prompt, it generates output similar to the examples from your finetuning set. Don't worry about the quantity, but make sure every example you use in finetuning is of high quality.
Important: Unlike the other three techniques, finetuning does not add new knowledge to the model; it teaches output patterns while saving on context.
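For concreteness, this is roughly what a finetuning job looks like with OpenAI's API; the examples and file name are made up, and the same idea applies to open-source models.

```python
# A sketch of an OpenAI finetuning job. Each training example is a complete
# prompt/response conversation, serialized as one JSONL line.
import json
from openai import OpenAI

client = OpenAI()

examples = [
    {"messages": [
        {"role": "system", "content": "You write titles in our house style."},
        {"role": "user", "content": "Title this post about Q3 results."},
        {"role": "assistant", "content": "Q3, In Plain Numbers"},
    ]},
    # ... 100-10,000 high-quality examples in the same shape
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

uploaded = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=uploaded.id, model="gpt-3.5-turbo")
print(job.id)  # poll this job; once done, query the resulting finetuned model
```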
You can use one or all of the above together to improve output. This video from OpenAI will help clear up a lot of questions and help you understand this better.
And sometimes, even with all of this, the outputs still fall short. Yeah, I know. Hence, this post exists.
At this point, many people will offer you different solutions: some will suggest techniques like HyDE, some will suggest embedding metadata, adding more retrieved chunks to the prompt, more examples for few-shot learning, and so on. Most of these will not work either.
To understand why, let's dive into how models generate output and develop an intuition for how they work.
To get the desired output, a model needs three things: similar knowledge (in training data), examples of output (Few Shot Learning/Finetuning), and enough context in the prompt (RAG/Prompt Engineering). We're essentially balancing the lack of knowledge with these techniques, but there's a limit. If the model's training data lacks the domain knowledge, the output quality suffers.
The listed techniques use engineering skills to address what's fundamentally a data and data science problem. Attempting to augment the model with external knowledge to fill gaps in the training data often leads to struggles, especially when introducing a completely new domain to a model.
If the outputs are not up to the mark, that means your domain knowledge is not in the model's training data.
If what you are prompting about is in the training data, the model works incredibly well. See how ChatGPT responds to almost any generic question. If not, the decline in quality is apparent.
At this point, the ultimate solution is simply adding more knowledge to the model. That is not easy to do, and that is where Clio AI comes in.
If you have a large corpus of private knowledge, about 1M words (1000 pages) or greater, the above tactics won't cut it. The issue is not with engineering but with knowledge, and it should be solved by adding knowledge.
A Custom Model, trained specifically on your company's private knowledge, is the key.
Just like GPT-4 compresses the world's knowledge and answers questions about it when asked, a custom model does the same for your company alone.
This is also what OpenAI is looking to do with large enterprises. We covered it here in a previous post.
As of today, to train a custom model you need three things: a good foundational model, lots of data, and expertise (and compute, though that is a given when it comes to data science).
Open source supplies models like Llama-2, which are exceptionally good; you supply the data you need the LLM trained on; and the Clio AI team provides the expertise.
Our team works closely with you to help make a great custom model for your company's use cases. Below is a schematic of the training process.
We modify every step of the model training process, pushing the approach as far as it can currently go.
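To give a flavor of what one step involves, here is a minimal sketch of continued pretraining of an open-source base model on a private corpus with Hugging Face transformers. This is illustrative only, not our actual pipeline; the real process adds data cleaning, evaluation, and RLHF, and the model name and paths below are assumptions.

```python
# Illustrative continued pretraining of an open-source base model on a
# private text corpus. Model name and data paths are hypothetical.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # requires accepted license/access
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Private corpus as plain text files (hypothetical path).
dataset = load_dataset("text", data_files={"train": "company_corpus/*.txt"})["train"]
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="custom-model", num_train_epochs=1,
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the resulting checkpoint now carries the private corpus
```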
Tailored for Your Company:
This isn't a generic model; it is trained specifically to understand what your company does and to cater to its needs, supplying output accordingly. This approach is particularly beneficial for companies with large datasets, where the other techniques fail to yield satisfactory results.
Companies should consider a custom AI model for:
Needing to incorporate domain knowledge the model isn't trained on.
Needing the model to adhere to company-specific policies regarding compliance, data, and communication.
Ensuring privacy for business data and trade secrets.
Requiring cross-functional insights spanning multiple apps and databases.
If the typical ChatGPT wrappers are not working for you, you should consider a custom model.
I think most companies will have to go for an open-source implementation because of system prompt requirements, which give them more control. You can use RAG, finetuning, etc. on a self-deployed Llama-2 as well. A custom model is useful when you need a higher degree of customization.
At Clio AI, we estimate that a custom model trained on your org's knowledge could save each of your employees about 2-3 hours a day, with 5x faster decisions and 10x less time spent looking for context. That could translate into a direct and proportional impact on your top and bottom line.
Your custom model is aware of your org's years of accumulated knowledge, context, and language, and thus more adept at accurately answering questions about your business. This is one instance where your competition cannot replicate the results using the same tools: your data and your knowledge give you the push to the next level.
A custom model is an asset when it comes to decisions. It can provide much-needed context, help teams by distilling complex information into quick, actionable insights, and unblock individuals by getting them the information they need instantly.
This is non-trivial and not immediately obvious. OpenAI and other providers keep updating their models, which can change performance. You may have heard about "ChatGPT getting dumber"; that is just newer iterations of the model. While using the APIs, you are likely to encounter these changes as output quality suddenly degrading. With a custom model, you decide on the update and deploy only when you are satisfied with your own benchmarks.
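To make the contrast concrete: with a hosted API, the best you can do is pin a dated snapshot (and the provider still controls deprecations), while a custom model's version is a checkpoint you own. A small sketch, with "gpt-4-0613" as an example snapshot name:

```python
# With a hosted API you can pin a dated snapshot, but the provider still
# controls deprecations; snapshot availability changes over time.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-0613",  # pinned snapshot, not the moving "gpt-4" alias
    messages=[{"role": "user", "content": "Summarize our Q3 goals."}],
)

# With a custom model, the "version" is a checkpoint directory you own,
# deployed only after it passes your own benchmarks, e.g. (hypothetical):
# model = AutoModelForCausalLM.from_pretrained("custom-model/checkpoint-2024-01")
```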
There are other minor advantages compared to using a generic model. Ironing out these minor irritations helps a lot with user experience and drives company-wide adoption.
Usual use cases for ChatGPT, like code assist, sales support, and marketing copy generation, remain relevant. However, the key difference lies in the model's in-depth understanding of your business—its knowledge, priorities, and goals. Going beyond standard applications, there are advanced use cases that a typical RAG-based ChatGPT can't handle.
Search for any document, discussion, or report from the past without having to ping or interrupt the person who has the right context. This is particularly useful for your best employees, who get interrupted and pulled into multiple discussions on top of their own work.
Insights from live data at your business leaders' fingertips, even for complex questions. Imagine a custom dashboard that responds to dynamic queries like "What's our total CAC if we include the time spent in prep and meetings?"
With a model that understands your schema and business, every employee can get the right insights in time without having to go through the data request process and excessive analysis.
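As a hedged sketch of how that could work: pass the schema to a model that understands your business, let it generate SQL, and run the query. The schema, table names, and database here are all hypothetical.

```python
# A sketch of "insights from live data": the model sees the schema, writes
# SQL, and the app executes it. All names below are hypothetical.
import sqlite3
from openai import OpenAI

client = OpenAI()
SCHEMA = """
tables:
  spend(channel TEXT, month TEXT, amount REAL)
  signups(month TEXT, new_customers INTEGER)
"""

question = "What was our CAC per month last quarter?"
sql = client.chat.completions.create(
    model="gpt-4",  # in practice this would be the custom model's endpoint
    messages=[{"role": "user",
               "content": f"Given this schema:\n{SCHEMA}\n"
                          f"Write one SQLite query answering: {question}. "
                          "Return only SQL."}],
).choices[0].message.content

conn = sqlite3.connect("business.db")  # hypothetical live database
# In production you would validate model-generated SQL before executing it.
for row in conn.execute(sql):
    print(row)
```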
OpenAI's pricing starts at about $2M-$3M minimum, with an acknowledgement that they won't be able to accommodate most companies. Our pricing is one-tenth of that. It realistically takes about 2-3 months to get a state-of-the-art model for your company, going from pre-training to a custom post-RLHF process. This is a highly customized process that varies by business and industry.
You have the data, and the models come from open source, but you still need the expertise. That's where we come in. Only a handful of people have trained and deployed an ML model like this from scratch in production, and Clio AI counts one of them as a cofounder: Abhinav did it for Tokopedia in the past.
As stated, we will reduce your training cost by 90% while delivering nearly equivalent performance.
You can deploy the model in your own cloud, in your private network, thereby controlling costs yourself, unlike OpenAI, which would deploy it on Azure servers and charge you for the service.
For more details and questions, you can check this out.
Choosing Clio AI over OpenAI brings not only cost savings but also specialized expertise. The model's understanding of your business enhances standard use cases and introduces advanced applications like workplace search and dynamic data insights. The reduced pricing and deployment flexibility make Clio AI a strategic and cost-effective choice for companies seeking tailored AI solutions.
You can reach out directly with any questions or schedule a conversation to understand more via this page.