Model merging is a recent development in the LLM community. It strategically combines multiple LLMs into a single model that retains much of the capability of each original, without requiring additional training and with very little compute. In other words, the capabilities are largely superposable, which makes merging an extremely cost-effective approach to developing new models. Merged models hold many of the top spots on the Open LLM Leaderboard, which shows how significant this finding is. CaLM, as we discussed in our previous analysis, achieves something similar without merging.
However, model merging today is an art: it relies on one person's intuition about which models to select and which merging recipes to use to create and refine a more capable model. Typically, the model maker needs domain knowledge about the models, along with good instincts about open-source models and what their training data may contain. Given the vast diversity of models, benchmarks, and evals in the OSS community, this intuition-driven approach can only go so far.
This research paper presents a novel approach called Evolutionary Model Merge, which uses evolutionary algorithms to automatically discover effective ways to combine diverse open-source models into new foundation models with desired capabilities.
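To make the idea concrete, here is a minimal sketch of evolutionary search over merge weights. It is an illustration under loud assumptions, not the paper's actual algorithm or merging recipe: each "model" is a short list of floats standing in for real weights, merging is plain weighted averaging, the `score` function is a toy stand-in for a benchmark evaluation, and the helper names `merge` and `evolve_merge_weights` are hypothetical.

```python
import random


def merge(models, weights):
    """Weighted average of parameter vectors (weights are normalized to sum to 1)."""
    total = sum(weights)
    norm = [w / total for w in weights]
    return [sum(w * m[i] for w, m in zip(norm, models))
            for i in range(len(models[0]))]


def evolve_merge_weights(models, score, generations=50, pop_size=20,
                         n_elite=5, sigma=0.1, seed=0):
    """Simple elitist evolution strategy over per-model merge weights.

    `score` maps merged parameters to a number (higher is better). In a real
    setting it would be a benchmark evaluation; that is an assumption here.
    """
    rng = random.Random(seed)
    k = len(models)
    # Start from random positive weight vectors, one weight per source model.
    population = [[rng.random() + 1e-6 for _ in range(k)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidate recipes by the score of the merged model they produce.
        population.sort(key=lambda w: score(merge(models, w)), reverse=True)
        elites = population[:n_elite]
        # Refill the population with Gaussian-mutated copies of the elites.
        children = [
            [max(1e-6, w + rng.gauss(0.0, sigma)) for w in rng.choice(elites)]
            for _ in range(pop_size - n_elite)
        ]
        population = elites + children
    best = max(population, key=lambda w: score(merge(models, w)))
    total = sum(best)
    return [w / total for w in best]


# Toy usage: two "models" and a score that rewards closeness to a hidden blend.
model_a = [1.0, 0.0, 2.0, -1.0]
model_b = [0.0, 1.0, -2.0, 3.0]
hidden = [0.7 * a + 0.3 * b for a, b in zip(model_a, model_b)]


def toy_score(params):
    return -sum((p - h) ** 2 for p, h in zip(params, hidden))


print(evolve_merge_weights([model_a, model_b], toy_score))
```

The point of the sketch is the loop structure: propose merge recipes, score the resulting merged model, keep the best, and mutate. No gradient updates to the source models are involved, which is why this kind of search can be far cheaper than training.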
This work makes several key contributions to the field of foundation model development:
In terms of what the paper did:
The key finding is that evolutionary model merging is a powerful, automated way to combine the knowledge in diverse open-source models, creating new foundation models with expanded capabilities in a compute-efficient manner. The evolved models also show surprising generalization and cultural understanding despite not being explicitly trained for the downstream tasks.
In terms of business implications, this work democratizes foundation model development by leveraging the open-source model ecosystem. It provides a path for quickly prototyping capable models by combining existing building blocks rather than training from scratch. Organizations can use this evolutionary approach to develop proof-of-concept models to assess feasibility before investing heavily in custom models. The efficiency and generalization of the evolved models make them compelling for real-world applications.
Overall, this paper presents an exciting new direction for automated foundation model creation that makes it more accessible while pushing the boundaries of model capabilities through cross-domain composition. The evolutionary model merging paradigm has significant implications for accelerating foundation model development and deployment.