The LLM Cycle

There are hundreds of thousands of models on HuggingFace, with more released every day. We have noticed a pattern in the releases that follow a new foundational model.

Base Model
A new base or foundational model is released, greatly improving on the previous version and sometimes claiming better results than commercial alternatives. e.g.: Meta Llama 2

Fine-tuned Models
The first fine-tuned models are released soon after, offering slight improvements over the base model, different formats, and more quantisation options (which allow running the model under different system requirements). e.g.: NousResearch, Upstage, StabilityAI, lmsys
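To give an idea of what quantisation trades away, here is a toy sketch: weights are stored as small integers plus a scale factor, using a quarter of the memory of float32 at the cost of a little precision. This is a simplified illustration, not the actual GGML/GPTQ schemes used by real quantised releases.

```python
# Toy per-tensor 8-bit quantisation: int8 values plus one float scale.
# Not the real GGML/GPTQ algorithms, just the core idea.

def quantise(weights, bits=8):
    """Map float weights to signed integers with a shared scale."""
    qmax = 2 ** (bits - 1) - 1              # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantise(q, scale):
    """Recover approximate float weights from the integers."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.99, -0.27]
q, scale = quantise(weights)
restored = dequantise(q, scale)
```

The reconstruction error is bounded by half the scale step, which is why 8-bit models stay close to their float originals while fitting on much smaller hardware.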

Specific Purpose Models
These models build upon the base model to make it better at a specific use case (a different language, a programming language, a specialised field). e.g.: WizardCoder and CodeUp, Faradaylab (French), LinkSoul and FlagAlpha (Chinese), llSourcell (medllama)

Long Context Models
Base models often have limited context windows (around 1k or 2k tokens), which means they can only keep a limited record of the conversation history or the general context provided. Through further fine-tuning, these models can offer larger context windows of 32k, 64k, and even 128k tokens, allowing them to process large amounts of information at once. e.g.: NousResearch and togethercomputer
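Why the window size matters in practice: anything beyond the budget has to be dropped. A minimal sketch of the common workaround, keeping only the most recent messages that fit (real systems count tokens with the model's tokenizer; here one whitespace-separated word stands in for one token):

```python
# Fit a chat history into a fixed context window by keeping the
# newest messages that fit. Word count approximates token count.

def fit_history(messages, max_tokens):
    """Return the most recent messages within the token budget, in order."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest first
        cost = len(msg.split())
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = [
    "user: hello there",
    "assistant: hi how can I help",
    "user: summarise our chat so far please",
]
```

With a 2k-token window, older turns fall off quickly; a 32k or 128k window lets far more of the conversation, or entire documents, stay in view at once.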

Model Merging
These models result from combining other fine-tuned models, joining their datasets and hopefully achieving higher quality. e.g.: Stable-vicuna, Luban-Marcoroni, OpenOrca-Platypus

Uncensored Versions
Uncensored models aim to reduce the refusals, avoidance, and bias that base models show on certain topics, which is why they are often called uncensored or unfiltered. There are also versions tuned for role-play and other exotic purposes.

Benchmark Race
At this point there are very good models for many different purposes and use cases. The only way left for a model to stand out is to top the benchmarks, such as the HuggingFaceH4 Open LLM Leaderboard.

Mix and Match
Once there are enough high-quality models, improving by fine-tuning gets harder, so we see more and more models created by merging two or more fine-tuned models into one. Different ratios of each source model yield different results. And the race continues.
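The simplest form of this is a linear merge: every parameter of the result is a weighted average of the corresponding parameters of the sources. A minimal sketch, with plain dicts of lists standing in for real checkpoints (actual merges operate on full tensors and sometimes use fancier schemes such as SLERP):

```python
# Linear weight merge: ratio * A + (1 - ratio) * B for every parameter.
# Plain dicts stand in for real model checkpoints in this sketch.

def merge(model_a, model_b, ratio=0.5):
    """Blend two models with the same architecture at a given ratio."""
    assert model_a.keys() == model_b.keys(), "architectures must match"
    return {
        name: [ratio * a + (1 - ratio) * b
               for a, b in zip(model_a[name], model_b[name])]
        for name in model_a
    }

a = {"layer1.weight": [1.0, 2.0], "layer1.bias": [0.0, 0.0]}
b = {"layer1.weight": [3.0, 4.0], "layer1.bias": [1.0, 1.0]}
merged = merge(a, b, ratio=0.25)
```

Sweeping the ratio from 0 to 1 slides the result between the two parents, which is exactly why different ratios yield noticeably different models.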

What’s next?
Will we see a new category of models? Will we switch to a new base model? Can new fine-tuning methods jumpstart a new generation of models? We can’t wait to find out!