Updating LLMs: Fine-Tuning vs. Retraining, or Why ChatGPT’s Cut-Off Date Doesn’t Change
Consider this analogy: If a history book written in 2000 is to remain relevant in 2023, it either needs new chapters added or a complete rewrite. Similarly, large language models need either fine-tuning (adding new chapters) or retraining (complete rewrite) to remain current.
- Fine-tuning: This is essentially a continuation of training. By using saved weights and introducing new data, the model adjusts its previous knowledge. However, this process might result in the model leaning heavily towards its original training. It’s like adding a chapter to our history book — the core doesn’t change, but there’s an additional perspective.
- Retraining: This means starting from scratch. Every piece of data, old and new, is given equal weight. In terms of our book analogy, it’s a complete rewrite, ensuring that newer events are interwoven seamlessly into the story. But it’s resource-intensive — a long, taxing rewrite.
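The difference can be sketched in code. This is a deliberately tiny toy, a one-parameter model fit by gradient descent, not anything resembling how an actual LLM is trained; the point is only the shape of the two processes: fine-tuning resumes from saved weights and sees only the new data, while retraining reinitializes and weighs old and new data equally.

```python
def train(w, data, lr=0.1, steps=50):
    """Least-squares fit of a one-parameter model y = w * x by gradient descent."""
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w

old_data = [(1.0, 2.0), (2.0, 4.0)]   # the "pre-cutoff" world, where y = 2x
new_data = [(1.0, 3.0), (2.0, 6.0)]   # the "post-cutoff" world, where y = 3x

# Original training run: learns w ≈ 2 from the old data.
w_original = train(0.0, old_data)

# Fine-tuning: resume from the saved weight, briefly train on new data only.
# The result drifts toward the new optimum (3) but stays biased toward 2.
w_finetuned = train(w_original, new_data, steps=3)

# Retraining: start from scratch on old + new data, weighted equally.
# The result lands between the two worlds (here w ≈ 2.5).
w_retrained = train(0.0, old_data + new_data)
```

In the book analogy, `w_finetuned` is the old edition with a new chapter bolted on (it still mostly "believes" y = 2x), while `w_retrained` is the full rewrite.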
ChatGPT: Is OpenAI silently fine-tuning their model?
While ChatGPT often cites a 2022 knowledge cutoff, some users have noticed information from 2023 seeping in.
I think so. Fine-tuning is the economical way to add new information. But a fine-tuned model stays biased towards its original data: it might know of events from 2023, yet its core understanding would still reflect the worldview of 2022.
The outcome isn’t great. Ask ChatGPT to write code against a recent version of a framework, and it seems aware of some of the new features — but it mixes them with the old syntax.
So OpenAI doesn’t advertise a new cut-off date for the fine-tuned model, because it isn’t equivalent to a model retrained with that cut-off.