DeepSeek
December 29, 2024
Chinese startup DeepSeek has released DeepSeek-V3. According to the benchmarks they shared, this is now the most capable open-source large language model available, achieving performance comparable to leading closed-source models despite a training budget of just $5.6 million, a fraction of what major tech companies typically spend.
- DeepSeek-V3 was trained using just 2.8 million GPU hours, costing approximately $5.6 million, significantly less than competitors.
- The model achieves performance comparable to GPT-4 and Claude 3.5 on various benchmarks, particularly excelling in mathematics and coding tasks.
- The model's efficiency comes from innovative architecture and training techniques, including a novel approach called auxiliary-loss-free load balancing.
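As a rough illustration of the auxiliary-loss-free idea: instead of adding a balancing term to the training loss, the router keeps a per-expert bias that is nudged up for underloaded experts and down for overloaded ones, and that bias affects only which experts get selected. This is a minimal sketch under my reading of the technique; all function and variable names here are hypothetical, not from DeepSeek's code.

```python
# Hypothetical sketch of bias-based (auxiliary-loss-free) load balancing
# for a Mixture-of-Experts router. The bias shifts expert *selection* only;
# the unbiased scores would still weight the selected experts' outputs.

def select_experts(scores, bias, top_k=2):
    """Pick the top_k experts for one token by biased routing score."""
    ranked = sorted(range(len(scores)),
                    key=lambda e: scores[e] + bias[e],
                    reverse=True)
    return ranked[:top_k]

def update_bias(bias, load, gamma=0.001):
    """Nudge bias down for overloaded experts, up for underloaded ones."""
    mean_load = sum(load) / len(load)
    return [b - gamma if l > mean_load else
            b + gamma if l < mean_load else b
            for b, l in zip(bias, load)]

# Example: expert 1 is underloaded, so its bias rises and it becomes
# more likely to be selected on future tokens.
bias = [0.0, 0.0, 0.0]
chosen = select_experts([0.9, 0.1, 0.5], bias)   # -> [0, 2]
bias = update_bias(bias, load=[10, 2, 6])
```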