DeepSeek
December 29, 2024
Chinese startup DeepSeek has released DeepSeek-V3. According to the benchmarks they shared, it is the most capable open-source large language model currently available, achieving performance comparable to leading closed-source models despite being trained on a budget of just $5.6 million, a fraction of what major tech companies typically spend.
- DeepSeek-V3 was trained using just 2.8 million GPU hours, at a cost of approximately $5.6 million, significantly less than its competitors.
- The model achieves performance comparable to GPT-4 and Claude 3.5 on various benchmarks, particularly excelling at mathematics and coding tasks.
- The model's efficiency comes from innovative architecture and training techniques, including a novel approach called auxiliary-loss-free load balancing.
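To give a flavor of the load-balancing idea, here is a minimal sketch, assuming the published description of the technique: instead of adding an auxiliary loss term to penalize imbalance, each expert carries a bias that is added to its routing score only for top-k selection, and the bias is nudged down for overloaded experts and up for underloaded ones. All names, the update rate, and the toy data below are illustrative assumptions, not DeepSeek's actual code.

```python
import numpy as np

def route_tokens(scores, bias, k=2):
    """Pick top-k experts per token using bias-adjusted scores.
    The bias influences only which experts are selected, not the
    gating weights, so the model's outputs stay driven by raw scores."""
    biased = scores + bias  # one bias value per expert (illustrative)
    return np.argsort(-biased, axis=1)[:, :k]

def update_bias(bias, topk, num_experts, gamma=0.01):
    """Nudge each expert's bias: down if it received more tokens than
    the uniform target, up if fewer. gamma is a hypothetical step size."""
    counts = np.bincount(topk.ravel(), minlength=num_experts)
    target = topk.size / num_experts
    return bias - gamma * np.sign(counts - target)

# Toy demo: 8 tokens routed across 4 experts.
rng = np.random.default_rng(0)
scores = rng.random((8, 4))
bias = np.zeros(4)
for _ in range(100):
    topk = route_tokens(scores, bias)
    bias = update_bias(bias, topk, num_experts=4)
```

Over the loop, experts that keep winning the top-k race accumulate negative bias, which steers future selections toward under-used experts without any gradient-based balancing term.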