Chinese startup DeepSeek has released DeepSeek-V3. According to the benchmarks the company shared, it is now the most capable open-source large language model available, and it performs comparably to leading closed-source models despite being trained on a budget of just $5.6 million, a fraction of what major tech companies typically spend.

  • DeepSeek-V3 was trained using roughly 2.8 million GPU hours, costing approximately $5.6 million (implying a rate of about $2 per GPU hour), significantly less than competitors typically spend.

  • The model achieves performance comparable to GPT-4o and Claude 3.5 Sonnet across a range of benchmarks, excelling particularly in mathematics and coding tasks.

  • The model's efficiency comes from innovative architecture and training techniques, including auxiliary-loss-free load balancing: a method for keeping the experts in its Mixture-of-Experts layers evenly utilized without adding a balancing term to the training loss (see the sketch after this list).
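
To make the load-balancing idea concrete, here is a minimal sketch in NumPy. It follows the general mechanism described in the DeepSeek-V3 technical report: each expert carries a bias that is added to its routing score when selecting the top-k experts for a token, but not when computing the gating weights, and the bias is nudged after each step toward equalizing expert load. The variable names, step size, update rule, and softmax gating below are illustrative assumptions, not DeepSeek's exact implementation.

```python
import numpy as np

# Sketch of auxiliary-loss-free load balancing for MoE routing.
# Instead of a balancing term in the training loss, each expert gets a
# bias that rises when the expert is under-used and falls when over-used.
# All specifics (names, update rule, gating) are illustrative assumptions.

rng = np.random.default_rng(0)

num_experts = 8
top_k = 2
bias_update_speed = 0.001  # hypothetical step size for bias adjustment

expert_bias = np.zeros(num_experts)

def route(token_scores, bias, k):
    """Pick top-k experts using biased scores; gate with raw scores."""
    biased = token_scores + bias  # bias affects *selection* only
    chosen = np.argsort(-biased, axis=-1)[:, :k]
    # Gating weights come from the unbiased scores of the chosen experts,
    # so the bias steers load without distorting the output mixture.
    raw = np.take_along_axis(token_scores, chosen, axis=-1)
    weights = np.exp(raw) / np.exp(raw).sum(axis=-1, keepdims=True)
    return chosen, weights

# One routing step over a batch of tokens with random affinity scores.
scores = rng.normal(size=(1024, num_experts))
chosen, weights = route(scores, expert_bias, top_k)

# Measure per-expert load and nudge each bias toward the average load.
load = np.bincount(chosen.ravel(), minlength=num_experts)
expert_bias += bias_update_speed * np.sign(load.mean() - load)

print("per-expert load:", load)
print("updated biases:", np.round(expert_bias, 4))
```

Because the bias influences only which experts are chosen, not how their outputs are weighted, the router can be steered toward balanced load without the gradient interference that an auxiliary balancing loss would introduce.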