DeepSeek rolls out latest method to boost AI reasoning

Chinese artificial intelligence (AI) startup DeepSeek has introduced a new method to improve the reasoning capabilities of large language models (LLMs).

With this update, the company highlights its ongoing efforts to enhance LLMs and broaden their potential applications.

The method was developed in collaboration with researchers from Tsinghua University.

The new method combines generative reward modelling (GRM) with self-principled critique tuning to help LLMs respond to general queries with greater precision.

This dual approach is intended to let LLMs optimise their behaviour in line with human feedback.

According to the researchers' paper, the resulting DeepSeek-GRM models outperformed existing methods and achieved performance competitive with strong public reward models.

DeepSeek also plans to make its GRM models open source, although it has not given a timeline.

The news comes amid rising interest in the company's future advancements, following the significant attention garnered by its flagship V3 foundation model and its R1 reasoning model.

The R1 reasoning model rose to prominence after outperforming earlier models, including OpenAI's first ChatGPT model.