DPO (1)

[Paper Review] V-STaR: Training Verifiers for Self-Taught Reasoners
Paper link: https://arxiv.org/abs/2402.06457
Common self-improvement approaches for large language models (LLMs), such as STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability. However, these approaches discard the large amounts
With existing Self-Taught methods for LLMs, the self-generated solutions..
2024. 2. 14.