Data Brew by Databricks

Reward Models | Data Brew | Episode 40

40 min · 20 March 2025

In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF).

Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.

Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/
