Can the Tsinghua University AI Lab Prevent Model Collapse in Synthetic Data?
2024/12/24
Duration: 6 min
Podcast

Summary
This episode analyzes the research paper "HOW TO SYNTHESIZE TEXT DATA WITHOUT MODEL COLLAPSE?" by Xuekai Zhu, Daixuan Cheng, Hengli Li, Kaiyan Zhang, Ermo Hua, Xingtai Lv, Ning Ding, Zhouhan Lin, Zilong Zheng, and Bowen Zhou. The authors are affiliated with institutions including LUMIA Lab at Shanghai Jiao Tong University, the State Key Laboratory of General Artificial Intelligence at BIGAI, Tsinghua University, Peking University, and the Shanghai Artificial Intelligence Laboratory. The paper was published on December 19, 2024. The discussion explores model collapse in language models trained on synthetic data. It examines the researchers' investigation into the negative impact of synthetic data on model performance and their proposed remedy, token-level editing, which generates semi-synthetic data. The episode also reviews the study's theoretical foundations and experimental results and considers their implications for the reliability and effectiveness of AI language systems.

This podcast is created with the assistance of AI. The producers and editors make every effort to ensure that each episode is of the highest quality and accuracy.

For more information on the content and research relating to this episode, please see: https://arxiv.org/pdf/2412.14689
