<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>yuha933 님의 블로그</title>
    <link>https://yuha933.tistory.com/</link>
    <description>yuha933 님의 블로그 입니다.</description>
    <language>ko</language>
    <pubDate>Fri, 10 Apr 2026 04:25:22 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>yuha933</managingEditor>
    <item>
      <title>[논문 리뷰] Accurate predictions on small data with a tabular foundation model (TabPFN)</title>
      <link>https://yuha933.tistory.com/24</link>
      <description>&lt;h2 data-ke-size=&quot;size26&quot;&gt;Introduction&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;연구 배경&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;수작업으로 설계된 알고리즘 구성 요소들은 더 높은 성능을 보이는 end-to-end 학습 방식으로 대체되어 왔다. 컴퓨터 비전에서는 SIFT와 HOG와 같은 수작업 특징들이 학습된 convolution으로 대체되었고, 자연어 처리에서의 문법 기반 접근 방식은 학습된 transformer로 대체되었다.&lt;/li&gt;
&lt;li&gt;표형 데이터셋은 텍스트나 이미지와 같은 비가공 데이터 형태와 구별되는 다양한 특성을 가진다.&lt;/li&gt;
&lt;li&gt;딥러닝 방법들은 전통적으로 표형 데이터에서 어려움을 겪어왔으며, 이는 데이터셋 간의 이질성과 원시 데이터 자체의 이질성 때문이다. 이러한 이유로 트리 기반 모델과 같은 비딥러닝 방법들이 지금까지 가장 강력한 경쟁자로 자리잡아 왔다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;기존 연구의 한계&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;전통적인 머신러닝 모델들은 아래와 같은 한계들을 지닌다.
&lt;ul style=&quot;list-style-type: circle;&quot; data-ke-list-type=&quot;circle&quot;&gt;
&lt;li&gt;충분한 전처리 없이 사용될 경우, out-of-distribution 데이터에 대한 예측 성능이 낮다.&lt;/li&gt;
&lt;li&gt;한 데이터셋에서 다른 데이터셋으로 지식을 전이하는 능력이 부족하다.&lt;/li&gt;
&lt;li&gt;gradient를 전파하지 않기 때문에 신경망과 결합하기 어렵다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;본 연구의 제안&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;소규모에서 중간 규모의 표형 데이터를 위한 foundation model인 TabPFN을 제안한다.&lt;br /&gt;
&lt;ul style=&quot;list-style-type: circle;&quot; data-ke-list-type=&quot;circle&quot;&gt;
&lt;li&gt;새로운 지도학습 기반 표형 학습 방법은 소규모에서 중간 규모의 어떤 데이터셋에서도 적용 가능하며, 최대 10,000개의 샘플과 500개의 feature를 가진 데이터셋에서 뛰어난 성능을 보인다.&lt;/li&gt;
&lt;li&gt;단 한 번의 forward pass만으로, TabPFN은 벤치마크에서 state-of-the-art 방법들, 특히 gradient-boosted DT보다도 훨씬 뛰어난 성능을 보인다.&lt;/li&gt;
&lt;li&gt;fine-tuning, 생성 능력, 밀도 추정 등 다양한 foundation model의 특성을 가진다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Principled in-context learning&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;ICL 도입 배경&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;LLM의 성공 요인 중 하나는 ICL(in-context learning)이다. transformer는 ICL만으로 로지스틱 회귀나 베이지안 모델까지 근사할 수 있다.&lt;/li&gt;
&lt;li&gt;기존 TabPFN에서도 ICL을 도입했지만, 개념적으로는 가능해도 실제 적용이 어렵다는 문제가 있었다. 이에 본 연구의 개선된 TabPFN은 더 큰 데이터셋을 처리할 수 있도록, regression, categorical, missing value를 지원하도록, 그리고 outlier나 irrelevant feature에 강건하도록 개선되었다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;본 연구의 아이디어&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;기존 연구들은 사람이 알고리즘을 설계하는 방식이었지만, TabPFN은 모델이 데이터 예시를 통해 알고리즘을 학습하는 exemplar-based declarative programming 방식을 사용한다. 이러한 방식은 forward pass 한 번으로 gradient와 retraining 없이 학습과 예측을 동시에 가능하도록 한다.&lt;/li&gt;
&lt;li&gt;이때, 데이터셋은 다양한 tabular 데이터셋을 인위적으로 대량 생성해서 사용한다. 이러한 데이터셋은 feature-target 관계가 다양하게, 노이즈나 결측값, 이상치는 포함하도록 구성된다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;전체 파이프라인&lt;/b&gt;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Data Generation : 다양한 synthetic dataset을 생성하고, 일부 label을 masking한다.&lt;/li&gt;
&lt;li&gt;Pre-training : transformer가 missing label을 예측하도록 학습한다.&lt;/li&gt;
&lt;li&gt;Real-world Prediction : 새로운 dataset을 입력하고, 즉시 예측하는 ICL 방식을 가능하게 한다.&lt;/li&gt;
&lt;/ol&gt;
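&lt;p data-ke-size=&quot;size16&quot;&gt;위 3단계 파이프라인의 데이터 흐름을 최소한의 Python으로 스케치하면 다음과 같다. 실제 TabPFN은 transformer를 pre-training하지만, 여기서는 이해를 위해 predictor 자리에 1-NN을 두고 함수 이름과 데이터 생성 규칙도 임의로 정한, 가정 기반 toy 예시임을 밝혀둔다.&lt;/p&gt;

```python
import random

# 1. Data Generation: 임의의 synthetic dataset을 만든다.
#    (여기서는 "x0 + x1 > 1 이면 y = 1"이라는 간단한 규칙을 쓰는 toy 가정)
def generate_synthetic_dataset(n=20, seed=0):
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [1 if row[0] + row[1] > 1 else 0 for row in X]
    return X, y

# 2. Pre-training의 목표: (X_train, y_train, X_test) -> y_test 예측.
#    실제로는 transformer가 이 매핑을 학습하지만, 여기서는 가장 가까운
#    train 샘플의 label을 돌려주는 1-NN을 predictor의 자리 표시자로 둔다.
def icl_predict(X_train, y_train, X_test):
    preds = []
    for q in X_test:
        dists = [(q[0] - x[0]) ** 2 + (q[1] - x[1]) ** 2 for x in X_train]
        best = min(range(len(X_train)), key=lambda i: dists[i])
        preds.append(y_train[best])
    return preds

# 3. Real-world Prediction: 새 dataset을 넣으면 추가 학습 없이 바로 예측한다.
X, y = generate_synthetic_dataset()
X_train, y_train = X[:15], y[:15]
X_test, y_test = X[15:], y[15:]
print(icl_predict(X_train, y_train, X_test))
```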
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;이론적 해석&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1775494512479&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;p(y_test|X_test, X_train, y_train)&lt;/code&gt;&lt;/pre&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;train 데이터를 기반으로 test를 예측하도록 학습하며, TabPFN은 Bayesian inference 근사를 하는 것으로 해석할 수 있다.&lt;/li&gt;
&lt;/ul&gt;
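&lt;p data-ke-size=&quot;size16&quot;&gt;이 Bayesian 해석은 가설 공간을 직접 나열할 수 있는 toy 예시로 보면 구체적으로 이해된다. 아래는 1차원 threshold 분류기 가설들과 uniform prior를 가정하고 posterior predictive를 계산해본 스케치이며, 실제 TabPFN의 동작이 아니라 해석을 보여주기 위한 가정 기반 예시이다.&lt;/p&gt;

```python
# 가정: "x > t 이면 y = 1"이라는 threshold 가설들의 집합을 prior로 둔 toy 모델.
# posterior predictive p(y_test | x_test, X_train, y_train)를
# 가설별 likelihood 가중 평균으로 계산한다. (eps는 label noise 가정)
def hypothesis(t, x, eps=0.1):
    # 가설 t 아래에서 y = 1일 확률
    return (1 - eps) if x > t else eps

def posterior_predictive(x_test, X_train, y_train, thresholds):
    weights = []
    for t in thresholds:
        lik = 1.0
        for x, y in zip(X_train, y_train):
            p1 = hypothesis(t, x)
            lik *= p1 if y == 1 else (1 - p1)
        weights.append(lik)   # uniform prior이므로 likelihood가 곧 unnormalized posterior
    Z = sum(weights)
    return sum(w * hypothesis(t, x_test) for w, t in zip(weights, thresholds)) / Z

thresholds = [i / 10 for i in range(11)]          # t = 0.0, 0.1, ..., 1.0
X_train, y_train = [0.2, 0.4, 0.7, 0.9], [0, 0, 1, 1]
print(round(posterior_predictive(0.8, X_train, y_train, thresholds), 3))
```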
&lt;h2 data-ke-size=&quot;size26&quot;&gt;An architecture designed for tables&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;문제 1 : transformer와 tabular 데이터 구조가 안 맞는다.&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;문제 : transformer는 sequence용 구조이기 때문에 tabular 데이터에 사용하면 데이터를 표 구조(행/열)로 보는 것이 아니라 일렬(문장처럼)로 본다.&lt;/li&gt;
&lt;li&gt;본 연구의 아이디어 : 표를 sequence로 보지 말고, 각 cell에 개별 representation을 부여해서 cell들의 집합으로 본다.&amp;nbsp;&lt;/li&gt;
&lt;li&gt;Two-way attention 구조 : 각 cell이 보는 방향이 2개라서, 결과적으로 feature 관계와 데이터 분포 모두 학습할 수 있다.&amp;nbsp;
&lt;ul style=&quot;list-style-type: circle;&quot; data-ke-list-type=&quot;circle&quot;&gt;
&lt;li&gt;Row 방향 : 같은 데이터 안에서 feature 간 관계를 학습한다.&lt;/li&gt;
&lt;li&gt;Column 방향 : 다른 샘플 간 비교를 학습한다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
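&lt;p data-ke-size=&quot;size16&quot;&gt;Two-way attention의 두 방향이 섞는 정보가 어떻게 다른지는, attention을 uniform attention(단순 평균)으로 대체한 가정 기반 스케치로 감을 잡을 수 있다. 실제 모델은 학습된 attention을 쓰므로 아래는 구조의 뼈대만 보여준다.&lt;/p&gt;

```python
# 셀 단위 표현에 대한 two-way attention의 뼈대 (uniform attention 가정).
table = [
    [1.0, 2.0, 3.0],   # sample 0 의 feature 값들
    [4.0, 5.0, 6.0],   # sample 1
    [7.0, 8.0, 9.0],   # sample 2
]

def attend_rows(cells):
    # Row 방향: 같은 샘플 안에서 feature 간 정보를 섞는다.
    out = []
    for row in cells:
        m = sum(row) / len(row)
        out.append([0.5 * v + 0.5 * m for v in row])
    return out

def attend_cols(cells):
    # Column 방향: 같은 feature에 대해 샘플 간 정보를 섞는다.
    n_rows, n_cols = len(cells), len(cells[0])
    out = [row[:] for row in cells]
    for j in range(n_cols):
        m = sum(cells[i][j] for i in range(n_rows)) / n_rows
        for i in range(n_rows):
            out[i][j] = 0.5 * cells[i][j] + 0.5 * m
    return out

mixed = attend_cols(attend_rows(table))
print(mixed[0])   # [3.0, 3.5, 4.0]
```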
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;문제 2 : 계산 낭비 문제가 발생한다.&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;문제 : 기존 방식은 test마다 train을 다시 계산하는 구조다보니 계산 낭비가 생긴다.&lt;/li&gt;
&lt;li&gt;본 연구의 아이디어 : train으로 ICL을 한 번 수행한 후, 그 결과를 저장해둔 다음 여러 test에 재사용한다.&lt;/li&gt;
&lt;li&gt;결과 : CPU 환경에서는 최대 800배 속도 향상을, GPU에서는 최대 30배 속도 향상을 보였다.&lt;/li&gt;
&lt;/ul&gt;
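&lt;p data-ke-size=&quot;size16&quot;&gt;train 상태를 한 번 계산해 캐싱해두고 여러 test 쿼리에 재사용한다는 구조만 뼈대로 옮기면 아래와 같다. 내부 &quot;상태&quot;를 간단한 통계로 둔 toy 가정이며, 실제 TabPFN의 캐싱 구현과는 다르다.&lt;/p&gt;

```python
class CachedICLPredictor:
    # train 데이터에 대한 연산을 fit()에서 한 번만 수행해 저장하고,
    # predict()는 저장된 상태를 재사용한다. (여기서 상태 = 평균/다수 label이라는 toy 가정)
    def fit(self, X_train, y_train):
        # 실제 모델이라면 train 부분의 attention 상태를 여기서 계산해 둔다.
        self.state = {
            "n": len(X_train),
            "mean": [sum(col) / len(X_train) for col in zip(*X_train)],
            "majority": max(set(y_train), key=y_train.count),
        }
        return self

    def predict(self, X_test):
        # 캐싱된 state만 사용하므로 test 배치가 여러 번 와도 train 재계산이 없다.
        return [self.state["majority"] for _ in X_test]

model = CachedICLPredictor().fit([[0, 1], [1, 0], [1, 1]], [0, 1, 1])
print(model.predict([[0, 0], [2, 2]]))   # [1, 1]
```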
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;추가 최적화&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;효율성을 위해 flash attention, half precision, activation checkpointing을 추가로 적용하였다.&lt;/li&gt;
&lt;li&gt;결과 : 메모리가 1/4로 감소하였으며, 큰 데이터도 처리 가능함을 확인하였다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Regression 문제 처리 방식 변경&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;기존 연구에서는 하나의 값을 예측했다면, TabPFN에서는 확률 분포를 예측하는 방식으로 변경하였다.&lt;/li&gt;
&lt;/ul&gt;
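&lt;p data-ke-size=&quot;size16&quot;&gt;하나의 값 대신 bin별 확률 분포를 출력하는 방식은 대략 아래처럼 스케치할 수 있다. bin 대표값과 logits는 설명을 위해 임의로 정한 가정이다.&lt;/p&gt;

```python
import math

# 회귀 target 범위를 bin으로 나누고, 모델 출력(logits)을 softmax로
# bin별 확률로 바꾼 뒤, 점 추정치가 필요하면 분포의 기댓값으로 얻는다.
bins = [0.0, 1.0, 2.0, 3.0, 4.0]          # 각 bin의 대표값 (임의 가정)
logits = [0.1, 2.0, 0.5, 0.2, 0.1]        # 모델이 출력했다고 가정한 logits

exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]      # bin별 확률 분포
point_estimate = sum(p * b for p, b in zip(probs, bins))

print([round(p, 3) for p in probs])
print(round(point_estimate, 3))
```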
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Synthetic data based on causal models&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;synthetic data의 도입&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;문제 : 현실 데이터는 부족하고, 편향이 존재하며, 개인정보나 저작권 문제가 있는 경우가 많다. 그러나 TabPFN 성능의 핵심은 좋은 학습 데이터이다.&lt;/li&gt;
&lt;li&gt;본 연구의 아이디어 : 데이터 자체를 만들자!&lt;/li&gt;
&lt;li&gt;단순 랜덤 데이터가 아니라 SCM(Structural Causal Model)을 사용하여, feature들이 서로 어떻게 영향을 주는지 원인-결과 관계를 가진 데이터를 생성한다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;전체 생성 파이프라인&lt;/b&gt;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Hyperparameter sampling : 데이터 크기, feature 개수와 난이도 등 어떤 문제를 만들지를 먼저 결정한다.&lt;/li&gt;
&lt;li&gt;Causal graph construction : DAG(Directed Acyclic Graph)를 생성하여 feature들 사이 관계 구조를 정의한다.&lt;/li&gt;
&lt;li&gt;Data propagation : 초기값(noise)을 생성한 후, graph를 따라 값을 전달한다. 이때 각 edge마다 neural network, activation, DT 구조, categorical 변환, Gaussian noise 추가를 적용하며, 현실처럼 복잡한 데이터를 생성한다.&lt;/li&gt;
&lt;/ol&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 결과적으로, graph 끝까지 지나면 feature와 target이 생성된다.&lt;/p&gt;
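&lt;p data-ke-size=&quot;size16&quot;&gt;이 생성 파이프라인(DAG 생성 &amp;rarr; noise 전파 &amp;rarr; feature/target 추출)을 단순화해 옮기면 다음과 같다. edge 함수(tanh activation + Gaussian noise)와 node 수는 임의 가정이다.&lt;/p&gt;

```python
import math
import random

rng = random.Random(42)

# 1. Causal graph construction: index가 작은 node에서 큰 node로만 edge를 두면
#    cycle이 생길 수 없으므로 DAG가 보장된다.
n_nodes = 5
edges = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes) if rng.random() > 0.5]
weights = {e: rng.gauss(0, 1) for e in edges}    # edge마다 고정된 mechanism 가중치

# 2. Data propagation: root noise를 만들고 graph를 따라 값을 전달한다.
def sample_row():
    values = [rng.gauss(0, 1) for _ in range(n_nodes)]   # 초기값(noise)
    for i, j in edges:
        values[j] += math.tanh(weights[(i, j)] * values[i]) + rng.gauss(0, 0.1)
    return values

# 3. 마지막 node를 target, 나머지를 feature로 삼으면 synthetic dataset 하나가 된다.
rows = [sample_row() for _ in range(4)]
X = [row[:4] for row in rows]
y = [row[4] for row in rows]
print(len(X), len(X[0]), len(y))   # 4 4 4
```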
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Post-processing&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Kumaraswamy distribution&lt;/li&gt;
&lt;li&gt;warping&lt;/li&gt;
&lt;li&gt;discretization&lt;/li&gt;
&lt;/ul&gt;
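&lt;p data-ke-size=&quot;size16&quot;&gt;이 중 Kumaraswamy 분포 기반 warping과 discretization은 아래처럼 스케치할 수 있다. Kumaraswamy CDF는 F(x) = 1 - (1 - x^a)^b 이며, a, b, bin 수는 임의 가정이다.&lt;/p&gt;

```python
# Kumaraswamy CDF F(x) = 1 - (1 - x**a)**b 를 [0, 1] 구간 feature의
# 비선형 warping 함수로 사용하고, 이어서 구간을 나눠 discretization한다.
def kumaraswamy_warp(x, a=2.0, b=3.0):
    return 1 - (1 - x ** a) ** b

def discretize(x, n_bins=4):
    return min(int(x * n_bins), n_bins - 1)

xs = [i / 10 for i in range(11)]
warped = [round(kumaraswamy_warp(x), 3) for x in xs]
binned = [discretize(w) for w in warped]
print(warped)
print(binned)
```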
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;최종 결과&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;약 1억개의 synthetic dataset을 생성하였으며, 각 dataset은 다른 구조, 다른 feature, 다른 관계를 가지는 dataset이다.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Qualitative analysis&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;분석 목적&lt;/b&gt;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;TabPFN이 어떤 상황에서 어떻게 동작하는지 직관적으로 이해하기 위해서&lt;/li&gt;
&lt;li&gt;다양한 데이터 특성이 모델에 미치는 영향을 분리해서 보기 위해&lt;/li&gt;
&lt;/ol&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;기존 모델 vs. TabPFN&lt;/b&gt;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot; data-ke-style=&quot;style3&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 19.7287%;&quot;&gt;Model&lt;/td&gt;
&lt;td style=&quot;width: 40.6589%;&quot;&gt;특징&lt;/td&gt;
&lt;td style=&quot;width: 39.6123%;&quot;&gt;단점&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 19.7287%;&quot;&gt;Linear Regression&lt;/td&gt;
&lt;td style=&quot;width: 40.6589%;&quot;&gt;선형 관계만 학습 가능하며, 단순하고 해석 가능하다.&lt;/td&gt;
&lt;td style=&quot;width: 39.6123%;&quot;&gt;비선형 데이터에서는 성능이 급격히 하락한다.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 19.7287%;&quot;&gt;MLP&lt;/td&gt;
&lt;td style=&quot;width: 40.6589%;&quot;&gt;비선형 학습이 가능하다.&lt;/td&gt;
&lt;td style=&quot;width: 39.6123%;&quot;&gt;불연속적이거나 급격한 변화에서는 성능이 떨어진다.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 19.7287%;&quot;&gt;CatBoost&lt;br /&gt;(Tree-based Model)&lt;/td&gt;
&lt;td style=&quot;width: 40.6589%;&quot;&gt;구간별 함수로 모델링하며, 안정적이다.&lt;br /&gt;(catastrophic failure이 없다.)&lt;/td&gt;
&lt;td style=&quot;width: 39.6123%;&quot;&gt;근사 오차가 존재하고, 예측이 직관적이지 않을 수 있다.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 19.7287%;&quot;&gt;TabPFN&lt;/td&gt;
&lt;td style=&quot;width: 40.6589%;&quot;&gt;부드러운 함수와 불연속 함수 모두 잘 처리한다. 또한, step function도 잘 근사하고 신경망인데도 불연속 패턴 대응이 가능하다.&lt;/td&gt;
&lt;td style=&quot;width: 39.6123%;&quot;&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;TabPFN의 장점 : 불확실성 모델링이 가능하다.&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;기존 모델은 하나의 값만 출력했다면, TabPFN은 확률 분포를 출력해 이 값일 가능성이 어느 정도인지까지 같이 예측해준다.&lt;/li&gt;
&lt;li&gt;ex. double slit experiment (복잡한 multi-modal 분포를 생성하는 실험)
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;TabPFN : 한 번의 연산으로 복잡한 분포 그대로 예측한다.&lt;/li&gt;
&lt;li&gt;기존 모델 (CatBoost) : 여러 모델을 따로 학습해야 하고, 분포를 나중에 재구성해야 한다. 이로 인해 시간이 오래 걸리고 성능 또한 떨어진다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Quantitative analysis&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;실험 배경&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;사용 데이터셋
&lt;ul style=&quot;list-style-type: circle;&quot; data-ke-list-type=&quot;circle&quot;&gt;
&lt;li&gt;AutoML Benchmark와 OpenML-CTR23을 사용하였으며, 추가로 Kaggle 대회 데이터와 Tabular Playground Series도 사용하였다.&lt;/li&gt;
&lt;li&gt;분류 29개, 회귀 28개로 구성되었으며 최대 10,000개의 샘플, 최대 500개의 feature로 구성된 데이터셋을 사용하였다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;baseline
&lt;ul style=&quot;list-style-type: circle;&quot; data-ke-list-type=&quot;circle&quot;&gt;
&lt;li&gt;트리 기반으로는 Random Forest, XGBoost, CatBoost, LightGBM을 사용하였다.&lt;/li&gt;
&lt;li&gt;그 외로 선형 모델과 SVM, MLP도 사용하였다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;평가 지표
&lt;ul style=&quot;list-style-type: circle;&quot; data-ke-list-type=&quot;circle&quot;&gt;
&lt;li&gt;분류 지표로는 ROC-AUC와 Accuracy를, 회귀 지표로는 R^2과 RMSE를 사용하였다. 모든 결과는 정규화하여 최고 성능을 1, 최악 성능을 0으로 보았다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;실험 설정
&lt;ul style=&quot;list-style-type: circle;&quot; data-ke-list-type=&quot;circle&quot;&gt;
&lt;li&gt;각 실험은 10번 반복하였으며, random seed는 변경하였다.&lt;/li&gt;
&lt;li&gt;데이터 분할은 train : test = 9:1 로 하였다.&lt;/li&gt;
&lt;li&gt;하이퍼파라미터 튜닝으로는 random search, 5-fold cross-validation을 사용하였다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;TabPFN의 사전학습은 GPU 8개로 2주 동안 한 번만 수행하였으며, 이후 새로운 데이터셋마다 추가 학습 없이 forward pass 1번으로 예측하였다.&lt;/li&gt;
&lt;/ul&gt;
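&lt;p data-ke-size=&quot;size16&quot;&gt;위에서 말한 점수 정규화(최고 성능 = 1, 최악 성능 = 0)를 그대로 구현하면 다음과 같다. 모델 이름과 점수는 예시용 가상의 값이다.&lt;/p&gt;

```python
# 한 데이터셋 위에서 여러 모델의 raw 점수를 [0, 1]로 정규화한다.
# (최고 성능은 1, 최악 성능은 0; 모두 동일하면 0으로 처리한다는 가정)
def normalize_scores(raw):
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {m: 0.0 for m in raw}
    return {m: (s - lo) / (hi - lo) for m, s in raw.items()}

scores = {"TabPFN": 0.95, "CatBoost": 0.91, "MLP": 0.80}
print(normalize_scores(scores))
```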
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Comparison with state-of-the-art baselines&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;실험 목적&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;튜닝이 없어도 좋은가?&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;default setting 결과&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;분류 : CatBoost보다 +0.187&lt;/li&gt;
&lt;li&gt;회귀 : CatBoost보다 +0.051&lt;/li&gt;
&lt;li&gt;결과적으로, 튜닝 없이도 이미 최고 수준이다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;tuning 결과 비교&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;다른 모델들은 최대 4시간 tuning을 진행하였다.&lt;/li&gt;
&lt;li&gt;분류 : +0.13&lt;/li&gt;
&lt;li&gt;회귀 : +0.093&lt;/li&gt;
&lt;li&gt;다른 모델들을 tuning하였음에도 TabPFN이 여전히 다른 모델보다 높은 성능을 보인다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;속도 비교&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;TabPFN : 2.8초 / 4.8초&lt;/li&gt;
&lt;li&gt;기존 모델 : 최대 4시간&lt;/li&gt;
&lt;li&gt;분류에서는 5,140배, 회귀에서는 3,000배 가까이 빠른 것을 볼 수 있다.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Evaluating diverse data attributes&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;실험 목적&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;데이터 특성이 바뀌면 성능이 어떻게 변하는가?&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;독립 변인&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;useless feature 추가&lt;/li&gt;
&lt;li&gt;outliers 추가&lt;/li&gt;
&lt;li&gt;sample / feature 수 감소&lt;/li&gt;
&lt;li&gt;categorical / missing value 포함 여부&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;실험 결과&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;noise(outliers) / useless feature &amp;rarr; TabPFN은 매우 robust하지만, MLP는 성능이 크게 하락함을 보였다.&lt;/li&gt;
&lt;li&gt;sample / feature 수 감소 &amp;rarr; 모든 모델 성능이 감소했지만, TabPFN은 절반 데이터에서도 여전히 상위 성능을 유지하였다.&lt;/li&gt;
&lt;li&gt;categorical / missing value 포함 여부 &amp;rarr; TabPFN 성능에 거의 영향이 없다.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Comparison with tuned ensemble methods&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;실험 대상&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;비교 대상 : AutoGluon (stacked ensemble + tuning)&lt;/li&gt;
&lt;li&gt;TabPFN 확장 : TabPFN끼리 ensemble + tuning을 했다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;실험 결과&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;분류 : AutoGluon보다 5,140배 더 빠르며 더 좋은 성능을 보였다.&lt;/li&gt;
&lt;li&gt;회귀 : AutoGluon보다 48배 빠르면서, 성능도 더 높음을 보였다.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Foundation model with interpretability&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Foundation model로써의 TabPFN 분석&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Foundation model은 여러 task에 활용 가능한 범용 모델을 말한다.&lt;/li&gt;
&lt;li&gt;Density estimation : 수치형 데이터에는 확률 밀도 함수를, 범주형 데이터에는 확률 질량 함수를 추정한다. 이를 통해 TabPFN은 단순 예측을 넘어, 주어진 데이터가 얼마나 정상적인지까지 판단할 수 있다.&lt;/li&gt;
&lt;li&gt;Data generation : 실제 데이터처럼 보이는 synthetic tabular data를 생성하여, 데이터 부족 해결, privacy 보호, 데이터 증강을 가능하게 한다.&lt;/li&gt;
&lt;li&gt;Representation learning : feature representation을 학습하여 결측값 보정과 클러스터링을 가능하게 하며, raw 데이터보다 클래스 분리가 더 잘 된다.&lt;/li&gt;
&lt;li&gt;Fine-tuning : 트리 모델과 달리 fine-tuning이 가능해 새로운 데이터에도 적응할 수 있다.&lt;/li&gt;
&lt;li&gt;Interpretability : SHAP을 사용해 각 feature가 예측에 얼마나 기여했는지 계산한다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;기존 모델 vs. TabPFN&lt;/b&gt;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 57.0936%; height: 68px;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot; data-ke-style=&quot;style3&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 23.8342%; height: 17px;&quot;&gt;Model&lt;/td&gt;
&lt;td style=&quot;width: 14.6931%; height: 17px;&quot;&gt;해석 난이도&lt;/td&gt;
&lt;td style=&quot;width: 14.2665%; height: 17px;&quot;&gt;성능&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 23.8342%; height: 17px;&quot;&gt;Logistic regression&lt;/td&gt;
&lt;td style=&quot;width: 14.6931%; height: 17px;&quot;&gt;낮음&lt;/td&gt;
&lt;td style=&quot;width: 14.2665%; height: 17px;&quot;&gt;낮음&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 23.8342%; height: 17px;&quot;&gt;CatBoost&lt;/td&gt;
&lt;td style=&quot;width: 14.6931%; height: 17px;&quot;&gt;어려움&lt;/td&gt;
&lt;td style=&quot;width: 14.2665%; height: 17px;&quot;&gt;높음&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 23.8342%; height: 17px;&quot;&gt;TabPFN&lt;/td&gt;
&lt;td style=&quot;width: 14.6931%; height: 17px;&quot;&gt;낮음&lt;/td&gt;
&lt;td style=&quot;width: 14.2665%; height: 17px;&quot;&gt;높음&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Future work&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Scaling to larger datasets : 더 큰 규모의 데이터셋으로 확장하는 연구&lt;/li&gt;
&lt;li&gt;Handling data drift : 시간에 따라 데이터 분포가 변하는 문제(data drift)를 해결하는 방법 연구&lt;/li&gt;
&lt;li&gt;Fine-tuning across related tabular tasks : 서로 관련된 tabular 데이터셋 간에서 모델을 효과적으로 fine-tuning하는 방법 연구&lt;/li&gt;
&lt;li&gt;Understanding theoretical foundations : TabPFN이 왜 잘 작동하는지에 대한 이론적 기반을 더 깊이 이해하는 연구&lt;/li&gt;
&lt;li&gt;Extending to new data modalities : 다양한 데이터 유형으로 확장한 연구들 (시계열 데이터, 멀티모달 데이터, ECG, 신경영상 데이터, 유전체 데이터 등)&lt;/li&gt;
&lt;li&gt;Designing specialized priors : 데이터 유형별로 더 적합한 맞춤형 prior를 설계하는 연구&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <author>yuha933</author>
      <guid isPermaLink="true">https://yuha933.tistory.com/24</guid>
      <comments>https://yuha933.tistory.com/24#entry24comment</comments>
      <pubDate>Mon, 6 Apr 2026 18:30:19 +0900</pubDate>
    </item>
    <item>
      <title>[논문 리뷰] TabNet: Attentive Interpretable Tabular Learning</title>
      <link>https://yuha933.tistory.com/23</link>
      <description>&lt;h2 data-ke-size=&quot;size26&quot;&gt;1. Introduction&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;연구 배경&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DNN은 이미지, 텍스트, 오디오 분야에서 두드러진 성공을 보여왔으며, 원시 데이터를 의미 있는 표현으로 효율적으로 인코딩하는 표준 아키텍처들이 빠른 발전을 이끌었다.&lt;/li&gt;
&lt;li&gt;반면, 테이블 데이터는 가장 흔한 데이터 유형임에도 불구하고, 딥러닝 기반 접근이 상대적으로 덜 탐구되어 왔다. 현재까지도 앙상블 결정 트리의 다양한 변형이 대부분의 응용에서 지배적인 성능을 보이고 있다. 그 이유는 아래와 같다.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;표현 효율성 : DT 기반 방법은 테이블 데이터에서 흔히 나타나는 결정 경계를 근사하는 데 효율적이다.&lt;/li&gt;
&lt;li&gt;높은 해석 가능성 : 기본적인 트리 구조를 통해 해석이 쉽고, 앙상블 모델에서도 다양한 사후 해석 기법이 존재한다.&lt;/li&gt;
&lt;li&gt;빠른 학습 속도 : 트리 기반 모델은 일반적으로 학습 속도가 빠르다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;기존 연구의 한계&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;기존에 제안된 딥러닝 아키텍처들은 테이블 데이터에 적합하지 않은 경우가 많다. 그 이유는 다음과 같으며, CNN이나 MLP 구조의 특성에 기반한다.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;overparameterization 문제를 가진다.&lt;/li&gt;
&lt;li&gt;적절한 inductive bias 부족 문제가 있다.&lt;/li&gt;
&lt;li&gt;테이블 데이터의 결정 구조를 잘 학습하지 못하는 경우가 많다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;테이블 데이터에 딥러닝을 적용하려는 이유&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DNN은 트리 기반 방법과 달리 end-to-end 학습이 가능하며, 아래와 같은 장점을 가진다.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;이미지 등 다양한 데이터 타입과 함께 효율적인 통합 표현이 학습 가능하다.&lt;/li&gt;
&lt;li&gt;수작업 feature engineering 부담이 감소한다.&lt;/li&gt;
&lt;li&gt;스트리밍 데이터 학습 가능성이 높아진다.&lt;/li&gt;
&lt;li&gt;representation learning을 통한 다양한 응용 확장성이 높다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;위와 같은 특성들은 도메인 적응, 생성 모델링, 반지도 학습을 가능하게 한다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;본 연구의 제안&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;본 연구는 테이블 데이터를 위한 새로운 표준 DNN 아키텍처인 TabNet을 제안한다. 주요 기여는 아래와 같다.&amp;nbsp;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;전처리 없이 raw tabular data 입력이 가능해진다.&lt;/li&gt;
&lt;li&gt;Sequential Attention 기반 Feature 선택이 가능하다.&lt;/li&gt;
&lt;li&gt;Local interpretability, Global interpretability의 두 가지 해석 가능성을 제공한다.&lt;/li&gt;
&lt;li&gt;Self-supervised pretraining을 적용할 수 있다. (tabular 도메인에서는 선구적인 시도이다.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;2. Related Work&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Feature selection&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Feature selection : 예측에 유용한 feature들의 부분집합을 선택하는 과정
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;global methods : 전체 학습 데이터에 기반하여 feature 중요도를 계산하는 방식이다. (ex. forward selection, Lasso 정규화 등)&lt;/li&gt;
&lt;li&gt;instance-wise feature selection : 각 입력 마다 개별적으로 feature를 선택하는 방식이다. (ex. 선택된 feature들과 응답 변수 간의 mutual information을 최대화하는 설명 모델, actor-critic 프레임워크를 활용하여 baseline을 모방하면서 feature selection을 최적화하는 방법 등)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;TabNet : feature selection을 end-to-end 학습에서 직접 제어한다. 하나의 단일 모델이 feature 선택과 출력 매핑을 동시에 수행하며, 이를 통해 더 컴팩트한 표현과 향상된 성능을 달성한다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Tree-based learning&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DT는 통계적으로 가장 중요한 feature를 효율적으로 선택하는 능력 덕분에 테이블 데이터 학습에서 널리 사용되어 왔다. 또한, 기존 DT의 성능을 개선하기 위해 분산을 줄이는 앙상블 기법이 자주 사용된다. 대표적인 트리 기반 모델들은 아래와 같다.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Random Forest : feature의 부분집합을 사용하여 여러 트리를 학습하는 모델이다.&lt;/li&gt;
&lt;li&gt;XGBoost, LightGBM : 최근 데이터 과학 대회에서 지배적인 성능을 보이는 앙상블 DT 모델이다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Integration of DNNs into DTs&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DT와 딥러닝을 결합하려는 연구들도 존재했다. 대표적인 연구들은 아래와 같으며, 이러한 방법들은 자동 feature selection 성능이 저하되는 문제가 있다.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Humbird et al. (2018) : DNN 블록으로 트리를 표현하는 방식이다. 다만, 표현 중복이나 비효율과 같은 문제가 발생한다.&lt;/li&gt;
&lt;li&gt;Soft (neural) decision trees (Yang et al. 2018; Kontschieder et al. 2015) : 미분 불가능한 axis-aligned split 대신 미분 가능한 결정 함수를 사용한다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;위의 문제를 보완하기 위한 모델들도 존재했다.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Yang et al. (2018) : DT를 DNN에서 시뮬레이션하기 위한 soft binning 함수를 제안했다. 다만, 모든 가능한 분할을 열거해야 하는 비효율이 존재한다.&lt;/li&gt;
&lt;li&gt;Ke et al. (2019) : feature 조합을 명시적으로 활용하는 DNN 구조를 제안했다. 이는 gradient 기반 학습 대신 knowledge transfer 기반 학습이다.&lt;/li&gt;
&lt;li&gt;Tanno et al. (2018) : root부터 leaf까지 구조를 점진적으로 성장시키는 방식을 제안했다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;TabNet : sequential attention을 통해 controllable sparsity를 가진 feature selection을 직접 수행한다는 점에서 기존 연구들과 차별성을 가진다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Self-supervised learning&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;비지도 학습은 특히 데이터가 적은 환경에서 지도 학습 성능을 향상시킬 수 있으며, 텍스트와 이미지 분야에서 큰 성능 향상을 보여왔다. 이는 적절한 pretraining objective 설계와 attention 기반 딥러닝 구조에 의해 가능했다.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;3. TabNet for Tabular Learning&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;본 연구의 아이디어&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DT가 테이블 데이터에서 좋은 이유 : 테이블 데이터에서는 어떤 feature를 사용하는가가 성능을 좌우하는데, DT는 매 단계마다 가장 중요한 feature를 하나 선택해 해당 feature로 데이터를 나누는 걸 잘한다.&lt;/li&gt;
&lt;li&gt;TabNet의 아이디어 : decision boundary를 feature들의 선형 결합으로 만들되, 이를 DNN으로 구현해보자!&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;TabNet의 핵심 특징&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;feature selection : 고정된 feature를 사용하지 않고, 각 샘플마다 중요한 feature를 고르는 instance-wise feature selection을 사용한다.&lt;/li&gt;
&lt;li&gt;sequential multi-step architecture : 각 단계가 선택된 feature에 기반해 전체 결정의 일부를 담당하는 순차적 다단계 구조를 갖는다.&lt;/li&gt;
&lt;li&gt;non-linear transformation : 선택된 feature를 비선형적으로 처리하여 학습 능력을 향상시킨다.&lt;/li&gt;
&lt;li&gt;ensemble 효과 : 더 높은 차원과 더 많은 step을 사용하여 앙상블 효과를 모방한다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;TabNet 전체 구조&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;0. Overall structure&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;입력 feature : 수치형 feature는 raw 값 그대로 사용하고, 범주형 feature는 학습 가능한 임베딩으로 매핑한다. 또한 전체 feature에 대한 전역 정규화는 하지 않고 BN만 적용한다. 모든 decision step에는 동일한 D차원 feature가 입력된다.&lt;/li&gt;
&lt;li&gt;각 step i는 이전 step의 출력을 입력으로 받아, 어떤 feature를 사용할지 결정하고 그 결과를 현재 step의 결정 기여와 다음 step으로 전달할 정보로 나눈다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1. Feature selection&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;feature selection 수식 : 이전 step 정보를 보고, 어떤 feature를 쓸지 점수를 매긴 후, sparse하게 선택하는 과정을 갖는다.
&lt;ul style=&quot;list-style-type: circle;&quot; data-ke-list-type=&quot;circle&quot;&gt;
&lt;li&gt;a[i-1] : 이전 step이 만든 정보&lt;/li&gt;
&lt;li&gt;hj(a[i-1]) : attention 점수를 만들기 위한 변환 (FC+BN)&lt;/li&gt;
&lt;li&gt;P[i-1] : prior 적용으로, 이미 많이 쓴 feature의 점수를 줄이는 역할&lt;/li&gt;
&lt;li&gt;sparsemax : softmax와 달리 일부는 0으로 완전히 제거하여 진짜 선택을 함&amp;nbsp;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;594&quot; data-origin-height=&quot;106&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/tCCQw/dJMcaiW71sv/ZlGKtEXFxWlyHM8VNG8rB1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/tCCQw/dJMcaiW71sv/ZlGKtEXFxWlyHM8VNG8rB1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/tCCQw/dJMcaiW71sv/ZlGKtEXFxWlyHM8VNG8rB1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FtCCQw%2FdJMcaiW71sv%2FZlGKtEXFxWlyHM8VNG8rB1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;54&quot; data-origin-width=&quot;594&quot; data-origin-height=&quot;106&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;실제 적용은 M[i]와 f의 곱으로 이루어지며, M[i] 값이 0이면 완전히 제거하여 선택된 feature만 남게 된다.&lt;/li&gt;
&lt;li&gt;prior 수식 : 어떤 feature가 이미 많이 선택됐으면, 다음 step에서 덜 선택되도록 한다.&lt;/li&gt;
&lt;li&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;497&quot; data-origin-height=&quot;142&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bYWFtJ/dJMcabqez3B/7sgh1ThCKguoJ1S9MRRKIk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bYWFtJ/dJMcabqez3B/7sgh1ThCKguoJ1S9MRRKIk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bYWFtJ/dJMcabqez3B/7sgh1ThCKguoJ1S9MRRKIk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbYWFtJ%2FdJMcabqez3B%2F7sgh1ThCKguoJ1S9MRRKIk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;250&quot; height=&quot;71&quot; data-origin-width=&quot;497&quot; data-origin-height=&quot;142&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/li&gt;
&lt;li&gt;sparsity loss 수식 : 값이 퍼져 있으면 loss가 크게끔, 몇 개만 선택하면 loss가 작게끔 되어 있어서, 적은 feature만 쓰도록 강제하는 역할을 한다.&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;430&quot; data-origin-height=&quot;122&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/wPLT7/dJMcacCFwmt/zNbIXQKfFcdTes27wz3kuk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/wPLT7/dJMcacCFwmt/zNbIXQKfFcdTes27wz3kuk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/wPLT7/dJMcacCFwmt/zNbIXQKfFcdTes27wz3kuk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FwPLT7%2FdJMcacCFwmt%2FzNbIXQKfFcdTes27wz3kuk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;250&quot; height=&quot;122&quot; data-origin-width=&quot;430&quot; data-origin-height=&quot;122&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/li&gt;
&lt;/ul&gt;
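&lt;p data-ke-size=&quot;size16&quot;&gt;sparsemax와 prior 업데이트의 동작은 아래처럼 직접 확인해볼 수 있다. sparsemax는 Martins와 Astudillo (2016)의 알고리즘을 옮긴 것이고, gamma 값과 입력 점수는 임의 가정이다.&lt;/p&gt;

```python
# sparsemax (Martins and Astudillo, 2016): softmax와 달리 일부 확률을 정확히 0으로 만든다.
def sparsemax(z):
    zs = sorted(z, reverse=True)
    cum = 0.0
    k, cum_k = 0, 0.0
    for j, v in enumerate(zs, start=1):
        cum += v
        if 1 + j * v > cum:        # support에 포함되는 마지막 index를 찾는다
            k, cum_k = j, cum
    tau = (cum_k - 1) / k          # threshold
    return [max(v - tau, 0.0) for v in z]

# prior 업데이트: 이번 step까지 많이 선택된 feature일수록 다음 step의 점수가 줄어든다.
# (gamma = 1이면 각 feature는 한 번만 선택될 수 있다; gamma 값은 임의 가정)
def update_prior(prior, mask, gamma=1.5):
    return [p * (gamma - m) for p, m in zip(prior, mask)]

scores = [1.2, 1.0, -1.0]
mask = sparsemax(scores)
print(mask)                                   # 마지막 feature는 정확히 0.0
print(update_prior([1.0, 1.0, 1.0], mask))
```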
&lt;p data-ke-size=&quot;size16&quot;&gt;2. Feature processing&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Feature processing : the selected features are transformed into meaningful representations.
&lt;ul style=&quot;list-style-type: circle;&quot; data-ke-list-type=&quot;circle&quot;&gt;
&lt;li&gt;d[i] : the decision output of the current step, which feeds directly into the final output&lt;/li&gt;
&lt;li&gt;a[i] : the information the next step will use, passed on for the next feature selection&lt;/li&gt;
&lt;li&gt;Internally structured as FC &amp;rarr; BN &amp;rarr; GLU (lets only the important information through) &amp;rarr; residual connection (prevents information loss)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;434&quot; data-origin-height=&quot;104&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/2ct1O/dJMcaax5fYb/jVL0zJNL5jpsIsdzVDD21k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/2ct1O/dJMcaax5fYb/jVL0zJNL5jpsIsdzVDD21k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/2ct1O/dJMcaax5fYb/jVL0zJNL5jpsIsdzVDD21k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F2ct1O%2FdJMcaax5fYb%2FjVL0zJNL5jpsIsdzVDD21k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;250&quot; height=&quot;60&quot; data-origin-width=&quot;434&quot; data-origin-height=&quot;104&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
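&lt;p data-ke-size=&quot;size16&quot;&gt;The FC &amp;rarr; BN &amp;rarr; GLU &amp;rarr; residual sub-block can be sketched roughly as below. BatchNorm is omitted for brevity, the weights are random placeholders, and the sqrt(0.5) residual scaling follows the TabNet paper.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def glu_block(x, W, b):
    """One feature-transformer sub-block: FC -> (BN omitted) -> GLU -> residual.

    The FC doubles the width; GLU splits it in half and gates one half with
    a sigmoid of the other, letting only 'useful' information through.
    """
    h = x @ W + b                          # FC to 2*d units
    a, g = np.split(h, 2, axis=-1)         # split for the GLU gate
    out = a * (1.0 / (1.0 + np.exp(-g)))   # GLU: a * sigmoid(g)
    return (out + x) * np.sqrt(0.5)        # scaled residual, as in the paper

d = 4
x = rng.normal(size=(2, d))
W = rng.normal(size=(d, 2 * d)) * 0.1
b = np.zeros(2 * d)
y = glu_block(x, W, b)
assert y.shape == x.shape
```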
&lt;p data-ke-size=&quot;size16&quot;&gt;3. Decision aggregation&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Decision aggregation : the outputs of all steps are summed. A ReLU zeroes out d[i] when it is negative, so only positive values count as contributions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;445&quot; data-origin-height=&quot;135&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cpQk6O/dJMcagrvJ1c/44EFKeNX4P4ttLkwpnx9N0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cpQk6O/dJMcagrvJ1c/44EFKeNX4P4ttLkwpnx9N0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cpQk6O/dJMcagrvJ1c/44EFKeNX4P4ttLkwpnx9N0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcpQk6O%2FdJMcagrvJ1c%2F44EFKeNX4P4ttLkwpnx9N0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;250&quot; height=&quot;76&quot; data-origin-width=&quot;445&quot; data-origin-height=&quot;135&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Final output : a last linear layer produces the prediction.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;377&quot; data-origin-height=&quot;83&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/CdWfE/dJMcacvUrzE/pwKYJu1PGLlnkAaZvXwmDk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/CdWfE/dJMcacvUrzE/pwKYJu1PGLlnkAaZvXwmDk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/CdWfE/dJMcacvUrzE/pwKYJu1PGLlnkAaZvXwmDk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FCdWfE%2FdJMcacvUrzE%2FpwKYJu1PGLlnkAaZvXwmDk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;200&quot; height=&quot;44&quot; data-origin-width=&quot;377&quot; data-origin-height=&quot;83&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
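&lt;p data-ke-size=&quot;size16&quot;&gt;The two steps above (a ReLU-gated sum over step outputs, then a final linear map) can be sketched as follows; the shapes and weights are toy values for illustration.&lt;/p&gt;

```python
import numpy as np

def aggregate_decisions(d_steps, W_final):
    """Sum ReLU(d[i]) over steps, then apply the final linear layer.

    d_steps: (n_steps, batch, n_d) per-step decision outputs.
    Negative entries are zeroed by ReLU, so only positive outputs contribute.
    """
    d_out = np.maximum(d_steps, 0.0).sum(axis=0)   # (batch, n_d)
    return d_out @ W_final                          # (batch, n_outputs)

d_steps = np.array([[[1.0, -2.0]], [[0.5, 3.0]]])  # 2 steps, batch 1, n_d 2
W = np.eye(2)                                      # identity final layer
out = aggregate_decisions(d_steps, W)
# ReLU drops the -2.0, so the summed decision is [1.5, 3.0]
assert np.allclose(out, [[1.5, 3.0]])
```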
&lt;p data-ke-size=&quot;size16&quot;&gt;4. Interpretability&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Step importance : a step whose output is large is treated as important, while a step whose output is small has almost no influence.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;381&quot; data-origin-height=&quot;102&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/djFho9/dJMcagdZYWI/whlOzQtbtWH1OrW5rkSBP0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/djFho9/dJMcagdZYWI/whlOzQtbtWH1OrW5rkSBP0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/djFho9/dJMcagdZYWI/whlOzQtbtWH1OrW5rkSBP0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdjFho9%2FdJMcagdZYWI%2FwhlOzQtbtWH1OrW5rkSBP0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;230&quot; height=&quot;62&quot; data-origin-width=&quot;381&quot; data-origin-height=&quot;102&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Feature importance : reflects both how often a feature is selected and whether it is selected in important steps.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;488&quot; data-origin-height=&quot;133&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/de84h4/dJMcafF9G6a/4o0l4pnqyZKs1sGb2PQKI0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/de84h4/dJMcafF9G6a/4o0l4pnqyZKs1sGb2PQKI0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/de84h4/dJMcafF9G6a/4o0l4pnqyZKs1sGb2PQKI0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fde84h4%2FdJMcafF9G6a%2F4o0l4pnqyZKs1sGb2PQKI0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;250&quot; height=&quot;68&quot; data-origin-width=&quot;488&quot; data-origin-height=&quot;133&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
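&lt;p data-ke-size=&quot;size16&quot;&gt;Combining the two formulas above, the aggregate feature importance can be sketched as: weight each step&#39;s mask by that step&#39;s importance (sum of ReLU-ed decision outputs), sum over steps, and normalize. The toy masks and decision values below are illustrative.&lt;/p&gt;

```python
import numpy as np

def feature_importance(masks, d_steps):
    """Aggregate feature importance across decision steps.

    masks:   (n_steps, batch, n_features) feature-selection masks
    d_steps: (n_steps, batch, n_d) per-step decision outputs
    Each step's mask is weighted by eta = sum of ReLU(d[i]) over its
    decision units, then the result is normalized per sample.
    """
    eta = np.maximum(d_steps, 0.0).sum(axis=-1)    # (n_steps, batch)
    agg = (eta[..., None] * masks).sum(axis=0)      # (batch, n_features)
    return agg / agg.sum(axis=-1, keepdims=True)    # rows sum to 1

masks = np.array([[[0.9, 0.1, 0.0]], [[0.0, 0.2, 0.8]]])
d_steps = np.array([[[2.0]], [[1.0]]])   # step 1 is twice as important
imp = feature_importance(masks, d_steps)
assert np.isclose(imp.sum(), 1.0)
```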
&lt;p data-ke-size=&quot;size16&quot;&gt;5. Tabular self-supervised learning&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Some features are randomly masked, and the model is trained to reconstruct the masked values from the remaining ones. This is the same idea as BERT&#39;s masked pretraining; the result is that good representations are learned in advance, without any labels.&lt;/li&gt;
&lt;/ul&gt;
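&lt;p data-ke-size=&quot;size16&quot;&gt;A minimal sketch of this masking-and-reconstruction setup (zeroing hidden cells and the 30% mask rate are illustrative choices, not the paper&#39;s exact corruption scheme):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_features(X, p_mask=0.3):
    """Randomly hide a fraction of feature cells, BERT-style.

    Returns the corrupted input and the binary mask marking hidden cells;
    the reconstruction loss is computed only on the masked cells.
    """
    mask = rng.random(X.shape) < p_mask
    X_corrupt = np.where(mask, 0.0, X)   # hidden cells zeroed (illustrative)
    return X_corrupt, mask

def reconstruction_loss(X, X_pred, mask):
    """MSE restricted to the masked cells."""
    return ((X - X_pred) ** 2)[mask].mean()

X = rng.normal(size=(8, 5))
X_corrupt, mask = mask_features(X)
# A model that just echoes the corrupted input still errs on masked cells
loss = reconstruction_loss(X, X_corrupt, mask)
assert loss > 0
```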
&lt;h2 data-ke-size=&quot;size26&quot;&gt;4. Experiments&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Training setup&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Preprocessing : categorical features are mapped to a 1-dimensional scalar through a learnable embedding, while numerical features are fed in as-is without any preprocessing.&lt;/li&gt;
&lt;li&gt;Loss : softmax cross-entropy for classification and mean squared error for regression.&lt;/li&gt;
&lt;li&gt;Optimizer : Adam.&lt;/li&gt;
&lt;li&gt;Initialization : Glorot uniform.&lt;/li&gt;
&lt;li&gt;All experiments use the same train/validation/test splits as prior work.&lt;/li&gt;
&lt;/ul&gt;
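&lt;p data-ke-size=&quot;size16&quot;&gt;The categorical preprocessing above amounts to a one-value-per-category lookup table. The class below is a hypothetical numpy sketch (in practice this would be a trainable embedding layer, e.g. with embedding dimension 1):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

class ScalarEmbedding:
    """Learnable mapping from a categorical feature to a single scalar.

    One trainable value per category; numerical features bypass this
    and are fed in unchanged.
    """
    def __init__(self, n_categories):
        self.table = rng.normal(size=n_categories)  # one scalar per category

    def __call__(self, ids):
        return self.table[ids]                      # (batch,) scalars

emb = ScalarEmbedding(n_categories=4)
ids = np.array([0, 2, 2, 3])
vals = emb(ids)
assert vals.shape == (4,)
assert vals[1] == vals[2]   # same category -> same scalar
```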
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Instance-wise feature selection&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Six synthetic tabular datasets are used, each with about 10,000 training samples.&lt;/li&gt;
&lt;li&gt;Syn1 ~ Syn3 : the salient features are identical across all samples, so global feature selection alone can already perform well.&lt;/li&gt;
&lt;li&gt;Syn4 ~ Syn6 : the salient features differ from sample to sample, so global feature selection is no longer optimal.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1372&quot; data-origin-height=&quot;845&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/mM0HX/dJMcajhqVNt/OoREHynCykfoir59McAFc0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/mM0HX/dJMcajhqVNt/OoREHynCykfoir59McAFc0/img.png&quot; data-alt=&quot;Table 1&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/mM0HX/dJMcajhqVNt/OoREHynCykfoir59McAFc0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FmM0HX%2FdJMcajhqVNt%2FOoREHynCykfoir59McAFc0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;370&quot; data-origin-width=&quot;1372&quot; data-origin-height=&quot;845&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Table 1&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Results
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;TabNet outperforms Tree Ensembles, Lasso, and L2X, and performs on par with INVASE.&lt;/li&gt;
&lt;li&gt;On Syn1~3, TabNet correctly finds the globally important features; on Syn4~6, its instance-wise feature selection yields improved performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Performance on real-world datasets&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1. Forest Cover Type (Dua and Graff 2017)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal : classify the forest cover type from cartographic data!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;672&quot; data-origin-height=&quot;387&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dARVx5/dJMcadnZFeL/zBeaFHVxee6YuHFBpCNS5k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dARVx5/dJMcadnZFeL/zBeaFHVxee6YuHFBpCNS5k/img.png&quot; data-alt=&quot;Table 2&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dARVx5/dJMcadnZFeL/zBeaFHVxee6YuHFBpCNS5k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdARVx5%2FdJMcadnZFeL%2FzBeaFHVxee6YuHFBpCNS5k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;350&quot; height=&quot;202&quot; data-origin-width=&quot;672&quot; data-origin-height=&quot;387&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Table 2&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Results
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;TabNet outperforms the tree-based baselines; notably, even as a single model, it beats complex AutoML ensembles.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. Poker Hand (Dua and Graff 2017)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal : classify the poker hand from the card information!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;666&quot; data-origin-height=&quot;546&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bmcWBA/dJMcagdZ0mz/4Q0y6mxG1KRCx5aPDlA4Jk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bmcWBA/dJMcagdZ0mz/4Q0y6mxG1KRCx5aPDlA4Jk/img.png&quot; data-alt=&quot;Table 3&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bmcWBA/dJMcagdZ0mz/4Q0y6mxG1KRCx5aPDlA4Jk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbmcWBA%2FdJMcagdZ0mz%2F4Q0y6mxG1KRCx5aPDlA4Jk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;350&quot; height=&quot;287&quot; data-origin-width=&quot;666&quot; data-origin-height=&quot;546&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Table 3&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Results : TabNet is capable of deep nonlinear processing, and its instance-wise feature selection prevents overfitting. It therefore learns the complex rule structure well and achieves high performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3. Sarcos (Vijayakumar and Schaal 2000)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal : solve the robot-arm inverse dynamics regression problem!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;833&quot; data-origin-height=&quot;586&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/d5XpZK/dJMcabjtnja/oIb38gMXQVFYLPuIiuqUgk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/d5XpZK/dJMcabjtnja/oIb38gMXQVFYLPuIiuqUgk/img.png&quot; data-alt=&quot;Table 4&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/d5XpZK/dJMcabjtnja/oIb38gMXQVFYLPuIiuqUgk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fd5XpZK%2FdJMcabjtnja%2FoIb38gMXQVFYLPuIiuqUgk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;281&quot; data-origin-width=&quot;833&quot; data-origin-height=&quot;586&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Table 4&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Results
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Performance holds up even at small model sizes, and large models perform dominantly, showing excellent efficiency relative to model size.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;4. Higgs Boson (Dua and Graff 2017)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal : solve the Higgs vs. background classification problem!&lt;/li&gt;
&lt;li&gt;Characteristic : the dataset is very large (10.5M samples).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;901&quot; data-origin-height=&quot;490&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/buylAm/dJMcafMWDYX/lhrhU2X0sTGDkPKueuQ7SK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/buylAm/dJMcafMWDYX/lhrhU2X0sTGDkPKueuQ7SK/img.png&quot; data-alt=&quot;Table 5&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/buylAm/dJMcafMWDYX/lhrhU2X0sTGDkPKueuQ7SK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbuylAm%2FdJMcafMWDYX%2FlhrhU2X0sTGDkPKueuQ7SK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;218&quot; data-origin-width=&quot;901&quot; data-origin-height=&quot;490&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Table 5&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Results
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;TabNet achieves comparable or better performance with fewer parameters.&lt;/li&gt;
&lt;li&gt;It maintains structured sparsity, giving an efficient sparse architecture.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;5. Rossmann Store Sales (Kaggle 2019b)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal : forecast store sales!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;428&quot; data-origin-height=&quot;368&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b2CRX6/dJMcaf7cugG/1TF2EQ6VNKQK9UTvzIO92k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b2CRX6/dJMcaf7cugG/1TF2EQ6VNKQK9UTvzIO92k/img.png&quot; data-alt=&quot;Table 6&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b2CRX6/dJMcaf7cugG/1TF2EQ6VNKQK9UTvzIO92k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb2CRX6%2FdJMcaf7cugG%2F1TF2EQ6VNKQK9UTvzIO92k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;200&quot; height=&quot;172&quot; data-origin-width=&quot;428&quot; data-origin-height=&quot;368&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Table 6&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Results&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Temporal features are selected as important.&lt;/li&gt;
&lt;li&gt;Instance-wise selection proves effective in special situations such as holidays.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Interpretability&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Synthetic dataset
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;3348&quot; data-start=&quot;3205&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3274&quot; data-start=&quot;3205&quot;&gt;On Syn2, the output depends only on X3 ~ X6, and TabNet selects exactly those features.&lt;/li&gt;
&lt;li data-end=&quot;3274&quot; data-start=&quot;3205&quot;&gt;On Syn4, different features are selected depending on the condition, showing that TabNet performs truly instance-wise selection.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Real-world Dataset
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;On the Mushroom dataset, Odor is the single most important feature; existing methods assign it less than 30% importance, whereas TabNet assigns it 43%.&lt;/li&gt;
&lt;li&gt;On the Adult dataset, the feature importances are consistent with prior studies.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;In short, TabNet provides genuinely interpretable feature importances.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;3660&quot; data-start=&quot;3631&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Self-supervised learning&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;801&quot; data-origin-height=&quot;321&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/vsyr6/dJMcaiJA0rc/S1EcKOu12WPlkE9XHdoUhK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/vsyr6/dJMcaiJA0rc/S1EcKOu12WPlkE9XHdoUhK/img.png&quot; data-alt=&quot;Table 7&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/vsyr6/dJMcaiJA0rc/S1EcKOu12WPlkE9XHdoUhK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fvsyr6%2FdJMcaiJA0rc%2FS1EcKOu12WPlkE9XHdoUhK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;160&quot; data-origin-width=&quot;801&quot; data-origin-height=&quot;321&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Table 7&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;915&quot; data-origin-height=&quot;614&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/yzy36/dJMcaduJlwd/t5GkXw2MIQxmejCdv5oJ3K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/yzy36/dJMcaduJlwd/t5GkXw2MIQxmejCdv5oJ3K/img.png&quot; data-alt=&quot;Figure 7&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/yzy36/dJMcaduJlwd/t5GkXw2MIQxmejCdv5oJ3K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fyzy36%2FdJMcaduJlwd%2Ft5GkXw2MIQxmejCdv5oJ3K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;201&quot; data-origin-width=&quot;915&quot; data-origin-height=&quot;614&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Figure 7&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3660&quot; data-start=&quot;3631&quot;&gt;Pretraining is especially beneficial when data is scarce.&lt;/li&gt;
&lt;li data-end=&quot;3660&quot; data-start=&quot;3631&quot;&gt;Training also converges faster.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;5. Conclusion&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Contributions of this work&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Uses a sequential attention mechanism to select and process a semantically meaningful subset of features at each decision step.&lt;/li&gt;
&lt;li&gt;Instance-wise feature selection concentrates the model&#39;s capacity on the salient features, improving learning efficiency.&lt;/li&gt;
&lt;li&gt;Visualizing the selection masks provides a higher degree of interpretability.&lt;/li&gt;
&lt;li&gt;Shows that unsupervised pre-training brings significant benefits for fast adaptation and improved performance.&lt;/li&gt;
&lt;/ul&gt;
</description>
      <author>yuha933</author>
      <guid isPermaLink="true">https://yuha933.tistory.com/23</guid>
      <comments>https://yuha933.tistory.com/23#entry23comment</comments>
      <pubDate>Mon, 6 Apr 2026 17:10:02 +0900</pubDate>
    </item>
    <item>
      <title>[Denoising Diffusion Probabilistic Models] Paper Review</title>
      <link>https://yuha933.tistory.com/20</link>
      <description>&lt;h2 data-ke-size=&quot;size26&quot;&gt;1. Introduction&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Research background&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Deep generative models have recently demonstrated high-quality sample generation across a variety of data domains, achieving particularly strong quality in image and audio generation.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;GAN (Generative Adversarial Networks)&lt;/li&gt;
&lt;li&gt;Autoregressive Models&lt;/li&gt;
&lt;li&gt;Flow-Based Models&lt;/li&gt;
&lt;li&gt;VAEs (Variational Autoencoders)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Models that rival GANs in image generation quality have also begun to appear.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;EBM (Energy-Based Models)&lt;/li&gt;
&lt;li&gt;Score Matching 기반 Model&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Limitations of prior work (diffusion)&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Diffusion models existed, but their ability to generate high-quality samples had not been demonstrated.&lt;/li&gt;
&lt;li&gt;Their log-likelihoods were not competitive with existing likelihood-based models.&lt;/li&gt;
&lt;li&gt;The models wasted many bits on details imperceptible to humans.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;The proposed Diffusion Probabilistic Model&lt;/b&gt;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Forward process
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The process of repeatedly adding small amounts of Gaussian noise to the original image x0&lt;/li&gt;
&lt;li&gt;This process is defined by hand rather than learned by the model.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Reverse process
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The process of removing the noise to recover the original image x0&lt;/li&gt;
&lt;li&gt;This is the process the model learns; rather than restoring everything at once, it restores a little at each step.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;※ If the noise is Gaussian (and each step is small), the reverse transitions can also be set to be Gaussian.&lt;/p&gt;
&lt;pre id=&quot;code_1774408188173&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;[Forward process]
x0 (original image)
-&amp;gt; x1 (slightly noisy)
-&amp;gt; x2 (noisier)
-&amp;gt; ...
-&amp;gt; xT (pure noise)

[Reverse process]
xT (noise)
-&amp;gt; xT-1
-&amp;gt; xT-2
-&amp;gt; ...
-&amp;gt; x0 (original image)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Contributions of this paper&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1125&quot; data-origin-height=&quot;683&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/btxVfE/dJMcaiioYjr/RJtdv9jkYK7F93K2YjMKpK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/btxVfE/dJMcaiioYjr/RJtdv9jkYK7F93K2YjMKpK/img.png&quot; data-alt=&quot;Figure 1&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/btxVfE/dJMcaiioYjr/RJtdv9jkYK7F93K2YjMKpK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbtxVfE%2FdJMcaiioYjr%2FRJtdv9jkYK7F93K2YjMKpK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;364&quot; data-origin-width=&quot;1125&quot; data-origin-height=&quot;683&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Figure 1&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;First demonstration that diffusion models can actually generate high-quality images, in some cases outperforming existing models.&lt;/li&gt;
&lt;li&gt;Training is equivalent to denoising score matching and sampling to Langevin dynamics, establishing a theoretical connection: diffusion models belong to the same family as score-based generative models rather than being an entirely new class.&lt;/li&gt;
&lt;li data-end=&quot;1965&quot; data-start=&quot;1941&quot;&gt;By using Gaussian-based transitions, training is possible without any complicated structure.&lt;/li&gt;
&lt;li data-end=&quot;1965&quot; data-start=&quot;1941&quot;&gt;Rather than generating in one fixed order, it generalizes autoregressive decoding over a bit ordering.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;2. Background&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Reverse Process : define the model that generates the data!!&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1065&quot; data-origin-height=&quot;109&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/btfmTV/dJMcadg6dZ7/e5aztEMXekeFDXF9aV9kD1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/btfmTV/dJMcadg6dZ7/e5aztEMXekeFDXF9aV9kD1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/btfmTV/dJMcadg6dZ7/e5aztEMXekeFDXF9aV9kD1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbtfmTV%2FdJMcadg6dZ7%2Fe5aztEMXekeFDXF9aV9kD1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;500&quot; height=&quot;51&quot; data-origin-width=&quot;1065&quot; data-origin-height=&quot;109&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The process of generating an image starting from noise
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Start : xT ~ N(0,I)&lt;/li&gt;
&lt;li&gt;Going xT -&amp;gt; xT-1 -&amp;gt; xT-2 -&amp;gt; ... -&amp;gt; x0, the state becomes progressively less noisy.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Rather than generating the image in one shot, the model restores it little by little.&lt;/li&gt;
&lt;li&gt;However, the true distribution of xT-1 cannot be known from xT alone.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Forward Process : we construct the opposite direction ourselves!!&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;928&quot; data-origin-height=&quot;105&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/NDv2n/dJMcabQ6YKP/inen4wT3bruES5kGEDTbc0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/NDv2n/dJMcabQ6YKP/inen4wT3bruES5kGEDTbc0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/NDv2n/dJMcabQ6YKP/inen4wT3bruES5kGEDTbc0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FNDv2n%2FdJMcabQ6YKP%2Finen4wT3bruES5kGEDTbc0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;450&quot; height=&quot;51&quot; data-origin-width=&quot;928&quot; data-origin-height=&quot;105&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The process of adding a little noise to the image at a time
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Going x0 -&amp;gt; x1 -&amp;gt; x2 -&amp;gt; ... -&amp;gt; xT, the image eventually becomes pure noise.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;This process itself is not learned; it is what makes it possible to train the reverse process against the forward one.&lt;/li&gt;
&lt;/ul&gt;
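&lt;p data-ke-size=&quot;size16&quot;&gt;The forward process above has a convenient closed form: x_t can be sampled directly from x0 as x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, where alpha_bar_t is the cumulative product of (1 - beta). A toy 1-D numpy sketch (the linear beta schedule matches the paper; the data is a random placeholder):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear schedule, as in the paper
x0 = rng.normal(size=(4,))
x_late, _ = forward_diffuse(x0, T - 1, betas)
# By the last step alpha_bar is tiny, so x_T is essentially pure noise
assert np.cumprod(1.0 - betas)[-1] < 1e-4
```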
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Training Objective (ELBO): optimize a tractable objective instead of the true likelihood!&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;We need log p_&amp;theta;(x_0), but computing it directly is intractable.&lt;/li&gt;
&lt;li&gt;So we introduce the ELBO (evidence lower bound).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1186&quot; data-origin-height=&quot;96&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/wPN4E/dJMcahcJbdI/qdPsoKa6IE9Slr44ZAi1Kk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/wPN4E/dJMcahcJbdI/qdPsoKa6IE9Slr44ZAi1Kk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/wPN4E/dJMcahcJbdI/qdPsoKa6IE9Slr44ZAi1Kk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FwPN4E%2FdJMcahcJbdI%2FqdPsoKa6IE9Slr44ZAi1Kk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;500&quot; height=&quot;40&quot; data-origin-width=&quot;1186&quot; data-origin-height=&quot;96&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;However, in this raw form the ELBO is complex and unintuitive, which makes it hard to compute and to train on directly.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;ELBO decomposition: split the ELBO across time steps!&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1169&quot; data-origin-height=&quot;124&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bwNyz9/dJMcafzeMBb/7N6BY3YawMnVztbRwTNTW0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bwNyz9/dJMcafzeMBb/7N6BY3YawMnVztbRwTNTW0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bwNyz9/dJMcafzeMBb/7N6BY3YawMnVztbRwTNTW0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbwNyz9%2FdJMcafzeMBb%2F7N6BY3YawMnVztbRwTNTW0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;500&quot; height=&quot;53&quot; data-origin-width=&quot;1169&quot; data-origin-height=&quot;124&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Expanding the ELBO turns it into a per-step comparison of how well the model matches the true distribution.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;474&quot; data-origin-height=&quot;86&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/lllub/dJMcafsruKV/WXkIQxuwUmuar7sdzs64hk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/lllub/dJMcafsruKV/WXkIQxuwUmuar7sdzs64hk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/lllub/dJMcafsruKV/WXkIQxuwUmuar7sdzs64hk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Flllub%2FdJMcafsruKV%2FWXkIQxuwUmuar7sdzs64hk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;200&quot; height=&quot;36&quot; data-origin-width=&quot;474&quot; data-origin-height=&quot;86&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;However, computing this KL term requires knowing the posterior q(x_{t-1} | x_t, x_0).&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Computing this posterior from the forward process!&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1089&quot; data-origin-height=&quot;140&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/wUehO/dJMcaipbLP8/qudksNRMsrjXvnyc8hapk1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/wUehO/dJMcaipbLP8/qudksNRMsrjXvnyc8hapk1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/wUehO/dJMcaipbLP8/qudksNRMsrjXvnyc8hapk1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FwUehO%2FdJMcaipbLP8%2FqudksNRMsrjXvnyc8hapk1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;500&quot; height=&quot;64&quot; data-origin-width=&quot;1089&quot; data-origin-height=&quot;140&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The posterior is Gaussian and has a closed form.&lt;/li&gt;
&lt;li&gt;This makes the KL, the loss, and its gradients all computable.&lt;/li&gt;
&lt;/ul&gt;
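&lt;p data-ke-size=&quot;size16&quot;&gt;The closed-form posterior can be sketched directly in NumPy (a toy version; the two coefficients and the variance below are the paper's mu-tilde and beta-tilde expressions, and the beta schedule matches the experiments):&lt;/p&gt;

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def posterior_params(x0, xt, t):
    """Mean and variance of q(x_{t-1} | x_t, x_0), which is Gaussian in closed form."""
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
    coef_x0 = np.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)            # weight on x_0
    coef_xt = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)   # weight on x_t
    mean = coef_x0 * x0 + coef_xt * xt                              # mu_tilde_t
    var = (1.0 - ab_prev) / (1.0 - ab_t) * betas[t]                 # beta_tilde_t
    return mean, var

mean, var = posterior_params(np.ones((2, 2)), np.ones((2, 2)), 500)
```

Since alpha_bar_{t-1} &gt; alpha_bar_t, the posterior variance beta_tilde_t is always a bit smaller than beta_t.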
&lt;h2 data-ke-size=&quot;size26&quot;&gt;3. Diffusion&amp;nbsp;models&amp;nbsp;and&amp;nbsp;denoising&amp;nbsp;autoencoders&lt;/h2&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;3.1 Forward process and &lt;span&gt;&lt;span&gt;L_T&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The forward variances &amp;beta;_t are simply held fixed rather than learned.&lt;/li&gt;
&lt;li&gt;Since q has nothing to learn in the forward process, L_T is a constant; it has no effect on training and can be ignored.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;466&quot; data-origin-height=&quot;107&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/HHugY/dJMcahRkq4O/KE8ZSouf2eqrxy5kpphJe0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/HHugY/dJMcahRkq4O/KE8ZSouf2eqrxy5kpphJe0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/HHugY/dJMcahRkq4O/KE8ZSouf2eqrxy5kpphJe0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FHHugY%2FdJMcahRkq4O%2FKE8ZSouf2eqrxy5kpphJe0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;200&quot; height=&quot;46&quot; data-origin-width=&quot;466&quot; data-origin-height=&quot;107&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;3.2 Reverse process and &lt;span&gt;&lt;span&gt;L_{1:T&amp;minus;1}&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal: make the model &lt;span&gt;&lt;span&gt;p_&amp;theta;(x_{t&amp;minus;1} | x_t)&lt;/span&gt;&lt;/span&gt; match the true posterior &lt;span&gt;&lt;span&gt;q(x_{t&amp;minus;1} | x_t, x_0)&lt;/span&gt;&lt;/span&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Exploiting the Gaussian structure&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;With fixed variances, minimizing the KL between two Gaussians reduces to matching their means, so the model only needs to predict the posterior mean.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;587&quot; data-origin-height=&quot;94&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nrVwU/dJMcabKoaED/ePch2TPapYqZvTkDTfPsvk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nrVwU/dJMcabKoaED/ePch2TPapYqZvTkDTfPsvk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nrVwU/dJMcabKoaED/ePch2TPapYqZvTkDTfPsvk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FnrVwU%2FdJMcabKoaED%2FePch2TPapYqZvTkDTfPsvk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;48&quot; data-origin-width=&quot;587&quot; data-origin-height=&quot;94&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Reframing the problem&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Problem: the target mean depends on x_0, but the model only sees x_t.&lt;/li&gt;
&lt;li&gt;Solution: rewrite x_t in terms of x_0 and the noise &amp;epsilon;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;493&quot; data-origin-height=&quot;74&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/w7RSW/dJMcahX6o2a/mzjhRgck3IeWN2OW44KE5K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/w7RSW/dJMcahX6o2a/mzjhRgck3IeWN2OW44KE5K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/w7RSW/dJMcahX6o2a/mzjhRgck3IeWN2OW44KE5K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fw7RSW%2FdJMcahX6o2a%2FmzjhRgck3IeWN2OW44KE5K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;250&quot; height=&quot;38&quot; data-origin-width=&quot;493&quot; data-origin-height=&quot;74&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;As a result, the loss is transformed from matching &amp;micro; into matching ɛ: the task shifts from image reconstruction to noise prediction.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Final parameterization&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1062&quot; data-origin-height=&quot;89&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cYUlPL/dJMcaiJrmWN/ElFkgGH165vyDV8fFXV2N0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cYUlPL/dJMcaiJrmWN/ElFkgGH165vyDV8fFXV2N0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cYUlPL/dJMcaiJrmWN/ElFkgGH165vyDV8fFXV2N0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcYUlPL%2FdJMcaiJrmWN%2FElFkgGH165vyDV8fFXV2N0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;500&quot; height=&quot;42&quot; data-origin-width=&quot;1062&quot; data-origin-height=&quot;89&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Meaning of the formula: &quot;what noise was mixed into this image?&quot; (a denoising problem)&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Final form of the loss&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;787&quot; data-origin-height=&quot;95&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b8vEPu/dJMcab4Esvu/916MkaryHLms1jgtfHQ9PK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b8vEPu/dJMcab4Esvu/916MkaryHLms1jgtfHQ9PK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b8vEPu/dJMcab4Esvu/916MkaryHLms1jgtfHQ9PK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb8vEPu%2FdJMcab4Esvu%2F916MkaryHLms1jgtfHQ9PK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;350&quot; height=&quot;42&quot; data-origin-width=&quot;787&quot; data-origin-height=&quot;95&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;It is a simple MSE comparing the true noise with the noise predicted by the model.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Algorithm 1 (Training)&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;725&quot; data-origin-height=&quot;351&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/GMoJK/dJMcafFZXVj/K2KNdru4bS4Y0OF6BJmqBk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/GMoJK/dJMcafFZXVj/K2KNdru4bS4Y0OF6BJmqBk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/GMoJK/dJMcafFZXVj/K2KNdru4bS4Y0OF6BJmqBk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FGMoJK%2FdJMcafFZXVj%2FK2KNdru4bS4Y0OF6BJmqBk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;145&quot; data-origin-width=&quot;725&quot; data-origin-height=&quot;351&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
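&lt;p data-ke-size=&quot;size16&quot;&gt;Algorithm 1 can be sketched in a few lines of NumPy (a toy version: eps_model is a placeholder for the trained U-Net, which this sketch does not implement, and no gradient step is taken):&lt;/p&gt;

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def eps_model(xt, t):
    # placeholder for the noise predictor eps_theta (a U-Net in the paper)
    return np.zeros_like(xt)

def training_step(x0, rng=np.random.default_rng(0)):
    """One iteration of Algorithm 1: sample t and eps, form x_t, return L_simple."""
    t = int(rng.integers(0, T))                    # t ~ Uniform({0, ..., T-1})
    eps = rng.standard_normal(x0.shape)            # eps ~ N(0, I)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return np.mean((eps - eps_model(xt, t)) ** 2)  # simple MSE on the noise

loss = training_step(np.zeros((8, 8)))
```

In real training this loss would be backpropagated through eps_theta; the sketch only shows how the target is formed.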
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Algorithm 2 (Sampling)&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;763&quot; data-origin-height=&quot;364&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/5oGHP/dJMcaaLsamf/KhKCU2KGSno1ZFPbrxKLKk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/5oGHP/dJMcaaLsamf/KhKCU2KGSno1ZFPbrxKLKk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/5oGHP/dJMcaaLsamf/KhKCU2KGSno1ZFPbrxKLKk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F5oGHP%2FdJMcaaLsamf%2FKhKCU2KGSno1ZFPbrxKLKk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;143&quot; data-origin-width=&quot;763&quot; data-origin-height=&quot;364&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
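&lt;p data-ke-size=&quot;size16&quot;&gt;Algorithm 2 likewise (same toy placeholder model; sigma_t^2 = beta_t is one of the two variance choices discussed in the paper):&lt;/p&gt;

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(xt, t):
    # placeholder for the trained eps_theta; the real model is a U-Net
    return np.zeros_like(xt)

def sample(shape, rng=np.random.default_rng(0)):
    """Algorithm 2: start from pure noise and denoise one step at a time."""
    x = rng.standard_normal(shape)                 # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        z = rng.standard_normal(shape) if t > 0 else 0.0   # no noise on the last step
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_model(x, t)) / np.sqrt(alphas[t])
        x = mean + np.sqrt(betas[t]) * z           # sigma_t^2 = beta_t variance choice
    return x

img = sample((4, 4))
```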
&lt;h3 data-ke-size=&quot;size23&quot;&gt;3.3 Data scaling, reverse process decoder, and L_0&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Data scaling&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Original image: 0 to 255&lt;/li&gt;
&lt;li&gt;Model input: [-1, 1]&lt;/li&gt;
&lt;li&gt;The image is mapped into a continuous space to match the Gaussian-based model.&lt;/li&gt;
&lt;/ul&gt;
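&lt;p data-ke-size=&quot;size16&quot;&gt;The rescaling is a simple linear map (a sketch; the constant 127.5 sends {0, 255} exactly to {-1, 1}):&lt;/p&gt;

```python
import numpy as np

def to_model_range(pixels):
    """Linearly map uint8 pixels in {0, ..., 255} to floats in [-1, 1]."""
    return pixels.astype(np.float64) / 127.5 - 1.0

def to_pixel_range(x):
    """Inverse map back to {0, ..., 255}, clipped and rounded."""
    return np.clip(np.round((x + 1.0) * 127.5), 0, 255).astype(np.uint8)

scaled = to_model_range(np.array([0, 128, 255], dtype=np.uint8))
```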
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Why a decoder is needed&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Problem: the model outputs x_0 as continuous values, while real data are discrete pixel values, so they cannot be compared directly.&lt;/li&gt;
&lt;li&gt;Solution: add a discrete decoder at the final step.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The decoder does not use the Gaussian output as pixel values directly; it converts it into the probability of falling into each pixel bin.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;As a result, a proper log-likelihood can be computed for real image data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Meaning of L_0&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;It is the loss of the final reconstruction step, measuring how accurately x_1 is restored to x_0.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;3.4 Simplified training objective&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Result from Section 3.2&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;901&quot; data-origin-height=&quot;99&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bllPnT/dJMcabjjy55/4WhkK7nWeYKzL1JhtRqPN0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bllPnT/dJMcabjjy55/4WhkK7nWeYKzL1JhtRqPN0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bllPnT/dJMcabjjy55/4WhkK7nWeYKzL1JhtRqPN0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbllPnT%2FdJMcabjjy55%2F4WhkK7nWeYKzL1JhtRqPN0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;44&quot; data-origin-width=&quot;901&quot; data-origin-height=&quot;99&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Because each timestep carries its own weight term, the objective is complicated and training can be unstable.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Simple loss: drop the per-step weights!&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;897&quot; data-origin-height=&quot;110&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/9eYM6/dJMcahX6sC9/GZChNopW6Z78pBvhXIO9L1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/9eYM6/dJMcahX6sC9/GZChNopW6Z78pBvhXIO9L1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/9eYM6/dJMcahX6sC9/GZChNopW6Z78pBvhXIO9L1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F9eYM6%2FdJMcahX6sC9%2FGZChNopW6Z78pBvhXIO9L1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;49&quot; data-origin-width=&quot;897&quot; data-origin-height=&quot;110&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;When t is small there is little noise and the problem is easy; when t is large there is much noise and the problem is hard.&lt;/li&gt;
&lt;li&gt;Dropping the weights therefore lets training focus more on the harder cases, which helps improve sample quality.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;4. Experiments&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Experimental setup&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;325&quot; data-start=&quot;311&quot;&gt;&lt;span&gt;&lt;span&gt;T = 1000&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li data-end=&quot;367&quot; data-start=&quot;326&quot;&gt;&amp;beta;_t: 0.0001 &amp;rarr; 0.02 (linear increase)&lt;/li&gt;
&lt;li data-end=&quot;400&quot; data-start=&quot;368&quot;&gt;Model: U-Net + self-attention&lt;/li&gt;
&lt;li data-end=&quot;426&quot; data-start=&quot;401&quot;&gt;Positional embeddings used (for the timestep)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;4.1 Sample&amp;nbsp;quality&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Experiment&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Evaluated on CIFAR-10&lt;/li&gt;
&lt;li&gt;metrics
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;IS (Inception Score)&lt;/li&gt;
&lt;li&gt;FID&lt;/li&gt;
&lt;li&gt;NLL&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Results&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;657&quot; data-origin-height=&quot;500&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bfE9qe/dJMcahDMJ3f/X7QSJ3gLmNyWR6BNysshl0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bfE9qe/dJMcahDMJ3f/X7QSJ3gLmNyWR6BNysshl0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bfE9qe/dJMcahDMJ3f/X7QSJ3gLmNyWR6BNysshl0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbfE9qe%2FdJMcahDMJ3f%2FX7QSJ3gLmNyWR6BNysshl0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;304&quot; data-origin-width=&quot;657&quot; data-origin-height=&quot;500&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;FID = 3.17, better than prior models and even better than conditional models.&lt;/li&gt;
&lt;li&gt;Diffusion is stronger at sample quality than at likelihood optimization.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Sample results&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;989&quot; data-origin-height=&quot;384&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dMcjp1/dJMcacCvOI8/AKBKf0bB5R0w7r9jM6vFJ1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dMcjp1/dJMcacCvOI8/AKBKf0bB5R0w7r9jM6vFJ1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dMcjp1/dJMcacCvOI8/AKBKf0bB5R0w7r9jM6vFJ1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdMcjp1%2FdJMcacCvOI8%2FAKBKf0bB5R0w7r9jM6vFJ1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;500&quot; height=&quot;194&quot; data-origin-width=&quot;989&quot; data-origin-height=&quot;384&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The samples have very natural structure, with stable texture and lighting.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;4.2 Reverse process parameterization and training objective ablation&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Experiment&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Predicting &amp;micro; (baseline)&lt;/li&gt;
&lt;li&gt;Predicting ɛ (this paper's proposal)&lt;/li&gt;
&lt;li&gt;Learned vs. fixed variance&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Results&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;445&quot; data-origin-height=&quot;286&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/rlgvm/dJMcajhhc79/IcnmAHc4ImIPI9KdD3XYak/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/rlgvm/dJMcajhhc79/IcnmAHc4ImIPI9KdD3XYak/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/rlgvm/dJMcajhhc79/IcnmAHc4ImIPI9KdD3XYak/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Frlgvm%2FdJMcajhhc79%2FIcnmAHc4ImIPI9KdD3XYak%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;350&quot; height=&quot;225&quot; data-origin-width=&quot;445&quot; data-origin-height=&quot;286&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&amp;micro; prediction: works well only under the full variational bound; with the simple objective its quality is poor.&lt;/li&gt;
&lt;li&gt;ɛ prediction: best results with the simple objective.&lt;/li&gt;
&lt;li&gt;Learned variance: unstable training and lower quality.&lt;/li&gt;
&lt;li&gt;In short, recasting the task as noise estimation is the key.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;4.3 Progressive&amp;nbsp;coding&lt;/h3&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Progressive&amp;nbsp;lossy&amp;nbsp;compression&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Experiment&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1782&quot; data-start=&quot;1731&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1782&quot; data-start=&quot;1759&quot;&gt;Rate-distortion analysis performed&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Results&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The train/test gap is very small, indicating no overfitting.&lt;/li&gt;
&lt;li&gt;However, the likelihood is not state of the art.&lt;/li&gt;
&lt;li&gt;In short, diffusion is an excellent generative model but not the best likelihood model.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Plots&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1134&quot; data-origin-height=&quot;379&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/xc2Md/dJMcahcJm6k/ycJQNZpsTscRkwWmOWiufk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/xc2Md/dJMcahcJm6k/ycJQNZpsTscRkwWmOWiufk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/xc2Md/dJMcahcJm6k/ycJQNZpsTscRkwWmOWiufk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fxc2Md%2FdJMcahcJm6k%2FycJQNZpsTscRkwWmOWiufk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;500&quot; height=&quot;167&quot; data-origin-width=&quot;1134&quot; data-origin-height=&quot;379&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Distortion drops sharply in the early (low-rate) region.&lt;/li&gt;
&lt;li&gt;Most of the bits are spent on fine details.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Progressive&amp;nbsp;generation&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Experiment&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Observe intermediate states of the generation process&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Plots&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1117&quot; data-origin-height=&quot;301&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cfxgQ0/dJMcagEVqO7/TMxhoMJgtkf3myQHy9gwzk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cfxgQ0/dJMcagEVqO7/TMxhoMJgtkf3myQHy9gwzk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cfxgQ0/dJMcagEVqO7/TMxhoMJgtkf3myQHy9gwzk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcfxgQ0%2FdJMcagEVqO7%2FTMxhoMJgtkf3myQHy9gwzk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;162&quot; data-origin-width=&quot;1117&quot; data-origin-height=&quot;301&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1132&quot; data-origin-height=&quot;289&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c1WilI/dJMcabXTH9s/U4pdphyzLEux9KQhTC091K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c1WilI/dJMcabXTH9s/U4pdphyzLEux9KQhTC091K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c1WilI/dJMcabXTH9s/U4pdphyzLEux9KQhTC091K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc1WilI%2FdJMcabXTH9s%2FU4pdphyzLEux9KQhTC091K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;153&quot; data-origin-width=&quot;1132&quot; data-origin-height=&quot;289&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Large-scale structure appears first; fine details emerge only late in the process.&lt;/li&gt;
&lt;li&gt;Diffusion thus generates in a coarse-to-fine order.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Connection&amp;nbsp;to&amp;nbsp;autoregressive&amp;nbsp;decoding&lt;/h4&gt;
&lt;p data-end=&quot;2785&quot; data-start=&quot;2751&quot; data-ke-size=&quot;size16&quot;&gt;Diffusion = a generalization of autoregressive decoding&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2785&quot; data-start=&quot;2751&quot;&gt;Reinterpreting the diffusion objective shows it can be cast in an autoregressive-decoding form.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;4.4 Interpolation&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Experiment&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;3237&quot; data-start=&quot;3204&quot;&gt;Interpolate between images in latent space&lt;/li&gt;
&lt;li data-end=&quot;3259&quot; data-start=&quot;3238&quot;&gt;Decode the result with the reverse process&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Results&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1145&quot; data-origin-height=&quot;304&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cmAS2T/dJMcad2qiar/As5GyKjxsyftxLDeDHSyG1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cmAS2T/dJMcad2qiar/As5GyKjxsyftxLDeDHSyG1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cmAS2T/dJMcad2qiar/As5GyKjxsyftxLDeDHSyG1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcmAS2T%2FdJMcad2qiar%2FAs5GyKjxsyftxLDeDHSyG1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;159&quot; data-origin-width=&quot;1145&quot; data-origin-height=&quot;304&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The diffusion latent space carries meaningful structure.&lt;/li&gt;
&lt;/ul&gt;
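&lt;p data-ke-size=&quot;size16&quot;&gt;The two steps above can be sketched as follows. This is a minimal illustration, not the paper's code: forward_diffuse stands in for q(x_t | x_0), lam is the interpolation weight, and the resulting mixed latent would still have to be passed through the learned reverse (denoising) process to produce the interpolated image.&lt;/p&gt;

```python
import random

def forward_diffuse(x0, alpha_bar, rng):
    """q(x_t | x_0): scale the data by sqrt(alpha_bar) and add Gaussian noise."""
    return [(alpha_bar ** 0.5) * v + ((1 - alpha_bar) ** 0.5) * rng.gauss(0, 1)
            for v in x0]

def interpolate_latents(x0_a, x0_b, lam, alpha_bar, rng):
    """Encode both sources to the same diffusion step, then mix linearly."""
    xt_a = forward_diffuse(x0_a, alpha_bar, rng)
    xt_b = forward_diffuse(x0_b, alpha_bar, rng)
    return [(1 - lam) * a + lam * b for a, b in zip(xt_a, xt_b)]

rng = random.Random(0)
z = interpolate_latents([0.0, 1.0], [1.0, 0.0], 0.5, alpha_bar=0.9, rng=rng)
# z is the mixed latent; the reverse process would decode it back to pixel space
```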
&lt;h2 data-ke-size=&quot;size26&quot;&gt;5. Related Work&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;flows / VAEs&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;flows/VAEs: the latent preserves information about the data&lt;/li&gt;
&lt;li&gt;Diffusion: the forward process destroys the data into pure noise&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Score Matching / Langevin Dynamics&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Diffusion is not a brand-new model; it expresses score-based models in terms of variational inference.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;training :&amp;nbsp;denoising score matching&lt;/li&gt;
&lt;li&gt;sampling : annealed Langevin dynamics&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Markov chain&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Markov-chain generative models existed before this work, but diffusion is the most tractable and simplest such construction.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Energy-Based Model (EBM)&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Diffusion is a framework that can be extended to cover EBMs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Rate-Distortion / Compression&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Diffusion can be read not just as a generative model but as a compression (rate-distortion) model.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Progressive decoding&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Diffusion generates by progressive reconstruction rather than one-shot decoding.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Autoregressive Models&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Diffusion is a generalized form of autoregressive models.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;6. Conclusion&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Significance&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The paper demonstrates that diffusion models can generate high-quality samples, and by connecting them to score matching, Langevin dynamics, and variational inference it presents a unifying framework for generative modeling.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Future work&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The approach can extend to other data modalities (audio, etc.).&lt;/li&gt;
&lt;li&gt;Diffusion can serve as a component of other generative models.&lt;/li&gt;
&lt;/ul&gt;
</description>
      <author>yuha933</author>
      <guid isPermaLink="true">https://yuha933.tistory.com/20</guid>
      <comments>https://yuha933.tistory.com/20#entry20comment</comments>
      <pubDate>Wed, 25 Mar 2026 14:04:35 +0900</pubDate>
    </item>
    <item>
      <title>[DeepSeek] 논문 리뷰</title>
      <link>https://yuha933.tistory.com/19</link>
      <description>&lt;h2 data-ke-size=&quot;size26&quot;&gt;1. Introduction&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;연구 배경&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Importance of reasoning ability: reasoning is a core component of human intelligence and is essential for complex cognitive tasks such as mathematical problem solving, logical inference, and programming.&lt;/li&gt;
&lt;li&gt;Methods for improving reasoning:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Chain-of-Thought (CoT) prompting has the model generate intermediate reasoning steps, improving performance on complex problems.&lt;/li&gt;
&lt;li&gt;Training on multi-step reasoning trajectories during post-training yields further gains.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Limitations of prior work&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Reliance on human annotation limits scalability and can introduce cognitive biases.&lt;/li&gt;
&lt;li&gt;Because the model is constrained to imitate human thought processes, its performance is capped by the quality of the human-provided examples.&lt;/li&gt;
&lt;li&gt;Being restricted to human reasoning patterns makes it hard to discover non-human, potentially more efficient reasoning strategies.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Revised direction and its limitations&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Idea: rather than imitating human reasoning, provide only an answer-based reward and let the model learn to reason autonomously.&lt;/li&gt;
&lt;li&gt;DeepSeek-R1-Zero
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Learns reasoning through RL alone, without SFT.&lt;/li&gt;
&lt;li&gt;Uses only the accuracy of the final answer as the reward.&lt;/li&gt;
&lt;li&gt;Places no constraints on the reasoning process itself.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Limitations
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Low readability.&lt;/li&gt;
&lt;li&gt;English and Chinese sometimes mix within a single CoT.&lt;/li&gt;
&lt;li&gt;Specialization to reasoning tasks limits writing and general QA performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;The DeepSeek-R1 proposal&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2368&quot; data-start=&quot;2346&quot; data-section-id=&quot;9u90oc&quot;&gt;DeepSeek-R1's multi-stage training pipeline:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2368&quot; data-start=&quot;2346&quot; data-section-id=&quot;9u90oc&quot;&gt;rejection sampling: from the diverse reasoning trajectories produced during RL, keep only the high-quality ones to build a high-quality reasoning dataset.&lt;/li&gt;
&lt;li data-end=&quot;2395&quot; data-start=&quot;2369&quot; data-section-id=&quot;1xi597n&quot;&gt;reinforcement learning (RL): apply RL based on Group Relative Policy Optimization.&lt;/li&gt;
&lt;li data-end=&quot;2395&quot; data-start=&quot;2369&quot; data-section-id=&quot;1xi597n&quot;&gt;supervised fine-tuning (SFT): fine-tune on the data selected by rejection sampling to improve output quality, readability, language consistency, and general QA and writing ability.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;2. DeepSeek-R1-Zero&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;How DeepSeek-R1-Zero is trained: it relies solely on reinforcement learning, with no SFT.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;2.1 Group Relative Policy Optimization&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;GRPO&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The RL algorithm used to train DeepSeek-R1-Zero and DeepSeek-R1, proposed to simplify PPO's training procedure and reduce its resource cost.&lt;/li&gt;
&lt;li&gt;Training procedure:
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Sample several outputs from the previous policy.&lt;/li&gt;
&lt;li&gt;Then optimize the policy model to maximize the following objective.&lt;br /&gt;(Intuition: raise the probability of good answers and lower that of bad ones, while keeping the model from changing too abruptly.)&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-filename=&quot;blob&quot; data-origin-width=&quot;1274&quot; data-origin-height=&quot;220&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bfRoob/dJMcaiidWDB/n0fuqbGaNJqf8zJKVI1ED0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bfRoob/dJMcaiidWDB/n0fuqbGaNJqf8zJKVI1ED0/img.png&quot; data-alt=&quot;목적 함수&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bfRoob/dJMcaiidWDB/n0fuqbGaNJqf8zJKVI1ED0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbfRoob%2FdJMcaiidWDB%2Fn0fuqbGaNJqf8zJKVI1ED0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;700&quot; height=&quot;121&quot; data-filename=&quot;blob&quot; data-origin-width=&quot;1274&quot; data-origin-height=&quot;220&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;목적 함수&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;※ KL divergence&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;A term that keeps the policy from drifting too far from the reference model.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;783&quot; data-origin-height=&quot;113&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bO0JVc/dJMb99S7p9K/y3AQNWQ3Yqk1JiGHKij1E1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bO0JVc/dJMb99S7p9K/y3AQNWQ3Yqk1JiGHKij1E1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bO0JVc/dJMb99S7p9K/y3AQNWQ3Yqk1JiGHKij1E1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbO0JVc%2FdJMb99S7p9K%2Fy3AQNWQ3Yqk1JiGHKij1E1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;58&quot; data-origin-width=&quot;783&quot; data-origin-height=&quot;113&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
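&lt;p data-ke-size=&quot;size16&quot;&gt;The KL term can be estimated per token. A common unbiased estimator, which I read the formula above as using (this reading is my assumption), is r - log r - 1 with r = &amp;pi;_ref / &amp;pi;_&amp;theta;; a sketch under that assumption:&lt;/p&gt;

```python
import math

def kl_penalty(logp_policy: float, logp_ref: float) -> float:
    """Per-token KL estimate r - log(r) - 1, where r = pi_ref / pi_theta.
    Always nonnegative, and exactly zero when the two policies agree."""
    r = math.exp(logp_ref - logp_policy)
    return r - math.log(r) - 1.0

kl_penalty(-1.0, -1.0)  # identical log-probs -> 0.0
```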
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;※ Advantage&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;A term that teaches the relative quality of the sampled answers within a group.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;573&quot; data-origin-height=&quot;151&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cGn15F/dJMcagEJ2mD/MAJJuXEB4jXlYmcXZD0Ec0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cGn15F/dJMcagEJ2mD/MAJJuXEB4jXlYmcXZD0Ec0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cGn15F/dJMcagEJ2mD/MAJJuXEB4jXlYmcXZD0Ec0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcGn15F%2FdJMcagEJ2mD%2FMAJJuXEB4jXlYmcXZD0Ec0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;79&quot; data-origin-width=&quot;573&quot; data-origin-height=&quot;151&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
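&lt;p data-ke-size=&quot;size16&quot;&gt;The advantage formula above normalizes each sampled answer's reward against its own group; a minimal stdlib sketch (the function name is mine, not the paper's):&lt;/p&gt;

```python
from statistics import mean, pstdev

def group_relative_advantage(rewards):
    """A_i = (r_i - mean(group)) / std(group): each answer is graded relative
    to the other samples for the same question, so no value network is needed."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# four sampled answers for one question; the single wrong one gets a negative advantage
adv = group_relative_advantage([1.0, 0.0, 1.0, 1.0])
```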
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Training setup&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;learning rate : 3e-6&lt;/li&gt;
&lt;li&gt;KL coefficient : 0.001&lt;/li&gt;
&lt;li&gt;sampling temperature : 1&lt;/li&gt;
&lt;li&gt;token length limit: 32,768 tokens &amp;rarr; 65,536 tokens&lt;/li&gt;
&lt;li&gt;16 outputs are sampled per question.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Training procedure&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;10,400 training steps in total (about 1.6 training epochs).&lt;/li&gt;
&lt;li&gt;32 questions per step, batch size = 512.&lt;/li&gt;
&lt;li&gt;The reference model is updated to the latest policy every 400 steps.&lt;/li&gt;
&lt;li&gt;To speed up training, 8,192 outputs are generated per rollout and split into 16 mini-batches.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Prompt Template&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The model is required to produce [reasoning process] - [final answer].&lt;/li&gt;
&lt;/ul&gt;
&lt;pre id=&quot;code_1773227310256&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;&amp;lt;think&amp;gt; reasoning process &amp;lt;/think&amp;gt;
&amp;lt;answer&amp;gt; final answer &amp;lt;/answer&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-end=&quot;2392&quot; data-start=&quot;2373&quot; data-section-id=&quot;197s6x9&quot; data-ke-size=&quot;size23&quot;&gt;2.2 Reward Design&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;rule-based reward&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Accuracy reward&lt;span style=&quot;color: #333333; text-align: start;&quot;&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;: checks whether the model's final answer is correct.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;Format reward: encourages the model to follow the prescribed output format.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Final reward&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1773227614011&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;Reward_rule = Reward_acc + Reward_format&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
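&lt;p data-ke-size=&quot;size16&quot;&gt;A toy sketch of how such a rule-based reward might be computed; the regex and the exact point values are my assumptions for illustration, not the paper's implementation:&lt;/p&gt;

```python
import re

# requires a <think>...</think> block followed by an <answer>...</answer> block
FORMAT_RE = re.compile(r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL)

def rule_based_reward(response: str, gold_answer: str) -> float:
    """Reward_rule = Reward_acc + Reward_format, both checked by rules alone
    (no neural reward model, so no reward hacking surface)."""
    m = FORMAT_RE.search(response)
    reward_format = 1.0 if m else 0.0
    reward_acc = 1.0 if m and m.group(1).strip() == gold_answer else 0.0
    return reward_acc + reward_format

rule_based_reward("<think>2 + 2 = 4</think><answer>4</answer>", "4")  # 2.0
```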
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Why no neural reward model&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;risk of reward hacking&lt;/li&gt;
&lt;li&gt;extra compute cost&lt;/li&gt;
&lt;li&gt;added training-pipeline complexity&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-end=&quot;3318&quot; data-start=&quot;3291&quot; data-section-id=&quot;1aslcw8&quot; data-ke-size=&quot;size23&quot;&gt;2.3.&amp;nbsp;Incentivize&amp;nbsp;Reasoning&amp;nbsp;Capability&amp;nbsp;in&amp;nbsp;LLMs&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Training approach&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DeepSeek-R1-Zero is trained by applying reinforcement learning (RL) on top of the DeepSeek-V3 base model.&lt;/li&gt;
&lt;li&gt;Constraints are minimized so the model's reasoning ability can be observed naturally (a reasoning structure is required, but no specific reasoning strategy is enforced and no additional rules are placed on content).&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Performance results&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1259&quot; data-origin-height=&quot;505&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bIBEom/dJMcaiJgelo/5RHtNpO4WCLPQKLvlzQYW1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bIBEom/dJMcaiJgelo/5RHtNpO4WCLPQKLvlzQYW1/img.png&quot; data-alt=&quot;Figure 1&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bIBEom/dJMcaiJgelo/5RHtNpO4WCLPQKLvlzQYW1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbIBEom%2FdJMcaiJgelo%2F5RHtNpO4WCLPQKLvlzQYW1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1259&quot; height=&quot;505&quot; data-origin-width=&quot;1259&quot; data-origin-height=&quot;505&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Figure 1&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;RL training raised AIME 2024 benchmark performance from 15.6% to 77.9%.&lt;/li&gt;
&lt;li&gt;With self-consistency decoding, performance rises further to 86.7%, well above the average of human AIME participants.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Notable changes as training progresses&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Longer thinking time: response length grows steadily over training, meaning the model is exploring more of the reasoning process.&lt;/li&gt;
&lt;li&gt;Advanced reasoning strategies: the model naturally acquires strategies such as the following.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Reflective reasoning: revisiting earlier reasoning.&lt;/li&gt;
&lt;li&gt;Alternative solution exploration: exploring multiple solution paths.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Aha moment: the model re-evaluates and corrects its own reasoning; the paper reads this as evidence of RL-driven self-evolution.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1192&quot; data-origin-height=&quot;691&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nED42/dJMcaio0IeY/3Lf7ErRGAkxgCBYiis1TCK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nED42/dJMcaio0IeY/3Lf7ErRGAkxgCBYiis1TCK/img.png&quot; data-alt=&quot;Table 2 : Aha moment를 확인해볼 수 있다.&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nED42/dJMcaio0IeY/3Lf7ErRGAkxgCBYiis1TCK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FnED42%2FdJMcaio0IeY%2F3Lf7ErRGAkxgCBYiis1TCK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;700&quot; height=&quot;406&quot; data-origin-width=&quot;1192&quot; data-origin-height=&quot;691&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Table 2 : Aha moment를 확인해볼 수 있다.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Significance of DeepSeek-R1-Zero&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;RL lets the model develop problem-solving strategies autonomously, without explicit reasoning instructions.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;3. DeepSeek-R1&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Limitations of DeepSeek-R1-Zero&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Low readability.&lt;/li&gt;
&lt;li&gt;Language mixing occurs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;The DeepSeek-R1 pipeline&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1224&quot; data-origin-height=&quot;596&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ckgeFw/dJMcadA9pxY/muYuwlbQM6UJvQFM6j09Ek/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ckgeFw/dJMcadA9pxY/muYuwlbQM6UJvQFM6j09Ek/img.png&quot; data-alt=&quot;Figure 2&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ckgeFw/dJMcadA9pxY/muYuwlbQM6UJvQFM6j09Ek/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FckgeFw%2FdJMcadA9pxY%2FmuYuwlbQM6UJvQFM6j09Ek%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;700&quot; height=&quot;341&quot; data-origin-width=&quot;1224&quot; data-origin-height=&quot;596&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Figure 2&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Collect cold-start data:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-section-id=&quot;1iwjckz&quot; data-start=&quot;558&quot; data-end=&quot;591&quot;&gt;Collect data with a conversational thinking process and&lt;span&gt;&amp;nbsp;&lt;/span&gt;human-like reasoning.&lt;/li&gt;
&lt;li data-section-id=&quot;1iwjckz&quot; data-start=&quot;558&quot; data-end=&quot;591&quot;&gt;▶ Gives the model a stable initial reasoning style.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;First RL stage:&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-section-id=&quot;1iwjckz&quot; data-start=&quot;558&quot; data-end=&quot;591&quot;&gt;algorithm: GRPO&lt;/li&gt;
&lt;li data-section-id=&quot;1iwjckz&quot; data-start=&quot;558&quot; data-end=&quot;591&quot;&gt;extra reward: language consistency reward&lt;br /&gt;&amp;rarr; the reward increases with the fraction of target-language text.&lt;/li&gt;
&lt;li data-section-id=&quot;1iwjckz&quot; data-start=&quot;558&quot; data-end=&quot;591&quot;&gt;▶ RL improves reasoning ability and language consistency, addressing the language-mixing problem.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Rejection sampling + SFT:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-section-id=&quot;1iwjckz&quot; data-start=&quot;558&quot; data-end=&quot;591&quot;&gt;After rejection sampling, perform SFT (using both reasoning and non-reasoning data).&lt;/li&gt;
&lt;li data-section-id=&quot;1iwjckz&quot; data-start=&quot;558&quot; data-end=&quot;591&quot;&gt;▶ Improves writing ability while preserving reasoning performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Second RL stage:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Performs human preference alignment.&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;color: #333333; text-align: start;&quot;&gt;▶&lt;span&gt; Aims for stable helpfulness, harmlessness, and reasoning refinement.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;3.1 Model-based Rewards&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Helpful Reward Model&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;How useful is the answer? (66,000 pairs)
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Compute a score for each response.&lt;/li&gt;
&lt;li&gt;Train the model to assign higher scores to better responses.&lt;/li&gt;
&lt;li&gt;Finally judge which response is better.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Safety Reward Model&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Is the answer safe? (106,000 prompts)
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Compute a safety score for a single response.&lt;/li&gt;
&lt;li&gt;Label each response and train binary classification.&lt;/li&gt;
&lt;li&gt;Finally rate how safe the response is with a score.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;3.2.&amp;nbsp;Training&amp;nbsp;Details&lt;/h3&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.2.1.&amp;nbsp;Training&amp;nbsp;Details&amp;nbsp;of&amp;nbsp;the&amp;nbsp;First&amp;nbsp;RL&amp;nbsp;Stage&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Purpose&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;168&quot; data-start=&quot;154&quot; data-section-id=&quot;n6ca59&quot;&gt;strengthen reasoning ability&lt;/li&gt;
&lt;li data-end=&quot;184&quot; data-start=&quot;169&quot; data-section-id=&quot;1oi2lh1&quot;&gt;improve language consistency&lt;/li&gt;
&lt;li data-end=&quot;210&quot; data-start=&quot;185&quot; data-section-id=&quot;lbslsw&quot;&gt;raise the model's reasoning quality&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Training setup&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;algorithm: GRPO&lt;/li&gt;
&lt;li&gt;key hyperparameters
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;305&quot; data-start=&quot;280&quot; data-section-id=&quot;k3kx49&quot;&gt;learning rate: 3e-6&lt;/li&gt;
&lt;li data-end=&quot;333&quot; data-start=&quot;306&quot; data-section-id=&quot;10bb7nz&quot;&gt;KL coefficient: 0.001&lt;/li&gt;
&lt;li data-end=&quot;365&quot; data-start=&quot;334&quot; data-section-id=&quot;132fe6y&quot;&gt;GRPO clip ratio &amp;epsilon;: 10&lt;/li&gt;
&lt;li data-end=&quot;395&quot; data-start=&quot;366&quot; data-section-id=&quot;ygotn0&quot;&gt;sampling temperature: 1&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;32 questions per step, batch size = 512.&lt;/li&gt;
&lt;li&gt;The reference model is updated to the latest policy every 400 steps.&lt;/li&gt;
&lt;li&gt;To speed up training, 8,192 outputs are generated per rollout and split into 16 mini-batches.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Language Consistency Reward&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;A language consistency reward is added to reduce language mixing.&lt;br /&gt;(applied to both reasoning and non-reasoning data)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;691&quot; data-origin-height=&quot;142&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/U7q0U/dJMcaaYM9vs/EKKJTuOwAaGVLDFylEzdv1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/U7q0U/dJMcaaYM9vs/EKKJTuOwAaGVLDFylEzdv1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/U7q0U/dJMcaaYM9vs/EKKJTuOwAaGVLDFylEzdv1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FU7q0U%2FdJMcaaYM9vs%2FEKKJTuOwAaGVLDFylEzdv1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;62&quot; data-origin-width=&quot;691&quot; data-origin-height=&quot;142&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
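&lt;p data-ke-size=&quot;size16&quot;&gt;One simple way to realize "higher target-language ratio, higher reward". The predicate and the exact formula here are illustrative assumptions of mine; the paper's precise computation may differ:&lt;/p&gt;

```python
def language_consistency_reward(cot_words, is_target_language) -> float:
    """Reward = fraction of CoT words that belong to the target language."""
    if not cot_words:
        return 0.0
    hits = sum(1 for w in cot_words if is_target_language(w))
    return hits / len(cot_words)

# toy predicate: treat pure-ASCII words as belonging to an English target language
is_english = lambda w: w.isascii()
language_consistency_reward(["the", "answer", "是", "four"], is_english)  # 0.75
```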
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.2.2. Training Details of the Second RL Stage&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Purpose&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1152&quot; data-start=&quot;1138&quot; data-section-id=&quot;tfgbdo&quot;&gt;preserve reasoning ability&lt;/li&gt;
&lt;li data-end=&quot;1173&quot; data-start=&quot;1153&quot; data-section-id=&quot;1amdsix&quot;&gt;improve helpfulness&lt;/li&gt;
&lt;li data-end=&quot;1189&quot; data-start=&quot;1174&quot; data-section-id=&quot;wv0w6o&quot;&gt;improve safety&lt;/li&gt;
&lt;li data-end=&quot;1222&quot; data-start=&quot;1190&quot; data-section-id=&quot;qvq6kz&quot;&gt;human preference alignment&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Training data&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Reasoning Data
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;reward: rule-based reward&lt;/li&gt;
&lt;li&gt;targets: math, coding, logical reasoning&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;General Data
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;reward: reward-model-based reward&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Final reward&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1773231229159&quot; class=&quot;ini&quot; data-ke-type=&quot;codeblock&quot; data-ke-language=&quot;bash&quot;&gt;&lt;code&gt;Reward = Reward_reasoning + Reward_general + Reward_language&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Training setup&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Identical to the first RL stage except temperature = 0.7.&lt;/li&gt;
&lt;li&gt;1,700 training steps in total; instruction data and the preference reward are applied only in the last 400 steps.&lt;br /&gt;&amp;rarr; prolonged reward-model-based training invites reward hacking, so the preference reward is applied only in the final stage.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;4. Experiment&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1079&quot; data-origin-height=&quot;920&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Mxmva/dJMcaiWNunj/zOOab3Kr49YsjTSfbocQJ1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Mxmva/dJMcaiWNunj/zOOab3Kr49YsjTSfbocQJ1/img.png&quot; data-alt=&quot;Table 3&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Mxmva/dJMcaiWNunj/zOOab3Kr49YsjTSfbocQJ1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FMxmva%2FdJMcaiWNunj%2FzOOab3Kr49YsjTSfbocQJ1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;700&quot; height=&quot;597&quot; data-origin-width=&quot;1079&quot; data-origin-height=&quot;920&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Table 3&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Experimental results&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;RL substantially improves reasoning ability.&lt;/li&gt;
&lt;li&gt;RL alone is insufficient on general tasks.&lt;/li&gt;
&lt;li&gt;SFT on general-domain data is still required.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;5.&amp;nbsp;Ethics&amp;nbsp;and&amp;nbsp;Safety&amp;nbsp;Statement&lt;/h2&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;6. Conclusion, Limitation, and Future Work&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Significance&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Reasoning can be learned without human CoT data.&lt;/li&gt;
&lt;li&gt;Reasoning ability is greatly improved by RL.&lt;/li&gt;
&lt;li&gt;Human data is not the key ingredient for reasoning.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Limitations&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Limitations of the model
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Weak structured output and tool use&lt;/li&gt;
&lt;li&gt;Poor token efficiency&lt;/li&gt;
&lt;li&gt;Language mixing&lt;/li&gt;
&lt;li&gt;Prompt sensitivity&lt;/li&gt;
&lt;li&gt;Weaker performance on software engineering tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Limitations of the RL methodology itself
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Reward hacking&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Future work&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Stronger reward models&lt;/li&gt;
&lt;li&gt;Tool-augmented reasoning&lt;/li&gt;
&lt;li&gt;Extending RL to more verifiable tasks&lt;/li&gt;
&lt;/ul&gt;
</description>
      <author>yuha933</author>
      <guid isPermaLink="true">https://yuha933.tistory.com/19</guid>
      <comments>https://yuha933.tistory.com/19#entry19comment</comments>
      <pubDate>Wed, 11 Mar 2026 21:43:11 +0900</pubDate>
    </item>
    <item>
      <title>[DPO] 논문 리뷰</title>
      <link>https://yuha933.tistory.com/17</link>
      <description>&lt;h2 data-ke-size=&quot;size26&quot;&gt;1. Introduction&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Problem with existing LLMs : trained on vast amounts of data, they gain strong capabilities, but that data can also contain undesirable behavior and misinformation&lt;br /&gt;&amp;rarr; letting the model use all of it as-is can be risky, so it must be adjusted to respond only in safe and useful ways&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Existing solution : PPO-based RLHF
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;RLHF : fit a reward model to a human-preference dataset, then use RL such as PPO to optimize the language-model policy to generate responses that earn high reward without drifting too far from the original model&lt;/li&gt;
&lt;li&gt;Problem : far more complex than supervised learning, and computationally expensive&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;715&quot; data-origin-height=&quot;152&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bKK17E/btsPSr60unS/CyhR7zOjtuej5vDGiMUsmK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bKK17E/btsPSr60unS/CyhR7zOjtuej5vDGiMUsmK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bKK17E/btsPSr60unS/CyhR7zOjtuej5vDGiMUsmK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbKK17E%2FbtsPSr60unS%2FCyhR7zOjtuej5vDGiMUsmK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;715&quot; height=&quot;152&quot; data-origin-width=&quot;715&quot; data-origin-height=&quot;152&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;This paper's idea : DPO
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DPO : a method that directly optimizes the language model to match human preferences, eliminating reward modeling and reinforcement learning&amp;nbsp;&lt;/li&gt;
&lt;li&gt;Method : directly optimize the relative log-probability gap derived from human preference pairs under a KL constraint against a reference model, achieving preference alignment without an explicit reward model or RL&lt;br /&gt;&amp;rarr; training is simple and stable while matching RLHF-level performance&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;2. Related Work&lt;/h2&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;3. Preliminaries&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The RLHF pipeline following Ziegler et al.
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Supervised fine-tuning (SFT)&lt;/li&gt;
&lt;li&gt;Preference sampling and reward-model training&lt;/li&gt;
&lt;li&gt;RL-based optimization&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;SFT (Supervised Fine-Tuning) stage&lt;/b&gt;&lt;br /&gt;: fine-tune a pretrained LM with supervised learning on high-quality data for the downstream task, yielding the model &amp;pi;SFT&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Reward modeling stage&lt;/b&gt;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Generate response pairs&lt;br /&gt;: feed a prompt x into the SFT model &amp;pi;SFT(y|x) to produce two answers (y1, y2)&lt;/li&gt;
&lt;li&gt;Collect human preferences&lt;br /&gt;: show both answers to annotators, who pick the better answer yw and the worse answer yl ( yw ≻ yl ∣ x )&lt;/li&gt;
&lt;li&gt;Assume a reward function&lt;br /&gt;: preferences are assumed to come from a latent reward model r*(x, y) that we cannot access directly&lt;/li&gt;
&lt;li&gt;Model the preferences&lt;br /&gt;: Bradley-Terry model&lt;br /&gt;p*( y1 ≻ y2 ∣ x ) = exp(r*(x, y1)) / ( exp(r*(x, y1)) + exp(r*(x, y2)) )&lt;/li&gt;
&lt;li&gt;Train the reward model&lt;br /&gt;: use the comparison dataset D to train a parameterized reward model rϕ(x, y) (a binary-classification-style loss)&lt;/li&gt;
&lt;li&gt;Implementation&lt;br /&gt;: add a scalar output layer on top of the SFT model's final Transformer layer to produce rϕ(x, y)&lt;/li&gt;
&lt;/ol&gt;
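&lt;p data-ke-size=&quot;size16&quot;&gt;The Bradley-Terry probability in step 4 reduces to a logistic sigmoid of the reward difference; a minimal numeric sketch (the reward values passed in are arbitrary):&lt;/p&gt;

```python
import math

def bt_preference_prob(r1, r2):
    """Bradley-Terry: p(y1 is preferred over y2 | x)
    = exp(r1) / (exp(r1) + exp(r2)) = sigmoid(r1 - r2)."""
    return 1.0 / (1.0 + math.exp(-(r1 - r2)))

# Equal rewards give a 50/50 preference; a higher reward for y1 pushes it up.
```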
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;RL Fine-Tuning stage&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The stage that adjusts the language-model policy &amp;pi;&amp;theta; to generate responses that score higher under the learned reward model rϕ(x, y)&lt;/li&gt;
&lt;li&gt;Why a KL constraint is added&lt;br /&gt;: chasing the score alone can bias the model toward repeating a single answer, so the constraint preserves diversity and guarantees stability&lt;/li&gt;
&lt;li&gt;Implementation&lt;br /&gt;: optimized with the PPO reinforcement-learning algorithm&lt;/li&gt;
&lt;li&gt;Optimization objective&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;97&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/UnN0W/btsPRyZTQhg/mXvkfOB2gGKkkfZKeKetr1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/UnN0W/btsPRyZTQhg/mXvkfOB2gGKkkfZKeKetr1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/UnN0W/btsPRyZTQhg/mXvkfOB2gGKkkfZKeKetr1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FUnN0W%2FbtsPRyZTQhg%2FmXvkfOB2gGKkkfZKeKetr1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;69&quot; data-origin-width=&quot;842&quot; data-origin-height=&quot;97&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;4. Direct Preference Optimization (DPO)&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal : derive a simple approach that optimizes the policy directly from preference data&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Deriving&amp;nbsp;the&amp;nbsp;DPO&amp;nbsp;objective&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Form of the optimal policy in standard RLHF&lt;br /&gt;&amp;rarr; the partition function Z(x) normalizes over every possible answer, so it is very expensive to compute&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;640&quot; data-origin-height=&quot;124&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bZAnF4/btsPRsZ6AiU/woM8jYcnrKSRLhMI0zva1K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bZAnF4/btsPRsZ6AiU/woM8jYcnrKSRLhMI0zva1K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bZAnF4/btsPRsZ6AiU/woM8jYcnrKSRLhMI0zva1K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbZAnF4%2FbtsPRsZ6AiU%2FwoM8jYcnrKSRLhMI0zva1K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;450&quot; height=&quot;87&quot; data-origin-width=&quot;640&quot; data-origin-height=&quot;124&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DPO's workaround&lt;br /&gt;: take the log of the optimal-policy expression to simplify it&lt;br /&gt;&amp;rarr; because the Bradley&amp;ndash;Terry model uses only reward differences, the log Z(x) term cancels automatically, so the partition-function computation and reward-model training can be skipped&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;617&quot; data-origin-height=&quot;127&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/8EECW/btsPR6oH5LQ/Lnr8hV8pVg2sdbcRRZqeJ1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/8EECW/btsPR6oH5LQ/Lnr8hV8pVg2sdbcRRZqeJ1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/8EECW/btsPR6oH5LQ/Lnr8hV8pVg2sdbcRRZqeJ1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F8EECW%2FbtsPR6oH5LQ%2FLnr8hV8pVg2sdbcRRZqeJ1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;450&quot; height=&quot;93&quot; data-origin-width=&quot;617&quot; data-origin-height=&quot;127&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Expressing the preference probability&lt;br /&gt;: compute the probability gap between the preferred answer (yw) and the dispreferred answer (yl), each relative to the reference model&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;872&quot; data-origin-height=&quot;121&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/y0q39/btsPSEL2c71/Hlbe7W7k52m4T2VkzgTyZK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/y0q39/btsPSEL2c71/Hlbe7W7k52m4T2VkzgTyZK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/y0q39/btsPSEL2c71/Hlbe7W7k52m4T2VkzgTyZK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fy0q39%2FbtsPSEL2c71%2FHlbe7W7k52m4T2VkzgTyZK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;83&quot; data-origin-width=&quot;872&quot; data-origin-height=&quot;121&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Final DPO loss&lt;br /&gt;: with no reward model, training raises the probability of the preferred answer and lowers that of the dispreferred answer relative to the reference model&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1111&quot; data-origin-height=&quot;116&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/w99sD/btsPR6PKpk6/SKdCHL83MJdPkk0fmKA6hk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/w99sD/btsPR6PKpk6/SKdCHL83MJdPkk0fmKA6hk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/w99sD/btsPR6PKpk6/SKdCHL83MJdPkk0fmKA6hk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fw99sD%2FbtsPR6PKpk6%2FSKdCHL83MJdPkk0fmKA6hk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;850&quot; height=&quot;89&quot; data-origin-width=&quot;1111&quot; data-origin-height=&quot;116&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
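&lt;p data-ke-size=&quot;size16&quot;&gt;The DPO loss in the figure can be written out directly for one preference pair; a minimal sketch with scalar log-probabilities (assumed to be summed over the response tokens) and an illustrative &amp;beta; value:&lt;/p&gt;

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """-log sigmoid( beta * [ (logp_w - ref_logp_w)
                              - (logp_l - ref_logp_l) ] ).
    Raising the preferred answer's probability relative to the reference
    (and lowering the dispreferred one's) shrinks the loss."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```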
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Advantages of DPO&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;No reward model is needed, so the pipeline is simple&lt;/li&gt;
&lt;li&gt;No RL loop is needed, so complex optimization such as PPO is skipped&lt;/li&gt;
&lt;li&gt;No partition-function computation, so it is faster and cheaper&lt;/li&gt;
&lt;li&gt;The KL constraint is built in implicitly, keeping training stable&lt;/li&gt;
&lt;li&gt;Achieves performance comparable to RLHF far more simply&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;What does the DPO update do?&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;How the model is updated (gradient direction)
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Preferred answer yw &amp;rarr; probability &amp;uarr;&lt;/li&gt;
&lt;li&gt;Dispreferred answer yl &amp;rarr; probability &amp;darr;&lt;/li&gt;
&lt;li&gt;Weighting : the more the current model wrongly rates the bad answer, the stronger the update ( scaled by &amp;beta; )&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1431&quot; data-origin-height=&quot;158&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c5lD6U/btsPQvJw96b/WEr1NBle73kQREJwcEYH11/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c5lD6U/btsPQvJw96b/WEr1NBle73kQREJwcEYH11/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c5lD6U/btsPQvJw96b/WEr1NBle73kQREJwcEYH11/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc5lD6U%2FbtsPQvJw96b%2FWEr1NBle73kQREJwcEYH11%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1431&quot; height=&quot;158&quot; data-origin-width=&quot;1431&quot; data-origin-height=&quot;158&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;An implicit reward expressing how much the current model prefers a given answer relative to the reference model&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;365&quot; data-origin-height=&quot;82&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/uSWKP/btsPSs597pK/jcZ2eKaahk2UxWK5ui0fK1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/uSWKP/btsPSs597pK/jcZ2eKaahk2UxWK5ui0fK1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/uSWKP/btsPSs597pK/jcZ2eKaahk2UxWK5ui0fK1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FuSWKP%2FbtsPSs597pK%2FjcZ2eKaahk2UxWK5ui0fK1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;250&quot; height=&quot;56&quot; data-origin-width=&quot;365&quot; data-origin-height=&quot;82&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
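&lt;p data-ke-size=&quot;size16&quot;&gt;The implicit reward above, and the update weight it induces in the DPO gradient, can be sketched numerically; &amp;beta; and the scalar log-probability inputs are illustrative assumptions:&lt;/p&gt;

```python
import math

def implicit_reward(logp, ref_logp, beta=0.1):
    """r_hat(x, y) = beta * log( pi_theta(y|x) / pi_ref(y|x) )."""
    return beta * (logp - ref_logp)

def dpo_update_weight(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """sigmoid( r_hat(x, yl) - r_hat(x, yw) ): the gradient is weighted
    more strongly when the current model wrongly rates the dispreferred
    answer above the preferred one."""
    diff = (implicit_reward(logp_l, ref_logp_l, beta)
            - implicit_reward(logp_w, ref_logp_w, beta))
    return 1.0 / (1.0 + math.exp(-diff))
```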
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;DPO&amp;nbsp;outline&lt;/b&gt;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Data : humans compare two responses generated by the reference model and label the better and the worse one, producing a preference dataset D&lt;/li&gt;
&lt;li&gt;Training : given the reference model &amp;pi;_ref, the dataset D, and the hyperparameter &amp;beta;, train the model &amp;pi;&amp;theta; to minimize the DPO loss L_DPO&lt;/li&gt;
&lt;/ol&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Instead of collecting data directly, public preference datasets are reused; the reference model is the SFT model when available, and otherwise is initialized by maximizing the likelihood of the preferred responses yw&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;5. Theoretical&amp;nbsp;Analysis&amp;nbsp;of&amp;nbsp;DPO&lt;/h2&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;5.1 Your&amp;nbsp;Language&amp;nbsp;Model&amp;nbsp;Is&amp;nbsp;Secretly&amp;nbsp;a&amp;nbsp;Reward&amp;nbsp;Model&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Standard approach&lt;br /&gt;: first train a reward model rϕ(x, y), then optimize the policy &amp;pi;&amp;theta; with reinforcement learning (e.g., PPO) against that reward&lt;/li&gt;
&lt;li&gt;Definition 1 (equivalence of reward functions)&lt;br /&gt;: two reward functions r(x, y) and r&amp;prime;(x, y) are equivalent if they satisfy the following&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;749&quot; data-origin-height=&quot;88&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bfuufn/btsPSrl77pl/ZCAJnk9IKbrkw9zA2GkKu0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bfuufn/btsPSrl77pl/ZCAJnk9IKbrkw9zA2GkKu0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bfuufn/btsPSrl77pl/ZCAJnk9IKbrkw9zA2GkKu0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbfuufn%2FbtsPSrl77pl%2FZCAJnk9IKbrkw9zA2GkKu0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;47&quot; data-origin-width=&quot;749&quot; data-origin-height=&quot;88&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
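&lt;p data-ke-size=&quot;size16&quot;&gt;Definition 1 says two rewards that differ only by a prompt-dependent shift f(x) are equivalent; a small numeric check (the reward values and the shift are arbitrary) confirms that such a shift leaves the Bradley-Terry preference probability unchanged:&lt;/p&gt;

```python
import math

def bt_prob(r1, r2):
    # Bradley-Terry preference probability, a sigmoid of the reward gap.
    return 1.0 / (1.0 + math.exp(-(r1 - r2)))

# Shifting both rewards by the same prompt-dependent amount f(x)
# leaves the preference probability untouched.
r_y1, r_y2, f_x = 1.3, -0.7, 42.0
p_before = bt_prob(r_y1, r_y2)
p_after = bt_prob(r_y1 + f_x, r_y2 + f_x)
```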
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;Lemma 1&lt;br /&gt;: under the Plackett-Luce / Bradley-Terry model, reward functions in the same equivalence class induce the same preference distribution.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;Lemma 2&lt;br /&gt;: reward functions in the same equivalence class induce the same optimal policy.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;Problem with the standard approach&lt;/span&gt;&lt;br /&gt;: because reward-model training and RL optimization are separate stages, implementation is complex and slow, and picking one reward function from the many equivalent ones adds needless complexity&lt;/li&gt;
&lt;li&gt;Theorem 1 (key reparameterization)&lt;br /&gt;: under mild assumptions, every equivalence class of reward functions can be expressed in the following form&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;441&quot; data-origin-height=&quot;124&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bOvqCr/btsPP4yVtTT/dOKiC8d73hsoRKj8SuKxx1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bOvqCr/btsPP4yVtTT/dOKiC8d73hsoRKj8SuKxx1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bOvqCr/btsPP4yVtTT/dOKiC8d73hsoRKj8SuKxx1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbOvqCr%2FbtsPP4yVtTT%2FdOKiC8d73hsoRKj8SuKxx1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;250&quot; height=&quot;70&quot; data-origin-width=&quot;441&quot; data-origin-height=&quot;124&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DPO's resolution&lt;br /&gt;: merge reward-model fitting and policy training into a single maximum-likelihood (MLE) optimization&lt;br /&gt;&amp;rarr; the reward is expressed as the probability ratio of the current policy to the reference policy&lt;br /&gt;&amp;rarr; this parameterization can represent every equivalence class of rewards&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1000&quot; data-origin-height=&quot;134&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bvtlov/btsPQtkYk7U/ruPhCMnLhSqMp7Yg4k4ucK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bvtlov/btsPQtkYk7U/ruPhCMnLhSqMp7Yg4k4ucK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bvtlov/btsPQtkYk7U/ruPhCMnLhSqMp7Yg4k4ucK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbvtlov%2FbtsPQtkYk7U%2FruPhCMnLhSqMp7Yg4k4ucK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;550&quot; height=&quot;74&quot; data-origin-width=&quot;1000&quot; data-origin-height=&quot;134&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Advantages&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;No separate reward model needs to be trained&lt;/li&gt;
&lt;li&gt;The policy can be updated directly, given only the reference policy&lt;/li&gt;
&lt;li&gt;Recovers the optimal policy exactly while being simpler to implement&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;5.2 Instability of Actor-Critic Algorithms&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Existing approach&lt;br /&gt;: minimize the KL distance between the current policy and the optimal policy induced by the reward function&lt;/li&gt;
&lt;li&gt;Problem with the existing approach&lt;br /&gt;: without a reward-normalization term, gradient variance grows large and training becomes unstable&lt;/li&gt;
&lt;li&gt;DPO's solution&lt;br /&gt;: the reparameterization computes the normalized reward directly, making a learned baseline unnecessary&lt;/li&gt;
&lt;li&gt;Advantages
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;More stable training than PPO&lt;/li&gt;
&lt;li&gt;Removing the baseline/normalization step simplifies implementation and reduces error&lt;/li&gt;
&lt;li&gt;Achieves the effect of RLHF without an RL stage&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;6. Experiments&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The experiments evaluate whether DPO achieves reward maximization together with minimal KL-divergence from the reference policy more efficiently than existing algorithms such as PPO&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Task&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Controlled Sentiment Generation&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;x : an IMDb movie-review prefix&lt;/li&gt;
&lt;li&gt;Policy : generate a y with positive sentiment&lt;/li&gt;
&lt;li&gt;Evaluated with a pre-trained sentiment classifier ( p(positive|x, yw) &amp;gt; p(positive|x, yl) )&lt;/li&gt;
&lt;li&gt;SFT : GPT-2-large fine-tuned to convergence on the IMDb training data&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Summarization
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;x : a Reddit forum post&lt;/li&gt;
&lt;li&gt;Policy : generate a y summarizing the main points&lt;/li&gt;
&lt;li&gt;Dataset : Reddit TL;DR plus the human preference data of Stiennon et al.&lt;/li&gt;
&lt;li&gt;SFT : a model fine-tuned on human-written summaries with the TRLX framework&lt;/li&gt;
&lt;li&gt;Human preferences were collected over samples from a different SFT model&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Single-turn Dialogue
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;x : a diverse set of human queries&lt;/li&gt;
&lt;li&gt;Policy : generate an engaging and helpful response y&lt;/li&gt;
&lt;li&gt;Dataset : the Anthropic Helpful and Harmless dialogue data&lt;/li&gt;
&lt;li&gt;SFT : obtained by fine-tuning an off-the-shelf language model on the preferred responses only&lt;/li&gt;
&lt;li&gt;Each dialogue ends with two LLM responses and a human preference label&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Evaluation&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Two evaluation approaches
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Controlled Sentiment Generation &lt;br /&gt;: evaluates how well each algorithm balances maximizing reward against minimizing KL-divergence from the reference policy&lt;br /&gt;&amp;rarr; because the ground-truth reward function (the sentiment classifier) is known, the frontier can be computed&lt;/li&gt;
&lt;li&gt;Summarization &amp;amp; Single-turn Dialogue &lt;br /&gt;&amp;rarr; in real settings no ground-truth reward function exists, so GPT-4 is used as a proxy for human evaluation&lt;br /&gt;(summarization baseline : the test-set reference summaries / dialogue baseline : the preferred response)&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Rationale for using GPT-4&lt;br /&gt;: human agreement with GPT-4 is similar to, or higher than, agreement between human annotators&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Methods&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Prompting-based approaches
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Summarization : zero-shot prompting with GPT-J&lt;/li&gt;
&lt;li&gt;Dialogue : 2-shot prompting with Pythia-2.8B&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Supervised fine-tuning-based approaches
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;SFT : the SFT model is used for the controlled sentiment and summarization tasks&lt;/li&gt;
&lt;li&gt;Preferred-FT : for the dialogue task, a generic LM fine-tuned with supervised learning on the preferred completions yw&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Pseudo-supervised method (Unlikelihood)
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Maximizes the probability of yw while minimizing the probability of yl (with an optional coefficient &amp;alpha; on the unlikelihood term)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;RL-based approaches
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;PPO : uses a reward function learned from the preference data&lt;/li&gt;
&lt;li&gt;PPO-GT (oracle) : uses the ground-truth reward function available in controlled sentiment generation&lt;br /&gt;(ver1 : basic version / ver2 : improved via reward normalization and additional hyperparameter tuning)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Best of N&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Draws N response samples from the SFT model and selects the one with the highest reward-model score&lt;br /&gt;&amp;rarr; advantage : decouples reward-model quality from PPO optimization&lt;br /&gt;&amp;rarr; drawback : very computationally expensive, since N samples must be generated for every query at test time&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
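The Best-of-N procedure above is simple enough to sketch directly. A hypothetical version with stubbed-in sampling and reward functions (all names here are illustrative, not from the paper):

```python
def best_of_n(prompt, sample_fn, reward_fn, n=4):
    """Draw n candidate responses from the (SFT) policy and return the one
    the reward model scores highest. Cost grows linearly in n per query."""
    candidates = [sample_fn(prompt) for _ in range(n)]
    return max(candidates, key=lambda y: reward_fn(prompt, y))
```

Swapping in a real policy and reward model only changes `sample_fn` and `reward_fn`; the selection rule itself stays a one-line `max`.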
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;6.1 How well can DPO optimize the RLHF objective?&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal&lt;br /&gt;: RLHF must simultaneously maximize reward and minimize the deviation (KL-divergence) from the reference policy; the experiment tests how efficiently DPO can balance the two&lt;/li&gt;
&lt;li&gt;Idea&lt;br /&gt;: compare algorithms on the reward-KL frontier, checking whether each frontier reaches high reward at low KL&lt;/li&gt;
&lt;li&gt;Setup&lt;br /&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Task : Controlled Sentiment Generation&lt;/li&gt;
&lt;li&gt;Compared methods : DPO, PPO, PPO-GT (ground-truth reward), Unlikelihood, Preferred-FT, etc.&lt;/li&gt;
&lt;li&gt;22 training runs in total; each run is evaluated on test prompts every 100 steps&lt;br /&gt;&amp;rarr; average reward and average KL are computed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Results
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DPO achieves the highest reward in every regime while keeping the KL low&lt;/li&gt;
&lt;li&gt;Although it optimizes the same objective as PPO, DPO is far more efficient, forming a better frontier even than PPO with access to the ground-truth reward (PPO-GT)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
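The x-axis of the reward-KL frontier is typically a Monte Carlo estimate of the sequence-level KL divergence. A minimal sketch of that estimate (my own helper, assuming per-sequence log-probabilities under both policies for samples drawn from the current policy):

```python
def estimate_kl(policy_logps, ref_logps):
    """Monte Carlo estimate of KL(pi || pi_ref) from sequences y ~ pi:
    KL is approximated by the mean of [log pi(y) - log pi_ref(y)]."""
    assert len(policy_logps) == len(ref_logps)
    diffs = [p - r for p, r in zip(policy_logps, ref_logps)]
    return sum(diffs) / len(diffs)
```

Each evaluation checkpoint then contributes one (KL, mean reward) point to the frontier plot.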
&lt;h3 data-ke-size=&quot;size23&quot;&gt;6.2 Can DPO scale to real preference datasets?&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal&lt;br /&gt;: verify that DPO maintains strong performance on real tasks with human preference data&lt;/li&gt;
&lt;li&gt;Idea&lt;br /&gt;: automatic metrics such as ROUGE correlate poorly with human preferences, so GPT-4-based win rates are used for comparison&lt;/li&gt;
&lt;li&gt;Setup&amp;nbsp;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Task 1 : Summarization
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Model : GPT-J SFT&lt;/li&gt;
&lt;li&gt;Compared methods : DPO, PPO, Preferred-FT, Best of N&lt;/li&gt;
&lt;li&gt;The same SFT model is fine-tuned with each method, then samples are generated at a range of sampling temperatures (0.0~1.0)&lt;br /&gt;&amp;rarr; average win rate against the reference summaries is computed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Task 2 : Single-turn Dialogue
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Model : Pythia-2.8B (no SFT)&lt;/li&gt;
&lt;li&gt;Compared methods : DPO, Best of 128, Pythia-2.8B 2-shot prompting, a PPO-based RLHF model&lt;/li&gt;
&lt;li&gt;DPO is trained after an initial Preferred-FT fine-tuning&lt;br /&gt;&amp;rarr; win rates of the generated responses are computed via GPT-4 evaluation&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Results
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Summarization
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DPO (temperature 0.0) reaches a win rate of about 61%, ahead of PPO (57%) and Best of N&lt;/li&gt;
&lt;li&gt;Degrades less than PPO as the sampling temperature varies (more robust)&lt;/li&gt;
&lt;li&gt;Preferred-FT shows no large improvement over SFT&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Single-turn Dialogue
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DPO matches the performance of Best of 128 at far lower compute&lt;/li&gt;
&lt;li&gt;The PPO-based RLHF model finds no prompt or temperature that improves on the base model&lt;/li&gt;
&lt;li&gt;DPO holds a stable lead in preferred-response quality&lt;/li&gt;
&lt;li&gt;Training also converges quickly&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;6.3 Generalization to a new input distribution&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal&lt;br /&gt;: evaluate how well DPO generalizes to a distribution different from its training data (CNN/DailyMail news articles)&lt;/li&gt;
&lt;li&gt;Evaluation&lt;br /&gt;: win rate against the ground-truth summaries computed with GPT-4, measured at sampling temperatures 0 and 0.25&lt;/li&gt;
&lt;li&gt;Results
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;DPO: 0.36 (temperature 0), 0.31 (temperature 0.25)&lt;/li&gt;
&lt;li&gt;PPO: 0.26 (temperature 0), 0.23 (temperature 0.25)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;DPO records higher win rates than PPO even on the new distribution, demonstrating stronger generalization&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;6.4 Validating GPT-4 judgments with human judgments&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Goal&lt;br /&gt;: to verify how closely GPT-4's automatic judgments agree with human judgments, a human evaluation is run on the TL;DR summarization experiment&lt;/li&gt;
&lt;li&gt;Setup
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Two GPT-4 prompts are used
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;GPT-4 (S): choose the summary that better captures the important information&lt;/li&gt;
&lt;li&gt;GPT-4 (C): choose the more concise summary&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Compared methods : DPO (temperature 0.25, best performing), SFT (temperature 0.25, intermediate), PPO-1 (temperature 1.0, worst performing)&lt;/li&gt;
&lt;li&gt;Number of human judgments : 272 for DPO, 122 for SFT, 199 for PPO-1&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Results&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;330&quot; data-origin-height=&quot;196&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bMJ4Wf/btsPRD8A6xZ/OirreXBk7Dq1QkkWR9T9mK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bMJ4Wf/btsPRD8A6xZ/OirreXBk7Dq1QkkWR9T9mK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bMJ4Wf/btsPRD8A6xZ/OirreXBk7Dq1QkkWR9T9mK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbMJ4Wf%2FbtsPRD8A6xZ%2FOirreXBk7Dq1QkkWR9T9mK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;330&quot; height=&quot;196&quot; data-origin-width=&quot;330&quot; data-origin-height=&quot;196&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;GPT-4 can serve as a reasonable proxy for human evaluation; in particular, the GPT-4 (C) prompt tracks human judgments more closely&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;7. Discussion&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Limitation&lt;/b&gt;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Out-of-distribution generalization &lt;br /&gt;: performed comparably to PPO in initial experiments, but broader validation is needed&lt;/li&gt;
&lt;li&gt;Self-labeling &lt;br /&gt;: whether DPO can effectively exploit unlabeled prompts has not been verified&lt;/li&gt;
&lt;li&gt;Over-optimization &lt;br /&gt;: it is unclear whether observed performance drops are caused by reward over-optimization&lt;/li&gt;
&lt;/ol&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Future Work&lt;/b&gt;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Designing high-quality automatic-evaluation prompts&lt;/li&gt;
&lt;li&gt;Exploring the applicability of DPO to generative models and modalities beyond language models&lt;/li&gt;
&lt;/ol&gt;</description>
      <author>yuha933</author>
      <guid isPermaLink="true">https://yuha933.tistory.com/17</guid>
      <comments>https://yuha933.tistory.com/17#entry17comment</comments>
      <pubDate>Wed, 13 Aug 2025 16:29:46 +0900</pubDate>
    </item>
    <item>
      <title>[LLaMA] 논문 리뷰</title>
      <link>https://yuha933.tistory.com/16</link>
      <description>&lt;h3 data-ke-size=&quot;size23&quot;&gt;1.&amp;nbsp; Introduction&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Prior research direction&lt;br /&gt;- Language models pre-trained on large text corpora are applied few-shot to new tasks&lt;br /&gt;- Under the observation that performance improves with model size, ever larger models with hundreds of billions to trillions of parameters were developed&lt;/li&gt;
&lt;li&gt;Limitations of prior work&lt;br /&gt;- For a given training budget, a relatively small model trained on more tokens can be more efficient than a larger one&lt;br /&gt;- Inference speed and cost, which matter in production, are not sufficiently considered, so large models may be ill-suited to real use&lt;br /&gt;- Most large models rely on closed or undocumented data&lt;/li&gt;
&lt;li&gt;This paper's idea&lt;br /&gt;- Proposes LLaMA, a family of language models at four sizes (7B, 13B, 33B, and 65B parameters) trained on far more tokens than is typical&lt;br /&gt;- Trained only on fully public sources, aiming for an open model that anyone can reproduce and verify&lt;/li&gt;
&lt;li&gt;Merits of this approach&lt;br /&gt;- Strong performance relative to model size&lt;br /&gt;- Runs comfortably on a single GPU, greatly reducing inference cost and latency&lt;br /&gt;- Trained on public data, so anyone can easily reproduce and extend the models&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;2. Approach&lt;/h3&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;2.1 Pre-training Data&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- English CommonCrawl [67%]&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- C4 [15%]&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Github [4.5%]&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Wikipedia [4.5%]&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Gutenberg and Books3 [4.5%]&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- ArXiv [2.5%]&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Stack Exchange [2%]&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Tokenizer : BPE, using the SentencePiece implementation&lt;/p&gt;
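BPE itself is easy to illustrate: repeatedly merge the most frequent adjacent token pair. A toy sketch of one merge step in pure Python (this is the core idea only, not the SentencePiece implementation):

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every occurrence of the chosen pair with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

Training a full BPE vocabulary just repeats these two steps until the desired vocabulary size is reached.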
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 style=&quot;color: #000000;&quot; data-ke-size=&quot;size20&quot;&gt;2.2 Architecture&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Based on the Transformer architecture&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Pre-normalization [GPT3]&lt;br /&gt;&amp;gt;&amp;gt; normalizes the input of each sub-layer to improve training stability&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- SwiGLU activation function [PaLM]&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;gt;&amp;gt; replaces ReLU with Shazeer's SwiGLU for greater nonlinearity and expressive power&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Rotary Embeddings [GPTNeo]&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;gt;&amp;gt; replaces absolute positional embeddings with rotary position embeddings&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Hyperparameters&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;847&quot; data-origin-height=&quot;200&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/MaBN2/btsPHOJ8Fzc/4mPRpKRLKVh5kejl5nkZKK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/MaBN2/btsPHOJ8Fzc/4mPRpKRLKVh5kejl5nkZKK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/MaBN2/btsPHOJ8Fzc/4mPRpKRLKVh5kejl5nkZKK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FMaBN2%2FbtsPHOJ8Fzc%2F4mPRpKRLKVh5kejl5nkZKK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;142&quot; data-origin-width=&quot;847&quot; data-origin-height=&quot;200&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;2.3 Optimizer&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Optimizer : AdamW&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Momentum hyperparameters : beta1 = 0.9 / beta2 = 0.95&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Learning-rate schedule : cosine schedule&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Regularization : weight decay = 0.1&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Clipping : gradient-norm clipping at 1.0&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Warmup : 2,000 steps&lt;/p&gt;
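The warmup-plus-cosine schedule above can be sketched as a pure function of the step. This helper is illustrative: the 10%-of-peak floor matches the paper's final learning rate, but the total-step count here is a placeholder, not a value from the paper:

```python
import math

def lr_at_step(step, max_lr, warmup_steps=2000, total_steps=100_000, min_ratio=0.1):
    """Linear warmup to max_lr over warmup_steps, then cosine decay
    down to min_ratio * max_lr at total_steps."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 -> 0
    return max_lr * (min_ratio + (1.0 - min_ratio) * cosine)
```

In a training loop this value would be assigned to the optimizer's learning rate before each step.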
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;2.4 Efficient Implementation&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- causal multi-head attention&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;gt;&amp;gt; an efficient implementation that does not store the attention weights and does not compute the key/query scores that are masked out&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- checkpointing&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;gt;&amp;gt; saves only the expensive activations and recomputes the rest during the backward pass&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- model &amp;amp; sequence parallelism&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;gt;&amp;gt; uses model and sequence parallelism to minimize GPU memory usage&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;3. Main results&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Zero-shot : the model receives only a text description of the task and a test example, then either generates an open-ended answer or ranks the proposed answer candidates&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Few-shot : the model receives between 1 and 64 input&amp;ndash;output pairs for the task together with the test example in a single prompt, then generates an answer or ranks the candidates&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Compared models&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Closed large models: GPT-3, Gopher, Chinchilla, PaLM&lt;/li&gt;
&lt;li&gt;Open-source models: the OPT family, GPT-J, GPT-Neo&lt;/li&gt;
&lt;li&gt;Instruction-tuned models: OPT-IML, Flan-PaLM&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Multiple-choice tasks&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;: select the completion with the highest likelihood given the context&lt;br /&gt;(for most tasks, the likelihood normalized by the number of characters in the completion is used)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.1 Common Sense Reasoning&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Benchmarks&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;BoolQ&lt;/li&gt;
&lt;li&gt;PIQA&lt;/li&gt;
&lt;li&gt;SIQA&lt;/li&gt;
&lt;li&gt;HellaSwag&lt;/li&gt;
&lt;li&gt;WinoGrande&lt;/li&gt;
&lt;li&gt;ARC easy &amp;amp; challenge&lt;/li&gt;
&lt;li&gt;OpenBookQA&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Setup : evaluated zero-shot&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Results&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;967&quot; data-origin-height=&quot;383&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b1EMSh/btsPKfGdCQA/Cd9r9VrVLcvzyH5mWS6E6k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b1EMSh/btsPKfGdCQA/Cd9r9VrVLcvzyH5mWS6E6k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b1EMSh/btsPKfGdCQA/Cd9r9VrVLcvzyH5mWS6E6k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb1EMSh%2FbtsPKfGdCQA%2FCd9r9VrVLcvzyH5mWS6E6k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;700&quot; height=&quot;277&quot; data-origin-width=&quot;967&quot; data-origin-height=&quot;383&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Despite being lightweight, achieves commonsense-reasoning performance on par with large models&lt;/li&gt;
&lt;li&gt;In particular demonstrates the efficiency and competitiveness of the 13B and 65B models&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.2 Closed-book Question Answering&amp;nbsp;&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Benchmarks : Natural Questions, TriviaQA&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Setup : exact-match performance evaluated in both zero-shot and few-shot settings&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Results&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;476&quot; data-origin-height=&quot;383&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bsLah8/btsPIH4RbT0/ZTCU8NVpvU2RkE02jaYQi1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bsLah8/btsPIH4RbT0/ZTCU8NVpvU2RkE02jaYQi1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bsLah8/btsPIH4RbT0/ZTCU8NVpvU2RkE02jaYQi1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbsLah8%2FbtsPIH4RbT0%2FZTCU8NVpvU2RkE02jaYQi1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;241&quot; data-origin-width=&quot;476&quot; data-origin-height=&quot;383&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;482&quot; data-origin-height=&quot;256&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bfex32/btsPJeac6J8/6cl29W3PKZqrQKwzQPwDak/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bfex32/btsPJeac6J8/6cl29W3PKZqrQKwzQPwDak/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bfex32/btsPJeac6J8/6cl29W3PKZqrQKwzQPwDak/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbfex32%2FbtsPJeac6J8%2F6cl29W3PKZqrQKwzQPwDak%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;350&quot; height=&quot;186&quot; data-origin-width=&quot;482&quot; data-origin-height=&quot;256&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.3 Reading Comprehension&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Benchmark : RACE&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Setup : follows the evaluation protocol of Language Models are Few-Shot Learners (Brown et al.)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Results&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;485&quot; data-origin-height=&quot;333&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bvnU9A/btsPKGKkVzz/KkfVaK8snmRKC6mHTuViTk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bvnU9A/btsPKGKkVzz/KkfVaK8snmRKC6mHTuViTk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bvnU9A/btsPKGKkVzz/KkfVaK8snmRKC6mHTuViTk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbvnU9A%2FbtsPKGKkVzz%2FKkfVaK8snmRKC6mHTuViTk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;206&quot; data-origin-width=&quot;485&quot; data-origin-height=&quot;333&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.4 Mathematical reasoning&lt;/h4&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- Benchmarks : MATH, GSM8k&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- Compared models : PaLM, Minerva&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- Metric : maj1@k&lt;/p&gt;
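maj1@k samples k answers per problem and scores the majority-voted final answer against the ground truth. A minimal sketch (the function name is mine):

```python
from collections import Counter

def maj1_at_k(samples, correct_answer):
    """maj1@k: take the majority vote over k sampled final answers,
    then check whether that single voted answer is correct."""
    majority, _ = Counter(samples).most_common(1)[0]
    return majority == correct_answer
```

Averaging this boolean over a benchmark's problems gives the reported maj1@k accuracy.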
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- Results&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;469&quot; data-origin-height=&quot;389&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bVBQRh/btsPJiwT5Y8/KbiEmkYvsFyAtTk8gbzdY0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bVBQRh/btsPJiwT5Y8/KbiEmkYvsFyAtTk8gbzdY0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bVBQRh/btsPJiwT5Y8/KbiEmkYvsFyAtTk8gbzdY0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbVBQRh%2FbtsPJiwT5Y8%2FKbiEmkYvsFyAtTk8gbzdY0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;350&quot; height=&quot;290&quot; data-origin-width=&quot;469&quot; data-origin-height=&quot;389&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.5 Code generation&lt;/h4&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- Benchmarks : HumanEval, MBPP&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- Setup : natural-language description + I/O examples; HumanEval additionally includes the function signature and docstring&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- Metrics : pass@1 (temperature=0.1), pass@100/80 (temperature=0.8), with the unbiased estimator of Chen et al. (2021)&lt;/p&gt;
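The unbiased pass@k estimator referenced above can be written directly from its formula:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k (Chen et al., 2021): with n samples per problem of
    which c pass the unit tests, the probability that at least one of k
    randomly drawn samples is correct is 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples exist, so any draw of k must pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Sampling many candidates (large n) and applying this formula gives a lower-variance estimate than naively drawing exactly k samples.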
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- Results&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;499&quot; data-origin-height=&quot;375&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ofStj/btsPK1nagB9/lR3ogv7TcX9lV1d8jKB8dK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ofStj/btsPK1nagB9/lR3ogv7TcX9lV1d8jKB8dK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ofStj/btsPK1nagB9/lR3ogv7TcX9lV1d8jKB8dK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FofStj%2FbtsPK1nagB9%2FlR3ogv7TcX9lV1d8jKB8dK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;350&quot; height=&quot;263&quot; data-origin-width=&quot;499&quot; data-origin-height=&quot;375&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;※ Code-specific fine-tuning could improve performance further, but is outside the scope of this work&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.6 Massive Multitask Language Understanding&lt;/h4&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- Benchmark : MMLU&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- Setup : 5-shot evaluation using the provided examples&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;- Results&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;801&quot; data-origin-height=&quot;414&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b45Yzf/btsPIePnbEb/xjIBUKnPX0YUwyDfOE66M0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b45Yzf/btsPIePnbEb/xjIBUKnPX0YUwyDfOE66M0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b45Yzf/btsPIePnbEb/xjIBUKnPX0YUwyDfOE66M0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb45Yzf%2FbtsPIePnbEb%2FxjIBUKnPX0YUwyDfOE66M0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;500&quot; height=&quot;258&quot; data-origin-width=&quot;801&quot; data-origin-height=&quot;414&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.7 Evolution of performance during training&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;979&quot; data-origin-height=&quot;619&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bQKOKs/btsPKSKz9nb/3yVNWKokk9HwcxKUUxSduk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bQKOKs/btsPKSKz9nb/3yVNWKokk9HwcxKUUxSduk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bQKOKs/btsPKSKz9nb/3yVNWKokk9HwcxKUUxSduk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbQKOKs%2FbtsPKSKz9nb%2F3yVNWKokk9HwcxKUUxSduk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;379&quot; data-origin-width=&quot;979&quot; data-origin-height=&quot;619&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 거의 모든 그래프에서, 더 많은 토큰으로 학습할수록(&amp;rarr; 퍼플렉서티 감소) 더 높은 정확도를 달성함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 모델 크기가 클수록 초기부터 더 빠르게, 더 높은 성능을 냄&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 점선을 넘은 구간에서는 LLaMA가 Chinchilla를 앞지르며, 특히 TriviaQA&amp;middot;HellaSwag&amp;middot;NaturalQuestions 등에서 빠르게 역전함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 아래와 같은 예외적 상황을 보임&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;735&quot; data-start=&quot;687&quot;&gt;SIQA: 곡선이 들쑥날쑥해 벤치마크 자체의 안정성이 낮을 가능성을 시사&lt;/li&gt;
&lt;li data-end=&quot;805&quot; data-start=&quot;739&quot;&gt;WinoGrande: 성능 향상 정도가 퍼플렉서티 감소(학습 진행)와 뚜렷하게 상관관계가 없음을 보여 줌&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 중대형 모델(33B, 65B)은 학습 후반부에 유사한 성능 궤적을 보이며, 규모 확장이 가져다주는 이득이 점차 감소함을 암시함&lt;br /&gt;&lt;br /&gt;- 의의 : 더 많은 학습 토큰과 더 큰 모델이 대부분의 벤치마크에서 더 나은 성능을 빠르게 확보하지만, 일부 태스크는 별도의 요인(벤치마크 신뢰도, 태스크 특성 등)에 의해 학습 곡선이 예외적일 수 있다 !&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;4. Instruction Finetuning&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 소량의 instruction 데이터로 파인튜닝이 MMLU 성능을 얼마나 빠르게 끌어올리는지 평가&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 파인튜닝 프로토콜 : Chung et al.의 방식을 그대로 적용하며, 단일 실험만 수행하여 LLaMA-I (65B)를&amp;nbsp;학습함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 비교 대상&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;361&quot; data-start=&quot;266&quot;&gt;중간 규모 instruction-tuned 모델: OPT-IML, Flan-PaLM 시리즈&lt;/li&gt;
&lt;li data-end=&quot;428&quot; data-start=&quot;364&quot;&gt;최첨단 instruction 모델: GPT code-davinci-002&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 결과 비교&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;375&quot; data-origin-height=&quot;377&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/s5fxx/btsPJkhiyIL/y6eGEOnOENgodZyi2Tt7yk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/s5fxx/btsPJkhiyIL/y6eGEOnOENgodZyi2Tt7yk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/s5fxx/btsPJkhiyIL/y6eGEOnOENgodZyi2Tt7yk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fs5fxx%2FbtsPJkhiyIL%2Fy6eGEOnOENgodZyi2Tt7yk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;270&quot; height=&quot;271&quot; data-origin-width=&quot;375&quot; data-origin-height=&quot;377&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;5. Bias, Toxicity and Misinformation&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- LLaMA-65B가 생성할 수 있는 유해 콘텐츠 및 고정관념 수준을 측정하여, 모델의 윤리적 리스크를 가늠하고자 함.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 벤치마크 : toxicity&amp;middot;bias&amp;middot;misinformation 평가 도구들을 일부 채택하였으나, 이들 시험만으로는 모델이 내포한 모든 위험 요소를 완전히 진단하기 어렵다는 한계를 인식함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;5.1 Real Toxicity Prompts&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 언어 모델이 생성할 수 있는 모욕&amp;middot;혐오&amp;middot;위협 등 유해 언어를 평가함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Real Toxicity Prompts : 약 10만 개의 프롬프트로 구성된 벤치마크&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 평가 방법 : Real Toxicity Prompts의 각 프롬프트에 대해 모델이 greedy decoding으로 생성한 문장에 PerspectiveAPI로 유해성 점수(0~1)를 산출함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 결과 비교&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;398&quot; data-origin-height=&quot;198&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/kAETL/btsPHgzXe74/ezSBCzAcsG5kkVBAevKBRk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/kAETL/btsPHgzXe74/ezSBCzAcsG5kkVBAevKBRk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/kAETL/btsPHgzXe74/ezSBCzAcsG5kkVBAevKBRk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FkAETL%2FbtsPHgzXe74%2FezSBCzAcsG5kkVBAevKBRk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;149&quot; data-origin-width=&quot;398&quot; data-origin-height=&quot;198&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 시사점 : 정중한 프롬프트에서 모델 크기 증가에 따라 유해성 점수가 상승하는 경향을 관찰함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 한계점 : 외부 PerspectiveAPI 의존으로 동일 평가 파이프라인을 재현&amp;middot;비교하기 어려움&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;5.2&amp;nbsp;CrowS-Pairs&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 고정관념 문장과 반고정관념 문장 중 어떤 문장을 모델이 더 선호하는지 측정하여 사회&amp;middot;문화적 편향을 평가함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- CrowS-Pairs :&amp;nbsp;성별, 종교, 인종&amp;middot;피부색, 성적 지향, 나이, 국적, 장애, 외모, 사회경제적 지위 등 9개 범주의 편향을 다루는 문장 쌍 데이터셋&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 평가 방법 : CrowS-Pairs의 9개 사회범주별로 고정관념 문장 vs 반고정관념 문장에 대한 모델의 선호도를 제로샷으로 측정하여 편향 점수를 산출함&lt;/p&gt;
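&lt;p data-ke-size=&quot;size16&quot;&gt;※ 위 선호도 측정의 집계 방식을 가상의 점수로 나타낸 스케치 (문장별 log-likelihood는 실제로는 언어 모델로 계산해야 하며, 아래 수치는 설명용 가정값)&lt;/p&gt;

```python
def crows_pairs_bias_score(pairs):
    """pairs: (고정관념 문장 log-likelihood, 반고정관념 문장 log-likelihood) 튜플 목록.
    모델이 고정관념 문장을 더 선호한 비율(%)을 반환. 50%가 무편향 기준."""
    prefer_stereo = sum(1 for stereo, anti in pairs if stereo > anti)
    return 100.0 * prefer_stereo / len(pairs)

# 설명용 가정값: 실제로는 문장 쌍별 pseudo-log-likelihood를 사용
scores = [(-12.3, -13.1), (-9.8, -9.2), (-20.5, -21.0), (-7.7, -7.9)]
print(crows_pairs_bias_score(scores))  # 75.0 (4쌍 중 3쌍에서 고정관념 문장 선호)
```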
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 결과 비교&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;492&quot; data-origin-height=&quot;378&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cQEOl3/btsPIsNcD01/773EzpsDXLkcaw2aZnwFM1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cQEOl3/btsPIsNcD01/773EzpsDXLkcaw2aZnwFM1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cQEOl3/btsPIsNcD01/773EzpsDXLkcaw2aZnwFM1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcQEOl3%2FbtsPIsNcD01%2F773EzpsDXLkcaw2aZnwFM1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;307&quot; data-origin-width=&quot;492&quot; data-origin-height=&quot;378&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 시사점 : LLaMA는 평균적으로 GPT-3&amp;middot;OPT-175B보다 약간 더 편향적이며, 특히 종교(OPT 대비 +10%), 그다음 나이&amp;middot;성별 범주에서 편향이 두드러짐&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 한계점 : 문장 쌍 선호만으로 국한된 편향 측정이므로, 실제 다양한 텍스트 생성 맥락에서 나타나는 복합적&amp;middot;암묵적 편향을 완전하게 포착하기 어려움&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;5.3 WinoGender&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- WinoGender (Rudinger et al., 2018) &amp;ndash; Winograd 스타일의 공참조 해결 문제로 성별 편향을 측정함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- WinoGender :&amp;nbsp;Winograd schema 기반의 공참조 해결 데이터셋&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 평가 방법 : 제로샷으로 세 가지 대명사 각각에 대한 공참조 해결 정확도를 측정하고, 특히 &amp;ldquo;gotcha&amp;rdquo;(직업의 다수 성별과 반대 대명사가 정답인) 사례에서 오류율을 비교함&lt;/p&gt;
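&lt;p data-ke-size=&quot;size16&quot;&gt;※ &amp;ldquo;gotcha&amp;rdquo; 사례 오류율 집계를 가상의 판정 데이터로 나타낸 스케치 (아래 판정값은 설명용 가정값)&lt;/p&gt;

```python
def gotcha_error_rate(examples):
    """examples: (공참조 정답 여부, gotcha 사례 여부) 목록.
    gotcha 사례만 골라 그 안에서의 오류율(%)을 반환."""
    gotcha = [correct for correct, is_gotcha in examples if is_gotcha]
    if not gotcha:
        return 0.0
    return 100.0 * sum(1 for c in gotcha if not c) / len(gotcha)

# 설명용 가정값: gotcha 3건 중 2건 오답
judgments = [(True, True), (False, True), (True, False), (False, True)]
print(round(gotcha_error_rate(judgments), 1))  # 66.7
```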
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 결과 비교&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;505&quot; data-origin-height=&quot;272&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/rEZLx/btsPKhYi7fn/HebWuJWs2UKkfpI7sxiTL1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/rEZLx/btsPKhYi7fn/HebWuJWs2UKkfpI7sxiTL1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/rEZLx/btsPKhYi7fn/HebWuJWs2UKkfpI7sxiTL1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FrEZLx%2FbtsPKhYi7fn%2FHebWuJWs2UKkfpI7sxiTL1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;350&quot; height=&quot;189&quot; data-origin-width=&quot;505&quot; data-origin-height=&quot;272&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 시사점 : 모델이 문맥 증거보다 직업의 통계적 성별(예: &amp;lsquo;간호사=여성&amp;rsquo;)을 우선 사용하여 공참조를 수행, 직업 기반 성별 고정관념을 재현함을 보여 줌&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 한계점 : Winograd schema 유형의 공참조 문제에 한정된 평가로, 실제 다양한 대화&amp;middot;생성 상황에서 나타나는 성별 편향을 포괄적으로 진단하기 어려움&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;5.4 TruthfulQA&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 잘못된 정보나 허위 응답 생성 위험을 측정함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- TruthfulQA :&amp;nbsp;모델이 &amp;ldquo;실제 세계에 대한 문자 그대로의 진실&amp;rdquo;을 얼마나 잘 식별하고 설명하는지를 평가하는 벤치마크&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 평가 방법 : 각 질문에 대해 모델이 생성한 답변이 진실한(truthful) 비율과, 진실하면서도 유익한(truthful &amp;and; informative) 비율을 함께 계산함&lt;/p&gt;
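&lt;p data-ke-size=&quot;size16&quot;&gt;※ 위 두 지표의 집계를 가상의 판정 결과로 나타낸 스케치 (truthful/informative 판정은 실제로는 별도의 평가 절차로 수행되며, 아래 값은 설명용 가정값)&lt;/p&gt;

```python
def truthfulqa_rates(labels):
    """labels: 각 답변의 (truthful 여부, informative 여부) 판정 목록.
    truthful 비율과, truthful이면서 informative인 비율을 함께 반환."""
    n = len(labels)
    truthful = sum(1 for t, _ in labels if t) / n
    truthful_and_informative = sum(1 for t, i in labels if t and i) / n
    return truthful, truthful_and_informative

# 설명용 가정값
labels = [(True, True), (True, False), (False, True), (True, True)]
print(truthfulqa_rates(labels))  # (0.75, 0.5)
```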
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 결과 비교&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;427&quot; data-origin-height=&quot;291&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bENX2v/btsPJEsL6Ac/IXXQAGIwPeemHYqqZNYbNK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bENX2v/btsPJEsL6Ac/IXXQAGIwPeemHYqqZNYbNK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bENX2v/btsPJEsL6Ac/IXXQAGIwPeemHYqqZNYbNK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbENX2v%2FbtsPJEsL6Ac%2FIXXQAGIwPeemHYqqZNYbNK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;204&quot; data-origin-width=&quot;427&quot; data-origin-height=&quot;291&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 시사점 : LLaMA는 기존 모델보다 진실성 면에서 개선되었으나, 여전히 허위 정보 생성 위험이 상당히 남아 있음&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 한계점 : 제한된 수의 적대적 질문 세트로만 평가하므로, 실제 다양한 오정보 시나리오에서의 성능을 완전히 진단하기 어려움&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;6. Carbon footprint&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 대규모 LLM 훈련에 소요된 총 에너지와 CO₂ 배출량을 정량화하여 환경적 영향을 평가함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 평가 방법 : GPU 전력 소비량을 PUE=1.1로 조정해 MWh로 환산한 뒤, 미국 평균 탄소 집약도 계수(0.385 kg CO₂eq/kWh)를 곱해 총 CO₂eq 배출량을 산출함&lt;/p&gt;
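&lt;p data-ke-size=&quot;size16&quot;&gt;※ 위 산식(전력 소비 &amp;times; PUE &amp;rarr; MWh, 미국 평균 계수 0.385 kg CO&amp;#8322;eq/kWh)을 파이썬으로 옮긴 스케치 (GPU 수&amp;middot;전력&amp;middot;시간은 본문에 없는 설명용 가정값)&lt;/p&gt;

```python
def co2_emissions_tons(num_gpus, gpu_power_w, hours, pue=1.1, carbon_kg_per_kwh=0.385):
    """GPU 전력 소비를 PUE로 보정해 MWh로 환산한 뒤,
    탄소 집약도 계수를 곱해 총 tCO2eq를 추정한다."""
    energy_mwh = num_gpus * gpu_power_w * hours * pue / 1e6  # W*h -> MWh
    # 1 MWh = 1000 kWh 이므로 MWh * (kg/kWh) 값이 곧 톤(t) 단위가 됨
    return energy_mwh * carbon_kg_per_kwh

# 설명용 가정값: GPU 2048장 * 400W * 21일
print(round(co2_emissions_tons(2048, 400, 24 * 21), 1))  # 174.9
```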
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;690&quot; data-origin-height=&quot;67&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nCXLc/btsPIymj9TI/q4OJcIymucIT7a9kfmRFT1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nCXLc/btsPIymj9TI/q4OJcIymucIT7a9kfmRFT1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nCXLc/btsPIymj9TI/q4OJcIymucIT7a9kfmRFT1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FnCXLc%2FbtsPIymj9TI%2Fq4OJcIymucIT7a9kfmRFT1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;39&quot; data-origin-width=&quot;690&quot; data-origin-height=&quot;67&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 결과&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;936&quot; data-origin-height=&quot;292&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cQnGpT/btsPIpQsbko/czkGkQNrm44wEEba296wq0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cQnGpT/btsPIpQsbko/czkGkQNrm44wEEba296wq0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cQnGpT/btsPIpQsbko/czkGkQNrm44wEEba296wq0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcQnGpT%2FbtsPIpQsbko%2FczkGkQNrm44wEEba296wq0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;700&quot; height=&quot;218&quot; data-origin-width=&quot;936&quot; data-origin-height=&quot;292&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 의의 : 이미 훈련된 모델을 공개함으로써, 후속 연구자는 대규모 재훈련 없이 고성능 LLM을 활용할 수 있으며, 추가적인 탄소 배출 절감을 기대함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 한계 : 미국 평균 계수만 사용해 지역별 전력망 특성(청정 에너지 비율 등)을 반영하지 못함 -&amp;gt; 실제 배출량은 다를 수 있음&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;8. Conclusion&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 한계&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;516&quot; data-start=&quot;433&quot;&gt;Instruction 파인튜닝 범위 제한: 단일 실험만 수행되어, 다양한 튜닝 데이터&amp;middot;프로토콜에서의 일반화 가능성이 아직 불확실함&lt;/li&gt;
&lt;li data-end=&quot;599&quot; data-start=&quot;517&quot;&gt;편향&amp;middot;유해성 평가의 불완전성: 외부 API 의존&amp;middot;제한된 벤치마크 활용으로, 실제 사용 맥락에서의 위험 요소를 모두 포착하지 못함&lt;/li&gt;
&lt;li data-end=&quot;670&quot; data-start=&quot;600&quot;&gt;탄소 발자국 추정의 단순화: 미국 평균 전력 계수만 사용해 지역별&amp;middot;기관별 실제 배출량 차이를 반영하지 못함&lt;/li&gt;
&lt;li data-is-last-node=&quot;&quot; data-end=&quot;755&quot; data-start=&quot;671&quot;&gt;일부 벤치마크 성능 격차: MMLU 등 특정 태스크에서 최첨단 모델과 격차가 존재하며, 대용량 도서&amp;middot;학술 데이터 부족이 요인일 수 있음&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 향후 연구 계획&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;134&quot; data-start=&quot;15&quot;&gt;Instruction 파인튜닝 심화: Chung et al.(2022) 방식 외에도 다양한 프로토콜과 데이터셋으로 소량의 instruction 학습이 모델 성능에 미치는 영향을 체계적으로 분석함&lt;/li&gt;
&lt;li data-end=&quot;242&quot; data-start=&quot;135&quot;&gt;더 큰 규모의 모델 출시: 현재 65B보다 더 많은 파라미터와 방대한 프리트레이닝 코퍼스를 활용한 대형 LLaMA 모델을 개발하여, 스케일링에 따른 성능 향상 여부를 검증함&lt;/li&gt;
&lt;li data-end=&quot;339&quot; data-start=&quot;243&quot;&gt;편향&amp;middot;유해성 완화 기법 연구: 현재 평가된 toxicity&amp;middot;bias 지표를 바탕으로, 데이터 필터링&amp;middot;디버깅&amp;middot;후처리 등의 구체적 완화 전략을 설계&amp;middot;테스트함&lt;/li&gt;
&lt;li data-end=&quot;422&quot; data-start=&quot;340&quot;&gt;도메인&amp;middot;언어 확장: 법률&amp;middot;의료&amp;middot;과학 등 특정 분야와 비영어권 언어에 특화된 추가 파인튜닝 및 평가를 통해 범용성과 적용성을 높임&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <author>yuha933</author>
      <guid isPermaLink="true">https://yuha933.tistory.com/16</guid>
      <comments>https://yuha933.tistory.com/16#entry16comment</comments>
      <pubDate>Thu, 7 Aug 2025 11:18:40 +0900</pubDate>
    </item>
    <item>
      <title>[CoT] 논문 리뷰</title>
      <link>https://yuha933.tistory.com/15</link>
      <description>&lt;h3 data-ke-size=&quot;size23&quot;&gt;1. Introduction&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;연구 배경 : 언어 모델의 크기를 키우는 것은 성능 향상과 샘플 효율성 증진 등의 다양한 이점들이 있지만, 단순히 모델 크기를 키운다고 해서 어려운 과제에서 높은 성능을 달성하지는 못함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;대형 언어 모델의 추론 능력을 끌어낼 수 있는 기존 방법과 한계&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1. 자연어 기반 추론 과정 생성 (우리가 푸는 과정 자체를 일일이 자연어로 모델에게 가르치는 것 !)&lt;br /&gt;-&amp;gt; 단순한 입력-출력 쌍보다 훨씬 복잡한 고품질의 추론 데이터를 대량으로 만드는 데 많은 비용이 듦&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. 문맥 기반의 few-shot 학습 (예시 몇 개 던져주고, 알아서 따라해보라고 하는 것 !)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; 추론 능력이 요구되는 작업에서 잘 작동하지 않으며, 모델 규모를 키워도 성능 향상이 제한적임&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;본 논문의 아이디어&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;: ⟨입력, 사고의 흐름(Chain of Thought), 출력⟩ 형태로 구성된 프롬프트를 제시함으로써, 대형 언어 모델이 few-shot prompting만으로 추론이 필요한 과제를 수행할 수 있는지를 탐색하는 방식&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;본 연구의 의의&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;: 대형 언어 모델이 대규모 학습 데이터 없이도, 과제에 대한 자연어 기반의 소수의 예시만으로 학습할 수 있음을 보임&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;2. Chain-of-Thought Prompting &lt;/h3&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;781&quot; data-origin-height=&quot;395&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ZtAmx/btsPxkmTHnb/6yX67dHn8NFDWL9zWUG6aK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ZtAmx/btsPxkmTHnb/6yX67dHn8NFDWL9zWUG6aK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ZtAmx/btsPxkmTHnb/6yX67dHn8NFDWL9zWUG6aK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FZtAmx%2FbtsPxkmTHnb%2F6yX67dHn8NFDWL9zWUG6aK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;303&quot; data-origin-width=&quot;781&quot; data-origin-height=&quot;395&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;CoT : 사고의 흐름으로, 문제의 최종 답에 이르기까지의 일련의 일관된 중간 추론 과정을 생성하도록 유도하는 방식&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;CoT의 장점&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1. 복잡한 문제를 단계별로 분해할 수 있어, 더 많은 추론이 필요한 문제에 계산 자원을 효과적으로 배분할 수 있음&lt;br /&gt;2. 모델의 추론 과정을 해석할 수 있어, 답이 도출된 이유를 파악하거나 오류 디버깅이 가능함&lt;br /&gt;3. 수학, 상식, 기호 추론 등 다양한 언어 기반 과제에 적용 가능하며, 원칙적으로 인간이 언어로 해결할 수 있는 모든 작업에 활용될 수 있음&lt;br /&gt;4. 예시만 추가하면 쉽게 유도 가능하여, 별도 학습 없이도 대형 언어 모델에서 바로 적용할 수 있음&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;3. Arithmetic Reasoning&lt;/h3&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.1 Experimental Setup&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;사용한 다섯 가지 수학 문장제 벤치마크&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1. GSM8K&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. SVAMP&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3. ASDiv&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;4. AQuA&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;5. MAWPS&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Prompting 비교&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Standard prompting : 문제와 정답으로 이루어진 간단한 입력&amp;ndash;출력 예시 몇 개를 제시하여, 모델이 별다른 사고 과정 없이 곧바로 정답을 예측하도록 유도하는 방식&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Chain-of-Thought prompting : 문제와 함께 중간 추론 과정을 포함한 8개의 사고 흐름 예시를 입력으로 주어, 모델이 이를 따라 단계적으로 사고하며 정답에 도달하도록 유도하는 방식&lt;/p&gt;
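&lt;p data-ke-size=&quot;size16&quot;&gt;※ 두 prompting 방식의 형식 차이를 가상의 예시 1개로 나타낸 스케치 (질문&amp;middot;풀이 문구는 설명용 가정값이며, 논문의 실제 8개 예시와는 다름)&lt;/p&gt;

```python
# 동일한 문제-정답 예시를 standard / CoT 두 형식으로 구성해 비교
question = "Roger has 5 balls. He buys 2 cans of 3 balls each. How many balls now?"
rationale = "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11."
answer = "The answer is 11."

# standard: 문제 -> 정답만 제시
standard_prompt = f"Q: {question}\nA: {answer}\n\nQ: {{new_question}}\nA:"
# CoT: 정답 앞에 중간 추론 과정을 함께 제시
cot_prompt = f"Q: {question}\nA: {rationale} {answer}\n\nQ: {{new_question}}\nA:"

print(cot_prompt)
```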
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;사용한 다섯 개의 대형 언어 모델&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1. GPT-3&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. LaMDA&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3. PaLM&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;4. UL2&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;5. Codex&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.2 Results&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;489&quot; data-origin-height=&quot;755&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cRWxMZ/btsPuYlTzsD/7KtGa0EFRL20uM995fEkd1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cRWxMZ/btsPuYlTzsD/7KtGa0EFRL20uM995fEkd1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cRWxMZ/btsPuYlTzsD/7KtGa0EFRL20uM995fEkd1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcRWxMZ%2FbtsPuYlTzsD%2F7KtGa0EFRL20uM995fEkd1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;463&quot; data-origin-width=&quot;489&quot; data-origin-height=&quot;755&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;첫 번째 결론 : CoT prompting은 작은 모델에서는 성능 향상에 도움이 되지 않으며, 약 100B 파라미터 이상의 대규모 모델에서만 성능이 향상됨&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;두 번째 결론 : 복잡한 문제일수록 성능 향상 폭이 큼&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;세 번째 결론 : GPT-3 175B와 PaLM 540B를 통한 CoT prompting은 기존에 라벨링된 데이터셋으로 파인튜닝한 SOTA 성능을 뛰어넘음&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;정성적 결과 분석&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 정답의 경우, 생성된 사고 흐름이 우연이 아니라 수학적&amp;middot;논리적으로 올바른 과정이었음&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 오답의 경우, 약 46%는 계산 실수나 단순 누락 등 사소한 오류, 54%는 의미 해석 및 사고 흐름상의 구조적 오류로 인해 나타남&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- PaLM 62B &amp;rarr; 540B로 확장 시, 사고 단계 누락과 의미 오류가 대부분 해결됨&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.3 Ablation Study &lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;가설 1 : CoT가 수학적 수식을 생성하는 능력 때문에 효과적인 것일 수 있다&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 실험 : 정답 대신 문제에서 파생된 수식만 출력하도록 prompting을 구성함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 결과 : 복잡한 문제에서는 효과 없는 것을 확인함으로써, 단순 수식 생성만으로는 충분하지 않다는 것을 확인함 (다만, 단순 문제에서는 약간의 효과를 보임)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;가설 2 : CoT는 단순히 더 많은 연산량을 사용하기 때문에 효과적인 것일 수 있다&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #333333; text-align: start;&quot;&gt;&amp;rarr; 실험 :&lt;/span&gt; 정답과 관련된 수식의 문자 수만큼 점(&amp;hellip;)만 출력하도록 prompting&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 결과 : 성능은 baseline 수준인 것을 확인함으로써, 연산량 증가만으로는 CoT의 효과를 설명할 수 없음&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;가설 3 : CoT는 정답 이후 지식을 활성화하는 데만 도움을 주는 것일 수 있다&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 실험 : 정답 먼저 출력한 뒤, 사고의 흐름을 출력하도록 구성하여 순서를 바꿈&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 결과 : baseline 수준의 성능에 그침을 확인함으로써, CoT가 단순한 지식 활성화가 아니라 실제 reasoning 과정 자체에 기여함을 보임&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.4 Robustness of Chain of Thought&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;가설 : few-shot prompting처럼 CoT prompting도 예시 순서, 문체, 작성자에 민감할 것이다&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 실험 : 작성자 A, B, C는 동일한 문제에 대해 각자 사고의 흐름을 작성했고, A는 추가로 간결한 문체 버전도 만들어 문체와 작성자에 따른 성능 차이를 함께 분석함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 결과 : 작성자, 문체, 예시 순서가 달라도 모든 경우에서 CoT prompting은 standard prompting보다 일관되게 높은 성능을 보임&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;4. Commonsense Reasoning&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;아이디어 : 수학 문제 외에도 상식 추론 과제에서도 CoT prompting이 효과적인지 알아보자 !&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;사용된 다섯 가지 벤치마크&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1. CSQA : 상식 질문&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. StrategyQA : 다단계 전략 추론이 필요한 과제&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3. Date Understanding : 주어진 문맥에서 날짜 추론하는 과제&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;4. Sports Understanding : 스포츠 관련 문장이 타당한지 판단하는 과제&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;5. SayCan : 자연어 지시를 로봇의 이산적인 행동 시퀀스로 변환하는 과제&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Prompting&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;: 이전과 동일한 방식으로 few-shot 예시를 수동 구성하여 chain of thought 형식으로 제공함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;실험 결과&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;896&quot; data-origin-height=&quot;263&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/8ATTD/btsPwHwdmsj/jZQfKUezwgTxexNLnKP1j1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/8ATTD/btsPwHwdmsj/jZQfKUezwgTxexNLnKP1j1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/8ATTD/btsPwHwdmsj/jZQfKUezwgTxexNLnKP1j1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F8ATTD%2FbtsPwHwdmsj%2FjZQfKUezwgTxexNLnKP1j1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;176&quot; data-origin-width=&quot;896&quot; data-origin-height=&quot;263&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- CoT prompting은 수학 문제뿐 아니라 상식 추론 과제 전반에서도 효과적으로 작동함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 특히, 대형 모델(PaLM 540B)에서는 SOTA와 사람 수준을 뛰어넘는 성과를 달성함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;5. Symbolic Reasoning&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;아이디어 : 사람에게는 간단하지만 언어 모델에게는 잠재적으로 어려운 상징적 추론을 할 수 있는지 알아보자 !&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;과제&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1. 이름의 각 단어에서 마지막 글자들만 이어붙이는 작업&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. 사람들이 동전을 던졌는지 여부에 따라 동전이 앞면인지 뒷면인지 예측하는 작업&lt;/p&gt;
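&lt;p data-ke-size=&quot;size16&quot;&gt;※ 두 과제의 정답 기준(ground truth)을 코드로 나타낸 스케치 (모델 평가 코드가 아니라 과제 정의를 보여 주기 위한 예시)&lt;/p&gt;

```python
def last_letter_concat(name):
    """과제 1: 각 단어의 마지막 글자를 이어 붙임."""
    return "".join(word[-1] for word in name.split())

def coin_state(flips):
    """과제 2: 앞면(heads)에서 시작한 동전을 flips(1=뒤집음, 0=안 뒤집음)만큼
    조작했을 때의 최종 상태. 뒤집기 횟수의 홀짝으로 결정됨."""
    return "heads" if sum(flips) % 2 == 0 else "tails"

print(last_letter_concat("Amy Brown"))  # yn
print(coin_state([1, 0, 1]))            # heads (두 번 뒤집혀 원상태)
```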
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;실험&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- In-domain : 훈련 예시와 동일한 단계 수의 문제&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;-&lt;/b&gt; Out-of-domain (OOD): 훈련보다 더 많은 단계가 필요한 문제&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;실험 결과&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- In-domain에서는 PaLM 540B + CoT가 두 과제 모두 거의 100% 정답률 달성함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- 작은 모델은 같은 예시를 따라하는 것조차 실패했으며, CoT 효과는 100B 파라미터 이상 모델에서 나타남&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- OOD에서는 CoT가 일반 프롬프트보다 성능이 더 좋았지만 완벽하지는 않았으며, 이를 통해 CoT가 훈련 예시보다 더 긴 추론에도 일정 수준의 일반화 능력을 보임을 확인함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;6. Discussion&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;한계&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- CoT가 사람의 추론 방식을 모방하기는 하나, 신경망이 실제로 추론을 수행하는지는 알 수 없음&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- CoT도 정확하지 않은 경로로 추론해 잘못된 정답에 도달할 수 있음&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 더 정교한 생성 메커니즘 개발 필요함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;7. Related Work&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;관련 연구 1 : 중간 추론 단계(intermediate steps)를 사용하는 연구&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; 자연어로 구성된 추론 설명을 통해 문제 해결 단계들을 순차적으로 명시함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;harr; 기존 연구가 대부분 태스크별 훈련을 하거나, 데이터셋을 추가 제작했던 것에 반해, 본 연구에서는 사전학습된 언어모델을 추가 학습 없이 활용하며, Chain-of-Thought&amp;nbsp;프롬프트만으로도 성능 향상을 유도함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Related work 2 : Prompting&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; reasoning is improved by adding effective exemplars, explanations, and explicit instructions to the model input&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;harr; prior work mostly strengthened the input prompt, whereas this work induces the model to generate a &amp;lsquo;reasoning flow&amp;rsquo; in its output, embedding structured thinking in the result&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;8. Conclusions&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Broadening the range of reasoning tasks that language models can perform may spur follow-up research on language-based approaches to reasoning&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <author>yuha933</author>
      <guid isPermaLink="true">https://yuha933.tistory.com/15</guid>
      <comments>https://yuha933.tistory.com/15#entry15comment</comments>
      <pubDate>Thu, 24 Jul 2025 03:29:07 +0900</pubDate>
    </item>
    <item>
      <title>[Chinchilla] Paper Review</title>
      <link>https://yuha933.tistory.com/14</link>
      <description>&lt;h3 data-ke-size=&quot;size23&quot;&gt;1. Introduction&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Background : enormous LLMs have recently been developed, with parameter counts exceeding 500B&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; training requires massive compute and energy costs (while the compute available for training is fixed ...)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Limitations of prior work&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Prior work : proposed a power-law relationship in which performance improves as model size grows&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; claimed that a 10x increase in compute calls for a 5.5x larger model and only 1.8x more training tokens&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; i.e., scaling should be driven primarily by model size&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Key idea of this paper&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;: given a fixed compute (FLOPs) budget, how should the balance between model size and number of training tokens be set?&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; the paper argues that performance is optimized when model size and token count are scaled in equal proportion&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Experiments in this paper&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;More than 400 models from 70M to 16B parameters are trained on a range of token counts from 5B to 400B&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; the experiments yield conclusions that contradict prior work&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; ultimately, securing sufficient data, rather than model size, is shown to be the key to better performance&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;lt;Figure 1&amp;gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1007&quot; data-origin-height=&quot;503&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/das7YN/btsPi3HMtt1/N4IlknUBPa6ujjUCKWRRkk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/das7YN/btsPi3HMtt1/N4IlknUBPa6ujjUCKWRRkk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/das7YN/btsPi3HMtt1/N4IlknUBPa6ujjUCKWRRkk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fdas7YN%2FbtsPi3HMtt1%2FN4IlknUBPa6ujjUCKWRRkk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;300&quot; data-origin-width=&quot;1007&quot; data-origin-height=&quot;503&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The figure shows visually that, for the same compute budget, securing enough training tokens is more effective for performance than making the parameter count too large&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;lt;Table 1&amp;gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;883&quot; data-origin-height=&quot;295&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/8tB7K/btsPk4dQePm/sx4XyIqDKje4t0qCIRcBH0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/8tB7K/btsPk4dQePm/sx4XyIqDKje4t0qCIRcBH0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/8tB7K/btsPk4dQePm/sx4XyIqDKje4t0qCIRcBH0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F8tB7K%2FbtsPk4dQePm%2Fsx4XyIqDKje4t0qCIRcBH0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;200&quot; data-origin-width=&quot;883&quot; data-origin-height=&quot;295&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To maximize the performance of large language models, model size and data scale should be increased in balance, and securing sufficient data is crucial&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;3. Estimating the optimal parameter/training tokens allocation&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Goal : determine experimentally how model size (N) and training tokens (D) should be combined within a fixed FLOPs budget for optimal performance&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.1 Approach 1: Fix model sizes and vary number of training tokens&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Topic : train models with a given parameter count (N) on four different numbers of training tokens (D)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; this directly estimates the minimum achievable loss for a given number of FLOPs&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Setup : the learning rate is decayed by a factor of 10, while training lengths vary over a 16x range&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; within each FLOPs budget, one can then ask which combination gives the lowest loss&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Result : N_opt &amp;prop; C^0.5, D_opt &amp;prop; C^0.5&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; as compute grows, increasing model size and training data in the same proportion works best&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;(rejecting the prior finding that model size should grow much faster than training data)&lt;/p&gt;
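&lt;p data-ke-size=&quot;size16&quot;&gt;As a rough numerical illustration (a sketch, not from the paper: it assumes the common approximation C &amp;asymp; 6&amp;middot;N&amp;middot;D for dense transformer training, and sets the proportionality constants so that N_opt = D_opt purely for simplicity), equal exponents mean a 10x bigger budget scales both N and D by &amp;radic;10:&lt;/p&gt;

```python
# Rough sketch (assumption: training FLOPs C ≈ 6 * N * D for dense
# transformers). With N_opt ∝ C^0.5 and D_opt ∝ C^0.5, the constraint
# N * D = C / 6 gives N_opt = D_opt = sqrt(C / 6) for illustration.

def optimal_allocation(C):
    """Split a FLOPs budget C equally between parameters and tokens."""
    side = (C / 6) ** 0.5
    return side, side  # (N_opt, D_opt)

N1, D1 = optimal_allocation(1e21)
N2, D2 = optimal_allocation(1e22)
# 10x more compute scales both N and D by sqrt(10) ≈ 3.16x
print(N2 / N1, D2 / D1)
```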
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.2 Approach 2: IsoFLOP profiles&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Topic : with total FLOPs fixed, train models of various sizes (100M to 30B parameters) and measure the final loss&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; with the FLOPs budget fixed, one can directly judge which model size is best&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Setup : for each FLOPs budget, sweep model sizes to trace a loss curve, fit a parabola to that curve, and estimate the optimal model size and token count&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Result : N_opt &amp;prop; C^0.49, D_opt &amp;prop; C^0.51&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; again, scaling model size and training tokens in nearly equal proportion is optimal&lt;/p&gt;
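&lt;p data-ke-size=&quot;size16&quot;&gt;The parabola-fitting step above can be sketched as follows (the model sizes and loss values here are synthetic placeholders, not data from the paper):&lt;/p&gt;

```python
import numpy as np

# Sketch of the isoFLOP parabola fit: for one fixed FLOPs budget,
# fit a quadratic to final loss vs. log10(model size) and take the
# vertex as the estimated optimal model size.
log_N = np.log10([1e8, 3e8, 1e9, 3e9, 1e10, 3e10])     # 100M .. 30B params
loss = np.array([2.90, 2.55, 2.35, 2.30, 2.42, 2.70])  # hypothetical losses

a, b, c = np.polyfit(log_N, loss, deg=2)  # loss ≈ a*x^2 + b*x + c
x_opt = -b / (2 * a)                      # vertex of the fitted parabola
N_opt = 10 ** x_opt
print(f"estimated optimal model size ≈ {N_opt:.2e} parameters")
```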
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.3 Approach 3: Fitting a parametric loss function&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Topic : based on the results of Approaches 1 and 2, model the loss mathematically as a function of parameter count and training tokens&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;439&quot; data-origin-height=&quot;138&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bYltAZ/btsPltEpyyq/HSESFrE8rODdk2a4802T50/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bYltAZ/btsPltEpyyq/HSESFrE8rODdk2a4802T50/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bYltAZ/btsPltEpyyq/HSESFrE8rODdk2a4802T50/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbYltAZ%2FbtsPltEpyyq%2FHSESFrE8rODdk2a4802T50%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;94&quot; data-origin-width=&quot;439&quot; data-origin-height=&quot;138&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- First term : the irreducible limit on predictability arising from the uncertainty of language itself, which even the best model cannot overcome&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Second term : the loss incurred when the model itself is too small, i.e., insufficient model capacity&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Third term : the loss incurred when the training data is insufficient&lt;/p&gt;
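&lt;p data-ke-size=&quot;size16&quot;&gt;The three terms can be written directly in code; the default constants below are approximately the values fitted in the Chinchilla paper (E &amp;asymp; 1.69, A &amp;asymp; 406.4, B &amp;asymp; 410.7, &amp;alpha; &amp;asymp; 0.34, &amp;beta; &amp;asymp; 0.28), taken from the paper rather than this post:&lt;/p&gt;

```python
# The fitted parametric form L(N, D) = E + A / N**alpha + B / D**beta.

def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted training loss for N parameters and D training tokens.

    E            : irreducible entropy of natural text (first term)
    A / N**alpha : penalty for limited model capacity (second term)
    B / D**beta  : penalty for limited training data (third term)
    """
    return E + A / N**alpha + B / D**beta

# At a fixed 70B model size, more tokens strictly lowers the predicted loss:
print(chinchilla_loss(70e9, 300e9))   # Gopher-scale token count
print(chinchilla_loss(70e9, 1.4e12))  # Chinchilla-scale token count
```

&lt;p data-ke-size=&quot;size16&quot;&gt;As N and D grow without bound, the prediction approaches E, i.e., only the first term remains.&lt;/p&gt;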
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Parameter fitting method&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;636&quot; data-origin-height=&quot;94&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/0J1zz/btsPjdwQdIF/hc0YR30NoybOl9TKS13O61/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/0J1zz/btsPjdwQdIF/hc0YR30NoybOl9TKS13O61/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/0J1zz/btsPjdwQdIF/hc0YR30NoybOl9TKS13O61/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F0J1zz%2FbtsPjdwQdIF%2Fhc0YR30NoybOl9TKS13O61%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;59&quot; data-origin-width=&quot;636&quot; data-origin-height=&quot;94&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Goal : optimize the five parameters A, B, E, &amp;alpha;, &amp;beta;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Method : minimize the log-difference between the losses observed in the experiments and the losses predicted by the formula above (using the L-BFGS optimizer)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; Huber loss is used because it is less sensitive to outliers than MSE, giving more stable parameter estimates&lt;/p&gt;
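&lt;p data-ke-size=&quot;size16&quot;&gt;A minimal sketch of this fitting procedure (the Huber delta, the initialization, and the synthetic data below are assumptions for illustration, not details from the post):&lt;/p&gt;

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic (N, D, observed-loss) triples generated from known constants,
# so the fit can be sanity-checked; the real runs use measured losses.
rng = np.random.default_rng(0)
N = rng.uniform(1e8, 1e10, size=50)
D = rng.uniform(1e9, 1e11, size=50)
L_obs = 1.69 + 406.4 / N**0.34 + 410.7 / D**0.28

def huber(r, delta=1e-3):
    """Huber penalty: quadratic near zero, linear for outliers."""
    return np.where(np.abs(r) <= delta,
                    0.5 * r**2,
                    delta * (np.abs(r) - 0.5 * delta))

def objective(p):
    # Parameterize A and B by their logs so they stay positive.
    logA, logB, E, alpha, beta = p
    L_pred = E + np.exp(logA) / N**alpha + np.exp(logB) / D**beta
    # Huber loss on the log-difference between predicted and observed loss
    return huber(np.log(L_pred) - np.log(L_obs)).sum()

x0 = np.array([5.0, 5.0, 1.0, 0.3, 0.3])
res = minimize(objective, x0, method="L-BFGS-B",
               bounds=[(None, None), (None, None), (0, None), (0, 1), (0, 1)])
print(res.x)  # fitted (log A, log B, E, alpha, beta)
```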
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.4 Optimal model scaling&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Common conclusion of Approaches 1, 2, 3 : as the compute budget grows, model size and training data should be increased in roughly equal proportion&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Approaches 1 and 2 : give very similar predictions for the optimal model size&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Approach 3 : suggests that at large compute budgets an even smaller model may be optimal&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Problem with current LLMs : model sizes are far too large relative to their training compute&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;rarr; at the same compute budget, shrinking the model and training on more tokens would have given better performance&lt;/p&gt;
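&lt;p data-ke-size=&quot;size16&quot;&gt;A back-of-the-envelope version of this point, using the rough approximation C &amp;asymp; 6&amp;middot;N&amp;middot;D and Gopher&amp;rsquo;s published scale (280B parameters, 300B tokens; figures from the Chinchilla paper, not this post):&lt;/p&gt;

```python
# Back-of-the-envelope illustration (assumption: training FLOPs ≈ 6 * N * D).
# Holding Gopher's budget fixed and shrinking the model to 70B parameters
# shows how many more tokens the same compute buys.

def flops(N, D):
    return 6 * N * D  # common approximation for dense transformer training

C_gopher = flops(280e9, 300e9)        # Gopher: 280B params, 300B tokens
D_small = C_gopher / (6 * 70e9)       # tokens affordable for a 70B model
print(f"{D_small / 1e12:.1f}T tokens")  # ≈ 1.2T, on the order of Chinchilla's 1.4T
```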
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;4. Chinchilla&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Goal : with the FLOPs budget fixed, verify whether a smaller model trained longer is more efficient than the existing large models&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Setup : compare Chinchilla against Gopher and other LLMs&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Total compute : matched to Gopher&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Parameters : 70B&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Training tokens : 1.4T&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;4.1 Model and training details&lt;/h4&gt;
&lt;table style=&quot;border-collapse: collapse; width: 76.2794%; height: 270px;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 12.1036%; height: 17px;&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td style=&quot;width: 40.6419%; height: 17px;&quot;&gt;Gopher&lt;/td&gt;
&lt;td style=&quot;width: 47.2544%; height: 17px;&quot;&gt;Chinchilla&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 21px;&quot;&gt;
&lt;td style=&quot;width: 12.1036%; height: 21px;&quot;&gt;Dataset&lt;/td&gt;
&lt;td style=&quot;width: 40.6419%; height: 21px;&quot;&gt;MassiveText&lt;/td&gt;
&lt;td style=&quot;width: 47.2544%; height: 21px;&quot;&gt;Same dataset, with the sampling distribution slightly adjusted to account for the increased token count&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 12.1036%; height: 17px;&quot;&gt;Optimizer&lt;/td&gt;
&lt;td style=&quot;width: 40.6419%; height: 17px;&quot;&gt;Adam&lt;/td&gt;
&lt;td style=&quot;width: 47.2544%; height: 17px;&quot;&gt;AdamW&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 17px;&quot;&gt;
&lt;td style=&quot;width: 12.1036%; height: 17px;&quot;&gt;Tokenizer&lt;/td&gt;
&lt;td style=&quot;width: 40.6419%; height: 17px;&quot;&gt;SentencePiece with NFKC normalization&lt;/td&gt;
&lt;td style=&quot;width: 47.2544%; height: 17px;&quot;&gt;Slightly modified SentencePiece without NFKC normalization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 21px;&quot;&gt;
&lt;td style=&quot;width: 12.1036%; height: 21px;&quot;&gt;Precision&lt;/td&gt;
&lt;td style=&quot;width: 40.6419%; height: 21px;&quot;&gt;bfloat16&lt;/td&gt;
&lt;td style=&quot;width: 47.2544%; height: 21px;&quot;&gt;bfloat16, but with a float32 copy of the weights kept in the optimizer state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 21px;&quot;&gt;
&lt;td style=&quot;width: 12.1036%; height: 21px;&quot;&gt;Framework&lt;/td&gt;
&lt;td style=&quot;width: 40.6419%; height: 21px;&quot;&gt;Trained with JAX and Haiku on TPUv3/TPUv4&lt;/td&gt;
&lt;td style=&quot;width: 47.2544%; height: 21px;&quot;&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;779&quot; data-origin-height=&quot;111&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bdPG0a/btsPl9e1TEG/muZE9AN9y3ZPlRvevK33i0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bdPG0a/btsPl9e1TEG/muZE9AN9y3ZPlRvevK33i0/img.png&quot; data-alt=&quot;Architecture comparison&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bdPG0a/btsPl9e1TEG/muZE9AN9y3ZPlRvevK33i0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbdPG0a%2FbtsPl9e1TEG%2FmuZE9AN9y3ZPlRvevK33i0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;779&quot; height=&quot;111&quot; data-origin-width=&quot;779&quot; data-origin-height=&quot;111&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Architecture comparison&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;788&quot; data-origin-height=&quot;194&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/seVeW/btsPknFTNke/gwKlWJUMoIBGhbQ2jPgIt1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/seVeW/btsPknFTNke/gwKlWJUMoIBGhbQ2jPgIt1/img.png&quot; data-alt=&quot;Evaluation criteria&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/seVeW/btsPknFTNke/gwKlWJUMoIBGhbQ2jPgIt1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FseVeW%2FbtsPknFTNke%2FgwKlWJUMoIBGhbQ2jPgIt1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;788&quot; height=&quot;194&quot; data-origin-width=&quot;788&quot; data-origin-height=&quot;194&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Evaluation criteria&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;4.2 Results&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Language modelling&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;641&quot; data-origin-height=&quot;331&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bG6msH/btsPlMEr53b/qIhz5wX0dKso8PrfyF060K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bG6msH/btsPlMEr53b/qIhz5wX0dKso8PrfyF060K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bG6msH/btsPlMEr53b/qIhz5wX0dKso8PrfyF060K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbG6msH%2FbtsPlMEr53b%2FqIhz5wX0dKso8PrfyF060K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;500&quot; height=&quot;258&quot; data-origin-width=&quot;641&quot; data-origin-height=&quot;331&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Chinchilla outperforms Gopher on every subset of The Pile benchmark&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- The gap is small on dm_mathematics and ubuntu_irc (possible data leakage)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Lower perplexity on Wikitext103 (Chinchilla: 7.16 vs Gopher: 7.75) &amp;rarr; evidence of a more accurate language model&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;MMLU&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;410&quot; data-origin-height=&quot;222&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bzgyPo/btsPkvKIhb2/OwnlAhCh0FyDCIofTcwxhk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bzgyPo/btsPkvKIhb2/OwnlAhCh0FyDCIofTcwxhk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bzgyPo/btsPkvKIhb2/OwnlAhCh0FyDCIofTcwxhk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbzgyPo%2FbtsPkvKIhb2%2FOwnlAhCh0FyDCIofTcwxhk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;300&quot; height=&quot;162&quot; data-origin-width=&quot;410&quot; data-origin-height=&quot;222&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;735&quot; data-origin-height=&quot;400&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Hthyv/btsPkodMhyX/3OZcdjN8vtKqBhmxj5RBOK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Hthyv/btsPkodMhyX/3OZcdjN8vtKqBhmxj5RBOK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Hthyv/btsPkodMhyX/3OZcdjN8vtKqBhmxj5RBOK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FHthyv%2FbtsPkodMhyX%2F3OZcdjN8vtKqBhmxj5RBOK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;327&quot; data-origin-width=&quot;735&quot; data-origin-height=&quot;400&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Chinchilla reaches an average accuracy of 67.6%, 7.6%p above Gopher and above even the expert-forecast accuracy for 2023 (63.4%)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Chinchilla is the only model to achieve over 90% accuracy on 4 individual tasks&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Chinchilla beats Gopher on most tasks; Gopher is better on 4 tasks and they tie on 2&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Reading comprehension&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;608&quot; data-origin-height=&quot;129&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Slnr3/btsPlu5cJ91/k8oh5Lksm5VpeRRy9pK5Y0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Slnr3/btsPlu5cJ91/k8oh5Lksm5VpeRRy9pK5Y0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Slnr3/btsPlu5cJ91/k8oh5Lksm5VpeRRy9pK5Y0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FSlnr3%2FbtsPlu5cJ91%2Fk8oh5Lksm5VpeRRy9pK5Y0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;500&quot; height=&quot;106&quot; data-origin-width=&quot;608&quot; data-origin-height=&quot;129&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- On the LAMBADA dataset, Chinchilla reaches 77.4% accuracy, above Gopher (74.5%) and MT-NLG 530B (76.6%)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- On the RACE-h and RACE-m benchmarks, Chinchilla improves accuracy by more than 10% over Gopher, a decisive lead&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; BIG-bench&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;742&quot; data-origin-height=&quot;394&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/FshSN/btsPj74ojc3/FWr68sYa3JnvUUAeOQuls1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/FshSN/btsPj74ojc3/FWr68sYa3JnvUUAeOQuls1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/FshSN/btsPj74ojc3/FWr68sYa3JnvUUAeOQuls1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FFshSN%2FbtsPj74ojc3%2FFWr68sYa3JnvUUAeOQuls1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;550&quot; height=&quot;292&quot; data-origin-width=&quot;742&quot; data-origin-height=&quot;394&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Chinchilla records an average accuracy of 65.1% on BIG-bench, 10.7%p above Gopher&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Chinchilla beats Gopher on 58 of the 62 tasks&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Gopher wins on only 4 tasks (crash_blossom, dark_humor_detection, mathematical_induction, logical_args)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Common sense&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;657&quot; data-origin-height=&quot;168&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/MtHvU/btsPlWNHwcG/BGwEuplYyXQvcgemV4J2tk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/MtHvU/btsPlWNHwcG/BGwEuplYyXQvcgemV4J2tk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/MtHvU/btsPlWNHwcG/BGwEuplYyXQvcgemV4J2tk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FMtHvU%2FbtsPlWNHwcG%2FBGwEuplYyXQvcgemV4J2tk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;500&quot; height=&quot;128&quot; data-origin-width=&quot;657&quot; data-origin-height=&quot;168&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Chinchilla generally outperforms Gopher, GPT-3, and MT-NLG 530B on common-sense benchmarks&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- On TruthfulQA, Chinchilla improves substantially over Gopher in every shot setting (0, 5, 10-shot), with a 14.1%p gain at 0-shot&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- This suggests that better modeling of the pretraining data alone can substantially improve performance&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Closed-book question answering&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;754&quot; data-origin-height=&quot;276&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b14jyK/btsPj9A9C8V/GyCseI72sm2XKlM2kmBIhk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b14jyK/btsPj9A9C8V/GyCseI72sm2XKlM2kmBIhk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b14jyK/btsPj9A9C8V/GyCseI72sm2XKlM2kmBIhk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb14jyK%2FbtsPj9A9C8V%2FGyCseI72sm2XKlM2kmBIhk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;220&quot; data-origin-width=&quot;754&quot; data-origin-height=&quot;276&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- On Natural Questions, Chinchilla records higher accuracy than Gopher (5-shot: 31.5%, 64-shot: 35.5%)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- On TriviaQA, it surpasses Gopher on both the filtered and unfiltered sets&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- On the filtered set it falls 7.9% short of the open-book SOTA, while on the unfiltered set it exceeds GPT-3&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Overall, Chinchilla also demonstrates strong closed-book QA performance&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Gender bias and toxicity&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;721&quot; data-origin-height=&quot;154&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bl0MU5/btsPkqJpDZd/CLtBMmJsLbWsiM2itlIOj0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bl0MU5/btsPkqJpDZd/CLtBMmJsLbWsiM2itlIOj0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bl0MU5/btsPkqJpDZd/CLtBMmJsLbWsiM2itlIOj0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbl0MU5%2FbtsPkqJpDZd%2FCLtBMmJsLbWsiM2itlIOj0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;700&quot; height=&quot;150&quot; data-origin-width=&quot;721&quot; data-origin-height=&quot;154&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Chinchilla exhibits less gender bias overall, though the improvement is somewhat uneven across pronoun types&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- This suggests that lower language-modeling loss or better benchmark performance does not directly correlate with the level of toxic generation; a better model does not necessarily produce more toxic output&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;5. Discussion &amp;amp; Conclusion&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Because training large models many times is prohibitively expensive, the scale of the experiments was limited&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Gains may have limits: as the token count grows, the scaling curve shows concavity in log(N)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Although this work focuses on autoregressive models, a similar trade-off between model size and data volume is expected in other modalities&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- The proposed methodology is easy to reproduce in new settings&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <author>yuha933</author>
      <guid isPermaLink="true">https://yuha933.tistory.com/14</guid>
      <comments>https://yuha933.tistory.com/14#entry14comment</comments>
      <pubDate>Wed, 16 Jul 2025 16:29:31 +0900</pubDate>
    </item>
    <item>
      <title>[LoRA] 논문 리뷰</title>
      <link>https://yuha933.tistory.com/13</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;1. Introduction&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;256&quot; data-start=&quot;23&quot;&gt;&lt;b&gt;Background&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;256&quot; data-start=&quot;38&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;126&quot; data-start=&quot;38&quot;&gt;Modern NLP applications adapt a single large pre-trained language model to a variety of downstream tasks through fine-tuning.&lt;/li&gt;
&lt;li data-end=&quot;256&quot; data-start=&quot;130&quot;&gt;However, fine-tuning updates every parameter, so a full copy of the model must be stored and deployed per task. This is inefficient in cost and management, and becomes a practical obstacle for extreme-scale models such as GPT-3 (175B).&lt;/li&gt;
&lt;li data-end=&quot;256&quot; data-start=&quot;130&quot;&gt;In short:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;256&quot; data-start=&quot;130&quot;&gt;A large model is pre-trained on general text; adapting it to each task then fine-tunes all of the original parameters on that domain's data&lt;/li&gt;
&lt;li data-end=&quot;256&quot; data-start=&quot;130&quot;&gt;The problem: with models at billions to hundreds of billions of parameters, storing and managing a fresh copy of the whole model for every fine-tuned task becomes impractical&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;529&quot; data-start=&quot;258&quot;&gt;&lt;b&gt;Limitations of existing parameter-efficient adaptation methods&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;529&quot; data-start=&quot;292&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;369&quot; data-start=&quot;292&quot;&gt;&lt;b&gt;Adapter insertion&lt;/b&gt;: trains only small task-specific modules, but deepens the network and adds inference latency.&lt;/li&gt;
&lt;li data-end=&quot;437&quot; data-start=&quot;373&quot;&gt;&lt;b&gt;Prompt/prefix tuning&lt;/b&gt;: trains only a few tokens, but reduces the usable sequence length and limits expressiveness.&lt;/li&gt;
&lt;li data-end=&quot;529&quot; data-start=&quot;441&quot;&gt;&lt;b&gt;Other low-parameter methods&lt;/b&gt;: often fail to match full fine-tuning performance, leaving a trade-off between &amp;ldquo;efficiency&amp;rdquo; and &amp;ldquo;quality&amp;rdquo;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;800&quot; data-start=&quot;531&quot;&gt;&lt;b&gt;Motivation for LoRA&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;800&quot; data-start=&quot;554&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;670&quot; data-start=&quot;554&quot;&gt;Prior work (e.g., Li et al. 2018; Aghajanyan et al. 2020) showed that over-parameterized models effectively operate on a low intrinsic dimension.&lt;/li&gt;
&lt;li data-end=&quot;800&quot; data-start=&quot;674&quot;&gt;This paper hypothesizes that the weight change ΔW accumulated during adaptation also has a low-rank structure, and proposes &lt;b&gt;Low-Rank Adaptation (LoRA)&lt;/b&gt; to learn it efficiently.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1311&quot; data-start=&quot;802&quot;&gt;&lt;b&gt;Core idea of LoRA&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1311&quot; data-start=&quot;827&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1036&quot; data-start=&quot;827&quot;&gt;The pre-trained weights W₀ are kept frozen, and the update ΔW is expressed as the product of two low-rank matrices B ∈ ℝ^(d×r) and A ∈ ℝ^(r×k): ΔW = BA, with r ≪ min(d, k).&lt;/li&gt;
&lt;li data-end=&quot;1125&quot; data-start=&quot;1040&quot;&gt;Initialization: B = 0 and A ∼ N(0, σ²), so that ΔW = 0 at the start of training.&lt;/li&gt;
&lt;li data-end=&quot;1207&quot; data-start=&quot;1129&quot;&gt;Forward pass: h = W₀x + ΔWx = W₀x + B(Ax)&lt;/li&gt;
&lt;li data-end=&quot;1311&quot; data-start=&quot;1211&quot;&gt;Switching tasks only requires swapping A and B, so storage and loading costs are tiny, and for deployment the weights can be merged as W₀ + BA, allowing use with &lt;b&gt;no inference latency&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1652&quot; data-start=&quot;1313&quot;&gt;&lt;b&gt;Key advantages of LoRA&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1652&quot; data-start=&quot;1337&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1380&quot; data-start=&quot;1337&quot;&gt;&lt;b&gt;Very few trainable parameters&lt;/b&gt;: only 0.01%∼1% of all parameters are optimized&lt;/li&gt;
&lt;li data-end=&quot;1436&quot; data-start=&quot;1384&quot;&gt;&lt;b&gt;Storage savings&lt;/b&gt;: only the per-task modules (A, B) are stored, reducing disk usage and transfer costs&lt;/li&gt;
&lt;li data-end=&quot;1496&quot; data-start=&quot;1440&quot;&gt;&lt;b&gt;Inference efficiency preserved&lt;/b&gt;: the learned low-rank matrices are merged into the pre-trained weights, adding no latency&lt;/li&gt;
&lt;li data-end=&quot;1579&quot; data-start=&quot;1500&quot;&gt;&lt;b&gt;Training-resource savings&lt;/b&gt;: with adaptive optimizers, the small set of optimized parameters cuts GPU memory and compute by up to 3×&lt;/li&gt;
&lt;li data-end=&quot;1652&quot; data-start=&quot;1583&quot;&gt;&lt;b&gt;Compatibility with other methods&lt;/b&gt;: can be combined with other parameter-efficient techniques such as prefix tuning&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1851&quot; data-start=&quot;1654&quot;&gt;&lt;b&gt;Terminology and setup&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1851&quot; data-start=&quot;1674&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1715&quot; data-start=&quot;1674&quot;&gt;Transformer layer dimension d_model&lt;/li&gt;
&lt;li data-end=&quot;1758&quot; data-start=&quot;1719&quot;&gt;Self-attention projection matrices: W_q, W_k, W_v, W_o&lt;/li&gt;
&lt;li data-end=&quot;1795&quot; data-start=&quot;1762&quot;&gt;ΔW: the weight change accumulated during adaptation&lt;/li&gt;
&lt;li data-end=&quot;1830&quot; data-start=&quot;1799&quot;&gt;Rank r: the low-rank dimension of the LoRA module&lt;/li&gt;
&lt;li data-end=&quot;1851&quot; data-start=&quot;1834&quot;&gt;Optimizer: Adam, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;2. Problem Statement&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1148&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/n8h1W/dJMcahXZ0Za/8xFfc4VTwKNpne4OpAjpt0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/n8h1W/dJMcahXZ0Za/8xFfc4VTwKNpne4OpAjpt0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/n8h1W/dJMcahXZ0Za/8xFfc4VTwKNpne4OpAjpt0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fn8h1W%2FdJMcahXZ0Za%2F8xFfc4VTwKNpne4OpAjpt0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;1148&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1148&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;3. Aren't Existing Solutions Good Enough?&lt;b&gt;&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1165&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/l46ru/dJMcabcoupm/0h44pbi3YJkOcEpM2Wnvpk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/l46ru/dJMcabcoupm/0h44pbi3YJkOcEpM2Wnvpk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/l46ru/dJMcabcoupm/0h44pbi3YJkOcEpM2Wnvpk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fl46ru%2FdJMcabcoupm%2F0h44pbi3YJkOcEpM2Wnvpk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;1165&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1165&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1160&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bZyiUD/dJMcacPVKWj/rOqEeKxOfaLuQZC4EAQiz1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bZyiUD/dJMcacPVKWj/rOqEeKxOfaLuQZC4EAQiz1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bZyiUD/dJMcacPVKWj/rOqEeKxOfaLuQZC4EAQiz1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbZyiUD%2FdJMcacPVKWj%2FrOqEeKxOfaLuQZC4EAQiz1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;1160&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1160&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;4. Our Method&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Low-Rank-Parameterized Update Matrices&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1215&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/D0SSp/dJMcafFTn8F/hR1rj3LJjZocucTgIifIS0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/D0SSp/dJMcafFTn8F/hR1rj3LJjZocucTgIifIS0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/D0SSp/dJMcafFTn8F/hR1rj3LJjZocucTgIifIS0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FD0SSp%2FdJMcafFTn8F%2FhR1rj3LJjZocucTgIifIS0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;1215&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1215&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Applying LoRA To Transformer&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1172&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/buQGLg/dJMcahDGapA/IFjXb0IywMM2BguxrZvCu1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/buQGLg/dJMcahDGapA/IFjXb0IywMM2BguxrZvCu1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/buQGLg/dJMcahDGapA/IFjXb0IywMM2BguxrZvCu1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbuQGLg%2FdJMcahDGapA%2FIFjXb0IywMM2BguxrZvCu1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;1172&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1172&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;5. Empirical Experiments&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Experimental Setup&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1192&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Txd4M/dJMcajhaBqY/cDDr6cxDJ2ugXbPEDUsqK1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Txd4M/dJMcajhaBqY/cDDr6cxDJ2ugXbPEDUsqK1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Txd4M/dJMcajhaBqY/cDDr6cxDJ2ugXbPEDUsqK1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FTxd4M%2FdJMcajhaBqY%2FcDDr6cxDJ2ugXbPEDUsqK1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;1192&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1192&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;RoBERTa &amp;amp; DeBERTa-XXL&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;963&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/scRLB/dJMcagrfAPV/8xn4XKkkczZFktyXSqOx3k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/scRLB/dJMcagrfAPV/8xn4XKkkczZFktyXSqOx3k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/scRLB/dJMcagrfAPV/8xn4XKkkczZFktyXSqOx3k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FscRLB%2FdJMcagrfAPV%2F8xn4XKkkczZFktyXSqOx3k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;963&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;963&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;GPT-2&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;866&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cCahBb/dJMcaaYRNpA/bmRGIvcNkrHHcyxl7uVa7K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cCahBb/dJMcaaYRNpA/bmRGIvcNkrHHcyxl7uVa7K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cCahBb/dJMcaaYRNpA/bmRGIvcNkrHHcyxl7uVa7K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcCahBb%2FdJMcaaYRNpA%2FbmRGIvcNkrHHcyxl7uVa7K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;866&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;866&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;GPT-3 175B&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;762&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cz1tZi/dJMcabcouDQ/kOqKe3vJDYlHzDjGKPrBaK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cz1tZi/dJMcabcouDQ/kOqKe3vJDYlHzDjGKPrBaK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cz1tZi/dJMcabcouDQ/kOqKe3vJDYlHzDjGKPrBaK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcz1tZi%2FdJMcabcouDQ%2FkOqKe3vJDYlHzDjGKPrBaK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;762&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;762&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Scalability and Task-Performance&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1089&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nSCxc/dJMcagx0ltr/z0vIhS2c2xlW9ffrM0KRj0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nSCxc/dJMcagx0ltr/z0vIhS2c2xlW9ffrM0KRj0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nSCxc/dJMcagx0ltr/z0vIhS2c2xlW9ffrM0KRj0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FnSCxc%2FdJMcagx0ltr%2Fz0vIhS2c2xlW9ffrM0KRj0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;1089&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1089&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;6. Understanding The Low-Rank Updates&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2878&quot; data-origin-height=&quot;1148&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/YRajL/dJMcaivN5ax/lMvyKGHiAn90Wg77Uj8cm1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/YRajL/dJMcaivN5ax/lMvyKGHiAn90Wg77Uj8cm1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/YRajL/dJMcaivN5ax/lMvyKGHiAn90Wg77Uj8cm1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FYRajL%2FdJMcaivN5ax%2FlMvyKGHiAn90Wg77Uj8cm1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2878&quot; height=&quot;1148&quot; data-origin-width=&quot;2878&quot; data-origin-height=&quot;1148&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;903&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/xKwpB/dJMcab4xVRt/ZSrJImb2drDoEZsjE3ZR0K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/xKwpB/dJMcab4xVRt/ZSrJImb2drDoEZsjE3ZR0K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/xKwpB/dJMcab4xVRt/ZSrJImb2drDoEZsjE3ZR0K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FxKwpB%2FdJMcab4xVRt%2FZSrJImb2drDoEZsjE3ZR0K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;903&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;903&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;677&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cDj1Ir/dJMcab4xVRQ/tB9D5ve2IhEhQCB4V1hUe0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cDj1Ir/dJMcab4xVRQ/tB9D5ve2IhEhQCB4V1hUe0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cDj1Ir/dJMcab4xVRQ/tB9D5ve2IhEhQCB4V1hUe0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcDj1Ir%2FdJMcab4xVRQ%2FtB9D5ve2IhEhQCB4V1hUe0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;677&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;677&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1158&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cCKItw/dJMcagktxMq/wazpde4dBVgqk3nYIxGUx0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cCKItw/dJMcagktxMq/wazpde4dBVgqk3nYIxGUx0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cCKItw/dJMcagktxMq/wazpde4dBVgqk3nYIxGUx0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcCKItw%2FdJMcagktxMq%2Fwazpde4dBVgqk3nYIxGUx0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;1158&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;1158&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;939&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bT4x9t/dJMcabXM3kr/CrLuwkPUTRZrfafTjSo9tK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bT4x9t/dJMcabXM3kr/CrLuwkPUTRZrfafTjSo9tK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bT4x9t/dJMcabXM3kr/CrLuwkPUTRZrfafTjSo9tK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbT4x9t%2FdJMcabXM3kr%2FCrLuwkPUTRZrfafTjSo9tK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;939&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;939&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;7. Conclusion And Future Work&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;672&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/d1d2WS/dJMcaiiiDhM/H1jubcF5QZ33eE3IIlBNJk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/d1d2WS/dJMcaiiiDhM/H1jubcF5QZ33eE3IIlBNJk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/d1d2WS/dJMcaiiiDhM/H1jubcF5QZ33eE3IIlBNJk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fd1d2WS%2FdJMcaiiiDhM%2FH1jubcF5QZ33eE3IIlBNJk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2879&quot; height=&quot;672&quot; data-origin-width=&quot;2879&quot; data-origin-height=&quot;672&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
</description>
      <author>yuha933</author>
      <guid isPermaLink="true">https://yuha933.tistory.com/13</guid>
      <comments>https://yuha933.tistory.com/13#entry13comment</comments>
      <pubDate>Tue, 8 Jul 2025 16:23:16 +0900</pubDate>
    </item>
    <item>
      <title>[RAG] Paper Review</title>
      <link>https://yuha933.tistory.com/12</link>
      <description>&lt;h3 data-ke-size=&quot;size23&quot;&gt;1. Introduction&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #dddddd;&quot;&gt;&lt;b&gt;Background of RAG&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Earlier models (e.g., GPT, BERT, T5, BART)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Pretrained neural language models&lt;/b&gt; (models that generate answers from knowledge stored implicitly in their learned parameters,&lt;br /&gt;without retrieving external documents or knowledge bases at inference time)&lt;/li&gt;
&lt;li&gt;Problems
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Their knowledge cannot easily be &lt;b&gt;expanded&lt;/b&gt; or &lt;b&gt;revised&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;They struggle to provide clear &lt;b&gt;evidence&lt;/b&gt; for their predictions&lt;/li&gt;
&lt;li&gt;They can generate &lt;b&gt;non-factual content&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Partial solutions (e.g., REALM, ORQA)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Hybrid models that combine &lt;b&gt;parametric memory&lt;/b&gt; (knowledge stored in the model's parameters) with non-parametric memory (knowledge retrieved from external documents)&lt;/li&gt;
&lt;li&gt;Representative models: REALM, ORQA (a masked language model as the parametric component + a differentiable retriever as the non-parametric component)&lt;/li&gt;
&lt;li&gt;Strengths
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Knowledge can be &lt;b&gt;revised and expanded&lt;/b&gt; (swapping the external documents updates the knowledge without retraining the model)&lt;/li&gt;
&lt;li&gt;Retrieved knowledge can be &lt;b&gt;inspected and interpreted&lt;/b&gt; (the documents used for an answer, i.e., its sources, can be shown -&amp;gt; a human can verify them)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Limitation
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Applied only to open-domain extractive QA (&lt;b&gt;returning a span found verbatim&lt;/b&gt; in a document) -&amp;gt; the language model's generation and reasoning abilities went largely unused, and the approach could not be applied to a wider range of NLP tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The solution (RAG)&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;RAG&lt;/b&gt;: brings this hybrid structure into the seq2seq architecture at the core of NLP&lt;/li&gt;
&lt;li&gt;What it fixes
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Plain seq2seq models: knowledge is frozen in the parameters (no revision, no source attribution, no up-to-date information)&lt;/li&gt;
&lt;li&gt;Earlier hybrid models: the language model's generation and reasoning abilities are underused&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;RAG thus &lt;b&gt;compensates for the weaknesses&lt;/b&gt; of both approaches&lt;/li&gt;
&lt;/ul&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #dddddd;&quot;&gt;&lt;b&gt;Structure and working principle of RAG&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1184&quot; data-origin-height=&quot;508&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/WYdJx/btsOXkPjz8X/APFKBiHNbgWWVniQqfn370/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/WYdJx/btsOXkPjz8X/APFKBiHNbgWWVniQqfn370/img.png&quot; data-alt=&quot;RAG 구조&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/WYdJx/btsOXkPjz8X/APFKBiHNbgWWVniQqfn370/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FWYdJx%2FbtsOXkPjz8X%2FAPFKBiHNbgWWVniQqfn370%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;257&quot; data-origin-width=&quot;1184&quot; data-origin-height=&quot;508&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;RAG 구조&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;RAG&lt;/b&gt;: a model that &lt;span style=&quot;background-color: #ffc1c8;&quot;&gt;&lt;b&gt;couples a pretrained generator with retrieval&lt;/b&gt;&lt;/span&gt; and fine-tunes retrieval and generation jointly, so it can consult external documents at inference time and produce more accurate, better-grounded answers&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Components
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Parametric model: a pretrained seq2seq transformer&lt;/li&gt;
&lt;li&gt;Non-parametric memory: a dense vector index of Wikipedia articles&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;How it works
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Documents are accessed via a pretrained neural retriever (DPR) (the non-parametric memory): retrieval is based on vector similarity&lt;/li&gt;
&lt;li&gt;Retrieved documents are treated as a latent variable: several documents form a probabilistic candidate set&lt;/li&gt;
&lt;li&gt;Everything is unified in one probabilistic model (retriever and generator combined probabilistically)&lt;/li&gt;
&lt;li&gt;Top-K approximation + generation
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;(1) RAG-Sequence: the entire answer is generated from &lt;b&gt;a single document&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;(2) RAG-Token: &lt;b&gt;each token may draw on a different document&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;End-to-end training: the retriever and generator are trained and fine-tuned jointly, end to end&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;RAG can be fine-tuned for any seq2seq task&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;lt;Comparison with earlier models&amp;gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Earlier models: selecting and using the documents that actually support an answer requires judging &quot;which documents matter&quot;. In other words, the model needs &lt;b&gt;an architecture that can access external memory discriminatively&lt;/b&gt;, which models such as BERT and GPT &lt;b&gt;do not support&lt;/b&gt;, so such systems had to be designed from scratch&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Their workflow
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Build a retriever architecture&lt;/li&gt;
&lt;li&gt;Train it (document similarity)&lt;/li&gt;
&lt;li&gt;Build a generator architecture&lt;/li&gt;
&lt;li&gt;Train it (sentence generation, summarization, etc.)&lt;/li&gt;
&lt;li&gt;Connect the retriever and generator, then train again&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Every new task meant rebuilding the architecture from scratch&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;&lt;span style=&quot;background-color: #dddddd;&quot;&gt;RAG's architecture&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;RAG reuses already-pretrained models: rather than designing a new architecture or training from scratch, it combines existing components and can exploit their knowledge immediately&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Structure&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Retriever (DPR)&lt;/b&gt;: already trained on Wikipedia&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Generator (BART)&lt;/b&gt;: already pretrained on a vast text corpus&lt;/li&gt;
&lt;li&gt;The retriever and generator are connected mathematically (P(y|x) = &amp;Sigma;_z P(y|x,z)P(z|x)) + light fine-tuning where needed&lt;/li&gt;
&lt;/ul&gt;
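&lt;p data-ke-size=&quot;size16&quot;&gt;The linking formula above can be sketched in a few lines of Python (a toy illustration with made-up probabilities, not code from the paper):&lt;/p&gt;

```python
# Toy sketch of RAG's marginalization: p(y|x) = sum over z of p(z|x) * p(y|x, z).
# The numbers below are invented for illustration only.

def rag_marginal(doc_probs, gen_probs):
    """doc_probs: p(z|x) per document; gen_probs: p(y|x, z) per document."""
    return sum(doc_probs[z] * gen_probs[z] for z in doc_probs)

doc_probs = {"z1": 0.7, "z2": 0.3}   # retriever scores p(z|x)
gen_probs = {"z1": 0.9, "z2": 0.2}   # generator scores p(y|x, z)
print(rag_marginal(doc_probs, gen_probs))  # 0.7*0.9 + 0.3*0.2 = 0.69
```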
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;2. Method&lt;/h3&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;2.1 Models&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;RAG-Sequence Model&lt;/b&gt;: for an input question &lt;span&gt;&lt;span&gt;x&lt;/span&gt;&lt;/span&gt;, the whole answer &lt;span&gt;&lt;span&gt;y&lt;/span&gt;&lt;/span&gt; is generated as one sequence conditioned on a single document&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Retrieve the &lt;b&gt;top-K documents&lt;/b&gt; with the retriever (DPR) (the K documents with the largest inner products, found via MIPS)&lt;/li&gt;
&lt;li&gt;Each document is treated as a latent variable&lt;/li&gt;
&lt;li&gt;Compute the output-sequence probability for each document with the generator (BART): p(y|x, z_k)&lt;/li&gt;
&lt;li&gt;Compute each document's relevance probability p(z_k|x), multiply the two, and sum over all documents (&lt;b&gt;marginalization&lt;/b&gt;)&lt;/li&gt;
&lt;/ol&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;*Marginalization: since we do not know which intermediate document z is the right one, we sum over all candidate values of z, weighted by how likely each one is&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1226&quot; data-origin-height=&quot;145&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/blNWdj/btsO0H4CU6M/xC2mD8kp2r5AzZNKCqHKo1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/blNWdj/btsO0H4CU6M/xC2mD8kp2r5AzZNKCqHKo1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/blNWdj/btsO0H4CU6M/xC2mD8kp2r5AzZNKCqHKo1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FblNWdj%2FbtsO0H4CU6M%2FxC2mD8kp2r5AzZNKCqHKo1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;71&quot; data-origin-width=&quot;1226&quot; data-origin-height=&quot;145&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
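&lt;p data-ke-size=&quot;size16&quot;&gt;Steps 1-4 above can be sketched as a toy calculation (invented numbers; real scores come from DPR and BART):&lt;/p&gt;

```python
import math

# Toy sketch of RAG-Sequence scoring: the whole answer is scored under each
# retrieved document, then marginalized over documents.

def rag_sequence(doc_probs, per_doc_token_probs):
    total = 0.0
    for z, p_z in doc_probs.items():
        # p(y|x, z): product of per-token probabilities for the full sequence
        p_y_given_z = math.prod(per_doc_token_probs[z])
        total += p_z * p_y_given_z  # weight by p(z|x) and sum (marginalize)
    return total

doc_probs = {"z1": 0.6, "z2": 0.4}
per_doc_token_probs = {"z1": [0.9, 0.8], "z2": [0.5, 0.5]}
print(rag_sequence(doc_probs, per_doc_token_probs))  # 0.6*0.72 + 0.4*0.25 = 0.532
```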
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;RAG-Token Model&lt;/b&gt;: for an input question x, the answer y is generated token by token, where each token may draw on a different document&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The procedure mirrors the RAG-Sequence model, but the marginalization above is &lt;b&gt;repeated for every token&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;678&quot; data-origin-height=&quot;98&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/AKyFF/btsOZSTpzGG/TsKTqLv8S4hR10xLN2gSiK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/AKyFF/btsOZSTpzGG/TsKTqLv8S4hR10xLN2gSiK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/AKyFF/btsOZSTpzGG/TsKTqLv8S4hR10xLN2gSiK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FAKyFF%2FbtsOZSTpzGG%2FTsKTqLv8S4hR10xLN2gSiK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;58&quot; data-origin-width=&quot;678&quot; data-origin-height=&quot;98&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
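&lt;p data-ke-size=&quot;size16&quot;&gt;The per-token marginalization can be sketched the same way (invented numbers, same toy inputs as before):&lt;/p&gt;

```python
# Toy sketch of RAG-Token scoring: the document mixture is applied per token,
# so each position can lean on a different document.

def rag_token(doc_probs, per_doc_token_probs):
    n_tokens = len(next(iter(per_doc_token_probs.values())))
    total = 1.0
    for i in range(n_tokens):
        # Mix p(y_i | x, z, previous tokens) across documents at position i
        total *= sum(doc_probs[z] * per_doc_token_probs[z][i] for z in doc_probs)
    return total

doc_probs = {"z1": 0.6, "z2": 0.4}
per_doc_token_probs = {"z1": [0.9, 0.8], "z2": [0.5, 0.5]}
print(rag_token(doc_probs, per_doc_token_probs))  # 0.74 * 0.68 = 0.5032
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Note that on the same toy inputs the two variants give different totals, because RAG-Token mixes documents inside the product while RAG-Sequence mixes outside it.&lt;/p&gt;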
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;2.2 Retriever : DPR&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;DPR&lt;/b&gt;: a bi-encoder architecture&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;d(z) = BERT_d(z): the &lt;b&gt;dense vector representation of a document&lt;/b&gt;, produced by the document encoder&lt;/li&gt;
&lt;li&gt;q(x) = BERT_q(x): the &lt;b&gt;query representation&lt;/b&gt;, produced by the query encoder&lt;/li&gt;
&lt;li&gt;Retrieval probability: how likely document z is to &lt;b&gt;help generate the correct answer&lt;/b&gt; for x&lt;br /&gt;-&amp;gt; proportional to the exponentiated inner product of d(z) and q(x)&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;
&lt;li&gt;Top-K retrieval: maximum inner product search (MIPS)
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;MIPS: represent each document z as a vector, compute its inner product with the query vector for x, and return the top-k documents with the largest values&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
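&lt;p data-ke-size=&quot;size16&quot;&gt;MIPS over a handful of 2-d vectors can be sketched as follows (invented data; real DPR uses 768-d BERT embeddings and an approximate index):&lt;/p&gt;

```python
# Toy sketch of MIPS-style retrieval: rank documents by the inner product
# q(x) . d(z) and keep the top-k.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_k_mips(query_vec, doc_vecs, k):
    ranked = sorted(doc_vecs, key=lambda z: dot(query_vec, doc_vecs[z]), reverse=True)
    return ranked[:k]

docs = {"d1": [1.0, 0.0], "d2": [0.5, 0.5], "d3": [0.0, 1.0]}
print(top_k_mips([1.0, 0.2], docs, k=2))  # ['d1', 'd2']
```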
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;2.3 Generator : BART&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Generator: BART-large, a pretrained seq2seq transformer with 400 million parameters&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Input
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;A simple concatenation of x and z&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Pretrained with a denoising objective, so robust to noisy input&lt;/li&gt;
&lt;li&gt;Performance
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;SOTA on a variety of generation tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;*Parametric memory: the BART generator's parameters &amp;theta;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;2.4 Training&lt;/h4&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Objective: minimize the negative marginal log-likelihood of the target answers&lt;/li&gt;
&lt;li&gt;Optimization: stochastic gradient descent with the Adam optimizer&lt;/li&gt;
&lt;li&gt;Fine-tuning: the document encoder and index are kept frozen; only the query encoder and BART are fine-tuned&lt;br /&gt;(∵ rebuilding the document index is costly and was not judged necessary)&lt;/li&gt;
&lt;/ul&gt;
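&lt;p data-ke-size=&quot;size16&quot;&gt;The objective can be sketched numerically (a toy illustration with made-up marginal probabilities, one per training pair):&lt;/p&gt;

```python
import math

# Toy sketch of the training objective: the negative marginal log-likelihood
# summed over training pairs (x_j, y_j), where each p(y_j|x_j) has already
# been marginalized over the retrieved documents.

def negative_marginal_log_likelihood(marginal_probs):
    return -sum(math.log(p) for p in marginal_probs)

print(negative_marginal_log_likelihood([0.69, 0.5]))  # about 1.064
```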
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;2.5 Decoding&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;RAG-Token&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- &lt;b&gt;Beam Search&lt;/b&gt;: a decoding method that generates one token at a time while keeping several high-probability candidate prefixes&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; usable whenever a per-token probability is available&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; RAG-Token yields a document-averaged probability for each token, so standard beam search applies directly&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;br /&gt;&lt;b&gt;RAG-Sequence&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;RAG-Sequence defines no single per-token distribution; &lt;b&gt;only whole sequences have a probability&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Problem: beam search accumulates probability one token at a time, but RAG-Sequence can only score a complete sentence
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Option 1. &lt;b&gt;Thorough Decoding&lt;/b&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Run beam search separately for each document z to produce a set of candidate sentences y&lt;/li&gt;
&lt;li&gt;Score every candidate under every document z, weight by each document's probability, and sum (marginalization)&lt;/li&gt;
&lt;li&gt;Pick the sentence y with the highest probability&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Option 2. &lt;b&gt;Fast Decoding&lt;/b&gt; (Option 1 is exact but far too slow)
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Assume probability 0 for any y a document's beam did not produce, and skip the extra forward passes&lt;/li&gt;
&lt;li&gt;Combine probabilities using only the y actually produced by beam search&lt;/li&gt;
&lt;li&gt;Pick the sentence y with the highest probability&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
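&lt;p data-ke-size=&quot;size16&quot;&gt;Fast decoding can be sketched with toy per-document beams (invented numbers; the candidate strings and scores are hypothetical):&lt;/p&gt;

```python
# Toy sketch of Fast Decoding for RAG-Sequence. Each document's beam search
# yields candidate sentences with p(y|x, z); a candidate missing from a beam
# is treated as probability 0 instead of being re-scored with a forward pass.

def fast_decode(doc_probs, beams):
    candidates = set()
    for beam in beams.values():
        candidates.update(beam)
    scores = {
        y: sum(doc_probs[z] * beams[z].get(y, 0.0) for z in doc_probs)
        for y in candidates
    }
    return max(scores, key=scores.get)

doc_probs = {"z1": 0.6, "z2": 0.4}
beams = {"z1": {"answer A": 0.7, "answer B": 0.2},
         "z2": {"answer B": 0.9}}
print(fast_decode(doc_probs, beams))  # answer B: 0.6*0.2 + 0.4*0.9 = 0.48 wins
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Thorough decoding would instead re-score &quot;answer A&quot; under z2 with an extra forward pass rather than assuming 0.&lt;/p&gt;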
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;3. Experiments&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Document corpus: the December 2018 Wikipedia dump (each article split into 100-word chunks, yielding about 21 million passages)&lt;/li&gt;
&lt;li&gt;Indexing: built with FAISS, using HNSW for approximate search&lt;/li&gt;
&lt;li&gt;Retrieval: the top-K relevant documents are retrieved and used (training: K = 5 or 10 / evaluation: K tuned per task)&lt;/li&gt;
&lt;/ul&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.1 Open-domain Question Answering&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Open-domain Question Answering&lt;/b&gt;: answering questions on any topic in natural language, by consulting external documents and/or generating from a model&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Objective: minimize the negative log-likelihood of the answers&lt;/li&gt;
&lt;li&gt;Baselines
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Extractive QA: extract the span containing the answer from a document (retrieval: yes, generation: no)&lt;/li&gt;
&lt;li&gt;Closed-book QA: generate the answer as RAG does, but without retrieval (retrieval: no, generation: yes)&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Datasets
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Natural Questions (NQ)&lt;/li&gt;
&lt;li&gt;TriviaQA (TQA): the TQA-Wiki test set is also evaluated (∵ for a fair comparison with T5)&lt;/li&gt;
&lt;li&gt;WebQuestions (WQ): initialized from the RAG model trained on NQ (∵ the dataset is small)&lt;/li&gt;
&lt;li&gt;CuratedTrec (CT): initialized from the RAG model trained on NQ&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Splits: the same train/dev/test splits as prior work&lt;/li&gt;
&lt;li&gt;Metric: Exact Match (EM)&lt;/li&gt;
&lt;/ul&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.2 Abstractive Question Answering&lt;/h4&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Abstractive Question Answering&lt;/b&gt;: answering questions with free-form generated sentences&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Comparison
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;MSMARCO-based models
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Some questions are hard to answer correctly without the gold passages&lt;/li&gt;
&lt;li&gt;Some questions cannot be answered from Wikipedia alone&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;In those cases RAG relies on its parametric memory to generate the answer&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Dataset
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;MSMARCO NLG v2.1
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;A question&lt;/li&gt;
&lt;li&gt;Ten gold passages retrieved by a search engine for each question&lt;/li&gt;
&lt;li&gt;A full-sentence answer annotated from those passages&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Setup
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The gold passages are not used; only the questions and answers are&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.3 Jeopardy Question Generation&lt;/h4&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Jeopardy Question Generation&lt;/b&gt;: given an answer entity, generate a fact-based Jeopardy-style question about it&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Purpose: evaluate RAG's natural-language generation ability&lt;/li&gt;
&lt;li&gt;Dataset
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;SearchQA (train: 100K, dev: 14K, test: 27K)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Baseline
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;BART&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Evaluation
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Automatic: Q-BLEU-1
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Places extra weight on how well key entities are matched&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Human evaluation
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Factuality: can the statement be backed by trustworthy external sources?&lt;/li&gt;
&lt;li&gt;Specificity: is there high mutual dependence between input and output?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Protocol: pairwise comparison
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Annotators see one question generated by BART and one by RAG&lt;br /&gt;-&amp;gt; they pick one of: question A is better / question B is better / both are good / neither is good&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;3.4 Fact Verification&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;FEVER&lt;/b&gt;: classify whether a natural-language claim is supported by Wikipedia, refuted by it, or whether there is not enough information to decide&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Purpose: evaluate RAG's classification ability&lt;/li&gt;
&lt;li&gt;Setup
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;3-way task: supported / refuted / not enough info&lt;/li&gt;
&lt;li&gt;2-way task: supported / refuted&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Unlike prior models, RAG trains without supervision over the retrieved documents&lt;br /&gt;-&amp;gt; broader applicability&lt;/li&gt;
&lt;li&gt;Metric: label accuracy&lt;/li&gt;
&lt;/ul&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;4. Results&lt;/h3&gt;
&lt;h4 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size20&quot;&gt;4.1 Open-domain Question Answering&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1340&quot; data-origin-height=&quot;759&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/5Owpe/btsO0EGQaio/E1XKErfMcY0lPZyGeNOk71/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/5Owpe/btsO0EGQaio/E1XKErfMcY0lPZyGeNOk71/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/5Owpe/btsO0EGQaio/E1XKErfMcY0lPZyGeNOk71/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F5Owpe%2FbtsO0EGQaio%2FE1XKErfMcY0lPZyGeNOk71%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;340&quot; data-origin-width=&quot;1340&quot; data-origin-height=&quot;759&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;SoTA without reranking documents or extracting specific spans&lt;/li&gt;
&lt;li&gt;RAG performs strongly without expensive specialized pretraining&lt;/li&gt;
&lt;li&gt;It can still generate plausible answers even when no retrieved document literally contains the answer&lt;/li&gt;
&lt;/ul&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;4.2 Abstractive Question Answering&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1043&quot; data-origin-height=&quot;403&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/JMnfV/btsOZF0Qfhj/xxdtDHVM0ipLRPe4SxeINk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/JMnfV/btsOZF0Qfhj/xxdtDHVM0ipLRPe4SxeINk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/JMnfV/btsOZF0Qfhj/xxdtDHVM0ipLRPe4SxeINk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FJMnfV%2FbtsOZF0Qfhj%2FxxdtDHVM0ipLRPe4SxeINk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;232&quot; data-origin-width=&quot;1043&quot; data-origin-height=&quot;403&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Noteworthy conditions&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The competing models train and generate with access to the gold passages&lt;/li&gt;
&lt;li&gt;Many questions are practically unanswerable without the gold passages&lt;/li&gt;
&lt;li&gt;Some questions cannot be answered from Wikipedia at all&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; Despite all this, RAG performs strongly&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1060&quot; data-origin-height=&quot;463&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/qRHBb/btsO0Mkkr8G/NRsijL2uxf3nphJJtoqk5k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/qRHBb/btsO0Mkkr8G/NRsijL2uxf3nphJJtoqk5k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/qRHBb/btsO0Mkkr8G/NRsijL2uxf3nphJJtoqk5k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FqRHBb%2FbtsO0Mkkr8G%2FNRsijL2uxf3nphJJtoqk5k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;463&quot; data-origin-width=&quot;1060&quot; data-origin-height=&quot;463&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;RAG generates factually accurate text more often than BART&lt;/li&gt;
&lt;li&gt;RAG generates more diverse sentences than BART&lt;/li&gt;
&lt;/ul&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;4.3 Jeopardy Question Generation&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;966&quot; data-origin-height=&quot;432&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/XaGK4/btsO0eaBEwa/X8sm9tn2tVvKT1NuMFkOn0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/XaGK4/btsO0eaBEwa/X8sm9tn2tVvKT1NuMFkOn0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/XaGK4/btsO0eaBEwa/X8sm9tn2tVvKT1NuMFkOn0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FXaGK4%2FbtsO0eaBEwa%2FX8sm9tn2tVvKT1NuMFkOn0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;268&quot; data-origin-width=&quot;966&quot; data-origin-height=&quot;432&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;564&quot; data-origin-height=&quot;356&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cbtwgk/btsO1JHsWGM/xxTy4fCIrH4QKHw7FC4Nxk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cbtwgk/btsO1JHsWGM/xxTy4fCIrH4QKHw7FC4Nxk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cbtwgk/btsO1JHsWGM/xxTy4fCIrH4QKHw7FC4Nxk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcbtwgk%2FbtsO1JHsWGM%2FxxTy4fCIrH4QKHw7FC4Nxk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;350&quot; height=&quot;221&quot; data-origin-width=&quot;564&quot; data-origin-height=&quot;356&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;974&quot; data-origin-height=&quot;497&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/vKxfu/btsOZ9UT4LU/cMTcVFB3Pn42TqpzJcmoF1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/vKxfu/btsOZ9UT4LU/cMTcVFB3Pn42TqpzJcmoF1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/vKxfu/btsOZ9UT4LU/cMTcVFB3Pn42TqpzJcmoF1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FvKxfu%2FbtsOZ9UT4LU%2FcMTcVFB3Pn42TqpzJcmoF1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;306&quot; data-origin-width=&quot;974&quot; data-origin-height=&quot;497&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1028&quot; data-origin-height=&quot;603&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b58lLk/btsOZkCUzCO/aQVDf45MX0kLBLL5lbOeak/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b58lLk/btsOZkCUzCO/aQVDf45MX0kLBLL5lbOeak/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b58lLk/btsOZkCUzCO/aQVDf45MX0kLBLL5lbOeak/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb58lLk%2FbtsOZkCUzCO%2FaQVDf45MX0kLBLL5lbOeak%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;352&quot; data-origin-width=&quot;1028&quot; data-origin-height=&quot;603&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;However, BART cannot complete the content from its intrinsic knowledge alone, without grounding in a specific document!&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;e.g. BART: &lt;span style=&quot;background-color: #dddddd;&quot;&gt;&quot;The Sun Also Rises&quot; is a novel by this author of &quot;The Sun Also Rises&quot;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #333333; text-align: start;&quot;&gt;e.g. RAG:&lt;/span&gt; &lt;span style=&quot;background-color: #dddddd;&quot;&gt;&quot;The Sun Also Rises&quot; is a novel by this author of &quot;A Farewell to Arms&quot;&lt;/span&gt;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;4.4 Fact Verification&lt;/h4&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1070&quot; data-origin-height=&quot;845&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cyHzLO/btsOZgNZp4g/RwKB9cKqPBcysifFR1LW0k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cyHzLO/btsOZgNZp4g/RwKB9cKqPBcysifFR1LW0k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cyHzLO/btsOZgNZp4g/RwKB9cKqPBcysifFR1LW0k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcyHzLO%2FbtsOZgNZp4g%2FRwKB9cKqPBcysifFR1LW0k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;474&quot; data-origin-width=&quot;1070&quot; data-origin-height=&quot;845&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Probability that the top-ranked retrieved document is the gold document: 71%&lt;br /&gt;-&amp;gt; RAG usually ranks the gold document first&lt;/li&gt;
&lt;li&gt;Rate at which the gold document appears in the top 10 retrieved results: 90%&lt;br /&gt;-&amp;gt; the gold document is almost always among the top retrieved results&lt;/li&gt;
&lt;/ul&gt;
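The two statistics above are recall@1 and recall@10. A minimal sketch of how such numbers are computed (the function and variable names here are illustrative, not from the paper's code):

```python
# Toy recall@k computation, mirroring the fact-verification retrieval analysis.
# `results` pairs each query's ranked document ids with its gold document id.

def recall_at_k(results, k):
    """Fraction of queries whose gold document appears in the top-k list."""
    hits = sum(1 for ranked_ids, gold_id in results if gold_id in ranked_ids[:k])
    return hits / len(results)

# three toy queries: gold ranked 1st, gold ranked 2nd, gold not retrieved at all
results = [
    (["d1", "d2", "d3"], "d1"),
    (["d5", "d4", "d9"], "d4"),
    (["d7", "d8", "d6"], "d0"),
]
print(recall_at_k(results, 1))  # 1/3 of queries have the gold document at rank 1
print(recall_at_k(results, 2))  # 2/3 within the top 2
```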
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style6&quot; /&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;4.5 Additional Results&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Generation Diversity&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;757&quot; data-origin-height=&quot;547&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ppKX4/btsO0FFKPBY/vdkTwkCZxHaHq2JG0IGfrk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ppKX4/btsO0FFKPBY/vdkTwkCZxHaHq2JG0IGfrk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ppKX4/btsO0FFKPBY/vdkTwkCZxHaHq2JG0IGfrk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FppKX4%2FbtsO0FFKPBY%2FvdkTwkCZxHaHq2JG0IGfrk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;400&quot; height=&quot;289&quot; data-origin-width=&quot;757&quot; data-origin-height=&quot;547&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Retrieval Ablations&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1059&quot; data-origin-height=&quot;916&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/zulH2/btsOZBxzIfS/IaI9vfxmyfh3afoAoGhfi0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/zulH2/btsOZBxzIfS/IaI9vfxmyfh3afoAoGhfi0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/zulH2/btsOZBxzIfS/IaI9vfxmyfh3afoAoGhfi0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FzulH2%2FbtsOZBxzIfS%2FIaI9vfxmyfh3afoAoGhfi0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;519&quot; data-origin-width=&quot;1059&quot; data-origin-height=&quot;916&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Index hot-swapping&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Advantage of a nonparametric memory model: its knowledge can easily be updated, even at test time&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Experiment&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Indexes used
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;An index built from the December 2016 Wikipedia dump&lt;/li&gt;
&lt;li&gt;An index built from the December 2018 Wikipedia dump&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Questions
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Independent variable: the subject of the question (82 world leaders who changed between 2016 and 2018)&lt;/li&gt;
&lt;li&gt;Controlled variable: the question template (&quot;Who is {}?&quot;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Results
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;2016 questions with the 2016 index: 70% accuracy&lt;/li&gt;
&lt;li&gt;2018 questions with the 2018 index: 68% accuracy&lt;/li&gt;
&lt;li&gt;2016 questions with the 2018 index: 12% accuracy&lt;/li&gt;
&lt;li&gt;2018 questions with the 2016 index: 4% accuracy&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;RAG can be updated simply by swapping out its nonparametric memory!&lt;/p&gt;
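The hot-swap can be sketched in a few lines: the parametric model stays fixed while the document index is replaced. Everything below (the `DenseIndex` class, the toy embeddings and documents) is a hypothetical illustration, not the paper's actual code:

```python
# Hot-swapping the nonparametric memory: same model, different index.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class DenseIndex:
    """Toy inner-product (MIPS) retriever over document embeddings."""
    def __init__(self, doc_vecs, docs):
        self.doc_vecs, self.docs = doc_vecs, docs
    def top1(self, query_vec):
        best = max(range(len(self.docs)),
                   key=lambda i: dot(self.doc_vecs[i], query_vec))
        return self.docs[best]

def answer(query_vec, index):
    # A real RAG generator would condition on the retrieved text;
    # returning the top document is enough to show the swap.
    return index.top1(query_vec)

# Two snapshots of world knowledge; the "model" (embeddings, generator) is unchanged.
index_2016 = DenseIndex([[1.0, 0.0], [0.0, 1.0]],
                        ["Barack Obama is the US president",
                         "Angela Merkel is the German chancellor"])
index_2018 = DenseIndex([[1.0, 0.0], [0.0, 1.0]],
                        ["Donald Trump is the US president",
                         "Angela Merkel is the German chancellor"])

query = [1.0, 0.0]  # toy embedding of "Who is the US president?"
print(answer(query, index_2016))  # Barack Obama is the US president
print(answer(query, index_2018))  # Donald Trump is the US president
```

Swapping `index_2016` for `index_2018` changes the answer with no retraining, which is exactly what the experiment above measures.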
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Effect of Retrieving More Documents&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1056&quot; data-origin-height=&quot;494&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/KgxOy/btsO0BJ8tlA/teOsK0bnzXhAWxkhV9cag0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/KgxOy/btsO0BJ8tlA/teOsK0bnzXhAWxkhV9cag0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/KgxOy/btsO0BJ8tlA/teOsK0bnzXhAWxkhV9cag0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FKgxOy%2FbtsO0BJ8tlA%2FteOsK0bnzXhAWxkhV9cag0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;600&quot; height=&quot;281&quot; data-origin-width=&quot;1056&quot; data-origin-height=&quot;494&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;5. Related Work&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Single-Task Retrieval&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Prior work: showed that retrieval improves performance on a variety of NLP tasks, each considered in isolation&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;RAG: unifies these task-specific successes, showing that a single retriever-based architecture performs strongly across multiple tasks&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; General-Purpose Architectures for NLP&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Prior work: achieved strong performance by fine-tuning pretrained models, without any retrieval&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;RAG: shows that learning a retrieval module to augment a pretrained generative language model lets a single unified architecture cover even more tasks&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Learned Retrieval&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Prior work: achieved strong single-task performance with a variety of retrieval-based architectures and optimization techniques&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;RAG: shows that one retrieval-based architecture can be fine-tuned to perform well across many tasks&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Memory-based Architectures&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Prior work: uses a document index of distributed representations, generating dialogue from learned embeddings, or retrieves with TF-IDF&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;RAG: grounds its memory in raw text, making it human-readable and editable, and uses a retriever trained end-to-end&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt; Retrieve-and-Edit approaches&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Prior work: retrieves a similar input-output pair and edits it to produce the final output&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;RAG: rather than lightly editing a single retrieved item, aggregates multiple retrieved documents during generation&lt;/p&gt;
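This aggregation is the marginalization at the heart of RAG: the probability of an output sums the generator's probability under each retrieved document, weighted by that document's retrieval probability, p(y|x) ≈ Σ_z p(z|x)·p(y|x,z). A toy sketch with made-up probabilities:

```python
# Marginalizing one candidate output y over k retrieved documents z.

def marginalize(doc_probs, gen_probs):
    """p(y|x) ≈ sum_z p(z|x) * p(y|x, z) over the top-k documents."""
    return sum(p_z * p_y for p_z, p_y in zip(doc_probs, gen_probs))

doc_probs = [0.6, 0.3, 0.1]  # p(z|x): retriever scores for 3 documents
gen_probs = [0.9, 0.2, 0.5]  # p(y|x,z): generator's probability of y given each
print(marginalize(doc_probs, gen_probs))  # 0.6*0.9 + 0.3*0.2 + 0.1*0.5 ≈ 0.65
```

Because all retrieved documents contribute, a document that supports y strongly (0.9 above) still dominates even when the others disagree.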
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;6. Discussion&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Proposes a &lt;b&gt;hybrid memory architecture&lt;/b&gt; combining a parametric model with a nonparametric memory&amp;nbsp;&lt;/li&gt;
&lt;li&gt;The retrieval index can be swapped &lt;b&gt;without retraining&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;Suggests jointly pretraining the generator and retriever from scratch as a future research direction&lt;/li&gt;
&lt;/ul&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Broader Impact&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Positive&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Grounded in a real knowledge base, it produces &lt;b&gt;fewer hallucinations and more factual, interpretable outputs&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;High potential for societal benefit, e.g. application across many domains and open-domain question answering&amp;nbsp;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Negative&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The &lt;b&gt;knowledge source has limits&lt;/b&gt;: external knowledge can be biased or inaccurate&lt;/li&gt;
&lt;li&gt;There is &lt;b&gt;potential for misuse&lt;/b&gt; (generating false or manipulated information, impersonation, automated spam and phishing)&lt;/li&gt;
&lt;/ul&gt;
</description>
      <author>yuha933</author>
      <guid isPermaLink="true">https://yuha933.tistory.com/12</guid>
      <comments>https://yuha933.tistory.com/12#entry12comment</comments>
      <pubDate>Mon, 30 Jun 2025 12:26:09 +0900</pubDate>
    </item>
  </channel>
</rss>