Sen(Qian)’s Memo

This website is a memo by Donglin Qian (Torin Sen), mainly about machine learning papers and competitive programming.

Case-Control 2/2

2024-05-23

PU Cost-Sensitive SAR Bias Case-Control Paper

2019-ICLR-[PUSB]Learning from Positive and Unlabeled Data with a Selection Bias

→Read more

2024-05-22

PU Theoretical Analysis Bias SCAR Single-Training-Set Case-Control SAR EM-Algorithm Paper

2019-ECML PKDD-[PWE]Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data

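A paper that analyzes PU learning under bias mathematically and also proposes a method. It introduces a quantity called the propensity score and shows that using it as a weight in the loss function accounts for the bias. Once the propensity score is built into the risk expression, there are two unknowns to estimate (the propensity score and the classifier itself), so the paper optimizes them alternately with the EM algorithm.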
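As a rough illustration of the weighting idea in the summary above, here is a minimal sketch of a propensity-weighted risk. It assumes the log loss and the common weighting form in which a labeled example counts as a positive with weight 1/e and as a negative with weight 1 - 1/e; the function name `propensity_weighted_log_loss` and its arguments are hypothetical, and the EM alternation itself is not shown.

```python
import numpy as np

def propensity_weighted_log_loss(y_prob, s, e, eps=1e-12):
    """Propensity-weighted empirical risk with log loss (illustrative sketch).

    y_prob : predicted P(y = 1 | x) for each example
    s      : 1 if the example is labeled (a known positive), 0 if unlabeled
    e      : propensity score P(s = 1 | y = 1, x); must be > 0 for every example
    """
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    s = np.asarray(s, dtype=float)
    inv_e = 1.0 / np.asarray(e, dtype=float)

    loss_pos = -np.log(y_prob)        # cost of treating the example as positive
    loss_neg = -np.log(1.0 - y_prob)  # cost of treating the example as negative

    # Labeled examples: weight 1/e as positive, (1 - 1/e) as negative.
    labeled = s * (inv_e * loss_pos + (1.0 - inv_e) * loss_neg)
    # Unlabeled examples: treated as negative with weight 1.
    unlabeled = (1.0 - s) * loss_neg
    return float(np.mean(labeled + unlabeled))
```

With every propensity equal to 1, this reduces to the plain log loss that treats unlabeled examples as negatives; smaller propensities up-weight the labeled positives.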

→Read more

2024-05-21

PU Cost-Sensitive Case-Control Gradient Ascent Paper

2017-NIPS-[nnPU] Positive-Unlabeled Learning with Non-Negative Risk Estimator

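Clipping the empirical risk in the PU training objective so that it does not fall below a fixed value works well. In practice, when it does fall below that value, the term that caused the overall loss to go negative (see the main text) is extracted, and gradient ascent is performed with its gradient, which prevents overfitting.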
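The clipping described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's code: it uses the sigmoid loss, takes the class prior `prior` as known, and the names `nnpu_risk` and `beta` are hypothetical. It only computes the quantities; the optimizer step is left out.

```python
import numpy as np

def sigmoid_loss(z, y):
    # l(z, y) = 1 / (1 + exp(y * z)); small when the sign of z agrees with y
    return 1.0 / (1.0 + np.exp(y * z))

def nnpu_risk(scores_p, scores_u, prior, beta=0.0):
    """Return (clipped_risk, objective_to_minimize, corrected).

    scores_p : classifier outputs on positive data
    scores_u : classifier outputs on unlabeled data
    prior    : class prior P(y = +1), assumed known
    beta     : threshold below which the negative part is corrected
    """
    pos_as_pos = prior * sigmoid_loss(scores_p, +1).mean()
    pos_as_neg = prior * sigmoid_loss(scores_p, -1).mean()
    unl_as_neg = sigmoid_loss(scores_u, -1).mean()

    # Estimated risk on negatives; it can go negative when the model overfits.
    neg_part = unl_as_neg - pos_as_neg
    clipped_risk = pos_as_pos + max(0.0, neg_part)

    if neg_part < -beta:
        # Minimizing -neg_part is gradient ascent on the offending term.
        return clipped_risk, -neg_part, True
    return clipped_risk, pos_as_pos + neg_part, False
```

In a training loop, the second return value would be the quantity passed to the optimizer, so the correction step only touches the term that went negative.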

→Read more

2024-05-12

PU Theoretical Analysis Case-Control Statistical Machine Learning Paper

2016-NIPS-Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning

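Why does PU learning sometimes outperform PN learning? This paper gives the theoretical conditions under which that happens, which can be shown by drawing heavily on statistical learning theory. It also examines when performance improves by analyzing ratios. In particular, when unlabeled data can be obtained without limit, the error upper bound converges faster in theory with PU or NU learning than with PN learning.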

→Read more

2024-05-11

PU Multi-Label Case-Control SCAR Ranking Single-Training-Set Paper

2016-CVPR-Multi-label Ranking from Positive and Unlabeled Data

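Proposes a framework for PU learning in the multi-label setting. Each sample carries multiple labels, but labels other than the ones attached are not necessarily absent. Under this condition, the rank loss is formulated with the ramp loss, as in PU2014. The paper also derives the objective function used when computing with PU data (this matters quite a bit in the multi-label case, so it may be worth a look).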
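For reference, a small sketch of the ramp loss mentioned above. It assumes the usual form l(z) = 0.5 * max(0, min(2, 1 - z)); the function name `ramp_loss` is hypothetical, and the multi-label rank loss itself is not reproduced here. The property that makes this loss convenient for PU objectives is the symmetry l(z) + l(-z) = 1.

```python
import numpy as np

def ramp_loss(z):
    # Ramp loss: 1 for z <= -1, 0 for z >= 1, linear in between.
    z = np.asarray(z, dtype=float)
    return 0.5 * np.maximum(0.0, np.minimum(2.0, 1.0 - z))

# Check the symmetry l(z) + l(-z) = 1 on a few margins.
margins = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
symmetric = np.allclose(ramp_loss(margins) + ramp_loss(-margins), 1.0)
```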

→Read more

2023-12-20

PU PNU Cost-Sensitive Single-Training-Set Case-Control Resampling Survey Paper

2020-Survey-Learning from positive and unlabeled data: a survey

→Read more