Sveriges mest populära poddar
Linear Digressions

Data Contamination

21 min2 maj 2016
Supervised machine learning assumes that the features and labels used for building a classifier are isolated from each other--basically, that you can't cheat by peeking. Turns out this can be easier said than done. In this episode, we'll talk about the many (and diverse!) cases where label information contaminates features, ruining data science competitions along the way. Relevant links: https://www.researchgate.net/profile/Claudia_Perlich/publication/221653692_Leakage_in_data_mining_Formulation_detection_and_avoidance/links/54418bb80cf2a6a049a5a0ca.pdf

Linear Digressions med Katie Malone finns tillgänglig på flera plattformar. Informationen på denna sida kommer från offentliga podd-flöden.