Talk by Professor Pavlos Protopapas, Harvard University

Monday, January 8, 15:00 hrs.

Dear colleague,

On behalf of the Departamento de Ingeniería Informática y Ciencias de la Computación (Department of Computer Science) of the FI UdeC (Facultad de Ingeniería, Universidad de Concepción), it is my great pleasure to invite you to a talk by Professor Pavlos Protopapas, Scientific Director of the Master's program in Data Science at Harvard University.

The talk will take place on Monday, January 8, at 15:00 hrs. in the Auditorio Salvador Gálvez of the Facultad de Ingeniería.

Pavlos will be visiting us as part of the "Harvard-Chile Data Science School".

Guillermo Cabrera-Vives
Assistant Professor | Department of Computer Science | Universidad de Concepción

*If you are interested, please register here: https://goo.gl/ycWnQn

Generating training data for a task like classification often involves surveying a panel of annotators. By obtaining multiple votes on an item's true label, we hope to harness the collective wisdom of a crowd of individuals, each of whom is subject to error. Probabilistic annotation models show that by simultaneously inferring each annotator's skill by category and weighting votes accordingly, one can recover true labels more accurately than by simple majority rule. In this talk, I will present extensions to these annotation models that experiment with different types of queries and acknowledge that annotators are not only differentially skilled but also in flux. In particular, we focus on three scenarios: binary questions, where annotators are presented with binary queries; fatigue, where annotator credibility deteriorates over time; and insight, where an annotator experiences a flash of understanding about the task after sufficient practice. We ran extensive experiments with both synthetic and human annotators. We find that a mixture of supervised and unsupervised approaches yields extremely good results and that binary questions are more efficient. Finally, we find that our model reaches performance similar to standard baselines while converging faster and requiring lower-cost questions and less expert effort. We confirm that probabilistically modeling the path of credibility yields more accurate recovered labels than majority rule on synthetic data, and we motivate further research on different forms of annotator flux.
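To make the contrast with majority rule concrete, here is a minimal Python (3.9+) sketch of skill-weighted voting for binary labels. It is not the speaker's model: it assumes a single accuracy per annotator (rather than skill by category), estimates that accuracy from a small gold-labelled set instead of inferring it jointly, and weights each vote by the log-odds log(p / (1 - p)), which is the optimal decision rule when annotators err independently and both labels are equally likely a priori. The function names, annotator ids and toy data are hypothetical.

```python
import math

# Hypothetical vote format: item id -> list of (annotator id, label in {0, 1}).
Votes = dict[str, list[tuple[str, int]]]


def majority_vote(votes: Votes) -> dict[str, int]:
    """Simple majority rule; ties are broken toward label 1."""
    return {
        item: int(2 * sum(label for _, label in vs) >= len(vs))
        for item, vs in votes.items()
    }


def estimate_accuracy(votes: Votes, gold: dict[str, int]) -> dict[str, float]:
    """Per-annotator accuracy on gold-labelled items, with add-one (Laplace)
    smoothing so no estimate is exactly 0 or 1."""
    correct: dict[str, int] = {}
    total: dict[str, int] = {}
    for item, true_label in gold.items():
        for annotator, label in votes.get(item, []):
            correct[annotator] = correct.get(annotator, 0) + int(label == true_label)
            total[annotator] = total.get(annotator, 0) + 1
    return {a: (correct[a] + 1) / (total[a] + 2) for a in total}


def weighted_vote(votes: Votes, accuracy: dict[str, float]) -> dict[str, int]:
    """Each vote adds log(p / (1 - p)) toward the label it names, where p is the
    annotator's estimated accuracy. Annotators without an estimate get p = 0.5,
    i.e. weight 0. Ties are broken toward label 1, as in majority_vote."""
    result: dict[str, int] = {}
    for item, vs in votes.items():
        score = 0.0  # positive favours label 1, negative favours label 0
        for annotator, label in vs:
            p = accuracy.get(annotator, 0.5)
            weight = math.log(p / (1 - p))
            score += weight if label == 1 else -weight
        result[item] = int(score >= 0)
    return result


# Hypothetical toy data: ann1 is reliable, ann2 is decent, ann3 is guessing.
votes: Votes = {
    "g1": [("ann1", 1), ("ann2", 1), ("ann3", 0)],
    "g2": [("ann1", 0), ("ann2", 0), ("ann3", 1)],
    "g3": [("ann1", 1), ("ann2", 1), ("ann3", 1)],
    "g4": [("ann1", 0), ("ann2", 1), ("ann3", 0)],
    "x": [("ann1", 1), ("ann2", 0), ("ann3", 0)],  # no gold label
}
gold = {"g1": 1, "g2": 0, "g3": 1, "g4": 0}

accuracy = estimate_accuracy(votes, gold)
print(majority_vote(votes)["x"], weighted_vote(votes, accuracy)["x"])  # prints: 0 1
```

On item `x`, majority rule returns 0 because ann2 and ann3 outvote ann1, while the weighted rule returns 1: ann3's weight is log 1 = 0, and ann1's weight (log 5) outweighs ann2's (log 2). In a fully unsupervised setting, the accuracies would instead be re-estimated iteratively (for example, with an EM algorithm in the style of Dawid and Skene) rather than read off a gold set.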

