Apr 11, 2022

Unreasonable Data

In this paper from 2009, Norvig, Halevy, and Pereira discuss challenging problems with NLP datasets. Other researchers have revisited this work over the years. One present-day result may be the creation of a new industry focused on generating large synthetic datasets for ML modeling.

#datasets #syntheticdata #syntheticsemantics #NLP #representationlearning #pretraining #bigdata #ml #data #ai


https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf



