In this 2009 paper, Norvig, Halevy, and Pereira discuss challenging problems with NLP datasets. Other researchers have revisited this work over the years. One present-day outcome may be the creation of a new industry focused on generating large-scale synthetic data for ML modeling.
#datasets #syntheticdata #syntheticsemantics #NLP #representationlearning #pretraining #bigdata #ml #data #ai
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf