Pay Attention
the rapid evolution of natural language processing and where it gets stuck
DOI:
https://doi.org/10.35699/2965-6931.2023.47510
Keywords:
language models, deep learning, natural language processing, artificial intelligence
Abstract
This article analyzes, in an informal tone, the evolution of attention-based models in Natural Language Processing (NLP), starting in 2003 and culminating in the "transformer" architectures we have known since 2017. We explain how pre-training enabled transformers to solve an important benchmark for commonsense reasoning in Artificial Intelligence. We also investigate the parallel between the concept of "gist" ("what really matters") in human language understanding, as proposed by Roger Schank, a veteran of NLP, and the "embeddings" now used in machine learning. At the end of the article, we discuss a well-known problem with these models, the so-called "hallucinations". This phenomenon highlights the models' struggle to tell fact from fiction and calls for further research to mitigate its impact. We frame this issue in the context of David Lewis's work, arguing that it represents a fundamental challenge for language models.
References
BAHDANAU, D.; CHO, K.; BENGIO, Y. Neural Machine Translation by Jointly Learning to Align and Translate. 2016. Available at: https://arxiv.org/abs/1409.0473.
BENGIO, Y.; DUCHARME, R.; VINCENT, P.; JANVIN, C. A neural probabilistic language model. Journal of Machine Learning Research, v. 3, p. 1137-1155, 2003.
BENNETT, S.W.; AONE, C.; LOVELL, C. Learning to tag multilingual texts through observation. In: SECOND CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2., 1997, Providence. Proceedings. Providence: Morgan Kaufmann Publishers, Inc, 1997. p. 109-116.
BOWMAN, S. R.; ANGELI, G.; POTTS, C.; MANNING, C. D. A large annotated corpus for learning natural language inference. 2015. Available at: http://arxiv.org/abs/1508.05326.
BROWN, T. B.; MANN, B.; RYDER, N.; SUBBIAH, M.; KAPLAN, J.; DHARIWAL, P.; NEELAKANTAN, A.; SHYAM, P.; SASTRY, G.; ASKELL, A.; AGARWAL, S.; HERBERT-VOSS, A.; KRUEGER, G.; HENIGHAN, T.; CHILD, R.; RAMESH, A.; ZIEGLER, D. M.; WU, J.; WINTER, C.; HESSE, C.; CHEN, M.; SIGLER, E.; LITWIN, M.; GRAY, S.; CHESS, B.; CLARK, J.; BERNER, C.; MCCANDLISH, S.; RADFORD, A.; SUTSKEVER, I.; AMODEI, D. Language models are few-shot learners. ArXiv:2005.14165, 2020.
DAGAN, Ido. Recognizing textual entailment: Rational, evaluation and approaches. Natural Language Engineering, v. 15, n. 4, p. i-xvii, 2009.
DAVIS, Ernest. Winograd schemas and machine translation. 2016. Available at: https://arxiv.org/abs/1608.01884.
DAVIS, Ernest; MORGENSTERN, Leora; ORTIZ, Charles L. The first Winograd Schema Challenge at IJCAI-16. AI Magazine, v. 38, n. 3, p. 97-98, 2017.
DEVLIN, J.; CHANG, M. W.; LEE, K.; TOUTANOVA, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018. Available at: https://arxiv.org/abs/1810.04805.
GURURANGAN, Suchin; SWAYAMDIPTA, Swabha; LEVY, Omer; SCHWARTZ, Roy; BOWMAN, Samuel R.; SMITH, Noah A. Annotation Artifacts in Natural Language Inference Data. 2018. Available at: https://arxiv.org/abs/1803.02324.
LEVESQUE, Hector. The Winograd Schema Challenge. In: AAAI SPRING SYMPOSIUM, 2011, Palo Alto. Proceedings. Palo Alto: AAAI, 2011.
LEVESQUE, Hector. On our best behaviour. In: INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2013, Beijing. Proceedings. Beijing: IJCAI, 2013.
LEVESQUE, Hector. Common Sense, the Turing Test, and the Quest for Real AI. Cambridge, Massachusetts: The MIT Press, 2017.
LEVESQUE, Hector; DAVIS, Ernest; MORGENSTERN, Leora. The Winograd Schema Challenge. In: PRINCIPLES OF KNOWLEDGE REPRESENTATION AND REASONING, 2012, Rome. Proceedings. Rome: KR, 2012.
LEVY, O.; GOLDBERG, Y. Neural word embedding as implicit matrix factorization. In: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS. Proceedings. 2014. p. 2177-2185.
LEWIS, D. Counterpart theory and quantified modal logic. Journal of Philosophy, v. 65, n. 5, p. 113-126, 1968.
LEWIS, D. On the Plurality of Worlds. Oxford: Blackwell, 1986.
LEWIS, D. Truth in Fiction. American Philosophical Quarterly, v. 15, n. 1, p. 37-46, 1978.
LIU, Y. et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. 2019. Available at: https://arxiv.org/abs/1907.11692.
MIKOLOV, T. et al. Distributed representations of words and phrases and their compositionality. In: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS. Proceedings. 2013. p. 3111-3119.
MIKOLOV, T. et al. Efficient estimation of word representations in vector space. 2013. Available at: https://arxiv.org/abs/1301.3781.
OPEN AI. GPT4 Technical Report. ArXiv:2303.08774, 2023.
PENNINGTON, J.; SOCHER, R.; MANNING, C. Glove: Global vectors for word representation. In: CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2014. Proceedings. 2014. p. 1532-1543.
RADFORD, A.; WU, J.; CHILD, R.; LUAN, D.; AMODEI, D.; SUTSKEVER, I. Language Models are Unsupervised Multitask Learners. OpenAI Blog, v. 1, n. 8, 2019. Available at: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
RAJPURKAR, Pranav; ZHANG, Jian; LOPYREV, Konstantin; LIANG, Percy. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In: CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2016, Austin, Texas. Proceedings. Austin, Texas: Association for Computational Linguistics, 2016. p. 2383-2392.
RYAN, M. L. Possible worlds, artificial intelligence, and narrative theory. Bloomington: Indiana University Press, 1991.
SAKAGUCHI, K.; LE BRAS, R.; BHAGAVATULA, C.; CHOI, Y. Winogrande: An adversarial Winograd Schema Challenge at scale. In: AAAI-20 TECHNICAL TRACKS 5, 34, 2019. p. 05.
SCHANK, R.; ABELSON, R.P. Scripts, plans, goals and understanding: An inquiry into human knowledge structures. New Jersey: Erlbaum, 1977.
SCHANK, R.C. Tell Me a Story: A New Look at Real and Artificial Memory. 1st ed. New York: Atheneum, 1990.
TAYLOR, W. "Cloze procedure": A new tool for measuring readability. Journalism Quarterly, v. 30, n. 4, p. 415-433, fall, 1953.
VASWANI, A. et al. Attention is all you need. In: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS. Proceedings, 2017. p. 5998-6008.
WILLIAMS, A.; NANGIA, N.; BOWMAN, S. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. In: CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2018, New Orleans. Proceedings. New Orleans: Association for Computational Linguistics, 2018. p. 1112-1122.