Pay attention

the rapid evolution of natural language processing and where it left off

Authors

Fabio Cozman, Universidade de São Paulo (USP)
Hugo Neri, Universidade de São Paulo (USP)

DOI:

https://doi.org/10.35699/2965-6931.2023.47510

Keywords:

language models, deep learning, natural language processing, artificial intelligence

Abstract

This article reviews, in an informal tone, the evolution of attention-based models in natural language processing (NLP), starting in 2003 and culminating in the "transformer" architectures we have known since 2017. We explain how transformers, owing to their pre-training, managed to solve the major "benchmark" for commonsense reasoning in Artificial Intelligence. We also investigate the parallel between the concept of the "essential" ("what really matters") in human language understanding, as proposed by the NLP veteran Roger Schank, and the "embeddings" now employed in machine learning. Finally, we discuss a well-known problem with these models, the so-called "hallucinations". This phenomenon highlights the models' difficulty in distinguishing fact from fiction and calls for further research to mitigate its impact. We frame this problem in the context of David Lewis's work, arguing that it poses a fundamental challenge to language models.
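Since the abstract turns on attention-based models, a minimal sketch of the scaled dot-product attention underlying transformer architectures (Vaswani et al., 2017) may help orient readers. The function name, array shapes, and random inputs below are illustrative assumptions, not code from the article.

import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    # Pairwise similarity between every query and every key,
    # scaled by the square root of the representation size.
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax turns each row of scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted average of the value vectors.
    return weights @ values

# Illustrative self-attention over four random "token" embeddings of size 8.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
contextualized = scaled_dot_product_attention(tokens, tokens, tokens)
print(contextualized.shape)  # (4, 8)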

Author Biographies

Fabio Cozman, Universidade de São Paulo (USP)

Fabio G. Cozman is a Full Professor at the Escola Politécnica of the Universidade de São Paulo (USP) and Director of the Center for Artificial Intelligence at USP, working on machine learning and the representation of knowledge and uncertainty. An engineer by training (Escola Politécnica, USP) with a PhD from Carnegie Mellon University (USA), he has served, among other roles, as Program and General Chair of the Conference on Uncertainty in Artificial Intelligence, Area Chair of the International Joint Conference on Artificial Intelligence, and Associate Editor of the journals Artificial Intelligence, Journal of Artificial Intelligence Research, and Journal of Approximate Reasoning. He also coordinated the Special Committee on Artificial Intelligence of the Sociedade Brasileira de Computação, which awarded him its Scientific Merit Prize in Artificial Intelligence. He was head of the Department of Mechatronics Engineering and chair of the Undergraduate Committee of the Escola Politécnica at USP.

Hugo Neri, Universidade de São Paulo (USP)

A researcher at the Center for Artificial Intelligence (C4AI), a visiting professor in the Sociology Department at Universität Innsbruck, and an editorial board member of the journal The American Sociologist, he holds a Ph.D. in Philosophy, a Master's in Sociology, and a Bachelor's in Social Sciences from the University of São Paulo. His works include "The Risk Perception of Artificial Intelligence" (Lexington, 2020) and "Inteligência Artificial: Avanços e Tendências" (IEA-USP, 2021).

References

BAHDANAU, D.; CHO, K.; BENGIO, Y. Neural Machine Translation by Jointly Learning to Align and Translate. 2016. Available at: https://arxiv.org/abs/1409.0473.

BENGIO, Y.; DUCHARME, R.; VINCENT, P.; JANVIN, C. A neural probabilistic language model. Journal of Machine Learning Research, v. 3, p. 1137-1155, 2003.

BENNETT, S.W.; AONE, C.; LOVELL, C. Learning to tag multilingual texts through observation. In: SECOND CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2., 1997, Providence. Proceedings. Providence: Morgan Kaufmann Publishers, Inc, 1997. p. 109-116.

BOWMAN, S. R.; ANGELI, G.; POTTS, C.; MANNING, C. D. A large annotated corpus for learning natural language inference. 2015. Available at: http://arxiv.org/abs/1508.05326.

BROWN, T. B.; MANN, B.; RYDER, N.; SUBBIAH, M.; KAPLAN, J.; DHARIWAL, P.; NEELAKANTAN, A.; SHYAM, P.; SASTRY, G.; ASKELL, A.; AGARWAL, S.; HERBERT-VOSS, A.; KRUEGER, G.; HENIGHAN, T.; CHILD, R.; RAMESH, A.; ZIEGLER, D. M.; WU, J.; WINTER, C.; HESSE, C.; CHEN, M.; SIGLER, E.; LITWIN, M.; GRAY, S.; CHESS, B.; CLARK, J.; BERNER, C.; MCCANDLISH, S.; RADFORD, A.; SUTSKEVER, I.; AMODEI, D. Language models are few-shot learners. ArXiv:2005.14165, 2020.

DAGAN, Ido. Recognizing textual entailment: Rational, evaluation and approaches. Natural Language Engineering, v. 15, n. 4, p. i-xvii, 2009.

DAVIS, Ernest. Winograd schemas and machine translation. 2016. Available at: https://arxiv.org/abs/1608.01884.

DAVIS, Ernest; MORGENSTERN, Leora; ORTIZ, Charles L. The first Winograd Schema Challenge at IJCAI-16. AI Magazine, v. 38, n. 3, p. 97-98, 2017.

DEVLIN, J.; CHANG, M. W.; LEE, K.; TOUTANOVA, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018. Available at: https://arxiv.org/abs/1810.04805.

GURURANGAN, Suchin; SWAYAMDIPTA, Swabha; LEVY, Omer; SCHWARTZ, Roy; BOWMAN, Samuel R.; SMITH, Noah A. Annotation Artifacts in Natural Language Inference Data. 2018. Available at: https://arxiv.org/abs/1803.02324.

LEVESQUE, Hector. The Winograd Schema Challenge. In: AAAI SPRING SYMPOSIUM, 2011, Palo Alto. Proceedings. Palo Alto: AAAI, 2011.

LEVESQUE, Hector. On our best behaviour. In: INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2013, Beijing. Proceedings. Beijing: IJCAI, 2013.

LEVESQUE, Hector. Common Sense, the Turing Test, and the Quest for Real AI. Cambridge, Massachusetts: The MIT Press, 2017.

LEVESQUE, Hector; DAVIS, Ernest; MORGENSTERN, Leora. The Winograd Schema Challenge. In: PRINCIPLES OF KNOWLEDGE REPRESENTATION AND REASONING, 2012, Rome. Proceedings. Rome: KR, 2012.

LEVY, O.; GOLDBERG, Y. Neural word embedding as implicit matrix factorization. In: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2014. Proceedings. p. 2177-2185.

LEWIS, D. Counterpart theory and quantified modal logic. Journal of Philosophy, v. 65, n. 5, p. 113-126, 1968.

LEWIS, D. On the Plurality of Worlds. Oxford: Blackwell, 1986.

LEWIS, D. Truth in Fiction. American Philosophical Quarterly, v. 15, n. 1, p. 37-46, 1978.

LIU, Y. et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. 2019. Available at: https://arxiv.org/abs/1907.11692.

MIKOLOV, T. et al. Distributed representations of words and phrases and their compositionality. In: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2013. Proceedings. p. 3111-3119.

MIKOLOV, T. et al. Efficient estimation of word representations in vector space. 2013. Available at: https://arxiv.org/abs/1301.3781.

OPENAI. GPT-4 Technical Report. ArXiv:2303.08774, 2023.

PENNINGTON, J.; SOCHER, R.; MANNING, C. GloVe: Global vectors for word representation. In: CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2014. Proceedings. p. 1532-1543.

RADFORD, A.; WU, J.; CHILD, R.; LUAN, D.; AMODEI, D.; SUTSKEVER, I. Language Models are Unsupervised Multitask Learners. OpenAI Blog, v. 1, n. 8, 2019. Available at: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.

RAJPURKAR, Pranav; ZHANG, Jian; LOPYREV, Konstantin; LIANG, Percy. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In: CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2016, Austin, Texas. Proceedings. Austin, Texas: Association for Computational Linguistics, 2016. p. 2383-2392.

RYAN, M. L. Possible worlds, artificial intelligence, and narrative theory. Bloomington: Indiana University Press, 1991.

SAKAGUCHI, K.; LE BRAS, R.; BHAGAVATULA, C.; CHOI, Y. WinoGrande: An adversarial Winograd Schema Challenge at scale. In: AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 34., 2020. Proceedings. AAAI-20 Technical Tracks 5, n. 05.

SCHANK, R.; ABELSON, R.P. Scripts, plans, goals and understanding: An inquiry into human knowledge structures. New Jersey: Erlbaum, 1977.

SCHANK, R.C. Tell Me a Story: A New Look at Real and Artificial Memory. 1st ed. New York: Atheneum, 1990.

TAYLOR, W. "Cloze procedure": A new tool for measuring readability. Journalism Quarterly, v. 30, n. 4, p. 415-433, fall, 1953.

VASWANI, A. et al. Attention is all you need. In: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2017. Proceedings. p. 5998-6008.

WILLIAMS, A.; NANGIA, N.; BOWMAN, S. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. In: CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2018, New Orleans. Proceedings. New Orleans: Association for Computational Linguistics, 2018. p. 1112-1122.

Published

2023-12-07

How to Cite

COZMAN, F.; NERI, H. Preste atención: la rápida evolución del procesamiento del lenguaje natural y dónde se quedó. Revista da Universidade Federal de Minas Gerais, Belo Horizonte, v. 30, n. fluxo contínuo, 2023. DOI: 10.35699/2965-6931.2023.47510. Available at: https://periodicos.ufmg.br/index.php/revistadaufmg/article/view/47510. Accessed: 22 July 2024.