# References

1. Abdel-Hamid, O., Mohamed, A., Jiang, H., Deng, L., Penn, G., Yu, D.: Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22(10), 1533–1545 (Oct 2014). <https://doi.org/10.1109/TASLP.2014.2339736>
2. Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R., Pineau, J., Courville, A., Bengio, Y.: An actor-critic algorithm for sequence prediction (2017), <https://openreview.net/forum?id=SJDaqqveg>
3. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
4. Bajgar, O., Kadlec, R., Kleindienst, J.: Embracing data abundance: Booktest dataset for reading comprehension. CoRR abs/1610.00956 (2016)
5. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (Dec 2017), <https://www.aclweb.org/anthology/Q17-1010>
6. Chambers, N., Jurafsky, D.: Improving the use of pseudo-words for evaluating selectional preferences. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. pp. 445–453. Association for Computational Linguistics, Uppsala, Sweden (Jul 2010), <https://www.aclweb.org/anthology/P10-1046>
7. Chiu, C., Sainath, T.N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., Kannan, A., Weiss, R.J., Rao, K., Gonina, E., Jaitly, N., Li, B., Chorowski, J., Bacchiani, M.: State-of-the-art speech recognition with sequence-to-sequence models. In: ICASSP. pp. 4774–4778. IEEE (2018)
8. Church, K.W., Gale, W.A.: A comparison of the enhanced good-turing and deleted estimation methods for estimating probabilities of english bigrams. Computer Speech and Language 5, 19–54 (Jan 1991). <https://doi.org/10.1016/0885-2308(91)90016-J>
9. Currey, A., Miceli Barone, A.V., Heafield, K.: Copied monolingual data improves low-resource neural machine translation. In: Proceedings of the Second Conference on Machine Translation. pp. 148–156. Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017). <https://doi.org/10.18653/v1/W17-4715>, <https://www.aclweb.org/anthology/W17-4715>
10. Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q.V., Salakhutdinov, R.: Transformer-xl: Attentive language models beyond a fixed-length context. CoRR abs/1901.02860 (2019), <http://arxiv.org/abs/1901.02860>
11. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018), <http://arxiv.org/abs/1810.04805>
12. Erk, K.: A simple, similarity-based model for selectional preferences. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. pp. 216–223. Association for Computational Linguistics, Prague, Czech Republic (Jun 2007), <https://www.aclweb.org/anthology/P07-1028>
13. Gao, J.: An introduction to deep learning for natural language processing. In: International Summer School on Deep Learning 2017 (2017), <https://www.microsoft.com/en-us/research/wp-content/uploads/2017/07/dl-summer-school-2017.-Jianfeng-Gao.v2.pdf>
14. Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 1243–1252. PMLR, International Convention Centre, Sydney, Australia (06–11 Aug 2017), <http://proceedings.mlr.press/v70/gehring17a.html>
15. Gong, C., He, D., Tan, X., Qin, T., Wang, L., Liu, T.Y.: Frage: Frequency-agnostic word representation. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 1334–1345. Curran Associates, Inc. (2018), <http://papers.nips.cc/paper/7408-frage-frequency-agnostic-word-representation.pdf>
16. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 2672–2680. Curran Associates, Inc. (2014), <http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf>
17. Grave, E., Joulin, A., Usunier, N.: Improving neural language models with a continuous cache. In: ICLR. OpenReview.net (2017)
18. Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Gómez, S., Grefenstette, E., Ramalho, T., Agapiou, J., Puigdomènech Badia, A., Moritz Hermann, K., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., Hassabis, D.: Hybrid computing using a neural network with dynamic external memory. Nature 538 (Oct 2016). <https://doi.org/10.1038/nature20101>
19. Gulcehre, C., Firat, O., Xu, K., Cho, K., Barrault, L., Lin, H., Bougares, F., Schwenk, H., Bengio, Y.: On using monolingual corpora in neural machine translation. CoRR abs/1503.03535 (2015), <http://arxiv.org/abs/1503.03535>
20. Hassan, H., Aue, A., Chen, C., Chowdhary, V., Clark, J., Federmann, C., Huang, X., Junczys-Dowmunt, M., Lewis, W., Li, M., Liu, S., Liu, T., Luo, R., Menezes, A., Qin, T., Seide, F., Tan, X., Tian, F., Wu, L., Wu, S., Xia, Y., Zhang, D., Zhang, Z., Zhou, M.: Achieving human parity on automatic chinese to english news translation. CoRR abs/1803.05567 (2018), <http://arxiv.org/abs/1803.05567>
21. He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T.Y., Ma, W.Y.: Dual learning for machine translation. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 820–828. Curran Associates, Inc. (2016a), <http://papers.nips.cc/paper/6469-dual-learning-for-machine-translation.pdf>
22. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (Jun 2016). <https://doi.org/10.1109/CVPR.2016.90>
23. Hoang, L., Wiseman, S., Rush, A.: Entity tracking improves cloze-style reading comprehension. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 1049–1055. Association for Computational Linguistics, Brussels, Belgium (Oct-Nov 2018), <https://www.aclweb.org/anthology/D18-1130>
24. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. arXiv (2016)
25. Jang, E., Gu, S., Poole, B.: Categorical reparameterization with gumbel-softmax. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings (2017), <https://openreview.net/forum?id=rkE3y85ee>
26. Johnson, M., Schuster, M., Le, Q.V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., Hughes, M., Dean, J.: Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5, 339–351 (2017), <https://www.aclweb.org/anthology/Q17-1024>
27. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. pp. 427–431. Association for Computational Linguistics, Valencia, Spain (Apr 2017), <https://www.aclweb.org/anthology/E17-2068>
28. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. CoRR abs/1710.10196 (2017), <http://arxiv.org/abs/1710.10196>
29. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. CoRR abs/1812.04948 (2018), <http://arxiv.org/abs/1812.04948>
30. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1746–1751. Association for Computational Linguistics, Doha, Qatar (Oct 2014). <https://doi.org/10.3115/v1/D14-1181>, <https://www.aclweb.org/anthology/D14-1181>
31. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015), <http://arxiv.org/abs/1412.6980>
32. Kingma, D.P., Welling, M.: Auto-encoding variational bayes (2013), <http://arxiv.org/abs/1312.6114>
33. Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings (2014), <http://arxiv.org/abs/1312.6114>
34. Kipyatkova, I., Karpov, A.: Recurrent neural network-based language modeling for an automatic russian speech recognition system. In: 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT). pp. 33–38 (Nov 2015). <https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382966>
35. Kneser, R., Ney, H.: Improved backing-off for m-gram language modeling. In: 1995 International Conference on Acoustics, Speech, and Signal Processing. vol. 1, pp. 181–184 (May 1995). <https://doi.org/10.1109/ICASSP.1995.479394>
36. Lample, G., Conneau, A.: Cross-lingual language model pretraining. CoRR abs/1901.07291 (2019), <http://arxiv.org/abs/1901.07291>
37. Lample, G., Conneau, A., Ranzato, M., Denoyer, L., Jégou, H.: Word translation without parallel data. In: ICLR. OpenReview.net (2018)
38. Liu, M.Y., Tuzel, O.: Coupled generative adversarial networks. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 469–477. Curran Associates, Inc. (2016), <http://papers.nips.cc/paper/6544-coupled-generative-adversarial-networks.pdf>
39. Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pp. 1412–1421. Association for Computational Linguistics, Lisbon, Portugal (Sep 2015). <https://doi.org/10.18653/v1/D15-1166>, <https://www.aclweb.org/anthology/D15-1166>
40. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013), <http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf>
41. Paperno, D., Kruszewski, G., Lazaridou, A., Pham, N.Q., Bernardi, R., Pezzelle, S., Baroni, M., Boleda, G., Fernández, R.: The LAMBADA dataset: Word prediction requiring a broad discourse context. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1525–1534. Association for Computational Linguistics, Berlin, Germany (Aug 2016). <https://doi.org/10.18653/v1/P16-1144>, <https://www.aclweb.org/anthology/P16-1144>
42. Pennington, J., Socher, R., Manning, C.: Glove: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543. Association for Computational Linguistics, Doha, Qatar (Oct 2014). <https://doi.org/10.3115/v1/D14-1162>, <https://www.aclweb.org/anthology/D14-1162>
43. Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). pp. 2227–2237. Association for Computational Linguistics, New Orleans, Louisiana (Jun 2018). <https://doi.org/10.18653/v1/N18-1202>, <https://www.aclweb.org/anthology/N18-1202>
44. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016)
45. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)
46. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners (2018)
47. Resnik, P.: Selectional preference and sense disambiguation. In: Tagging Text with Lexical Semantics: Why, What, and How? (1997), <https://www.aclweb.org/anthology/W97-0209>
48. Ruder, S.: Neural Transfer Learning for Natural Language Processing. Ph.D. thesis, National University of Ireland, Galway (2019)
49. Sennrich, R., Birch, A., Currey, A., Germann, U., Haddow, B., Heafield, K., Miceli Barone, A.V., Williams, P.: The University of Edinburgh’s neural MT systems for WMT17. In: Proceedings of the Second Conference on Machine Translation. pp. 389–399. Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017). <https://doi.org/10.18653/v1/W17-4739>, <https://www.aclweb.org/anthology/W17-4739>
50. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1715–1725. Association for Computational Linguistics, Berlin, Germany (Aug 2016a). <https://doi.org/10.18653/v1/P16-1162>, <https://www.aclweb.org/anthology/P16-1162>
51. Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 86–96. Association for Computational Linguistics, Berlin, Germany (Aug 2016b). <https://doi.org/10.18653/v1/P16-1009>, <https://www.aclweb.org/anthology/P16-1009>
52. Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., Liu, Y.: Minimum risk training for neural machine translation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1683–1692. Association for Computational Linguistics, Berlin, Germany (Aug 2016). <https://doi.org/10.18653/v1/P16-1159>, <https://www.aclweb.org/anthology/P16-1159>
53. Sundermeyer, M., Schlüter, R., Ney, H.: LSTM neural networks for language modeling. In: INTERSPEECH. pp. 194–197. ISCA (2012)
54. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 3104–3112. Curran Associates, Inc. (2014), <http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf>
55. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K. (eds.) Advances in Neural Information Processing Systems 12, pp. 1057–1063. MIT Press (2000), <http://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf>
56. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 5998–6008. Curran Associates, Inc. (2017), <http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf>
57. Wang, Y., Xia, Y., Zhao, L., Bian, J., Qin, T., Liu, G., Liu, T.Y.: Dual transfer learning for neural machine translation with marginal distribution regularization. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (Feb 2018), <https://www.microsoft.com/en-us/research/publication/dual-transfer-learning-neural-machine-translation-marginal-distribution-regularization/>
58. Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., Dean, J.: Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016), <http://arxiv.org/abs/1609.08144>
59. Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., Liu, T.: Dual supervised learning. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. pp. 3789–3798 (2017a), <http://proceedings.mlr.press/v70/xia17a.html>
60. Xia, Y., Tian, F., Wu, L., Lin, J., Qin, T., Yu, N., Liu, T.Y.: Deliberation networks: Sequence generation beyond one-pass decoding. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 1784–1794. Curran Associates, Inc. (2017b), <http://papers.nips.cc/paper/6775-deliberation-networks-sequence-generation-beyond-one-pass-decoding.pdf>
61. Yu, L., Zhang, W., Wang, J., Yu, Y.: Seqgan: Sequence generative adversarial nets with policy gradient. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. pp. 2852–2858 (2017), <http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14344>
62. Zhang, Z., Liu, S., Li, M., Zhou, M., Chen, E.: Joint training for neural machine translation models with monolingual data. In: AAAI. pp. 555–562. AAAI Press (2018)
63. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Computer Vision (ICCV), 2017 IEEE International Conference on (2017)
64. Cho, K.: Noisy parallel approximate decoding for conditional recurrent language model. CoRR abs/1605.03835 (2016), <http://arxiv.org/abs/1605.03835>
65. Graves, A., Wayne, G., Danihelka, I.: Neural turing machines. CoRR abs/1410.5401 (2014), <http://arxiv.org/abs/1410.5401>
66. Jean, S., Cho, K., Memisevic, R., Bengio, Y.: On using very large target vocabulary for neural machine translation. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 1–10. Association for Computational Linguistics, Beijing, China (Jul 2015). <https://doi.org/10.3115/v1/P15-1001>, <https://www.aclweb.org/anthology/P15-1001>

