References

  1. Abdel-Hamid, O., Mohamed, A., Jiang, H., Deng, L., Penn, G., Yu, D.: Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22(10), 1533–1545 (Oct 2014). https://doi.org/10.1109/TASLP.2014.2339736

  2. Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R., Pineau, J., Courville, A., Bengio, Y.: An actor-critic algorithm for sequence prediction (2017), https://openreview.net/forum?id=SJDaqqveg

  3. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)

  4. Bajgar, O., Kadlec, R., Kleindienst, J.: Embracing data abundance: BookTest dataset for reading comprehension. CoRR abs/1610.00956 (2016)

  5. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (Dec 2017), https://www.aclweb.org/anthology/Q17-1010

  6. Chambers, N., Jurafsky, D.: Improving the use of pseudo-words for evaluating selectional preferences. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. pp. 445–453. Association for Computational Linguistics, Uppsala, Sweden (Jul 2010), https://www.aclweb.org/anthology/P10-1046

  7. Chiu, C., Sainath, T.N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., Kannan, A., Weiss, R.J., Rao, K., Gonina, E., Jaitly, N., Li, B., Chorowski, J., Bacchiani, M.: State-of-the-art speech recognition with sequence-to-sequence models. In: ICASSP. pp. 4774–4778. IEEE (2018)

  8. Church, K., Gale, W.A.: A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language 5, 19–54 (Jan 1991). https://doi.org/10.1016/0885-2308(91)90016-J

  9. Currey, A., Miceli Barone, A.V., Heafield, K.: Copied monolingual data improves low-resource neural machine translation. In: Proceedings of the Second Conference on Machine Translation. pp. 148–156. Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017). https://doi.org/10.18653/v1/W17-4715, https://www.aclweb.org/anthology/W17-4715

  10. Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q.V., Salakhutdinov, R.: Transformer-XL: Attentive language models beyond a fixed-length context. CoRR abs/1901.02860 (2019), http://arxiv.org/abs/1901.02860

  11. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018), http://arxiv.org/abs/1810.04805

  12. Erk, K.: A simple, similarity-based model for selectional preferences. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. pp. 216–223. Association for Computational Linguistics, Prague, Czech Republic (Jun 2007), https://www.aclweb.org/anthology/P07-1028

  13. Gao, J.: An introduction to deep learning for natural language processing. In: International Summer School on Deep Learning 2017 (2017), https://www.microsoft.com/en-us/research/wp-content/uploads/2017/07/dl-summer-school-2017.-Jianfeng-Gao.v2.pdf

  14. Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 1243–1252. PMLR, International Convention Centre, Sydney, Australia (06–11 Aug 2017), http://proceedings.mlr.press/v70/gehring17a.html

  15. Gong, C., He, D., Tan, X., Qin, T., Wang, L., Liu, T.Y.: FRAGE: Frequency-agnostic word representation. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 1334–1345. Curran Associates, Inc. (2018), http://papers.nips.cc/paper/7408-frage-frequency-agnostic-word-representation.pdf

  16. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 2672–2680. Curran Associates, Inc. (2014), http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

  17. Grave, E., Joulin, A., Usunier, N.: Improving neural language models with a continuous cache. In: ICLR. OpenReview.net (2017)

  18. Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Gómez, S., Grefenstette, E., Ramalho, T., Agapiou, J., Puigdomènech Badia, A., Moritz Hermann, K., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., Hassabis, D.: Hybrid computing using a neural network with dynamic external memory. Nature 538 (Oct 2016). https://doi.org/10.1038/nature20101

  19. Gulcehre, C., Firat, O., Xu, K., Cho, K., Barrault, L., Lin, H., Bougares, F., Schwenk, H., Bengio, Y.: On using monolingual corpora in neural machine translation. CoRR abs/1503.03535 (2015), http://arxiv.org/abs/1503.03535

  20. Hassan, H., Aue, A., Chen, C., Chowdhary, V., Clark, J., Federmann, C., Huang, X., Junczys-Dowmunt, M., Lewis, W., Li, M., Liu, S., Liu, T., Luo, R., Menezes, A., Qin, T., Seide, F., Tan, X., Tian, F., Wu, L., Wu, S., Xia, Y., Zhang, D., Zhang, Z., Zhou, M.: Achieving human parity on automatic Chinese to English news translation. CoRR abs/1803.05567 (2018), http://arxiv.org/abs/1803.05567

  21. He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T.Y., Ma, W.Y.: Dual learning for machine translation. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 820–828. Curran Associates, Inc. (2016a), http://papers.nips.cc/paper/6469-dual-learning-for-machine-translation.pdf

  22. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (June 2016). https://doi.org/10.1109/CVPR.2016.90

  23. Hoang, L., Wiseman, S., Rush, A.: Entity tracking improves cloze-style reading comprehension. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 1049–1055. Association for Computational Linguistics, Brussels, Belgium (Oct-Nov 2018), https://www.aclweb.org/anthology/D18-1130

  24. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. arXiv (2016)

  25. Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-softmax. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings (2017), https://openreview.net/forum?id=rkE3y85ee

  26. Johnson, M., Schuster, M., Le, Q.V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., Hughes, M., Dean, J.: Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5, 339–351 (2017), https://www.aclweb.org/anthology/Q17-1024

  27. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. pp. 427–431. Association for Computational Linguistics, Valencia, Spain (Apr 2017), https://www.aclweb.org/anthology/E17-2068

  28. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. CoRR abs/1710.10196 (2017), http://arxiv.org/abs/1710.10196

  29. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. CoRR abs/1812.04948 (2018), http://arxiv.org/abs/1812.04948

  30. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1746–1751. Association for Computational Linguistics, Doha, Qatar (Oct 2014). https://doi.org/10.3115/v1/D14-1181, https://www.aclweb.org/anthology/D14-1181

  31. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015), http://arxiv.org/abs/1412.6980

  32. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes (2013), http://arxiv.org/abs/1312.6114

  33. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings (2014), http://arxiv.org/abs/1312.6114

  34. Kipyatkova, I., Karpov, A.: Recurrent neural network-based language modeling for an automatic Russian speech recognition system. In: 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT). pp. 33–38 (Nov 2015). https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382966

  35. Kneser, R., Ney, H.: Improved backing-off for m-gram language modeling. In: 1995 International Conference on Acoustics, Speech, and Signal Processing. vol. 1, pp. 181–184 (May 1995). https://doi.org/10.1109/ICASSP.1995.479394

  36. Lample, G., Conneau, A.: Cross-lingual language model pretraining. CoRR abs/1901.07291 (2019), http://arxiv.org/abs/1901.07291

  37. Lample, G., Conneau, A., Ranzato, M., Denoyer, L., Jégou, H.: Word translation without parallel data. In: ICLR. OpenReview.net (2018)

  38. Liu, M.Y., Tuzel, O.: Coupled generative adversarial networks. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 469–477. Curran Associates, Inc. (2016), http://papers.nips.cc/paper/6544-coupled-generative-adversarial-networks.pdf

  39. Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pp. 1412–1421. Association for Computational Linguistics, Lisbon, Portugal (Sep 2015). https://doi.org/10.18653/v1/D15-1166, https://www.aclweb.org/anthology/D15-1166

  40. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013), http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

  41. Paperno, D., Kruszewski, G., Lazaridou, A., Pham, N.Q., Bernardi, R., Pezzelle, S., Baroni, M., Boleda, G., Fernandez, R.: The LAMBADA dataset: Word prediction requiring a broad discourse context. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1525–1534. Association for Computational Linguistics, Berlin, Germany (Aug 2016). https://doi.org/10.18653/v1/P16-1144, https://www.aclweb.org/anthology/P16-1144

  42. Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543. Association for Computational Linguistics, Doha, Qatar (Oct 2014). https://doi.org/10.3115/v1/D14-1162, https://www.aclweb.org/anthology/D14-1162

  43. Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). pp. 2227–2237. Association for Computational Linguistics, New Orleans, Louisiana (Jun 2018). https://doi.org/10.18653/v1/N18-1202, https://www.aclweb.org/anthology/N18-1202

  44. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016)

  45. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)

  46. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners (2018)

  47. Resnik, P.: Selectional preference and sense disambiguation. In: Tagging Text with Lexical Semantics: Why, What, and How? (1997), https://www.aclweb.org/anthology/W97-0209

  48. Ruder, S.: Neural Transfer Learning for Natural Language Processing. Ph.D. thesis, National University of Ireland, Galway (2019)

  49. Sennrich, R., Birch, A., Currey, A., Germann, U., Haddow, B., Heafield, K., Miceli Barone, A.V., Williams, P.: The University of Edinburgh’s neural MT systems for WMT17. In: Proceedings of the Second Conference on Machine Translation. pp. 389–399. Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017). https://doi.org/10.18653/v1/W17-4739, https://www.aclweb.org/anthology/W17-4739

  50. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1715–1725. Association for Computational Linguistics, Berlin, Germany (Aug 2016a). https://doi.org/10.18653/v1/P16-1162, https://www.aclweb.org/anthology/P16-1162

  51. Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 86–96. Association for Computational Linguistics, Berlin, Germany (Aug 2016b). https://doi.org/10.18653/v1/P16-1009, https://www.aclweb.org/anthology/P16-1009

  52. Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., Liu, Y.: Minimum risk training for neural machine translation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1683–1692. Association for Computational Linguistics, Berlin, Germany (Aug 2016). https://doi.org/10.18653/v1/P16-1159, https://www.aclweb.org/anthology/P16-1159

  53. Sundermeyer, M., Schluter, R., Ney, H.: LSTM neural networks for language modeling. In: INTERSPEECH. pp. 194–197. ISCA (2012)

  54. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 3104–3112. Curran Associates, Inc. (2014), http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf

  55. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Muller, K. (eds.) Advances in Neural Information Processing Systems 12, pp. 1057–1063. MIT Press (2000), http://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf

  56. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 5998–6008. Curran Associates, Inc. (2017), http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

  57. Wang, Y., Xia, Y., Zhao, L., Bian, J., Qin, T., Liu, G., Liu, T.Y.: Dual transfer learning for neural machine translation with marginal distribution regularization. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (February 2018), https://www.microsoft.com/en-us/research/publication/dual-transfer-learning-neural-machine-translation-marginal-distribution-regularization/

  58. Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., Dean, J.: Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016), http://arxiv.org/abs/1609.08144

  59. Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., Liu, T.: Dual supervised learning. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. pp. 3789–3798 (2017a), http://proceedings.mlr.press/v70/xia17a.html

  60. Xia, Y., Tian, F., Wu, L., Lin, J., Qin, T., Yu, N., Liu, T.Y.: Deliberation networks: Sequence generation beyond one-pass decoding. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 1784–1794. Curran Associates, Inc. (2017b), http://papers.nips.cc/paper/6775-deliberation-networks-sequence-generation-beyond-one-pass-decoding.pdf

  61. Yu, L., Zhang, W., Wang, J., Yu, Y.: SeqGAN: Sequence generative adversarial nets with policy gradient. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. pp. 2852–2858 (2017), http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14344

  62. Zhang, Z., Liu, S., Li, M., Zhou, M., Chen, E.: Joint training for neural machine translation models with monolingual data. In: AAAI. pp. 555–562. AAAI Press (2018)

  63. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Computer Vision (ICCV), 2017 IEEE International Conference on (2017)

  64. Cho, K.: Noisy parallel approximate decoding for conditional recurrent language model. CoRR abs/1605.03835 (2016), http://arxiv.org/abs/1605.03835

  65. Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. CoRR abs/1410.5401 (2014), http://arxiv.org/abs/1410.5401

  66. Jean, S., Cho, K., Memisevic, R., Bengio, Y.: On using very large target vocabulary for neural machine translation. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 1–10. Association for Computational Linguistics, Beijing, China (Jul 2015). https://doi.org/10.3115/v1/P15-1001, https://www.aclweb.org/anthology/P15-1001