References
  1. Abdel-Hamid, O., Mohamed, A., Jiang, H., Deng, L., Penn, G., Yu, D.: Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22(10), 1533–1545 (Oct 2014). https://doi.org/10.1109/TASLP.2014.2339736
  2. Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R., Pineau, J., Courville, A., Bengio, Y.: An actor-critic algorithm for sequence prediction (2017), https://openreview.net/forum?id=SJDaqqveg
  3. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
  4. Bajgar, O., Kadlec, R., Kleindienst, J.: Embracing data abundance: BookTest dataset for reading comprehension. CoRR abs/1610.00956 (2016)
  5. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (Dec 2017), https://www.aclweb.org/anthology/Q17-1010
  6. Chambers, N., Jurafsky, D.: Improving the use of pseudo-words for evaluating selectional preferences. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. pp. 445–453. Association for Computational Linguistics, Uppsala, Sweden (Jul 2010), https://www.aclweb.org/anthology/P10-1046
  7. Chiu, C., Sainath, T.N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., Kannan, A., Weiss, R.J., Rao, K., Gonina, E., Jaitly, N., Li, B., Chorowski, J., Bacchiani, M.: State-of-the-art speech recognition with sequence-to-sequence models. In: ICASSP. pp. 4774–4778. IEEE (2018)
  8. Church, K.W., Gale, W.A.: A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language 5, 19–54 (01 1991). https://doi.org/10.1016/0885-2308(91)90016-J
  9. Currey, A., Miceli Barone, A.V., Heafield, K.: Copied monolingual data improves low-resource neural machine translation. In: Proceedings of the Second Conference on Machine Translation. pp. 148–156. Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017). https://doi.org/10.18653/v1/W17-4715, https://www.aclweb.org/anthology/W17-4715
  10. Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q.V., Salakhutdinov, R.: Transformer-XL: Attentive language models beyond a fixed-length context. CoRR abs/1901.02860 (2019), http://arxiv.org/abs/1901.02860
  11. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018), http://arxiv.org/abs/1810.04805
  12. Erk, K.: A simple, similarity-based model for selectional preferences. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. pp. 216–223. Association for Computational Linguistics, Prague, Czech Republic (Jun 2007), https://www.aclweb.org/anthology/P07-1028
  13. Gao, J.: An introduction to deep learning for natural language processing. In: International Summer School on Deep Learning 2017 (2017), https://www.microsoft.com/en-us/research/wp-content/uploads/2017/07/dl-summer-school-2017.-Jianfeng-Gao.v2.pdf
  14. Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 1243–1252. PMLR, International Convention Centre, Sydney, Australia (06–11 Aug 2017), http://proceedings.mlr.press/v70/gehring17a.html
  15. Gong, C., He, D., Tan, X., Qin, T., Wang, L., Liu, T.Y.: FRAGE: Frequency-agnostic word representation. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 1334–1345. Curran Associates, Inc. (2018), http://papers.nips.cc/paper/7408-frage-frequency-agnostic-word-representation.pdf
  16. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 2672–2680. Curran Associates, Inc. (2014), http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
  17. Grave, E., Joulin, A., Usunier, N.: Improving neural language models with a continuous cache. In: ICLR. OpenReview.net (2017)
  18. Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Gómez, S., Grefenstette, E., Ramalho, T., Agapiou, J., Puigdomènech Badia, A., Moritz Hermann, K., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., Hassabis, D.: Hybrid computing using a neural network with dynamic external memory. Nature 538 (10 2016). https://doi.org/10.1038/nature20101
  19. Gulcehre, C., Firat, O., Xu, K., Cho, K., Barrault, L., Lin, H., Bougares, F., Schwenk, H., Bengio, Y.: On using monolingual corpora in neural machine translation. CoRR abs/1503.03535 (2015), http://arxiv.org/abs/1503.03535
  20. Hassan, H., Aue, A., Chen, C., Chowdhary, V., Clark, J., Federmann, C., Huang, X., Junczys-Dowmunt, M., Lewis, W., Li, M., Liu, S., Liu, T., Luo, R., Menezes, A., Qin, T., Seide, F., Tan, X., Tian, F., Wu, L., Wu, S., Xia, Y., Zhang, D., Zhang, Z., Zhou, M.: Achieving human parity on automatic Chinese to English news translation. CoRR abs/1803.05567 (2018), http://arxiv.org/abs/1803.05567
  21. He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T.Y., Ma, W.Y.: Dual learning for machine translation. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 820–828. Curran Associates, Inc. (2016a), http://papers.nips.cc/paper/6469-dual-learning-for-machine-translation.pdf
  22. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (Jun 2016). https://doi.org/10.1109/CVPR.2016.90
  23. Hoang, L., Wiseman, S., Rush, A.: Entity tracking improves cloze-style reading comprehension. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 1049–1055. Association for Computational Linguistics, Brussels, Belgium (Oct–Nov 2018), https://www.aclweb.org/anthology/D18-1130
  24. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. arXiv (2016)
  25. Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-softmax. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings (2017), https://openreview.net/forum?id=rkE3y85ee
  26. Johnson, M., Schuster, M., Le, Q.V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., Hughes, M., Dean, J.: Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5, 339–351 (2017), https://www.aclweb.org/anthology/Q17-1024
  27. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. pp. 427–431. Association for Computational Linguistics, Valencia, Spain (Apr 2017), https://www.aclweb.org/anthology/E17-2068
  28. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. CoRR abs/1710.10196 (2017), http://arxiv.org/abs/1710.10196
  29. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. CoRR abs/1812.04948 (2018), http://arxiv.org/abs/1812.04948
  30. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1746–1751. Association for Computational Linguistics, Doha, Qatar (Oct 2014). https://doi.org/10.3115/v1/D14-1181, https://www.aclweb.org/anthology/D14-1181
  31. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (2015), http://arxiv.org/abs/1412.6980
  32. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes (2013), http://arxiv.org/abs/1312.6114
  33. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14–16, 2014, Conference Track Proceedings (2014), http://arxiv.org/abs/1312.6114
  34. Kipyatkova, I., Karpov, A.: Recurrent neural network-based language modeling for an automatic Russian speech recognition system. In: 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT). pp. 33–38 (Nov 2015). https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382966
  35. Kneser, R., Ney, H.: Improved backing-off for m-gram language modeling. In: 1995 International Conference on Acoustics, Speech, and Signal Processing. vol. 1, pp. 181–184 (May 1995). https://doi.org/10.1109/ICASSP.1995.479394
  36. Lample, G., Conneau, A.: Cross-lingual language model pretraining. CoRR abs/1901.07291 (2019), http://arxiv.org/abs/1901.07291
  37. Lample, G., Conneau, A., Ranzato, M., Denoyer, L., Jégou, H.: Word translation without parallel data. In: ICLR. OpenReview.net (2018)
  38. Liu, M.Y., Tuzel, O.: Coupled generative adversarial networks. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 469–477. Curran Associates, Inc. (2016), http://papers.nips.cc/paper/6544-coupled-generative-adversarial-networks.pdf
  39. Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pp. 1412–1421. Association for Computational Linguistics, Lisbon, Portugal (Sep 2015). https://doi.org/10.18653/v1/D15-1166, https://www.aclweb.org/anthology/D15-1166
  40. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013), http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
  41. Paperno, D., Kruszewski, G., Lazaridou, A., Pham, N.Q., Bernardi, R., Pezzelle, S., Baroni, M., Boleda, G., Fernandez, R.: The LAMBADA dataset: Word prediction requiring a broad discourse context. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1525–1534. Association for Computational Linguistics, Berlin, Germany (Aug 2016). https://doi.org/10.18653/v1/P16-1144, https://www.aclweb.org/anthology/P16-1144
  42. Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543. Association for Computational Linguistics, Doha, Qatar (Oct 2014). https://doi.org/10.3115/v1/D14-1162, https://www.aclweb.org/anthology/D14-1162
  43. Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). pp. 2227–2237. Association for Computational Linguistics, New Orleans, Louisiana (Jun 2018). https://doi.org/10.18653/v1/N18-1202, https://www.aclweb.org/anthology/N18-1202
  44. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016)
  45. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)
  46. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners (2019)
  47. Resnik, P.: Selectional preference and sense disambiguation. In: Tagging Text with Lexical Semantics: Why, What, and How? (1997), https://www.aclweb.org/anthology/W97-0209
  48. Ruder, S.: Neural Transfer Learning for Natural Language Processing. Ph.D. thesis, National University of Ireland, Galway (2019)
  49. Sennrich, R., Birch, A., Currey, A., Germann, U., Haddow, B., Heafield, K., Miceli Barone, A.V., Williams, P.: The University of Edinburgh’s neural MT systems for WMT17. In: Proceedings of the Second Conference on Machine Translation. pp. 389–399. Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017). https://doi.org/10.18653/v1/W17-4739, https://www.aclweb.org/anthology/W17-4739
  50. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1715–1725. Association for Computational Linguistics, Berlin, Germany (Aug 2016a). https://doi.org/10.18653/v1/P16-1162, https://www.aclweb.org/anthology/P16-1162
  51. Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 86–96. Association for Computational Linguistics, Berlin, Germany (Aug 2016b). https://doi.org/10.18653/v1/P16-1009, https://www.aclweb.org/anthology/P16-1009
  52. Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., Liu, Y.: Minimum risk training for neural machine translation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1683–1692. Association for Computational Linguistics, Berlin, Germany (Aug 2016). https://doi.org/10.18653/v1/P16-1159, https://www.aclweb.org/anthology/P16-1159
  53. Sundermeyer, M., Schlüter, R., Ney, H.: LSTM neural networks for language modeling. In: INTERSPEECH. pp. 194–197. ISCA (2012)
  54. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 3104–3112. Curran Associates, Inc. (2014), http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
  55. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K. (eds.) Advances in Neural Information Processing Systems 12, pp. 1057–1063. MIT Press (2000), http://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf
  56. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 5998–6008. Curran Associates, Inc. (2017), http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
  57. Wang, Y., Xia, Y., Zhao, L., Bian, J., Qin, T., Liu, G., Liu, T.Y.: Dual transfer learning for neural machine translation with marginal distribution regularization. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (Feb 2018), https://www.microsoft.com/en-us/research/publication/dual-transfer-learning-neural-machine-translation-marginal-distribution-regularization/
  58. Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., Dean, J.: Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016), http://arxiv.org/abs/1609.08144
  59. Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., Liu, T.: Dual supervised learning. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017. pp. 3789–3798 (2017a), http://proceedings.mlr.press/v70/xia17a.html
  60. Xia, Y., Tian, F., Wu, L., Lin, J., Qin, T., Yu, N., Liu, T.Y.: Deliberation networks: Sequence generation beyond one-pass decoding. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 1784–1794. Curran Associates, Inc. (2017b), http://papers.nips.cc/paper/6775-deliberation-networks-sequence-generation-beyond-one-pass-decoding.pdf
  61. Yu, L., Zhang, W., Wang, J., Yu, Y.: SeqGAN: Sequence generative adversarial nets with policy gradient. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4–9, 2017, San Francisco, California, USA. pp. 2852–2858 (2017), http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14344
  62. Zhang, Z., Liu, S., Li, M., Zhou, M., Chen, E.: Joint training for neural machine translation models with monolingual data. In: AAAI. pp. 555–562. AAAI Press (2018)
  63. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Computer Vision (ICCV), 2017 IEEE International Conference on (2017)
  64. Cho, K.: Noisy parallel approximate decoding for conditional recurrent language model. CoRR abs/1605.03835 (2016), http://arxiv.org/abs/1605.03835
  65. Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. CoRR abs/1410.5401 (2014), http://arxiv.org/abs/1410.5401
  66. Jean, S., Cho, K., Memisevic, R., Bengio, Y.: On using very large target vocabulary for neural machine translation. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 1–10. Association for Computational Linguistics, Beijing, China (Jul 2015). https://doi.org/10.3115/v1/P15-1001, https://www.aclweb.org/anthology/P15-1001