References
Abdel-Hamid, O., Mohamed, A., Jiang, H., Deng, L., Penn, G., Yu, D.: Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22(10), 1533–1545 (Oct 2014)
Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R., Pineau, J., Courville, A., Bengio, Y.: An actor-critic algorithm for sequence prediction (2017), https://openreview.net/forum?id=SJDaqqveg
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
Bajgar, O., Kadlec, R., Kleindienst, J.: Embracing data abundance: BookTest dataset for reading comprehension. CoRR abs/1610.00956 (2016)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (Dec 2017)
Chambers, N., Jurafsky, D.: Improving the use of pseudo-words for evaluating selectional preferences. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. pp. 445–453. Association for Computational Linguistics, Uppsala, Sweden (Jul 2010), https://www.aclweb.org/anthology/P10-1046
Chiu, C., Sainath, T.N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., Kannan, A., Weiss, R.J., Rao, K., Gonina, E., Jaitly, N., Li, B., Chorowski, J., Bacchiani, M.: State-of-the-art speech recognition with sequence-to-sequence models. In: ICASSP. pp. 4774–4778. IEEE (2018)
Church, K., Gale, W.A.: A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language 5, 19–54 (Jan 1991)
Currey, A., Miceli Barone, A.V., Heafield, K.: Copied monolingual data improves low-resource neural machine translation. In: Proceedings of the Second Conference on Machine Translation. pp. 148–156. Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017), https://doi.org/10.18653/v1/W17-4715
Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q.V., Salakhutdinov, R.: Transformer-XL: Attentive language models beyond a fixed-length context. CoRR abs/1901.02860 (2019)
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018)
Erk, K.: A simple, similarity-based model for selectional preferences. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. pp. 216–223. Association for Computational Linguistics, Prague, Czech Republic (Jun 2007)
Gao, J.: An introduction to deep learning for natural language processing. In: International Summer School on Deep Learning 2017 (2017), https://www.microsoft.com/en-us/research/wp-content/uploads/2017/07/dl-summer-school-2017.-Jianfeng-Gao.v2.pdf
Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 1243–1252. PMLR, International Convention Centre, Sydney, Australia (06–11 Aug 2017), http://proceedings.mlr.press/v70/gehring17a.html
Gong, C., He, D., Tan, X., Qin, T., Wang, L., Liu, T.Y.: FRAGE: Frequency-agnostic word representation. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 1334–1345. Curran Associates, Inc. (2018)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 2672–2680. Curran Associates, Inc. (2014)
Grave, E., Joulin, A., Usunier, N.: Improving neural language models with a continuous cache. In: ICLR. OpenReview.net (2017)
Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Gómez, S., Grefenstette, E., Ramalho, T., Agapiou, J., Puigdomènech Badia, A., Moritz Hermann, K., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., Hassabis, D.: Hybrid computing using a neural network with dynamic external memory. Nature 538 (Oct 2016)
Gulcehre, C., Firat, O., Xu, K., Cho, K., Barrault, L., Lin, H., Bougares, F., Schwenk, H., Bengio, Y.: On using monolingual corpora in neural machine translation. CoRR abs/1503.03535 (2015)
Hassan, H., Aue, A., Chen, C., Chowdhary, V., Clark, J., Federmann, C., Huang, X., Junczys-Dowmunt, M., Lewis, W., Li, M., Liu, S., Liu, T., Luo, R., Menezes, A., Qin, T., Seide, F., Tan, X., Tian, F., Wu, L., Wu, S., Xia, Y., Zhang, D., Zhang, Z., Zhou, M.: Achieving human parity on automatic Chinese to English news translation. CoRR abs/1803.05567 (2018), http://arxiv.org/abs/1803.05567
He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T.Y., Ma, W.Y.: Dual learning for machine translation. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 820–828. Curran Associates, Inc. (2016a)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (Jun 2016)
Hoang, L., Wiseman, S., Rush, A.: Entity tracking improves cloze-style reading comprehension. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 1049–1055. Association for Computational Linguistics, Brussels, Belgium (Oct–Nov 2018), https://www.aclweb.org/anthology/D18-1130
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. arXiv preprint (2016)
Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-Softmax. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings (2017)
Johnson, M., Schuster, M., Le, Q.V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., Hughes, M., Dean, J.: Google's multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5, 339–351 (2017)
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. pp. 427–431. Association for Computational Linguistics, Valencia, Spain (Apr 2017)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. CoRR abs/1710.10196 (2017)
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. CoRR abs/1812.04948 (2018), https://arxiv.org/abs/1812.04948
Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1746–1751. Association for Computational Linguistics, Doha, Qatar (Oct 2014)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (2015), http://arxiv.org/abs/1412.6980
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes (2013), http://arxiv.org/abs/1312.6114
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14–16, 2014, Conference Track Proceedings (2014), http://arxiv.org/abs/1312.6114
Kipyatkova, I., Karpov, A.: Recurrent neural network-based language modeling for an automatic Russian speech recognition system. In: 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT). pp. 33–38 (Nov 2015)
Kneser, R., Ney, H.: Improved backing-off for m-gram language modeling. In: 1995 International Conference on Acoustics, Speech, and Signal Processing. vol. 1, pp. 181–184 (May 1995)
Lample, G., Conneau, A.: Cross-lingual language model pretraining. CoRR abs/1901.07291 (2019)
Lample, G., Conneau, A., Ranzato, M., Denoyer, L., Jégou, H.: Word translation without parallel data. In: ICLR. OpenReview.net (2018)
Liu, M.Y., Tuzel, O.: Coupled generative adversarial networks. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 469–477. Curran Associates, Inc. (2016)
Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pp. 1412–1421. Association for Computational Linguistics, Lisbon, Portugal (Sep 2015), https://www.aclweb.org/anthology/D15-1166
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013)
Paperno, D., Kruszewski, G., Lazaridou, A., Pham, N.Q., Bernardi, R., Pezzelle, S., Baroni, M., Boleda, G., Fernandez, R.: The LAMBADA dataset: Word prediction requiring a broad discourse context. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1525–1534. Association for Computational Linguistics, Berlin, Germany (Aug 2016), https://www.aclweb.org/anthology/P16-1144
Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543. Association for Computational Linguistics, Doha, Qatar (Oct 2014), https://www.aclweb.org/anthology/D14-1162
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). pp. 2227–2237. Association for Computational Linguistics, New Orleans, Louisiana (Jun 2018)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016)
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners (2018)
Resnik, P.: Selectional preference and sense disambiguation. In: Tagging Text with Lexical Semantics: Why, What, and How? (1997), https://www.aclweb.org/anthology/W97-0209
Ruder, S.: Neural Transfer Learning for Natural Language Processing. Ph.D. thesis, National University of Ireland, Galway (2019)
Sennrich, R., Birch, A., Currey, A., Germann, U., Haddow, B., Heafield, K., Miceli Barone, A.V., Williams, P.: The University of Edinburgh's neural MT systems for WMT17. In: Proceedings of the Second Conference on Machine Translation. pp. 389–399. Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017)
Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1715–1725. Association for Computational Linguistics, Berlin, Germany (Aug 2016a), https://www.aclweb.org/anthology/P16-1162
Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 86–96. Association for Computational Linguistics, Berlin, Germany (Aug 2016b), https://www.aclweb.org/anthology/P16-1009
Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., Liu, Y.: Minimum risk training for neural machine translation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1683–1692. Association for Computational Linguistics, Berlin, Germany (Aug 2016), https://www.aclweb.org/anthology/P16-1159
Sundermeyer, M., Schlüter, R., Ney, H.: LSTM neural networks for language modeling. In: INTERSPEECH. pp. 194–197. ISCA (2012)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 3104–3112. Curran Associates, Inc. (2014)
Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K. (eds.) Advances in Neural Information Processing Systems 12, pp. 1057–1063. MIT Press (2000)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 5998–6008. Curran Associates, Inc. (2017)
Wang, Y., Xia, Y., Zhao, L., Bian, J., Qin, T., Liu, G., Liu, T.Y.: Dual transfer learning for neural machine translation with marginal distribution regularization. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (Feb 2018)
Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., Dean, J.: Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016)
Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., Liu, T.: Dual supervised learning. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017. pp. 3789–3798 (2017a)
Xia, Y., Tian, F., Wu, L., Lin, J., Qin, T., Yu, N., Liu, T.Y.: Deliberation networks: Sequence generation beyond one-pass decoding. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 1784–1794. Curran Associates, Inc. (2017b)
Yu, L., Zhang, W., Wang, J., Yu, Y.: SeqGAN: Sequence generative adversarial nets with policy gradient. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4–9, 2017, San Francisco, California, USA. pp. 2852–2858 (2017)
Zhang, Z., Liu, S., Li, M., Zhou, M., Chen, E.: Joint training for neural machine translation models with monolingual data. In: AAAI. pp. 555–562. AAAI Press (2018)
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE International Conference on Computer Vision (ICCV) (2017)
Cho, K.: Noisy parallel approximate decoding for conditional recurrent language model. CoRR abs/1605.03835 (2016)
Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. CoRR abs/1410.5401 (2014)
Jean, S., Cho, K., Memisevic, R., Bengio, Y.: On using very large target vocabulary for neural machine translation. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 1–10. Association for Computational Linguistics, Beijing, China (Jul 2015), https://www.aclweb.org/anthology/P15-1001