Natural Language Processing with PyTorch

References


Last updated 5 years ago

  1. Abdel-Hamid, O., Mohamed, A., Jiang, H., Deng, L., Penn, G., Yu, D.: Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22(10), 1533–1545 (Oct 2014). https://doi.org/10.1109/TASLP.2014.2339736

  2. Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R., Pineau, J., Courville, A., Bengio, Y.: An actor-critic algorithm for sequence prediction (2017), https://openreview.net/forum?id=SJDaqqveg

  3. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)

  4. Bajgar, O., Kadlec, R., Kleindienst, J.: Embracing data abundance: Booktest dataset for reading comprehension. CoRR abs/1610.00956 (2016)

  5. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (Dec 2017), https://www.aclweb.org/anthology/Q17-1010

  6. Chambers, N., Jurafsky, D.: Improving the use of pseudo-words for evaluating selectional preferences. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. pp. 445–453. Association for Computational Linguistics, Uppsala, Sweden (Jul 2010), https://www.aclweb.org/anthology/P10-1046

  7. Chiu, C., Sainath, T.N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., Kannan, A., Weiss, R.J., Rao, K., Gonina, E., Jaitly, N., Li, B., Chorowski, J., Bacchiani, M.: State-of-the-art speech recognition with sequence-to-sequence models. In: ICASSP. pp. 4774–4778. IEEE (2018)

  8. Church, K., Gale, W.A.: A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language 5, 19–54 (Jan 1991). https://doi.org/10.1016/0885-2308(91)90016-J

  9. Currey, A., Miceli Barone, A.V., Heafield, K.: Copied monolingual data improves low-resource neural machine translation. In: Proceedings of the Second Conference on Machine Translation. pp. 148–156. Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017). https://doi.org/10.18653/v1/W17-4715, https://www.aclweb.org/anthology/W17-4715

  10. Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q.V., Salakhutdinov, R.: Transformer-XL: Attentive language models beyond a fixed-length context. CoRR abs/1901.02860 (2019), http://arxiv.org/abs/1901.02860

  11. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018), http://arxiv.org/abs/1810.04805

  12. Erk, K.: A simple, similarity-based model for selectional preferences. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. pp. 216–223. Association for Computational Linguistics, Prague, Czech Republic (Jun 2007), https://www.aclweb.org/anthology/P07-1028

  13. Gao, J.: An introduction to deep learning for natural language processing. In: International Summer School on Deep Learning 2017 (2017), https://www.microsoft.com/en-us/research/wp-content/uploads/2017/07/dl-summer-school-2017.-Jianfeng-Gao.v2.pdf

  14. Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 1243–1252. PMLR, International Convention Centre, Sydney, Australia (06–11 Aug 2017), http://proceedings.mlr.press/v70/gehring17a.html

  15. Gong, C., He, D., Tan, X., Qin, T., Wang, L., Liu, T.Y.: Frage: Frequency-agnostic word representation. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 1334–1345. Curran Associates, Inc. (2018), http://papers.nips.cc/paper/7408-frage-frequency-agnostic-word-representation.pdf

  16. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 2672–2680. Curran Associates, Inc. (2014), http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

  17. Grave, E., Joulin, A., Usunier, N.: Improving neural language models with a continuous cache. In: ICLR. OpenReview.net (2017)

  18. Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Gómez, S., Grefenstette, E., Ramalho, T., Agapiou, J., Puigdomènech Badia, A., Moritz Hermann, K., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., Hassabis, D.: Hybrid computing using a neural network with dynamic external memory. Nature 538 (Oct 2016). https://doi.org/10.1038/nature20101

  19. Gulcehre, C., Firat, O., Xu, K., Cho, K., Barrault, L., Lin, H., Bougares, F., Schwenk, H., Bengio, Y.: On using monolingual corpora in neural machine translation. CoRR abs/1503.03535 (2015), http://arxiv.org/abs/1503.03535

  20. Hassan, H., Aue, A., Chen, C., Chowdhary, V., Clark, J., Federmann, C., Huang, X., Junczys-Dowmunt, M., Lewis, W., Li, M., Liu, S., Liu, T., Luo, R., Menezes, A., Qin, T., Seide, F., Tan, X., Tian, F., Wu, L., Wu, S., Xia, Y., Zhang, D., Zhang, Z., Zhou, M.: Achieving human parity on automatic Chinese to English news translation. CoRR abs/1803.05567 (2018), http://arxiv.org/abs/1803.05567

  21. He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T.Y., Ma, W.Y.: Dual learning for machine translation. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 820–828. Curran Associates, Inc. (2016a), http://papers.nips.cc/paper/6469-dual-learning-for-machine-translation.pdf

  22. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (June 2016). https://doi.org/10.1109/CVPR.2016.90

  23. Hoang, L., Wiseman, S., Rush, A.: Entity tracking improves cloze-style reading comprehension. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 1049–1055. Association for Computational Linguistics, Brussels, Belgium (Oct-Nov 2018), https://www.aclweb.org/anthology/D18-1130

  24. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. arXiv (2016)

  25. Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-softmax. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings (2017), https://openreview.net/forum?id=rkE3y85ee

  26. Johnson, M., Schuster, M., Le, Q.V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., Hughes, M., Dean, J.: Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5, 339–351 (2017), https://www.aclweb.org/anthology/Q17-1024

  27. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. pp. 427–431. Association for Computational Linguistics, Valencia, Spain (Apr 2017), https://www.aclweb.org/anthology/E17-2068

  28. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. CoRR abs/1710.10196 (2017), http://arxiv.org/abs/1710.10196

  29. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. CoRR abs/1812.04948 (2018), http://arxiv.org/abs/1812.04948

  30. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1746–1751. Association for Computational Linguistics, Doha, Qatar (Oct 2014). https://doi.org/10.3115/v1/D14-1181, https://www.aclweb.org/anthology/D14-1181

  31. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015), http://arxiv.org/abs/1412.6980

  32. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes (2013), http://arxiv.org/abs/1312.6114, arXiv:1312.6114

  33. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings (2014), http://arxiv.org/abs/1312.6114

  34. Kipyatkova, I., Karpov, A.: Recurrent neural network-based language modeling for an automatic Russian speech recognition system. In: 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT). pp. 33–38 (Nov 2015). https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382966

  35. Kneser, R., Ney, H.: Improved backing-off for m-gram language modeling. In: 1995 International Conference on Acoustics, Speech, and Signal Processing. vol. 1, pp. 181–184 (May 1995). https://doi.org/10.1109/ICASSP.1995.479394

  36. Lample, G., Conneau, A.: Cross-lingual language model pretraining. CoRR abs/1901.07291 (2019), http://arxiv.org/abs/1901.07291

  37. Lample, G., Conneau, A., Ranzato, M., Denoyer, L., Jégou, H.: Word translation without parallel data. In: ICLR. OpenReview.net (2018)

  38. Liu, M.Y., Tuzel, O.: Coupled generative adversarial networks. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 469–477. Curran Associates, Inc. (2016), http://papers.nips.cc/paper/6544-coupled-generative-adversarial-networks.pdf

  39. Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pp. 1412–1421. Association for Computational Linguistics, Lisbon, Portugal (Sep 2015). https://doi.org/10.18653/v1/D15-1166, https://www.aclweb.org/anthology/D15-1166

  40. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013), http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

  41. Paperno, D., Kruszewski, G., Lazaridou, A., Pham, N.Q., Bernardi, R., Pezzelle, S., Baroni, M., Boleda, G., Fernandez, R.: The LAMBADA dataset: Word prediction requiring a broad discourse context. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1525–1534. Association for Computational Linguistics, Berlin, Germany (Aug 2016). https://doi.org/10.18653/v1/P16-1144, https://www.aclweb.org/anthology/P16-1144

  42. Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543. Association for Computational Linguistics, Doha, Qatar (Oct 2014). https://doi.org/10.3115/v1/D14-1162, https://www.aclweb.org/anthology/D14-1162

  43. Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). pp. 2227–2237. Association for Computational Linguistics, New Orleans, Louisiana (Jun 2018). https://doi.org/10.18653/v1/N18-1202, https://www.aclweb.org/anthology/N18-1202

  44. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016)

  45. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)

  46. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners (2018)

  47. Resnik, P.: Selectional preference and sense disambiguation. In: Tagging Text with Lexical Semantics: Why, What, and How? (1997), https://www.aclweb.org/anthology/W97-0209

  48. Ruder, S.: Neural Transfer Learning for Natural Language Processing. Ph.D. thesis, National University of Ireland, Galway (2019)

  49. Sennrich, R., Birch, A., Currey, A., Germann, U., Haddow, B., Heafield, K., Miceli Barone, A.V., Williams, P.: The University of Edinburgh’s neural MT systems for WMT17. In: Proceedings of the Second Conference on Machine Translation. pp. 389–399. Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017). https://doi.org/10.18653/v1/W17-4739, https://www.aclweb.org/anthology/W17-4739

  50. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1715–1725. Association for Computational Linguistics, Berlin, Germany (Aug 2016a). https://doi.org/10.18653/v1/P16-1162, https://www.aclweb.org/anthology/P16-1162

  51. Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 86–96. Association for Computational Linguistics, Berlin, Germany (Aug 2016b). https://doi.org/10.18653/v1/P16-1009, https://www.aclweb.org/anthology/P16-1009

  52. Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., Liu, Y.: Minimum risk training for neural machine translation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1683–1692. Association for Computational Linguistics, Berlin, Germany (Aug 2016). https://doi.org/10.18653/v1/P16-1159, https://www.aclweb.org/anthology/P16-1159

  53. Sundermeyer, M., Schluter, R., Ney, H.: LSTM neural networks for language modeling. In: INTERSPEECH. pp. 194–197. ISCA (2012)

  54. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 3104–3112. Curran Associates, Inc. (2014), http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf

  55. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Muller, K. (eds.) Advances in Neural Information Processing Systems 12, pp. 1057–1063. MIT Press (2000), http://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf

  56. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 5998–6008. Curran Associates, Inc. (2017), http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

  57. Wang, Y., Xia, Y., Zhao, L., Bian, J., Qin, T., Liu, G., Liu, T.Y.: Dual transfer learning for neural machine translation with marginal distribution regularization. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (February 2018), https://www.microsoft.com/en-us/research/publication/dual-transfer-learning-neural-machine-translation-marginal-distribution-regularization/

  58. Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., Dean, J.: Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016), http://arxiv.org/abs/1609.08144

  59. Xia, Y., Qin, T., Chen, W., Bian, J., Yu, N., Liu, T.: Dual supervised learning. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. pp. 3789–3798 (2017a), http://proceedings.mlr.press/v70/xia17a.html

  60. Xia, Y., Tian, F., Wu, L., Lin, J., Qin, T., Yu, N., Liu, T.Y.: Deliberation networks: Sequence generation beyond one-pass decoding. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 1784–1794. Curran Associates, Inc. (2017b), http://papers.nips.cc/paper/6775-deliberation-networks-sequence-generation-beyond-one-pass-decoding.pdf

  61. Yu, L., Zhang, W., Wang, J., Yu, Y.: SeqGAN: Sequence generative adversarial nets with policy gradient. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. pp. 2852–2858 (2017), http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14344

  62. Zhang, Z., Liu, S., Li, M., Zhou, M., Chen, E.: Joint training for neural machine translation models with monolingual data. In: AAAI. pp. 555–562. AAAI Press (2018)

  63. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Computer Vision (ICCV), 2017 IEEE International Conference on (2017)

  64. Cho, K.: Noisy parallel approximate decoding for conditional recurrent language model. CoRR abs/1605.03835 (2016), http://arxiv.org/abs/1605.03835

  65. Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. CoRR abs/1410.5401 (2014), http://arxiv.org/abs/1410.5401

  66. Jean, S., Cho, K., Memisevic, R., Bengio, Y.: On using very large target vocabulary for neural machine translation. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 1–10. Association for Computational Linguistics, Beijing, China (Jul 2015). https://doi.org/10.3115/v1/P15-1001, https://www.aclweb.org/anthology/P15-1001
