[NLP]Intro to NLP, Bag-of-Words


크크루쿠쿠


JH_KIM 2021. 9. 6. 14:24

Intro to Natural Language Processing (NLP)

Natural language processing (major conferences: ACL, EMNLP, NAACL)

- Low-level parsing

- Tokenization, stemming (extracting word stems)

- Word and phrase level

- NER (named entity recognition), POS tagging (part of speech and sentence role), noun-phrase chunking, dependency parsing, coreference resolution

- Sentence level

- Sentiment analysis, machine translation

- Multi-sentence and paragraph level

- Entailment prediction (the relationship between two sentences), question answering, dialog systems, summarization

Text mining (major conferences: KDD, The WebConf (formerly WWW), WSDM, CIKM, ICWSM)

- Extract useful information and insights from text and document data

- Document clustering

- Highly related to computational social science

Information retrieval (major conferences: SIGIR, WSDM, CIKM, RecSys)

- Highly related to computational social science

Trends of NLP

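- NLP has advanced slowly but steadily.

- RNN-family models have given way to attention-based Transformer models.

- Machine translation is now done with deep-learning-based models.

- Each NLP task used to need its own model, but development is now converging on self-attention-based models.

- These models require large-scale computing resources and data.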

Bag-of-Words

1. Constructing the vocabulary containing unique words

2. Encoding unique words to one-hot vectors

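- The Euclidean distance between any two distinct words is √2.

- The cosine similarity between any two distinct words is 0.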
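Both values follow directly from the one-hot encoding. As a short worked check, write e_i and e_j for the one-hot vectors of two distinct words (these symbols are introduced here for illustration and are not in the original notes):

```latex
\begin{aligned}
\lVert e_i - e_j \rVert_2 &= \sqrt{(1-0)^2 + (0-1)^2} = \sqrt{2}, \\
\cos(e_i, e_j) &= \frac{e_i \cdot e_j}{\lVert e_i \rVert_2 \, \lVert e_j \rVert_2} = \frac{0}{1 \cdot 1} = 0
\qquad (i \neq j).
\end{aligned}
```

So every pair of distinct words is equally far apart and equally dissimilar, regardless of what the words mean.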

ex) “John really really loves this movie”

The sentence vector is the sum of the one-hot vectors of its words:

John + really + really + loves + this + movie: [1 2 1 1 1 0 0 0]
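The construction above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the second sentence is an assumption added only so that the vocabulary has eight entries like the vector in the example, and the function names are hypothetical.

```python
# Toy corpus. Only the first sentence comes from the notes; the second
# is an assumed sentence that pads the vocabulary to eight words.
sentences = [
    "John really really loves this movie",
    "Jane really likes this song",
]

def build_vocabulary(sentences):
    """Step 1: map each unique word to an index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def one_hot(word, vocab):
    """Step 2: encode a word as a one-hot vector over the vocabulary."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

def bag_of_words(sentence, vocab):
    """Represent a sentence as the sum of the one-hot vectors of its words."""
    vec = [0] * len(vocab)
    for word in sentence.split():
        for i, value in enumerate(one_hot(word, vocab)):
            vec[i] += value
    return vec

vocab = build_vocabulary(sentences)
bow = bag_of_words(sentences[0], vocab)
print(bow)  # [1, 2, 1, 1, 1, 0, 0, 0]
```

Word order is lost in this representation: any reordering of the same words gives the same vector.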

Bayes’ Rule Applied to Documents and Classes

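Let d be a document and c a class, where C is the set of all classes.

The probability that d belongs to c is P(c|d), which Bayes' rule rewrites as P(d|c)P(c) / P(d).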

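Because d is fixed, P(d) is a constant that does not depend on c, so it can be dropped when comparing classes (equivalently, it can be treated as 1).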

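P(d|c) is the probability that d appears when the class c is fixed. Treating d as a sequence of words w_1, ..., w_n and assuming the words are conditionally independent given c, it can be expressed as the product P(w_1|c) P(w_2|c) ... P(w_n|c).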
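The formulas in this part of the original post were images and did not survive. The block below is a reconstruction of the standard naive Bayes derivation that the surrounding text describes, not a copy of the author's figures. The symbols w_1, ..., w_n (the words of d) and c_MAP (the most probable class) are notation assumed here.

```latex
\begin{aligned}
c_{\mathrm{MAP}} &= \arg\max_{c \in C} P(c \mid d)
  = \arg\max_{c \in C} \frac{P(d \mid c)\, P(c)}{P(d)}
  = \arg\max_{c \in C} P(d \mid c)\, P(c), \\
P(d \mid c)\, P(c) &= P(w_1, w_2, \ldots, w_n \mid c)\, P(c)
  \approx P(c) \prod_{i=1}^{n} P(w_i \mid c).
\end{aligned}
```

The last step is the conditional independence ("naive") assumption: given the class, each word is treated as independent of the others.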

ex)

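Problem: if the document contains a word that never appeared in the training data for class c, P(w|c) is 0 for that word, so the whole product becomes 0.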
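The zero-probability problem is easy to reproduce. The following is a minimal sketch of an unsmoothed naive Bayes scorer. The training sentences, labels, and function names are hypothetical and chosen only for illustration; they are not the example from the original post.

```python
from collections import Counter

# Hypothetical toy training data: (class label, document text).
train = [
    ("pos", "really loves this movie"),
    ("pos", "loves this song"),
    ("neg", "really hates this movie"),
]

def fit(train):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(label for label, _ in train)
    word_counts = {label: Counter() for label in class_counts}
    for label, text in train:
        word_counts[label].update(text.split())
    return class_counts, word_counts

def score(doc, label, class_counts, word_counts):
    """Unnormalized posterior P(c) * prod_i P(w_i | c), without smoothing."""
    prior = class_counts[label] / sum(class_counts.values())
    total_words = sum(word_counts[label].values())
    likelihood = 1.0
    for word in doc.split():
        # A Counter returns 0 for unseen words, so one unseen word
        # zeroes the entire product.
        likelihood *= word_counts[label][word] / total_words
    return prior * likelihood

class_counts, word_counts = fit(train)
seen = score("loves this movie", "pos", class_counts, word_counts)
unseen = score("loves this film", "pos", class_counts, word_counts)
print(seen > 0, unseen)  # True 0.0
```

Here "film" never occurs in the "pos" training documents, so the score collapses to 0 even though the other two words strongly suggest that class.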

License: Attribution, NonCommercial, NoDerivatives
