
Attention, Self-Attention, Transformer


1. Attention

1.1. Concept

  • A technique introduced to compensate for the drop in output-sequence accuracy that occurs as the input sequence gets longer
  • A mechanism that looks over the entire input and decides which positions to focus on
    • At every time step where the decoder predicts an output word, it refers back to the entire input sentence on the encoder side
    • However, it does not weight the whole input sequence uniformly; it concentrates on the parts of the input most relevant to the output being predicted at that time step
    • This can have the effect of focusing only on the regions corresponding to the class being learned

  • Made up of a Query, Keys, and Values; in general the Keys and Values are given the same values
    • Query: the thing being searched for, i.e. the decoder cell's hidden state at time step t
    • Key: what the search is matched against, i.e. the encoder cells' hidden states at all time steps
    • Value: the value attached to each Key, i.e. the encoder cells' hidden states at all time steps

  • Computing the attention value means matching the Query against the Keys and aggregating the corresponding Values: each Value is weighted by the similarity between its Key and the Query, and the weighted Values are summed (see the sketch below)
    • A higher weight means a stronger association
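The weighted-sum computation above is (scaled) dot-product attention. Below is a minimal NumPy sketch, assuming the 1/sqrt(d_k) scaling used in the Transformer paper; the shapes and the toy encoder/decoder states are illustrative assumptions, not values from the text.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (num_queries, d_k)  e.g. the decoder hidden state at time step t
    K: (num_keys,    d_k)  encoder hidden states at all time steps
    V: (num_keys,    d_v)  usually the same tensors as K
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # Query-Key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights                               # weighted sum of Values

# Toy example: one decoder query attends over 4 encoder hidden states (d = 8).
rng = np.random.default_rng(0)
enc_states = rng.normal(size=(4, 8))    # Keys = Values = encoder hidden states
dec_state  = rng.normal(size=(1, 8))    # Query = decoder hidden state at time t
context, attn = scaled_dot_product_attention(dec_state, enc_states, enc_states)
print(attn)      # attention weights: a higher weight means a stronger association
print(context)   # attention value (context vector) passed on to the decoder
```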

 

2. Self-Attention

2.1. Concept

  • An attention operation in which the Query, Keys, and Values all come from the same sequence, e.g. the attention computed inside the encoder, where each input token attends to the other tokens of the same input sentence (see the sketch below)
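A minimal sketch of the idea, using random matrices as stand-ins for learned projection weights (an assumption for illustration): the Query, Keys, and Values are all derived from the same sequence, so every token attends to every other token of that sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16                      # 5 input tokens, illustrative model size

X  = rng.normal(size=(seq_len, d_model))      # embeddings of ONE input sequence
Wq = rng.normal(size=(d_model, d_model))      # projection weights (random stand-ins)
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))

# Self-attention: Q, K and V all come from the same sequence X.
Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores  = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)   # (seq_len, seq_len) weights
out = weights @ V                                         # each token mixes in the others
print(weights.shape, out.shape)                           # (5, 5) (5, 16)
```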

 

3. Transformer

3.1. Concept

  • Uses an encoder-decoder structure without any RNN
    • Instead, positional information is added to the input embeddings (positional encoding) so the model can still use word-order information (see the sketch below)
    • All positions are processed at once through matrix multiplications rather than step by step
  • Uses label smoothing instead of one-hot encoding (see the sketch below)
    • Instead of hard 0s and 1s, the target values are set close to, but not exactly at, 0 and 1
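A rough sketch of these two points, assuming the sinusoidal positional encoding from the original Transformer paper and an arbitrary smoothing factor of 0.1 (both are illustrative choices, not values from the text).

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position signal that gets added to the input embeddings."""
    pos = np.arange(seq_len)[:, None]                        # (seq_len, 1)
    i   = np.arange(d_model)[None, :]                        # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])                     # even dimensions
    pe[:, 1::2] = np.cos(angle[:, 1::2])                     # odd dimensions
    return pe

def smooth_labels(labels, num_classes, smoothing=0.1):
    """Replace hard 0/1 one-hot targets with values close to, but not exactly, 0 and 1."""
    one_hot = np.eye(num_classes)[labels]
    # The correct class gets 1 - smoothing; the rest share the remaining mass.
    return one_hot * (1.0 - smoothing) + smoothing / num_classes

# Positional encoding: word-order information is added directly to the embeddings.
embeddings = np.random.default_rng(0).normal(size=(10, 16))
x = embeddings + positional_encoding(10, 16)

# Label smoothing: [0, 0, 1, 0, 0] becomes [0.02, 0.02, 0.92, 0.02, 0.02].
print(smooth_labels(np.array([2]), num_classes=5))
```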

 

