Long-Context on CharmingGroot

Recent content in Long-Context on CharmingGroot: https://charminggroot.github.io/tags/long-context/

086. FlashAttention: Attention Memory Optimization
https://charminggroot.github.io/posts/086-flash-attention/
Sun, 14 Jun 2026 00:00:00 +0000

FlashAttention (2022) addresses the memory bottleneck of Transformer attention with IO-aware tiling. Instead of writing the full attention matrix to HBM, it computes attention block by block directly in SRAM, which reduces memory usage to O(n) in the sequence length n and makes attention 2 to 4 times faster.
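The tiling idea in the summary above can be illustrated with a small sketch. This is not the FlashAttention kernel itself: it is plain Python, it does not model HBM or SRAM, and it tiles only over keys and values, whereas the real algorithm also tiles queries and runs as a fused GPU kernel. The function names `naive_attention` and `tiled_attention` and the `block_size` parameter are hypothetical, chosen for illustration. What the sketch does show is the running-max and running-sum ("online softmax") bookkeeping that lets each block be processed without ever holding a full row of the n x n score matrix.

```python
import math


def naive_attention(Q, K, V):
    """Reference version: builds a full row of scores for every query."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Illustrative tiled version: visits K/V in blocks and keeps only a
    running max, a running softmax denominator, and an output accumulator
    per query, so extra state per query does not grow with len(K)."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    dv = len(V[0])
    out = []
    for q in Q:
        running_max = -math.inf
        running_sum = 0.0
        acc = [0.0] * dv
        for start in range(0, len(K), block_size):
            k_blk = K[start:start + block_size]
            v_blk = V[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new max.
            # On the first block this is exp(-inf) == 0.0.
            correction = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * correction + sum(weights)
            acc = [a * correction + sum(w * v[j] for w, v in zip(weights, v_blk))
                   for j, a in enumerate(acc)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

Calling `tiled_attention(Q, K, V, block_size=b)` on small lists of float vectors should agree with `naive_attention(Q, K, V)` up to floating-point rounding for any positive `b`, since the rescaling step makes the blockwise result mathematically equal to the ordinary softmax.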