graham1@gekinzuku.com to Linux@lemmy.ml • SHARE WITH THE CLASS: What aliases are you using?
wttr gang
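(The commenter didn't post the alias itself; "wttr" refers to the curl-able weather service wttr.in, so a typical alias for it might look like this hedged sketch:)

```bash
# Hypothetical aliases -- not the commenter's own, just what a
# "wttr" alias usually looks like for the wttr.in weather service.
alias wttr='curl -s wttr.in'               # full forecast for your IP's location
alias wttr1='curl -s "wttr.in?format=3"'   # one-line summary, handy in a prompt
```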
I believe your “They use attention mechanisms to figure out which parts of the text are important” is just a restatement of my “break it into contextual chunks”, no?
Large language models literally do subspace projections on text to break it into contextual chunks, and then memorize the chunks. That’s how they’re defined.
Source: the paper that defined the transformer architecture and its formulas for large language models, which has been cited roughly 85,000 times in academic work: https://arxiv.org/abs/1706.03762
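For reference, the core formula from that paper ("Attention Is All You Need", Vaswani et al., 2017); the learned projection matrices $W^{Q}$, $W^{K}$, $W^{V}$ are the "subspace projections" referred to above:

```latex
% Scaled dot-product attention (Vaswani et al., 2017, Eq. 1):
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
% where Q, K, V are linear projections of the token embeddings X
% into learned subspaces:
Q = XW^{Q}, \qquad K = XW^{K}, \qquad V = XW^{V}
```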
Definitely RE for me. I couldn't sleep after the first time I saw a Crimson Head. The sharks were terrifying too.