a sunny day at the beach. a sand sculpture of a laptop computer. the word “Tech” is drawn in the sand --ar 5:3 --style raw
“You can now start building with Gemini 1.5 Flash and Pro models using our API pay-as-you-go service for developers. 🛠️ Flash is designed to be fast and efficient to serve – and we’re increasing the rate limit to 1000 requests per minute. Find out more →”
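For anyone who wants to try it, a minimal sketch of hitting the pay-as-you-go API with the google-generativeai Python SDK; the model name follows Google’s announcement, and the prompt is just an example:

```python
# Minimal sketch of the pay-as-you-go Gemini API via the
# google-generativeai SDK; the API key placeholder and prompt are mine.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Summarize the Gemini 1.5 Flash announcement.")
print(response.text)
```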
“DPO (Direct Preference Optimization) can NOT be as good as PPO (Proximal Policy Optimization) – from the latest Google research 🤔 It investigates why online reinforcement learning algorithms (like PPO) for aligning LLMs outperform offline algorithms (like DPO), despite both using…”
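For context on the online/offline distinction: DPO trains on a fixed preference dataset with no fresh sampling from the policy. A minimal sketch of the standard DPO loss (function and argument names are mine):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss from summed log-probs of chosen/rejected responses.

    Offline: the log-probs come from a fixed preference dataset, with no
    fresh sampling from the policy -- the contrast with online PPO that
    the tweet is about.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```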
“1. Praktika (@PraktikaEnglish) Round: $35.5M Series A. Praktika is a language-learning app that uses AI avatars to simulate real-life conversational scenarios. Demo: …”
ICLR 2024 — Best Papers & Talks (ImageGen, Vision, Transformers, State Space Models) ft. Christian Szegedy, Ilya Sutskever
“🔁 Repetitions in LangSmith 🔁 You can now run multiple repetitions of your experiment in LangSmith. This helps smooth out noise from variability introduced by your application or from your LLM-as-a-judge evaluator, so you can build confidence in the results of your experiment.”
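A minimal usage sketch, assuming the `num_repetitions` argument on LangSmith’s `evaluate()`; the dataset name and target function here are placeholders:

```python
# A minimal sketch, assuming the num_repetitions parameter on
# langsmith's evaluate(); dataset name and target function are mine.
from langsmith.evaluation import evaluate

def my_app(inputs: dict) -> dict:
    # ... call your chain/LLM here ...
    return {"output": "stub answer"}

results = evaluate(
    my_app,
    data="my-eval-dataset",   # an existing LangSmith dataset
    evaluators=[],            # e.g. an LLM-as-a-judge evaluator
    num_repetitions=5,        # run each example 5 times to average out noise
)
```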
“Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization. We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two…”
“Meta presents Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach. Features trained on their automatically curated datasets outperform ones trained on manually curated data.”
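The paper’s actual pipeline is hierarchical k-means over self-supervised embeddings; below is a toy single-level sketch of the cluster-balanced sampling idea with scikit-learn (all names mine):

```python
# Toy sketch of cluster-balanced curation: cluster embeddings, then
# sample a fixed budget uniformly per cluster so no concept dominates.
# (The Meta paper uses hierarchical k-means; this is one level only.)
import numpy as np
from sklearn.cluster import KMeans

def curate(embeddings: np.ndarray, n_clusters: int = 100,
           per_cluster: int = 50, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init="auto").fit_predict(embeddings)
    keep = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        keep.extend(rng.choice(idx, size=min(per_cluster, len(idx)), replace=False))
    return np.array(sorted(keep))  # indices of the curated subset
```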
“Introducing @mem0ai – long-term memory for any LLM. ✨ Features: • Remembers and recalls contextually • Enhances personalization and relevance of responses • Compatible with any LLM 👇🏻 Sign up for early access in the next tweet.”
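Since this was an early-access announcement, the exact API is not in the tweet; the sketch below follows the mem0 Python client as later published, so treat the signatures as assumptions:

```python
# Sketch of the advertised workflow (remember, recall, personalize),
# based on the mem0 Python client as later published -- an assumption.
from mem0 import Memory

m = Memory()
m.add("Alice prefers short, bullet-point answers.", user_id="alice")

# Before answering, recall relevant memories to personalize the prompt.
hits = m.search("How should I format my reply?", user_id="alice")
print(hits)
```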
“Today we published our VMware Validated Solution – Private AI Ready Infrastructure for VMware Cloud Foundation. If you’re ready to dig into the technical details on how to architect, deploy and operate #PrivateAI infrastructure, this is a great place to start.” / X
“I’ve been asked by a few first-year PhD students how to start LLM research on a topic X, say long-context modeling. My number one suggestion — though it seems a bit unconventional — is *not* to read any papers related to long context, but to talk to the model. Talk to the model about a…” / X
“Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training. LLMs are computationally expensive to pre-train due to their large scale. Model growth emerges as a promising approach by leveraging smaller models to accelerate the training of…”
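The growth operator in question, depthwise stacking, is easy to sketch: initialize a deeper model by repeating the trained blocks of a shallower one. A toy PyTorch version (names mine; see the paper for the exact operator):

```python
# Toy sketch of depthwise stacking: a deeper model is initialized by
# repeating the trained layers of a shallower one, then pre-training
# continues from that much better starting point.
import copy
import torch.nn as nn

def grow_by_stacking(small_layers: nn.ModuleList, growth_factor: int = 2) -> nn.ModuleList:
    big = []
    for _ in range(growth_factor):
        for layer in small_layers:
            big.append(copy.deepcopy(layer))  # reuse trained weights
    return nn.ModuleList(big)

# e.g. a 12-layer model's blocks become the init for a 24-layer model
```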
“VeLoRA: Memory-Efficient Training using Rank-1 Sub-Token Projections. Large language models (LLMs) have recently emerged as powerful tools for tackling many language-processing tasks. Despite their success, training and fine-tuning these models is still far too…”
“AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct. We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark (90.9% vs. 90.2%). In addition, AutoCoder offers a more…”
“So position embedding is this year’s linear attention, isn’t it? But I prefer this: linear attention was about removing capacity from the model, which didn’t make sense long term. Position embedding is about adding missing capabilities to the model, which makes a lot more sense.” / X
“@danielhanchen I thought I didn’t have to deal with these, but already the 350M model (14 hours on 8 GPUs) sometimes hangs with a cryptic MPI error. So I have to put the whole optimization into a `while 1` loop and a script that watches the log file and sends…” / X
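A sketch of the workaround described in that tweet; the launcher script, log path, and stall timeout below are all my placeholders:

```python
# Sketch of the described workaround: keep relaunching training and
# watch the log for stalls; paths, command, and timeout are mine.
import os
import subprocess
import time

LOG = "train.log"
STALL_SECS = 600  # assume a hang if the log is silent for 10 minutes

while True:  # the "while 1" loop around the whole optimization
    proc = subprocess.Popen(["bash", "launch_training.sh"])  # hypothetical launcher
    while proc.poll() is None:
        time.sleep(30)
        if os.path.exists(LOG) and time.time() - os.path.getmtime(LOG) > STALL_SECS:
            proc.kill()  # cryptic MPI hang: kill and relaunch from the last checkpoint
            break
    time.sleep(5)
```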
1-bit LLMs Could Solve AI’s Energy Demands – IEEE Spectrum
“I tried adding external memory to positional embedding, i.e., a readout from an external memory system that interacts with the embeddings based on some function of the token embeddings (here, the mean across the sequence). CoPE might be good to try too; it’s a really cool idea.” / X
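A toy PyTorch reading of that description (module and parameter names are mine): pool the token embeddings, use the pooled vector to attend over a learned memory, and add the readout alongside the position embeddings:

```python
import torch
import torch.nn as nn

class MemoryAugmentedPositions(nn.Module):
    """Toy reading of the tweet: a readout from an external memory,
    keyed by the mean token embedding, is added to position embeddings."""
    def __init__(self, d_model: int, max_len: int, n_slots: int = 32):
        super().__init__()
        self.pos = nn.Embedding(max_len, d_model)
        self.memory = nn.Parameter(torch.randn(n_slots, d_model))

    def forward(self, tok_emb: torch.Tensor) -> torch.Tensor:
        # tok_emb: (batch, seq, d_model)
        seq_len = tok_emb.shape[1]
        query = tok_emb.mean(dim=1)                       # mean across the sequence
        attn = torch.softmax(query @ self.memory.T, -1)   # (batch, n_slots)
        readout = attn @ self.memory                      # (batch, d_model)
        positions = self.pos(torch.arange(seq_len, device=tok_emb.device))
        return tok_emb + positions + readout[:, None, :]  # broadcast over seq
```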
[21 May 2024] Life after DPO (for alignment) – Google Slides
[2405.16749v1] DMPlug: A Plug-in Method for Solving Inverse Problems with Diffusion Models
[AINews] FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you’re welcome) • Buttondown
“A 6-hour performance of Shrink at Nuit Blanche in Paris 📹” – Resimlerle sanat
The decline of the user interface | InfoWorld
The 350M model I trained last night was 30B tokens, 14 hours, ~$200. Convenientl… | Hacker News
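Those figures are easy to sanity-check, assuming the 8 GPUs mentioned in the tweet above:

```python
# Quick arithmetic on the HN title's numbers (8 GPUs per the earlier tweet).
tokens, hours, dollars, gpus = 30e9, 14, 200, 8
print(f"{tokens / (hours * 3600):,.0f} tokens/sec overall")  # ~595,000 tokens/sec
print(f"${dollars / (hours * gpus):.2f} per GPU-hour")       # ~$1.79/GPU-hour
```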
Stanford CS 224N | Natural Language Processing with Deep Learning
Neural-Circuit-Diagrams/Guide/Guide.ipynb at main · vtabbott/Neural-Circuit-Diagrams · GitHub