Ethan B. Holland

Over 54,400 manually organized AI links and counting

a llama standing next to a large monitor full of computer code. large text label reads "Tech" --chaos 20 --ar 4:3 --style raw --personalize ytt1577 --v 6.1

Tech Papers, Training, and Development: Week Ending 07/26/2024

July 26, 2024

“Drowning in AI jargon? We’ve got you covered! 🌊📚 A good AI cheat sheet to boost your AI literacy today.

https://twitter.com/fdaudens/status/1815734403819041215

“🛠️ Seamless LangSmith tracing in LangGraph.js 🕸️ You can now use LangSmith to trace arbitrary functions and SDKs in LangGraph.js with no additional configuration! If you prefer using model SDKs directly, it’s now easier than ever to use LangSmith’s tracing, evaluation, and

https://twitter.com/LangChainAI/status/1815439685117993349

“🚀 Introducing the Model Drops Tracker! 🕵️‍♂️ Feeling overwhelmed by the AI model release frenzy? 🤯 You’re not alone! I built this simple tool to help us all keep up: – Filter recent models from the @huggingface Hub – Set minimum likes threshold – Choose how recent you want to go

https://twitter.com/fdaudens/status/1816192256967332090

Workera Launches New, Free Skill Assessments for Individuals to Verify and Benchmark AI Skills

https://www.globenewswire.com/news-release/2024/07/17/2914614/0/en/Workera-Launches-New-Free-Skill-Assessments-for-Individuals-to-Verify-and-Benchmark-AI-Skills.html

“It’s been a little longer than usual since my last post, but I’ve been writing! My long-form writeup on everything you need to know about LLM-as-a-judge is out now… Why is LLM-as-a-Judge so popular? LLM-as-a-Judge evaluates the quality of an LLM’s output by prompting another,

https://twitter.com/cwolferesearch/status/1815405425866518846

“figuring out good prompts is only half the battle, the best AI implementations include multiple layers and feedback loops that need to work in concert with one another, all prone to breaking in weird ways”

https://twitter.com/nptacek/status/1816179089348427839

Working with AI (Part 2): Code Conversion

“Synthetic data can beat its teacher! The AI-MO team released their winning dataset with an additional fine-tuned @Alibaba_Qwen 2 model that approaches or surpasses @OpenAI GPT-4o and @AnthropicAI Claude 3.5 in math competitions. 👀 There was a sentiment that fine-tuned models

https://twitter.com/_philschmid/status/1814982420602421414

“A new paper suggests too much training on AI-produced content causes AI models to break. This is an ongoing discussion, with lots of research and discussion about when/if synthetic training data works. So a helpful paper, but likely not the final word.

https://twitter.com/emollick/status/1816512149621280887

“There is still no benchmark for LLM hallucination rates. Few benchmarks have comparisons to humans. There are no common benchmarks that cover use cases in innovation, writing, persuasion, human interaction, education, creativity, etc. Yet LLMs are often built towards benchmarks”

https://twitter.com/emollick/status/1814513180632084789

How to Create High Quality Synthetic Data for Fine-Tuning LLMs

https://gretel.ai/blog/how-to-create-high-quality-synthetic-data-for-fine-tuning-llms

“Patronus AI announced the release of ‘Lynx’, a new open-source hallucination detection model. They claim that it outperforms existing AI models such as GPT-4, Claude-3-Sonnet, and more. An important challenge to solve

https://twitter.com/adcock_brett/status/1815055289864827304

Building A Generative AI Platform

https://huyenchip.com/2024/07/25/genai-platform.html

“‘Intelligence Destruction Cycle’ (IDC), a novel framework for understanding the rapid obsolescence and replacement of artificial intelligence models in the current AI research landscape. Drawing inspiration from Schumpeter’s theory of creative destruction, the IDC posits that the”

https://twitter.com/far__el/status/1816152435112464844

Three Archetypes of AI Application Startups

https://www.tanayj.com/p/three-archetypes-of-ai-application

“A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data Despite the availability of international prize-money competitions, scaled vehicles, and simulation environments, research on autonomous racing and the control of sports cars operating close to the limit of

https://twitter.com/_akhaliq/status/1815970911998095771

An Update on our Make Designs Feature | Figma Blog

https://www.figma.com/blog/inside-figma-a-retrospective-on-make-designs

[2403.19967v1] Rewrite the Stars

https://arxiv.org/abs/2403.19967v1

[2404.05218v1] Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning

https://arxiv.org/abs/2404.05218v1

[2407.13623] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

https://arxiv.org/abs/2407.13623

[2407.15773v1] STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay

https://arxiv.org/abs/2407.15773v1

[2407.16312v1] MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning

https://arxiv.org/abs/2407.16312v1

[2407.16375v1] Ranking protein-protein models with large language models and graph neural networks

https://arxiv.org/abs/2407.16375v1

[2407.16957v1] Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal

https://arxiv.org/abs/2407.16957v1

[2407.16993v1] LoFormer: Local Frequency Transformer for Image Deblurring

https://arxiv.org/abs/2407.16993v1

[2407.17418v1] 3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities

https://arxiv.org/abs/2407.17418v1

“@Laz4rz Lack of standardization in the benchmarks. To be fair, MMLU is not that bad compared to many other evals”

https://twitter.com/maximelabonne/status/1816067644040118512

Optimizing LLMs for Cost and Quality with OctoAI’s Experts | OctoAI

https://octo.ai/cp/webinar-optimizing-llms

“Composable optimizers over modular NLP programs are the future! If familiar w/ DSPy lingo, you should compose BootstrapFewShot-based optimizers (like RS or bayesian MIPRO) with BootstrapFinetune! Follow @dilarafsoylu for her DSPy optimizer releases soon!

https://twitter.com/lateinteraction/status/1815423187418763308

“🚨When building LM systems for a task, should you explore finetuning or prompt optimization? Paper w/ @dilarafsoylu @ChrisGPotts finds that you should do both! New DSPy optimizers that alternate optimizing weights & prompts can deliver up to 26% gains over just optimizing one!

https://twitter.com/lateinteraction/status/1815423177272824022

Be Sure To Read This Week’s Main Post:

This week’s executive overview and top links are here:

AI News #43: Week Ending 07/26/2024 with Executive Summary and Top 97 Links

The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I write an accessible overview so that laypeople can feel confident discussing the week’s AI developments. I include a curated list of the week’s must-click links, giving everyone a hands-on way to explore the most intriguing updates in artificial intelligence across categories including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives (below). If you haven’t read this week’s overview, I recommend starting there.

Credits/Sources

a thankful robot extends a bouquet of flowers toward the camera --chaos 30 --ar 4:3 --style raw --personalize jczhn5o

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.