“today @mainframe is excited to share our $5.5m seed to build new AI interfaces co-led by @lachygroom and @stellation with participation from @basecasevc @weekendfund & more
https://x.com/jsngr/status/1866498187248443495

“Frontier LLMs have shrunk dramatically: GPT-4 had ~1.8T params, while GPT-4o likely has ~200B and Claude 3.5 Sonnet ~400B. Surging inference demand, Chinchilla scaling, distillation, test-time compute scaling, synthetic data all push toward smaller models. Will this trend hold?
https://x.com/tamaybes/status/1867718555049054344
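For context on the Chinchilla point above, a back-of-envelope sketch (not from the thread) using the standard rules of thumb — compute-optimal token count D ≈ 20·N and training cost C ≈ 6·N·D FLOPs — applied to the parameter estimates quoted in the tweet (which are themselves community estimates, not official figures):

```python
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Chinchilla rule of thumb: compute-optimal training tokens D ≈ 20 * N."""
    return tokens_per_param * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard approximation for dense transformers: C ≈ 6 * N * D FLOPs."""
    return 6.0 * n_params * n_tokens

# Parameter counts as estimated in the thread (unofficial).
for name, n in [("GPT-4 (~1.8T)", 1.8e12),
                ("Claude 3.5 Sonnet (~400B)", 4e11),
                ("GPT-4o (~200B)", 2e11)]:
    d = chinchilla_optimal_tokens(n)
    print(f"{name}: ~{d:.1e} tokens, ~{training_flops(n, d):.1e} training FLOPs")
```

The asymmetry this illustrates: a 9x smaller model is ~81x cheaper to train compute-optimally, and far cheaper to serve, which is one reason surging inference demand pushes labs toward smaller, longer-trained (or distilled) models.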

“We’re thrilled to announce our investment in @StainlessAPI, the platform transforming how companies build and maintain high-quality APIs and SDKs. Backed by an impressive roster of clients like OpenAI, Anthropic, and Meta’s Llama Stack, Stainless is reshaping the future of API
https://x.com/a16z/status/1866517563523416366

“I 13x’d an AI Startup’s conversions in 12 hours. Here’s my exact blueprint for non-converting AI landing pages. 🔖 Bookmark it for later, it’s super handy!
https://x.com/michalmalewicz/status/1865357839839150368

“Periodic reminder that you don’t get to drop Attention without dropping several capabilities that rely on Attention”
https://x.com/teortaxesTex/status/1867292159415636443

“someone at NeurIPS asked me if they’d “missed the moment on AI.” obviously not, unless you believe that a single foundation model lab dominates the economy AND the race outcomes are already known. the internet games took 3 decades to play out. 2024 = year two”
https://x.com/saranormous/status/1866952426890166372

“Training Large Language Models to Reason in a Continuous Latent Space. Introduces a new paradigm for LLM reasoning called Chain of Continuous Thought (COCONUT). Extremely simple change: instead of mapping between hidden states and language tokens using the LLM head and embedding
https://x.com/iScienceLuvr/status/1866353795502158163
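A minimal toy sketch of the COCONUT idea as the tweet describes it (this is illustrative code with made-up module names, not the paper's implementation, and a GRU stands in for the transformer body): during "continuous thought" steps the last hidden state is fed back directly as the next input embedding, skipping the LM head and token re-embedding entirely.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy stand-in for a language model: embedding, body, LM head."""
    def __init__(self, vocab: int = 100, d: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.body = nn.GRU(d, d, batch_first=True)  # stand-in for transformer layers
        self.head = nn.Linear(d, vocab)

    def forward_embeds(self, embeds, h=None):
        out, h = self.body(embeds, h)
        return out, h

def coconut_generate(model: TinyLM, prompt_ids: torch.Tensor, n_thoughts: int = 3):
    """Chain of Continuous Thought (sketch): run n_thoughts latent steps,
    feeding each last hidden state back as the next *input embedding*
    rather than decoding a token and re-embedding it."""
    embeds = model.embed(prompt_ids)               # (B, T, d)
    out, h = model.forward_embeds(embeds)
    latent = out[:, -1:, :]                        # last hidden state
    for _ in range(n_thoughts):
        out, h = model.forward_embeds(latent, h)   # continuous thought: no tokens
        latent = out[:, -1:, :]
    logits = model.head(latent[:, -1])             # decode only after latent steps
    return logits.argmax(-1)

model = TinyLM()
ids = torch.randint(0, 100, (1, 5))
ans = coconut_generate(model, ids)
print(ans.shape)
```

The appeal is that the latent "thoughts" are not forced through the discrete token bottleneck, so intermediate reasoning can carry richer, continuous information.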

“New market maps covering the latest AI startups: {this is a long one, bookmark it} 1. Intelligent-first apps & infra from Insight Partners
https://x.com/chiefaioffice/status/1865414758763049170

“Sepp Hochreiter giving a keynote talk at #NeurIPS2024 about xLSTM having key structural advantages such as very fast inference speed and high parameter efficiency compared to flash attention transformers and state-space models. xLSTM resources:
https://x.com/hardmaru/status/1866896953730273698

“Is scaling done for? Got a sick debate between myself and the illustrious Jonathan Frankle of Lottery Ticket and MosaicML / Databricks At 4:00PM live at NeurIPS tomorrow (Wednesday) Register now!
https://x.com/dylan522p/status/1866630813074461060

“🔥 Discover the most influential AI papers of 2024! From Self-Discovering LLMs to Chain-of-Thought breakthroughs, the top upvoted papers on @huggingface along with the top 20 presented at #NeurIPS2024 📚🧠 #AI #MachineLearning
https://x.com/fdaudens/status/1867302660556181928

“Good day for Watermarking Research! Today we release Video Seal 📽️🦭, a state-of-the-art open-source video watermarking model. + Much more cool stuff 🧵 🔓We OSS (Training Code + Models + Demo) under MIT license
https://x.com/hadyelsahar/status/1867659846914650271

“The main author of the Best Paper award at #NeurIPS2024 is also the person who “engaged in malicious code attacks that sabotaged at least two research projects during his internship at ByteDance, owner of TikTok” for this paper 🤯
https://x.com/fdaudens/status/1866732715459965278

“Super excited to be going to #NeurIPS to present new work on softly state-invariant world models! We introduce an info bottleneck making world models represent action effects more consistently in latent space, improving prediction and planning! Reach out if you want to meet!
https://x.com/TankredSaanum/status/1865390033395503470

“How similar are the internal processes of artificial and human creativity? What is the effect of artificial creativity on our creativity? In a new paper, I approach these questions through the lens of neuroscience and consciousness research and make some observations 🧐👇”
https://x.com/jaaanaru/status/1866145232774914177

“Excited to share our work at NeurIPS on how to effectively learn new tasks from very few demonstrations! We invert demonstrations into the latent space of a compositional set of generative models, allowing us to quickly learn new tasks substantially different from training tasks.”
https://x.com/du_yilun/status/1865068578405519419

“Hi, I’m at #NeurIPS 2024, presenting “Unsupervised Object Detection with Theoretical Guarantees”! 💡 First unsupervised object detection method that is provably guaranteed to detect objects! 🗓️ Poster #2105 on Thursday 16:30-19:30 in the East Hall. 📄
https://x.com/marian_longa/status/1867026777664786633

“🚀 Our latest research on learning RL policies from tutorial books is being presented as an oral today at #NeurIPS2024! We take a bold step towards more generalized offline RL by teaching AI to learn directly from textbooks—just like humans do! 📚🤖 #LLM #Reinforcementlearning
https://x.com/xiong_hui_chen/status/1866917866148335938

Scheming reasoning evaluations — Apollo Research
https://www.apolloresearch.ai/research/scheming-reasoning-evaluations

Open-sourcing Three EXAONE 3.5 Models: Frontier-level Model, Top-tier Performance in Instruction Following and Long Context Capabilities – LG AI Research BLOG
https://www.lgresearch.ai/blog/view?seq=507

[2412.08263v1] Discrete Subgraph Sampling for Interpretable Graph based Visual Question Answering
https://arxiv.org/abs/2412.08263v1

“Don’t just scale. Algorithmic innovation is still essential. E.g. scaling supervised fine-tuning sounds like a low-hanging fruit but it actually doesn’t help much. It fails not from lack of data, but the limits of the paradigm itself. The real shift? Reframe post-training as
https://x.com/denny_zhou/status/1866239541276999781

[2412.06787] [MASK] is All You Need
https://arxiv.org/abs/2412.06787

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail
https://stereoanywhere.github.io/

StyleMaster
https://zixuan-ye.github.io/stylemaster/

[2412.07187v1] A New Federated Learning Framework Against Gradient Inversion Attacks
https://arxiv.org/abs/2412.07187v1

[2412.07214v1] Towards Automated Cross-domain Exploratory Data Analysis through Large Language Models
https://arxiv.org/abs/2412.07214v1

“🎄 NeurIPS 2024’s hottest AI trends unwrapped! 4,495 accepted papers analyzed to reveal the most frequent keyword connections. See how AI’s building blocks intertwine in this festive knowledge network! 🧠✨ #NeurIPS2024 #AI #MachineLearning
https://x.com/fdaudens/status/1866555464441594231

[2412.06769] Training Large Language Models to Reason in a Continuous Latent Space
https://arxiv.org/pdf/2412.06769

[2412.05265] Reinforcement Learning: An Overview
https://arxiv.org/abs/2412.05265

Momentum-GS
https://jixuan-fan.github.io/Momentum-GS_Page/

[2412.03317] FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness
https://arxiv.org/abs/2412.03317

Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft | WIRED
https://www.wired.com/story/harvard-ai-training-dataset-openai-microsoft/
