Ethan B. Holland

Over 56,100 manually organized AI links and counting

AI Tech and Development News: Week Ending 02/09/2024

February 25, 2024

Tech and Development

“You can now create API keys that are limited to specific API endpoints, enabling granular permissions and better security. 🔗: https://t.co/XZO740WKJ4 https://t.co/Ernbd8Vt0a” / X – https://twitter.com/OpenAIDevs/status/1755275367500386753

“Interested in learning the mathematical foundations of Reinforcement Learning (RL)? Now is a good time! This semester, we will make videos and lecture notes from my graduate-level RL theory course at Princeton available to the public. Now it’s week 1: https://t.co/zLRtZB5u6b” / X – https://twitter.com/chijinML/status/1754697412168343809

“Launching our LLM Leaderboard with 60+ model & API host combinations 🚀 This is the most comprehensive view of LLM inference performance available to date Our leaderboard offers a single page to compare models & API hosts across metrics including quality, throughput, latency,… https://t.co/bRyqCuML0K” / X – https://twitter.com/ArtificialAnlys/status/1755284737399362038

LLM Explorer: A Curated Large Language Model Directory. LLMs List. Explore 18571 Open-Source Language Models. – https://llm.extractum.io/

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates – https://huggingface.co/blog/leaderboards-on-the-hub-nphardeval

“Heard of mysterious “Deluxe v1.2” on lmsys Arena?? I expect this to be something big! I threw some MT-bench questions at it and it was nailing them (GPT-4 level nailing them!) Slow response (~4 tok/s) & non-streaming answers. I expect it above Mistral Medium. Possible🥈 place https://t.co/mVyHxgYc0m” / X – https://twitter.com/gblazex/status/1753125048179622382?s=20

AI Design Patterns by @ttunguz – https://tomtunguz.com/ai-design-patterns/

[2311.11944] FinanceBench: A New Benchmark for Financial Question Answering – https://arxiv.org/abs/2311.11944

NPHardEval Leaderboard – a Hugging Face Space by NPHardEval – https://huggingface.co/spaces/NPHardEval/NPHardEval-leaderboard

[2312.14890] NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes – https://arxiv.org/abs/2312.14890

LLM-Perf Leaderboard – a Hugging Face Space by optimum – https://huggingface.co/spaces/optimum/llm-perf-leaderboard

LMSys Chatbot Arena Leaderboard – a Hugging Face Space by lmsys – https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

[2302.02083] Evaluating Large Language Models in Theory of Mind Tasks – https://arxiv.org/abs/2302.02083

[2402.01832] SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training? – https://arxiv.org/abs/2402.01832

[2402.02791v1] Rethinking Optimization and Architecture for Tiny Language Models – https://arxiv.org/abs/2402.02791v1

[2402.03190v1] Unified Hallucination Detection for Multimodal Large Language Models – https://arxiv.org/abs/2402.03190v1

Latxa – a HiTZ Collection – Latxa is a collection of foundation models specifically tuned for Basque. – https://huggingface.co/collections/HiTZ/latxa-65a697e6838b3acc53677304

EnDex: Evaluation of Dialogue Engagingness at Scale – ACL Anthology – https://aclanthology.org/2022.findings-emnlp.359/

[2402.03757v1] The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs – https://arxiv.org/abs/2402.03757v1

[2402.04087v1] A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation – https://arxiv.org/abs/2402.04087v1

The pain points of building a copilot – Austin Z. Henley – https://austinhenley.com/blog/copilotpainpoints.html

Introducing Pkl, a programming language for configuration :: Pkl Docs – https://pkl-lang.org/blog/introducing-pkl.html

Google Kubernetes Engine (GKE) Cheatsheet | Datadog – https://www.datadoghq.com/resources/gke-monitoring-cheatsheet/

[2402.00284v1] PAP-REC: Personalized Automatic Prompt for Recommendation Language Model – https://arxiv.org/abs/2402.00284v1

[2402.00282v1] PAM: Prompting Audio-Language Models for Audio Quality Assessment – https://arxiv.org/abs/2402.00282v1

“Want to build your own https://t.co/xq1jSdoA1L or write a novel with your own LLM? CharacterGLM 6B is now open sourced on @huggingface , aiming to enrich AI character creation in gaming and digital media. Model: https://t.co/J8UfJytoQJ Deep dive: https://t.co/wpMSsKsSmW https://t.co/ZducbUaQ7d” / X – https://twitter.com/Xianbao_QIAN/status/1755076636914110614