Image created with Flux Pro v1.1 Ultra. Image prompt: photorealistic still image of a middle-aged man standing behind a woman, woman covering part of her face with her hand, man looking over her shoulder, both illuminated with warm stadium jumbotron lighting, natural skin tones, subtle lens flare, shallow depth of field, exact color temperature of a live event projection, man wearing a jacket decorated with icons for text, video, and audio, woman holding a matching tote, cinematic realism –no text, captions, watermarks
We're plugging ViTPose into Basketball AI. According to @NBA rules, a player is considered to be in the paint only if both feet are inside the paint. Notebook: https://x.com/skalskip92/status/1950231824933982428
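The "both feet inside" rule maps naturally onto pose keypoints. Below is a minimal sketch, not the notebook's actual code. It assumes the pose model outputs the standard 17-point COCO layout, where index 15 is the left ankle and index 16 the right ankle. It also assumes the ankles serve as a proxy for the feet and that the paint outline is already a polygon in the same coordinate system as the keypoints (for example, after projecting to court coordinates). The `min_score` threshold and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]

# COCO 17-keypoint layout: index 15 = left ankle, index 16 = right ankle.
# Ankles are used here as an (assumed) proxy for foot position.
LEFT_ANKLE, RIGHT_ANKLE = 15, 16


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: cast a horizontal ray to the right of `point` and count
    how many polygon edges it crosses; an odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can be crossed
        # (this also skips horizontal edges, so no division by zero).
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_paint(keypoints: Sequence[Point], paint: Sequence[Point],
             scores: Sequence[float], min_score: float = 0.3) -> bool:
    """True only if BOTH ankles are confidently detected and both lie
    inside the paint polygon (hypothetical helper)."""
    for idx in (LEFT_ANKLE, RIGHT_ANKLE):
        if scores[idx] < min_score:
            return False  # an unseen foot cannot confirm "inside"
        if not point_in_polygon(keypoints[idx], paint):
            return False
    return True
```

Requiring both ankles, rather than either one, is what encodes the rule as stated. Treating a low-confidence ankle as "not inside" is a conservative design choice; a tracker could instead carry the last confident position forward.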
What player is that? In the upcoming supervision-0.27.0 release, you'll be able to freely control text position, including applying custom offsets from the detection box. Supervision annotators are now so advanced that you can literally use them to create full visual content. Link: https://x.com/skalskip92/status/1950984077617799534
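The post does not say how supervision exposes the new offsets. The snippet below is therefore not supervision's API. It is a hypothetical, self-contained sketch of the underlying geometry: pick an anchor on the detection box, then shift it by a custom pixel offset. The `Anchor` enum, `label_origin` function, and optional clamping to the image are all illustrative assumptions.

```python
from enum import Enum
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


class Anchor(Enum):
    # Hypothetical anchors: fractions along the box width and height.
    TOP_LEFT = (0.0, 0.0)
    TOP_CENTER = (0.5, 0.0)
    CENTER = (0.5, 0.5)
    BOTTOM_RIGHT = (1.0, 1.0)


def label_origin(box: Box, anchor: Anchor,
                 offset: Tuple[float, float] = (0.0, 0.0),
                 image_size: Optional[Tuple[int, int]] = None) -> Tuple[int, int]:
    """Return the pixel where a label would be placed: the chosen anchor
    on the box, shifted by `offset`, optionally clamped to the image."""
    x1, y1, x2, y2 = box
    fx, fy = anchor.value
    # Interpolate between box corners, then apply the custom offset.
    x = x1 + fx * (x2 - x1) + offset[0]
    y = y1 + fy * (y2 - y1) + offset[1]
    if image_size is not None:
        w, h = image_size
        # Keep the label origin inside the frame.
        x = min(max(x, 0), w - 1)
        y = min(max(y, 0), h - 1)
    return int(round(x)), int(round(y))
```

In image coordinates y grows downward, so `label_origin(box, Anchor.TOP_LEFT, (0, -10))` places the label 10 px above the box's top-left corner.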
Runway, Luma Target Sales to Robotics Companies — The Information https://www.theinformation.com/articles/runway-luma-target-sales-robotics-companies
Alibaba to launch AI-powered smart glasses creating rival to Meta https://www.cnbc.com/2025/07/28/alibaba-ai-smart-glasses-creates-rival-to-meta.html
NotebookLM updates: Video Overviews, Studio upgrades https://blog.google/technology/google-labs/notebooklm-video-overviews-studio-upgrades/
Pierre and team really cooked with this vision language model (VLM)! Excited for you to try it out! 111B open parameters https://x.com/JayAlammar/status/1950931480349143259
RT @1vnzh: Command A Vision – SOTA enterprisemaxx multimodal model – Outperforms GPT 4.1, Llama 4 Maverick, and Mistral Medium 3 in enterpr… https://x.com/aidangomez/status/1950927454383616343
RT @nickfrosst: cohere vision model 🙂 weights on huggingface https://x.com/andrew_n_carr/status/1951068402090647608
Step3: Cost-Effective Multimodal Intelligence | StepFun https://stepfun.ai/research/en/step3
Introducing Command A Vision: Multimodal AI https://cohere.com/blog/command-a-vision
OpenAI agent is blocked by OpenAI captcha. https://x.com/gneubig/status/1948915714955641159
I stopped using GQA as an eval when I found this woman was labeled a bird and the phone as white. The annotations have a 20-30% error rate. (And it's supposed to be a “cleaned up” version of Visual Genome, so steer clear of that one too.) https://x.com/vikhyatk/status/1949365273901060474
Step3 benchmarks at last. The first «DeepSeek-like» model that's strongly multimodal (Ernie disappointed). It's very different from V3, too: another in-house attention mechanism, and its own logic around inference economics. A big release. https://x.com/teortaxesTex/status/1951008169989382218
Very surprising that fifteen years of hardcore computer vision research contributed ~nothing toward AGI except better optimizers. We still don't have models that get smarter when we give them eyes. https://x.com/jxmnop/status/1949869844142473322
Unitree Introducing | Unitree R1 Intelligent Companion. Price from $5,900. Join us to develop/customize! Ultra-lightweight at approximately 25 kg and integrated with a Large Multimodal Model for voice and images. Let's accelerate the advent of the agent era! 🥰 https://x.com/UnitreeRobotics/status/1948681325277577551




