This week’s cover depicts a ferret reading a newspaper

AI News #13: Week Ending 12/29/2023 with Executive Summary and Top 9 Stories

January 7, 2024

This week’s cover depicts a ferret reading a newspaper, touching on three themes: first, it showcases the incredible power of MidJourney v6’s text-to-image modeling; second, The New York Times is suing OpenAI for billions of dollars; third, Apple has released a new open-source AI model called Ferret. The title is set in Times New Roman, and the cover was created using MidJourney and Photoshop.

Executive Summary

NY Times Sues OpenAI: The New York Times is suing OpenAI and Microsoft “for ‘billions of dollars in statutory and actual damages’ related to the ‘unlawful copying and use of The Times’s uniquely valuable works.’ It also calls for the companies to destroy any chatbot models and training data that use copyrighted material from The Times.”
Waymo: While we pondered LLMs, Google’s driverless Waymo cars took 700,000+ trips in 2023.
Image Recognition: AI is getting almost too good at guessing where a photo was taken and identifying everything in it, without context or metadata. For example, one AI model placed a user’s photo of a backcountry hiking trail to within roughly 35 miles of where it was taken.
Apple Releases AI: Without fanfare, Apple quietly launched Ferret, a powerful open-source machine learning model.
The Year of AI Robots: Brett Adcock, a robotics CEO who previously thought robot hardware would outpace software, now says they are progressing at the same rate. It jarred him. This means, as quickly as robots develop agility, they will equally develop the ability to interact with their environment.
Microsoft Copilot: Microsoft continues to rapidly integrate AI features into existing products like Office under the Copilot name.
AI Video: Elon Musk predicts 2024 will be the year of AI video. Users are creating compelling video using image-to-video and text-to-video prompting. The latest update from MidJourney is so good that animating stills from MidJourney is (for the moment) outpacing video prompts using text.
Job upheaval: While Newsweek announced layoffs attributed to AI advancements, one of the world’s oldest newspapers is leveraging AI to create new jobs.
OpenAI Wearables: In a significant move, Apple’s iPhone design chief is leaving to join Jony Ive’s LoveFrom studio and work with Sam Altman on AI devices.

The Rest: AI News of The Week

Don’t let the volume overwhelm you. Have fun and skim it. The links are organized by topic, sorted from ‘coolest’ to ‘least cool’, and each topic is clearly defined with a headline. I’ve added a description and glossary of what the topics mean, beneath each label, in plain language. I do the work so you don’t have to! The link descriptions are often pulled directly from tweets or articles, so it’s not always my voice. Pause when you see something that interests you. Reach out to me any time. I enjoy sharing and discussing these items!

Apple Ferret

Apple launched Ferret so quietly that it was made public in October yet no one noticed until December. The main strength of the model is the ability to recognize elements in an image and draw a line around them (similar to last week’s theme of ‘segmentation’).

🚀🚀Introducing Ferret, a new MLLM that can refer and ground anything anywhere at any granularity.
📰https://t.co/gED9Vu0I4y
1⃣ Ferret enables referring of an image region at any shape
2⃣ It often shows better precise understanding of small image regions than GPT-4V (sec 5.6) pic.twitter.com/yVzgVYJmHc
— Zhe Gan (@zhegan4) October 12, 2023

For those who don’t click the link, there’s an image showing an example of how the AI understands not only objects in the photo but their relationship in physical space seemingly https://t.co/q45JVfJCwC pic.twitter.com/gyWSSZBQHC
— Mitchell Bernstein (@MitchBernstein) December 23, 2023

Did Apple build a multimodal LLM that rivals Google’s Gemini already? https://t.co/9sIwP42dHo
— Mitchell Bernstein (@MitchBernstein) December 23, 2023

https://github.com/apple/ml-ferret

Apple releases Ferret

Refer and Ground Anything Anywhere at Any Granularity

@Gradio demo: https://t.co/D5nUcVn4JR

An End-to-End MLLM that Accept Any-Form Referring and Ground Anything in Response pic.twitter.com/suGs33DTSr
— AK (@_akhaliq) December 23, 2023

Apple’s ‘Ferret’ is a new open-source machine learning model

https://appleinsider.com/articles/23/12/24/apples-ferret-is-a-new-open-source-machine-learning-model

A Bit of Fun

I got a sincere kick out of reading ChatGPT play the apathetic office banter meta-statement game. Anyone who works in an office will get it. The game of tennis from Rosencrantz & Guildenstern are Dead is fantastic. The security footage with huge boots was also great.

This is spot-on.

“ChatGPT, Play along, making meta-statements without actual content. I’ll start:

Tentative question?” pic.twitter.com/W0IeibVTvN
— Ethan Mollick (@emollick) December 29, 2023

“Hey ChatGPT, lets play question tennis from Rosencrantz & Guildenstern are Dead. It is a game. Do you know what it is?”

Pretty impressive. I scored only on a doubtful technicality. pic.twitter.com/kza8UnRSrI
— Ethan Mollick (@emollick) December 28, 2023

peak internet:

ai generated cctv footage of police arresting ppl for wearing huge boots

video credit: u/Qemmish pic.twitter.com/IXelFWMn5c
— Bilawal Sidhu (@bilawalsidhu) December 29, 2023

ChatGPT as information Swiss army knife:

“Look up this bottle of wine, tell me how it is rated on various sites and also how it should talk about it so i sound sophisticated” pic.twitter.com/gGaepb9FWD
— Ethan Mollick (@emollick) December 30, 2023

What the average redditor looks like according to midjourney (all versions)
by u/Ralib1 in midjourney

Legal/Ethics

The power of multimodal image recognition plus AI context and prediction has made AI the world’s champion GeoGuessr player. And… elections.

Artificial intelligence can find your location in photos, worrying privacy experts

The PIGEON algorithm was able to geolocate this 2012 photo of the author on a backcountry trail in Yellowstone National Park to within roughly 35 miles of where it was taken.

Paper (PDF): 2307.05845.pdf

https://www.npr.org/2023/12/19/1219984002/artificial-intelligence-can-find-your-location-in-photos-worrying-privacy-expert

world’s best ai vs geoguessr pro

Google: How we’re approaching the 2024 U.S. elections

https://blog.google/outreach-initiatives/civics/how-were-approaching-the-2024-us-elections/

AR/VR

The biggest news in AR is the Apple Vision Pro, which is coming out in a matter of weeks. The biggest theme is the concept of “gaussian splatting”, which (to my very layperson’s knowledge) takes still images or frames of a video and extrapolates a 3D scene. For a single image, it’s OK. For a series of images, it can remarkably stitch them together into a real-time AR/VR model. And for recorded video, it essentially turns the video into a gaming environment. That’s very dumbed down, but it’s important not to let terms turn us off. Gaussian Splatting should not be a trigger term. Get over it. As Brad Hamilton says, “Learn it. Know it. Live it.”

Apple Vision Pro tipped for late Jan/early Feb release

Apple Vision Pro Currently In Mass Production, Says Analyst, Believes The Headset Is Company’s Most Important Product For Next Year

https://wccftech.com/apple-vision-pro-currently-in-mass-production-most-important-product-2024/

Recording a family house in AR/VR: “I’m still convinced the killer use case for 3d reconstruction tech is memory capture. No surprise Apple is headed in this direction.”

I'm still convinced the killer use case for 3d reconstruction tech is memory capture. No surprise Apple is headed in this direction.

Here's a massive scan of the backyard + rooftop of my parent's old house. They're retired now, but their home is immortalized forever.

Photo… pic.twitter.com/lbB3Nkt3aV
— Bilawal Sidhu (@bilawalsidhu) December 26, 2023

Introducing Marigold 🌼 – a universal monocular depth estimator, delivering incredibly sharp predictions in the wild! Based on Stable Diffusion, it is trained with synthetic depth data only and excels in zero-shot adaptation to real-world imagery. Check it out:

🌐 Website:… pic.twitter.com/UeEHp3fMKQ
— Anton Obukhov (@AntonObukhov1) December 8, 2023

LangSplat: 3D Language Gaussian Splatting (these are the little headlines that are the big headlines, if you’re technical)

LangSplat: 3D Language Gaussian Splatting

paper page: https://t.co/RtOpGlO8lS

ground CLIP features into a set of 3D language Gaussians, which attains precise 3D language fields while being 199 × faster than LERF pic.twitter.com/ehhl2m24Jq
— AK (@_akhaliq) December 27, 2023

Fun with real-time diffusion & controlnet – turning a webcam video subject into an alien monster

Fun with real-time diffusion & controlnet pic.twitter.com/eDkIEBP69q
— Johannes Stelzer (@j_stelzer) December 26, 2023

“Alibaba: Make-A-Character: High Quality Text-to-3D Character Generation within Minutes”

https://github.com/Human3DAIGC/Make-A-Character

Business/Enterprise

The Bloomberg story is the one that deserves some unpacking. Six months after Bloomberg spent $1,000,000 on a custom finance model, another finance model came out that outperforms it and costs $100. Separately, I tend to take Google’s hiring and layoff numbers with a large grain of salt.

“Bloomberg invested over a million dollars in developing a finance-domain focused Large Language Model (LLM) named BloombergGPT. And within just six months of the release of BloombergGPT, a model (AdaptLLM-7B) costing merely $100 came out surpassing BloombergGPT in performance.”

BloombergGPT was great when it came out.

But just to put the rapid evolution of LLMs into perspective

📌 Bloomberg invested over a million dollars in developing a finance-domain focused Large Language Model (LLM) named BloombergGPT.

And within just six months of the release… pic.twitter.com/XH5vrzd0oc
— Rohan Paul (@rohanpaul_ai) December 23, 2023

AI is Consequential:

Google likely to layoff 30,000 employees post new AI innovation

“The proposed restructuring is anticipated to primarily impact Google's ad sales department, where the company is exploring the benefits of leveraging AI for operational efficiency.”

BTW:…
— Jeremiah Owyang (@jowyang) December 29, 2023

Artificial intelligence checks whether your Louis Vuitton bag is fake

Technology company Entrupy claims that it can use AI to detect whether a luxury item is fake with near-perfect accuracy.

Anthropic Projected At Least $850 Million in Annualized Revenue in 2024

Anthropic has projected it will generate more than $850 million in annualized revenue by the end of 2024, The Information reported. That’s a 70% increase from a projection it gave to some investors just three months ago.

https://www.theinformation.com/briefings/anthropic-projected-at-least-850-million-in-annualized-revenue-in-2024

ChatGPT Helps, and Worries, Business Consultants, Study Finds

The A.I. tool helped most with creative tasks. With more analytical work, however, the technology led to more mistakes.

OpenAI competitor Anthropic projects $850 million in annualized revenue

https://the-decoder.com/openai-competitor-anthropic-projects-850-million-in-annualized-revenue

OpenAI

OpenAI Is in Talks to Raise New Funding at Valuation of $100 Billion or More

OpenAI would be second-most valuable US startup behind SpaceX

Company also in talks for billions from G42 for chip venture

https://www.bloomberg.com/news/articles/2023-12-22/openai-in-talks-to-raise-new-funding-at-100-billion-valuation

OpenAI Is in Talks to Sell Shares at an $86 Billion Valuation

https://www.bnnbloomberg.ca/openai-is-in-talks-to-sell-shares-at-an-86-billion-valuation-1.1986538

OpenAI Is in Talks to Raise New Funding at Valuation of $100 Billion or More

https://finance.yahoo.com/news/openai-talks-raise-funding-100-211552141.html

Images

MidJourney’s improvement over one year is outstanding. For every cartoonish DALL-E 3 image we see (all of them, it seems), MidJourney is out there making photorealistic images that push the boundaries of what I think is possible with diffusion. Diffusion is taking random static and telling the computer, “Hey, if this was a photo of a walrus smoking a cigar on a bullet train in Wisconsin, what would it look like?”… and the computer “denoises the random static” step by step into an image that matches the text. Whoever figured it out should get a prize in computing (pretty sure it’s Jonathan Ho, Ajay Jain, and Pieter Abbeel). Be sure to look at the examples below.
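For the curious, here is a toy sketch of that “denoise the static” loop in plain Python. It is not how MidJourney works internally (that isn’t public). It follows the general shape of the sampling loop from the denoising diffusion paper by Ho, Jain, and Abbeel, with one big shortcut: the trained neural network that guesses the noise is replaced by a made-up helper, oracle_noise_predictor, which is simply handed the answer. The three-number “image”, the step count, and the noise schedule values are all assumptions for illustration.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule: how much static is mixed in at each step (assumed values)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta  # cumulative fraction of the original signal that survives
        alpha_bars.append(running)
    return betas, alpha_bars


def oracle_noise_predictor(x_t, target, alpha_bar):
    """HYPOTHETICAL stand-in for the trained network.

    A real model learns to guess the noise from training images plus the text prompt.
    Here we cheat: we know the target, so the noise can be solved for exactly.
    """
    return [(x - math.sqrt(alpha_bar) * x0) / math.sqrt(1.0 - alpha_bar)
            for x, x0 in zip(x_t, target)]


def denoise(target, num_steps=50, seed=0):
    """Start from random static and remove a little predicted noise at every step."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure static
    for t in reversed(range(num_steps)):
        eps = oracle_noise_predictor(x, target, alpha_bars[t])
        alpha = 1.0 - betas[t]
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Subtract this step's share of the predicted noise.
        x = [(xi - coef * ei) / math.sqrt(alpha) for xi, ei in zip(x, eps)]
        if t > 0:
            # Every step except the last adds back a small amount of fresh noise.
            sigma = math.sqrt(betas[t])
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x


if __name__ == "__main__":
    tiny_image = [0.2, 0.8, 0.5]  # three made-up "pixel" values
    print(denoise(tiny_image))
```

Because the stand-in predictor knows the answer, the loop lands on the target values. In a real system the predictor is a large neural network conditioned on the prompt, and its guesses are only approximately right, which is where all the training effort goes.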

MidJourney v6 Fanfare Continues

Midjourney v1 until v6, same prompt"

white background, closeup portrait of a very old mean man, 92 years old, wrinkles, realistic skin, studio lighting,, canon f/4 #midjourneyV6 #midjouney #aiartcommunity pic.twitter.com/g8wAALAbH3
— Marco Nedermeijer (@MNedermeijer) December 21, 2023

The skin details in #midjouney v6 are insane. pic.twitter.com/BKDwPupb5U
— BorisJov (@Boris_Jov) December 21, 2023

Wow, you won't believe this…

MidJourney V6 can replicate almost any animation style.

10 flawless examples with prompts: pic.twitter.com/bQfVQmncwM
— Proper 🧐 (@ProperPrompter) December 27, 2023

Futuristic Nike explorations

The level of detail in Midjourney V6 is absolutely insane!

Take a look at these futuristic Nike explorations for an upcoming @Framer Template 👇 pic.twitter.com/mITGYvnJw2
— Paul Lapkin (@DesignedByPaul) December 22, 2023

Make-A-Character

Make-A-Character: High Quality Text-to-3D Character Generation within Minutes

https://huggingface.co/spaces/Human3DAIGC/Make-A-Character

Wearables

Embodied devices (from Siri to robots) embed AI in a physical object so you can talk to it in plain language, e.g. “get the red sock from the couch and bring it over here”. OpenAI is making moves in that direction: Apple’s head of iPhone design is leaving to work with Jony Ive and Sam Altman on AI devices. Feels like a big deal.

Apple’s iPhone Design Chief Enlisted by Jony Ive, Sam Altman to Work on AI Devices

Design executive Tang Tan is set to leave Apple in February

Tan will join Ive’s LoveFrom design studio, work on AI project

https://www.bloomberg.com/news/articles/2023-12-26/apple-iphone-design-head-tang-tan-to-work-with-jony-ive-sam-altman-on-ai-tech

Humane’s AI Pin will start shipping in March

https://www.theverge.com/2023/12/22/24012429/humane-ai-pin-shipping-marc

Robotics/Embodiment

Brett Adcock, a robotics CEO who previously thought robot hardware would outpace software, now says they are progressing at the same rate. It jarred him. This means, as quickly as robots develop agility, they will equally develop the ability to interact with their environment. The first link is his tweet, and it’s worth reading.

2024 will be the year of Embodied AI

The timeline split of AI vs Robot Hardware has changed

the last 90 days i’ve witnessed industry leading AI in our lab running on humanoid hardware, and frankly it’s blown me away

i’m watching robots performing complex tasks entirely with neural nets. AI trained tasks that i… pic.twitter.com/B59zBIgW3l
— Brett Adcock (@adcock_brett) December 26, 2023

Waymo cars took 700,000+ trips in 2023

https://waymo.com/blog/2023/12/dear-waymo-community-reflections-from-2023.html.html

Waymo has 7.1 million driverless miles — how does its driving compare to humans?

The Google spinoff’s robotaxis led to a reduction in injury-related and police-reported crashes when compared to human benchmarks, according to new research.

https://www.theverge.com/2023/12/20/24006712/waymo-driverless-million-mile-safety-compare-human

LG Ushers in ‘Zero Labor Home’ With Its Smart Home AI Agent at CES 2024

With its advanced ‘two-legged’ wheel design, LG’s smart home AI agent is able to navigate the home independently. The intelligent device can verbally interact with users and express emotions through movements made possible by its articulated leg joints. Moreover, the use of multi-modal AI technology, which combines voice and image recognition along with natural language processing, enables the smart home AI agent to understand context and intentions as well as actively communicate with users.

Science/Medicine

The first link is the coolest one. AI is able to read damaged scrolls without opening them!

AI Continues to Chip Away At Ancient Scrolls (especially ones we cannot open without ruining)

The new segment from RICHI is huge https://t.co/jryXYpnheX pic.twitter.com/BRLwuJvaZq
— Nat Friedman (@natfriedman) December 24, 2023

(see previously: https://www.semafor.com/article/10/12/2023/ai-deciphers-ancient-scrolls-burned-and-buried-for-2000-years)

We launched the world’s first Gen AI bot that helps radio-technologists and young radiologists with the appropriate scan protocols in every scenario. It’s already being pinged 500+ times a day.

If you’re interested in medical imaging, just know that life is going to change very fast (for the better).

We launched the world’s first Gen AI bot that helps radio-technologists and young radiologists with the appropriate scan protocols in every scenario.

It’s already being… pic.twitter.com/5ZzQfB9hvo
— Kalyan Sivasailam (@KalSivasailam) December 25, 2023

The Race to Put Brain Implants in People Is Heating Up (Paywall)

Thanks in part to Elon Musk, the field of brain-computer interfaces has captured both public and investor interest, with a cadre of companies now developing implantable devices.

https://www.wired.com/story/the-race-to-put-brain-implants-in-people-is-heating-up/

AI reveals how microplastics are harming global soil and agriculture

https://www.earth.com/news/ai-shows-how-microplastics-are-harming-global-soil-and-agriculture

AI companion ElliQ: Reducing senior loneliness

ElliQ, created by Intuition Robotics, is notable for being the first artificial intelligence device explicitly designed to reduce loneliness and isolation in older Americans.

https://www.aiacceleratorinstitute.com/ai-companion-elliq-reducing-senior-loneliness/

https://elliq.com/

MyHeritage Releases AI Record Finder™ and AI Biographer™ — Two Groundbreaking Features That Transform Genealogy Using Artificial Intelligence

https://www.myheritage.com/research/ai-record-finder/

https://www.businesswire.com/news/home/20231226134311/en/MyHeritage-Releases-AI-Record-Finder%E2%84%A2-and-AI-Biographer%E2%84%A2-%E2%80%94-Two-Groundbreaking-Features-That-Transform-Genealogy-Using-Artificial-Intelligence

Scientists discover the first new antibiotics in over 60 years using AI

https://www.euronews.com/next/2023/12/31/scientists-discover-the-first-new-antibiotics-in-over-60-years-using-ai

Video

While text-to-image has been the star this year, text-to-video has struggled to become camera-ready beyond clips of a few seconds. AI images are tough to keep consistent, and video frames have to be consistent from one to the next. For the moment, it’s easier to animate a still image (using different technology) than to prompt a video with text. Below are three incredible examples of how MidJourney’s image improvements have led to essentially full-fledged movie trailers (quick cuts help) using AI. There are two big video AI platforms out there: Pika and Runway. A new startup, Assistive, wants to join them.

MidJourney + Runway

#Caribéanofuturisme #MJV6+GEN2 pic.twitter.com/uW2r15bUnp
— Manuel Sainsily ᯅ Futurist (GDC) (@ManuVision) December 23, 2023

MidJourney + SVD + Topaz

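SVD is really strong at rendering fluids like water, clouds, and mist. The clarity and range of motion in this video are beyond what Runway can currently achieve.

The workflow is Midjourney-SVD-Topaz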
pic.twitter.com/ChWzT2UBLJ
— 歸藏 (@op7418) December 26, 2023

Movie Trailer using AI

Made a movie using Midjourney V6 + @runwayml and Live Action. The time is now for AI to push our creative at any level. Here is the teaser for a short and feature screenplay that I wrote. #ai #AIArtCommuity #aifilm #aiartwork #filmmaking #runwayml pic.twitter.com/wIbuQjLb2v
— Dave Clark (@Diesol) December 22, 2023

Pika Labs’ text-to-video AI platform opens to all: Here’s how to use it

https://pika.art/login

Elon Musk: AI movies next year

AI movies next year
— Elon Musk (@elonmusk) December 27, 2023

Domo AI Real Time Filters

New Model Updates!

Introducing two new models for our Video-to-Video function:
1. Storybook Cartoon
2. Color Illustration

Plus, more stability and enhanced results with our optimized algorithm, ensuring a smoother creative experience for you! Enjoy creating! pic.twitter.com/OxlLllU4ZW
— DomoAI official (@DomoAI_) December 28, 2023

Introducing Leonardo Motion 🎬

Generate videos from your images in just a couple of clicks.

Now available to all users, paid and free. Our top plan now also includes unlimited video generations. 🚀🚀 pic.twitter.com/toxbXfEgZk
— Leonardo.Ai (@LeonardoAi_) December 24, 2023

https://app.leonardo.ai/

Assistive

Today, we’re launching our first product, Assistive Video. It’s a generative video platform for creating high quality videos from text and image prompts.

Introduce Assistive Video, the generative video platform for creating videos from text and images.

Simple type what you want to see and watch your ideas come to life.

It's available starting today on the web, and via API. Start creating at https://t.co/oEBgzEtn2g pic.twitter.com/AT2mDt39u2
— Assistive (@assistiveapp) December 27, 2023

https://assistive.chat/blog/introducing-assistive-video

https://assistive.chat/product/video

http://assistive.chat/video

News/Journalism

I’m not sure how to take Newsweek’s layoffs, since my instinct is they were not in the greatest shape, anyway. Which is why the second link is much cooler. Berrow’s Worcester Journal is creating new jobs called “AI Reporters”. These roles appear to resemble those of coordinators or producers who take data-driven reporting or community records/minutes and use AI to transform them into content, which the AI Reporters then manually proof and polish. Sort of an “AI-assisted” role.

Newsweek: Massive Layoffs Are Coming in 2024

4/10 to be replaced by AI

AI is Consequential pic.twitter.com/cNNu9CakIg
— Jeremiah Owyang (@jowyang) December 29, 2023

How one of the world’s oldest newspapers is using AI to reinvent journalism

The AI reporters use an in-house copywriting tool based on the technology ChatGPT, a souped-up chatbot that draws on information gleaned from text on the internet. Reporters input mundane but necessary “trusted content” – such as minutes from a local council planning committee – which the tool turns into concise news reports in the publisher’s style.

https://www.theguardian.com/technology/2023/dec/28/how-one-of-the-worlds-oldest-newspapers-is-using-ai-to-reinvent-journalism

New York Times Lawsuit

Next week’s newsletter will have more extensive coverage and analysis. There are valid concerns, but also instances of miscommunication and a lack of nuance. The lawsuit appears to lean heavily on emotional appeals and connects a lot of dots that may or may not be related. However, beneath these emotional appeals lie some solid points. I highly recommend reading the articles and links explaining the nuances of the case, below.

The Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work

Millions of articles from The New York Times were used to train chatbots that now compete with it, the lawsuit said.

Complaint (PDF): NYT_Complaint_Dec2023.pdf

The New York Times is suing Microsoft/OpenAI over copyright infringement, claiming the companies are responsible for “billions of dollars” in damages and demanding any chatbot models and data that pulled copyrighted work from The Times be destroyed.

https://variety.com/2023/digital/news/new-york-times-sues-openai-microsoft-copyright-infringement-1235851238/

OpenAI’s Napster/Google Moment

The New York Times got rolled by Google in the 2000s, but they’re not getting rolled this time around.

https://calacanis.substack.com/p/openais-napstergoogle-moment

A thread on some misconceptions about the NYT lawsuit against OpenAI. Morality aside, the legal issues are far from clear cut. Gen AI makes an end run around copyright and IMO this can't be fully resolved by the courts alone. (HT @sayashk @CitpMihir for helpful discussions.)
— Arvind Narayanan (@random_walker) December 29, 2023

In the New York Times OpenAI lawsuit, you can see how complex the relationship of training data to output can be. On one hand, they find that you can induce ChatGPT to produce exact content from famous Times articles, on the other, they show it also hallucinates false articles. pic.twitter.com/cY7cyZjd8r
— Ethan Mollick (@emollick) December 27, 2023

IP concerns may threaten smaller players in AI, but the large generative AI companies (Adobe, Microsoft, OpenAI, Anthropic) all agreed to defend their users against any copyright or infringement claims.

I wonder if this will prove a barrier to startup & open source entrants. pic.twitter.com/Y7VPY6vHZJ
— Ethan Mollick (@emollick) December 25, 2023

Anthropic Joins the Party, Offers Copyright Shield to Enterprise AI Customers

Artificial General Intelligence (AGI)

Will scaling work?

Data bottlenecks, generalization benchmarks, primate evolution, intelligence as compression, world modelers, and other considerations

https://www.dwarkeshpatel.com/p/will-scaling-work

Multimodality

Multimodality is the ability of a language model to “see”, “hear”, and so on. Just like Apple’s Ferret, GPT-4 is multimodal and can identify objects in photos.

New Multi-Modal with Search Grounding.

Microsoft is combining GPT-4 Vision, Bing image search and web data to deliver a better understanding of queries.

Check out the insane demo below.

Search Grounding was not only able to identify the image, but also the EXACT shuttle. pic.twitter.com/T4OhayhiD9
— Rowan Cheung (@rowancheung) December 28, 2023

Microsoft

Microsoft’s next Surface laptops will reportedly be its first true ‘AI PCs’

https://www.theverge.com/2023/12/28/24017890/microsoft-ai-surface-laptops-arm

EXCLUSIVE: Microsoft readies ‘next-gen’ AI-focused Surface Pro 10 and Surface Laptop 6 with Arm chips and design upgrades for 2024

https://www.windowscentral.com/hardware/surface/microsoft-surface-pro-10-laptop-6-major-update-intel-arm-ai-2024

Copilot for Windows Features Overview

https://www.microsoft.com/en-us/windows/copilot-ai-features

Microsoft Copilot is now available as a ChatGPT-like app on Android

You no longer need the Bing mobile app to access Copilot on Android devices.

https://www.theverge.com/2023/12/26/24015198/microsoft-copilot-mobile-app-android-launch

Copilot for Web

https://copilot.microsoft.com/

Copilot Launches for iOS

https://apps.apple.com/us/app/microsoft-copilot/id6472538445

Explainers

Autonomous AI Video Clip Generator (GPT-4 API, Whisper, PyTube ++)

I created a system in Python that searches YouTube and downloads the video, transcribes it with OpenAI Whisper, uses GPT-4 to find the clip the user asked for, and cuts the video down to that clip. The user can choose between 16:9 and 9:16 format and whether to add subtitles. Perfect for YouTube automation and TikToks/Reels.
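The 16:9 versus 9:16 choice comes down to a bit of arithmetic: which rectangle to cut out of each frame. The video’s actual code isn’t reproduced here, so the sketch below is only one plausible way to do that step. The function name center_crop_box is made up for illustration, and it assumes the crop should be the largest one that fits and should stay centered in the frame.

```python
def center_crop_box(width, height, target_w, target_h):
    """Return (x, y, crop_width, crop_height) for the largest centered crop
    whose aspect ratio is target_w:target_h.

    Hypothetical helper for illustration; integer math keeps the box on whole pixels.
    """
    if min(width, height, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")

    if width * target_h > height * target_w:
        # Source is wider than the target shape: keep full height, trim the sides.
        crop_w = height * target_w // target_h
        crop_h = height
    else:
        # Source is taller than (or equal to) the target shape: keep full width.
        crop_w = width
        crop_h = width * target_h // target_w

    x = (width - crop_w) // 2
    y = (height - crop_h) // 2
    return x, y, crop_w, crop_h


if __name__ == "__main__":
    # A 1920x1080 landscape frame cut down to vertical 9:16 for TikTok / Reels.
    print(center_crop_box(1920, 1080, 9, 16))  # (656, 0, 607, 1080)
```

A centered crop is the simplest option but can cut off a speaker who stands off to one side. A fancier version might track faces and move the box, which is well beyond this sketch.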

Google’s Video Poet Elevates AI Video!

Multimodal AI, Gemini, and Google’s Data Moat – Can it beat OpenAi

20/Dec/2023 – AI report Q&A, phi-2 demo, Mistral, Apple HUGS – Weekly livestream Nov-Dec/2023 – LIVE

The Stanford AI INDEX REPORT

Measuring trends in Artificial Intelligence

AI Index Report 2023

Google’s Year In Review: 2023: A year of groundbreaking advances in AI and computing

https://blog.research.google/2023/12/2023-year-of-groundbreaking-advances-in.html

Jeff Bezos on Generative AI: “They’re not inventions. They’re discoveries.”

https://aninternetreference.substack.com/p/jeff-bezos-on-generative-ai-theyre

How Not to Be Stupid About AI, With Yann LeCun (paywall)

It’ll take over the world. It won’t subjugate humans. For Meta’s chief AI scientist, both things are true.

https://www.wired.com/story/artificial-intelligence-meta-yann-lecun-interview/

China/Baidu

Baidu’s ChatGPT-like Ernie Bot has more than 100 mln users -CTO

https://www.reuters.com/technology/baidus-chatgpt-like-ernie-bot-has-more-than-100-mln-users-cto-2023-12-28/

We asked GPT-4 and Chinese rival ERNIE the same questions. Here’s how they answered

https://edition.cnn.com/2023/12/15/tech/gpt4-china-baidu-ernie-ai-comparison-intl-hnk/index.htm

Audio

Nendo is a generative music AI

Introducing three new models for Nendo, trained by our community member pharoAIsanders:

• Model 1 generates finest vintage Dub music
• Model 2 generates Boom Bap Hip Hop tunes
• Model 3 generates rolling Drum'n'Bass bangers

All models are extensive, high quality finetunes of… pic.twitter.com/4SQuXf8uNu
— okio (@okio_ai) December 22, 2023

https://colab.research.google.com/drive/1uGQIejuCKKEQrFBgzaHtdCYHEIFbJD6l

CassetteAI is another generative music AI

https://cassetteai.com/dashboard

New Models

Release of Robin v1.0 – a Suite of Multimodal Models

The Robin team is proud to present Robin, a suite of Multimodal (Visual-Language) Models. These models outperform, or perform on par with, the state of the art models of similar scale.

https://sites.google.com/view/irinalab/blog/robin-v1-0

Nous Hermes 2 – Yi-34B is a state of the art Yi Fine-tune.

Nous Hermes 2 Yi 34B was trained on 1,000,000 entries of primarily GPT-4 generated data, as well as other high quality data from open datasets across the AI landscape.

https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B

Chips/Hardware

Nvidia’s biggest Chinese competitor unveils cutting-edge new AI GPUs — Moore Threads S4000 AI GPU and Intelligent Computing Center server clusters using 1,000 of the new AI GPUs. Beefy clusters with 200 petaops of AI compute.

https://www.tomshardware.com/pc-components/gpus/nvidias-biggest-chinese-competitor-unveils-cutting-edge-new-ai-gpus-moore-threads-s4000-ai-gpu-and-intelligent-computing-center-server-clusters-using-1000-of-the-new-ai-gpus

Technical/Dev/IT

Beyond Memorization: Violating Privacy Via Inference with Large Language Models

Current privacy research on large language models (LLMs) primarily focuses on the issue of extracting memorized training data. At the same time, models’ inference capabilities have increased drastically. This raises the key question of whether current LLMs could violate individuals’ privacy by inferring personal attributes from text given at inference time. In this work, we present the first comprehensive study on the capabilities of pretrained LLMs to infer personal attributes from text.

https://arxiv.org/abs/2310.07298

Evaluating Language-Model Agents on Realistic Autonomous Tasks

In this report, we explore the ability of language model agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild. We refer to this cluster of capabilities as “autonomous replication and adaptation” or ARA. We believe that systems capable of ARA could have wide-reaching and hard-to-anticipate consequences, and that measuring and forecasting ARA may be useful for informing measures around security, monitoring, and alignment. Additionally, once a system is capable of ARA, placing bounds on a system’s capabilities may become significantly more difficult.

https://arxiv.org/abs/2312.11671

Quantum Computing’s Hard, Cold Reality Check

Hype is everywhere, skeptics say, and practical applications are still far away

https://spectrum.ieee.org/quantum-computing-skeptics

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

https://huggingface.co/papers/2312.11514

Google Addresses the Mysteries of Its Hypercomputer

It turns out that the Hypercomputer is Google’s take on a modular supercomputer with a healthy dose of its homegrown TPU v5p AI accelerators, which were also announced this month.

Happy to announce the open sourcing of the Capybara dataset!

All of this diversity is contained in less than 20K examples, already aggressively filtered to keep out censorship and undesirable responses.

Happy to announce the open sourcing of the Capybara dataset! Merry Christmas everyone!🎄

Thank you to @yield /@niemerg for sponsoring the creation, as well as @a16z for helping make the first trainings possible within @NousResearch, and @JSupa15 for contributions.… pic.twitter.com/0Y5mXkZt4a
— LDJ (@ldjconfirmed) December 26, 2023

LG’s latest Gram laptops are predictably stuffed with AI features

https://www.engadget.com/lgs-latest-gram-laptops-are-predictably-stuffed-with-ai-features-163910204.html

A deep dive into training dynamics of diffusion models

This paper has received significantly less attention than it deserves, so let me shed a bit more light on it and describe why it's so good:

1. It turns out that the classical U-Net image diffusion backbone, which the entire community has been happily building upon during the… https://t.co/wsgHszGAr7
— Ivan Skorokhodov (@isskoro) December 23, 2023

Introducing AskVideos-VideoCLIPv0.1, a versatile text-grounded video embedding model. Like its image-only counterpart, CLIP, VideoCLIP enables you to compute a single embedding for videos that can be used to compute similarity with text and perform vector retrieval.

Github:… pic.twitter.com/nrvuIZS1vP
— Sammy Atman (@SammieAtman) December 23, 2023
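
The retrieval idea in the VideoCLIP announcement (one embedding per video, compared against a text embedding) can be illustrated with a toy ranking by cosine similarity. This is a minimal sketch, not the AskVideos API: the function names, the three-dimensional vectors, and the video labels are all made up for illustration, and real embeddings would come from the model itself and have far more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(
    text_embedding: Sequence[float],
    video_embeddings: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (video name, similarity) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(text_embedding, emb))
        for name, emb in video_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy embeddings; a real system would get these from the model.
    videos = {
        "cooking_clip": [0.9, 0.1, 0.0],
        "surfing_clip": [0.0, 0.2, 0.9],
    }
    query = [0.8, 0.2, 0.1]  # stand-in for the embedding of a text query
    for name, score in rank_videos(query, videos):
        print(f"{name}: {score:.3f}")
```

At scale, the linear scan in `rank_videos` would typically be replaced by a vector index, but the scoring idea is the same.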

Consistent lesson from 70 years of AI progress is both how counterintuitive the shape of progress is (chess was much easier than AGI), but also how predictable (neural nets were conceptualized in the 1940s!) with the right mix of intuition and science.
— Greg Brockman (@gdb) December 26, 2023