Welcome to issue #14 of the latest artificial intelligence news and trends for the week ending January 5th, 2024.

This week’s cover features a rabbit watching the Killer Rabbit of Caerbannog from “Monty Python and the Holy Grail” using VR goggles.  Behind the rabbit is the Meta logo.  This image was created using this week’s prompting hack: placing the term “iPhone photo” in the description. The prompt was “iPhone photo of a rabbit wearing VR goggles --ar 16:9 --v 6.0 --style raw”.  It also references next week’s launch of the Rabbit AI assistant. Meta CTO ‘Boz’ says Meta’s new AR headset is “the most advanced thing we’ve ever produced as a species”, which is why the Meta logo is “behind the rabbit”.  Cover created with MidJourney and edited with Photoshop.  The font is Cooper Std, which is stylistically similar to the Monty Python Intermission card. Here’s the scene: https://www.youtube.com/watch?v=pmu5sRIizdw

Executive Summary

This week in AI was full of hyperbole, and that’s saying a lot. 

  • Meta’s CTO called their AR prototype “the most advanced thing we’ve ever produced as a species”
  • “Google Search Killer” Perplexity raised $73.6M  
  • MidJourney is THE name to know in AI imagery, and now they are working on video.  The CEO says they “can get to the holodeck” by the end of 2024.
  • Open Interpreter lets AI vision operate your computer
  • Flowpilot is open-source driverless car software… that runs on your phone.
  • Bland.ai is a robocalling AI voice company that integrates with Zapier automation (brace yourself).  
  • CAPTCHAs can now be easily solved with free open-source AI automation.
  • Stanford’s new robot autonomously cooks and serves shrimp, calls and takes an elevator, wipes up spills, and more.  It’s trained with a combination of human guidance and AI.
  • An AI assistant called Rabbit is coming out next week and getting a lot of buzz.

Top Sixteen Stories 

These are the must-click links if you only have time for a few.  Even if they look boring, click them!  I did the work, so you don’t have to worry.  All are 10/10 would recommend. 

The Rest: AI News of The Week

Don’t let the volume overwhelm you.  Have fun and skim it. The links are organized by topic, sorted from ‘coolest’ to ‘least cool’, and each topic is clearly defined with a headline.  I’ve added a description and glossary of what the topics mean, beneath each label, in plain language.  I do the work so you don’t have to!   The link descriptions are often pulled directly from tweets or articles, so it’s not always my voice.  Pause when you see something that interests you.  Reach out to me any time. I enjoy sharing and discussing these items!

Agents

An AI agent is when AI completes tasks autonomously on your behalf.  This could be shopping, customer service, replying to an email, monitoring traffic, etc.  Agents are a bit like “co-pilots” (assistants) but the difference is agents are autonomous, once set into motion.

Bland.ai is a robocalling AI company that integrates with Zapier.  What could go wrong? I’m afraid to sign up for the demo call… LOL.
https://www.bland.ai/turbo

Open Interpreter 0.2.0—The New Computer Update—is out today.

– OS Mode lets vision models operate your computer

– We included a new model for precise GUI control

– We’re launching a Computer API for LLMs

Microsoft Copilot for Outlook is pretty solid, but it hallucinates instead of doing the single most useful thing an AI email assistant could handle: scheduling. When it gave me the intriguing option to “accept & suggest times”, the times it made up were booked or on Saturdays.

2023 was a big year for AI agents – but 2024 will be even bigger. 

My market map + a spreadsheet of the 60 products I’m tracking

Artificial General Intelligence (AGI)

AGI, in a nutshell, is when AI beats humans at everything.  It’s a horizon that is tough to define, tough to imagine, and tough to predict.

“The last year seems to have changed the timeline of AI, according to a regular survey of published computer scientists working in AI:

In 2016, the median date for when AI would exceed humans at all tasks was 2061. In 2022, it was 2060.

In 2023, it was 2047. (10% chance by 2027!)”

Thousands Of AI Authors On The Future Of AI

“In the largest survey of its kind, we surveyed 2,778 researchers who had published in top-tier artificial intelligence (AI) venues, asking for their predictions on the pace of AI progress and the nature and impacts of advanced AI systems. The aggregate forecasts give at least a 50% chance of AI systems achieving several milestones by 2028, including autonomously constructing a payment processing site from scratch, creating a song indistinguishable from a new song by a popular musician, and autonomously downloading and fine-tuning a large language model. If science continues undisrupted, the chance of unaided machines outperforming humans in every possible task was estimated at 10% by 2027, and 50% by 2047.”

Anthropic’s Dario Amodei on AI’s limits: ‘I’m not sure there are any’

https://finance.yahoo.com/news/anthropics-dario-amodei-ais-limits-184557981.html

Apple

Siri generative AI capabilities to be announced at WWDC – Rumor

https://9to5mac.com/2024/01/04/siri-generative-ai-2024/

“Apple’s LLM gap is real. It might not last much longer.”

https://joanwestenberg.com/blog/apples-llm-gap-is-real-it-might-not-last-much-longer

Augmented/Virtual Reality

AR might finally have its breakthrough in 2024.  In addition to the news links, below, there is a term everyone should know: Gaussian Splat.  Splats help blur the lines between still and 3D images, stitching together a scene from stills.   When you see the phrase, “scale neural volume rendering to high resolution by rendering every pixel as a splat” it might sound nerdy, but this is the stuff that is going to lead to breakthroughs. 
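For a rough feel of what a “splat” is, here is a toy 2D sketch in Python. It is not the real 3D Gaussian Splatting pipeline: each splat here is just a soft round blob with a center, a size, a brightness, and an opacity, and the image is built by blending the blobs front to back onto a pixel grid. The names (Splat, render) and all the numbers are made up for illustration; real systems use 3D Gaussians projected through a camera and run on a GPU.

```python
import math
from dataclasses import dataclass


@dataclass
class Splat:
    x: float        # center column, in pixels
    y: float        # center row, in pixels
    sigma: float    # blob size (standard deviation), in pixels
    value: float    # grayscale brightness, 0.0 to 1.0
    opacity: float  # peak opacity at the center, 0.0 to 1.0


def render(splats, width, height):
    """Blend splats front to back (in list order) into a grayscale image."""
    image = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            remaining = 1.0  # fraction of light still visible behind earlier splats
            pixel = 0.0
            for s in splats:
                dist2 = (col - s.x) ** 2 + (row - s.y) ** 2
                # Opacity falls off smoothly with distance from the splat center.
                alpha = s.opacity * math.exp(-dist2 / (2 * s.sigma ** 2))
                pixel += remaining * alpha * s.value
                remaining *= 1.0 - alpha
            image[row][col] = pixel
    return image


# Two hypothetical splats: a bright, sharp one in front and a dimmer, wider one behind.
splats = [
    Splat(x=2, y=2, sigma=1.0, value=1.0, opacity=0.9),
    Splat(x=5, y=3, sigma=1.5, value=0.5, opacity=0.6),
]
image = render(splats, width=8, height=6)
for row in image:
    print(" ".join(f"{v:.2f}" for v in row))
```

Every pixel gets a little bit of every nearby blob, which is why a scene made of splats looks smooth rather than blocky.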

Meta’s CTO calls its AR glasses prototype “the most advanced thing we’ve ever produced as a species”

Meta has built a sophisticated, prohibitively expensive pair of AR glasses that could be shown to the public in 2024.

https://mixed-news.com/en/bosworth-orion-ar-glasses-2024/

Midjourney CEO in office hours just said he thinks they “can get to the holodeck” by 2024.  “We’re gonna build a lot of stuff this year. I think we’ll build more stuff than I’ve ever built before…By the end of 2024 hopefully we have real-time open worlds”

“These aren’t just video recordings, these are all dynamic 3D Gaussian scenes.

Will gaussians become the next generation of pixels?”

https://www.linkedin.com/feed/update/urn:li:activity:7148446781637283840/

“New patent filings for Vision Pro show off just how small the main component really is. Majority of the bulk is pure cushion/light-seal.”

UC San Diego and NVIDIA’s latest project pushes the envelope in 3D-aware generative modeling. By scaling neural volume rendering to high resolutions, this method resolves fine-grained 3D geometry with unprecedented detail. Employing learning-based samplers, it accelerates neural rendering for 3D GAN training using significantly fewer depth samples, allowing for the explicit rendering of every pixel of the full-resolution image during training and inference. This eliminates the need for post-processing super resolution in 2D. The result is a method that synthesizes high-resolution 3D geometry and strictly view-consistent images, setting a new standard for unsupervised learning of 3D shapes in 3D GANs.

https://www.linkedin.com/feed/update/urn:li:activity:7149050506974281728/

Sungkyunkwan University and Hanwha Vision bring forth Deblurring 3D Gaussian Splatting, a novel approach addressing the challenge of blurry input in 3D scene reconstructions. This technique empowers a 3D Gaussian splatting-based method to reconstruct fine and sharp details from blurry images, a significant advancement for real-time rendering applications. Employing a multi-layer perceptron (MLP) to manipulate the covariance of each 3D Gaussian, the model tailors the scene’s blurriness to enhance sharpness and clarity.

https://www.linkedin.com/feed/update/urn:li:activity:7147973111026503680/

‘Ready Player One’ to Be Turned Into Massive Metaverse Experience in Partnership With Warner Bros. Discovery

https://variety.com/2024/digital/news/ready-player-one-metaverse-warner-bros-discovery-1235862190/

Leaked Video Shows Browsing Safari in Apple Vision Pro Is Super Smooth

https://www.inverse.com/tech/apple-vision-pro-safari-web-browser-video

Spatial Computing is a term that you will see increase in use across tech news from CES2024 and in tech announcements in the months to come, yet it still is in its infancy. It will take several years for Spatial Computing to evolve into its full potential and impact business and how we engage with each other and with technology in the same way or in an even more impactful way than the past phases of computing have. The age of AI hardware, smartglasses, and spatial computing is here – and you can help shape that future today. 

https://www.forbes.com/sites/cathyhackl/2024/01/06/what-is-spatial-computing/?sh=63f8ec63360e

AR Bubbles that sit in living rooms and let users walk into other places seem like a really fun feature.

Super excited to bring probably the first budgeting and money management app for Apple Vision Pro when it launches in the US early this year. 

Human NeRFs have shown incredible abilities rendering 3D avatar from single monocular videos. But can they train on videos with occlusions? I’m excited to share our new paper “Wild2Avatar: Rendering Humans Behind Occlusions”

scale neural volume rendering to high resolution by rendering every pixel to ensure that “what you see in 2D, is what you get in 3D”

Business Enterprise

JPMorgan’s AI research team just introduced DocLLM.

DocLLM is an AI model extension designed to better understand complex business documents.

DocLLM outperformed leading models (including GPT-4) by over 15% on some form analysis challenges.

DocLLM: A Layout-Aware Generative Language Model For Multimodal Document Understanding

https://arxiv.org/pdf/2401.00908.pdf

The wildest deal in tech right now is about to turn 6-month-old LLM startup Mistral into a $2 billion unicorn, sources say

https://www.businessinsider.com/mistral-in-talks-to-raise-funding-at-2-billion-valuation-2023-11

OpenAI’s annualized revenue reportedly tops $1.6B

https://siliconangle.com/2024/01/01/openais-annualized-revenue-reportedly-tops-1-6b/

OpenAI annualized revenue tops $1.6 billion- The Information

https://www.reuters.com/technology/openai-annualized-revenue-tops-16-billion-information-2023-12-30/

Here are examples of early studies and users that show us how AI is transforming work, education & our perceptions of the truth. And it is very early days.

https://www.oneusefulthing.org/p/signs-and-portents

There’s something going on with AI startups in France

In just a few months, dozens of French entrepreneurs have turned their focus to AI

https://techcrunch.com/2023/11/09/theres-something-going-on-with-ai-startups-in-france/

AI-created “virtual influencers” are stealing business from humans

Brands are turning to hyper-realistic, AI-generated influencers for promotions.

https://arstechnica.com/ai/2023/12/ai-created-virtual-influencers-are-stealing-business-from-humans/

Artificial intelligence to bring Elvis Presley to London stage with ‘never seen before’ performances

Presley’s estate has granted access to thousands of the singer’s personal photos and hours of home video footage, and the firm says this is being used to create “never seen before” performances.

https://news.sky.com/story/ai-elvis-presley-to-star-on-uk-stage-for-first-time-with-never-seen-before-performances-13041602

https://www.layeredreality.com/

How IBM Sees AI Changing the Game for Companies of All Sizes with IBM’s VP of Technology and Director of Startups

https://www.saastr.com/how-ai-is-changing-the-game-for-companies-with-ibm/

Amazon’s Silent Sacking (sort of a rant more than anything, but it was interesting)

https://justingarrison.com/blog/2023-12-30-amazons-silent-sacking/

Rora 2023 Salary Negotiation Report for AI Researchers

https://www.teamrora.com/post/ai-researchers-salary-negotiation-report-2023

How Leading AI Startup Investors Approached Artificial Intelligence In 2023

https://news.crunchbase.com/ai/startup-investors-bessemer-sequoia-m12-eoy-2023

Education

There was a great line I couldn’t quite remember from “One Hundred Years of Solitude”

Search engines & ChatGPT couldn’t recall it either from my bad description

Wrote a little 20 line program to read the whole book and found it in a minute, cost 25 cents

Half Of All Skills Will Be Outdated Within Two Years, Study Suggests

https://www.forbes.com/sites/joemckendrick/2023/10/14/half-of-all-skills-will-be-outdated-within-two-years-study-suggests/?sh=56f7095a2dc2

What We Learned About AI and Education in 2023

https://aisupremacy.substack.com/p/what-we-learned-about-ai-and-education

New research from the University of Michigan revealed LLMs demonstrate higher performance when prompted to act as gender-neutral or male rather than female.

The Seoul City Government in South Korea announced that it will use drones and AI to monitor traffic conditions in real time starting in 2024.

South Korea Will Use Drones And AI To Monitor Traffic Conditions Next Year

https://www.gizchina.com/2023/12/27/south-korea-drones-ai-traffic-monitoring/amp/

CAPTCHAs are now easily solved automatically with a free, open source vision AI.

Add that to open source voice cloning, open source realistic AI photographs of people, and the existing (closed source, for now) video avatar tools and, well, identity is going to be an issue.

Google’s SEO guru John Mueller weighs in on the AI-generated vs. stock images debate

https://the-decoder.com/googles-seo-guru-john-mueller-comments-on-ai-generated-images

Robin AI raises $26 million to take its AI-powered legal contract solutions global

https://the-decoder.com/robin-ai-raises-26-million-to-take-its-ai-powered-legal-contract-solutions-global/

Ex-Trump lawyer Michael Cohen says he unwittingly sent AI-generated fake legal cases to his attorney

https://apnews.com/article/michael-cohen-donald-trump-artificial-intelligence-777ace9cc34aa0e56398fd47a1d6b420

Google wrote a ‘Robot Constitution’ to make sure its new AI droids won’t kill us

https://www.theverge.com/2024/1/4/24025535/google-ai-robot-constitution-autort-deepmind-three-laws

Images altered to trick machine vision can influence humans too

https://deepmind.google/discover/blog/images-altered-to-trick-machine-vision-can-influence-humans-too/

Can AI be Contained?
A Conversation And Q&A With NYU Stern Professor Scott Galloway And Inflection AI Co-Founder Mustafa Suleyman

https://www.sectionschool.com/events/live-events/can-ai-be-contained

Nikon, Sony and Canon fight AI fakes with new camera tech (these never work for long)

Digital signatures to provide way to tell real photos from deepfakes

https://asia.nikkei.com/Business/Technology/Nikon-Sony-and-Canon-fight-AI-fakes-with-new-camera-tech

Tracking AI, a new tool that monitors bias in AI chatbots, concluded that every chatbot other than open-source Mistral scores politically left. Also, apparently Grok is actually the most left-wing of all major language models?

WhiteRabbitNeo-33B is a spectacular open source LLM that “knows” red and blue team Cybersecurity.

https://huggingface.co/whiterabbitneo/WhiteRabbitNeo-13B

Security for advanced LLMs doesn’t work like other IT security

A readable paper shows how different easy attacks can cause security issues: fine-tuning can eliminate guardrails, documents can contain malicious instructions & functions can contain bad code

https://arxiv.org/pdf/2312.14302.pdf

We could end up in a world where there are some AIs made by massive US firms with unambiguous rights to their training data, AI startups legally based in places like Japan that allow unlimited training, & then a ton of open source models that don’t really pay attention to rights.

AI Is Not Conscious

  • The Association for Mathematical Consciousness Science (AMCS) calls for increased funding for consciousness and AI research, citing its absence in recent AI safety discussions.
  • A comprehensive study by 19 researchers presents criteria to assess AI consciousness and concluded that no current AI systems are conscious.
  • Opposing views exist on whether AI can become conscious at some point, but it’s probably safer to assume that it is possible than to disregard it.

https://jurgengravestein.substack.com/p/ai-is-not-conscious

Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

https://arxiv.org/abs/2305.04388

Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models

Are current language models capable of deception and lie detection? We study this question by introducing a text-based game called Hoodwinked, inspired by Mafia and Among Us.

https://arxiv.org/abs/2308.01404

Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure

https://arxiv.org/abs/2311.07590

Explainers

Stuff we figured out about AI in 2023

https://simonwillison.net/2023/Dec/31/ai-in-2023/

Nine Big Predictions for AI in 2024

2024 will be a big year for open weights models, agents, hacks, new model architectures, legislation, legal cases and so much more. Welcome to battleground AI and the war for the future of technology.

https://danieljeffries.substack.com/p/nine-big-predictions-for-ai-in-2024

Google

Newly discovered code and screenshots reveal Google is creating a paid tier “Bard Advanced” option powered by the upcoming Gemini Ultra. Ultra will need to live up to some major hype to entice users to switch from other paid options.

Additional Google Bard leaks:

-‘Motoko’, which will reportedly allow users to create custom bots (similar to GPT Store).

-“Power-ups”, “Gallery”, “Tasks”, “Sharing”, and more.

Google appears to be working on an ‘advanced’ version of Bard that you have to pay for

https://www.theverge.com/2024/1/4/24025270/google-bard-advanced-paid-subscription

OpenAI moves to shrink regulatory risk in EU around data privacy

https://techcrunch.com/2024/01/02/openai-dublin-data-controller/

Images

MidJourney

--v 6 is truly next level for fashion. boymolish posted a flat lay of some clothing and accessories (original post linked below)

I turned it into a prompt and ran it through Midjourney

Not a perfect 1-to-1, but it’s pretty damn good.

iphone photo of a couple sitting on a black leather sofa. On the right is a stocky 30-year-old white man with a brown beard wearing a tight gray tee shirt with a baseball cap, one hand is placed on his cheek while the other is holding his cell phone. On the left is a thin 25-year-old Mexican woman wearing a floral blouse, looking at the man. The man is looking at his phone as the woman looks at him, clearly annoyed. They are in a small apartment building with fluorescent lighting and a window. Photo shot on iPhone --style raw --ar 7:5 --v 6.0

A prompt describing 3 characters in detail, including ethnicity, age, and 6 clothing items with 5 color assignments. It only took a few rerolls and variations for Midjourney v6 to nail it.

Prompting Midjourney with “phone photo” creates AI images that are almost impossible to identify as fake (until you look closely).

https://www.linkedin.com/feed/update/urn:li:activity:7148336778544939008/

ByteDance (parent of TikTok)

Don’t sleep on ByteDance!

DreamTuner: Single Image is Enough for Subject Driven Generation

https://dreamtuner-diffusion.github.io/

Scenario

There have been a bunch of Consistent AI Character videos over the last year, but I’ve never made one–  Until now. Because this one actually works.

https://www.youtube.com/watch?v=v_FXC0iq1Sk

Tutorial on how to create an AI Influencer (basically click bait… but it does talk about how to make consistent characters with Scenario)

Snap, Inc.

Image Restoration

Researchers from UCLA and Snap just introduced a new AI approach called dual-pivot tuning.

The approach leverages personal photos to customize image restoration models, better preserving individual facial features.

https://arxiv.org/pdf/2312.17234.pdf

Microsoft

Microsoft recently introduced new Android, iOS, and iPadOS apps for Copilot, allowing users to access the powerful (and free) AI tool on the go.  GPT-4, DALL-E 3, Voice Chat, and Vision are all available for free now without the need for a ChatGPT+ plan.

Microsoft’s Copilot app is now available on iOS

The Microsoft Copilot app lets you ask questions, draft text, and generate images using AI.

https://www.theverge.com/2023/12/29/24019288/microsoft-copilot-app-available-iphone-ipad-ai

Microsoft’s next Surface laptops will reportedly be its first true ‘AI PCs’

https://www.theverge.com/2023/12/28/24017890/microsoft-ai-surface-laptops-arm

Introducing a new Copilot key to kick off the year of AI-powered Windows PCs

https://blogs.windows.com/windowsexperience/2024/01/04/introducing-a-new-copilot-key-to-kick-off-the-year-of-ai-powered-windows-pcs/

Microsoft’s new Copilot key is the first big change to Windows keyboards in 30 years

https://www.theverge.com/2024/1/4/24023809/microsoft-copilot-key-keyboard-windows-laptops-pcs

News/Journalism

OpenAI Offers Publishers as Little as $1 Million a Year

https://www.theinformation.com/articles/openai-offers-publishers-as-little-as-1-million-a-year

OpenAI GPT Store

OpenAI’s app store for GPTs will launch next week

https://techcrunch.com/2024/01/04/openais-app-store-for-gpts-will-launch-next-week/

OpenAI GPT page

https://openai.com/blog/introducing-gpts

OpenAI just announced the GPT Store is coming next week, and it’s a massive opportunity for early adopters. Here’s a simple 4-step tutorial on how to make a GPT now to profit from this next wave:

GPT Builder Help Page

https://help.openai.com/en/articles/8770868-gpt-builder

Dunking on the concept.

New York Times Lawsuit POVs

More evidence that AIs are not going to be limited by the amount of human-created content available for them to train on. This is another paper suggesting that AIs training on AI-created data can achieve higher quality results than just using human-created data alone.

The NYT vs OpenAI Is Not Just a Legal Battle

There are three layers to this complex conflict and we must understand all three

https://thealgorithmicbridge.substack.com/p/the-nyt-vs-openai-is-not-just-a-legal

HuggingFace Flex – OpenSource NYT Training Tokens

This dataset provides a collection of over 35,000 tokens of text adhering to the New York Times writing style guide. The data is formatted in JSON and is suitable for various natural language processing tasks, text generation, style transfer, and more.

https://huggingface.co/datasets/TuringsSolutions/NYTWritingStyleGuide

OpenAI’s news publisher deals reportedly top out at $5 million a year

The ChatGPT company has been trying to get more news organizations to sign licensing deals to train AI models. 

https://www.theverge.com/2024/1/4/24025409/openai-training-data-lowball-nyt-ai-copyright

Things are about to get a lot worse for Generative AI (obligatory grumpy Gary Marcus post)

https://garymarcus.substack.com/p/things-are-about-to-get-a-lot-worse

Perplexity

Jeff Bezos Bets on a Google Challenger Using AI to Try to Upend Internet Search

Perplexity, with a fraction of Google’s users, raised the largest sum by an internet search startup in recent years

https://www.wsj.com/tech/ai/jeff-bezos-bets-on-a-google-challenger-using-ai-to-try-to-upend-internet-search-0859bda6

“Google Search Killer” Perplexity Raises $73.6M

Excited to announce we’ve raised 73.6M$ at 520M$ valuation, led by IVP, along with our seed and Series A lead investors NEA, Elad Gil, Nat Friedman.

Search startup Perplexity AI valued at $520 mln in funding from Bezos, Nvidia

https://www.reuters.com/technology/perplexity-ai-valued-520-mln-funding-bezos-nvidia-2024-01-04/

Perplexity Raises Series B Funding Round 

https://blog.perplexity.ai/blog/perplexity-raises-series-b-funding-round

Robotics/Embodiment

Mobile ALOHA demo videos of robots doing chores are getting a lot of traction.  They are an interesting mix of remote operation plus learning.  For example, an operator will remotely use the robot to cook eggs, and along the way, the robot learns to cook eggs (with untrained variables).  That’s my layperson’s assessment. It is also instant Breeders in my head (if you like old school indie rock)

Mobile ALOHA can autonomously complete complex mobile manipulation tasks: cook and serve shrimp, call and take an elevator, store a 3 lb pot in a two-door cabinet.

More Mobile ALOHA examples:

Do laundry, self-charge, use a vacuum, water plants, load and unload a dishwasher, use a coffee machine, obtain drinks from the fridge and open a beer, open doors, play with pets, throw away trash, turn on/off a lamp.

Google DeepMind Robotics

Today, we’re announcing a suite of research advances that enable robots to make decisions faster as well as better understand and navigate their environments.

https://deepmind.google/discover/blog/shaping-the-future-of-advanced-robotics/?utm_source=twitter&utm_medium=social

Tesla Automation Video

Giga Shanghai operates a 95% automated production line, enabling a cycle time of less than 40 seconds!

Bipedal robot with wheels that adapts to any terrain

Flowpilot allows you to set up autonomous driving on cars with just a smartphone.

The open-source driver assistant can run on Linux, Windows, and Android devices.

Innovative but also terrifying and super illegal at the same time!

Flowpilot is an open source driver assistance system built on top of openpilot, that can run on most windows/linux and android powered machines. It performs the functions of Adaptive Cruise Control (ACC), Automated Lane Centering (ALC), Forward Collision Warning (FCW), Lane Departure Warning (LDW) and Driver Monitoring (DM) for a growing variety of supported car makes, models, and model years maintained by the community.

https://github.com/flowdriveai/flowpilot

Humanoid Robots Are Getting To Work

Humanoids from Agility Robotics and seven other companies vie for jobs

https://spectrum.ieee.org/humanoid-robots

Roku

Roku introduces Pro Series streaming TVs, with an AI picture adjustment feature

Samsung

Samsung Galaxy AI Phone is teased for Jan 17th.

https://www.samsung.com/us/smartphones/the-next-galaxy/reserve/?cid=smf-mktg-brd-mob-us-011123-114715

Galaxy Unpacked 2024: Opening a New Era of Mobile AI

https://news.samsung.com/global/invitation-galaxy-unpacked-2024-opening-a-new-era-of-mobile-ai

Samsung announced its Galaxy launch event, while praising “Galaxy AI” — signaling major AI features coming.

Some of the Samsung Galaxy S24’s key AI features just leaked

https://www.techradar.com/phones/samsung-galaxy-phones/some-of-the-samsung-galaxy-s24s-key-ai-features-just-leaked

Samsung also just revealed its new steam cleaning, AI-powered vacuum.

New AI integrations help the robot distinguish between rooms, detect stains, and identify floor surfaces for optimal cleaning.

Science/Medicine

“Aided by AI, New Catheter Design Prevents Bacterial Infections”

It’s fun to see tech optimists leaning into new ideas and thinking about the future.

“Imagine using your own local Intelligence Amplifier with all the sensors, including the Thermal Sensor, you pass by the imager before a shower every day and you get a “hot spot” report, some that you will share with your doctor.”

Researchers just developed DeWave, an AI system that can turn silent thoughts into text by decoding brain signals. The system achieved over 40% accuracy in translating verbs directly from neural signals, without the need for invasive implants.

https://twitter.com/rowancheung/status/1742417879570473065

DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation

https://arxiv.org/abs/2309.14030

The AI–quantum computing mash-up: will it revolutionize science?

https://www.nature.com/articles/d41586-023-04007-0

Google’s AI Reads Retinas to Prevent Blindness in Diabetics

Artificial brains built by Google can recognize cats in photos. Now they’re gaining a more serious kind of sight to help humans.

https://www.wired.com/2016/11/googles-ai-reads-retinas-prevent-blindness-diabetics/

AI-Enabled Microscopes Demonstrate the Potential for More Timely and Accurate Cancer Detection

https://www.diu.mil/latest/augmented-reality-microscope

Independent assessment of a deep learning system for lymph node metastasis detection on the Augmented Reality Microscope

https://www.sciencedirect.com/science/article/pii/S2153353922007362

An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis

https://www.nature.com/articles/s41591-019-0539-7.epdf

Microscope 2.0: An Augmented Reality Microscope with Real-time Artificial Intelligence Integration

https://arxiv.org/abs/1812.00825

Jellypipe Unveils AI Assistant for 3D Printing: Optimizing Material Selection and Pricing with GPT-4

Video

2023 was a breakout year for AI video.  In January, there were no public text-to-video models. Now, there are dozens of video gen products and millions of users. A recap of the biggest developments + companies to watch

Examples of short clips using Runway

More examples of video-to-video (with what I assume is a latent diffusion model?) using Domo AI.

Barbie turned into anime

Turning live action Barbie back into dolls
AIWarper and their rendition of Kendomland from Barbie.

https://www.linkedin.com/feed/update/urn:li:activity:7141819283591806977/

This is a cool use of a video-to-video AI model to create a sprite sheet for a retro pixel art game…

Midjourney is reportedly starting training on video models this month, the founder revealed on Discord. The image generation platform also has plans to bring 3D and video generation to the platform shortly

Midjourney starts training video models in January, v6 updates coming soon

https://the-decoder.com/midjourney-starts-training-video-models-in-january-v6-updates-coming-soon/

DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

https://huggingface.co/papers/2312.09767

Vision

Google’s Gemini Pro and OpenAI’s GPT-4V compete in visual capabilities (GPT4 has the edge for now)

https://the-decoder.com/googles-gemini-pro-and-openais-gpt-4v-compete-in-visual-capabilities/

Wearables

Rabbit teases Jan 9 launch with trailer

https://www.rabbit.tech/updates/reveal

Chips/Hardware

AI chips saw major advancements in Q4 2023. Here’s a list of 9 developments that caught my eye:

https://twitter.com/prateekvjoshi/status/1742985424640098398

LLM Training and Inference with Intel Gaudi 2 AI Accelerators

https://www.databricks.com/blog/llm-training-and-inference-intel-gaudi2-ai-accelerators

Google targets AI PC market with Chromebook Plus

https://www.digitimes.com/news/a20231226PD217/google-ai-pc-chromebook-microsoft-acer.html

Nvidia to launch slower version of its gaming chip in China to comply with U.S. export controls

https://www.cnbc.com/amp/2023/12/29/nvidia-brings-slower-gaming-chip-version-to-china-to-bypass-us-rules.html

Intel to spin out AI software firm with outside investment

https://finance.yahoo.com/news/intel-spins-ai-software-firm-133626026.html

Intel Gaudi AI Accelerator Gains 2x Performance Leap on GPT-3 with FP8 Software

The latest MLPerf results for Intel Gaudi2 and 4th Gen Intel Xeon demonstrate how Intel is raising the bar for AI performance with cost-effective and high-performance AI solutions.

https://www.intel.com/content/www/us/en/newsroom/news/intel-gaudi-ai-accelerator-brings-greater-ai-choice.html

Technical/Dev

The Random Transformer

Understand how transformers work by demystifying all the math behind them. In this blog post, we’ll do an end-to-end example of the math within a transformer model. The goal is to get a good understanding of how the model works.

https://osanseviero.github.io/hackerllama/blog/posts/random_transformer/

Sparse Mixture of Experts (MoE) has been one of the most impactful innovations in ML in recent years, enabling breakthroughs such as OpenAI’s GPT-4, Google’s Switch Transformer, Mistral AI’s Mixtral-8x7B, and more. That said, I think that we’re just starting to see the full impact of sparse MoE on modern ML applications, and that new innovations in this domain will make it even more efficient, resulting in even bigger and more accurate models across domains.

https://www.linkedin.com/feed/update/urn:li:activity:7147789113738338304/

Escaping Plato’s Cave with Multimodal AI

https://pratapranade.substack.com/p/escaping-platos-cave-with-multimodal

The Role of the Ontologist in the Age of LLMs

https://ontologist.substack.com/p/the-role-of-the-ontologist-in-the

Fusion Knowledge Graphs and Language Models Through Compatible Generative Modeling

https://ai.plainenglish.io/fusion-knowledge-graphs-and-language-models-through-compatible-generative-modeling-e2607ba889a9

In March of last year, I didn’t know how to code in Python. Around April or May, I asked GPT-4 to implement the basic functionality of getting HTTP data from an API endpoint. Then I put the code base in a secret Gist and kept pointing the GPT-4 Plugin at its own source code with new feature requests, improvements, and bug fixes, directing its attention to the parts it was to work on next.

I repeated this process several thousand times over the next 7 months. Now it’s a fully asynchronous Quart ASGI service pushing 5,000 lines of code, and it can do all this.

https://twitter.com/JD_2020/status/1742107114510643270

I wanted to see if ChatGPT could build me an entire game without me seeing a SINGLE LINE of code. I wanted to play it directly in my web browser. I didn’t want to copy/paste code to a code editor or GitHub.

https://twitter.com/JD_2020/status/1740918345170374896

This paper summarizes 32 techniques to mitigate hallucination in LLMs. It introduces a taxonomy categorizing methods like RAG, Knowledge Retrieval, CoVe, and more.

https://twitter.com/omarsar0/status/1742633831234994189

https://t.co/hRPWSW4tKC

I recently gave a guest lecture (outline below) about LLMs for code and math for the “AI Foundation Models” course at Yale, and I’ve just made the slides and recordings publicly available.
https://twitter.com/AnsongNi/status/1742686969166225499
https://github.com/niansong1996/niansong1996.github.io/blob/master/files/Ansong_Yale_488_guest_lecture.pdf

Introducing Text-to-CAD
https://zoo.dev/blog/introducing-text-to-cad

Apple’s new M3 Silicon revolutionizes AI computing! Now you can build your own mini data center for Large Language Models and AI projects. Compact, powerful, and a game-changer in technology.
https://twitter.com/benitoz/status/1743725009728942351 

Bash One-Liners for LLMs
https://justine.lol/oneliners/ 

Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
This study conducts a thorough evaluation of Gemini Pro’s efficacy in commonsense reasoning tasks, employing a diverse array of datasets that span both language-based and multimodal scenarios.

https://github.com/eternityyw/gemini-commonsense-evaluation

Noise-free Optimization in Early Training Steps for Image Super-Resolution

https://arxiv.org/abs/2312.17526v1

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

https://arxiv.org/abs/2312.13913v2

A case for AI alignment being difficult

https://www.lesswrong.com/posts/wnkGXcAq4DCgY8HqA/a-case-for-ai-alignment-being-difficult

Is “A Helpful Assistant” the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts

https://arxiv.org/abs/2311.10054

Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

https://auffusion.github.io/

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

https://arxiv.org/abs/2401.01313

Credits/Sources

Most of these links come from just a few incredible sources.  Please follow them:
