Welcome to issue #14 of the latest artificial intelligence news and trends for the week ending January 5th, 2024.
This week’s cover features a rabbit watching the Killer Rabbit of Caerbannog from “Monty Python and the Holy Grail” using VR goggles. Behind the rabbit is the Meta logo. This image was created using this week’s prompting hack: placing the term “iPhone photo” in the description. “iPhone photo of a rabbit wearing VR goggles --ar 16:9 --v 6.0 --style raw”. It also references next week’s launch of the Rabbit AI assistant. Meta CTO ‘Boz’ says Meta’s new AR headset is “the most advanced thing we’ve ever produced as a species”, which is why the Meta logo is “behind the rabbit”. Cover created with MidJourney and edited with Photoshop. The font is Cooper Std, which is stylistically similar to the Monty Python Intermission card. Here’s the scene: https://www.youtube.com/watch?v=pmu5sRIizdw
Executive Summary
This week in AI was full of hyperbole, and that’s saying a lot.
- Meta’s CTO called their AR prototype “the most advanced thing we’ve ever produced as a species”
- “Google Search Killer” Perplexity raised $73.6M
- MidJourney is THE name to know in AI imagery, and now they are working on video. The CEO says they “can get to the holodeck” by the end of 2024.
- Open Interpreter lets AI vision operate your computer
- Flowpilot is open-source driverless car software… that runs on your phone.
- Bland.ai is a robocalling AI voice company that integrates with Zapier automation (brace yourself).
- CAPTCHAs can now be easily solved with free open-source AI automation.
- Stanford’s new robot autonomously cooks and serves shrimp, calls and takes an elevator, wipes up spills, and more. It’s trained with a combination of human guidance and AI.
- An AI assistant called Rabbit is coming out next week and getting a lot of buzz.
Top Sixteen Stories
These are the must-click links if you only have time for a few. Even if they look boring, click them! I did the work, so you don’t have to worry. All are 10/10 would recommend.
- MidJourney continues to amaze:
- AI imagery demo: a Twitter user posted a ‘laydown’ of clothing, and another turned it into a model wearing similar clothes, just with text prompting.
- The image leader will start training video models in January
- Stanford’s “Mobile ALOHA can autonomously complete complex mobile manipulation tasks: cook and serve shrimp, call and take elevator, store a 3lbs pot to a two-door cabinet.”
- Prompting Midjourney with “phone photo” creates AI images that are difficult to spot as fake
- Bland.ai is a robocalling AI company that integrates with Zapier automation. What could go wrong? I’m afraid to sign up for the demo call… LOL.
- Meta has created a “prohibitively expensive pair of AR glasses” that CTO Andrew Bosworth calls “the most advanced thing we’ve ever produced as a species”
- The Humane Pin wearable AI was a PR flop. But a new contender, Rabbit AI, teases Jan 9 launch with a trailer that is getting a lot of buzz.
- “Flowpilot allows you to set up autonomous driving on cars with just a smartphone. The open-source driver assistant can run on Linux, Windows, and Android devices. Innovative but also terrifying and super illegal at the same time!”
- “Midjourney CEO in office hours just said he thinks they ‘can get to the holodeck’ by 2024. ‘We’re gonna build a lot of stuff this year. I think we’ll build more stuff than I’ve ever built before…By the end of 2024 hopefully we have real-time open worlds’”
- “Google Search Killer” Perplexity Raises $73.6M. “Excited to announce we’ve raised 73.6M$ at 520M$ valuation, led by IVP, along with our seed and Series A lead investors NEA, Elad Gil, Nat Friedman.”
- Open Interpreter lets AI *vision* operate your computer
- “JPMorgan’s AI research team just introduced DocLLM. DocLLM is an AI model extension designed to better understand complex business documents. DocLLM outperformed leading models (including GPT-4) by over 15% on some form analysis challenges.”
- “Tesla Automation Video: Giga Shanghai operates a 95% automated production line, enabling a cycle time of less than 40 seconds”
- “Researchers just developed DeWave- an AI system that can turn silent thoughts into text by decoding brain signals. The system achieved over 40% accuracy in translating verbs directly from neural signals, without the need for invasive implants”
- “CAPTCHAs are now easily solved automatically with a free, open source vision AI. Add that to open source voice cloning, open source realistic AI photographs of people, and the existing (closed source, for now) video avatar tools and, well, identity is going to be an issue.”
- The nerdiest yet most important story to know: “These aren’t just video recordings, these are all dynamic 3D Gaussian scenes. Will gaussians become the next generation of pixels?” As of now, I predict AR will be the top new pop-culture technology of 2024 (rivaling LLMs).
- Real time video transformation:
- Barbie turned into anime with Domo AI
- Turning live action Barbie back into dolls
The Rest: AI News of The Week
Don’t let the volume overwhelm you. Have fun and skim it. The links are organized by topic, sorted from ‘coolest’ to ‘least cool’, and each topic is clearly defined with a headline. I’ve added a description and glossary of what the topics mean, beneath each label, in plain language. I do the work so you don’t have to! The link descriptions are often pulled directly from tweets or articles, so it’s not always my voice. Pause when you see something that interests you. Reach out to me any time. I enjoy sharing and discussing these items!
Agents
An AI agent is when AI completes tasks autonomously on your behalf. This could be shopping, customer service, replying to an email, monitoring traffic, etc. Agents are a bit like “co-pilots” (assistants) but the difference is agents are autonomous, once set into motion.
Bland.ai is a robocalling AI company that integrates with Zapier. What could go wrong? I’m afraid to sign up for the demo call… LOL.
https://www.bland.ai/turbo
Open Interpreter 0.2.0—The New Computer Update—is out today.
– OS Mode lets vision models operate your computer
– We included a new model for precise GUI control
– We’re launching a Computer API for LLMs
Microsoft Copilot for Outlook is pretty solid, but it hallucinates instead of doing the single most useful feature for an AI email assistant to handle – scheduling. When it gave me the intriguing option to “accept & suggest times”, the times it made up were booked or on Saturdays.
2023 was a big year for AI agents – but 2024 will be even bigger.
My market map + a spreadsheet of the 60 products I’m tracking
Artificial General Intelligence (AGI)
AGI, in a nutshell, is when AI beats humans at everything. It’s a horizon that is tough to define, tough to imagine, and tough to predict.
“The last year seems to have changed the timeline of AI, according to a regular survey of published computer scientists working in AI:
In 2016, the median date for when AI would exceed humans at all tasks was 2061. In 2022, it was 2060
In 2023, it was 2047. (10% chance by 2027!)”
Thousands Of AI Authors On The Future Of AI
“In the largest survey of its kind, we surveyed 2,778 researchers who had published in top-tier artificial intelligence (AI) venues, asking for their predictions on the pace of AI progress and the nature and impacts of advanced AI systems. The aggregate forecasts give at least a 50% chance of AI systems achieving several milestones by 2028, including autonomously constructing a payment processing site from scratch, creating a song indistinguishable from a new song by a popular musician, and autonomously downloading and fine-tuning a large language model. If science continues undisrupted, the chance of unaided machines outperforming humans in every possible task was estimated at 10% by 2027, and 50% by 2047.”
Anthropic’s Dario Amodei on AI’s limits: ‘I’m not sure there are any’
https://finance.yahoo.com/news/anthropics-dario-amodei-ais-limits-184557981.html
Apple
Siri generative AI capabilities to be announced at WWDC – Rumor
https://9to5mac.com/2024/01/04/siri-generative-ai-2024/
“Apple’s LLM gap is real. It might not last much longer.”
https://joanwestenberg.com/blog/apples-llm-gap-is-real-it-might-not-last-much-longer
Augmented/Virtual Reality
AR might finally have its breakthrough in 2024. In addition to the news links, below, there is a term everyone should know: Gaussian Splat. Splats help blur the lines between still and 3D images, stitching together a scene from stills. When you see the phrase, “scale neural volume rendering to high resolution by rendering every pixel as a splat” it might sound nerdy, but this is the stuff that is going to lead to breakthroughs.
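For the curious, here is a minimal sketch of what a “splat” is, in Python. It is an illustration only, assuming a simplified 2D case: each splat is a soft Gaussian blob with a center, a 2x2 covariance (its shape), a color, and an opacity, and a pixel’s color comes from blending the blobs front to back. The names (Splat, splat_alpha, shade_pixel) are made up for this example and are not taken from any of the projects linked in this section.

```python
import math
from dataclasses import dataclass


@dataclass
class Splat:
    cx: float        # center x, in pixel coordinates
    cy: float        # center y, in pixel coordinates
    sxx: float       # covariance entries: [[sxx, sxy], [sxy, syy]]
    sxy: float
    syy: float
    color: tuple     # (r, g, b), each in 0..1
    opacity: float   # peak opacity at the center, in 0..1


def splat_alpha(s, x, y):
    """Opacity this splat contributes at pixel (x, y)."""
    dx, dy = x - s.cx, y - s.cy
    det = s.sxx * s.syy - s.sxy * s.sxy
    # Inverse of the 2x2 covariance matrix.
    ixx, ixy, iyy = s.syy / det, -s.sxy / det, s.sxx / det
    # Squared distance from the center, measured in the blob's own shape.
    m = dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy
    return s.opacity * math.exp(-0.5 * m)


def shade_pixel(splats, x, y):
    """Blend splats front to back; `splats` must be sorted nearest first."""
    r = g = b = 0.0
    transmittance = 1.0  # how much light still passes through
    for s in splats:
        a = splat_alpha(s, x, y)
        w = transmittance * a
        r += w * s.color[0]
        g += w * s.color[1]
        b += w * s.color[2]
        transmittance *= 1.0 - a
    return (r, g, b)
```

A half-transparent red blob in front of an opaque blue one gives an even purple at their shared center: shade_pixel returns (0.5, 0.0, 0.5). Real 3D systems project 3D Gaussians onto the screen first and handle many thousands of them per frame, which this sketch leaves out.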
Meta’s CTO calls its AR glasses prototype “the most advanced thing we’ve ever produced as a species”
Meta has built a sophisticated, prohibitively expensive pair of AR glasses that could be shown to the public in 2024.
https://mixed-news.com/en/bosworth-orion-ar-glasses-2024/
Midjourney CEO in office hours just said he thinks they “can get to the holodeck” by 2024. “We’re gonna build a lot of stuff this year. I think we’ll build more stuff than I’ve ever built before…By the end of 2024 hopefully we have real-time open worlds”
“These aren’t just video recordings, these are all dynamic 3D Gaussian scenes.
Will gaussians become the next generation of pixels?”
https://www.linkedin.com/feed/update/urn:li:activity:7148446781637283840/
“New patent filings for Vision Pro show off just how small the main component really is. Majority of the bulk is pure cushion/light-seal.”
UC San Diego and NVIDIA’s latest project pushes the envelope in 3D-aware generative modeling. By scaling neural volume rendering to high resolutions, this method resolves fine-grained 3D geometry with unprecedented detail. Employing learning-based samplers, it accelerates neural rendering for 3D GAN training using significantly fewer depth samples, allowing for the explicit rendering of every pixel of the full-resolution image during training and inference. This eliminates the need for post-processing super resolution in 2D. The result is a method that synthesizes high-resolution 3D geometry and strictly view-consistent images, setting a new standard for unsupervised learning of 3D shapes in 3D GANs.
https://www.linkedin.com/feed/update/urn:li:activity:7149050506974281728/
Sungkyunkwan University and Hanwha Vision bring forth Deblurring 3D Gaussian Splatting, a novel approach addressing the challenge of blurry input in 3D scene reconstructions. This technique empowers a 3D Gaussian splatting-based method to reconstruct fine and sharp details from blurry images, a significant advancement for real-time rendering applications. Employing a multi-layer perceptron (MLP) to manipulate the covariance of each 3D Gaussian, the model tailors the scene’s blurriness to enhance sharpness and clarity.
https://www.linkedin.com/feed/update/urn:li:activity:7147973111026503680/
‘Ready Player One’ to Be Turned Into Massive Metaverse Experience in Partnership With Warner Bros. Discovery
https://variety.com/2024/digital/news/ready-player-one-metaverse-warner-bros-discovery-1235862190/
Leaked Video Shows Browsing Safari in Apple Vision Pro Is Super Smooth
https://www.inverse.com/tech/apple-vision-pro-safari-web-browser-video
Spatial Computing is a term that you will see increase in use across tech news from CES2024 and in tech announcements in the months to come, yet it still is in its infancy. It will take several years for Spatial Computing to evolve into its full potential and impact business and how we engage with each other and with technology in the same way or in an even more impactful way than the past phases of computing have. The age of AI hardware, smartglasses, and spatial computing is here – and you can help shape that future today.
https://www.forbes.com/sites/cathyhackl/2024/01/06/what-is-spatial-computing/?sh=63f8ec63360e
AR Bubbles that sit in living rooms and let users walk into other places seem like a really fun feature.
Super excited to bring probably the first budgeting and money management app for Apple Vision Pro when it launches in the US early this year.
Human NeRFs have shown incredible abilities rendering 3D avatar from single monocular videos. But can they train on videos with occlusions? I’m excited to share our new paper “Wild2Avatar: Rendering Humans Behind Occlusions”
scale neural volume rendering to high resolution by rendering every pixel to ensure that “what you see in 2D, is what you get in 3D”
Business Enterprise
JPMorgan’s AI research team just introduced DocLLM.
DocLLM is an AI model extension designed to better understand complex business documents.
DocLLM outperformed leading models (including GPT-4) by over 15% on some form analysis challenges.
DocLLM: A Layout-Aware Generative Language Model For Multimodal Document Understanding
https://arxiv.org/pdf/2401.00908.pdf
The wildest deal in tech right now is about to turn 6-month-old LLM startup Mistral into a $2 billion unicorn, sources say
https://www.businessinsider.com/mistral-in-talks-to-raise-funding-at-2-billion-valuation-2023-11
OpenAI’s annualized revenue reportedly tops $1.6B
https://siliconangle.com/2024/01/01/openais-annualized-revenue-reportedly-tops-1-6b/
OpenAI annualized revenue tops $1.6 billion - The Information
https://www.reuters.com/technology/openai-annualized-revenue-tops-16-billion-information-2023-12-30/
Here are examples of early studies and users that show us how AI is transforming work, education & our perceptions of the truth. And it is very early days.
https://www.oneusefulthing.org/p/signs-and-portents
There’s something going on with AI startups in France
In just a few months, dozens of French entrepreneurs have turned their focus to AI
https://techcrunch.com/2023/11/09/theres-something-going-on-with-ai-startups-in-france/
AI-created “virtual influencers” are stealing business from humans
Brands are turning to hyper-realistic, AI-generated influencers for promotions.
https://arstechnica.com/ai/2023/12/ai-created-virtual-influencers-are-stealing-business-from-humans/
Artificial intelligence to bring Elvis Presley to London stage with ‘never seen before’ performances
Presley’s estate has granted access to thousands of the singer’s personal photos and hours of home video footage, and the firm says this is being used to create “never seen before” performances.
https://www.layeredreality.com/
How IBM Sees AI Changing the Game for Companies of All Sizes with IBM’s VP of Technology and Director of Startups
https://www.saastr.com/how-ai-is-changing-the-game-for-companies-with-ibm/
Amazon’s Silent Sacking (sort of a rant more than anything, but it was interesting)
https://justingarrison.com/blog/2023-12-30-amazons-silent-sacking/
Rora 2023 Salary Negotiation Report for AI Researchers
https://www.teamrora.com/post/ai-researchers-salary-negotiation-report-2023
How Leading AI Startup Investors Approached Artificial Intelligence In 2023
https://news.crunchbase.com/ai/startup-investors-bessemer-sequoia-m12-eoy-2023
Education
There was a great line I couldn’t quite remember from “One Hundred Years of Solitude”
Search engines & ChatGPT couldn’t recall it either from my bad description
Wrote a little 20 line program to read the whole book and found it in a minute, cost 25 cents
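The tweet doesn’t share the program, so here is a rough sketch of the general idea in Python: split the book into overlapping passages and rank them against a half-remembered description. To be clear about the assumptions, this version swaps in plain word overlap for whatever AI model the author actually used, and the function names (words, passages, best_passages) are made up for illustration.

```python
import re
from collections import Counter


def words(text):
    """Lowercase a string and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def passages(text, size=60, step=30):
    """Yield overlapping windows of `size` words, moving `step` words each time."""
    tokens = text.split()
    start = 0
    while True:
        yield " ".join(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window reached the end of the book
        start += step


def best_passages(book_text, description, top_n=3):
    """Return up to `top_n` passages sharing the most words with `description`."""
    query = Counter(words(description))
    scored = []
    for passage in passages(book_text):
        counts = Counter(words(passage))
        # Count shared words, capped by how often each appears in the query.
        score = sum(min(counts[w], n) for w, n in query.items())
        scored.append((score, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for score, passage in scored[:top_n] if score > 0]
```

To try it, read a plain-text copy of a book into a string and call best_passages(book_text, "whatever you remember of the line"). Word overlap only works when you remember some of the actual wording; a fuzzy description is where a language model would earn its 25 cents.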
Half Of All Skills Will Be Outdated Within Two Years, Study Suggests
What We Learned About AI and Education in 2023
https://aisupremacy.substack.com/p/what-we-learned-about-ai-and-education
Ethics/Legal/Security
New research from the University of Michigan revealed LLMs demonstrate higher performance when prompted to act as gender-neutral or male rather than female.
The Seoul City Government in South Korea announced that it will use drones and AI to monitor traffic conditions in real time starting in 2024.
South Korea Will Use Drones And AI To Monitor Traffic Conditions Next Year
https://www.gizchina.com/2023/12/27/south-korea-drones-ai-traffic-monitoring/amp/
CAPTCHAs are now easily solved automatically with a free, open source vision AI.
Add that to open source voice cloning, open source realistic AI photographs of people, and the existing (closed source, for now) video avatar tools and, well, identity is going to be an issue.
Google’s SEO guru John Mueller weighs in on the AI-generated vs. stock images debate
https://the-decoder.com/googles-seo-guru-john-mueller-comments-on-ai-generated-images
Robin AI raises $26 million to take its AI-powered legal contract solutions global
Ex-Trump lawyer Michael Cohen says he unwittingly sent AI-generated fake legal cases to his attorney
Google wrote a ‘Robot Constitution’ to make sure its new AI droids won’t kill us
https://www.theverge.com/2024/1/4/24025535/google-ai-robot-constitution-autort-deepmind-three-laws
Images altered to trick machine vision can influence humans too
Can AI be Contained?
A Conversation And Q&A With NYU Stern Professor Scott Galloway And Inflection AI Co-Founder Mustafa Suleyman
https://www.sectionschool.com/events/live-events/can-ai-be-contained
Nikon, Sony and Canon fight AI fakes with new camera tech (these never work for long)
Digital signatures to provide way to tell real photos from deepfakes
https://asia.nikkei.com/Business/Technology/Nikon-Sony-and-Canon-fight-AI-fakes-with-new-camera-tech
Tracking AI, a new tool that monitors bias in AI chatbots, concluded that every chatbot other than open-source Mistral scores politically left. Also, apparently Grok is actually the most left-wing of all major language models?
WhiteRabbitNeo-33B is a spectacular open source LLM that “knows” red and blue team Cybersecurity.
https://huggingface.co/whiterabbitneo/WhiteRabbitNeo-13B
Security for advanced LLMs doesn’t work like other IT security
A readable paper shows how different easy attacks can cause security issues: fine-tuning can eliminate guardrails, documents can contain malicious instructions & functions can contain bad code
https://arxiv.org/pdf/2312.14302.pdf
We could end up in a world where there are some AIs made by massive US firms with unambiguous rights to their training data, AI startups legally based in places like Japan that allow unlimited training, & then a ton of open source models that don’t really pay attention to rights.
AI Is Not Conscious
- The Association for Mathematical Consciousness Science (AMCS) calls for increased funding for consciousness and AI research, citing its absence in recent AI safety discussions.
- A comprehensive study by 19 researchers presents criteria to assess AI consciousness and concluded that no current AI systems are conscious.
- Opposing views exist on whether or not AI can indeed become conscious at some point, but it’s probably safer to assume that it is possible than to disregard it.
https://jurgengravestein.substack.com/p/ai-is-not-conscious
Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
https://arxiv.org/abs/2305.04388
Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models
Are current language models capable of deception and lie detection? We study this question by introducing a text-based game called Hoodwinked, inspired by Mafia and Among Us.
https://arxiv.org/abs/2308.01404
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure
https://arxiv.org/abs/2311.07590
Explainers
Stuff we figured out about AI in 2023
https://simonwillison.net/2023/Dec/31/ai-in-2023/
Nine Big Predictions for AI in 2024
2024 will be a big year for open weights models, agents, hacks, new model architectures, legislation, legal cases and so much more. Welcome to battleground AI and the war for the future of technology.
https://danieljeffries.substack.com/p/nine-big-predictions-for-ai-in-2024
Newly discovered code and screenshots reveal Google is creating a paid tier “Bard Advanced” option powered by the upcoming Gemini Ultra. Ultra will need to live up to some major hype to entice users to switch from other paid options.
Additional Google Bard leaks:
- ‘Motoko’, which will reportedly allow users to create custom bots (similar to GPT Store).
- “Power-ups”, “Gallery”, “Tasks”, “Sharing”, and more.
Google appears to be working on an ‘advanced’ version of Bard that you have to pay for
https://www.theverge.com/2024/1/4/24025270/google-bard-advanced-paid-subscription
OpenAI moves to shrink regulatory risk in EU around data privacy
https://techcrunch.com/2024/01/02/openai-dublin-data-controller/
Images
MidJourney
--v 6 is truly next level for fashion. boymolish posted a flat lay of some clothing and accessories (original post linked below)
I turned it into a prompt and ran it through Midjourney
Not a perfect 1-to-1, but it’s pretty damn good.
iphone photo of a couple sitting on a black leather sofa. On the right is a stocky 30-year-old white man with a brown beard wearing a tight gray tee shirt with a baseball cap, one hand is placed on his cheek while the other is holding his cell phone. On the left is a thin 25-year-old Mexican woman wearing a floral blouse, looking at the man. The man is looking at his phone as the woman looks at him, clearly annoyed. They are in a small apartment building with fluorescent lighting and a window. Photo shot on iPhone --style raw --ar 7:5 --v 6.0
A prompt describing 3 characters in detail, including ethnicity, age, and 6 clothing items with 5 color assignments. It only took a few rerolls and variations for Midjourney v6 to nail it.
Prompting Midjourney with “phone photo” creates AI images that are almost impossible to identify as fake (until you look closely).
https://www.linkedin.com/feed/update/urn:li:activity:7148336778544939008/
ByteDance (parent of TikTok)
Don’t sleep on ByteDance!
DreamTuner: Single Image is Enough for Subject Driven Generation
https://dreamtuner-diffusion.github.io/
Scenario
There have been a bunch of Consistent AI Character videos over the last year, but I’ve never made one– Until now. Because this one actually works.
Tutorial on how to create an AI Influencer (basically click bait… but it does talk about how to make consistent characters with Scenario)
Snap, Inc.
Image Restoration
Researchers from UCLA and Snap just introduced a new AI approach called dual-pivot tuning.
The approach leverages personal photos to customize image restoration models, better preserving individual facial features.
https://arxiv.org/pdf/2312.17234.pdf
Microsoft
Microsoft recently introduced new Android, iOS, and iPadOS apps for Copilot, allowing users to access the powerful (and free) AI tool on the go. GPT-4, DALL-E 3, Voice Chat, and Vision are all available for free now without the need for a ChatGPT+ plan.
Microsoft’s Copilot app is now available on iOS
The Microsoft Copilot app lets you ask questions, draft text, and generate images using AI.
https://www.theverge.com/2023/12/29/24019288/microsoft-copilot-app-available-iphone-ipad-ai
Microsoft’s next Surface laptops will reportedly be its first true ‘AI PCs’
https://www.theverge.com/2023/12/28/24017890/microsoft-ai-surface-laptops-arm
Introducing a new Copilot key to kick off the year of AI-powered Windows PCs
Microsoft’s new Copilot key is the first big change to Windows keyboards in 30 years
https://www.theverge.com/2024/1/4/24023809/microsoft-copilot-key-keyboard-windows-laptops-pcs
News/Journalism
OpenAI Offers Publishers as Little as $1 Million a Year
https://www.theinformation.com/articles/openai-offers-publishers-as-little-as-1-million-a-year
OpenAI GPT Store
OpenAI’s app store for GPTs will launch next week
https://techcrunch.com/2024/01/04/openais-app-store-for-gpts-will-launch-next-week/
OpenAI GPT page
https://openai.com/blog/introducing-gpts
OpenAI just announced the GPT Store is coming next week, and it’s a massive opportunity for early adopters. Here’s a simple 4-step tutorial on how to make a GPT now to profit from this next wave:
GPT Builder Help Page
https://help.openai.com/en/articles/8770868-gpt-builder
Dunking on the concept.
New York Times Lawsuit POVs
More evidence that AIs are not going to be limited by the amount of human-created content available for them to train on. This is another paper suggesting that AIs training on AI-created data can achieve higher quality results than just using human-created data alone.
The NYT vs OpenAI Is Not Just a Legal Battle
There are three layers to this complex conflict and we must understand all three
https://thealgorithmicbridge.substack.com/p/the-nyt-vs-openai-is-not-just-a-legal
HuggingFace Flex – OpenSource NYT Training Tokens
This dataset provides a collection of over 35,000 tokens of text adhering to the New York Times writing style guide. The data is formatted in JSON and is suitable for various natural language processing tasks, text generation, style transfer, and more.
https://huggingface.co/datasets/TuringsSolutions/NYTWritingStyleGuide
OpenAI’s news publisher deals reportedly top out at $5 million a year
The ChatGPT company has been trying to get more news organizations to sign licensing deals to train AI models.
https://www.theverge.com/2024/1/4/24025409/openai-training-data-lowball-nyt-ai-copyright
Things are about to get a lot worse for Generative AI (obligatory grumpy Gary Marcus post)
https://garymarcus.substack.com/p/things-are-about-to-get-a-lot-worse
Perplexity
Jeff Bezos Bets on a Google Challenger Using AI to Try to Upend Internet Search
Perplexity, with a fraction of Google’s users, raised the largest sum by an internet search startup in recent years
“Google Search Killer” Perplexity Raises $73.6M
Excited to announce we’ve raised 73.6M$ at 520M$ valuation, led by IVP, along with our seed and Series A lead investors NEA, Elad Gil, Nat Friedman.
Search startup Perplexity AI valued at $520 mln in funding from Bezos, Nvidia
https://www.reuters.com/technology/perplexity-ai-valued-520-mln-funding-bezos-nvidia-2024-01-04/
Perplexity Raises Series B Funding Round
https://blog.perplexity.ai/blog/perplexity-raises-series-b-funding-round
Robotics/Embodiment
Mobile ALOHA demo videos of robots doing chores are getting a lot of traction. They are an interesting mix of remote operation plus learning. For example, an operator will remotely use the robot to cook eggs, and along the way, the robot learns to cook eggs (with untrained variables). That’s my layperson’s assessment. It is also instant Breeders in my head (if you like old school indie rock)
Mobile ALOHA can autonomously complete complex mobile manipulation tasks: cook and serve shrimp, call and take elevator, store a 3lbs pot to a two-door cabinet.
More Mobile ALOHA examples:
Do laundry, self-charge, use a vacuum, water plants, load and unload a dishwasher, use a coffee machine, obtain drinks from the fridge and open a beer, open doors, play with pets, throw away trash, turn on/off a lamp.
Google DeepMind Robotics
Today, we’re announcing a suite of research advances that enable robots to make decisions faster as well as better understand and navigate their environments.
Tesla Automation Video
Giga Shanghai operates a 95% automated production line, enabling a cycle time of less than 40 seconds!
Bipedal robot with wheels that adapts to any terrain
Flowpilot allows you to set up autonomous driving on cars with just a smartphone.
The open-source driver assistant can run on Linux, Windows, and Android devices.
Innovative but also terrifying and super illegal at the same time!
Flowpilot is an open source driver assistance system built on top of openpilot, that can run on most windows/linux and android powered machines. It performs the functions of Adaptive Cruise Control (ACC), Automated Lane Centering (ALC), Forward Collision Warning (FCW), Lane Departure Warning (LDW) and Driver Monitoring (DM) for a growing variety of supported car makes, models, and model years maintained by the community.
https://github.com/flowdriveai/flowpilot
Humanoid Robots Are Getting To Work
Humanoids from Agility Robotics and seven other companies vie for jobs
https://spectrum.ieee.org/humanoid-robots
Roku
Roku introduces Pro Series streaming TVs, with an AI picture adjustment feature
Samsung
Samsung Galaxy AI Phone is teased for Jan 17th.
Galaxy Unpacked 2024: Opening a New Era of Mobile AI
https://news.samsung.com/global/invitation-galaxy-unpacked-2024-opening-a-new-era-of-mobile-ai
Samsung announced its Galaxy launch event, while praising “Galaxy AI” — signaling major AI features coming.
Some of the Samsung Galaxy S24’s key AI features just leaked
Samsung also just revealed its new steam cleaning, AI-powered vacuum.
New AI integrations help the robot distinguish between rooms, detect stains, and identify floor surfaces for optimal cleaning.
Science/Medicine
“Aided by AI, New Catheter Design Prevents Bacterial Infections”
It’s fun to see tech optimists leaning into new ideas and thinking about the future.
“Imagine using your own local Intelligence Amplifier with all the sensors, including the Thermal Sensor, you pass by the imager before a shower every day and you get a “hot spot” report, some that you will share with your doctor.”
Researchers just developed DeWave- an AI system that can turn silent thoughts into text by decoding brain signals. The system achieved over 40% accuracy in translating verbs directly from neural signals, without the need for invasive implants.
https://twitter.com/rowancheung/status/1742417879570473065
DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation
https://arxiv.org/abs/2309.14030
The AI–quantum computing mash-up: will it revolutionize science?
https://www.nature.com/articles/d41586-023-04007-0
Google’s AI Reads Retinas to Prevent Blindness in Diabetics
Artificial brains built by Google can recognize cats in photos. Now they’re gaining a more serious kind of sight to help humans.
https://www.wired.com/2016/11/googles-ai-reads-retinas-prevent-blindness-diabetics/
AI-Enabled Microscopes Demonstrate the Potential for More Timely and Accurate Cancer Detection
https://www.diu.mil/latest/augmented-reality-microscope
Independent assessment of a deep learning system for lymph node metastasis detection on the Augmented Reality Microscope
https://www.sciencedirect.com/science/article/pii/S2153353922007362
An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis
https://www.nature.com/articles/s41591-019-0539-7.epdf
Microscope 2.0: An Augmented Reality Microscope with Real-time Artificial Intelligence Integration
https://arxiv.org/abs/1812.00825
Jellypipe Unveils AI Assistant for 3D Printing: Optimizing Material Selection and Pricing with GPT-4
Video
2023 was a breakout year for AI video. In January, there were no public text-to-video models. Now, there are dozens of video gen products and millions of users. A recap of the biggest developments + companies to watch
Examples of short clips using Runway
More examples of video-to-video (with what I assume is a latent diffusion model?) using Domo AI.
Barbie turned into anime
Turning live action Barbie back into dolls
AIWarper and their rendition of Kendomland from Barbie.
https://www.linkedin.com/feed/update/urn:li:activity:7141819283591806977/
This is a cool use of a video-to-video AI model to create a sprite sheet for a retro pixel art game…
Midjourney is reportedly starting training on video models this month, the founder revealed on Discord. The image generation platform also has plans to bring 3D and video generation to the platform shortly
Midjourney starts training video models in January, v6 updates coming soon
https://the-decoder.com/midjourney-starts-training-video-models-in-january-v6-updates-coming-soon/
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
https://huggingface.co/papers/2312.09767
Vision
Google’s Gemini Pro and OpenAI’s GPT-4V compete in visual capabilities (GPT4 has the edge for now)
https://the-decoder.com/googles-gemini-pro-and-openais-gpt-4v-compete-in-visual-capabilities/
Wearables
Rabbit teases Jan 9 launch with trailer
https://www.rabbit.tech/updates/reveal
Chips/Hardware
AI chips saw major advancements in Q4 2023. Here’s a list of 9 developments that caught my eye:
https://twitter.com/prateekvjoshi/status/1742985424640098398
LLM Training and Inference with Intel Gaudi 2 AI Accelerators
https://www.databricks.com/blog/llm-training-and-inference-intel-gaudi2-ai-accelerators
Google targets AI PC market with Chromebook Plus
https://www.digitimes.com/news/a20231226PD217/google-ai-pc-chromebook-microsoft-acer.html
Nvidia to launch slower version of its gaming chip in China to comply with U.S. export controls
Intel to spin out AI software firm with outside investment
https://finance.yahoo.com/news/intel-spins-ai-software-firm-133626026.html
Intel Gaudi AI Accelerator Gains 2x Performance Leap on GPT-3 with FP8 Software
The latest MLPerf results for Intel Gaudi2 and 4th Gen Intel Xeon demonstrate how Intel is raising the bar for AI performance with cost-effective and high-performance AI solutions.
Technical/Dev
The Random Transformer
Understand how transformers work by demystifying all the math behind them. In this blog post, we’ll do an end-to-end example of the math within a transformer model. The goal is to get a good understanding of how the model works.
https://osanseviero.github.io/hackerllama/blog/posts/random_transformer/
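For a taste of the kind of math that post walks through, here is a minimal sketch of scaled dot-product attention, the core operation in a transformer. It is plain Python written for this newsletter, not code from the linked post; the function names and the toy numbers are assumptions chosen for illustration.

```python
import math


def softmax(xs):
    """Turn a list of scores into weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (hypothetical helper)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy input: one query, two identical keys, two different values.
result = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
print(result)
```

Because both keys match the query equally, the two values get equal weight and the output is their average.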
Sparse Mixture of Experts (MoE) has been one of the most impactful innovations in ML in recent years, enabling breakthroughs such as OpenAI’s GPT-4, Google’s Switch Transformer, Mistral AI’s Mixtral-8x7B, and more. That said, I think that we’re just starting to see the full impact of sparse MoE on modern ML applications, and that new innovations in this domain will make it even more efficient, resulting in even bigger and more accurate models across domains.
https://www.linkedin.com/feed/update/urn:li:activity:7147789113738338304/
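To make the "sparse" part concrete, here is a rough sketch of top-k routing, one common way a Mixture of Experts layer sends each input to only a few experts. It is a toy in plain Python, not the method of any specific model named above; the function names, the stand-in experts, and the gate scores are all hypothetical.

```python
import math


def softmax(xs):
    """Turn a list of scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the chosen experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Four stand-in "experts"; a real layer would use small neural networks.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 100 * x]
# Hypothetical gate scores; in a real model a learned router produces these.
gate_logits = [0.0, 5.0, 5.0, -1.0]

print(top_k_route(gate_logits))
print(moe_forward(2.0, experts, gate_logits))
```

Only two of the four experts run for this input, which is why sparse MoE models can grow their total parameter count without a matching increase in compute per token.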
Escaping Plato’s Cave with Multimodal AI
https://pratapranade.substack.com/p/escaping-platos-cave-with-multimodal
The Role of the Ontologist in the Age of LLMs
https://ontologist.substack.com/p/the-role-of-the-ontologist-in-the
Fusion Knowledge Graphs and Language Models Through Compatible Generative Modeling
In March of last year, I didn’t know how to code in Python. I asked GPT-4 to implement the basic functionality of getting HTTP data from an API endpoint back in April/May of last year. Then, I put the code base on a secret Gist and kept pointing the GPT-4 Plugin to its own source code with new feature requests, improvements, and bug fixes, directing its attention to the parts it was to work on next.
I repeated this process several thousand times over the course of the next 7 months to today. Now it’s a fully asynchronous Quart ASGI service pushing 5,000 lines of code, and it can do all this.
https://twitter.com/JD_2020/status/1742107114510643270
I wanted to see if ChatGPT could build me an entire game without me seeing a SINGLE LINE of code. I wanted to play it directly in my web browser. I didn’t want to copy/paste code to a code editor or GitHub.
https://twitter.com/JD_2020/status/1740918345170374896
This paper summarizes 32 techniques to mitigate hallucination in LLMs. It introduces a taxonomy categorizing methods like RAG, Knowledge Retrieval, CoVe, and more.
https://twitter.com/omarsar0/status/1742633831234994189
I recently gave a guest lecture (outline below) about LLMs for code and math for the “AI Foundation Models” course at Yale, and I’ve just made the slides and recordings publicly available.
https://twitter.com/AnsongNi/status/1742686969166225499
https://github.com/niansong1996/niansong1996.github.io/blob/master/files/Ansong_Yale_488_guest_lecture.pdf
Introducing Text-to-CAD
https://zoo.dev/blog/introducing-text-to-cad
Apple’s new M3 Silicon revolutionizes AI computing! Now you can build your own mini data center for Large Language Models and AI projects. Compact, powerful, and a game-changer in technology
https://twitter.com/benitoz/status/1743725009728942351
Bash One-Liners for LLMs
https://justine.lol/oneliners/
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
This study conducts a thorough evaluation of Gemini Pro’s efficacy in commonsense reasoning tasks, employing a diverse array of datasets that span both language-based and multimodal scenarios.
https://github.com/eternityyw/gemini-commonsense-evaluation
Noise-free Optimization in Early Training Steps for Image Super-Resolution
https://arxiv.org/abs/2312.17526v1
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models
https://arxiv.org/abs/2312.13913v2
A case for AI alignment being difficult
https://www.lesswrong.com/posts/wnkGXcAq4DCgY8HqA/a-case-for-ai-alignment-being-difficult
Is “A Helpful Assistant” the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts
https://arxiv.org/abs/2311.10054
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation
A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
https://arxiv.org/abs/2401.01313
Credits/Sources
Most of these links come from just a few incredible sources. Please follow them:
- Robert Scoble: https://x.com/Scobleizer
- Ethan Mollick: https://www.linkedin.com/in/emollick/
- David Armano: https://www.linkedin.com/in/darmano/
- Alan Thompson: https://lifearchitect.ai/
- Theoretically Media: https://www.youtube.com/@TheoreticallyMedia
- The Rundown: https://www.therundown.ai/
- Borriss: https://twitter.com/_Borriss_
- Bilawal Sidhu: https://twitter.com/bilawalsidhu/
- TLDR: https://tldr.tech/ai
- Jeremiah Owyang: https://twitter.com/jowyang
- Wes Roth: https://www.youtube.com/@WesRoth
Previous Issues
- AI News: Week Ending 12/29/2023: https://ethanbholland.com/2024/01/07/ai-news-12-week-ending-12-29-2023-with-executive-summary-and-top-9-stories/
- AI News: Week Ending 12/22/2023: https://www.linkedin.com/pulse/ai-news-week-ending-12222023-executive-summary-top-links-holland-frx4e
- AI News: Week Ending 12/15/2023: https://www.linkedin.com/pulse/ai-news-week-ending-12152023-ethan-holland-elmee
- AI News: Week Ending 12/08/2023: https://www.linkedin.com/pulse/ai-news-week-ending-12082023-ethan-holland-zabve
- AI News: Week Ending 12/01/2023: https://www.linkedin.com/pulse/ai-news-week-ending-12012023-ethan-holland-rglve
- AI News: Week Ending 11/24/2023: https://www.linkedin.com/pulse/ai-news-week-ending-11242023-ethan-holland-jqvre
- AI News: Week Ending 11/17/2023: https://www.linkedin.com/pulse/ai-news-week-ending-11172023-ethan-holland-ad6le
- AI News: Week Ending 11/10/2023: https://www.linkedin.com/pulse/ai-news-week-ending-11102023-executive-summary-top-three-holland-yjdef/
- AI News: Week Ending 11/03/2023: https://www.linkedin.com/posts/ethanholland_aiebh-ai-generativeai-activity-7131396231678844928-3U8M
- AI News: Week Ending 10/27/2023: https://www.linkedin.com/posts/ethanholland_aiebh-ai-generativeai-activity-7127139342321356800-uBPD
- AI News: Week Ending 10/20/2023: https://www.linkedin.com/pulse/ai-news-week-ending-10202023-ethan-holland-eocpe
- AI News: Week Ending 10/13/2023: https://www.linkedin.com/pulse/ai-news-week-ending-10132023-executive-summary-ethan-holland-nq9bf
- AI News: Week Ending 10/6/2023: https://www.linkedin.com/pulse/ai-news-week-ending-1062023-ethan-holland-b6uhe




