“iphone photo. an old fashioned card catalog stands in the middle of a vast desert outside of Scottsdale Arizona. A single drawer is open. light beams out of the drawer. --ar 5:3 --v 6.0 --style raw” Font is Attic Antique layered in Photoshop.

Resources mentioned during the panel

AI working group discussion links (will fill in afterward)

Experimentation/Impact

Platforms

Traffic

Tools/Partners

  • What tool is the most useful to you so far?
  • Do you work with vendors who are using AI? Which ones? Would you recommend them?
  • What are some AI vendors you are testing out?

Internal Policies/Guidelines

  • Do you have a committee focused on AI?
  • Does your organization have an AI policy?

Glossary of Generative AI Terms

  • Interaction
    • Prompts:
      • A prompt is the most familiar way to interact with a generative model. You tell the chatbot what you’d like and interact with it conversationally, as you would with a person. The AI creates images or answers questions based on this conversation.
    • Context windows:
      • A context window is the memory available for prompts. Just like a person, there is only so much a language model can handle in one prompt. Google Gemini is famous for having such a large context window that you can paste five Harry Potter books into the prompt, and Gemini can discuss all of them.
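A rough way to build intuition for context windows is token counting. The sketch below uses the common rule of thumb of about 4 characters per token for English text; real tokenizers vary by model, and the window sizes here are arbitrary stand-ins.

```python
# Rough token-count check against a context window. The ~4 characters
# per token figure is a common rule of thumb for English text, not a
# real tokenizer; actual counts vary by model.

def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token for English."""
    return len(text) // 4

def fits_context(text: str, window_tokens: int) -> bool:
    """Would this text fit in the given context window?"""
    return estimate_tokens(text) <= window_tokens

chapter = "word " * 20_000                 # ~100k characters of stand-in text
print(estimate_tokens(chapter))            # 25000
print(fits_context(chapter, 32_000))       # True: fits a 32k-token window
print(fits_context(chapter, 8_000))        # False: overflows an 8k window
```

By this estimate, a million-token window holds roughly 4 million characters of English, which is why whole book series fit.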
    • RAG (Retrieval-Augmented Generation):
      • RAG is a work-around for context windows. Instead of loading information into the AI’s memory, the AI uses documents as references – much like we’d use a book. RAG is good for adding thousands of documents to build a reference bot. However, RAG is notorious for hallucinating. As with everything AI, that changes and improves constantly.
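The RAG idea can be sketched in a few lines: retrieve the most relevant document, then paste it into the prompt. Word overlap stands in for the embedding similarity and vector index a real system would use, and the documents here are made up.

```python
# Toy RAG: retrieve the best-matching document, then build a prompt
# around it. Word overlap stands in for embedding similarity; a real
# system would use a vector index over thousands of documents.

from collections import Counter

documents = {
    "catalog": "The card catalog lists every book by author and subject.",
    "ill":     "Interlibrary loan lets patrons borrow from other libraries.",
}

def score(query: str, text: str) -> int:
    """Count shared words (stand-in for embedding similarity)."""
    q, d = Counter(query.lower().split()), Counter(text.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 1) -> list:
    ranked = sorted(documents, key=lambda name: score(query, documents[name]), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(documents[name] for name in retrieve(query))
    # The retrieved passage is pasted into the prompt; the model answers
    # from it instead of from training data alone.
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(retrieve("who is the author of this book"))   # ['catalog']
```

Hallucination creeps in when the retrieved passages are irrelevant or incomplete but the model answers anyway.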
      • Now that we’ve covered context windows and RAG, here are three recent headlines:
    • API (application programming interface):
      • An API lets software talk to a model directly, without the chat interface: your code sends a prompt and receives the response programmatically. This is how AI features get built into other products.
    • Agents:
      • Agency is when AI can do things for you (like Googling an actress’s name or fetching the latest weather forecast). An agent goes one step further: an AI given autonomy to take action on your behalf (“Alexa, book a reservation for three at Peak in Hudson Yards for Friday night”) – much like the word agent when applied to people. If an AI can look up hotel room prices on Hilton.com, order DoorDash, or fix bugs in code, it’s an agent. A co-pilot, by contrast, is an assistant that works alongside you (like spell check or autofill), helping write code or checking spelling. I’ve kept an archive of all the headlines about agents since Sept 2023.
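The agent idea reduces to a loop: the model picks an action, the code executes it, and the result comes back. In this sketch a canned decide() function stands in for the model, and both tools are hypothetical stubs rather than real integrations.

```python
# Minimal agent loop. A real agent asks an LLM which tool to call and
# with what arguments; here decide() stands in for the model, and the
# tools are stubs rather than real weather or booking APIs.

def get_weather(city: str) -> str:
    return f"Sunny in {city}"                      # stub for a weather API

def book_table(restaurant: str, size: int) -> str:
    return f"Booked {size} at {restaurant}"        # stub for a booking API

TOOLS = {"get_weather": get_weather, "book_table": book_table}

def decide(request: str):
    """Stand-in for the model: map a request to a tool call."""
    if "weather" in request:
        return "get_weather", {"city": "Scottsdale"}
    return "book_table", {"restaurant": "Peak", "size": 3}

def run_agent(request: str) -> str:
    tool_name, args = decide(request)   # the model chooses the action...
    return TOOLS[tool_name](**args)     # ...and the agent executes it

print(run_agent("what's the weather today?"))       # Sunny in Scottsdale
print(run_agent("book dinner for three on Friday")) # Booked 3 at Peak
```

The autonomy (and the risk) lives in that second line of run_agent: the action executes without a human confirming it.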
    • Multimodality
      • Multimodal AI is when the model can interact with more than text in the prompt/context window, or via API. For example, “What’s in this image?” or “What happens in this video?” It can also imply the ability to create images, sounds, or videos. Multimodality is critical to robot embodiment and awareness.
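At the API level, a multimodal prompt is just text and media packed into one request. The payload shape and field names below are hypothetical, for illustration only; every vendor’s real API differs.

```python
# Sketch of a multimodal request: image bytes plus a text question in
# one payload. The field names and structure here are invented for
# illustration; real vendor APIs each define their own.

import base64
import json

def build_multimodal_request(image_bytes: bytes, question: str) -> str:
    payload = {
        "contents": [
            # Binary media is typically base64-encoded for transport.
            {"type": "image", "data": base64.b64encode(image_bytes).decode("ascii")},
            {"type": "text", "data": question},
        ]
    }
    return json.dumps(payload)

req = build_multimodal_request(b"\x89PNG...", "What's in this image?")
```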
    • Large Action Models
      • A large language model is trained on text; a large action model is trained on actions. The analogy: an LLM reads everything it can and learns how language works, while a LAM watches interactions and interfaces and learns how devices work. An action model could learn how to use an iPhone, for example. Apple’s Ferret project appears to be heading in that direction. Rabbit’s R1 and Open Interpreter are other examples.
  • Types of Generation
    • Diffusion:
      • Most people know diffusion models as the tools we use to create AI images, videos, or sounds. We conversationally describe what we’d like, and the diffusion model creates it. Using images as an example, a diffusion model works by first learning to identify thousands of objects and scenes. Once the model can identify what’s in an image, it is trained to remove noise in incrementally harder steps: 10% noise, 20% noise, and so on. Another computer (the challenger) sends the image back if it can tell that the first computer removed the noise. Eventually, the diffusion model is given 100% noise and asked, “What would this image have looked like if it had been a cat?” And that’s diffusion generation. The best tutorial I know is How AI Image Generators Work (Stable Diffusion / Dall-E) by Computerphile.
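The forward (noising) half of that training process can be sketched in a few lines; the model then learns to run it in reverse, step by step. The tiny one-dimensional “image” and simple blending formula here are illustrative, not how production models schedule noise.

```python
# Forward (noising) half of diffusion on a tiny 1-D "image".
# Training teaches the model to undo these steps in reverse, until it
# can answer "what would pure noise have looked like as a cat?"

import random

random.seed(0)

def add_noise(image, noise_level):
    """Blend each pixel toward random noise; noise_level=1.0 is pure noise."""
    return [(1 - noise_level) * p + noise_level * random.random() for p in image]

clean = [0.0, 0.5, 1.0, 0.5, 0.0]        # stand-in for an image
for step in range(1, 11):                 # 10% noise, 20% noise, ... 100%
    noisy = add_noise(clean, step / 10)   # each step is harder to undo
```

At 100% noise the result carries no trace of the original, which is why generation can start from pure noise plus a prompt.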
    • Latent Consistency
      • Latent consistency is where a model generates media in real time, while a reference image (often very low resolution) guides the engine. The reference might be a stick figure, but the model generates a high-resolution person based on the prompt. Here’s a fun tutorial.
    • Upscaling
    • Inpainting
      • Because most generative AI uses random noise to seed outputs, it’s tough to refine or recreate an image. Inpainting allows portions of an image to be changed while retaining the rest of the image. This is usually done by selecting an area to change, and then prompting for the detail to vary. For example, selecting a hand with six fingers and prompting for a five-fingered one.
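The core mechanic of inpainting, keeping unmasked pixels while regenerating masked ones, can be sketched like this; random values stand in for what a real model would synthesize from the prompt.

```python
# Inpainting's core mechanic: regenerate only the masked pixels and
# keep the rest untouched. Random values stand in for what a real
# model would synthesize from the prompt.

import random

random.seed(0)

def inpaint(image, mask):
    """Replace pixels where mask is True; leave the rest exactly as-is."""
    return [random.random() if m else p for p, m in zip(image, mask)]

image = [0.1, 0.2, 0.3, 0.4]
mask  = [False, True, True, False]   # "select an area to change"
result = inpaint(image, mask)
print(result[0], result[3])          # 0.1 0.4 -- unmasked pixels survive
```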
  • More AI Terms to Know

Apple Ferret:

Headlines about ethics and content

“Now would be the time for archivists and librarians to begin separating out post-2022 written and visual work from what was produced before. As I saw once on Twitter, “this is the K-T boundary of information.” Anything afterwards is increasingly unlikely to be made by humans.” / X – https://twitter.com/emollick/status/1766581955905089747

Chris Alsikkan ™ on X: “these all have thousands of likes and comments https://t.co/FKVk9lkYU6” / X – https://twitter.com/AlsikkanTV/status/1770214935282237878

“Peer review isn’t built to handle the flood of AI content, especially as not all of it will be obvious, and not all will be malicious (lots of scholars pay editors to help make their writing better, now they will use chat). The system, already straining, won’t be able to adjust.” / X – https://twitter.com/emollick/status/1768526138614186026

“The true threat of AI generated content is not that it will convince us to believe things that aren’t true, it’s that we won’t believe anything unless it reinforces what we already think is true, or we’ll just disengage completely because the truth seems impossible to find.” / X – https://twitter.com/EliotHiggins/status/1767162572912873789

“We are seeing the beginnings of a massive problem we will have with AI or suspected AI generated photos. The fact that they throw the veracity and reality of EVERYTHING into question, whether real or not.” / X – https://twitter.com/histoftech/status/1766959370426823068

“No comment from Kensington Palace tonight after at least 3 international pictures agencies refuse to distribute this morning’s photo of Kate and her children. Some of them (@AP) have claimed “the source [the palace] has manipulated the image”.” https://twitter.com/chrisshipitv/status/1766944328847364201

“11% of the best pieces of journalism from the last year used AI in some way. This is evidence of the power of AI as tool or co-intelligence – boosting the work of even great humans. Uses are likely boring, like transcription, which is the point. It does stuff you don’t want to.” https://twitter.com/emollick/status/1767405132587970940

“As we increasingly rely on #LLMs for product recommendations and searches, can companies game these models to enhance the visibility of their products? Our latest work provides answers to this question & demonstrates that LLMs can be manipulated to boost product visibility!” https://twitter.com/hima_lakkaraju/status/1778834301658050733

Google’s new AI search results promotes sites pushing malware, scams
https://www.bleepingcomputer.com/news/google/googles-new-ai-search-results-promotes-sites-pushing-malware-scams/

Disruption/New Media

Gen-AI Search Engine Perplexity Has a Plan to Sell Ads – https://www.adweek.com/media/gen-ai-search-engine-perplexity-has-a-plan-to-sell-ads/

NYT to soon offer most articles via automated voice – https://www.axios.com/2024/04/02/exclusive-nyt-to-soon-offer-most-articles-via-automated-voice

Here’s How Google’s Generative AI for Newsrooms Product Will Work – https://www.bigtechnology.com/p/heres-how-googles-generative-ai-for 

Google Is Paying Publishers Five-Figure Sums to Test an Unreleased Gen AI Platform – In exchange for a five-figure sum, publishers must use the tool to publish 3 stories per day
https://www.adweek.com/media/google-paying-publishers-unreleased-gen-ai

“Search engine volume expected to drop by 25% and shift towards answer engines (perplexity) and chatbots (chatgpt), according to Gartner.” https://twitter.com/AravSrinivas/status/1762157477779665090

Who makes money when AI reads the internet for us? – https://www.engadget.com/who-makes-money-when-ai-reads-the-internet-for-us-200246690.html

It’s the End of the Web as We Know It – WSJ – https://archive.ph/USxqu

Here’s how we’re working with journalists to create the newsrooms of the future with AI – Microsoft On the Issues – https://blogs.microsoft.com/on-the-issues/2024/02/05/journalism-news-generative-ai-democracy-forward/

Seward’s post said the Times plans to hire a machine learning engineer, a software engineer, a designer, and a couple of editors to round out the AI newsroom initiative. So far, the Times has posted job listings for an associate editorial director for AI initiatives and a senior design editor.
https://www.theverge.com/2024/1/30/24055718/new-york-times-generative-ai-machine-learning

We’re building a team at the NYTimes focused on prototyping uses of generative AI and other machine-learning techniques to help with reporting and how The Times is presented to readers. Jobs are posting this week for a machine-learning engineer, software engineer, designer, and editor. I’ll thread links here as they go up. Please spread the word, thanks!
https://www.threads.net/@zseward/post/C2upYZZOEVT

Transcripts on Apple Podcasts – Apple Podcasts for Creators – https://podcasters.apple.com/support/5316-transcripts-on-apple-podcasts

Multimodal AI News

Meta is training AI models to watch and understand video
“Meta announces MA-LMM Memory-Augmented Large Multimodal Model for Long-Term Video Understanding. With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently.” https://twitter.com/_akhaliq/status/1777539936364662817

“Breaking down + understanding a long video I uploaded the entire NBA dunk contest from last night and asked which dunk had the highest score. Gemini 1.5 was incredibly able to find the specific perfect 50 dunk and details from just its long context video understanding!”
https://twitter.com/rowancheung/status/1759281003690873254

“I fed an *entire* biology textbook into Gemini 1.5 Pro. 491,002 tokens. I asked it 3 extremely specific questions, and it got each answer 100% correct. 1M token context windows are a game changer.”
https://twitter.com/mckaywrigley/status/1760146610745643347 

“Gemini can find a challenge message 99.7% of the time, even in a sea of 7M words.” “The new version of Gemini holds 30,000 double-spaced pages in memory at the same time. Up to 1 hour of video.” An English major in college could load their entire year of reading into Gemini and find trends across 70 works of literature. Police departments could run loops of minute-long clips from 60 cameras at a time and conversationally ask the AI to tell them what’s happening in all the videos at once. If you are a creative person, your mind is racing.

“I used Anthropic Claude 3 Opus model to analyse all 63 Podcasts from Dwarkesh Patel’s channel and extracted useful book recommendations, career advices, interesting ideas, learning tips etc 1.49Million tokens processed, costs $23 ( luckily Anthropic Claude 3 has provided me…  https://twitter.com/arunprakashml/status/1774989084307624144

“It can “hear” -> Google’s Gemini 1.5 Pro can now process audio w/out needing a transcript “The model can now listen to uploaded audio files and churn out information from things like earnings calls or audio from videos without the need to refer to a written transcript.”  https://twitter.com/glenngabe/status/1777746267667365998 

Google’s Gemini 1.5 Pro can now hear – The Verge – https://www.theverge.com/2024/4/9/24124741/google-gemini-pro-imagen-updates-vertex 

“I uploaded two music tracks to Gemini 1.5 Pro: Black Sea by Drexciya, and Song of Scheherazade by Renaissance, and asked it to analyse them.” https://twitter.com/sebkrier/status/1778386521319428167

“Can Gemini 1.5 actually read all the Harry Potter books at once? I tried it. All the books have ~1M words (1.6M tokens). Gemini fits about 5.7 books out of 7. I used it to generate a graph of the characters and it CRUSHED it.” https://twitter.com/deedydas/status/1778621375592485076

“I asked Gemini 1.5 Pro to analyze tech startup Rubrik’s S-1, the IPO document, and tell me if I should buy or sell. It read ~200k words, wrote a ~500 word memo in ~61s with NO hallucinations. Wall Street pays analysts $3000+ for a week of “specialized” work, and this cost $1.5!” https://twitter.com/deedydas/status/1775886093554180109

Top models as of April 26, 2024

Model         Organization
GPT-4         OpenAI
Claude        Anthropic
Gemini        Google
Bard          Google
Llama         Meta
Command R+    Cohere
Mistral       Mistral
Reka          Reka
Qwen          Alibaba
Zephyr        HuggingFace
Starling      NexusFlow
Yi            01.AI
Grok          X
https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

via https://twitter.com/emollick/status/1779841429960851957/photo/1

Discover more from Ethan B. Holland
