“iphone photo. an old fashioned card catalog stands in the middle of a vast desert outside of Scottsdale Arizona. A single drawer is open. light beams out of the drawer. --ar 5:3 --v 6.0 --style raw” Font is Attic Antique, layered in Photoshop.
Resources mentioned during the panel
- Text-to-Audio
- SEO Tools
- Video Production
- Video Translation and Clones/Avatars
- Real Estate Listings
- Image Creation
- Music Creation
AI working group discussion links (will fill in afterward)
Experimentation/Impact
- What’s the coolest AI example you’ve seen?
- What’s the one thing that scares you most about AI?
- What happens if the consumer decides they don’t care if everything is AI generated?
- What is the definition of journalistic standards?
- Loss of accountability for politicians
- Blurred lines of reality for a new generation
- News not being labeled or disclosed properly
- Smaller media outlets might not have a seat at the negotiating table
- AI content flooding the algorithms and a lack of control of your media diet
- Google’s generative search results driving down traffic
- What do you think is going to be most disruptive this year?
- What are your favorite resources for keeping informed?
- Robert Scoble: https://x.com/Scobleizer
- Ethan Mollick: https://www.linkedin.com/in/emollick/
- Alan Thompson: https://lifearchitect.ai/
- Theoretically Media: https://www.youtube.com/@TheoreticallyMedia
- The Rundown: https://www.therundown.ai/
- Bilawal Sidhu: https://twitter.com/bilawalsidhu/
- TLDR: https://tldr.tech/ai
- Jeremiah Owyang: https://twitter.com/jowyang
- Nick St. Pierre: https://twitter.com/nickfloats
- Dr. Jim Fan: https://twitter.com/DrJimFan
- All About AI: https://www.youtube.com/@AllAboutAI
- Marshall Kirkpatrick: https://aitimetoimpact.com/
- AI News (Smol Talk): https://buttondown.email/ainews/archive
Platforms
- Anyone using RAG models to query your own content for new products/services?
- Have you tried using models other than ChatGPT (GPT-4) for your daily use (Gemini, Claude, etc.)? Any standouts?
- What are your predictions for the Apple Siri update in June?
- “In this paper, we present Ferret-UI, a new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding, and reasoning capabilities.” (see image below)
Traffic
- Have you seen any changes in site traffic in 2024?
- Gartner Predicts Search Engine Volume Will Drop 25% by 2026, Due to AI Chatbots and Other Virtual Agents
- What are your thoughts on Google potentially putting AI search results behind a paywall?
- Google might make SGE paid, not working on ad-free Search
- Google looks to AI paywall option, claims report
Tools/Partners
- What tool is the most useful to you so far?
- Do you work with vendors who are using AI? Which ones? Would you recommend them?
- What are some AI vendors you are testing out?
Internal Policies/Guidelines
- Do you have a committee focused on AI?
- Does your organization have an AI policy?
Glossary of Generative AI Terms
- Interaction
- Prompts:
- A prompt is the most familiar way to interact with a generative model. You tell the chatbot what you’d like to do and converse with it, like a person. The AI creates images or answers questions based on this conversation.
- Context windows:
- A context window is the memory available for prompts. Just like a person, there is only so much a language model can handle in one prompt. Google Gemini is famous for having such a large context window that you can paste five Harry Potter books into the prompt, and Gemini can discuss all of them.
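The "working memory" idea above can be sketched in a few lines. This is a toy illustration, not any real system's truncation logic: it approximates one word as one token, where real systems use a proper tokenizer, and keeps only the most recent messages that fit a fixed budget.

```python
# Toy sketch of context-window truncation: keep only the most recent
# messages that fit a fixed token budget. Approximation: one word = one
# token (real systems use an actual tokenizer).

def fit_to_context(messages, max_tokens=8):
    kept = []
    used = 0
    # Walk history newest-first so the latest turns survive truncation.
    for msg in reversed(messages):
        cost = len(msg.split())
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["hello there", "tell me about context windows",
           "they are the model's working memory"]
print(fit_to_context(history, max_tokens=10))
```

With a 10-token budget only the newest message survives; a large window (like Gemini's) simply makes this truncation unnecessary for most inputs.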
- RAG (Retrieval-Augmented Generation):
- RAG is a workaround for context windows. Instead of loading information into the AI’s memory, the AI uses documents as references – much like we’d use a book. RAG is good for adding thousands of documents to build a reference bot. However, RAG is notorious for hallucinating. As with everything AI, that changes and improves constantly.
- Now that we’ve covered context windows and RAG, here are three recent headlines:
- Google’s new technique gives LLMs infinite context | VentureBeat – https://venturebeat.com/ai/googles-new-technique-gives-llms-infinite-context/
- “Meta announces Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length. The quadratic complexity and weak length extrapolation of Transformers limits their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and…” – https://twitter.com/_akhaliq/status/1780083267888107546
- GroundX, an advanced RAG API to turn any document into LLM-ready data. You can use any file with tables, images, forms, diagrams, columns, or any other visual components and have GroundX process it, curate it, summarize it, chunk it, and hand it over to the LLM.
- https://twitter.com/svpino/status/1780571442096087224
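The RAG pattern described above boils down to "retrieve, then prompt." Here is a minimal sketch under stated assumptions: real systems use embeddings and a vector store, while this stand-in scores documents by simple word overlap; the refund/shipping documents are invented for illustration.

```python
# Minimal sketch of the RAG pattern: retrieve the most relevant document,
# then stuff it into the prompt as grounding context. The word-overlap
# score is a stand-in for real embedding similarity.

def retrieve(query, documents):
    q = set(query.lower().split())
    # Pick the document sharing the most words with the query.
    return max(documents, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query, documents):
    context = retrieve(query, documents)
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refund policy: items may be returned within 30 days.",
    "Shipping: orders ship within two business days.",
]
print(build_prompt("what is the refund policy", docs))
```

Because the model only sees retrieved snippets rather than everything at once, retrieval quality (not context-window size) becomes the bottleneck – which is also where RAG hallucinations tend to creep in.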
- API (application programming interface):
- Instead of a person manually interacting with a chatbot, an API allows third-party systems and applications to talk to an AI. AI companies offer documented, structured methods for systems to talk to them without using the user interface. APIs let third parties build custom integrations.
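To make the API idea concrete, here is a sketch modeled on the general shape of OpenAI's chat completions payload. Nothing is sent over the network: the request is only built and printed, so you would need a real API key and an HTTP client to actually call a service.

```python
# Sketch of what an API request to a chat model looks like. The payload
# is built but not sent, so there is no network dependency.

import json

def build_chat_request(user_message, model="gpt-4"):
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }

payload = build_chat_request("Summarize this article in one sentence.")
print(json.dumps(payload, indent=2))
```

The point is the structure: a third-party system fills in the `messages` list programmatically, with no chat UI involved.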
- Agents:
- Agency is when AI can do things for you (like Googling an actress’s name or fetching the latest weather forecast). An agent is one step further: an AI given autonomy to take action on your behalf (“Alexa, book a reservation for three at Peak in Hudson Yards for Friday night”) – exactly like the use of the word agent with people. If an AI can look up hotel room prices on Hilton.com, it’s an agent. If an AI can order DoorDash, it’s an agent. Agents can fix bugs in code. A co-pilot, by contrast, is an AI assistant that works alongside you (like spell check or autofill), helping write code or check spelling. I’ve kept an archive of all the headlines about agents since Sept 2023.
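The agent loop described above – decide on an action, execute it, use the result – can be sketched with toy scaffolding. Everything here is invented for illustration: the "model" is a keyword stub standing in for an LLM's tool-selection step, and the tools are fake functions.

```python
# Toy agent loop: a stub "model" picks a tool, the harness executes it,
# and the result flows back. A real agent uses an LLM for decide().

TOOLS = {
    "get_weather": lambda city: f"72F and sunny in {city}",
    "book_table": lambda place: f"Reserved a table at {place}",
}

def decide(task):
    # Stand-in for the model's tool-selection step.
    if "weather" in task:
        return "get_weather", "Scottsdale"
    return "book_table", "Peak in Hudson Yards"

def run_agent(task):
    tool, arg = decide(task)
    return TOOLS[tool](arg)

print(run_agent("what's the weather today?"))
print(run_agent("book dinner for three on Friday"))
```

The autonomy lives in that loop: the agent, not the user, chooses which tool to invoke and acts on the outcome.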
- Multimodality
- Multimodal AI is when the model can interact with more than text in the prompt/context window, or via API. For example, “What’s in this image”. Or “What happens in this video?”. It can also imply the ability to create images, sounds, or videos. Multimodality is critical to robot embodiment and awareness.
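What "more than text in the prompt" looks like in practice is a message with mixed content parts. This sketch uses the content-parts shape popularized by OpenAI's vision API; the image URL is a placeholder and nothing is sent anywhere.

```python
# Sketch of a multimodal prompt: text plus an image reference in one
# message. The URL is a placeholder; no request is actually made.

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/photo.jpg"}},
    ],
}

# A multimodal model would answer based on both parts together.
parts = [p["type"] for p in message["content"]]
print(parts)
```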
- Large Action Models
- A large language model is trained on text. A large action model is trained on actions. The analogy would be an LLM reads everything it can and learns how language works. A LAM watches interactions and interfaces and learns how devices work. An action model would be able to learn how to use an iPhone for example. Apple’s Ferret project appears to be heading that direction. Rabbit’s R1 and Open Interpreter are other examples.
- Types of Generation
- Diffusion:
- Most people know diffusion models as the tools we use to create AI images, videos, or sounds. We conversationally describe what we’d like, and the diffusion model creates it. Using images as an example, a diffusion model works by first learning to identify thousands of objects and scenes. Once the model can identify what’s in an image, it is trained to remove noise in incrementally harder steps – 10% noise, 20% noise, and so on – until it can recover an image from nothing but noise. At that point you can hand the model 100% noise and ask, in effect, “What would this image have looked like if it had been a cat?” And that’s diffusion generation. The best tutorial I know is How AI Image Generators Work (Stable Diffusion / Dall-E) – by Computerphile.
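The forward half of that training process – blending an image with progressively more noise – is simple enough to sketch numerically. This toy treats an "image" as a short list of pixel values; a real diffusion model then learns to reverse each noising step.

```python
# Numerical sketch of the forward noising schedule behind diffusion
# training: blend a clean "image" with random noise at rising strengths.

import random

def add_noise(pixels, level, rng):
    """level=0.0 returns the image untouched; level=1.0 is pure noise."""
    return [(1 - level) * p + level * rng.random() for p in pixels]

rng = random.Random(0)
image = [0.2, 0.8, 0.5]
for level in (0.0, 0.1, 0.5, 1.0):
    print(level, add_noise(image, level, rng))
```

Training pairs each noisier version with its cleaner neighbor, which is what lets generation run the schedule backwards from pure noise.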
- Latent Consistency
- Latent consistency is where a model generates media in real time, while a reference media (often very low resolution) guides the engine. The reference media might be a stick figure, but the model generates a high resolution person, based on the prompt. Here’s a fun tutorial.
- Upscaling
- Upscaling uses AI to add detail and resolution to an existing image (or video/audio). The most popular upscaler is Magnific, and this walk-through demonstrates it.
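For contrast with what AI upscalers do, here is the classical baseline: nearest-neighbor upscaling, which just repeats existing pixels. Tools like Magnific instead *generate* plausible new detail; this sketch shows only the naive approach, on a tiny 2×2 grayscale grid.

```python
# Nearest-neighbor upscaling of a tiny grayscale "image" (a 2-D list).
# Classical baseline only: each pixel is repeated, no detail is invented.

def upscale(image, factor):
    out = []
    for row in image:
        stretched = [px for px in row for _ in range(factor)]
        out.extend([stretched] * factor)
    return out

tiny = [[0, 255],
        [255, 0]]
for row in upscale(tiny, 2):
    print(row)
```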
- Inpainting
- Because most generative AI uses random noise to seed outputs, it’s tough to refine or recreate an image. Inpainting allows portions of an image to be changed while retaining the rest of the image. This is usually done by selecting an area to change and then prompting for the detail to vary.
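The masked-update idea behind inpainting can be sketched on a toy grid: only pixels under the mask change, everything else is preserved. A real model would diffuse newly generated content into the masked region instead of the flat fill value used here.

```python
# Sketch of inpainting's masked update: regenerate only the pixels the
# mask selects (here, fill with a constant) and keep the rest untouched.

def inpaint(image, mask, fill):
    return [
        [fill if m else px for px, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]

image = [[1, 2, 3],
         [4, 5, 6]]
mask  = [[0, 1, 0],
         [0, 1, 0]]   # 1 = regenerate this pixel
print(inpaint(image, mask, 9))
```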
- More AI Terms to Know
- Language Models
- Proprietary (closed data)
- OpenAI: GPT-4
- Anthropic: Claude
- Google: Gemini
- Open Source
- Meta: Llama
- Mistral
- Cohere: Command R+
- Databricks
- Reka
- Gemma
- HuggingFace
- Alibaba: Qwen
- Phind
- X: Grok
- Local
- Local models run on your device and by nature need to be small and highly optimized. There is an entire category of models running locally.
- AR/VR
- AGI
- Artificial General Intelligence, in a nutshell, is when artificial intelligence is able to beat humans at everything (including embodying physical forms and completing physical tasks). It’s usually a thought catalyst for predictions, like when AGI will occur. 10 years? 25 years? 100? AGI is an event horizon that is tough to define, tough to imagine, and tough to predict. OpenAI defined AGI in its charter as “highly autonomous systems that outperform humans at most economically valuable work”. OpenAI has a section of its website dedicated to AGI. Google’s DeepMind published my favorite report on the five levels of artificial intelligence on the way to AGI (see also here).
Apple Ferret:

Headlines about ethics and content
“Now would be the time for archivists and librarians to begin separating out post-2022 written and visual work from what was produced before. As I saw once on Twitter, “this is the K-T boundary of information.” Anything afterwards is increasingly unlikely to be made by humans.” / X – https://twitter.com/emollick/status/1766581955905089747
Chris Alsikkan ™ on X: “these all have thousands of likes and comments https://t.co/FKVk9lkYU6” / X – https://twitter.com/AlsikkanTV/status/1770214935282237878
“Peer review isn’t built to handle the flood of AI content, especially as not all of it will be obvious, and not all will be malicious (lots of scholars pay editors to help make their writing better, now they will use chat). The system, already straining, won’t be able to adjust.” / X – https://twitter.com/emollick/status/1768526138614186026
“The true threat of AI generated content is not that it will convince us to believe things that aren’t true, it’s that we won’t believe anything unless it reinforces what we already think is true, or we’ll just disengage completely because the truth seems impossible to find.” / X – https://twitter.com/EliotHiggins/status/1767162572912873789
“We are seeing the beginnings of a massive problem we will have with AI or suspected AI generated photos. The fact that they throw the veracity and reality of EVERYTHING into question, whether real or not.” / X – https://twitter.com/histoftech/status/1766959370426823068
“No comment from Kensington Palace tonight after at least 3 international pictures agencies refuse to distribute this morning’s photo of Kate and her children. Some of them (@AP) have claimed ‘the source [the palace] has manipulated the image’.” – https://twitter.com/chrisshipitv/status/1766944328847364201
“11% of the best pieces of journalism from the last year used AI in some way. This is evidence of the power of AI as tool or co-intelligence – boosting the work of even great humans. Uses are likely boring, like transcription, which is the point. It does stuff you don’t want to.” – https://twitter.com/emollick/status/1767405132587970940
“As we increasingly rely on #LLMs for product recommendations and searches, can companies game these models to enhance the visibility of their products? Our latest work provides answers to this question & demonstrates that LLMs can be manipulated to boost product visibility!” – https://twitter.com/hima_lakkaraju/status/1778834301658050733
Google’s new AI search results promotes sites pushing malware, scams
https://www.bleepingcomputer.com/news/google/googles-new-ai-search-results-promotes-sites-pushing-malware-scams/
Disruption/New Media
Gen-AI Search Engine Perplexity Has a Plan to Sell Ads – https://www.adweek.com/media/gen-ai-search-engine-perplexity-has-a-plan-to-sell-ads/
NYT to soon offer most articles via automated voice – https://www.axios.com/2024/04/02/exclusive-nyt-to-soon-offer-most-articles-via-automated-voice
Here’s How Google’s Generative AI for Newsrooms Product Will Work – https://www.bigtechnology.com/p/heres-how-googles-generative-ai-for
Google Is Paying Publishers Five-Figure Sums to Test an Unreleased Gen AI Platform – In exchange for a five-figure sum, publishers must use the tool to publish 3 stories per day
https://www.adweek.com/media/google-paying-publishers-unreleased-gen-ai
“Search engine volume expected to drop by 25% and shift towards answer engines (Perplexity) and chatbots (ChatGPT), according to Gartner.” – https://twitter.com/AravSrinivas/status/1762157477779665090
Who makes money when AI reads the internet for us? – https://www.engadget.com/who-makes-money-when-ai-reads-the-internet-for-us-200246690.html
It’s the End of the Web as We Know It – WSJ – https://archive.ph/USxqu
Here’s how we’re working with journalists to create the newsrooms of the future with AI – Microsoft On the Issues – https://blogs.microsoft.com/on-the-issues/2024/02/05/journalism-news-generative-ai-democracy-forward/
Seward’s post said the Times plans to hire a machine learning engineer, a software engineer, a designer, and a couple of editors to round out the AI newsroom initiative. So far, the Times has posted job listings for an associate editorial director for AI initiatives and a senior design editor.
https://www.theverge.com/2024/1/30/24055718/new-york-times-generative-ai-machine-learning
We’re building a team at the NYTimes focused on prototyping uses of generative AI and other machine-learning techniques to help with reporting and how The Times is presented to readers. Jobs are posting this week for a machine-learning engineer, software engineer, designer, and editor. I’ll thread links here as they go up. Please spread the word, thanks!
https://www.threads.net/@zseward/post/C2upYZZOEVT
Transcripts on Apple Podcasts – Apple Podcasts for Creators – https://podcasters.apple.com/support/5316-transcripts-on-apple-podcasts
Multimodal AI News
Meta is training AI models to watch and understand video
“Meta announces MA-LMM Memory-Augmented Large Multimodal Model for Long-Term Video Understanding With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently.” – https://twitter.com/_akhaliq/status/1777539936364662817
“Breaking down + understanding a long video I uploaded the entire NBA dunk contest from last night and asked which dunk had the highest score. Gemini 1.5 was incredibly able to find the specific perfect 50 dunk and details from just its long context video understanding!”
https://twitter.com/rowancheung/status/1759281003690873254
“I fed an *entire* biology textbook into Gemini 1.5 Pro. 491,002 tokens. I asked it 3 extremely specific questions, and it got each answer 100% correct. 1M token context windows are a game changer.”
https://twitter.com/mckaywrigley/status/1760146610745643347
“Gemini can find a challenge message 99.7% of the time, even in a sea of 7M words.” “The new version of Gemini holds 30,000 double-spaced pages in memory at the same time. Up to 1 hour of video.” An English major in college could load their entire year of reading into Gemini and find trends across 70 works of literature. Police departments could run loops of minute-long clips from 60 cameras at a time and conversationally ask the AI to tell them what’s happening in all the videos at once. If you are a creative person, your mind is racing.
“I used Anthropic Claude 3 Opus model to analyse all 63 Podcasts from Dwarkesh Patel’s channel and extracted useful book recommendations, career advices, interesting ideas, learning tips etc 1.49Million tokens processed, costs $23 (luckily Anthropic Claude 3 has provided me…” – https://twitter.com/arunprakashml/status/1774989084307624144
“It can “hear” -> Google’s Gemini 1.5 Pro can now process audio w/out needing a transcript “The model can now listen to uploaded audio files and churn out information from things like earnings calls or audio from videos without the need to refer to a written transcript.” https://twitter.com/glenngabe/status/1777746267667365998
Google’s Gemini 1.5 Pro can now hear – The Verge – https://www.theverge.com/2024/4/9/24124741/google-gemini-pro-imagen-updates-vertex
“I uploaded two music tracks to Gemini 1.5 Pro: Black Sea by Drexciya, and Song of Scheherazade by Renaissance, and asked it to analyse them.” – https://twitter.com/sebkrier/status/1778386521319428167
“Can Gemini 1.5 actually read all the Harry Potter books at once? I tried it. All the books have ~1M words (1.6M tokens). Gemini fits about 5.7 books out of 7. I used it to generate a graph of the characters and it CRUSHED it.” – https://twitter.com/deedydas/status/1778621375592485076
“I asked Gemini 1.5 Pro to analyze tech startup Rubrik’s S-1, the IPO document, and tell me if I should buy or sell. It read ~200k words, wrote a ~500 word memo in ~61s with NO hallucinations. Wall Street pays analysts $3000+ for a week of “specialized” work, and this cost $1.5!” – https://twitter.com/deedydas/status/1775886093554180109
Top ten models as of April 26, 2024
| Model | Organization |
| --- | --- |
| GPT-4 | OpenAI |
| Claude | Anthropic |
| Gemini | Google |
| Bard | Google |
| Llama | Meta |
| Command R+ | Cohere |
| Mistral | Mistral |
| Reka | Reka |
| Qwen | Alibaba |
| Zephyr | HuggingFace |
| Starling | NexusFlow |
| Yi | 01.AI |
| Grok | X |

via https://twitter.com/emollick/status/1779841429960851957/photo/1




