18 of 77

October 2023: “Start posting publicly or it doesn’t count”

Made a commitment to share my efforts… even if I looked foolish or failed.

19 of 77

Launched a weekly AI news blog

Took 54 hours of work to publish one week of AI news

Information mining, clicking, opening, sorting…

https://ethanbholland.com/category/this-week-in-ai/

20 of 77

Struggled to keep up.

Was six weeks behind for 10 months.

21 of 77

Used Claude and GPT to build:

26 Python scripts

Google Apps scripts

3 Chrome extensions

CSV automation

WordPress integration

Adobe Photoshop scripts

22 of 77

My blog takes the time needed to read the headlines, assign categories, and write the summaries.

Can’t shortcut the reading and I don’t want to shortcut the summarizing (it’s for me to process, not for AI to summarize)

BUT I saved over 5000 mouse clicks per week!

23 of 77

Every week I manually review, organize and publish 400-700 AI headlines into 56 categories.

24 of 77

98 weeks later I’ve organized 36,100 links

25 of 77

I’m taking my first break in 98 weeks. Restarting next week.

26 of 77

Reading 36,000 links has given me a “Spidey Sense” of where things are going.

27 of 77

Any time we discuss AI, I recommend three guiding principles:

28 of 77

1) THE SUM is greater than the parts.

Think about how things we see today will connect in the future.

2008: iPhone 3G adds GPS to apps

2025: Uber = $200 billion market cap

29 of 77

2) When discussing AI, align your timetable with your audience:

Short term

Mid term

Long term

Otherwise, it’s not the same conversation.

30 of 77

3) If possible, use the tool name or technology, not simply “AI”

GPT-5, Gemini, Nano Banana, Opus 4.1, Kimi, Grok 4, Llama, Mistral, ElevenLabs, HeyGen, Flux, Midjourney, Qwen, Sora, DeepSeek, Runway, Kling, Viggle, NeRFs, LivePortrait, Gaussian Splats, Diffusion…

31 of 77

Four Themes For Today

Generative AI
Multimodal AI
Agents
The Future of Browsers

32 of 77

Term One: Basic Generative AI

Traditional Chat (most familiar)
ALSO Image generation
ALSO Video creation

“Grounded diffusion”

33 of 77

Term TWO: Multimodality

Interacting with AI using audio, imagery, or video instead of chatting

34 of 77

Multimodal AI is most familiar for its ability to:

Track objects: athletes, vehicles, targets
Analyze medical imagery
Dub voices and audio processing
Create video clones
Index the contents of images, videos, or documents

35 of 77

Truly multimodal AI understands the sound waves of my voice as audio and does not need to convert them into text. That’s what most people don’t realize… just as you hear and understand sound, AI understands sound.

36 of 77

Actual interaction with Google DeepMind’s multimodal model “Flamingo”

APRIL 2022 (3.5 YEARS AGO)

How many people are in this photo?

“There are at least five people in this photo”

How many mirrors are in this photo?

“There are at least two mirrors”

What is the person on the scale doing?

“He is looking at the scale”

Where is Obama’s foot positioned?

“Obama’s foot is on the right side of the scale”

What happens as a result?

“The scale shows a higher weight”

Is the person on the scale aware of it?

“He is not aware of it.”

Context Understanding

37 of 77

Term THREE: Agency

AI taking actions on your behalf

Language is an instructions interface

38 of 77

Like multimodality, agency helps us break away from thinking of AI as simply “chatbots”… or writing tools.

39 of 77

2019: 3 seconds

2025: 57 minutes

“This is the worst it will ever be”

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
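
A quick back-of-the-envelope check of those two numbers, as a minimal sketch. It assumes the 2019 and 2025 data points are exactly six years apart (the slide only gives years) and that growth was steady in between; the variable names are just for illustration.

```python
import math

# Figures from the slide: length of task an AI could complete, by year.
start_year, start_seconds = 2019, 3
end_year, end_seconds = 2025, 57 * 60

# Assumption: the two data points are exactly six calendar years apart.
elapsed_months = (end_year - start_year) * 12

# How many times the task length doubled between the two points.
doublings = math.log2(end_seconds / start_seconds)

# Average months per doubling implied by the two endpoints.
doubling_months = elapsed_months / doublings

print(f"{doublings:.1f} doublings, one every {doubling_months:.1f} months")
```

Under those assumptions, going from 3 seconds to 57 minutes is about ten doublings, or roughly one doubling every seven months.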

40 of 77

Let’s do a six-month recap of multimodal and agentic AI

Mid-February to mid-August (when I took a break - it’s plenty)

For efficiency… I’m only reading the Mizzou Gold highlights.

#FDB719 REPRESENT

41 of 77

Multimodality

42 of 77

Multimodality

March 7, 2025

Microsoft’s Phi-4 Multimodal Takes Top Spot in Speech Recognition (Open Source)
Sesame Launches Voice AI That Rivals Human Interaction

March 21, 2025

OpenAI has released three advanced audio models, which include text-to-speech and speech-to-text with better transcription

March 28, 2025

Researchers developed a model that can identify endometrial cancer in tissue samples with a 99.2% accuracy rate. This is significantly better than the current best-in-class methods which are at 80%.

April 11, 2025

Google has integrated vision into its AI tools, finally catching up to OpenAI’s multimodal product.
Audio leader ElevenLabs has adopted Anthropic’s model context protocol (MCP) to enable locally running servers to create voice agents to make outbound calls and clone voices.

43 of 77

April 18, 2025

OpenAI o3 is getting rave reviews. In particular, it can combine every single ChatGPT tool, including web search, Python coding, visual analysis, and image generation. It’s exceptional with math and coding, and has a more natural conversational style.
Google announced a model that is going to attempt to talk with dolphins.
Anthropic is coming out with a voice feature to compete with OpenAI.
Meta released two open-weight vision models.
AI can now find a location using only a single photo with no attached metadata.

“This picture was taken in West Ocean City, Maryland, looking east toward the Ocean City Inlet.

In the foreground you can see the houses on the bay side of West Ocean City,
Across the inlet is Assateague Island (the long strip of sand with greenery).
In the distance, on the left side of the image, you can clearly see the Ocean City Boardwalk area with its amusement rides and Ferris wheel.
The jetties mark the entrance to the Ocean City Inlet.”

44 of 77

April 25, 2025

Microsoft added the ability to “see” to its web browser. It’s also launched computer use and web browsing within the operating system.

May 9, 2025

OpenAI’s o3 analyzed a Harvard Business School case study PDF, extracted scattered financial data, and built a business model comparable to an MBA student’s.

May 16, 2025

Amazon launched Nova Sonic, which can have natural back-and-forth conversations in real time. Sounds like Alexa wants to drink Siri’s milkshake.

June 6, 2025

The FDA approved the first AI tool to predict breast cancer risk from mammograms.

June 14, 2025

Meta released a robot training system that learns how the physical world works by watching videos, similar to children developing intuitions by watching the world.
Instagram is testing features that convert static photos into 3-D images.
OpenAI updated their voice mode to be a lot more expressive. You can ask the voice to sound nervous, excited, or jittery, and the new voice features can capture those emotions.

45 of 77

June 20, 2025

OpenAI rolled out a record mode for ChatGPT that can capture meetings and voice notes.

June 27, 2025

Amazon Ring launched video descriptions where your security camera tells you what’s happening outside so you don’t need to look at the camera. It could email you what happened or describe it out loud as it happens. “A guy drove up in a UPS truck and put a box on your front steps.” “A woman is at the door with a dog on a leash and keeps looking into the window.” Etc.
A few months ago, Google DeepMind released a tool called VideoPrism that is suddenly gaining attention. VideoPrism can watch a video and provide deep context and details (like the Obama photo) about every frame, a task that would be virtually impossible for a human to accomplish. Imagine the potential for content tagging and search.

https://research.google/blog/videoprism-a-foundational-visual-encoder-for-video-understanding/ <- Quick look at it in action

46 of 77

June 27, 2025 (continued)

Alibaba developed a system that can identify gastric cancer from standard CT scans with greater accuracy than doctors. China has already deployed the system and screened over 78,000 patients!
Google also launched an update of their Gemma model which is designed to run locally on smart phones and edge devices. Gemma provides multimodal AI capabilities locally, without any Internet connection. In addition to being able to understand text, images, audio, and video, the model can handle translation and is open source.
Google DeepMind released an AI system called AlphaGenome that predicts how genetic variations affect biological processes and can analyze DNA sequences up to 1 million letters long.

July 4, 2025

Researchers have developed an artificial intelligence system that can detect Parkinson’s disease by analyzing short videos of people smiling, achieving 88% accuracy.
Mayo Clinic researchers have created an artificial intelligence tool that can identify nine different types of dementia from a single brain scan.

47 of 77

July 4, 2025 (continued)

Microsoft researchers developed an AI diagnostic system that correctly solved 85% of complex medical cases from the New England Journal of Medicine compared to just 20% accuracy achieved by experienced physicians.
The Olympics are implementing artificial intelligence across quite a few areas of the upcoming Winter Olympics.

Broadcast partners during the last Olympic games used AI to transform 11,000 hours of footage into 97,000 highlight clips.
Alibaba and Omega are using depth and segmentation (two great terms to know!) to track athlete movements and give biomechanical analysis.

ElevenLabs released a new tool that creates custom synthetic voices that are extremely dynamic and might put voiceover professionals out of work. They are worth checking out, as they can be guided to match the timbre and enthusiasm of the script.
Andrej Karpathy made a great point that local models are making significant inroads and will soon be the bridge between devices and the cloud: small models can process text, images, and audio offline, and then hook into external tools as needed.

48 of 77

July 4, 2025 (continued)

Nvidia released an 8 billion parameter vision model designed for document processing and character recognition. This can extract and understand information from complex documents, PDFs, images, tables, charts, formulas, and diagrams. This will lead to automating workflows in finance, healthcare, and law firms. These models will also help with invoice processing and compliance.
Google released a hurricane forecasting model with long-term forecasting… up to two weeks ahead of a storm.

July 18, 2025

ChatGPT is on track to edit and understand Excel and PowerPoint within chats, without opening Microsoft Office.

July 25, 2025

Meta released a dataset of 4,000+ videos of face-to-face conversations and over 65,000 social interactions with full annotations to help models learn and emulate behavior.
Google DeepMind released an AI tool called Aeneas that will help historians interpret fragments of Latin inscriptions.

August 8, 2025

Google released a world simulation model that creates entire explorable 3D worlds with just a prompt.

50 of 77

Agents

March 7, 2025

OpenAI Plans Premium AI “Agents” with Monthly Fees Up to $20,000
Amazon Developing AI Reasoning Model Under Nova Brand
AI financial analyst Endex emerges from stealth with OpenAI partnership
Adam Silverman Predicts UI Design Will Shift From Human to Agent Experience

March 14, 2025

OpenAI is pushing the pace of agent adoption (AIs that can take action on your behalf) with the addition of commands that can help developers build agents that can search the web, look through local files, and even control your computer.
Legal AI provider, Harvey, has built agents that are outperforming humans at law firms.

https://x.com/harvey/status/1899491666429632907 <-Watch this later!

Another breakthrough product comes out of China called Manus, with building hype coming off of the previous week’s announcement. This is a product that is built on top of Anthropic’s Claude that has 29 additional agent tools, like browser control. https://manus.im/
The Pentagon signed a deal to use AI agents for military planning.

51 of 77

March 21, 2025

Nvidia released two open models that allow AI agents to reason.
Anthropic gave Claude the ability to search the web.
A new study came out that shows that AI agents are doubling their abilities every seven months.

March 28, 2025

A few months ago, Anthropic released a structured method of allowing AI models to talk to third-party systems. This is called the “Model Context Protocol” or MCP for short. This week OpenAI announced that it would adopt Anthropic’s standard.
Researchers at Harvard conducted a randomized trial with 776 Procter & Gamble employees and found that a single person can work as well as a two-person team if given AI tools.
A trend in the last two weeks is developers releasing text files with detailed instructions for language models.
Lockheed Martin and Google announced an integration of Google’s AI tools into Lockheed’s factory systems.

52 of 77

April 4, 2025

Amazon released a model that can perform tasks using web browsers with over 90% accuracy. Alexa will be able to crawl the internet and take actions for you.
Automation tool Zapier has now integrated over 8000 apps with the model context protocol. It may take a few more months for this to trickle down to lay people, but I think automation of tasks is going to be a welcome discovery once it becomes demystified.

April 11, 2025

Google launched their own open agent protocol that allows agents from different vendors to communicate. At the same time, Google announced they will adopt and support Anthropic’s model context protocol (MCP) agent structure.
Google released a Deep Research competitor to OpenAI and Perplexity. Thanks to Google’s deep integration with search, Google’s Deep Research is already outperforming PhD-level researchers when given access to the web.
Adobe plans to integrate AI agents throughout its Creative Cloud product lineup to automate the mundane parts of creative operations as well as improve context-aware search within videos and photos.

53 of 77

April 25, 2025

Anthropic predicts fully automated AI employees within the next 12 months.
Along those lines, Microsoft predicts that 2025 will be the year that “frontier firms” emerge and AI employees begin to lead operational tasks.
Anthropic launched a research feature - which means Google, OpenAI, Perplexity, Claude, and DeepSeek all now have robust research capabilities. This is also a harbinger for the end of the Internet as we know it.
ElevenLabs, the audio AI company, has developed a method to allow voice agents to transfer calls to other voice agents. That’s a spectacular development and it implies that different specialist agents could take on different tasks and transfer those tasks, just like humans do now. I would say call centers are soon to be an extinct species.
A new benchmark has shown that OpenAI’s o3 model can outperform 94 percent of expert virologists.

May 2, 2025

Yelp began testing AI-powered voice agents to help restaurants and businesses manage incoming phone calls.

54 of 77

May 9, 2025

Google Gemini’s 2.5 Pro model became the first model to achieve the number one rank across all text, vision, and web dev benchmarks simultaneously. The model can watch videos and convert them into functional web applications. Gemini has such a large memory that it can review 50,000 lines of code (!) at once within a chat window with no problem.
Former Google CEO Eric Schmidt has a startup called FutureHouse which has built five specialized artificial intelligence agents to help scientists navigate large amounts of data and research.

May 16, 2025

Google released AlphaEvolve, a reasoning agent that found the best human solution to 70% of the toughest known logic problems. It went beyond humans, and for 20% of the problems (!), it found a novel solution that no one had thought of before and improved on the best known solution. Google is using this model to self improve its workflow and has been able to achieve 23% efficiency boosts internally.
OpenAI’s latest models are exceeding human doctors across standard medical knowledge tests. Six months ago, “AI plus humans” were the best at the test but now “AI without humans” is the best.

55 of 77

May 23, 2025

Microsoft launched a science agent which discovered a new material in a few hours. Not only was the new material discovered, but scientists were able to synthesize the compound in the laboratory.

May 30, 2025

OpenAI went on the record that they plan to launch a super assistant in the first half of 2025.
Doctors at Harvard and Stanford benchmarked OpenAI’s o1 preview model and reconfirmed it displayed superhuman diagnostics and reasoning in medicine.

June 20, 2025

Microsoft released an upgrade to its assistant that can see and analyze what’s on your computer screen to give real-time guidance or directions to troubleshoot or modify settings and use software.
The UK deployed a system called Extract that converts complex planning documents, including handwritten notes and blurry maps, into digital data within 40 seconds.
Japan’s largest bank partnered with AI company Sakana to automate banking document creation.

56 of 77

June 20, 2025 (continued)

Russia’s Sberbank announced plans to create their own AI system with advanced reasoning capabilities.
Salesforce released a marketing suite of tools that can handle routine marketing tasks like building audience segments, writing email copy, and managing ad performance. These agents can work independently or assist marketing managers, and take directions like building a campaign or creating personalized offers.
A model specifically trained for medical use can answer questions as well as doctors on healthcare benchmarks. What’s really wild about this model is that it is 70% smaller than most models and can run on a laptop locally. We’re getting to the point where everyone will be able to have an expert in their pocket without even being connected to the cloud.

57 of 77

June 27, 2025

Legal AI software company Harvey released a demonstration of their latest workflow tool, which essentially chopped the bottom out of legal work and would terrify me if I were a student in law school.
OpenAI added document editing within chat windows, which directly competes with Microsoft Office and Google Docs.
Microsoft announced a very small model that can run locally on a PC to help users change system settings, using natural language… basically like Alexa for your laptop settings.

July 4, 2025

A legal AI assistant company called Legora made the radar this week with an introduction of a product called workflows (the same exact name as the product from competing firm Harvey). This allows the automation of due diligence document analysis and legal research using natural language commands. It seems like a harbinger of the end of the paralegal era.
Writing assistant company Grammarly is acquiring email automation provider Superhuman to combine writing assistance with email management for automating transactional communication. (real estate, law, accounting, sales)

58 of 77

July 11, 2025

A tool trained to work on spreadsheets achieved a score of 80% on the Excel World Championship test in only 10 minutes. That is 10 times faster than a human.

July 18, 2025

Anthropic launched an enterprise solution for financial data. Norway’s sovereign wealth fund beta tested it and reported a 20% efficiency boost, equivalent to 213,000 saved hours. AIG sped up their underwriting time by 5x and improved data accuracy from 75% to 90%.
OpenAI launched an agent designed for investment banking. It’s specifically made to analyze financial documents and create pitch decks and can also help with valuations and due diligence. Beta testers reported up to 70% reductions in document prep time.

59 of 77

July 25, 2025

Multiple models threw their hats into the ring at the recent International Mathematical Olympiad, a world-class high school math competition.

Google Gemini and OpenAI both achieved gold medals in the math competition.

August 8, 2025

Google launched Gemini Deep Think, a consumer version of the thinking engine that recently won gold at the International Mathematical Olympiad.
Singapore start-up Manus announced Wide Research which empowers multiple agents across multiple models (as a wrapper). It’s supposedly incredibly powerful for dauntingly scaled tasks like researching all Fortune 100 companies at the same time.
Andreessen Horowitz is backing voice agent platform EliseAI, which makes AI voice agents for property management and healthcare, at a $2B valuation!
ElevenLabs announced their redesigned interface for AI voice conversation agents.
Shopify continues to roll out conversational commerce agents.

60 of 77

Multimodality + Agents = Future of the Internet

61 of 77

“The ‘content’ of any medium is always another medium. The content of writing is speech, just as the written word is the content of print, and print is the content of the telegraph.” – Marshall McLuhan 1964

62 of 77

Each new medium both contains and can emulate the mediums it replaces. The internet contains and emulates film, radio, television, publishing, and retail.

AI will include and emulate… the entire Internet.

63 of 77

Browsers, SEO and Publishing

February 28, 2025

New Alexa powered by Claude
Perplexity planning AI browser (Comet)

March 7, 2025

Apple may be preparing Gemini integration in Apple Intelligence
Perplexity Launches Enterprise File Integration

March 21, 2025

Payment processing provider Stripe created a template to allow companies to share transaction documentation with AI agents. Never leave the chat!

April 18, 2025

It’s rumored that OpenAI is creating a social media platform
A company called Firecrawl has released a web scraper that can navigate complex websites and fill out forms.

64 of 77

April 25, 2025

Google announced that its AI-enhanced search now reaches 1.5 billion users per month.
A study showed that these AI overviews reduce click-through rates by 34%.
OpenAI has expressed interest in purchasing the Google Chrome browser, if given the chance.
The Washington Post partnered with OpenAI to include news within ChatGPT.
Adobe has now incorporated Google and OpenAI’s models into its Firefly product.

May 2, 2025

OpenAI rolled out enhanced ChatGPT shopping.
Mastercard and Visa also separately introduced AI agent integration to automate AI purchasing.

65 of 77

May 9, 2025

Both OpenAI and Google have integrated shopping capabilities into their AI features. Big change is coming for retailers (and affiliate models) this year.
Anthropic has integrated web searching into both its chat model and the API. This is yet another shift in the publishing landscape and the internet itself, as models search the Internet and retrieve information without leaving the chat.
Apple plans to add artificial intelligence search options to its web browser, Safari.
LinkedIn added AI powered job searching which supposedly allows users to search for jobs using plain language descriptions. Zillow did this over a year ago with plain language home searches.

66 of 77

May 23, 2025

If agents weren’t enough, interfaces are officially beginning to be disrupted. OpenAI acquired Apple designer Jony Ive’s company for $6.5 billion. The rumor is OpenAI is going to build a device without a predetermined user interface.
Google launched an Internet agent that can manage up to 10 web-based tasks at the same time, including things like booking flights or making restaurant reservations. Even with a web interface like a browser, if the system takes a search query and goes out throughout the entire Internet to accommodate requests, that implies it might as well be a non-graphical interface. You just ask Google what you want and Chrome goes and does it.
Google’s agent can schedule tasks to repeat on a regular basis. For example, if you want to check real estate listings, instead of opening the apps, Google will check them every day for you and even schedule viewings of properties.

https://x.com/Google/status/1924877428997939563 <- In MAY.

AI search tool Perplexity reports that people are booking hotels directly through its AI search platform more and more every day. Hotel advertising is Google’s second-largest category.

67 of 77

May 30, 2025

A startup called the Browser Company announced that they were building a web browser that was chat interface-based as opposed to page-based. The press release bluntly said that web browsers will die.
The CEO of Perplexity predicts that AI agents will decimate Google search volume and shift cost per click budgets to AI integration.
Perplexity released a new feature called Labs, which is a powerful search mode that handles complex projects like building dashboards or mini applications. Labs supports interactive tool use beyond basic web searching. For example, creating financial reports from more than one source, aggregating findings, and creating charts and visualizations.
Microsoft announced “a commitment to The Agentic Web,” where AI agents team up and collaborate to go out on the Internet on our behalf so we no longer have to browse.

68 of 77

June 6, 2025

Meta plans to fully automate their advertising tools by 2026 using artificial intelligence, completely cutting out the creative process and middle persons.
Opera released a browser called Neon that uses AI agents to automatically complete tasks like filling out forms or booking trips while users browse the web.
H Company introduced web automation agents that are able to read, click, and type on websites with 92% accuracy.
Firecrawl launched a hosted service that connects web scraping abilities directly to the large frontier models. (Savvy people will create tools to compete with you)

June 14, 2025

Google started to roll out data visualization features in their AI search results that automatically create interactive graphs for stock and mutual fund queries. These visualizations are built on the fly using Gemini’s reasoning capabilities to build whatever charts or graphs “Google thinks” best fits the data question. This is bigger than it might initially seem as user interfaces will simply diffuse in front of us rather than being built in advance as fixed experiences and navigation paths. Dynamic content -> dynamic interfaces
Perplexity launched an AI browser called Comet that appears to be gaining loyalty among power users. It integrates with browser history and can read emails and attempts to think ahead of people’s intentions… you just confirm as you go.

69 of 77

July 18, 2025

OpenAI launched “Agent” which combines web browsing, deep research, coding, and local file creation and manipulation. I’ve been using it for a few months. Inside the chat it emulates browsing websites, checking, opening, and sending email, and creates spreadsheets and Google docs. It succeeds via relentless brute force, which is surreal to witness, but this is the worst it will ever be.
Google is adding AI features into search and Chrome. There’s now an option to have AI call a business on your behalf.

August 2, 2025

Google’s AI overviews now reach 2 billion monthly users (up from 1.5 billion in April).

70 of 77

Bonus News From September

ByteDance open-sourced a desktop automation agent.

This agent can use any desktop app, open files, and browse websites using vision models running locally. 100% free, open source, and local.

Perplexity is serving over 300M user queries every week. 3x growth in approx 9 months from the time we hit 100m weekly queries.

Open-source, self-hostable browser automation library for AI agents; build agents to navigate sites, fill forms, click, and extract info, 90.4% on Web Voyager

Kernel launched Crazy Fast Browser Infrastructure. Kernel’s API allows developers to launch browsers in the cloud so agents can use the internet as background services.

71 of 77

We've spent recent months connecting Claude to your calendar, documents, and software. The next logical step is letting Claude work directly in your browser.

We view browser-using AI as inevitable: so much work happens in browsers that giving Claude the ability to see what you're looking at, click buttons, and fill forms will make it substantially more useful.

Browser-using agents powered by frontier models are already emerging, making this work especially urgent.

72 of 77

An open-source agent can autonomously curate news briefings, extract facts, and summarize content with integrated human feedback and dynamic tool selection.

73 of 77

Gemini’s API can visit webpages, PDFs, images, and more when you provide the direct URL.

That means no opening a browser and it’s from Google.

74 of 77

Florent is head of journalism at Hugging Face

75 of 77

It’s a new era of ergonomics.

The primary audience of your thing (product, service, library, …) is now an LLM, not a human.

There is no playbook.

It can feel daunting, but…

76 of 77

You can participate in this future.

Print catalogs and brick and mortar couldn’t stop e-commerce.

Broadcast couldn’t stop YouTube, TikTok, and Netflix.

This is the most pivotal moment to be in school since 1993.

This is what you are here for!

77 of 77

I’m excited to talk with all of you.

Will share these slides as well as additional resources.

Let’s connect!

Thank You!

ETHANBHOLLAND.COM
