New OpenAI ChatGPT Can Listen, Look, And Talk


As Apple and Google evolve their voice assistants into chatbots, OpenAI is transforming its chatbot into a voice assistant. 

On Monday, May 13, the San Francisco-based AI start-up unveiled a new version of ChatGPT that can receive and respond to voice commands, images, and videos.

Based on an AI system called GPT-4o, the new app processes audio, images, and video significantly faster than previous versions. 

The app will be available free of charge on smartphones and desktop computers.

Mira Murati, the company's chief technology officer, said: "We are looking at the future of the interaction between ourselves and machines."

This app is part of a broader effort to merge conversational chatbots like ChatGPT with voice assistants like Google Assistant and Apple's Siri. 

As Google integrates its Gemini chatbot with Google Assistant, Apple is preparing a more conversational version of Siri.

OpenAI announced it would gradually roll out the technology to users over the coming weeks, marking the first time ChatGPT is available as a desktop application. 

Previously, similar technologies were offered within various free and paid products, but now they are consolidated into a single system accessible across all products.

During an online event, Murati and her colleagues demonstrated the new app's ability to respond to conversational voice commands, analyze math problems via a live video feed, and read aloud playful stories it generated on the fly. 

While the app cannot generate video, it can create still images representing video frames.

Since its debut in late 2022, ChatGPT has shown that machines can handle requests more like humans. 

It can answer questions, write essays, and even generate computer code by responding to conversational text prompts. 

Unlike traditional AI driven by rules, ChatGPT learned from analyzing vast amounts of text from the internet, including Wikipedia, books, and chat logs. 

Experts praised the technology as a potential alternative to search engines like Google and voice assistants like Siri.

Newer versions of the technology have also learned from sounds, images, and videos, a concept known as "multimodal AI." 

This involves combining chatbots with AI image, audio, and video generators. 

However, chatbots remain prone to mistakes and sometimes "hallucinate," or fabricate, information.

While chatbots can generate convincing language, they struggle with actions like scheduling meetings or booking flights. 

Companies like OpenAI aim to develop AI agents that can reliably handle such tasks.

The previous version of ChatGPT could accept voice commands and respond with voice, but it used three different AI technologies. 

The new app, based on GPT-4o, can accept and generate text, sounds, and images using a single AI technology, making it more efficient and cost-effective to offer for free. 
