The addition of multimodal support to ChatGPT is a significant enhancement to an already remarkably human-like AI. Currently, ChatGPT accepts only text input, a single mode of interaction. GPT-4, by contrast, will handle text, audio, video, and images, making it a multimodal AI with the potential to significantly expand the chatbot's capabilities.
Last week, Microsoft USA remained tight-lipped about ChatGPT's GPT-4 upgrade, hinting only at a March 16th event. Microsoft Germany, however, went further and effectively soft-launched GPT-4, holding an event that provided details about the upgrade, as reported by Heise.de.
It is unclear whether GPT-4 will be integrated into ChatGPT or limited to Microsoft's Bing search engine, which already has ChatGPT support. Nevertheless, Microsoft Germany confirmed that GPT-4 would be released this week and would have multimodal capabilities.
Microsoft CTO Andreas Braun said, "We will be presenting GPT-4 next week, and we will have multimodal models that will provide entirely new possibilities – such as videos."
Braun referred to the AI's natural language understanding technology as a "game changer." He also disclosed that ChatGPT would work across all languages, allowing users to ask a question in one language and receive an answer in another.
In addition, Holger Kenn, another executive at Microsoft Germany, clarified that a multimodal ChatGPT chatbot could convert text to images, music, and video upon request.
In what ways will the multimodal technology of GPT-4 improve user experience with ChatGPT?
Although there is still much to learn about GPT-4, its multimodal capabilities are expected to enable users to obtain information using various input types. With the ability to process text, audio, video, and image inputs, the AI could analyze YouTube videos or audio recordings to answer users' inquiries.
Microsoft provided an illustration of how ChatGPT's multimodal feature could benefit businesses: by processing audio input, the AI could automatically generate text summaries of support calls from the recordings. A large Microsoft customer in the Netherlands, which receives 30,000 calls daily that require summarization, could save up to 500 work hours per day, and setting up ChatGPT for this purpose would reportedly take only a couple of hours.
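The figures Microsoft cited can be sanity-checked with a quick back-of-the-envelope calculation. The sketch below uses only the two numbers from the example (30,000 calls and 500 saved hours per day); the per-call effort it derives is an inference, not something the company stated:

```python
# Figures from Microsoft's example: 30,000 support calls per day,
# up to 500 work hours of summarization saved per day.
CALLS_PER_DAY = 30_000
HOURS_SAVED_PER_DAY = 500

# Implied manual summarization effort per call, in minutes (derived, not stated).
minutes_per_call = HOURS_SAVED_PER_DAY * 60 / CALLS_PER_DAY
print(f"Implied manual effort: {minutes_per_call:.0f} minute(s) per call")
# Implied manual effort: 1 minute(s) per call
```

In other words, the claimed savings correspond to roughly one minute of manual summarization work per call, which is the time the AI would be absorbing.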
Microsoft has cautioned that ChatGPT may still be unreliable at times, even after its multimodal GPT-4 upgrade. The company is currently developing confidence metrics to improve the chatbot's dependability. It remains to be seen how users will be able to test GPT-4, or whether it will be integrated into ChatGPT later this week. Microsoft recently introduced Kosmos-1, a multimodal AI that supports image input. While Microsoft is a significant investor in OpenAI, the latter will continue to upgrade ChatGPT, including making GPT-4 accessible to a broader audience.