Someone showed Gemini 1.5 a video of their workout. It turns out it can act as a personal trainer

Multimodal chatbots that accept video as input for analysis add a new dimension to these tools

The introduction of Gemini 1.5 a few days ago was somewhat eclipsed by other news, but this Google AI model may be more interesting than it first seemed. Above all thanks to its video analysis capability, a feature that shows that multimodal chatbots – those that accept video, text or images as input – are a promising evolution of these tools.

Gemini, look how I exercise. Mckay Wrigley, an AI solutions developer, explained on Twitter how he recorded a nearly 21-minute video of himself lifting weights and then uploaded it to Gemini for analysis. The result was surprising.

My personal trainer is an AI. The developer asked Gemini 1.5 to create a JSON file with a series of recommendations based on the name of each exercise, the number of sets, the repetitions per exercise, the weight and, above all, the video itself. Seventy seconds later, it did so perfectly. In his opinion the idea worked exceptionally well, and it validated the notion that such a system could serve as an AI-based personal trainer.
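For illustration, here is a minimal sketch of how such a request might look with the google-generativeai Python SDK (the file name, API key and JSON field names are placeholders of ours, not Wrigley's actual setup):

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the workout video through the File API (file name is a placeholder).
video = genai.upload_file(path="workout.mp4")

# Video files are processed asynchronously; poll until the file is ready.
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro-latest")
prompt = (
    "Analyze this workout video and return JSON: an array of objects with "
    "'exercise', 'sets', 'reps', 'weight_kg' and 'recommendations' fields."
)
response = model.generate_content([video, prompt])
print(response.text)  # the JSON produced by the model
```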


And it can be combined with other data. In fact, according to Wrigley, this information can be enriched with other data, such as medical records, diet logs or progress photos, making this chatbot an even more interesting option in the realm of personal trainers and dietitians, one that could be completely customized.
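As a sketch of that idea, the earlier request could be extended with additional parts in the same multimodal prompt (again assuming the google-generativeai SDK; the extra inputs below are hypothetical placeholders):

```python
import PIL.Image

# Hypothetical extension: enrich the prompt with extra personal data.
diet_log = open("diet_log_week.txt").read()            # diet records as text
progress_photo = PIL.Image.open("progress_photo.jpg")  # a progress photo

response = model.generate_content([
    video,           # the workout video uploaded earlier
    progress_photo,  # the SDK accepts PIL images directly as prompt parts
    "My diet log for this week:\n" + diet_log,
    "Acting as a personal trainer and dietitian, suggest adjustments to "
    "my routine and diet based on the video, the photo and the log.",
])
print(response.text)
```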

A good use case for the GPT Store. This type of application gives an idea of where things could go in OpenAI's GPT Store. With ChatGPT Plus you have access to the creation of personalized chatbots, and one of them could, of course, analyze our physical training and then give us advice to correct those exercises and improve those routines.

Multimodal chatbots hold promise. The introduction of Gemini 1.5 demonstrated that this type of multimodal option can be very relevant. The model can receive about 700,000 words (or roughly 30,000 lines of code) as input at once, as well as up to 11 hours of audio or an hour of video for later analysis. From there, the options for analyzing and working with those inputs are really broad.
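Those figures imply some rough per-modality budgets. Here is a back-of-envelope sketch, assuming the one-million-token context window Google cited at launch; the derived rates are our own arithmetic, not official numbers:

```python
# Rough budgets implied by the capacities quoted above, assuming the
# 1,000,000-token context window Google announced for Gemini 1.5 Pro.
CONTEXT_TOKENS = 1_000_000

words = 700_000            # ~700,000 words of text
audio_seconds = 11 * 3600  # up to 11 hours of audio
video_seconds = 1 * 3600   # up to 1 hour of video

print(f"~{CONTEXT_TOKENS / words:.2f} tokens per word")                     # ~1.43
print(f"~{CONTEXT_TOKENS / audio_seconds:.0f} tokens per second of audio")  # ~25
print(f"~{CONTEXT_TOKENS / video_seconds:.0f} tokens per second of video")  # ~278
```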

Analyze and summarize this video for me. That capability is easy to demonstrate in Gemini 1.5: just ask it to analyze any YouTube video and summarize it in a few key points. We tried it with a video from our Xataka channel, but Spanish is not supported at the moment, so we used one of the latest MKBHD videos instead. In just 10 seconds it produced a remarkable summary of the content.

Image | John Arano

In Xataka | We asked two nutritionists to blindly evaluate weekly menus created with GPT-4. It turned out great
