DolphinGemma: Pioneering AI Communication With Dolphins in the Deep Blue

Abhishek Koiri

For decades, marine biologists have been captivated by the complex and often mysterious communication systems of dolphins, which are full of clicks, whistles, and burst pulses. Researchers have made real progress in cataloguing patterns, but deciphering these signals, let alone interacting through them, has remained elusive. A new AI model may begin to change that.

This National Dolphin Day, Google, in collaboration with Georgia Tech and the Wild Dolphin Project (WDP), unveiled DolphinGemma, a foundational AI model trained to learn the structure of dolphin vocalizations and to generate new dolphin-like sound sequences. The aim is to help narrow the communicative gap between humans and dolphins.

The Need for AI in Dolphin Communication

Understanding another species’ language isn’t simply a matter of listening — it requires analyzing context, history, and behavior. Since 1985, the Wild Dolphin Project has conducted the world’s longest-running underwater dolphin research program, observing a community of wild Atlantic spotted dolphins (Stenella frontalis) in the Bahamas. Their decades-long dataset of audio, video, and individual dolphin behavior provides a uniquely rich foundation for training AI systems like DolphinGemma.

WDP’s approach is unique: non-invasive, longitudinal, and deeply contextual. Researchers have documented communication behaviors such as:

  • Signature whistles (individualized sounds similar to names, especially between mothers and calves)
  • Burst-pulse squawks (linked to aggressive or territorial behavior)
  • Click buzzes (often associated with courtship or shark chases)

These rich acoustic signals are not isolated; they are deeply tied to social dynamics, environmental cues, and individual identities.

Enter DolphinGemma: Structure, Sound, and Sequence

Developed by Google, DolphinGemma is a roughly 400-million-parameter AI model designed for audio-in, audio-out modeling. It pairs Google’s SoundStream tokenizer, which converts audio into discrete tokens, with an architecture similar to that of language models, allowing it to compress, analyze, and generate complex sound patterns efficiently.

What makes DolphinGemma groundbreaking is its capacity to learn and replicate the structure of dolphin vocalizations, offering:

  • Pattern recognition within sequences of natural dolphin calls
  • Generation of novel dolphin-like sound sequences
  • Prediction of subsequent calls in a vocal series

Essentially, DolphinGemma functions similarly to a large language model, but for dolphin audio, predicting the “next sound” based on prior acoustic context.
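To make the “next sound” idea concrete, here is a minimal sketch in Python. It is not DolphinGemma’s architecture: it assumes the audio has already been turned into integer tokens (the job SoundStream does in the real system) and swaps the neural model for a simple bigram counter. The token values and the function names `train_bigram`, `predict_next`, and `generate` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never seen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, length):
    """Greedily extend a sequence from `start` up to `length` tokens, stopping at a dead end."""
    out = [start]
    while len(out) < length:
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


# Hypothetical token sequences standing in for tokenized recordings.
recordings = [
    [3, 7, 7, 2, 9],
    [3, 7, 2, 9, 9],
    [5, 3, 7, 2, 9],
]
model = train_bigram(recordings)

print(predict_next(model, 7))   # most common token after 7
print(generate(model, 5, 5))    # a short generated sequence
```

A real model conditions on a much longer context than one previous token, but the training signal is the same in spirit: given what came before, guess what comes next.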

Aiding Field Research: From Theory to Practice

The Wild Dolphin Project is now deploying DolphinGemma in its 2025 field season. Equipped with Google Pixel phones, researchers are running the model directly on-device, enabling real-time analysis and generation without bulky hardware. The expectation is that DolphinGemma will significantly accelerate researchers’ ability to:

  • Identify recurring sound clusters
  • Map acoustic structures to social interactions
  • Detect anomalies or unique events in vocal patterns
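As a rough illustration of the first and third items, the sketch below counts short runs of sound types across recording sessions, then separates runs that recur from runs seen only once. It assumes each session has already been labeled with coarse sound types ("W" for whistle, "S" for squawk, "B" for buzz); those labels, the sample sessions, and the helper names are hypothetical, and this counting approach is a stand-in rather than the method the researchers use.

```python
from collections import Counter


def ngram_counts(sequences, n):
    """Count every run of n consecutive labels across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


def recurring(counts, min_count):
    """Runs seen at least `min_count` times: candidates for recurring clusters."""
    return sorted(g for g, c in counts.items() if c >= min_count)


def rare(counts):
    """Runs seen exactly once: candidates for unusual events worth a closer look."""
    return sorted(g for g, c in counts.items() if c == 1)


# Hypothetical labeled sessions.
sessions = [
    ["W", "W", "S", "B"],
    ["W", "S", "B", "B"],
    ["B", "W", "S", "W"],
]
pairs = ngram_counts(sessions, 2)

print(recurring(pairs, 2))
print(rare(pairs))
```

Linking a recurring run to a social interaction would additionally need the behavioral notes recorded alongside each session, which this sketch leaves out.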

As the model continues to refine its understanding, researchers envision developing a shared vocabulary, perhaps beginning with synthetic whistles that refer to objects dolphins already recognize — like seagrass, scarves, or play items.

From Listening to Conversing: The CHAT System

DolphinGemma is also being integrated into a parallel system called CHAT (Cetacean Hearing Augmentation Telemetry), developed in partnership with Georgia Tech. CHAT doesn’t attempt to decipher full dolphin language; rather, it enables two-way interaction using a controlled set of synthetic sounds.

How CHAT Works:

  1. Synthetic Whistles: Created by the system and associated with specific objects.
  2. Recognition and Feedback: If a dolphin mimics a whistle, researchers receive alerts via underwater bone-conduction headphones.
  3. Reinforcement: The correct item is offered to the dolphin, reinforcing the association.
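The recognition step can be sketched as a small template-matching routine. This is an assumption-laden toy, not CHAT’s actual signal processing: it represents each whistle as a short frequency contour in kHz, compares a heard contour against stored synthetic templates by mean absolute difference, and reports a match only below a threshold. The contour values, the `max_distance` threshold, and the function names are all hypothetical.

```python
def contour_distance(a, b):
    """Mean absolute difference between two equal-length frequency contours (kHz)."""
    if len(a) != len(b):
        raise ValueError("contours must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def match_whistle(heard, templates, max_distance=1.0):
    """Return the label of the closest template, or None if nothing is close enough."""
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        d = contour_distance(heard, template)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_distance else None


# Hypothetical synthetic whistles, each tied to one object.
templates = {
    "scarf": [8.0, 10.0, 12.0, 14.0],     # rising contour
    "seagrass": [14.0, 12.0, 10.0, 8.0],  # falling contour
}

heard = [8.3, 9.8, 12.4, 13.9]  # an imperfect mimic of the rising whistle
label = match_whistle(heard, templates)
if label is not None:
    # In the field, this is the point where the researcher would be alerted.
    print(f"alert researcher: dolphin mimicked the '{label}' whistle")
```

The threshold matters: set it too loose and ordinary whistles trigger false alerts, too tight and a slightly off-pitch mimic goes unnoticed.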

CHAT has been tested on earlier Pixel models and will soon move to the Google Pixel 9, which is expected to run deep learning models and signal analysis simultaneously in real time, all in a compact, waterproof design.

Benefits of Using Pixel Phones:

  • Lightweight and field-portable
  • High-performance audio processing
  • Lower cost than custom marine hardware
  • Scalable for wide deployment

This real-time, in-ocean communication could lead to a deeper understanding of how dolphins engage with the world — and with us.

Open Source for Global Research Impact

In the spirit of open science, Google plans to release DolphinGemma as an open model by summer 2025. While it is currently trained on Atlantic spotted dolphin data, it holds potential for adaptation to other cetacean species (e.g., bottlenose or spinner dolphins) with fine-tuning.

This open model will allow:

  • Independent marine labs to analyze their own acoustic data
  • Cross-species research on marine communication
  • Academic collaboration to refine interspecies AI tools

The Promise of Interspecies Communication

The dream of speaking to dolphins has long been a fascination of scientists, storytellers, and ocean lovers. While we are still far from fluent conversations, tools like DolphinGemma may one day allow us to:

  • Understand emotional or social contexts in dolphin communication
  • Predict group behavior based on vocal cues
  • Develop symbolic languages that enable shared understanding

As DolphinGemma’s generative capabilities grow, so does the potential to test dolphin-to-human communication loops, where patterns from both parties can be reciprocated and interpreted in real time.

Ethical Considerations

As with any breakthrough in AI and animal research, ethical practices must guide development:

  • All interaction with dolphins should remain non-invasive and respectful.
  • No attempt should be made to domesticate dolphins or manipulate their behavior outside its natural context.
  • Open sharing of data ensures transparency and scientific integrity.

WDP’s motto, “In Their World, On Their Terms,” remains the guiding principle.

Conclusion: A Future of Listening, Learning, and Interacting

DolphinGemma is more than a research tool — it is a bridge between species, powered by the best of human innovation and deep respect for marine life. Through the tireless observational work of the Wild Dolphin Project, the engineering brilliance of Georgia Tech, and the technological muscle of Google, we are approaching a historic milestone.

With DolphinGemma, we are not just recording and observing. We are beginning to understand. And perhaps one day, we will be able to respond in kind — not as passive listeners, but as respectful participants in one of the most sophisticated natural communication systems on Earth.

Stay tuned this summer as DolphinGemma is released to the global research community. The ocean is speaking — and now, we might just start speaking back.
