Time-of-Arrival Theory of the Imaging Sonar of Dolphins
by Douglas Moreman

Practical applications of the mechanism of this theory are feasible – even if it turns out that dolphins use some other mechanism.

Sonaring behaviors of dolphins suggest an ability perhaps better described by “echo-vision” than by the customary expression “echo-location.” As signals go, vision requires greater bandwidth than hearing does. Bony paths in the inner ear carry, to the cochlea, the frequency-dependent sounds of the complex communications of dolphins, and other sounds of the sea. Does it seem plausible that the bandwidth of a bony path is sufficient for all that plus a form of “vision”?

The physics of the dolphin’s environment suggests a time-of-arrival, “triangulating,” mechanism that would, if it exists, by-pass the cochleae and carry specialized, non-frequency, information along several parallel paths directly into a parallel-processing part of the brain.

Using appropriate math and the physics of an underwater environment, we can see a possible natural version of the mechanism, one that computes “in parallel,” and also see another version that processes serially and that we might use in a 3D imaging fish-finder.

We assume, for our mechanisms, a "click," a sound-of-illumination whose echoes have at least one Feature such that the “toa,” the time-of-arrival, of an instance of that Feature can be detected.

For some species of dolphins, the click has a most prominent cycle whose duration is about 1/100,000 of a second, that is, 10 microseconds. The click has a most prominent Rise in amplitude that lasts about 5 microseconds and a most prominent Fall that lasts about 5 microseconds. Rise and Fall are examples of click-Features such that times-of-arrival of their echoes can be detected.
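
To make “detectable Feature” concrete, here is a minimal sketch, with invented numbers, of extracting the toa of a Rise as the first upward crossing of an amplitude threshold in a sampled echo, refined by linear interpolation between samples.

```python
def rise_toa(samples, dt, threshold):
    """Time of the first upward crossing of `threshold`, by linear interpolation."""
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < threshold <= b:
            return (i - 1 + (threshold - a) / (b - a)) * dt
    return None  # no Rise detected

dt = 1e-6  # 1 MHz sampling: ten samples across a 10-microsecond cycle
samples = [0.0, 0.0, 0.1, 0.4, 0.9, 1.0, 0.6, 0.1, 0.0, 0.0]
toa = rise_toa(samples, dt, threshold=0.5)  # crossing between samples 3 and 4
```

Real detectors would be more elaborate, but any scheme that yields a repeatable toa for each Feature-instance serves the theory.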

The proposed serial mechanism, using appropriate Features, has been shown to work in simulations. But from physical necessity, each simulation is incomplete and simplified. The mechanism has not yet (in Summer 2019) been tested in the real world.

Consider how some simple physics-and-mathematics imply the existence of possibly useful, abstract hyperbolic surfaces. Let ToT(A,B) abbreviate Time-of-Travel from A to B. Suppose an instance W of a Feature (such as a Rise or a Fall) reflects from a point P and arrives at points M1 and M2 at times T1 and T2, respectively. Then P has this property:

ToT(P,M1) - ToT(P,M2) = T1 - T2.
(difference in times-of-travel = difference in two measured times)

In an environment that is nearly enough homogeneous with respect to the Speed of sound, and letting d(A,B) denote the distance from A to B, we can multiply both sides by Speed and see that P has this property:

d(P,M1) - d(P,M2) = u, where u denotes Speed*(T1-T2).
(difference in unknown distances = a knowable product)

We recognize, from Analytic Geometry, that, except in the rare case where u=0, P lies on a hyperbolic surface whose foci are M1 and M2. P must lie in the intersection of all such hyperbolic surfaces, over all feasible pairs {(M1,T1),(M2,T2)} of “toa-events” that arise from the same traveling Feature-instance W. This fact indicates that we might be able to use modifications of textbook "numerical methods" to approximate P from a finite set of pairs of toa-events. Finding directions D to such reflectors P might be part of a kind of “acoustic vision.”
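
As an illustrative sketch of such a textbook numerical method (names, geometry, and numbers all invented, and planar for brevity), a Gauss-Newton iteration can approximate P from the pairwise residuals of the hyperbola property d(P,M1) - d(P,M2) = Speed*(T1 - T2):

```python
import math

SPEED = 1500.0  # nominal speed of sound in seawater, m/s

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def locate(sensors, toas, guess, iters=100):
    """Approximate P from toa-differences; the emission time need not be known."""
    x, y = guess
    pairs = [(0, j) for j in range(1, len(sensors))]
    for _ in range(iters):
        # Accumulate the 2x2 normal equations  J^T J dp = -J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for i, j in pairs:
            di, dj = dist((x, y), sensors[i]), dist((x, y), sensors[j])
            r = (di - dj) - SPEED * (toas[i] - toas[j])
            gx = (x - sensors[i][0]) / di - (x - sensors[j][0]) / dj
            gy = (y - sensors[i][1]) / di - (y - sensors[j][1]) / dj
            a11 += gx * gx; a12 += gx * gy; a22 += gy * gy
            b1 -= gx * r;   b2 -= gy * r
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-15:
            break
        x += (a22 * b1 - a12 * b2) / det
        y += (a11 * b2 - a12 * b1) / det
    return (x, y)

# Simulated click-event: echoes from a known P generate the toas, and the
# solver recovers P from their differences alone.
P = (10.0, 7.0)
sensors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (2.0, -3.0)]
toas = [dist(P, m) / SPEED for m in sensors]
est = locate(sensors, toas, guess=(8.0, 8.0))
```

The array here is meters wide for numerical conditioning of the toy problem; with a chin-sized array and noisy toas, the same residuals constrain direction far more tightly than range, which is why the passive case yields images rather than full locations.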

If echoes did not overlap each other then a picture of echoing points might be computed from a large number of such directions D obtained from several toa-events (sensor,toa).

Alas, echoes do, in fact, overlap on sensors, destroying toa-information. This posed a problem to be addressed by augmenting the method. More on that later.

Another problem is that toas cannot be known precisely. Consequently, the sets of possible source-points are not “surfaces” but are “shells” lying between two boundary surfaces. An intersection of finitely many of them is not a precise point but a non-zero approximating volume. In imaging, the thinner the shells, the smaller the smallest objects that can be resolved one from another.

The briefer are the detected signals, the thinner are those shells.

Thus, we are not surprised that the sonar-click of a bottlenose dolphin has a prominent cycle that lasts for just 10 microseconds. Bottlenoses hunt individual fish. Humpback whales, on the other hand, hunt thousands of small critters-in-schools and seem (I thank Alison Stimpert and Whitlow Au for this information) to have a sonar-click whose prominent cycle lasts roughly 125 times as long as that of a bottlenose. Sounds of lower “frequency” are better for larger targets and also for targets at longer ranges.

Simulations suggest that echoes of a dolphin-click arriving at an array of 16 sensors jointly contain extractable information on the location, shape, and size of a reflecting object – even in the “passive” case wherein neither the time nor the place of emission of a click is known. Given enough detected directions-to-reflector-points, an image can be computed. In the passive case, the image might be flat, as is the image that the lens of an eye projects onto a retina. In the “active” case, the time and place of emission of a click are known; not just directions but locations can be found, and not just an image but a 3D model of echoing objects can be constructed – by augmenting hyperboloids with ellipsoids.
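
A hedged sketch of how an ellipsoid augments a hyperbolically-found direction in the active case (all names and numbers invented): with the emission point E and emission time T0 known, each toa-event (M, T) confines the reflector P to an ellipsoid d(E,P) + d(P,M) = Speed*(T - T0). Given a direction D, the range r along D is the value where the ellipsoid equation holds; the total path length r + |E + r*D - M| increases with r, so bisection finds it.

```python
import math

SPEED = 1500.0  # nominal speed of sound in seawater, m/s

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def along(e, d, r):
    return tuple(p + r * q for p, q in zip(e, d))

def range_on_ray(E, D, M, travel):
    """Bisection for r with r + |E + r*D - M| = travel (D a unit vector)."""
    lo, hi = 0.0, travel  # the total path length bounds the range
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid + norm(sub(along(E, D, mid), M)) < travel:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Simulated check: put P at range 12 on a known ray and recover that range.
E, M = (0.0, 0.0, 0.0), (0.2, 0.0, 0.0)
D = (0.6, 0.64, 0.48)                       # a unit vector
P = along(E, D, 12.0)
travel = norm(sub(P, E)) + norm(sub(P, M))  # = SPEED * (T - T0)
r = range_on_ray(E, D, M, travel)
```
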

That a dolphin can acoustically “see” in the passive case is shown by dolphins that “see” – both in nature and in experiment – by means of echoes of clicks made by other dolphins.

Before designing a dolphin-inspired, imaging sonar device, we use simple tools of physiology to see a way that Nature might be computing mental images using a multitude of times-of-arrival.

Consider this possibility: the face of a dolphin – probably the lower jaw, possibly the chin – contains an array of “echo-trigger” sensors each of which can send a neuronal pulse directly into some part of the brain which we might as well call an “acoustic retina.” Such neuronal pulses arrive at the "toa-neurons" of the acoustic retina from an array of echo-triggers -- with sufficiently precise indication of relative times-of-arrival of the click-feature at echo-triggers. Relative times are possibly “coded,” in effect, in lengths of axonal paths. From several correctly associated times-of-arrival, a point of an image can be computed – provided that overlap of echoes at the sensors does not destroy too much information. One method of reducing damage done by overlaps involves using a larger diameter sensor-array. Another method has been found – more on that later.

Simulation software exists that, using special methods to sufficiently overcome overlap, computes images from simulated echoes off simulated fish. These simulations support the idea that, on the scale of dolphins and the fish they eat, a chin-sized array of simple sensors provides sufficient information to compute images of those fish. Effort has been put into making the simulated echoes realistic enough that the existing simulation software is likely to compute images from real echoes with little modification. Thus, it seems likely that a 3D imaging fish-finder can be built that has a sensor-head smaller than a 3-inch square. A commercially viable fish-finder seems plausible -- assuming that tests in the real world prove the concept, that computing-the-next-image can be done fast enough, and that the price can be made low enough. Such a device might prove that sufficient time-of-arrival information for acoustic vision reaches the chin of a dolphin.

Brain processes, particularly when they involve parallel processing, can be practically impossible to simulate satisfactorily. The physics-and-mathematics of time-of-arrival theory do not depend upon the physiology. Our ancestors admired the wings of birds, but our airplanes do not flap their wings. A method has been found and simulated that serially processes times-of-arrival into images and might do so fast enough for non-jerky imaging in a recreational fish-finder.

Suppose an acoustic retina has a million toa-neurons and, for each toa-neuron, N, there is a set E(N) of echotriggers and a direction D(N) from the center of the Array. These are such that, for some Feature F, if a traveling instance W of F comes from D(N) to sufficiently many echotriggers in E(N) then resulting axonal pulses will leave these sensors and reach N so nearly enough at the same time that N fires. The firing of N sends a pulse to some higher part of the brain and indicates that there is, probably, a reflecting object in direction D(N). A large enough set of simultaneously-enough activated toa-neurons can be, or can result in, an "image" somewhere in the brain.
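
A toy sketch of that firing rule (all parameters hypothetical, and planar for brevity): a toa-neuron N has a preferred direction D(N), and axonal delays cancel the arrival-time differences that a far-field Feature coming from D(N) would produce across the echotriggers; pulses from that direction then reach N nearly simultaneously, and N fires.

```python
SPEED = 1500.0  # nominal speed of sound in seawater, m/s

def plane_wave_toas(direction, sensors):
    """Relative arrival times at sensors for a far-field source along a unit direction."""
    return [-(x * direction[0] + y * direction[1]) / SPEED for x, y in sensors]

def toa_neuron_fires(sensors, pulses, preferred, window=1e-6, quorum=4):
    """Fire if `quorum` delay-compensated pulses coincide within `window` seconds."""
    delays = plane_wave_toas(preferred, sensors)  # expected offsets for D(N)
    compensated = sorted(t - d for t, d in zip(pulses, delays))
    return any(compensated[i + quorum - 1] - compensated[i] <= window
               for i in range(len(compensated) - quorum + 1))

# A chin-scale array of five echotriggers, coordinates in meters.
sensors = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05), (0.05, 0.05), (0.025, 0.08)]
pulses = plane_wave_toas((0.6, 0.8), sensors)  # echoes arriving from (0.6, 0.8)
fires_matched = toa_neuron_fires(sensors, pulses, preferred=(0.6, 0.8))
fires_wrong = toa_neuron_fires(sensors, pulses, preferred=(-0.8, 0.6))
```

In Nature, the “delays” might be coded in axonal path lengths, as suggested above; here they are just subtracted.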

This echotrigger/toa-neuron mechanism is essentially “parallel” and, in Nature, might involve a couple thousand echotriggers and several million axonal connections to a million toa-neurons. Fortunately, there is a serial method that can compute probably useful images from an array of maybe sixteen piezo-elements.

A concept that can help one understand the mostly simple mathematics of time-of-arrival methods is this: a “toa-map” is a set of space-time “points” (X,t), where X is a point of echoing space and t is a possible time-of-arrival at X, and no such t is the second term of two such ordered pairs (X1,t) and (X2,t). When a click-feature (such as Rise) emanates from a point P and reaches the points X of a point-set S at times T(X), the set of all such space-time points (X,T(X)) is a toa-map on S that contains information for approximating the location P. The points X of S might be locations of sensors. In a click-event, computation of an image from feature-instances at an array of sensor-locations X can be seen as beginning with the discovery, in an array of echoes, of a family of mutually exclusive toa-maps that uses up all, or enough, of the detected feature-instances of that click-event. Every toa-map in that family can, potentially, yield an image-point. So every click results, for every click-feature F, in a family of toa-maps, and each family results in an image.
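
One simple tool for discovering such a family (a sketch, with invented sensors and numbers) is a feasibility filter: two toa-events (M1,T1) and (M2,T2) can belong to the same toa-map only if the implied difference in distances does not exceed the separation of the sensors, by the triangle inequality.

```python
import math

SPEED = 1500.0  # nominal speed of sound in seawater, m/s

def feasible_pair(m1, t1, m2, t2):
    """Necessary condition for a shared source: |Speed*(T1 - T2)| <= d(M1, M2)."""
    return abs(SPEED * (t1 - t2)) <= math.dist(m1, m2) + 1e-12

m1, m2 = (0.0, 0.0, 0.0), (0.1, 0.0, 0.0)  # two sensors 10 cm apart
P = (5.0, 3.0, 1.0)                        # a reflecting point
t1 = math.dist(P, m1) / SPEED
t2 = math.dist(P, m2) / SPEED              # same Feature-instance: feasible
t2_other = t1 + 1e-3                       # a much later detection: infeasible
ok = feasible_pair(m1, t1, m2, t2)
bad = feasible_pair(m1, t1, m2, t2_other)
```

Pruning infeasible pairs early keeps the serial search over candidate toa-maps small, which matters for computing-the-next-image fast enough.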

More later ...