Human Interface Support for Information Technology for Mars Research Missions

Craig Jin1, Teewoon Tan1,2, Johahn Leung3, André van Schaik1, Simon Carlile3

1 School of Electrical and Information Engineering, The University of Sydney, NSW, Australia 2006
2Tanad Pty Ltd, NSW, Australia 2069
3 Department of Physiology, The University of Sydney, NSW, Australia 2006

E-mail: {craig,teewoon,andre},,



This paper presents a model system and technology that we are building to investigate the issues important to achieving a psychophysically-adapted mobile outdoor augmented reality (AR) system. These issues are important in situations where human interaction and deployment are necessary and in which unencumbered information delivery to the human explorer is essential. Typical application scenarios for AR technology involve situations in which the overlaying of visual and auditory information onto a real sensory environment enhances the worker’s effectiveness. Relevant to the context of this paper are situations that include the proposed manned space exploration missions (such as of the planet Mars, associated with our point of contact, The Mars Society) and also the Mars Analogue Research Stations being developed to prepare for such expeditions. Our particular focus is developing inter-personal spatial-audio communication head-sets (the talkers’ voices are heard as externalized or out-of-the-head and originating from specific directions in space) that enhance auditory situational awareness. In addition, we are applying image recognition techniques via a head-mounted video camera to provide real-time assistance in the categorization and collection of image data. A transparent head-mounted visual display system (HMD) provides additional visual information to the user.

Keywords: Outdoor augmented reality, Virtual auditory space, Head-related transfer functions, Spatial-audio communication, Image recognition


Head-sets currently used with inter-personal electronic communication systems do not generate a normal listening condition with spatial hearing. For instance, a speech signal transmitted from due north of the listener is not heard as coming from the North. Instead, sounds over the head-set appear to originate from inside the listener’s head which deprives the listener of situational awareness. In the case of multiple talkers it also drastically reduces the listener’s ability to separate the multiple speech streams. In many activities, such as fire fighting, security and surveillance, search and rescue, and audio/video teleconferencing, it is useful to have an electronic spatial-audio communication system in which the listener is acoustically aware of the talker’s location. We assume that such auditory perceptual factors apply similarly to inter-personal communication during manned space exploration missions, especially during the extravehicular activities (EVAs) associated with such missions. Technical difficulties have so far prevented the general application of 3D audio to these systems.

In addition to an audio interface, we also consider visual feedback and visual data analysis. In the framework proposed here, visual feedback is provided in a transparent mode so that the explorer’s normal vision remains unblocked. In other words, the computer display is rendered onto the existing visual field. Another goal is to provide a "smart" visual interface based on an image recognition system. The visual interface should assist with navigation, landmark recognition, and geological identification.

From our viewpoint, human-interface support for Mars research missions is technologically well-framed in terms of mobile, outdoor augmented reality systems. Augmented reality (AR) systems overlay computer-generated sensory images onto a real scene [1], [14]. In our case, speech communication is rendered as 3D audio and overlaid on top of the listener’s real acoustic environment and visual information is displayed using a transparent head-mounted display. We give a brief review of AR systems and follow this with a detailed discussion of our spatial-audio technology and image recognition techniques.


AR systems typically consist of a wearable computer (commonly a laptop in a backpack) and wearable location and orientation sensors. Depending on the input from the sensors, the computer generates and displays appropriate information to the user. For most AR systems working outdoors in a mobile environment, the U.S. Department of Defence Global Positioning System (GPS) is used to determine the user’s location [9], [40]. A handheld GPS receiver such as the 12 channel eTrex Vista by Garmin weighs as little as 150 grams. This unit updates its position once per second and provides a communication link with a PC via a serial port. As the accuracy of the GPS unit is approximately 15 meters (varying with atmospheric conditions), a differential GPS beacon receiver, such as the Garmin GBR 23, is used to reduce the errors in position to less than 5 meters. The GBR 23 receives the free differential GPS correction signal broadcast by the Australian Maritime Safety Authority. Furthermore, land-based GPS transceiver systems, known as pseudolite (pseudo-satellite) systems, can be made to work in difficult terrains or indoors and can provide accuracies down to 1 cm [8], [10], [37].

Orientation sensing is used in visually oriented AR systems to determine the direction in which the user is looking [1]. Spatial audio also requires head-orientation sensing, as the ability to render an acoustic space for a listener relies on knowing both the listener’s orientation and the listener’s location relative to the sound source. In the laboratory, such information is commonly obtained using a tethered (not wireless) electromagnetic head-tracking system (FASTRAK, Polhemus; Flock of Birds, ATC). One example of such a system is the 3D auditory horizon indicator in flight cockpits, which indicates the gravitational up direction and helps compensate for the fighter pilot’s visual sensory overload during critical situations [43]; another is the convolvotron system [16]. These spatial-audio displays are also common in psychoacoustic laboratories such as the Auditory Neuroscience Laboratory (ANL) [20-21], but are not suitable for outdoor environments. Several outdoor orientation sensors exist, using either a magnetometer, such as the TCM2-50 by Precision Navigation, or a hybrid sensor system combining magnetic and inertial sensors, such as the IS-300 Precision Motion Tracker Pro (InterSense). The inertial sensing eliminates jitter and is not susceptible to electromagnetic interference. Nonetheless, active outdoor work requires frequent and sometimes quick head rotations, so a smooth rendering of 3D audio requires prediction of head motion to avoid problems with system latencies. The IS-300 Pro has a signal processor that analyses inertial angular rate and angular acceleration data to provide motion prediction up to 50 ms into the future. The IS-300 Pro orientation sensor communicates with the PC through a serial port.
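To illustrate why both location and head orientation are needed, the following minimal sketch computes the direction of a talker relative to the listener's nose. It assumes positions have already been converted to a local east/north frame in metres and that head yaw is reported as a compass heading; the function name and conventions are hypothetical and are not taken from the system described here.

```python
import math

def talker_azimuth_deg(listener_en, talker_en, head_yaw_deg):
    """Azimuth of the talker relative to the listener's nose, in degrees.

    listener_en, talker_en: (east, north) positions in metres (assumed local frame).
    head_yaw_deg: compass heading of the listener's nose (0 = north, 90 = east).
    Returns a value in [-180, 180); positive means the talker is to the right.
    """
    d_east = talker_en[0] - listener_en[0]
    d_north = talker_en[1] - listener_en[1]
    # Compass bearing from listener to talker (clockwise from north).
    bearing = math.degrees(math.atan2(d_east, d_north))
    # Subtract the head heading and wrap the result into [-180, 180).
    return (bearing - head_yaw_deg + 180.0) % 360.0 - 180.0
```

With the listener facing east and the talker due north, the function returns -90, meaning the voice should be rendered on the listener's left. The same relative angle would then select the filter functions used for rendering.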

Most AR systems research focuses on visual interfaces and head-mounted displays to provide the user with such facilities as outdoor navigation, architectural visualisation, and integration with sophisticated military simulation software [1], [2], [12], [15], [29], [34-35], [40]. A limited amount of research has investigated spatial-audio displays [9], [13], [26], [36].


The 3D audio technology behind spatial-audio communication is based on psychoacoustic studies of human spatial hearing that have shown conclusively that the acoustic filtering of the outer ear provides spectral cues (gain and attenuation across frequency) that are critical for the auditory system to determine the location of a sound source [4], [5]. These spectral cues are so perceptually important that sounds are not heard as externalised and outside of the head without them. Our understanding of human spatial hearing, especially the spectral analyses performed by the auditory system that underlie accurate sound localisation, is still incomplete. Nonetheless, our recent work has characterised sound localisation performance and localisation errors under many sound conditions. For example, sounds with restricted bandwidth, such as narrow-band and band-pass filtered sounds, are poorly localised [22], [41], as are sounds whose spectra have been substantially scrambled or smeared [3]. Furthermore, speech stimuli that have been low-pass filtered at 8 kHz (almost all speech-only communication devices have low-pass filtering characteristics at or below this cut-off frequency) are poorly localised compared to broadband speech stimuli [24]. These localisation errors can be large and include front-back errors. It is important for the synthesis of 3D audio to understand that there exists a cone of confusion for which acoustic cues such as the interaural time difference and interaural level difference are constant. Locations within a cone of confusion can only be resolved by the auditory system using the spectral cues associated with the outer ear. Therefore, high-fidelity spatial-audio telecommunication systems depend on accurate modelling of the acoustics of the listener’s outer ears.

Spatial hearing can be simulated electronically by first filtering the sound stimulus with the filter function that mimics the acoustic filtering of the listener’s left ear and correspondingly with the filter function for the listener’s right ear for a particular direction in space. The filtered sound stimuli are then played over earphones to the listener. The generation of 3D audio in such a manner using earphones is commonly referred to as virtual auditory space (VAS). The filter functions will have to be measured for every direction in space and for each listener. The acoustic filtering of each listener’s outer ear is unique and each listener’s spatial hearing is perceptually tuned to their own outer ears. Our recent work shows that approximately 60 percent of the differences in the acoustics of listeners’ outer ears is important for accurate localisation [20]. It is a demanding requirement for the synthesis of 3D audio that the filter functions of the individual listener’s outer ears have to be specified for the location of each sound source. Indeed, special equipment, such as a large anechoic chamber, and recording techniques are required to acoustically measure the filter functions of the outer ear. The recording process takes about 1 hour. The difficulty in obtaining accurate acoustic filter functions quickly and conveniently has hindered the development of the technology.
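The filtering step described above can be sketched as a pair of convolutions. This is a minimal illustration, assuming the left- and right-ear filter functions are available as equal-length impulse responses for the desired direction; the function name is hypothetical, and real filters would come from acoustic measurements of the individual listener.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono signal with left- and right-ear impulse responses.

    mono: 1-D array holding the sound stimulus.
    hrir_left, hrir_right: 1-D impulse responses of equal length for one direction.
    Returns an array of shape (2, len(mono) + len(hrir_left) - 1): row 0 is the
    left-ear signal and row 1 the right-ear signal, ready for earphone playback.
    """
    if len(hrir_left) != len(hrir_right):
        raise ValueError("left and right impulse responses must have equal length")
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)
```

A moving source or a turning head requires switching between filters for different directions over time, which this sketch does not attempt.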

The next three sections provide a detailed description of important issues involved in the accurate display of 3D audio. The fourth and last section provides a description of the visual interface.


The fidelity of a reproduced sound field can be assessed along a number of dimensions that are both qualitative (e.g. the sense of telepresence engendered) and quantitative (e.g. spectral distortions). Here, we report experiments that have explored quantitative aspects of the spatial attributes of the rendered sound field, as revealed by the accuracy with which observers are able to localize standardized sound stimuli presented in the virtual field. We describe the behavioral test that best measures localization performance and summarize data on human localization performance in both the anechoic free-field and in virtual space. In addition, we describe the effects on performance of a range of methods for compressing the filters used in rendering the virtual space and describe methods for generating spatially continuous VAS. The impact of various spatial sampling regimes for rendering continuous virtual space is also described in terms of variation in users’ localization performance.

From a sound localization study of 19 subjects under normal listening conditions, the systematic errors and the dispersion were smallest for frontal locations close to the audio-visual horizon and largest for locations behind and above the subject (Figure 1: from [6]). The spherical correlation coefficient (SCC) measures the association between the centroids of the perceived locations and the actual target locations (1 = perfect correlation; 0 = no correlation). The correlation of the data shown in Figure 1 was 0.98 for the data pooled from 19 subjects. The most common type of localisation error is associated with relatively small deviations of the perceived location from the actual location ("local error"). The second type of error typically involves a large error where the perceived location of the target is at a location reflected about the interaural axis. Such errors are often referred to as front-back confusions or (more properly) as cone-of-confusion errors and occur relatively infrequently (< 4%). These cone-of-confusion errors have been removed from the data prior to calculating the centroids and the SCC. In summary, these data provide a reference or benchmark of the spatial resolving power of the human auditory system for broad bandwidth sounds in anechoic space. These data also indicate how accuracy is dependent on the location of the sound. Localization performance also provides a practical and objective method to explore the range of factors that determine the spatial fidelity of rendered virtual auditory space.
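The text does not state which formula was used for the SCC. As a hedged illustration, the sketch below implements one published definition (the determinant-based spherical correlation of Fisher and Lee), which is an assumption on our part. Directions are given as azimuth and elevation in degrees and converted to unit vectors; the function names are hypothetical.

```python
import numpy as np

def to_unit_vectors(az_deg, el_deg):
    """Convert azimuth/elevation in degrees to unit vectors of shape (n, 3)."""
    az = np.radians(np.asarray(az_deg, dtype=float))
    el = np.radians(np.asarray(el_deg, dtype=float))
    return np.stack(
        [np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)], axis=-1
    )

def spherical_correlation(targets, responses):
    """Determinant-based spherical correlation between paired unit vectors.

    targets, responses: arrays of shape (n, 3), one row per target location
    and per response centroid. Needs at least three linearly independent rows.
    """
    sxy = targets.T @ responses    # 3x3 sum of outer products x_i y_i^T
    sxx = targets.T @ targets
    syy = responses.T @ responses
    return np.linalg.det(sxy) / np.sqrt(np.linalg.det(sxx) * np.linalg.det(syy))
```

Under this definition identical target and response sets give a value of 1, and a set of responses mirrored through a plane gives -1.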

Virtual Auditory Space (VAS) is generated by recording the acoustic filtering properties of the outer ears of individual listeners (the Head Related Transfer Functions: HRTFs) and convolving these with sounds subsequently presented over headphones. We routinely employ a "blocked-ear" recording paradigm in our laboratory. This approach involves embedding a small recording microphone in an earplug secured flush with the distal end of the ear canal [32]. The recordings are performed inside an anechoic chamber with the subject placed at the center. A speaker, mounted on a robot arm, delivers the stimuli at a radius of one meter from the listener and is able to describe a spherical space with a lower limit of -50° elevation (0°: audio-visual horizon). The automated procedure results in 393 HRTFs for the right and left ear recorded for locations evenly distributed on the sphere, from -45° to 90° in elevation. The position of the subject’s head is monitored by a magnetic tracking system to ensure head stability throughout the procedure. Impulse responses are measured using Golay code pairs 1024 bits long, sampled at 80 kHz [18].
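The principle behind Golay-code measurement can be sketched as follows. A complementary pair has autocorrelations that sum to an impulse, so correlating each recorded response with its own code and adding the results recovers the impulse response. The recursive construction and the `system` callable (standing in for the speaker, ear, and microphone chain) are illustrative assumptions, not a description of the laboratory software.

```python
import numpy as np

def golay_pair(order):
    """Build a complementary Golay pair of length 2**order (entries are +1/-1)."""
    a = np.array([1.0])
    b = np.array([1.0])
    for _ in range(order):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

def measure_impulse_response(system, order=10):
    """Estimate the impulse response of a linear system from a Golay pair.

    system: callable mapping an input signal to the recorded output (hypothetical).
    order=10 gives codes 1024 samples long.
    """
    a, b = golay_pair(order)
    n = len(a)
    resp_a = system(a)
    resp_b = system(b)
    # Cross-correlate each response with its code; the sidelobes of the two
    # autocorrelations cancel, leaving 2n times the impulse response.
    corr = np.correlate(resp_a, a, mode="full") + np.correlate(resp_b, b, mode="full")
    length = len(resp_a) - n + 1          # number of impulse-response samples
    return corr[n - 1 : n - 1 + length] / (2.0 * n)
```

For a noiseless simulated system such as `lambda x: np.convolve(x, h)`, the estimate matches `h` to numerical precision; in a real recording the long codes mainly serve to improve the signal-to-noise ratio.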

FIGURE 1 - Pooled localisation responses from 19 subjects shown for front (F), back (B), left (L) and right(R) hemispheres. Target locations are shown by the small '*'; the centroid of the pooled data by the filled circle. The ellipse = response SD.

FIGURE 2 - Location estimates pooled for 5 subjects for sounds presented in VAS. All other details as for figure 1.

Sound localisation performance for stimuli presented in VAS was assessed in exactly the same manner as for the free field localisation, with the exception that the stimuli were delivered using in-ear headphones. The spatial distribution of localisation errors for sounds located in VAS was very similar to that for sounds presented in the free field (cf. Figure 1 and Figure 2). On average, dispersion of the localization estimates about the centroid was 1.5° greater for stimuli presented in VAS compared to free field. There was also a slight increase in the dispersion of localisation estimates for locations behind and also above the subjects when compared to free field localization. An increase in the front-back confusion rates (the most prominent form of cone-of-confusion errors) was also seen, with average rates rising from around 3-4% in the free field up to 6% for sounds presented in virtual space. The spherical correlation between the perceived and actual target locations (with the cone of confusion data removed) was 0.973. Furthermore, the spherical correlation between the VAS and free field localization was higher still (0.98), indicating that subject biases evident in the free field data were replicated in VAS.


Several reports [25], [27] have indicated that the HRTF contains a degree of redundancy and several algorithms have been investigated to compress these filters (see also [30]). As well as reducing the storage and computational overheads, HRTF compression is also an important first step towards solving the problem of efficient spatial interpolation. Principal Component Analysis (PCA) is one approach for compressing the HRTFs [7], [25], [30]. The general operation of PCA is based on the decomposition of the covariance of the input matrix via eigenvectors and eigenvalues. This results in a set of linear basis functions or principal components (PC) and weighting constants which can be ordered by how much variance they account for. Since the PCs are orthogonal, it is possible to reconstruct a reduced fidelity output by linearly expanding only some of the PC dimensions. In addition, the approach is essentially non-dimensional as it relies on an analysis of variance. Therefore, different representations of the inputs may have different statistical efficiencies in terms of the distribution of the variance across the PC dimensions. It was this aspect of the PCA approach that was explored in this study: the HRTF filters were represented in various formats in the time and frequency domains, and the efficiency of the PCA compression was analyzed for each. The efficiency was determined by the rate at which the variance was accommodated by increasing numbers of principal components. In the time domain, unmodified FIR filters were used as input (751 taps). In the frequency domain the inputs were represented as linear amplitude, log magnitude and complex value pairs. Since there were 393 unique locations in each set of HRTF recordings, there were 393 dimensions in each input matrix. The results of these analyses using 18 sets of HRTF filters are presented in Figure 3.
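The compression and reduced-fidelity reconstruction described above can be sketched with a singular value decomposition, which yields the same principal components as an eigen-decomposition of the covariance. The matrix orientation (one row per recording location, one column per frequency bin or filter tap) and the function names are assumptions made for illustration only.

```python
import numpy as np

def pca_fit(data):
    """PCA of a (locations x features) matrix via SVD.

    Returns the mean, the principal components (rows, ordered by variance),
    the per-location weights, and the fraction of variance per component.
    """
    mean = data.mean(axis=0)
    centred = data - mean
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    weights = centred @ vt.T
    return mean, vt, weights, explained

def pca_reconstruct(mean, components, weights, k):
    """Reconstruct the data using only the first k principal components."""
    return mean + weights[:, :k] @ components[:k]
```

The cumulative sum of `explained` corresponds to the kind of curve the text calls the rate of accommodation of the variance; a representation is more efficient when that sum approaches 1 with fewer components.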

FIGURE 3 - Rate of accommodation of the variance (compression) for different input representations of the HRTF

TABLE 1 - Localization performance for 3 subjects

It is clear that representing the HRTF filters in the frequency domain using a linear amplitude format resulted in the highest compression efficiency. The fidelity of VAS generated with compressed HRTF filters was examined using sound localisation accuracy for HRTF filters reconstructed with 5, 10, 20 and 300 PCs. Minimum phase approximation was used to ensure an appropriate magnitude-phase relationship [33]. Three of the 11 original subjects (Section 5.2) participated in this set of auditory localization experiments and their results are shown (Table 1). These data show that VAS using filters with only 10 PCs resulted in performance approaching that for control stimuli. In terms of the original 393 dimensions, the 10 PCs required here to produce high fidelity VAS represent a compression of the HRTFs to approximately 2.5% (10/393) of the original data size.
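The text cites [33] for the minimum phase approximation but does not give the procedure. One standard way to obtain a minimum-phase impulse response from a magnitude spectrum is the real-cepstrum (homomorphic) method, sketched below as an assumption about how such a step could be carried out; the function name is hypothetical.

```python
import numpy as np

def minimum_phase_from_magnitude(magnitude):
    """Minimum-phase impulse response with the given magnitude spectrum.

    magnitude: full-length (even N), conjugate-symmetric magnitude spectrum,
    e.g. np.abs(np.fft.fft(h)). The result is approximate because the
    cepstrum is time-aliased; a longer N reduces the error.
    """
    n = len(magnitude)
    log_mag = np.log(np.maximum(magnitude, 1e-12))  # guard against log(0)
    cepstrum = np.fft.ifft(log_mag).real
    # Fold the anti-causal half of the cepstrum onto the causal half.
    window = np.zeros(n)
    window[0] = 1.0
    window[1 : n // 2] = 2.0
    window[n // 2] = 1.0
    min_phase_spectrum = np.exp(np.fft.fft(cepstrum * window))
    return np.fft.ifft(min_phase_spectrum).real
```

Applied to the magnitude of a filter that is already minimum phase, the function returns that filter, which provides a simple consistency check.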


While the real free field is a continuous space, the HRTFs are recorded at discrete locations around the listener. Previous work has looked at the interpolation of HRTFs using different methods [28], [42] in both the time and frequency domain. Frequency domain approaches have used straightforward methods such as linear interpolation between nearest neighbors as well as more sophisticated methods such as the Euclidean thin-plate spline [7] and the application of a radial basis function neural network [19]. Generally, the better approaches account for the spherical geometry of the data; however, there are very few studies of psychoacoustical errors associated with HRTF interpolation. One systematic investigation of the psychophysical errors associated with HRTF interpolation unfortunately only examined localization performance at 8 test locations [28]. Our approach involves compressing the frequency domain magnitude components of the HRTFs using PCA to provide a series of PCs and weights as described above. The weights are then interpolated using a spherical thin-plate spline (STPS) according to [44]. The interaural time delay was estimated using a cross correlation of the impulse response functions for the left and right ears and interpolated using the STPS. The benefits of the STPS are: (1) the approximation is continuous in all directions and is suitable for modeling spherically directional data and (2) the spline is a global approximation incorporating all data around the sphere to provide one interpolation value. The frequency amplitude components of the interpolated HRTFs were reconstructed from the PCs and the interpolated weights. The interaural time delay components were added back into the reconstructed HRTFs as an all-pass delay (see [31]). 
For the nearest neighbor interpolation, the calculations followed the same procedure outlined above, except that the interpolation was based on the 12 nearest neighbors of each location, weighted inversely by their distance on the sphere.
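The nearest neighbor scheme can be sketched as an inverse-distance weighting over great-circle angles. In this illustration the interpolated quantities are rows of an arbitrary value matrix (for example, PC weights at each recorded location); the normalisation of the weights and the function name are assumptions, since the text gives only the number of neighbors and the inverse-distance rule.

```python
import numpy as np

def nearest_neighbour_interpolate(target_dir, measured_dirs, measured_values, k=12):
    """Inverse-distance interpolation on the sphere from the k nearest locations.

    target_dir: unit vector of shape (3,) for the location to estimate.
    measured_dirs: unit vectors of shape (n, 3) for the recorded locations.
    measured_values: array of shape (n, m), one row of values per location.
    """
    cos_angle = np.clip(measured_dirs @ target_dir, -1.0, 1.0)
    angles = np.arccos(cos_angle)             # great-circle distance in radians
    nearest = np.argsort(angles)[:k]
    if angles[nearest[0]] < 1e-6:             # target coincides with a recording
        return measured_values[nearest[0]].copy()
    weights = 1.0 / angles[nearest]
    weights /= weights.sum()                  # normalise so the weights sum to 1
    return weights @ measured_values[nearest]
```

Unlike the spherical spline, this estimate uses only local data, so it cannot draw on measurements from the rest of the sphere.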

To test the accuracy of the interpolation, HRTF recordings were obtained from 475 locations around the head, 82 of which were selected as a test set and were not used to generate the spherical splines. The magnitude errors for the interpolated HRTFs were calculated for the 82 test locations using estimates derived from the interpolation data sets with a varying number of locations (subsets with 393, 250, 150, 125, 90, 70, 60, 50, 30, 20; subsets covered a range of spatial resolutions varying from about 10° to 45° between neighbors). The root-mean-square error of the magnitude components of the HRTFs was calculated for each test location (Figure 4). Error increased as the number of positions contributing to the spline functions decreased from 393 to around 150 positions and then increased markedly for the smaller sets (Figure 4a,b). The STPS was significantly better than the nearest neighbor interpolation for a given number of locations. A plot of the errors (Figure 4c) demonstrated that the distribution of errors was not uniform throughout space and varied by up to 10 dB. This suggests that for sparse data sets the spline fails to accurately model some areas of space where, presumably, the spatially dependent rate of change of the HRTF is relatively higher or the spectral shape changes relatively discontinuously.
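As an illustration of the error measure, the sketch below computes a root-mean-square magnitude error in decibels across the frequency bins of one test location. Whether the errors were averaged over log-magnitude differences in exactly this way is an assumption; the function name is hypothetical.

```python
import numpy as np

def rms_db_error(measured, interpolated):
    """RMS difference in dB between two magnitude spectra of equal length.

    measured, interpolated: arrays of (possibly complex) HRTF values per
    frequency bin for one location; only their magnitudes are compared.
    """
    measured_db = 20.0 * np.log10(np.abs(measured))
    interpolated_db = 20.0 * np.log10(np.abs(interpolated))
    return float(np.sqrt(np.mean((measured_db - interpolated_db) ** 2)))
```

A uniform factor-of-two error in amplitude, for example, corresponds to an RMS error of about 6 dB under this measure.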

The localization performance of 5 human subjects was measured for noise stimuli presented in VAS rendered using four interpolation sets (250, 150, 50, 20 positions; 10°, 15°, 30°, 45° resolution). For 150 locations, performance was identical to that using HRTFs measured at the test locations. Although performance was significantly degraded using the sparse sets of 50 and 20 locations, substantial localization capacity was still evident in VAS despite the relatively high levels of acoustic errors in the HRTF estimates (Figure 5). These data indicate that a spherical thin plate spline with as few as 150 recorded filter functions spaced approximately 15 degrees apart was sufficient to achieve localisation performance equivalent to that in the free field.

FIGURE 4 - The change in the magnitude of the interpolation errors as a function of the number of measured HRTFs contributing to the spherical spline. (a) the STPS; (b) a nearest neighbor interpolation. (c) The distribution of RMS dB error across space for the left ear using STPS for 50 HRTFs

FIGURE 5 - (a) The spherical correlation coefficient is plotted as function of the number of HRTF recording locations contributing to the interpolation model. (b) The % of cone of confusion errors for each subject plotted as a function of interpolation set size


Current EVA suits lack visual aids to display auxiliary and augmented information that may assist operations carried out by the astronaut explorer. Our example visual system consists of a transparent-mode head mounted display (HMD) that fits over the eyes and may be connected to the output of a Central Processing Unit (CPU) to display processed information. Signals from sensors, such as cameras and temperature gauges, can be fed to the CPU in raw, unprocessed form. A transparent HMD (Sony Glasstron PLM S700E) uses a half-silvered optical mirror to combine a computer output LCD display with the light from the real visual scene in front of the observer. In this manner, the physical world is seen along with a ghosted display of computer generated images.

On extended EVAs and in particular explorations of the Martian surface the explorer must have adequate access to up-to-date information to work effectively with an acceptable level of safety. An HMD can provide instant access to required information such as life support parameters, readouts from medical sensors such as blood oxygen level, ECG, body temperature and fatigue [11], [17]. Other parameters include surrounding temperature, radiation levels, and atmospheric composition. It can also show a geographical map of the current location with the relative positions of other crew members. A block diagram of the dataflow between various components is shown in Figure 7. For the CPU we use a notebook computer.

One or more externally mounted cameras capture information in the normal visible electromagnetic spectrum and/or in other non-visible regions such as the infrared band. Each video frame can then be processed to enhance contrast, adjust brightness, perform white balancing and reduce glare. The processed results can then be displayed using the HMD. Using two cameras it is possible to have stereoscopic vision and short-range distance calculation. Other types of image processing can be performed including digital zooming, edge enhancement and object recognition. The latter is the most difficult and is still the subject of on-going research.
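The short-range distance calculation from two cameras can be illustrated with the standard pinhole stereo relation, depth = focal length × baseline / disparity. The sketch assumes two identical, parallel, rectified cameras; the function and parameter names are hypothetical and the numbers in the usage note are illustrative only.

```python
def stereo_depth_m(focal_length_px, baseline_m, disparity_px):
    """Distance to a point seen by two parallel, rectified cameras.

    focal_length_px: focal length expressed in pixels.
    baseline_m: separation between the two camera centres in metres.
    disparity_px: horizontal shift of the point between the two images in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px
```

For instance, with a focal length of 800 pixels and cameras 0.1 m apart, a disparity of 16 pixels corresponds to a distance of 5 m. Because disparity shrinks as distance grows, the estimate becomes coarse for far objects, which is consistent with describing the method as short-range.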

FIGURE 7 - Block diagram of the components of a helmet system with assisted vision.

The aim of object recognition is to have a computer automatically classify an object based on pre-installed expert knowledge and/or training samples. It can be used to automatically identify artificial landmarks, such as flags and beacons, and natural land formations. Landmarks can be used to locate the helmet’s current position on a geographical map. Another use is to assist in identifying rocks or other geographical features of interest. Expert knowledge from a geologist coupled with a large array of images of known geographical examples can be used to train an operator to collect data in the field even when a trained geologist is not in the vicinity. The computer can select images and videos of objects of interest and these can be transmitted to a trained geologist for verification.

We have developed object recognition algorithms that can recognise human faces and extract car number plates as shown in Figures 8 and 9. Figure 8 shows a snapshot of the face recognition system running in real-time. The eyes were located by the algorithm and indicated by crosses. The face was then extracted and recognised. Figure 9 shows the letters and numbers of a car number plate extracted and cross verified with a database of number plates. Although face recognition and car number plate recognition are different applications they have some common core algorithms. Our intention is to apply these same core algorithms to artificial target and geographical formation recognition. One of the methods we intend to use is based on fractal image coding [38], [39].

Fractals are mathematical sets that exhibit self-similarity in all scales of magnification. It is well known that many images of naturally occurring objects, such as leaves, ferns, clouds and coastlines, possess some self-similar properties. Figure 10 shows an artificial fern created using just 4 small equations constituting the fractal code. Any given image can be approximated by a set of equations in a similar manner. Existing images of known geographical properties can be transformed into their fractal code representations and together with expert knowledge can be used to classify geographical formations on the Martian surface.
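The idea that four small equations can encode a fern can be illustrated with an iterated function system. The sketch below uses the widely published Barnsley fern coefficients, which are an assumption here since the text does not list the equations behind Figure 10. Each map is an affine transform applied at random with a fixed probability.

```python
import random

# Each row: (a, b, c, d, e, f, probability) for the affine map
#   x' = a*x + b*y + e,   y' = c*x + d*y + f
FERN_MAPS = [
    (0.00, 0.00, 0.00, 0.16, 0.00, 0.00, 0.01),   # stem
    (0.85, 0.04, -0.04, 0.85, 0.00, 1.60, 0.85),  # successively smaller leaflets
    (0.20, -0.26, 0.23, 0.22, 0.00, 1.60, 0.07),  # largest left-hand leaflet
    (-0.15, 0.28, 0.26, 0.24, 0.00, 0.44, 0.07),  # largest right-hand leaflet
]

def fern_points(n, seed=0):
    """Generate n points of the fern by random iteration of the four maps."""
    rng = random.Random(seed)
    probabilities = [m[6] for m in FERN_MAPS]
    x, y = 0.0, 0.0
    points = []
    for _ in range(n):
        a, b, c, d, e, f, _prob = rng.choices(FERN_MAPS, weights=probabilities)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        points.append((x, y))
    return points
```

Plotting a few tens of thousands of these points as dots produces the familiar fern shape. Fractal image coding works in the opposite direction, searching for a set of such maps that approximates a given image.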

FIGURE 8 - Face Recognition System

FIGURE 9 - Car Number Plate Recognition System

FIGURE 10 - Artificial fern


1. Azuma, R., "Survey of augmented reality," Presence: Teleoperators and Virtual Environments, 6(4), 1997.

2. Bajura, M., Fuchs, H., Ohbuchi, R., "Merging virtual reality with the real world: seeing ultrasound imagery within the patient," Proceedings of SIGGRAPH’92, In Computer Graphics 26(2), pp. 203-210, 1992.

3. Best, V., Jin, C., Carlile, S., "Spectral smearing and human sound localisation," In Proceedings of the Australian Neuroscience Society, 13, 2001.

4. Blauert, J., "Spatial hearing: The psychophysics of human sound localization," Revised Ed., Cambridge, Mass.: The MIT Press, 1997.

5. Carlile, S., "Virtual auditory space: Generation and applications," Chapman and Hall (New York), 1996.

6. Carlile, S., Leong, P., Hyams, S., "The nature and distribution of errors in the localization of sounds by humans," Hearing Research, 114, pp. 179-196, 1997.

7. Chen, J., Van Veen, B. D., Hecox, K., "A spatial feature extraction and regularization model for the head-related transfer function," J. Acoust. Soc. Am., 97, pp. 439-452, 1995.

8. Choi, I., "Pseudolite research at UNSW," 1st Hong Kong Symposium on Satellite Positioning System Applications, Hong Kong, December, 1999.

9. Cohen, M., "Augmented audio reality: design for a spatial audio GPS PGS," In Proceedings of Center on Disabilities Virtual Reality Conference (1994).

10. Dai, L., Rizos, C., Wang, J., "The role of pseudo-satellite signals in precise GPS-based positioning," Journal of Geospatial HK Inst. of Engineering Surveyors, 3(1), pp. 33-44, 2001.

11. Ditlea, S. "Augmented Reality," Popular Science, pp. 36-43, February, 2002.

12. Drascic, D., Grodski, J., Milgram, P., Ruffo, K., Wong, P., and Zhai, S., "ARGOS: A display system for augmented reality," Video Proceedings of INTERCHI’93: Human Factors in Computing Systems, (Amsterdam, the Netherlands), 24-29 April 1993.

13. Evans, M., Tew, A., Angus, J., "Spatial audio teleconferencing – which way is better?" In Proceedings of the Fourth International Conference on Auditory Display (Palo Alto, CA), November 1997.

14. Feiner, S., MacIntyre, B., and Seligmann, D., "Knowledge-based augmented reality," Communications of the ACM, 36(7), pp. 52-62, 1993.

15. Feiner, S., Webster, A., Kreuger III, T., MacIntyre, B., Keller, E., "Architectural anatomy," Presence: Teleoperators and Virtual Environments 4(3), pp. 318-325, 1995.

16. Foster, S., Wenzel, E., "Virtual acoustic environments: the convolvotron," Computer Graphics, 25(4), p. 386, 1991, [Demonstrations system at the 18th ACM Conference on Computer Graphics and Interactive Techniques].

17. Fullerton, R., "EVA Considerations," Human Exploration of Mars Workshop NASA, Dec.13, 2000.

18. Golay, M.J.E., "Complementary Series," IRE Transactions on Information Theory, 7, pp. 82-87, 1961.

19. Jenison, R.L., Fissell, K., "A spherical basis function neural network for modeling auditory space," Neural Computation, 8, pp. 115-128, 1996.

20. Jin, C., Leong, P., Leung, J., Corderoy A., Carlile, S., "Enabling individualized virtual auditory space using morphological measurements," Proceedings of the First IEEE Pacific-Rim Conference on Multimedia (2000 International Symposium on Multimedia Information Processing), pp. 235-238, 2000.

21. Jin, C., Corderoy, A., Carlile, S., van Schaik, A., "Spectral cues in human sound localization," Advances in Neural Information Processing Systems 13, edited by Solla, S., Leen, T., Muller, K., (MIT Press) pp. 768-774, 2000.

22. Jin, C., Best, V., Carlile, S., "Localisation of broadband versus low-pass speech stimuli," In Proceedings of the Australian Neuroscience Society, 13, 2001.

23. Jin, C. "Spectral analysis and resolving spatial ambiguities in human sound localization," Ph.D. thesis, The University of Sydney, 2001.

24. Jin, C., Best, V., Carlile, S., Baer, T., and Moore, B., "Speech Localisation," In Proceedings of the 112th Convention of the Audio Engineering Society, (May, 2002).

25. Kistler, D., Wightman, F. L., "A model of head-related transfer functions based on principle components analysis and minimum-phase reconstruction," Journal of the Acoustical Soc. of America, 3, pp. 1637-1647, 1992.

26. Koizumi, N., Cohen, M., and Aoki, S., "Design of virtual conferencing in audio telecommunication," In Proceedings of 92nd Audio Engineering Society Convention, Wien, Austria, preprint 3304, 1992.

27. Kulkarni, A., Colburn, H. S., "Role of spectral detail in sound-source localization," Nature, 396, pp. 747-749, 1998.

28. Langendijk, E.H.A., Bronkhorst, A.W., "Fidelity of three-dimensional sound reproduction using a virtual auditory display," J. Acoust. Soc. Am., 107, pp. 528-537, 2000.

29. Maes, P., "Artificial life meets entertainment: Lifelike autonomous agents," Communications of the ACM, 38, pp. 108-114, 1995.

30. Martens, W.L., "Principle components analysis and resynthesis of spectral cues to perceived direction," ICMC, pp. 274-281, 1987.

31. Mehrgardt, S. and Mellert, V., "Transformation characteristics of the external human ear," J. Acoustical Society of America, 61(6), pp. 1567-1576, 1977.

32. Moller, H., Sorensen, M. F., Hammershoi, D., "Head-related transfer functions of human subjects," Journal of the Audio Eng. Soc., 43, pp. 300-321, 1995.

33. Oppenheim, A.V., Schafer, R.W., "Digital Signal Processing," Prentice-Hall (New York), 1975.

34. Piekarski, W., Thomas, B., Hepworth, D., Gunther, B., and Demczuk, V., "An architecture for outdoor wearable computers to support augmented reality and multimedia applications," In Proceedings of the Third International Conference on Knowledge-Based Intelligent Information Engineering Systems, 1999.

35. Piekarski, W., Thomas, B., "Augmented reality with wearable computers running linux," In Proceedings of Linux.Conf.Au 2001 Sydney, January 16-21, 2001.

36. Sawhney, N., Schmandt, C., "Design of spatialized audio in nomadic environments," In Proceedings of the Fourth International Conference on Auditory Display (Palo Alto, CA), November 1997.

37. Stone, J., LeMaster, E., Powell, J., Rock, S., "GPS pseudolite transceivers and their applications," Proceedings of the Institute of Navigation GPS-99, San Diego, CA, January 1999.

38. Tan, T., Yan, H., "Object recognition based on fractal neighbour distance," Signal Processing, 81, pp. 2105-2129, 2001.

39. Tan, T., Yan H., "The fractal neighbour distance measure," Pattern Recognition, 35, pp. 1371-1387, 2002.

40. Thomas, B.H., Demczuk, V., Piekarski, W., Hepworth, D., and Gunther, D., "A wearable computer system with augmented reality to support terrestrial navigation," In 2nd Int’l Symposium on Wearable Computers, pp. 168-172, Pittsburgh, PA, Oct. 1998.

41. van Schaik, A., Jin, C., and Carlile, S., "Human localisation of band passed filtered noise," International Journal of Neural Systems, Vol. 9(5), October 1999, pp. 441-446.

42. Wenzel, E.M., Foster, S.H., "Perceptual consequences of interpolating head-related transfer functions during spatial synthesis," Proc. of the ASSP (IEEE) 1993 Workshop on Applications of Signal Processing to Audio and Acoustics, (IEEE, New York), 1993.

43. Wenzel, E., "Research in virtual acoustic displays at NASA," In Proceedings of SimTecT 96, The Simulation Technology and Training Conference, pp. 85-90, March 1996.

44. Wahba, G., "Spline interpolation and smoothing on the sphere," SIAM J. Sci. Statist. Comp., 2, pp. 5-16, 1981.