Information Retrieval and Visualization

Searching for information in large collections of data, such as the Internet or more specific information repositories, has become an everyday part of life. The Information Retrieval and Visualisation group investigates how to design search interfaces that allow people to search for information in ways that are in line with their natural ideas about information and searching, and how to visually present that information in such a way that people can understand it easily and intuitively.

An important aspect of Information Retrieval is understanding the semantic distinctions that language conveys, and determining the similarity of documents, words and phrases in terms of meaning, not just the words used. This connects our work on learning syntax (Leibbrandt and Powers, 2010-12) to our work on learning semantic relationships and understanding word similarity (Yang and Powers, 2005-10). For more information on this linguistic aspect of Information Retrieval, see our work on Language and Learning Technology.

Current Projects

Human-Oriented Data Visualisation

This project examines ways to display complex, multi-dimensional data on a two-dimensional computer display in such a way that information users can rapidly understand the data. Our recent experiments have investigated how easily people are able to search for specific features of visual icons such as shape, colour and animation, and the ways in which these dimensions map intuitively onto semantic dimensions such as time, relevance and complexity.

A Human Factors approach to evaluation and testing is used, in which we construct special-purpose, minimally functional interfaces designed to let us test the role of a specific variable. Having many different features in an interface, whether innovative or ubiquitous, can confound the effect of the individual features we want to explore by varying them in a controlled way, so we deliberately keep the interfaces very simple. This simplicity is also important for performing a robust analysis of the significance of the relationships we find between interface attributes, task performance and other human factors.

We have also explored the role of redundant and alternative forms of presentation, shown sequentially or simultaneously; the utility of popups and the impact of transparency; the naturalness or intuitiveness of the assignment of screen attributes to data attributes; and the utility of graphical representations versus lists, expanding trees, and concept maps or clouds involving single words or complex terms.

Some of the results are expected and some unexpected, and all give pause for thought as we re-evaluate the complexity of cues that we take for granted in the real world. For example, a single flashing item or a related group of flashing items has high salience, but we are very bad at distinguishing items or groups based on different ways or rates of flashing. As another example, transparency is becoming common in user interfaces, but our results so far show only negative effects.

Automatic vs Human Indexing, Categorization and Annotation of Texts

This project investigates how humans differ from automated systems in their ideas on how text documents are related to each other, and aims to bring the automated schemes for classifying, summarizing, annotating and retrieving texts closer in line with human intuition. Results from this project have already shown that people use different keywords to describe (to other people) what a document is about, compared to the terms they would use to search for the document with a search engine, and that the words typically used by search engines to distinguish documents from each other have little relation to the words that people think are relevant to the meaning of the text. 

Other interesting observations from the human factors analysis of subject surveys included significant differences between the performance of novice users (junior undergraduates) and experienced users (postgraduates and academics). This makes clear that we learn and adapt our searching technique to the technologies we are using. In addition, increased experience of conventional search makes it more difficult to take full advantage of new visual search paradigms, which complicates the endeavour of improving these interfaces. We also found that small changes in the precise equations used for clustering, dimension reduction, or standardizing words and documents (e.g. TF-IDF) could have a high impact on search effectiveness, and showed strong relationships to the kind of user.
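To illustrate how a small change in a term-weighting equation can alter which document is ranked first, the sketch below scores a toy collection with two common TF-IDF variants. It is illustrative only and does not reproduce the equations or data used in the experiments; the function names, the toy documents and the query are all hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs, query, log_tf=False, smooth_idf=False):
    """Score each tokenised document against the query terms.

    log_tf=False     -> raw term frequency
    log_tf=True      -> 1 + ln(tf), which damps repeated terms
    smooth_idf=False -> ln(N / df)
    smooth_idf=True  -> ln((1 + N) / (1 + df)) + 1, which never reaches zero
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # term absent from this document
            tf_w = 1.0 + math.log(tf[term]) if log_tf else float(tf[term])
            if smooth_idf:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            else:
                idf = math.log(n_docs / df[term])
            score += tf_w * idf
        scores.append(score)
    return scores


def ranking(scores):
    """Document indices ordered from highest to lowest score (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Hypothetical toy collection: document 0 repeats "search" many times.
docs = [
    ["search"] * 8 + ["engine"],
    ["search", "visual", "icon"],
    ["search", "colour", "shape"],
]
query = ["search", "visual"]

print(ranking(tfidf_scores(docs, query)))                                # [1, 0, 2]
print(ranking(tfidf_scores(docs, query, log_tf=True, smooth_idf=True)))  # [0, 1, 2]
```

With the unsmoothed weighting, "search" occurs in every document and so contributes nothing, leaving document 1 on top; with the smoothed, log-scaled weighting the same term still counts, and document 0 moves to first place.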

For More Information...

For more information on these projects, or if you're interested in joining the group, please contact Dr Richard Leibbrandt.

Spin-outs

  • thereitis™ – 3D visualisation technology that leverages the human ability to sift quickly through large sets of related images and find objects of interest. Visit thereitis.com for a demo, and to find out more.  
  • YourAmigo™ – The world leader in organic search optimization and deep web search. In the words of Yahoo: “Nobody searches the deep web like YourAmigo”. Winner of numerous grants and industry awards, with a worldwide presence and a clientele of Fortune 500 companies.


  

Feature

Information Visualization in the Real World

 

Information Retrieval and Visualization focuses on the problem of using visualization techniques to improve, and eventually to supplement or replace, the page of hits and snippets that we are used to for web search or library search. But Information Visualization is much richer than this and is ideally used in every research discipline. Graphs of results, whether histograms, line graphs or pie charts, are all instances of information visualization.

Augmented Reality is a particular twist on Information Visualization in which the information is integrated into our view of the real world, through special goggles worn by the user or head-up displays that form part of vehicle instrumentation. This information is often retrieved based on a combination of location (GPS) and recognition (for the most part, video or image processing).

Just as people moving around the real world benefit from head-up information integrated into their view when and where they need it, human operators controlling a fleet of autonomous vehicles or semi-autonomous robots benefit from a display that combines static local information with status information, sensor readings or images from the individual vehicles, and dynamic local information about changing conditions, whether weather or traffic conditions, the movements of robots or personnel, or the activities of enemy or terrorist forces.

We also need to monitor communication reliability, strength and bandwidth, which are likewise displayed graphically. The robots may be in the air, on the water or underwater, on the ground or underground.

One special kind of visualization is the Avatar or Talking Head, which is itself a major focus of the Centre for Knowledge and Interaction Technology.

But really, every area of Knowledge and Interaction Technology benefits from and contributes to the development of Information Visualizations.

 

Publications

Book chapters

Anderson, T.A., Chen, Z., Wen, Y., Milne, M.K., Atyabi, A., Treharne, K., Matsumoto, T., Jia, X., Luerssen, M.H., Lewis, T.W., et al., 2012. Thinking Head MulSeMedia: A Storytelling Environment for Embodied Language Learning. In Multiple Sensorial Media Advances and Applications: New Developments in MulSeMedia. Hershey, Pennsylvania: IGI Global, pp. 182-203.

Refereed journal articles

Powers, D.M.W., 2011. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation. Journal of Machine Learning Technologies, 2(1), 37-63. [online] http://www.bioinfo.in/uploadfiles/13031311552_1_1_JMLT.pdf

Pfitzner, D.M., Leibbrandt, R.E., & Powers, D.M.W., 2009. Characterization and evaluation of similarity measures for pairs of clusterings. Knowledge and Information Systems, 19(3), 361-394.

Pfitzner, D.M., Treharne, K., & Powers, D.M.W., 2008. User Keyword Preference: the Nwords and Rwords Experiments. International Journal of Internet Protocol Technology, 3(3), 149-158.

Refereed conference papers

Powers, D.M.W., Luerssen, M.H., Lewis, T.W., Leibbrandt, R.E., Milne, M.K., Pashalis, J., & Treharne, K., 2010. MANA for the Ageing. Proceedings of the 2010 Workshop on Companionable Dialogue Systems, ACL 2010, 7-12.

Treharne, K. & Powers, D.M.W., 2009. Search Engine Result Visualisation: challenges and opportunities. Proceedings of international symposium on web visualization, 633-638.

Treharne, K., Pfitzner, D.M., Leibbrandt, R.E., & Powers, D.M.W., 2008. A lean online approach to human factors research. Proceedings of the 1st International Conference on PErvasive Technologies Related to Assistive Environments (PETRA 2008), (57).

Leibbrandt, R.E., Luerssen, M.H., Matsumoto, T., Treharne, K., Lewis, T.W., Santi, M.L., & Powers, D.M.W., 2008. An immersive game-like teaching environment with simulated teacher and hybrid world. Animation, multimedia, IPTV and edutainment: proceedings of CGAT '08, 215-222.

Powers, D.M.W., 2012a. The Problem of Kappa. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), 345-355. [online] http://aclweb.org/anthology-new/E/E12/
 
Powers, D.M.W., 2012b. The Problem of Area Under the Curve. Proceedings of the 2012 IEEE International Conference on Information Science and Technology (ICIST 2012), 567-573. [online] http://dx.doi.org/10.1109/ICIST.2012.6221710
 

Pfitzner, D.M., Leibbrandt, R.E., & Powers, D.M.W., 2009. Characterization and evaluation of similarity measures for pairs of clusterings. Knowledge and Information Systems, 19(3), 361-394. [online] http://dx.doi.org/10.1007/s10115-008-0150-6

Yang, D., & Powers, D.M.W., 2006a. Word sense disambiguation using lexical cohesion in the context. In R. Dale & C. Paris, eds. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, 929-936. [online] https://www.aclweb.org/anthology-new/P/P06/P06-2119.pdf
 
Yang, D., & Powers, D.M.W., 2006b. Distributional similarity in the varied order of syntactic spaces. Proceedings of the First International Conference on Innovative Computing, Information and Control (ICICIC'06), 406-409. [online] http://doi.ieeecomputersociety.org/10.1109/ICICIC.2006.439
 
Yang, D., & Powers, D.M.W., 2006c. Verb similarity on the taxonomy of WordNet. In P. Sojka, K.-S. Choi, C. Fellbaum, & P. Vossen, eds. Proceedings of the Third International WordNet Conference (GWC 2006), 121-128. [online] http://david.wardpowers.info/Research/AI/papers/200601-GWC-VerbSimWN.pdf [data] http://david.wardpowers.info/Research/AI/papers/200601-GWC-130verbpairs.txt
 

Powers, D.M.W., 2003. Recall and precision versus the bookmaker. Proceedings of the Joint International Conference on Cognitive Science, 529-534. [online] http://www.infoeng.flinders.edu.au/papers/20030007.doc or https://dl.dropboxusercontent.com/u/27743223/200302-ICCS-Bookmaker-2up+poster.pdf

Conference publications

Atyabi, A., Anderson, T.A., Treharne, K., & Powers, D.M.W., 2011. Magician Simulator. Eleventh International Conference on Control, Automation, Robotics and Vision (ICARCV 2010).

Luerssen, M.H., Leibbrandt, R.E., Lewis, T.W., Pashalis, J., Treharne, K., Pfitzner, D.M., & Powers, D.M.W., 2009. MANA - An Embodied Calendar for the Aged. Thinking Systems Joint Symposium, 127-127.

Pfitzner, D.M., Hobbs, V.M., & Powers, D.M., 2003. A unified taxonomic framework for information visualization. Proceedings of the 2nd Australian Symposium on Information Visualization, 24, 57-66.