Video and Image Processing

Darwin’s Finch (© Dan Goldwater)

The Video and Image Processing group considers all aspects of video and image processing, including the development of, and experimentation with, new or novel sensors and alternative imaging technologies such as temperature and depth sensing devices.

This group effectively spans all of the School's major centres and labs: MDRI and FMSL include a focus on approaches with a biomedical flavour that depend on mathematical analysis, while KIT focuses on brain imaging and multimodal communication.

Current projects conducted by the Video and Image Processing (VIP) research group include:

  • Colour Filter Array Demosaicking

Most digital cameras on the market use a single image sensor with a Bayer colour filter array to capture colour images. Colour filter array demosaicking refers to the determination of missing colour values at each pixel to produce a full colour image. Our VIP research group has been active in this research area, and several algorithms have been developed and published in a number of international papers. The results have been shown to outperform many existing techniques.
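To make the idea concrete, here is a minimal sketch of the simplest demosaicking scheme, bilinear interpolation over an RGGB Bayer tiling. This is only an illustration of the problem setup, not the group's published algorithms (which use higher-order interpolation and edge preservation); the function names are our own.

```python
# Minimal bilinear demosaicking sketch for an RGGB Bayer mosaic.
# The mosaic is a 2-D list of raw sensor values; channel membership follows
# the RGGB tiling: R at (even, even), G at (even, odd) and (odd, even),
# B at (odd, odd).

def bayer_channel(y, x):
    """Return which colour ('R', 'G', 'B') the Bayer filter samples at (y, x)."""
    if y % 2 == 0:
        return 'R' if x % 2 == 0 else 'G'
    return 'G' if x % 2 == 0 else 'B'

def demosaic_bilinear(mosaic):
    """Estimate the missing colour values at each pixel by averaging the
    nearest neighbours (3x3 window) that actually sampled that colour."""
    h, w = len(mosaic), len(mosaic[0])

    def avg_of(colour, y, x):
        vals = []
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and bayer_channel(ny, nx) == colour:
                    vals.append(mosaic[ny][nx])
        return sum(vals) / len(vals)   # any 2x2 RGGB tile holds all 3 colours

    return [[tuple(avg_of(c, y, x) for c in ('R', 'G', 'B'))
             for x in range(w)] for y in range(h)]
```

Plain averaging like this blurs colour edges, which is precisely the artifact the group's weighted-median and spline-based methods are designed to avoid.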

  • Image Denoising

Because the physical size of an image sensor is limited, the area of each pixel shrinks as resolution increases. As a result, the signal-to-noise ratio of the captured images decreases as the resolution increases. Image denoising within CFA demosaicking therefore becomes a crucial factor in the production of high-quality, high-resolution images. Our research group has been developing multi-dimensional image filtering techniques for denoising an image without blurring it.
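A classic example of filtering that removes impulse noise without the blurring a mean filter causes is the order-statistics (median) filter. The sketch below is only the textbook 3x3 version, to illustrate the principle; the group's multi-dimensional filters are more sophisticated.

```python
# Simple 3x3 median filtering sketch: each interior pixel is replaced by
# the median of its 3x3 neighbourhood, which suppresses isolated impulse
# noise while preserving step edges better than averaging would.

def median_filter_3x3(img):
    """img: 2-D list of intensities; returns a filtered copy (borders kept)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]   # middle of the 9 sorted values
    return out
```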

Another problem with high resolution image sensors is that, with millions of pixels in a sensor, it is highly probable that a few will be defective due to errors in the fabrication process. While these bad pixels would normally be mapped out during manufacturing, further defective pixels, known as hot pixels, can appear over time with camera usage. Since some hot pixels still function at normal settings, they need not be permanently mapped out: they only appear at long exposures and/or high ISO settings. A number of innovative techniques have been developed to tackle these problems.
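One way to handle such pixels on the fly is detect-then-correct: flag a pixel only when it deviates strongly from the median of its neighbours, and replace it with that median. The sketch below illustrates this idea on a single-channel image; the threshold and neighbourhood are illustrative placeholders, not the group's published algorithm (which operates on same-colour neighbours within the CFA).

```python
# Hedged sketch of hot-pixel correction: a pixel is flagged when it
# deviates from the median of its 8 neighbours by more than a threshold,
# and is then replaced by that median. Normal pixels pass through untouched.

def correct_hot_pixels(img, threshold=50):
    """img: 2-D list of intensities; returns a corrected copy."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neighbours = sorted(img[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                                if (dy, dx) != (0, 0))
            med = (neighbours[3] + neighbours[4]) / 2   # median of 8 values
            if abs(img[y][x] - med) > threshold:
                out[y][x] = med       # only outliers are replaced
    return out
```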

  • Super-Resolution within Colour Filter Array Demosaicking for underwater colour image sensors

Images captured underwater have a poor signal-to-noise ratio due to poor lighting conditions. This adversely affects the CFA demosaicking process that follows and results in colour artifacts. Moreover, underwater images have a blue/green cast, so underwater colour correction is required for more natural-looking colour images. Another important capability to be developed for underwater applications is super-resolution image reconstruction for detailed fault detection, such as improved visualization of cracks in underwater pipeline inspection. Our proposed approach is a unified method combining underwater colour correction, denoising, and super-resolution image reconstruction within CFA demosaicking. This unified single-stage approach will significantly improve the efficiency and performance of colour image sensors for underwater applications.
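The blue/green cast mentioned above can be reduced by a standard baseline such as grey-world correction: assume the scene averages to grey and scale each channel so its mean matches the overall mean. This is only a common reference technique, not the group's unified in-demosaicking method.

```python
# Grey-world colour correction sketch: per-channel gains bring each
# channel's mean up (or down) to the global mean, neutralising a uniform
# colour cast such as the blue/green tint of underwater footage.

def grey_world(pixels):
    """pixels: list of (r, g, b) tuples; returns colour-corrected copies."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    grey = sum(means) / 3
    gains = [grey / m if m else 1.0 for m in means]   # guard against a zero channel
    return [tuple(p[c] * gains[c] for c in range(3)) for p in pixels]
```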

  • A High Dynamic Range Imaging video capture device using standard image sensors

Through High Dynamic Range Imaging (HDRI, e.g. 32 bits per colour per pixel), the full range of tones undersea can be captured in order to reproduce human perception down to the finest detail. This will provide a new dimension of image detail which has not been seen undersea before. Moreover, the extra detail will enable higher accuracy in feature detection and other identification techniques for undersea applications, e.g. undersea pipeline inspection. As true HDR image sensors are still under development, it would be an important advance in undersea imaging if an HDRI video capture device could be realized using standard off-the-shelf image sensors. Our proposed approach uses high-speed image sensors so that images captured at various apertures and exposures can be combined at high speed to reconstruct high dynamic range video footage.
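The core of combining multiple exposures can be sketched as follows: each low-dynamic-range frame is divided by its exposure time to give a radiance estimate, and the estimates are blended with a hat-shaped weight that trusts well-exposed mid-range values most (in the spirit of Debevec-style HDR recovery, assuming a linear sensor response for simplicity).

```python
# Sketch of merging differently exposed frames into one radiance estimate.
# frames: list of equal-length pixel lists (values 0..255);
# times: corresponding exposure times. Assumes linear sensor response.

def merge_exposures(frames, times):
    def weight(z):
        # Hat weight: near-saturated and near-black samples count least.
        return min(z, 255 - z) + 1

    n = len(frames[0])
    merged = []
    for i in range(n):
        num = sum(weight(f[i]) * f[i] / t for f, t in zip(frames, times))
        den = sum(weight(f[i]) for f in frames)
        merged.append(num / den)   # weighted mean of per-frame radiance estimates
    return merged
```

With high-speed sensors, consecutive frames at different exposures are close enough in time that this merge can run per video frame.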

  • Non-invasive Lizard Identification

The aim of the project is to identify individual Pygmy Bluetongue Lizards from their digital images using image identification techniques. Each lizard has a unique scale pattern, like fingerprints, and the development of an image recognition system to identify over 500 individual lizards is underway. This is a project in collaboration with the School of Biological Sciences.

  • Darwin Finch beak Morphology and Photo Identification

The aim of this research is to determine the beak volume and curvature of the small, medium, and large tree finch (Darwin's finches from the Galapagos Islands) from digital images, and to test hypotheses from the data.

  • An Expert System for Living Creature Research

An expert system has been successfully developed for our other project to correctly identify lizards based on their individual features. Other capabilities under development include symmetry analysis of the lizard scale patterns and its relationship to genetic behaviour. Our proposed expert system could also be modified and enhanced to identify other species, such as the Malleefowl, for study and research within their habitat. It would be particularly useful for the study and tracking of smaller creatures which cannot be tagged or are unable to carry a GPS device.

  • Intraocular Pressure Monitoring System

Glaucoma is the second leading cause of blindness in the world and the leading cause of blindness amongst Africans. A major risk factor in glaucoma is elevated intraocular pressure (IOP). All current treatments, whether medical, laser or surgical, act primarily by lowering IOP. Thus, accurate monitoring of IOP is an essential clinical facet of glaucoma care. Another alternative is a glaucoma drainage implant (GDI), which provides an alternative pathway for the drainage of aqueous humor to reduce IOP. A common problem encountered during GDI implantation is hypotony, which has led to the incorporation of valves into GDIs. The aim of this project is to investigate the problems in current glaucoma treatment, namely the measurement and regulation of IOP.

  • Face and Facial Feature Tracking and Interpretation
A major focus of the KIT Artificial Intelligence and Language Technologies Program is the understanding and modelling of human faces, and their expressions and gestures. This includes interpreting emotions from a visual face, reading lips, tracking eyes to determine what they are looking at, comparing audio and video information to determine who is talking, and modelling speech, emotions and faces accurately with our talking heads. For this work we use combinations of colour information, depth information, and conventional image processing techniques in multiple colour spaces, in particular using Red Inclusion to find the blood-hue dominated skin, and Red Exclusion to find the shadow-blue dominated features.
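
The Red Inclusion / Red Exclusion idea can be sketched as a pair of per-pixel tests: skin is accepted where red dominates, while shadowed features are sought where it does not. The thresholds below are arbitrary placeholders for illustration only, not the published parameters.

```python
# Illustrative per-pixel classifiers in RGB space. Skin tends to be
# blood-hue (red) dominated; shadowed facial features (eyes, lip outline)
# tend to be found where red does not dominate. Margins are placeholders.

def red_inclusion(pixel, margin=20):
    """True where blood-hued skin is likely: R clearly exceeds G and B."""
    r, g, b = pixel
    return r > g + margin and r > b + margin

def red_exclusion(pixel, margin=10):
    """True for shadow/blue-dominated candidate feature pixels."""
    r, g, b = pixel
    return b + margin > r
```

In practice such masks would be cleaned up morphologically and combined with depth and motion cues, as described above.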
  • Body, Body Part and Object Tracking and Interpretation
For our work in intelligent robotics, vehicles and prosthetics (e.g. wheelchairs and ride-on toys), and the KIT work in grounded language and teaching heads (that is, learning or teaching what words actually refer to in the real world), we need to be able to recognize objects and their functions. For this work we tend to use simple laboratory/building scenes and simplistic worlds with children's blocks and other toys as props, whilst much of our concentration is on understanding the human body, its body language, and use of gesture. We use combinations of colour and depth information, derive information about lighting, shadows and horizons, and use and fuse conventional image processing techniques in multiple colour spaces. We also make use of the Kinect for depth information.
  • Point of Interest Tracking
A more general task still is to automatically determine and track points of interest in an unsupervised way, rather than requiring specific objects or features to be pointed out and labelled for supervised learning. This was originally developed as an alternative to Active Appearance Models (AAM) for body, face and feature tracking, but is also being explored for speeding up optical flow and stereopsis calculations.
  • Estimating Camera Pose Using Vanishing Points

Real-time video processing is an essential aspect of augmented reality, as the virtual world or visualization needs to be overlaid accurately over the real world video. This requires accurate determination of camera pose, for which we use perspective information in the form of vanishing points, as explained in detail for the augmented reality project.
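As a simplified sketch of the geometry: two parallel scene lines project to image lines whose intersection is a vanishing point, and under a pinhole model with focal length f (in pixels) and principal point (cx, cy), the camera's pan angle toward that direction is atan((vx - cx) / f). This is only the one-angle case for illustration; full pose recovery uses multiple vanishing points.

```python
import math

# Vanishing-point sketch: intersect two image lines given in implicit form
# a*x + b*y + c = 0, then convert the x-coordinate of the vanishing point
# to a pan angle under a pinhole camera model.

def intersect(l1, l2):
    """Intersection of lines (a1,b1,c1) and (a2,b2,c2); assumes non-parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1           # zero only for parallel lines
    return ((b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det)

def pan_from_vanishing_point(vx, cx, f):
    """Pan angle (radians) of the direction whose vanishing point is at vx."""
    return math.atan((vx - cx) / f)
```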

Further information

We are looking for collaborators and postgraduate research students to join our research group.  We would also be happy to provide more information about the School's research programs, the opportunities for higher degree study and scholarship information.  For more information, please contact the research group leader Dr Jimmy Li.



Refereed journal articles

Fitzgibbon, S.P., DeLosAngeles, D., Lewis, T.W., Powers, D.M., Grummett, T.S., Whitham, E.M., et al. (2016). Automatic determination of EMG-contaminated components and validation of independent component analysis using EEG during pharmacologic paralysis. Clinical Neurophysiology, 127(3) pp. 1781-1793.
[10.1016/j.clinph.2015.12.009] [Scopus]

Jia, X., Huang, H., Sun, Y., Yuan, J. and Powers, D.M. (2016). A novel edge detection approach using a fusion model. MULTIMEDIA TOOLS AND APPLICATIONS, 75(2) pp. 1099-1133.
[10.1007/s11042-014-2359-6] [Scopus]

Ali, H., Powers, D.M., Jia, X.B. and Zhang, Y. (2015). Extended Non-negative Matrix Factorization for Face and Facial Expression Recognition. International Journal of Machine Learning and Computing, 5(2) pp. 142-147.
[10.7763/IJMLC.2015.V5.498]

Jia, X.B., Zhang, Y., Powers, D.M. and Ali, H. (2014). Multi-classifier fusion based facial expression recognition approach. KSII Transactions on Internet and Information Systems, 8(1) pp. 196-212.
[10.3837/tiis.2014.01.012] [Scopus] [Web Link]

Jia, X., Du, H., Han, Y. and Powers, D. (2014). Analysis and Determination of Inner Lip texture Descriptors for Visual Speech Representation. Journal of Computers, 9(7) pp. 1628-1638 [10.4304/jcp.9.7.1628-1638]

Ali, H. and Powers, D.M. (2014). Fusion Based FastICA Method: Facial Expression Recognition. Journal of Image and Graphics, 2(1) pp. 1-7. [Web Link]

Jia, X.B., Liu, S. and Powers, D.M. (2013). Dynamic Feature Extraction for Facial Expression Recognition Based on Optical Flow. Information Technology Journal, 12(23) pp. 7305-7311. [10.3923/itj.2013.7305.7311] [Scopus]

Fitzgibbon, S.P., Lewis, T., Powers, D., Whitham, E., Willoughby, J. and Pope, K. (2013). Surface Laplacian of Central Scalp Electrical Signals is Insensitive to Muscle Contamination. IEEE Transactions On Biomedical Engineering, 60(1) pp. 4-9. [10.1109/TBME.2012.2195662] [Scopus]

Jia, X.B., Bao, X., Powers, D.M. and Yujian, L. (2013). Facial expression recognition based on block Gabor wavelet fusion feature. Journal of Convergence Information Technology, 8(5) pp. 282-289. [10.4156/jcit.vol8.issue5.33]

Jia, X.B., Du, H., Han, Y., Zhang, K. and Powers, D.M. (2013). Audio/Visual Speech based Pronunciation Automatic Evaluation Algorithm and Comparison Platform. International Journal of Intelligent Information Processing, 4(1) pp. 98-104. [Web Link]

Li, J.S. & Randhawa, S., 2009. Color filter array demosaicking using high-order interpolation techniques with a weighted median filter for sharp color edge preservation. IEEE Transactions on Image Processing, 18(9), 1946-1957.

Kakaday, T., Hewitt, A.W., Voelcker, N.H., Li, J.S., & Craig, J., 2009. Advances in telemetric continuous intraocular pressure assessment. British Journal of Ophthalmology, 93(8), 992-996.

Pope, K., Fitzgibbon, S.P., Lewis, T.W., Whitham, E.M., & Willoughby, J.O., 2009. Relation of gamma oscillations in scalp recordings to muscular activity. Brain Topography, 22(1), 13-17.

Ma, F., Bajger, M., Slavotinek, J.P., & Bottema, M.J., 2007. Two graph theory based methods for identifying the pectoral muscle in mammograms. Pattern Recognition, 40(9), 2592-2602.

Badiei, A., Bottema, M.J., & Fazzalari, N., 2006. Expected and Observed Changes to Descriptors of Trabecular Architecture with Aging - A Comparison of Measurement Techniques. Australasian Physical and Engineering Sciences in Medicine, 29(1), 48-52.

Lewis, T.W. & Powers, D.M., 2003. Audio-Visual Speech Recognition using Red Exclusion and Neural Networks. Journal of Research and Practice in Information Technology, 35(1), 41-64.

Refereed conference papers

Ali, H. and Powers, D.M. (2014). Multi-Feature Fusion based Non Negative Matrix Factorization: Facial Expression Recognition from Imaging Sensors. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis. ACM Digital Library, New York, USA: ACM. 2nd Workshop on Machine Learning for Sensory Data Analysis. Gold Coast, Australia. Dec 2014, pp. 25-32.

Ali, H. and Powers, D.M. (2013). Facial Expression Recognition Based on Weighted All Parts Accumulation and Optimal Expression-Specific Parts Accumulation. In P de Souza, U Engelke & A Rahman, ed. 2013 DICTA Proceedings. Piscataway, USA: IEEE. International Conference on Digital Image Computing: Techniques and Applications. Hobart, TAS. Nov 2013.
[10.1109/DICTA.2013.6691497] [Scopus]

Nematzadeh, N., Lewis, T.W. and Powers, D. (2015). Bioplausible Multiscale Filtering in Retinal to Cortical Processing as a Model of Computer Vision. In Proceedings of the International Conference on Agents and Artificial Intelligence (ICAART-2015) Portugal: SCITE PRESS. International Conference on Agents and Artificial Intelligence (ICAART-2015) Lisbon, Portugal. Jan 2015, pp. 305-316.
[10.5220/0005186203050316] [Scopus]

Sikos, L.F. and Powers, D.M. (2015). Knowledge-Driven Video Information Retrieval with LOD: From Semi-Structured to Structured Video Metadata. In Balog, K., Dalton, J., Doucet, A., Ibrahim, Y., ed. Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval. New York: ACM. 24th ACM International Conference on Information and Knowledge Management. Melbourne. Oct 2015.

Lang, S.R., Luerssen, M.H. and Powers, D.M. (2013). Automated evaluation of interest point detectors. In Proceedings of the 2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013. IEEE Xplore. 2013 IEEE/ACIS 12th International Conference on Computer and Information Science. Niigata. Jun 2013, pp. 443-447.

Lang, S.R., Luerssen, M.H. and Powers, D.M. (2013). Repeatability Measurements for 2D Interest Point Detectors on 3D Models. In Robert Burduk, Konrad Jackowski, Marek Kurzynski, Michal Wozniak and Andrzej Zolnierek, ed. Proceedings of CORES 2013. Switzerland: Springer International Publishing. 8th International Conference on Computer Recognition Systems CORES 2013. Milkow, Poland. May 2013, pp. 361-370.
[10.1007/978-3-319-00969-8_35] [Scopus]

Atyabi, A., Luerssen, M., Fitzgibbon, S. and Powers, D. (2012). Dimension Reduction in EEG Data using Particle Swarm Optimization. In A Abbass, D Essam & R Sarker, ed. Proceedings of the IEEE Congress on Evolutionary Computation. New York, USA: IEEE. IEEE CEC 2012. Brisbane, QLD. Jul 2012.
[10.1109/CEC.2012.6256487] [Scopus]

Jia, X.B., Zhang, K., Han, Y. and Powers, D.M. (2012). Pronunciation Quality Evaluation Approach Based on Bimodal Fusion with Noise Adaptive Weight. In Dr. Kae Dal Kwack, Prof. Shigeo Kawata, Dr. Soonwook Hwang, Dr. Dongsoo Han, Dr. Franz Ko, ed. 7th International Conference on Computing and Convergence Technology. Seoul, South Korea: IEEE. International Conference on Computing and Convergence Technology. Seoul, South Korea. Dec 2012, pp. 1244-1247.

Lang, S.R., Luerssen, M.H. and Powers, D.M. (2012). Evolutionary Feature Preselection for Viola-Jones Classifier Training. In Spring World Congress on Engineering and Technology: Proceedings. New York, USA: IEEE. SCET2012. Xi'an, China. May 2012.
[10.1109/SCET.2012.6342142] [Scopus]

Li, J.S. & Randhawa, S., 2011. Dynamic Range Bad Pixel Correction within Edge-Preserving Colour Filter Array Demosaicking. Proceedings of IVCNZ Image and Vision Computing New Zealand 2011, 357-362.

Randhawa, S. & Li, J.S., 2011. Adaptive Order Spline Interpolation for Edge-Preserving Colour Filter Array Demosaicking. 2011 International Conference on Digital Image Computing: Techniques and Applications, 666-671.

Li, J.S. & Randhawa, S., 2010. Blind Reverse CFA Demosaicking for the Reduction of Colour Artifacts from Demosaicked Images. Proceedings of 25th International Conference of Image and Vision Computing New Zealand.

Li, J.S. & Randhawa, S., 2010. Reduction of Colour Artifacts using Inverse Demosaicking. Proceedings of 2010 Digital Image Computing: Techniques and Applications, 105-110.

Li, J.S. & Randhawa, S., 2009. Image Quality Comparison between 3CCD Pixel Shift Technology and Single-sensor CFA Demosaicking. 2009 Digest of Technical Papers International Conference on Consumer Electronics.

Li, J.S. & Randhawa, S., 2009. Adaptive order-statistics multi-shell filtering for bad pixel correction within CFA demosaicking. Proceedings of TENCON2009, 1-6.

Kakaday, T., Plunkett, M., McInnes, S.J., Li, J.S., Voelcker, N.H., & Craig, J., 2009. Development of a wireless intra-ocular pressure monitoring system for incorporation into a therapeutic glaucoma drainage implant. Progress in Biomedical Optics and Imaging - Proceedings of SPIE, 7270(72700O).

Kakaday, T., Plunkett, M., McInnes, S.J., Li, J.S., Voelcker, N.H., & Craig, J., 2009. Design of a Wireless Intraocular Pressure Monitoring System for a Glaucoma Drainage Implant. IFMBE Proceedings 13th International Conference on Biomedical Engineering, 23, 198-201.

Li, J.S. & Randhawa, S., 2009. Work integrated learning for engineering students at Flinders University. Proceedings of the 20th Annual Conference for the Australasian Association for Engineering Education, 1002-1007.

Li, J.S., Tohl, D., Randhawa, S., Shamiminoori, L., & Bull, C.M., 2009. Non-invasive lizard identification using signature curves. Proceedings of TENCON2009: IEEE Region 10 Conference.

Li, J.S. & Randhawa, S., 2008. Weighted median based colour filter array demosaicking. 2008 23rd International Conference Image and Vision Computing New Zealand, IVCNZ, (4762092), 1-6.

Li, J.S. & Randhawa, S., 2007. Adaptive Colour Filter Array (CFA) Demosaicking with Mixed Order of Approximation. Conference Proceedings of 2007 Information, Decision and Control, 326-331.

Li, J.S. & Randhawa, S., 2007. Colour Filter Array Demosaicking Using Cubic Spline Interpolation. Proceedings of 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1(4217217), I865-I868.

Randhawa, S. & Li, J.S., 2007. CFA demosaicking with improved colour edge preservation. A meeting place for converging technologies and people: proceedings of the TENCON IEEE region 10 annual international conference, 2007(4084977), 1-5.

Li, J.S. & Randhawa, S., 2006. A structural approach to improved colour filter array demosaicking for the Bayer pattern. Proceedings of the 8th IASTED International Conference on Signal and Image Processing, SIP 2006, 157-161.

Li, J.S. & Randhawa, S., 2006. CFA demosaicking by adaptive order of approximation. Proceedings of the 1st International Conference on Computer Vision Theory and Applications, 1, 5-10.

Li, J.S. & Randhawa, S., 2005. High order extrapolation using Taylor series for color filter array demosaicing. Lecture Notes in Computer Science series (Springer LNCS), 3656/2005, 703-711.

Li, J.S. & Randhawa, S., 2005. Improved accuracy for colour filter array demosaicking using high order extrapolation. Proceedings of the Eighth International Symposium on Signal Processing and its Applications 2005, 1(1580263), 331-334.

Li, J.S. & Namati, E., 2004. A novel shape descriptor based on empty morphological skeleton. Proceedings of the International Symposium on Intelligent Multimedia, Video and Speech Processing, 446-449.

Li, J.S. & Randhawa, S., 2004. Improved video mosaic construction by selecting a suitable subset of video images. Proceedings of the Twenty-Seventh Australasian Computer Science Conference, 143-149.

Li, J.S. & Randhawa, S., 2002. Morphological Edge Detection by Successive Segmentation for Thermal Images. IEEE Proceedings of Information, Decision and Control (IDC 2002), 365-369.

Relevant CSEM Courses and Topics

The following awards and topics link directly to the research of the AI and CogSci Group and final year projects are supervised for the courses shown (for further degree combinations see the individual course descriptions):

Bachelor of Computer Science (Honours)
Bachelor of Behavioural Science (Psychology)/Bachelor of Computer Science
Bachelor of Engineering (Biomedical)
Bachelor of Engineering (Robotics)
Bachelor of Information Technology (Honours)
Bachelor of Information Technology (Digital Media) (Honours)
Bachelor of Science (Honours)
Master of Engineering (Biomedical)
Master of Engineering (Electronics)
COMP3722 Theory and Practice of Computation
COMP3742 Intelligent Systems
COMP3751 Interactive Computer Systems
COMP3752 Computer Game Development
COMP4712 Embodied Conversational Agents
COMP4715 Computational Intelligence
COMP4716 Information Retrieval and Text Processing
COMP4717 Mobile Application Development
ENGR3721 Signal Processing
ENGR3741 Physiological Measurement
ENGR3771 Robotic Systems
ENGR4722 Haptic Enabled Systems
ENGR4761 Image Processing