Reference Publications

Face Detection and Landmark Localization

Viola, P., & Jones, M. J. 2004. Robust real-time face detection. International Journal of Computer Vision, 57(2), 137-154.

  This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. It makes three key contributions. The first is the “integral image”, which allows the features used by the detector to be computed very quickly. The second is a simple and efficient classifier built using the AdaBoost learning algorithm. The third is a method for combining classifiers in a “cascade”, which allows background regions of the image to be quickly discarded while more computation is spent on promising face-like regions.
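
  A minimal Python sketch of the first and third ideas, under stated assumptions: it builds an integral image, sums any rectangle with four lookups, evaluates a two-rectangle Haar-like feature, and runs a cascade that rejects a window at the first stage whose score falls below its threshold. The stage score functions and thresholds in the example are hypothetical; in the paper each stage is a boosted combination of features trained with AdaBoost, which is not shown here.

```python
from typing import Callable, List, Sequence, Tuple


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """Integral image padded with a zero row and column.

    ii[y][x] holds the sum of img[r][c] for all r < y and c < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# A stage is (score function applied to a window, acceptance threshold).
Stage = Tuple[Callable[[object], float], float]


def run_cascade(stages: Sequence[Stage], window: object) -> Tuple[bool, int]:
    """Return (accepted, stages_evaluated); a window is rejected at the first failing stage."""
    for i, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            return False, i
    return True, len(stages)


if __name__ == "__main__":
    ii = integral_image([[1, 2], [3, 4]])
    print(rect_sum(ii, 0, 0, 2, 2))          # 10
    print(two_rect_feature(ii, 0, 0, 2, 2))  # (1 + 3) - (2 + 4) = -2
    # Hypothetical two-stage cascade; thresholds are illustrative only.
    stages = [(lambda win: two_rect_feature(win, 0, 0, 2, 2), -5.0),
              (lambda win: rect_sum(win, 0, 0, 2, 2), 20.0)]
    print(run_cascade(stages, ii))  # (False, 2): passes stage 1, rejected at stage 2
```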

Huang, C., Ai, H., Li, Y., & Lao, S. 2007. High-performance rotation invariant multiview face detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(4), 671-686.

  Rotation invariant multiview face detection (MVFD) aims to detect faces with arbitrary rotation-in-plane (RIP) and rotation-off-plane (ROP) angles in still images or video sequences. MVFD is crucial as the first step in automatic face processing for general applications since face images are seldom upright and frontal unless they are taken cooperatively.

Zhao, X., Shan, S., Chai, X., & Chen, X. 2013. Cascaded shape space pruning for robust facial landmark detection. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1033-1040). IEEE.

  This paper proposes a novel cascaded face shape space pruning algorithm for robust facial landmark detection. By progressively excluding incorrect candidate shapes, the algorithm can accurately and efficiently reach the globally optimal shape configuration. Specifically, individual landmark detectors are first applied to eliminate wrong candidates for each landmark. Then, the candidate shape space is further pruned by jointly removing incorrect shape configurations. To this end, a discriminative structure classifier is designed to assess the candidate shape configurations. Based on this learned classifier, an efficient shape space pruning strategy quickly rejects most incorrect candidate shapes while preserving the true shape.

Li, H., Lin, Z., Brandt, J., Shen, X., & Hua, G. 2014. Efficient boosted exemplar-based face detection. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1843-1850). IEEE.

  In this method, exemplars serving as weak detectors are discriminatively trained and selectively assembled in a boosting framework, which greatly reduces the number of required exemplars. Notably, they propose to include non-face images as negative exemplars that actively suppress false detections, further improving detection accuracy. They verify their approach on two public face detection benchmarks and one personal photo album, and achieve significant improvements over state-of-the-art algorithms in both accuracy and efficiency.

Ong, E. J., Lan, Y., Theobald, B., Harvey, R., & Bowden, R. 2009. Robust facial feature tracking using selected multi-resolution linear predictors. In Computer Vision, 2009 IEEE 12th International Conference on (pp. 1483-1490). IEEE.

  This paper proposes a learnt, data-driven approach for accurate, real-time tracking of facial features using only intensity information. Constraints such as a priori shape models or temporal models of dynamics are neither required nor used.

Kalal, Z., Mikolajczyk, K., & Matas, J. 2010. Face-TLD: Tracking-Learning-Detection applied to faces.

  A novel system for long-term tracking of a human face in unconstrained videos is built on the Tracking-Learning-Detection (TLD) approach. The system extends TLD with a generic face detector and a validator, and is designed for real-time face tracking that is resistant to occlusions and appearance changes.

Zhu, X., & Ramanan, D. 2012. Face detection, pose estimation, and landmark localization in the wild. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 2879-2886). IEEE.

  They present a unified model for face detection, pose estimation, and landmark estimation in real-world, cluttered images. The model is based on a mixture of trees with a shared pool of parts: every facial landmark is modeled as a part, and global mixtures capture topological changes due to viewpoint.
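
  Inference in a single tree-structured part model can be done exactly by max-sum dynamic programming from the leaves to the root. The Python sketch below assumes each landmark has a short list of discrete candidate locations with appearance scores, and that each child part pays a quadratic spring cost for deviating from a hypothetical anchor offset relative to its parent. The part names, anchors, and weight are illustrative; a full mixture model would run this once per mixture (viewpoint) and keep the best score.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def deformation(parent: Point, child: Point, anchor: Point, weight: float) -> float:
    """Quadratic spring cost for a child placed away from its anchor offset."""
    dx = child[0] - parent[0] - anchor[0]
    dy = child[1] - parent[1] - anchor[1]
    return weight * (dx * dx + dy * dy)


def best_tree_configuration(
    candidates: Dict[int, List[Point]],
    unary: Dict[int, List[float]],
    children: Dict[int, List[int]],
    anchors: Dict[int, Point],
    root: int = 0,
    weight: float = 1.0,
) -> Tuple[float, Dict[int, int]]:
    """Exact max-sum DP over a tree; returns (best score, {part: candidate index})."""
    best_child: Dict[Tuple[int, int], int] = {}

    def subtree_scores(p: int) -> List[float]:
        # scores[i]: best score of the subtree rooted at p when p sits at candidate i.
        scores = list(unary[p])
        for c in children.get(p, []):
            child_scores = subtree_scores(c)
            for i, p_xy in enumerate(candidates[p]):
                options = [
                    child_scores[j] - deformation(p_xy, c_xy, anchors[c], weight)
                    for j, c_xy in enumerate(candidates[c])
                ]
                j_best = max(range(len(options)), key=options.__getitem__)
                best_child[(c, i)] = j_best
                scores[i] += options[j_best]
        return scores

    root_scores = subtree_scores(root)
    i_root = max(range(len(root_scores)), key=root_scores.__getitem__)
    # Backtrack from the root, reading off each child's best placement.
    assignment = {root: i_root}
    stack = [root]
    while stack:
        p = stack.pop()
        for c in children.get(p, []):
            assignment[c] = best_child[(c, assignment[p])]
            stack.append(c)
    return root_scores[i_root], assignment


if __name__ == "__main__":
    # Hypothetical parts: 0 = nose (root), 1 = left eye, 2 = right eye.
    candidates = {0: [(50, 50), (80, 20)], 1: [(40, 40), (70, 10)], 2: [(60, 40), (0, 0)]}
    unary = {0: [1.0, 1.2], 1: [1.0, 1.0], 2: [1.0, 3.0]}
    children = {0: [1, 2]}
    anchors = {1: (-10, -10), 2: (10, -10)}
    print(best_tree_configuration(candidates, unary, children, anchors, weight=0.01))
    # (3.0, {0: 0, 1: 0, 2: 0}): the strong but misplaced right-eye candidate is rejected.
```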

Yang, H., & Patras, I. 2013. Sieving regression forest votes for facial feature detection in the wild. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1936-1943). IEEE.

  In this paper they propose a method for the localization of multiple facial features on challenging face images. In the regression forests (RF) framework, observations (patches) that are extracted at several image locations cast votes for the localization of several facial features.

Burgos-Artizzu, X. P., Perona, P., & Dollár, P. 2013. Robust face landmark estimation under occlusion. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1513-1520). IEEE.

  They propose a novel method, called Robust Cascaded Pose Regression (RCPR) which reduces exposure to outliers by detecting occlusions explicitly and using robust shape-indexed features.

Yu, X., Huang, J., Zhang, S., Yan, W., & Metaxas, D. N. 2013. Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1944-1951). IEEE.

  This paper addresses the problem of facial landmark localization and tracking from a single camera. They present a two-stage cascaded deformable shape model to effectively and efficiently localize facial landmarks under large head pose variations. For face detection, they propose a group sparse learning method to automatically select the most salient facial landmarks. By introducing a 3D face shape model, they use Procrustes analysis to achieve pose-free facial landmark initialization.
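
  Procrustes analysis finds the similarity transform (scale, rotation, translation) that best maps one point set onto another in the least-squares sense. The Python sketch below is a 2D simplification, not the paper's 3D procedure: it treats points as complex numbers so that scale and rotation collapse into one complex factor `a`, and it aligns a hypothetical mean shape to a few detected landmarks.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def similarity_procrustes(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[complex, complex]:
    """Least-squares (a, t) such that z -> a * z + t maps src points onto dst.

    With centred complex coordinates s_i and d_i, the minimiser of
    sum |a * s_i - d_i|^2 is a = sum(conj(s_i) * d_i) / sum(|s_i|^2).
    """
    zs = [complex(x, y) for x, y in src]
    zd = [complex(x, y) for x, y in dst]
    n = len(zs)
    ms, md = sum(zs) / n, sum(zd) / n
    cs = [z - ms for z in zs]
    cd = [z - md for z in zd]
    a = sum(s.conjugate() * d for s, d in zip(cs, cd)) / sum(abs(s) ** 2 for s in cs)
    t = md - a * ms  # maps the source centroid onto the target centroid
    return a, t


def apply_similarity(a: complex, t: complex, pts: Sequence[Point]) -> List[Point]:
    """Apply z -> a * z + t to every point."""
    out = []
    for x, y in pts:
        w = a * complex(x, y) + t
        out.append((w.real, w.imag))
    return out


if __name__ == "__main__":
    mean_shape = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]  # hypothetical mean-shape landmarks
    detected = apply_similarity(complex(1.5, 0.8), complex(3.0, 4.0), mean_shape)
    a, t = similarity_procrustes(mean_shape, detected)
    print(abs(a), t)  # scale close to |1.5 + 0.8j| = 1.7, translation close to 3+4j
```

In an initialization setting, the recovered `(a, t)` would then be applied to the full mean shape to place the remaining landmarks.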

Dhall, A., Goecke, R., Joshi, J., Wagner, M., & Gedeon, T. 2013. Emotion recognition in the wild challenge 2013. In Proceedings of the 15th ACM on International conference on multimodal interaction (pp. 509-516). ACM.

  The Emotion Recognition In The Wild Challenge and Workshop (EmotiW) 2013 Grand Challenge consists of an audio-video based emotion classification challenge that mimics real-world conditions.

Dhall, A., Goecke, R., Joshi, J., Sikka, K., & Gedeon, T. 2014. Emotion recognition in the wild challenge 2014: Baseline, data and protocol. In Proceedings of the 16th International Conference on Multimodal Interaction (pp. 461-466). ACM.

  The Second Emotion Recognition In The Wild Challenge (EmotiW) 2014 consists of an audio-video based emotion classification challenge that mimics real-world conditions.

Amberg, B., & Vetter, T. 2011. Optimal landmark detection using shape models and branch and bound. In Computer Vision (ICCV), 2011 IEEE International Conference on (pp. 455-462). IEEE.

  They propose a method to solve the combinatorial problem of selecting, from a large number of candidate landmark detections, the configuration that is best supported by a shape model. Unlike previous approaches, their method always finds the globally optimal configuration.

Zhou, F., Brandt, J., & Lin, Z. 2013. Exemplar-based graph matching for robust facial landmark localization. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1025-1032). IEEE.

  Localizing facial landmarks is a fundamental step in facial image analysis. However, the problem is still challenging due to the large variability in pose and appearance, and the existence of occlusions in real-world face images. In this paper, they present exemplar-based graph matching (EGM), a robust framework for facial landmark localization.

Smith, B. M., Brandt, J., Lin, Z., & Zhang, L. 2014. Nonparametric context modeling of local appearance for pose-and expression-robust facial landmark localization. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1741-1748). IEEE.

  They propose a data-driven approach to facial landmark localization that models the correlations between each landmark and its surrounding appearance features. At runtime, each feature casts a weighted vote to predict landmark locations, where the weight is precomputed to take into account the feature's discriminative power. This feature-voting-based landmark detection is more robust than previous local appearance-based detectors.
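
  The voting step can be sketched in Python as follows. Each feature adds its learned mean offset to its own position and votes there with a weight; votes are accumulated on a grid and the strongest cell is taken as the landmark. The weighting rule (inverse of the spread of a feature's training offsets) and the grid-argmax readout are assumptions made for illustration, not the paper's exact formulation.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def learn_feature_vote(training_offsets: Sequence[Point]) -> Tuple[float, Point]:
    """Hypothetical weighting: consistent training offsets (low spread) give a high weight.

    Returns (weight, mean offset from the feature to the landmark).
    """
    n = len(training_offsets)
    mx = sum(dx for dx, _ in training_offsets) / n
    my = sum(dy for _, dy in training_offsets) / n
    spread = sum((dx - mx) ** 2 + (dy - my) ** 2 for dx, dy in training_offsets) / n
    return 1.0 / (1.0 + spread), (mx, my)


def predict_landmark(votes: Sequence[Tuple[float, float, float]], cell: float = 1.0) -> Point:
    """Accumulate (x, y, weight) votes on a grid and return the centre of the best cell."""
    acc: Dict[Tuple[int, int], float] = defaultdict(float)
    for x, y, w in votes:
        acc[(int(x // cell), int(y // cell))] += w
    (cx, cy), _ = max(acc.items(), key=lambda kv: kv[1])
    return ((cx + 0.5) * cell, (cy + 0.5) * cell)


if __name__ == "__main__":
    # (feature position, offsets seen in training) for three hypothetical features
    features = [((10.0, 10.0), [(5.0, 5.0), (5.0, 5.0)]),
                ((12.0, 14.0), [(3.0, 1.0), (3.0, 1.0)]),
                ((0.0, 0.0), [(0.0, 0.0), (10.0, 10.0)])]
    votes = []
    for (fx, fy), offsets in features:
        w, (mx, my) = learn_feature_vote(offsets)
        votes.append((fx + mx, fy + my, w))
    print(predict_landmark(votes))  # (15.5, 15.5): the two consistent features agree
```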

Wu, Y., & Ji, Q. 2014. Discriminative Deep Face Shape Model for Facial Point Detection. International Journal of Computer Vision, 1-17.

  Facial point detection is an active area in computer vision due to its relevance to many applications. It is a nontrivial task, since facial shapes vary significantly with facial expressions, poses or occlusion. In this paper, they address this problem by proposing a discriminative deep face shape model that is constructed based on an augmented factorized three-way Restricted Boltzmann Machine model.

Zhang, Z., Luo, P., Loy, C. C., & Tang, X. 2014. Facial landmark detection by deep multi-task learning. In Computer Vision–ECCV 2014 (pp. 94-108). Springer International Publishing.

  Facial landmark detection has long been impeded by the problems of occlusion and pose variation. Instead of treating the detection task as a single and independent problem, they investigate the possibility of improving detection robustness through multi-task learning.

Sun, Y., Wang, X., & Tang, X. 2013. Deep convolutional network cascade for facial point detection. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on (pp. 3476-3483). IEEE.

  They propose a new approach for estimating the positions of facial keypoints with carefully designed three-level convolutional networks. At each level, the outputs of multiple networks are fused for robust and accurate estimation.

Dantone, M., Gall, J., Fanelli, G., & Van Gool, L. 2012. Real-time facial feature detection using conditional regression forests. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 2578-2585). IEEE.

  Although facial feature detection from 2D images is a well-studied field, there is a lack of real-time methods that estimate feature points even on low-quality images. Here they propose conditional regression forests for this task. While regression forests learn the relations between facial image patches and the locations of feature points from the entire set of faces, conditional regression forests learn these relations conditioned on global face properties.

Cao, X., Wei, Y., Wen, F., & Sun, J. 2014. Face alignment by explicit shape regression. International Journal of Computer Vision, 107(2), 177-190.

  They present a very efficient, highly accurate, “Explicit Shape Regression” approach for face alignment.

Liang, L., Xiao, R., Wen, F., & Sun, J. 2008. Face alignment via component-based discriminative search. In Computer Vision–ECCV 2008 (pp. 72-85). Springer Berlin Heidelberg.

  In this paper, they propose a component-based discriminative approach for face alignment that does not require initialization. Unlike many approaches that optimize locally within a small range, their approach searches for the face shape over a large range at the component level using a discriminative search algorithm.

Zhang, Jie, et al. "Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment." European Conference on Computer Vision. Springer International Publishing, 2014.

  Accurate face alignment is a vital prerequisite for most face perception tasks, such as face recognition, facial expression analysis and non-realistic face re-rendering. It can be formulated as the nonlinear inference of the facial landmarks from the detected face region.

Zafeiriou, Lazaros, et al. "Joint unsupervised face alignment and behaviour analysis." European Conference on Computer Vision. Springer International Publishing, 2014.

  The predominant strategy for facial expression analysis and temporal analysis of facial events is the following: a generic facial landmark tracker, usually trained on thousands of carefully annotated examples, is applied to track the landmark points, and analysis is then performed using mostly the shape and, more rarely, the facial texture. This paper challenges that framework by showing that it is feasible to perform joint landmark localization (i.e., spatial alignment) and temporal analysis of behavioural sequences using a simple face detector and a simple shape model. To do so, they propose a new component analysis technique, which they call Autoregressive Component Analysis (ARCA), and show how the parameters of a motion model can be jointly retrieved. The method does not require any sophisticated landmark tracking methodology.

Mathias, Markus, et al. "Face detection without bells and whistles." European Conference on Computer Vision. Springer International Publishing, 2014.

  Face detection is a mature problem in computer vision. While diverse high-performing face detectors have been proposed in the past, they present two surprising new top-performance results. First, they show that a properly trained vanilla DPM reaches top performance, improving over commercial and research systems. Second, they show that a detector based on rigid templates, similar in structure to the Viola & Jones detector, can reach similar top performance on this task. Importantly, they discuss issues with existing evaluation benchmarks and propose an improved procedure.

Chen, Dong, et al. "Joint cascade face detection and alignment." European Conference on Computer Vision. Springer International Publishing, 2014.

  They present a new state-of-the-art approach for face detection. The key idea is to combine face alignment with detection, observing that aligned face shapes provide better features for face classification. To make this combination more effective, their approach learns the two tasks jointly in the same cascade framework, exploiting recent advances in face alignment.

Yu, Xiang, et al. "Consensus of regression for occlusion-robust facial feature localization." European Conference on Computer Vision. Springer International Publishing, 2014.

  They address the problem of robust facial feature localization in the presence of occlusions, which remains a lingering problem in facial analysis despite intensive long-term study. Recently, regression-based approaches to localization have produced accurate results in many cases, yet they are still subject to significant error when portions of the face are occluded. To overcome this weakness, they propose an occlusion-robust regression method that forms a consensus from the estimates of a set of occlusion-specific regressors.

Smith, Brandon M., and Li Zhang. "Collaborative facial landmark localization for transferring annotations across datasets." European Conference on Computer Vision. Springer International Publishing, 2014.

  In this paper they make what is, to the best of their knowledge, the first effort to combine multiple face landmark datasets with different landmark definitions into a super dataset, producing as output the union of all landmark types computed in each image.

Zhao, Xiaowei, Xiujuan Chai, and Shiguang Shan. "Joint face alignment: Rescue bad alignments with good ones by regularized re-fitting." Computer Vision–ECCV 2012. Springer Berlin Heidelberg, 2012. 616-630.

  In this paper, starting from initial face alignment results, they propose to enhance the alignments with a fundamentally novel idea: rescuing the bad alignments with their well-aligned neighbors. In their method, a discriminative alignment evaluator is designed to assess the initial face alignments and separate the well-aligned images from the badly aligned ones.

Smith, Brandon M., and Li Zhang. "Joint face alignment with non-parametric shape models." European Conference on Computer Vision. Springer Berlin Heidelberg, 2012.

  They present a joint face alignment technique that takes a set of images as input and produces a set of shape- and appearance-consistent face alignments as output.

Le, Vuong, et al. "Interactive facial feature localization." European Conference on Computer Vision. Springer Berlin Heidelberg, 2012.

  They address the problem of interactive facial feature localization from a single image. Their goal is to obtain an accurate segmentation of facial features on high-resolution images under a variety of pose, expression, and lighting conditions.

Yang, Shuo, et al. "From facial parts responses to face detection: A deep learning approach." Proceedings of the IEEE International Conference on Computer Vision. 2015.

  In this paper, they propose a novel deep convolutional network (DCN) that achieves outstanding performance on FDDB, PASCAL Face, and AFW. Specifically, their method achieves a high recall rate of 90.99% on the challenging FDDB benchmark, outperforming the state-of-the-art method by a large margin of 2.91%.

Zhang, Jie, et al. "Leveraging Datasets with Varying Annotations for Face Alignment via Deep Regression Network." Proceedings of the IEEE International Conference on Computer Vision. 2015.

  In this work, they propose a deep regression network coupled with sparse shape regression (DRN-SSR) to predict the union of all types of landmarks by leveraging datasets with varying annotations, each dataset having one type of annotation.

Peng, Xi, et al. "Piefa: Personalized incremental and ensemble face alignment." Proceedings of the IEEE International Conference on Computer Vision. 2015.

  Face alignment, especially on real-time or large-scale sequential images, is a challenging task with broad applications. Both generic and joint alignment approaches have been proposed with varying degrees of success. However, many generic methods are highly sensitive to initialization and usually rely on offline-trained static models, which limits their performance on sequential images with extensive variations. Joint methods, on the other hand, are restricted to offline applications, since they require all frames to perform batch alignment. To address these limitations, they propose to exploit incremental learning for personalized ensemble alignment.

Wu, Yue, and Qiang Ji. "Robust facial landmark detection under significant head poses and occlusion." Proceedings of the IEEE International Conference on Computer Vision. 2015.

  In this work, they propose a unified robust cascade regression framework that can handle both images with severe occlusion and images with large head poses. Specifically, the method iteratively predicts the landmark occlusions and the landmark locations. For occlusion estimation, instead of directly predicting binary occlusion vectors, they introduce a supervised regression method that gradually updates the landmark visibility probabilities in each iteration to achieve robustness.

Scherbaum, Kristina, et al. "Fast face detector training using tailored views." Proceedings of the IEEE International Conference on Computer Vision. 2013.

  This paper takes a look into the automated generation of adaptive training samples from a 3D morphable face model. Using statistical insights, the tailored training data guarantees full data variability and is enriched by arbitrary facial attributes such as age or body weight.

Wu, Baoyuan, et al. "Simultaneous clustering and tracklet linking for multi-face tracking in videos." Proceedings of the IEEE International Conference on Computer Vision. 2013.

  They describe a novel method that simultaneously clusters and associates short sequences of detected faces (termed face tracklets) in videos. The rationale is that face tracklet clustering and linking are related problems, each of which can benefit from the other's solution.

Burgos-Artizzu, Xavier P., Pietro Perona, and Piotr Dollár. "Robust face landmark estimation under occlusion." Proceedings of the IEEE International Conference on Computer Vision. 2013.

  Human faces captured in real-world conditions present large variations in shape and occlusions due to differences in pose, expression, use of accessories such as sunglasses and hats, and interactions with objects (e.g., food). Current face landmark estimation approaches struggle under such conditions since they fail to provide a principled way of handling outliers. They propose a novel method, called Robust Cascaded Pose Regression (RCPR), which reduces exposure to outliers by detecting occlusions explicitly and using robust shape-indexed features.

Li, Haoxiang, et al. "Probabilistic elastic part model for unsupervised face detector adaptation." Proceedings of the IEEE International Conference on Computer Vision. 2013.

  They propose an unsupervised detector adaptation algorithm that adapts any offline-trained face detector to a specific collection of images, and hence achieves better accuracy. The core of their detector adaptation algorithm is a probabilistic elastic part (PEP) model, which is trained offline with a set of face examples. It produces a statistically aligned, part-based face representation, namely the PEP representation.

Zhao, Xiaowei, et al. "Cascaded shape space pruning for robust facial landmark detection." Proceedings of the IEEE International Conference on Computer Vision. 2013.

  In this paper, they propose a novel cascaded face shape space pruning algorithm for robust facial landmark detection. By progressively excluding incorrect candidate shapes, their algorithm can accurately and efficiently reach the globally optimal shape configuration.

Yu, Xiang, et al. "Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model." Proceedings of the IEEE International Conference on Computer Vision. 2013.

  This paper addresses the problem of facial landmark localization and tracking from a single camera. They present a two-stage cascaded deformable shape model to effectively and efficiently localize facial landmarks with large head pose variations.

Chen, Yen-Lin, et al. "Accurate and robust 3D facial capture using a single rgbd camera." Proceedings of the IEEE International Conference on Computer Vision. 2013.

  This paper presents an automatic and robust approach that accurately captures high-quality 3D facial performances using a single RGBD camera. The key to their approach is combining automatic facial feature detection with image-based 3D nonrigid registration techniques for 3D facial reconstruction.

Tzimiropoulos, Georgios, Stefanos Zafeiriou, and Maja Pantic. "Robust and efficient parametric face alignment." 2011 International Conference on Computer Vision. IEEE, 2011.

  They propose a correlation-based approach to parametric object alignment that is particularly suitable for face analysis applications requiring efficiency and robustness against occlusions and illumination changes. Their algorithm registers two images by iteratively maximizing their correlation coefficient using gradient ascent.
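
  The correlation coefficient is invariant to affine changes in brightness and contrast, which is one reason it suits illumination-robust alignment. The Python sketch below computes it on raw intensity values and, in place of the paper's gradient ascent over warp parameters, aligns a 1D template by exhaustive search over integer shifts. Both simplifications are assumptions for illustration; the paper's image representation and warps may differ.

```python
import math
from typing import Sequence, Tuple


def correlation_coefficient(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalised correlation of two equally sized, flattened patches."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if den == 0.0:
        return 0.0  # a constant patch carries no structure to correlate
    return sum(x * y for x, y in zip(da, db)) / den


def best_shift(template: Sequence[float], signal: Sequence[float]) -> Tuple[int, float]:
    """Exhaustive search over integer shifts (a stand-in for gradient ascent over warps)."""
    m = len(template)
    scores = [(correlation_coefficient(template, signal[s:s + m]), s)
              for s in range(len(signal) - m + 1)]
    cc, s = max(scores)
    return s, cc


if __name__ == "__main__":
    template = [1.0, 3.0, 2.0, 5.0, 4.0]
    brighter = [2.0 * v + 10.0 for v in template]  # gain 2, bias 10
    signal = [0.0, 0.0] + brighter + [0.0, 0.0, 0.0]
    print(best_shift(template, signal))  # shift 2, coefficient close to 1.0
```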

Tzimiropoulos, Georgios. "Project-out cascaded regression with an application to face alignment." 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015.

  Cascaded regression approaches have recently been shown to achieve state-of-the-art performance on many computer vision tasks. They propose using regression to learn a sequence of averaged Jacobian and Hessian matrices from data, and from these to derive descent directions in a fashion inspired by Gauss-Newton optimization.

Lee, Donghoon, Hyunsin Park, and Chang D. Yoo. "Face alignment using cascade gaussian process regression trees." 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015.

  In this paper, they propose a face alignment method that uses cascade Gaussian process regression trees (cGPRT) constructed by combining Gaussian process regression trees (GPRT) in a cascade stage-wise manner.

Zhu, Shizhan, et al. "Face alignment by coarse-to-fine shape searching." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

  They present a novel face alignment framework based on coarse-to-fine shape searching. Unlike conventional cascaded regression approaches, which start with an initial shape and refine it in a cascaded manner, their approach begins with a coarse search over a shape space containing diverse shapes, and uses the coarse solution to constrain a subsequent finer search of shapes.

Li, Haoxiang, et al. "A convolutional neural network cascade for face detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

  In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from the background. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, they propose a cascade architecture built on convolutional neural networks (CNNs) with very powerful discriminative capability, while maintaining high performance.

Ren, Shaoqing, et al. "Face alignment at 3000 fps via regressing local binary features." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.

  This paper presents a highly efficient and very accurate regression approach for face alignment. The approach has two novel components: a set of local binary features, and a locality principle for learning those features. The locality principle guides the learning of a set of highly discriminative local binary features for each facial landmark independently.

Parkhi, Omkar M., et al. "A compact and discriminative face track descriptor." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.

  Their goal is to learn a compact, discriminative vector representation of a face track, suitable for the face recognition tasks of verification and classification. To this end, they propose a novel face track descriptor, based on the Fisher Vector representation, and demonstrate that it has a number of favourable properties.

Kazemi, Vahid, and Josephine Sullivan. "One millisecond face alignment with an ensemble of regression trees." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.

  This paper addresses the problem of face alignment for a single image. They show how an ensemble of regression trees can be used to estimate the face's landmark positions directly from a sparse subset of pixel intensities, achieving super-realtime performance with high-quality predictions. They present a general framework based on gradient boosting for learning an ensemble of regression trees that optimizes the sum of squared error loss and naturally handles missing or partially labelled data.
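
  The gradient-boosting idea can be shown on a 1D toy in Python. For squared-error loss the negative gradient is simply the residual, so each stage fits a regression stump (a depth-1 tree) to the current residuals and adds a shrunken copy of it to the model. This is a simplification for illustration only: the paper's trees are deeper, split on pixel-intensity differences, and regress shape updates rather than a scalar, and the learning rate and stage count below are arbitrary.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)
Model = Tuple[float, float, List[Stump]]  # (initial constant, learning rate, stumps)


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Single split minimising the sum of squared errors of the residuals."""
    best = None
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    if best is None:  # all inputs identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return float("inf"), mean, mean
    return best[1], best[2], best[3]


def boost(xs: Sequence[float], ys: Sequence[float],
          n_stages: int = 50, learning_rate: float = 0.1) -> Model:
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_stages):
        # For L = 0.5 * (y - f)^2 the negative gradient with respect to f is y - f.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lv, rv = fit_stump(xs, residuals)
        stumps.append((thr, lv, rv))
        preds = [p + learning_rate * (lv if x <= thr else rv) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model: Model, x: float) -> float:
    base, lr, stumps = model
    return base + sum(lr * (lv if x <= thr else rv) for thr, lv, rv in stumps)


if __name__ == "__main__":
    xs = [i / 10 for i in range(20)]
    ys = [0.0 if x < 1.0 else 1.0 for x in xs]  # a step that stumps can represent
    model = boost(xs, ys)
    print(predict(model, 0.5), predict(model, 1.5))  # close to 0.0 and 1.0
```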

Xing, Junliang, et al. "Towards multi-view and partially-occluded face alignment." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.

  They present a robust model to locate facial landmarks under different views and possibly severe occlusions. To build reliable relationships between face appearance and shape with large view variations, they propose to formulate face alignment as an L1-induced Stagewise Relational Dictionary (SRD) learning problem.

Ghiasi, Golnaz, and Charless C. Fowlkes. "Occlusion coherence: Localizing occluded faces with a hierarchical deformable part model." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  The proposed model structure makes it possible to augment positive training data with large numbers of synthetically occluded instances. This allows them to easily incorporate the statistics of occlusion patterns in a discriminatively trained model.

Li, Haoxiang, et al. "Efficient boosted exemplar-based face detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.

  Despite the fact that face detection has been studied intensively over the past several decades, the problem is still not completely solved. Challenging conditions, such as extreme pose, lighting, and occlusion, have historically hampered traditional, model-based methods. In contrast, exemplar-based face detection has been shown to be effective, even under these challenging conditions, primarily because a large exemplar database is leveraged to cover all possible visual variations.

Pedersoli, Marco, et al. "Using a deformation field model for localizing faces and facial points under weak supervision." 2014 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2014.

  Face detection and facial points localization are interconnected tasks. Recently it has been shown that solving these two tasks jointly with a mixture of trees of parts (MTP) leads to state-of-the-art results. However, MTP, as most other methods for facial point localization proposed so far, requires a complete annotation of the training data at facial point level. This is used to predefine the structure of the trees and to place the parts correctly. In this work we extend the mixtures from trees to more general loopy graphs.

Tzimiropoulos, Georgios, and Maja Pantic. "Gauss-newton deformable part models for face alignment in-the-wild." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.

  Arguably, Deformable Part Models (DPMs) are one of the most prominent approaches for face alignment with impressive results being recently reported for both controlled lab and unconstrained settings. Fitting in most DPM methods is typically formulated as a two-step process during which discriminatively trained part templates are first correlated with the image to yield a filter response for each landmark and then shape optimization is performed over these filter responses.

Asthana, Akshay, et al. "Incremental face alignment in the wild." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.

  This paper deals with the problem of updating a discriminative facial deformable model, a problem that has not been thoroughly studied in the literature. In particular, they study, for the first time to the best of their knowledge, strategies for updating a discriminative model that is trained by a cascade of regressors.

Shen, Xiaohui, et al. "Detecting and aligning faces by image retrieval." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013.

  Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, they present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning.

Wu, Yue, Zuoguan Wang, and Qiang Ji. "Facial feature tracking under varying facial expressions and face poses based on restricted boltzmann machines." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013.

  Facial feature tracking is an active area in computer vision due to its relevance to many applications. It is a nontrivial task, since faces may have varying facial expressions, poses or occlusions. In this paper, they address this problem by proposing a face shape prior model that is constructed based on the Restricted Boltzmann Machines (RBM) and their variants.

Xiong, Xuehan, and Fernando De la Torre. "Supervised descent method and its applications to face alignment." Proceedings of the IEEE conference on computer vision and pattern recognition. 2013.

  Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that second-order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, second-order descent methods have two main drawbacks: (1) the function might not be analytically differentiable, and numerical approximations are impractical; (2) the Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function.
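
  The core idea of SDM, learning a sequence of descent maps by regression rather than computing Jacobians and Hessians, can be shown on a 1D toy problem in Python: recover x* from y* = h(x*) with h = sin. Each stage regresses the remaining step x* - x_k on the residual h(x_k) - y* over training samples, giving a scalar map r_k and bias b_k. A scalar h and a known target value y* are simplifications for illustration; in face alignment the features are image descriptors extracted at the current landmark estimates.

```python
import math
from typing import Callable, List, Sequence, Tuple

DescentMap = Tuple[float, float]  # (r_k, b_k): step = r_k * feature + b_k


def fit_line(features: Sequence[float], steps: Sequence[float]) -> DescentMap:
    """Ordinary least squares for step approximately equal to r * feature + b."""
    n = len(features)
    mf, ms = sum(features) / n, sum(steps) / n
    var = sum((f - mf) ** 2 for f in features)
    cov = sum((f - mf) * (s - ms) for f, s in zip(features, steps))
    r = cov / var if var > 0 else 0.0
    return r, ms - r * mf


def train_sdm(h: Callable[[float], float], targets: Sequence[float], x0: float,
              n_stages: int = 4) -> List[DescentMap]:
    xs = [x0] * len(targets)
    ys = [h(t) for t in targets]
    maps: List[DescentMap] = []
    for _ in range(n_stages):
        features = [h(x) - y for x, y in zip(xs, ys)]
        steps = [t - x for t, x in zip(targets, xs)]
        r, b = fit_line(features, steps)
        maps.append((r, b))
        # Move every training sample with the map just learned before fitting the next one.
        xs = [x + r * f + b for x, f in zip(xs, features)]
    return maps


def run_sdm(h: Callable[[float], float], y_star: float, x0: float,
            maps: Sequence[DescentMap]) -> float:
    x = x0
    for r, b in maps:
        x = x + r * (h(x) - y_star) + b
    return x


if __name__ == "__main__":
    targets = [-1.0 + 2.0 * i / 200 for i in range(201)]
    maps = train_sdm(math.sin, targets, x0=0.0)
    print(run_sdm(math.sin, math.sin(0.5), 0.0, maps))  # close to 0.5
```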

Sun, Yi, Xiaogang Wang, and Xiaoou Tang. "Deep convolutional network cascade for facial point detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013.

  They propose a new approach for estimating the positions of facial keypoints with carefully designed three-level convolutional networks. At each level, the outputs of multiple networks are fused for robust and accurate estimation. Thanks to the deep structure of the convolutional networks, global high-level features are extracted over the whole face region at the initialization stage, which helps to locate keypoints with high accuracy.

Luo, Ping, Xiaogang Wang, and Xiaoou Tang. "Hierarchical face parsing via deep learning." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

  This paper investigates how to parse (segment) facial components from face images which may be partially occluded. They propose a novel face parser, which recasts segmentation of face components as a cross-modality data transformation problem, i.e., transforming an image patch to a label map.
