2018 ShanghaiTech Symposium on Information Science and Technology

Distinguished Academic Speakers

Rama Chellappa

University of Maryland

Distinguished Professor and Chair

IEEE Fellow

Mingming Cheng

Nankai University

Professor

Irfan Essa

Georgia Institute of Technology

Professor and Associate Dean

IEEE Fellow

Hong Jiang

University of Texas at Arlington

Professor, Department Chair of Computer Science & Engineering

IEEE Fellow

Mark Johnson

Macquarie University

Professor, Department of Computing

ACL Fellow

Hongdong Li

Australian National University

Associate Professor

Patrick A. Naylor

Imperial College London

Professor of Electronic & Electrical Engineering

IET Fellow

Ivan Edward Sutherland

Portland State University

Member of NAS and NAE

Turing Award (1988)

Rene Vidal

Johns Hopkins University

Professor

IEEE Fellow

Liang Wang

Chinese Academy of Sciences

Researcher

National Science Fund for Distinguished Young Scholars

Wei Wang

University of California, Los Angeles

Leonard Kleinrock Chair Professor

Yongtian Wang

Beijing Institute of Technology

Professor

National Science Fund for Distinguished Young Scholars; Chang Jiang Scholar

Dong Xu

The University of Sydney

Professor and Chair in Computer Engineering

IEEE/IAPR Fellow

Xiangyang Xue

Fudan University

Professor, Dean

First Prize, Shanghai Science and Technology Progress Award

Ming-Hsuan Yang

UC Merced / Google Cloud

Professor

NSF CAREER Award

Xiaokang Yang

Shanghai Jiao Tong University

Professor, Associate Dean

National Science Fund for Distinguished Young Scholars

Alan L. Yuille

Johns Hopkins University

Bloomberg Distinguished Professor of Cognitive Science and Computer Science

IEEE Fellow

Changxi Zheng

Columbia University

Associate Professor

Rui Zheng

ShanghaiTech University

Assistant Professor, School of Information Science and Technology

Jie Zhou

Tsinghua University

Professor, Dean

National Science Fund for Distinguished Young Scholars

Distinguished Industrial Speakers

Gang Hua

Microsoft

Principal Researcher

IAPR Fellow

Jiaya Jia

Tencent

Distinguished AI Scientist

IEEE Fellow

Dinggang Shen

United Imaging Intelligence

Co-CEO

IEEE Fellow

Jian Sun

Face++

Chief Scientist

Jing Xiao

Ping An

Chief Scientist

National Thousand Talents Plan

Shuicheng Yan

Qihoo 360

VP, AI Institute Director

IEEE/IAPR Fellow

Speakers and Speech Information

Rama Chellappa

University of Maryland

Title: Deep Representations, Adversarial Learning and Domain Adaptation for Some Computer Vision Problems

Abstract:  Recent developments in deep representation-based methods for many computer vision problems have upended research themes pursued over the last four decades. In this talk, I will discuss methods based on deep representations for designing robust computer vision systems with applications in unconstrained face and action verification and recognition, expression recognition, subject clustering and attribute extraction. The face and action recognition system being built at UMD is based on fusing multiple deep convolutional neural networks (DCNNs) trained using publicly available still and video face data sets and task-appropriate loss functions. I will then discuss some new results on generative adversarial learning and domain adaptation for improving the robustness of computer vision systems.

Bio:  Prof. Rama Chellappa is a Distinguished University Professor, a Minta Martin Professor of Engineering and Chair of the ECE department at the University of Maryland. His current research interests span many areas in image processing, computer vision, machine learning and pattern recognition. Prof. Chellappa is a recipient of an NSF Presidential Young Investigator Award and four IBM Faculty Development Awards. He received the K.S. Fu Prize from the International Association of Pattern Recognition (IAPR). He is a recipient of the Society, Technical Achievement and Meritorious Service Awards from the IEEE Signal Processing Society. He also received the Technical Achievement and Meritorious Service Awards from the IEEE Computer Society. Recently, he received the inaugural Leadership Award from the IEEE Biometrics Council. At UMD, he received college and university level recognitions for research, teaching, innovation and mentoring of undergraduate students. In 2010, he was recognized with the Outstanding Electrical and Computer Engineer Award by Purdue University. He received the Distinguished Alumni Award from the Indian Institute of Science in 2016. Prof. Chellappa served as the Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). He is a Golden Core Member of the IEEE Computer Society, served as a Distinguished Lecturer of the IEEE Signal Processing Society and as the President of the IEEE Biometrics Council. He is a Fellow of IEEE, IAPR, OSA, AAAS, ACM and AAAI and holds six patents.

Mingming Cheng

Nankai University

Title: Learning Pixel-Accurate Image Semantics from the Web

Abstract:  Understanding pixel-level image semantics is the foundation of many important computer vision and computer graphics applications. Although the related research has developed rapidly in recent years, existing state-of-the-art solutions depend heavily on massive, pixel-accurate image annotations. In contrast, humans can autonomously learn to perform high-precision semantic recognition and target extraction, without difficulty, through online search. Inspired by this observation, we started with category-independent semantic feature extraction techniques such as salient object detection, image segmentation, and edge extraction. Next, we will introduce how to use these category-independent image semantic features to reduce the reliance on accurate annotation during semantic learning, and then present an image semantic understanding technique that does not require any explicit manual annotation.

Bio:  Ming-Ming Cheng is a professor at Nankai University. He received his Ph.D. degree from Tsinghua University in 2012, and then spent two years as a research fellow with Prof. Philip Torr in Oxford. Dr. Cheng’s research primarily focuses on algorithmic issues in image understanding and processing, including salient object detection, semantic segmentation, low-level vision techniques, image manipulation, etc. He has published over 30 papers in leading journals and conferences, such as IEEE TPAMI, ACM TOG, ACM SIGGRAPH, IEEE CVPR, and IEEE ICCV. He has designed a series of popular methods and novel systems, as indicated by 9000+ paper citations (2000+ citations to his first-author paper on salient object detection).

Irfan Essa

Georgia Institute of Technology

Title: Computational Video: Technologies for Scalable Analysis, Creation and Enhancement of Video

Abstract:  Computational technologies have had a huge impact on how video is captured, created, processed, distributed and shared. Content creation is (a) essentially multimodal, (b) driven by capture and mixing, (c) becoming collaborative, immersive, and mobile, and (d) possible not just for experts, but for anyone. Distribution and sharing of content have changed too, leading to new forms of consumption of content. These trends require a newer set of computational tools to support the analysis, creation, and enhancement of video content. In this talk, I will make a series of observations about where these technologies are going, and showcase a few examples of my team’s work aimed at video creation, video summarization, video content analysis, video enhancement and collaboration with video. I will specifically show work on enhancing the quality of online videos (now running on YouTube and mobile phones), summarizing videos of sports and personal vacations, supporting the automated production of content, and extracting content and improving the quality of information from and in online videos, leading up to dynamic scene analysis from video. This talk will be an overview of several recent research papers by my team, and I will present a series of tools we have developed and also discuss some ongoing projects in this area.

Bio:  Irfan Essa is a Distinguished Professor in the School of Interactive Computing (iC) and an Associate Dean of Research in the College of Computing (CoC), at the Georgia Institute of Technology (GA Tech), in Atlanta, Georgia, USA. He is serving as the Inaugural Director of the new Interdisciplinary Research Center for Machine Learning at Georgia Tech (ML@GT). Currently, he is on leave from Georgia Tech, and is working as a Research Scientist at Google, in the Google AI/Perception Team in Mountain View, CA. Professor Essa works in the areas of Computer Vision, Machine Learning, Computer Graphics, Computational Perception, Robotics, Computer Animation, and Social Computing, with potential impact on Autonomous Systems, Video Analysis and Production (e.g., Computational Photography & Video, Image-based Modeling and Rendering, etc.), Human Computer Interaction, Artificial Intelligence, Computational Behavioral/Social Sciences, and Computational Journalism research. He has published over 150 scholarly articles in leading journals and conference venues on these topics and several of his papers have also won best paper awards. He has been awarded the NSF CAREER and was elected to the grade of IEEE Fellow. He has held extended research consulting positions with Disney Research and Google Research and also was an Adjunct Faculty Member at Carnegie Mellon’s Robotics Institute. He joined the GA Tech faculty in 1996, after earning his MS (1990) and Ph.D. (1994) and holding a research faculty position at the Massachusetts Institute of Technology (Media Lab) [1988-1996].

Gang Hua

Microsoft

Title: Deep Visual Patterns Beyond Recognition

Abstract:  In the past several years, end-to-end deep learning has become the dominant paradigm for pattern recognition. Unlike conventional pattern recognition methods, which rely on hand-crafted design of invariant features to represent patterns, deep learning methods learn the representations by directly regressing the output from the input. One criticism of such an end-to-end learning paradigm is that the working mechanism is poorly understood. There has been some work striving for better understanding by visualizing intermediate-layer feature representations. Such post-mortem analysis sheds light on how to build more interpretable models. Nevertheless, I will argue that through better design of deep structures that bring in domain knowledge from, e.g., signal processing or natural language processing, we can learn more interpretable representations. Instead of focusing on visual recognition tasks, I will illustrate the idea using examples at the intersection of media and arts, and language and vision. I will conclude my talk with some of my personal reflections on pattern analysis and recognition.

Bio:  Gang Hua is a Principal Researcher/Research Manager in the Microsoft Cloud & AI Division, managing the Machine Perception and Cognition Group in the Core Computer Vision Technology Center. He was an Associate Professor of Computer Science at Stevens Institute of Technology between 2011 and 2015, while holding an Academic Advisor position at the IBM T. J. Watson Research Center. Before that, he was a Research Staff Member at the IBM T. J. Watson Research Center from 2010 to 2011, a Senior Researcher at Nokia Research Center, Hollywood from 2009 to 2010, and a Scientist at Microsoft Live Labs Research from 2006 to 2009. He received the Ph.D. degree in Electrical and Computer Engineering from Northwestern University in 2006, and an M.S. in pattern recognition and intelligence systems from Xi'an Jiaotong University (XJTU) in 2002. He was selected to the Special Class for the Gifted Young of XJTU in 1994 and received a B.S. in Electrical Engineering in 1999. He is the recipient of the 2015 IAPR Young Biometrics Investigator Award. He is an IAPR Fellow, an ACM Distinguished Scientist, and a Senior Member of the IEEE. He has published over 140 peer-reviewed papers in top journals and conferences. To date, he holds 19 US patents and has 15 more patents pending.

Jiaya Jia

Tencent

Title: Recent Development in Image Generation and Processing

Abstract:  This talk covers the progress of low-level computer vision research in my group over the past year on the important tasks of image super-resolution, deblurring and image generation. It shows that these traditionally well-studied problems still leave considerable room for improvement using new deep learning structures. Many notable results are included. At the end of the talk, I will also introduce Tencent's X-Lab and its achievements.

Bio:  Jiaya Jia is an IEEE Fellow. He is the Distinguished Scientist and Director of X-Lab, Tencent, the core research facility in Tencent focusing on cutting-edge computer vision and sound technologies. He also holds a professorship in the Department of Computer Science and Engineering, The Chinese University of Hong Kong (CUHK). Jiaya Jia has published 100+ papers in top conferences and journals, most of them introducing new practical techniques for computational imaging, and has given 30+ keynote talks in academia and industry. His papers have received 13,000+ citations in total. His team has also released 20+ open-source systems and freeware. PhDs and masters from his group have made significant contributions and become leaders, such as CEOs of startups and professors, in a variety of fields. He is on the editorial boards of IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and International Journal of Computer Vision (IJCV). He has served several times as an area chair for ICCV and CVPR. He has been on the (technical paper) program committees of major conferences in graphics and computational imaging, including ICCP, SIGGRAPH, and SIGGRAPH Asia, and co-chaired the Workshop on Interactive Computer Vision, in conjunction with ICCV 2007. He was selected to the National “Thousand Talents Program”.

Hong Jiang

University of Texas at Arlington

Title: Software/Hardware Co-design for Flash and Flash-based Disk Arrays

Abstract: 

Bio:  Hong Jiang received the B.Sc. degree in Computer Engineering from Huazhong University of Science and Technology, Wuhan, China; the M.A.Sc. degree in Computer Engineering from the University of Toronto, Toronto, Canada; and the PhD degree in Computer Science from Texas A&M University, College Station, Texas, USA. He is currently Chair of the Computer Science and Engineering Department and Wendell H. Nedderman Endowed Professor at the University of Texas at Arlington. Prior to joining UTA, he served as a Program Director at the National Science Foundation (January 2013 to August 2015), and before that he had been at the University of Nebraska-Lincoln since 1991, where he was Willa Cather Professor of Computer Science and Engineering. His present research interests include computer architecture, computer storage systems and parallel I/O, high-performance computing, big data computing, cloud computing, and performance evaluation. He has graduated 16 Ph.D. students and supervised about 20 post-doctoral fellows who now work in either major IT companies or academia. He has over 250 publications in major journals and international conferences in these areas. Dr. Jiang is a Fellow of the IEEE and a Member of the ACM.

Mark Johnson

Macquarie University

Title: What Can Deep Learning Tell Us About Natural Language Understanding?

Abstract:  Deep Learning has revolutionised Computational Linguistics and Natural Language Understanding. It has been startlingly successful for supervised machine learning on complex end-to-end NLP tasks such as image captioning or machine translation, but with less spectacular progress in semi-supervised and unsupervised learning. This talk reviews how the field has changed over the last few years, what has stayed the same, and speculates on the impact of deep learning for the larger science of language.

Bio:  Mark Johnson is a Professor of Language Science in the Department of Computing at Macquarie University, and the Chief Scientific Officer of Voicebox Technologies Australia, an R&D lab on the Macquarie University campus. Mark has worked on a wide range of topics in computational linguistics, but his main area of research is natural language understanding, especially syntactic parsing and semantic analysis, and their applications to text and speech processing. He is a past president of the Association for Computational Linguistics (ACL) and is currently an Editor-in-Chief of the Transactions of the ACL (TACL).

Hongdong Li

Australian National University

Title: Monocular Camera 3D Depth Recovery for Dynamic Scene Structure from Motion

Abstract:  In this talk, I will describe some of our recent work on monocular perspective-camera-based reconstruction of the non-rigid 3D shape of a complex dynamic scene. We aim to answer an open question in 3D vision: “is it possible to recover the 3D shape of a dynamic deformable object with a single moving camera?” Traditional methods for dynamic 3D reconstruction often employ stereo vision, or assume the scene (with deformable objects) follows a certain simple low-order linear model. Our work removes such restrictions and shows that, under certain mild assumptions, monocular 3D reconstruction of a dynamic scene is possible; it achieves superior performance on standard benchmark datasets, including “Sintel”, the open-source animated movie. If time allows, I will also cover recent work on 3D dynamic human pose recovery using structured movements.

Bio:  Hongdong Li is currently a Reader/Associate Professor (equivalent to a tenured professor) at the Australian National University (ANU). He is a Chief Investigator and AA2 Sub-Program Leader of the Australian ARC Centre of Excellence for Robotic Vision. He has taught undergraduate courses on computer vision and robotics at ANU since 2005. His research interests include 3D computer vision, camera calibration, robot navigation, autonomous driving, as well as applications of optimization in vision. During 2009-2010 he was a senior researcher with NICTA (Canberra Labs) working on the “Australia Bionic Eyes” project. He was a visiting professor at Carnegie Mellon University in 2017. He has served as an Area Chair for CVPR, ICCV, ECCV, BMVC and 3DV; as an Associate Editor for IEEE Transactions on PAMI (T-PAMI); and as Program Co-Chair for ACCV (Asian Conference on Computer Vision) 2018. Jointly with co-workers and PhD students, he has won a number of prestigious awards in computer vision research, including the IEEE CVPR Best Paper Award and the ICCV Marr Prize Honourable Mention.

Patrick A. Naylor

Imperial College London

Title: Modulation-domain Multichannel Kalman Filtering for Speech Enhancement

Abstract:  In space-time multichannel signal processing, there are opportunities to exploit simultaneously the spatial structure of the signals captured by multiple microphones and also the temporal structure of speech signals. However, many existing speech enhancement methods neglect the temporal structure and generally rely only on the spatial information from multichannel observations in, for example, beamforming. It is well-known that a speech signal can be modelled as an autoregressive process, and based on linear prediction (LP), single-channel Kalman filtering (KF) based speech enhancement algorithms have been developed. In this talk, a multichannel Kalman filter (MKF) for speech enhancement is derived to consider jointly the multichannel spatial information and the temporal correlation of speech. The temporal evolution of speech is modelled in the modulation domain, and by integrating the spatial information, an optimal MKF gain is derived in the short-time Fourier transform (STFT) domain. It is also shown that the proposed MKF reduces to the conventional multichannel Wiener filter (MWF) if the LP information is discarded. Experimental simulation results demonstrate the effectiveness of the proposed method.
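
As background to the linear-prediction view in the abstract, the single-channel case can be sketched in a few lines: model clean speech as an autoregressive (AR) process driven by white noise, observe it in additive noise, and run a standard Kalman filter in companion form. This is only an illustrative time-domain sketch under simplified assumptions (known LP coefficients and noise variances), not the modulation-domain multichannel Kalman filter presented in the talk; the function name and parameters are hypothetical.

```python
import numpy as np

def kf_enhance(noisy, lpc_a, q, r):
    """Single-channel Kalman-filter speech enhancer (illustrative sketch).

    Clean speech s[n] is modelled as an AR(p) process,
        s[n] = a_1 s[n-1] + ... + a_p s[n-p] + w[n],  w ~ N(0, q),
    observed as y[n] = s[n] + v[n] with v ~ N(0, r).
    The state x[n] = [s[n-p+1], ..., s[n]]^T evolves in companion form.
    """
    p = len(lpc_a)
    # Companion-form transition matrix built from the LP coefficients.
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)          # shift old samples up
    F[-1, :] = np.asarray(lpc_a)[::-1]  # newest sample from the AR model
    G = np.zeros((p, 1)); G[-1, 0] = 1.0  # process noise enters the newest state
    H = G.T                               # we observe only the newest sample
    x = np.zeros((p, 1))
    P = np.eye(p)
    out = np.empty(len(noisy))
    for n, y in enumerate(noisy):
        # Predict
        x = F @ x
        P = F @ P @ F.T + q * (G @ G.T)
        # Update
        S = H @ P @ H.T + r     # innovation covariance (scalar)
        K = P @ H.T / S         # Kalman gain
        x = x + K * (y - (H @ x)[0, 0])
        P = (np.eye(p) - K @ H) @ P
        out[n] = x[-1, 0]       # enhanced estimate of the current sample
    return out
```

A practical system would re-estimate the LP coefficients and noise statistics frame by frame; the talk's MKF additionally stacks the microphone channels and models the temporal evolution in the STFT modulation domain, with the multichannel Wiener filter recovered as the special case where the LP information is discarded.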

Bio:  Patrick Naylor is a member of academic staff in the Department of Electrical and Electronic Engineering at Imperial College London. He received the BEng degree in Electronic and Electrical Engineering from the University of Sheffield, UK, and the Ph.D. degree from Imperial College London, UK. His research interests are in the areas of speech, audio and acoustic signal processing. He has worked in particular on adaptive signal processing for dereverberation, blind multichannel system identification and equalization, acoustic echo control, speech quality estimation and classification, single and multi-channel speech enhancement and speech production modelling with a particular focus on the analysis of the voice source signal. In addition to his academic research, he enjoys several fruitful links with industry in the UK, USA and in Europe. He is the past-Chair of the IEEE Signal Processing Society Technical Committee on Audio and Acoustic Signal Processing, director and president-elect of the European Association for Signal Processing (EURASIP) and Senior Area Editor of IEEE Transactions on Audio Speech and Language Processing.

Dinggang Shen

United Imaging Intelligence

Title: Deep Learning in Medical Image Synthesis, Quantification, and Prediction

Abstract:  This talk will summarize our recently developed deep learning methods for medical image synthesis (including reconstruction), quantification and prediction. These methods have been applied to automatic brain quantification for first-year brain images, with the goal of early detection of autism, e.g., before 6 months of age, as well as to MRI-guided diagnosis of Alzheimer's Disease (AD), with the goal of possible early treatment. Some of these methods have also been applied to automatic delineation of organs for cancer radiotherapy, especially the synthesis of CT from MRI for MRI-based cancer radiotherapy using a novel Generative Adversarial Network (GAN). The application of deep learning methods to product development at Shanghai United Imaging Intelligence Co., Ltd. will also be briefly introduced.

Bio:  Dinggang Shen is the Jeffrey Houpt Distinguished Investigator and a Professor of Radiology, the Biomedical Research Imaging Center (BRIC), Computer Science, and Biomedical Engineering at the University of North Carolina at Chapel Hill (UNC-CH). He currently directs the Center for Image Analysis and Informatics, the Image Display, Enhancement, and Analysis (IDEA) Lab in the Department of Radiology, and the medical image analysis core in the BRIC. He was a tenure-track assistant professor at the University of Pennsylvania (UPenn) and a faculty member at Johns Hopkins University. Dr. Shen's research interests include medical image analysis, computer vision, and pattern recognition. He has published more than 800 papers in international journals and conference proceedings. He serves as an editorial board member for eight international journals. He also served on the Board of Directors of the Medical Image Computing and Computer Assisted Intervention (MICCAI) Society in 2012-2015, and will be General Chair for MICCAI 2019. He is a Fellow of the IEEE, a Fellow of The American Institute for Medical and Biological Engineering (AIMBE), and a Fellow of The International Association for Pattern Recognition (IAPR).

Ivan Edward Sutherland

Portland State University

Title: Stop the Clock

Abstract:  This talk identifies three essential ingredients of good research: stable support, a worthy goal, and leadership. I give examples from my past half-century of research. Looking towards the future, I offer a fresh research goal. Today, most logic designers use the clocked logic design paradigm, but clocked design will soon stop being useful, as my title suggests, because advances in integrated circuit technology raise the cost of data transport relative to logic. The clocked design paradigm gains simplicity by ignoring data transport delay, but modern chips are so large and logic so fast that data transport delay has become a major concern in chip design. Its new importance will force us to abandon the false simplicity gained by ignoring data transport delay. We must replace it with the new simplicity of a self-timed design paradigm that gives data transport as much importance as logic. Giving data transport a role equal to that of logic lets the designer reason about logic modules separated in both time and space. Instead of marching in step, each logical element will act whenever and as soon as it can. A designer will combine existing independent logic modules into a cooperating whole. I anticipate a day when domain-specific high-level languages will compile directly into self-timed chips.

Bio:  Ivan Sutherland received his Ph.D. degree from MIT in 1963, with a well-known thesis called “Sketchpad” for which he has been called the “father of computer graphics.” His career has included government service, private industry, venture capital and professorships at Harvard, the University of Utah, and Caltech. With his research partner and wife, Marly Roncken, he joined Portland State University in 2009 to found the Asynchronous Research Center (ARC). The ARC will soon graduate its 4th PhD student. Ivan holds more than 70 US patents, and is the author of numerous publications and lectures. Ivan’s 1999 book, Logical Effort, describes mathematics for designing fast transistor circuits. Sutherland holds the 1988 ACM Turing Award, the 2012 Kyoto Prize, and the 1998 IEEE John von Neumann Medal. He is a Fellow of the ACM and a Member of both the US National Academy of Engineering and the US National Academy of Sciences. Now 80 years of age, Ivan devotes full time to research, lectures, and writing.

Rene Vidal

Johns Hopkins University

Title: Automatic Methods for the Interpretation of Biomedical Data

Abstract:  In this talk, I will overview our recent work on the development of automatic methods for the interpretation of biomedical data from multiple modalities and scales. At the cellular scale, I will present a structured matrix factorization method for segmenting neurons and finding their spiking patterns in calcium imaging videos, and a shape analysis method for classifying embryonic cardiomyocytes in optical imaging videos. At the organ scale, I will present a Riemannian framework for processing diffusion magnetic resonance images of the brain, and a stochastic tracking method for detecting Purkinje fibers in cardiac MRI. At the patient scale, I will present dynamical system and machine learning methods for recognizing surgical gestures and assessing surgeon skill in medical robotic motion and video data.

Bio:  Rene Vidal is a Professor of Biomedical Engineering and the Inaugural Director of the Mathematical Institute for Data Science at The Johns Hopkins University. His research focuses on the development of theory and algorithms for the analysis of complex high-dimensional datasets such as images, videos, time-series and biomedical data. His current major research focus is understanding the mathematical foundations of deep learning and its applications in computer vision and biomedical data science. He has pioneered the development of methods for dimensionality reduction and clustering, such as Generalized Principal Component Analysis and Sparse Subspace Clustering, and their applications to face recognition, object recognition, motion segmentation and action recognition. He has also created new technologies for a variety of biomedical applications, including detection, classification and tracking of blood cells in holographic images, classification of embryonic cardiomyocytes in optical images, and assessment of surgical skill in surgical videos. Dr. Vidal is the recipient of numerous awards for his work, including the Jean D'Alembert Faculty Fellowship (2017), IAPR Fellowship (2016), IEEE Fellowship (2014), J.K. Aggarwal Prize (2012), ONR Young Investigator Award (2009), Sloan Fellowship (2009), NSF CAREER Award (2004), as well as best paper awards for his work in machine learning, computer vision, medical imaging, and controls.

Liang Wang

Chinese Academy of Sciences

Title: Bridging the Visual-semantic Gap With Mid-level Attributes for Deep Multimodal Learning

Abstract:  Deep learning methods have been widely used for multimodal data analysis, but the task remains challenging due to the large visual-semantic discrepancy. This discrepancy mainly arises because the representation of low-level visual pixels usually lacks the high-level semantic information carried by the corresponding sentences. In this talk, we will detail our recent work on leveraging mid-level attributes to bridge the visual-semantic gap, and introduce its applications to cross-modal retrieval, video captioning, and referring expression comprehension and generation.

Bio:  Prof. Liang Wang received both the BEng and MEng degrees from Anhui University, in 1997 and 2000, respectively, and the PhD degree from the Institute of Automation, Chinese Academy of Sciences (CASIA), in 2004. From 2004 to 2010, he was a research assistant at Imperial College London, United Kingdom, and Monash University, Australia, a research fellow with the University of Melbourne, Australia, and a lecturer with the University of Bath, United Kingdom, respectively. Currently, he is a full professor of the Hundred Talents Program at the National Lab of Pattern Recognition, CASIA. His major research interests include machine learning, pattern recognition, and computer vision. He has widely published in highly ranked international journals such as the IEEE TPAMI and the IEEE TIP, and leading international conferences such as CVPR, ICCV, and ICDM. He is a senior member of the IEEE and a fellow of the IAPR.

Wei Wang

University of California, Los Angeles

Title: Bridging the Knowledge Gap: Big Data Analytics in Science and Beyond

Abstract:  Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations and other useful information. Its revolutionary potential is now universally recognized. Data complexity, heterogeneity, scale, and timeliness make data analysis a clear bottleneck in many applications, due to the complexity of the patterns and the lack of scalability of the underlying algorithms. Advanced machine learning and data mining algorithms are being developed to address one or more of the challenges listed above. Typically, the complexity of potential patterns may grow exponentially with respect to the data complexity, and so may the size of the pattern space. To avoid an exhaustive search through the pattern space, machine learning and data mining algorithms seek ways to explore the pattern space efficiently, exploiting the dependencies between potential patterns to maximize in-memory computation and/or leveraging special hardware for acceleration. These lead to strong data dependency, operation dependency, and hardware dependency, and sometimes to ad hoc solutions that cannot be generalized to a broader scope. In this talk, I will present some open challenges faced by data scientists in scientific fields and the current approaches taken to tackle these challenges.
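
The idea of exploiting dependencies between potential patterns to avoid an exhaustive search is classically illustrated by Apriori-style frequent-itemset mining: because every subset of a frequent pattern must itself be frequent, whole branches of the exponential pattern space can be discarded without ever being counted. A minimal sketch of that pruning principle (the function name and interface are illustrative, not from the talk):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent-itemset mining with Apriori-style candidate pruning.

    The pattern space over n items has 2^n candidate itemsets, but the
    anti-monotone support property lets us grow candidates level by level
    and drop any candidate that has an infrequent subset before counting it.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        return sum(itemset <= t for t in transactions)

    frequent = {}
    level = [frozenset([i]) for i in items]  # level 1: single items
    while level:
        # Count only the candidates that survived pruning at this level.
        level = [c for c in level if support(c) >= min_support]
        frequent.update((c, support(c)) for c in level)
        # Generate the next level; keep a union of two k-itemsets only if
        # it has size k+1 and every k-subset is known to be frequent.
        survivors = set(level)
        next_level = set()
        for a, b in combinations(level, 2):
            c = a | b
            if len(c) == len(a) + 1 and all(
                frozenset(s) in survivors for s in combinations(c, len(c) - 1)
            ):
                next_level.add(c)
        level = list(next_level)
    return frequent
```

The same anti-monotone reasoning generalizes to sequence and graph patterns, and is one of the dependency structures the abstract refers to when describing how algorithms prune the pattern space instead of enumerating it.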

Bio:  Wei Wang is the Leonard Kleinrock Chair Professor of Computer Science at the University of California, Los Angeles and the director of the Scalable Analytics Institute (ScAi). She is a co-director of the NIH BD2K Centers-Coordination Center. She received her PhD degree in Computer Science from the University of California, Los Angeles in 1999. She was a professor in Computer Science at the University of North Carolina at Chapel Hill from 2002 to 2012, and was a research staff member at the IBM T. J. Watson Research Center between 1999 and 2002. Dr. Wang's research interests include big data analytics, data mining, bioinformatics and computational biology, and databases. She has filed seven patents, has published one monograph and more than one hundred seventy research papers in international journals and major peer-reviewed conference proceedings, and has received multiple best paper awards. Dr. Wang received the IBM Invention Achievement Awards in 2000 and 2001. She was the recipient of an NSF Faculty Early Career Development (CAREER) Award in 2005. She was named a Microsoft Research New Faculty Fellow in 2005. She was honored with the 2007 Phillip and Ruth Hettleman Prize for Artistic and Scholarly Achievement at UNC. She was recognized with an IEEE ICDM Outstanding Service Award in 2012, an Okawa Foundation Research Award in 2013, and an ACM SIGKDD Service Award in 2016. Dr. Wang has been an associate editor of the IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Big Data, IEEE/ACM Transactions on Computational Biology and Bioinformatics, ACM Transactions on Knowledge Discovery in Data, Journal of Computational Biology, Journal of Knowledge and Information Systems, Data Mining and Knowledge Discovery, and International Journal of Knowledge Discovery in Bioinformatics.
She serves on the organization and program committees of international conferences including ACM SIGMOD, ACM SIGKDD, ACM BCB, VLDB, ICDE, EDBT, ACM CIKM, IEEE ICDM, SIAM DM, SSDBM, RECOMB, BIBM. She was elected to the Board of Directors of the ACM Special Interest Group on Bioinformatics, Computational Biology, and Biomedical Informatics (SIGBio) in 2015.

Jing Xiao

Pingan

Title: AI + Financial Services

Abstract:  Sixty years ago, at the Dartmouth Conference, artificial intelligence was officially born as a discipline. After two cycles of boom and bust, artificial intelligence has entered its third period of rapid growth. Breakthrough results have now been achieved in speech recognition, image recognition, natural language understanding, and many other fields. Recently, AlphaGo has set a new milestone for artificial intelligence, signaling that “artificial intelligence+” will become the next disruptive force after “Internet+”. This matters greatly to us because AI is bringing profound changes to traditional industries, as has been verified in Ping An's exploration of the financial field. By combining artificial intelligence technology with financial big data, we can make financial services more secure, efficient, and low-cost; in a word, more enjoyable. I am honored to speak here about the latest applications of artificial intelligence in finance and the difference they can make in the financial industry.

Bio:  Dr. Jing Xiao is a professorate senior engineer, an expert in China's 1000 Talents Plan, Chief Scientist of Ping An Group, and Dean of the Ping An Technology Research Institute. He obtained his Ph.D. degree in computer science from Carnegie Mellon University (CMU), USA. His current research interests include Big Data, Artificial Intelligence, and Robotics.

Dong Xu

The University of Sydney

Title: Visual Domain Adaptation

Abstract:  In many computer vision applications, the domain of interest (i.e., the target domain) contains very few or even no labelled samples, while an existing domain (i.e., the auxiliary/source domain) is often available with a large number of labelled examples. For example, millions of loosely labelled Flickr photos or YouTube videos can be readily obtained by using keyword-based search. On the other hand, users may be interested in retrieving and organizing their own multimedia collections of images and videos at the semantic level, but may be reluctant to put forth the effort to annotate their photos and videos by themselves. This problem becomes even more challenging because the feature distributions of training samples from the web domain and the consumer domain may differ tremendously in their statistical properties. To explicitly cope with the feature distribution mismatch between samples from different domains, in this talk I will describe several of our recent works on domain adaptation under different settings, as well as their interesting applications in computer vision.
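As a concrete illustration of quantifying the feature distribution mismatch between a source and a target domain, the following is a minimal NumPy sketch of the (biased) squared Maximum Mean Discrepancy (MMD) with an RBF kernel. MMD is a standard mismatch statistic in the domain adaptation literature; it is not necessarily the specific criterion used in the works above, and the "web"/"consumer" features here are synthetic stand-ins.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # Pairwise RBF kernel matrix between rows of x and rows of y.
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(source, target, gamma=0.5):
    # Biased estimate of the squared Maximum Mean Discrepancy:
    # mean k(s, s') + mean k(t, t') - 2 * mean k(s, t).
    return (rbf_kernel(source, source, gamma).mean()
            + rbf_kernel(target, target, gamma).mean()
            - 2.0 * rbf_kernel(source, target, gamma).mean())

rng = np.random.default_rng(0)
web = rng.normal(0.0, 1.0, size=(300, 2))       # synthetic "web domain" features
consumer = rng.normal(1.0, 1.0, size=(300, 2))  # shifted "consumer domain" features
same = rng.normal(0.0, 1.0, size=(300, 2))      # drawn from the web distribution

# A shifted domain yields a clearly larger MMD than a matched one.
print(mmd2(web, consumer), mmd2(web, same))
```

Many feature-level adaptation methods can be read as minimizing a statistic of this kind (or a kernel-weighted variant) between the transformed source and target features.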

Bio:  Dong Xu is Chair in Computer Engineering at the School of Electrical and Information Engineering, The University of Sydney, Australia. He received the B.Eng. and PhD degrees from University of Science and Technology of China, in 2001 and 2005, respectively. While pursuing the PhD degree, he worked at Microsoft Research Asia and The Chinese University of Hong Kong for more than two years. He also worked as a postdoctoral research scientist at Columbia University from 2006 to 2007 and a faculty member at Nanyang Technological University from 2007 to 2015. His current research interests include computer vision, multimedia and machine learning. He has published more than 100 papers in IEEE Transactions and top tier conferences. His co-authored work (with his former PhD student Lixin Duan) received the Best Student Paper Award in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) in 2010. His co-authored work (with his former PhD student Lin Chen) won the IEEE Transactions on Multimedia Prize Paper Award in 2014. He was awarded the IEEE Computational Intelligence Society Outstanding Early Career Award in 2017. He is/was on the editorial boards of T-PAMI, T-IP, T-NNLS, T-MM and T-CSVT. He also served as a guest editor of seven special issues in T-NNLS, T-CYB, T-CSVT, IJCV, ACM TOMM, CVIU and IEEE Multimedia. Moreover, he served as a steering committee member of ICME (2016-2017), a program co-chair of ICME 2014, as well as an area chair of CVPR 2012, ECCV 2016 and ICCV 2017. He is a Fellow of the IEEE and the IAPR.

Xiangyang Xue

Fudan University

Title: Video Content Analysis With Deep Learning

Abstract:  Nowadays people produce a huge number of images and videos, many of which are uploaded to the Internet on social media sites. There is a strong need to develop automatic solutions for analyzing the contents of these images and videos. Potential applications of such techniques include effective video content management, retrieval and recommendation, video surveillance, etc. In this talk, I will introduce our recent works on image and video content analysis. I will start by introducing a few recently constructed Internet video datasets. After that I will introduce several recent approaches developed by our team, with a focus on deep learning based methods tailored for image and video analysis.

Bio:  Xiangyang XUE received the B.S., M.S., and Ph.D. degrees from Xidian University, Xi’an, China, in 1989, 1992, and 1995, respectively, all in information and communication engineering. He is currently a Professor with the School of Computer Science, Fudan University. His research interests include computer vision, video analysis and deep learning, and he has co-authored about 200 technical papers, as well as more than 20 granted patents. He serves as an Associate Editor for IEEE Transactions on Cognitive and Developmental Systems, Journal of Computer Research and Development and Journal of Frontiers of Computer Science & Technology.

Ming-Hsuan Yang

UC Merced / Google Cloud

Title: Learning to Track and Segment Objects in Videos

Abstract:  In this talk, I will present our recent results on visual tracking and video object segmentation. The tracking-by-detection framework typically consists of two stages: drawing samples around the target object in the first stage and classifying each sample as the target object or as background in the second stage. The performance of existing trackers using deep classification networks is limited in two respects. First, the positive samples in each frame are highly spatially overlapped, and they fail to capture rich appearance variations. Second, there exists extreme class imbalance between positive and negative samples. Our VITAL algorithm addresses these two problems via adversarial learning. To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively drop out input features to capture a variety of appearance changes. With the use of adversarial learning, our network identifies the mask that maintains the most robust features of the target objects over a long temporal span. In addition, to handle the issue of class imbalance, we propose a high-order cost-sensitive loss to decrease the effect of easy negative samples and so facilitate training the classification network. Extensive experiments on benchmark datasets demonstrate that the proposed tracker performs favorably against state-of-the-art approaches. Online video object segmentation is a challenging task, as it requires processing the image sequence both promptly and accurately. To segment a target object through a video, numerous CNN-based methods have been developed that heavily fine-tune on the object mask in the first frame, which is too time-consuming for online applications. In the second part, we propose a fast and accurate video object segmentation algorithm that can start the segmentation process immediately upon receiving the images. 
We first utilize a part-based tracking method to deal with challenging factors such as large deformation, occlusion, and cluttered background. Based on the tracked bounding boxes of parts, we construct a region-of-interest segmentation network to generate part masks. Finally, a similarity-based scoring function is adopted to refine these object parts by comparing them to the visual information in the first frame. Our method performs favorably against state-of-the-art algorithms in accuracy on the DAVIS benchmark dataset, while achieving much faster runtime performance.
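To make the class-imbalance idea above concrete, the sketch below shows one standard way (in plain NumPy) to down-weight easy negative samples in a binary classification loss: each sample's negative log-likelihood is scaled by a power of its misclassification probability, so confidently correct (easy) samples contribute little. This is an illustrative focal-style weighting, not the exact high-order cost-sensitive loss of the VITAL paper.

```python
import numpy as np

def down_weighted_nll(probs, labels, k=2.0):
    # Illustrative cost-sensitive negative log-likelihood.
    # probs:  predicted probability of the target class, P(target)
    # labels: 1 = target, 0 = background
    # Each sample's NLL is scaled by (1 - p_correct)^k, so easy samples
    # (high p_correct) are strongly down-weighted and hard samples dominate.
    p_correct = np.where(labels == 1, probs, 1.0 - probs)
    weights = (1.0 - p_correct) ** k
    return float(np.mean(-weights * np.log(p_correct + 1e-12)))

probs = np.array([0.95, 0.60, 0.05, 0.40])   # predicted P(target)
labels = np.array([1, 1, 0, 0])              # first/third easy, second/fourth hard
print(down_weighted_nll(probs, labels))
```

With `k = 0` this reduces to the ordinary cross-entropy; increasing `k` pushes the training signal toward the hard samples, which is the effect the abstract describes for handling the flood of easy negatives.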

Bio:  Ming-Hsuan Yang is a professor of Electrical Engineering and Computer Science at University of California, Merced, and a researcher at Google Cloud. He received the PhD degree in Computer Science from the University of Illinois at Urbana-Champaign in 2000. He serves as an area chair for several conferences including IEEE Conference on Computer Vision and Pattern Recognition, IEEE International Conference on Computer Vision, European Conference on Computer Vision, Asian Conference on Computer Vision, and AAAI National Conference on Artificial Intelligence. He serves as a program co-chair for IEEE International Conference on Computer Vision in 2019 as well as Asian Conference on Computer Vision in 2014, and general co-chair for Asian Conference on Computer Vision in 2016. He serves as an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (2007 to 2011), International Journal of Computer Vision, Computer Vision and Image Understanding, Image and Vision Computing, and Journal of Artificial Intelligence Research. Yang received the Google Faculty Award in 2009, the Distinguished Early Career Research Award from the UC Merced Senate in 2011, the Faculty Early Career Development (CAREER) Award from the National Science Foundation in 2012, and the Distinguished Research Award from the UC Merced Senate in 2015.

Xiaokang Yang

Shanghai Jiao Tong University

Title: Deep Process Learning

Abstract:  In the digital era, massive behavior data are recorded in the course of industrial production and human activity. Analyzing and utilizing these data is of great significance for understanding the latent dynamics that inherently govern these seemingly “random” data, and for adaptively optimizing the underlying process to make it more efficient and beneficial to human beings. Methods for analyzing such data fall broadly into model-driven and data-driven approaches. The model-driven approach attempts to design explicit models through abstraction of the dynamic process. It has two drawbacks: on one hand, model selection requires substantial expert experience and domain knowledge, which is expensive and time-consuming; on the other hand, the limited generalization ability of task-specific models cannot adapt to emerging phenomena in the digital era. The increasing availability of behavior data from the Internet and IoT provides new opportunities for data-driven modeling of such processes. Deep learning, with its high flexibility and powerful generalization ability, can be embedded into process analysis, making end-to-end learning of process dynamics possible. In this talk, I will motivate the concept of "deep process learning", provide some insights into it, and finally report preliminary results of our group in social media analysis and computer vision.

Bio:  Xiaokang YANG received the B.S. degree from Xiamen University, Xiamen, China, in 1994, the M.S. degree from Chinese Academy of Sciences, Shanghai, China, in 1997, and the Ph.D. degree from Shanghai Jiao Tong University, Shanghai, China, in 2000. He is currently Changjiang (Yangtze River) Distinguished Professor in the School of Electronic Information and Electrical Engineering, and the Deputy Executive Dean of the Artificial Intelligence Institute, Shanghai Jiao Tong University, China. From September 2000 to March 2002, he worked as a Research Fellow in the Centre for Signal Processing, Nanyang Technological University, Singapore. From April 2002 to October 2004, he was a Research Scientist in the Institute for Infocomm Research (I2R), Singapore. From August 2007 to July 2008, he visited the Institute for Computer Science, University of Freiburg, Germany, as an Alexander von Humboldt Research Fellow. He has published over 200 refereed papers and has filed 60 patents. His current research interests include pattern recognition and machine learning, visual signal processing and communication. He is an Associate Editor of IEEE Transactions on Multimedia and a Senior Associate Editor of IEEE Signal Processing Letters. He was a Series Editor of Springer CCIS and a member of the Editorial Board of Digital Signal Processing. He is a member of APSIPA, a senior member of IEEE, a member of the VSPC Technical Committee of the IEEE Circuits and Systems Society, a member of the MMSP Technical Committee of the IEEE Signal Processing Society, and Chair of the Multimedia Big Data Interest Group of the MMTC Technical Committee of the IEEE Communications Society.

Alan L. Yuille

Johns Hopkins University

Title: Deep Networks and Beyond

Abstract:  Deep networks are very successful for computer vision applications provided there are large annotated datasets enabling supervised learning and testing. But important challenges remain. The first is "unrepresentative datasets", where deep networks are sensitive to adversarial attacks, changes in context, and rare or hazardous events. The second is "limited supervised training data", which requires transfer learning to deal with few training examples and weak supervision. The third is "architecture design", where the goal is to automatically search over deep network architectures or to couple deep networks with other machine learning techniques such as random forests. This talk will address all of these issues using state-of-the-art computer vision applications.

Bio:  Alan Yuille is a Bloomberg Distinguished Professor of Cognitive Science and Computer Science at Johns Hopkins University. He is a mathematician and computer scientist studying the biology of vision. His research has focused on computational models for vision, mathematical models of cognition, artificial intelligence, and neural networks, areas in which he is now a leading authority. He is developing mathematical models of vision and cognition that allow us to build computers that, when given images or videos, can reconstruct the 3D structure of a scene. These models also serve as computational models of biological vision which can be tested by behavioral, invasive, and non-invasive techniques. He has published more than 300 publications including three books (one co-edited). Dr. Yuille is a recipient of many awards, including the Bloomberg Distinguished Professorship in 2016, the Helmholtz Test of Time Award in 2013, and the Marr Prize at ICCV 2013. His work reaches across the computer vision, vision science, and neuroscience communities at Johns Hopkins, particularly in the schools of Arts and Sciences and Engineering.

Changxi Zheng

Columbia University

Title: Audiovisual Computing: From Virtual Reality to Tangible Reality

Abstract:  Over the last few decades, success in the field of visual computing has revolutionized our digital visual experience --- from special effects in Hollywood movies, to face recognition on smartphones, to the stunning promise offered by VR/AR goggles. Yet, in this grand picture, one piece remains missing. Our real world has never been silent. Not only is it colorful to our eyes, its sound is also rich and vivid to our ears. In current paradigms, visual computing is often performed in isolation from its audio counterpart. In this talk, I will propose audiovisual computing, a research area that renders, analyzes, and processes audiovisual information. I will first introduce our recent works in this area on physics-based audiovisual models built from first principles, and then illustrate audiovisual processing using our work on 360 videos. In the second part of the talk, I’ll discuss the implications of these models for shaping the physical world --- namely, how to harness computational audiovisual models to enable tangible forms and objects that offer unprecedented new functionalities. I’ll close my talk by briefly discussing extensions of our methods beyond the audiovisual modalities, in fields such as food engineering, nanophotonic devices, and wireless communications.

Bio:  Co-director of Columbia’s Computer Graphics Group, Changxi Zheng is currently an Associate Professor in the Department of Computer Science at Columbia University, working on audiovisual processing, computer graphics, acoustic and optical engineering, and scientific computing. He received his Ph.D. from Cornell University with the Best Dissertation Award and his B.S. from Shanghai Jiaotong University. He currently serves as an associate editor of ACM Transactions on Graphics. He was a conference chair for SCA in 2017, has won several best paper awards and an NSF CAREER Award, and was named one of Forbes’ “30 under 30” in science and healthcare in 2013.

Rui Zheng

ShanghaiTech University

Title: The Application of Ultrasound Spine Imaging Techniques on the Diagnosis and Treatment of Scoliosis

Abstract:  Scoliosis is a 3D spinal deformity characterized by lateral curvature and axial vertebral rotation. It is usually diagnosed in the late juvenile and adolescent period, and the prevalence of idiopathic scoliosis in children is 2-3%. Ultrasound (US) spine imaging techniques have been developed to reconstruct full-spine 3D images from the continuous reflection intensity images of the vertebrae, in accordance with the acquired location and orientation information. They can be applied to measuring curve severity and vertebral rotation and to assessing bone quality. In clinical trials, measurements from US images showed good consistency and reliability compared with the conventional radiographic method: the mean absolute difference between the two modalities was lower than the clinically accepted error (<3°) and the correlation was high (R2>0.9). US spine imaging is fast, robust, accurate, and easy to operate; as a radiation-free, mobile, and cost-effective method, it shows good potential as a supplemental tool for the diagnosis, monitoring, and treatment of scoliosis.
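The two agreement criteria mentioned above (mean absolute difference below 3° and R² above 0.9) can be computed as in the short NumPy sketch below. The paired angle values here are hypothetical placeholders, not the clinical data from the study; only the metric definitions are standard.

```python
import numpy as np

# Hypothetical paired Cobb-angle measurements (degrees) for the same
# curves: one set from radiographs, one from ultrasound (US).
xray = np.array([12.0, 18.5, 25.0, 31.2, 40.8, 52.3])
us   = np.array([13.1, 17.9, 24.2, 32.5, 39.6, 51.0])

# Mean absolute difference (MAD) between the two modalities.
mad = np.mean(np.abs(us - xray))

# Coefficient of determination (R^2) of a least-squares fit us ~ xray.
slope, intercept = np.polyfit(xray, us, 1)
pred = slope * xray + intercept
ss_res = np.sum((us - pred) ** 2)
ss_tot = np.sum((us - np.mean(us)) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Agreement criteria from the abstract: MAD < 3 deg and R^2 > 0.9.
print(f"MAD = {mad:.2f} deg, R^2 = {r2:.3f}")
```

In clinical validation studies this pointwise comparison is typically complemented by a Bland-Altman analysis, but the two summary numbers quoted in the abstract reduce to exactly these computations.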

Bio:  Dr. Rui Zheng received her bachelor's and master's degrees from the Department of Engineering Physics at Tsinghua University, Beijing, China, in 2000 and 2002 respectively. She received her PhD degree in Physics and Biomedical Engineering from the University of Alberta, Alberta, Canada in 2011. Between May 2012 and April 2013, she was a postdoctoral fellow in the Laboratory of Mechanics and Acoustics–CNRS, Marseille, France. From 2013 to 2017, she worked as a research associate in the Department of Surgery, University of Alberta and at Glenrose Rehabilitation Hospital, Alberta Health Services, Alberta, Canada. In March 2018, she joined the School of Information Science and Technology at ShanghaiTech University as a tenure-track assistant professor and PI.