“Exploring the Power of Taxonomy and Embedding in Text Mining”
Real-world big data are largely dynamic, interconnected, and in the form of unstructured text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from text data. Unfortunately, such approaches may not scale, especially when the texts are domain-specific and nonstandard (e.g., social media posts). We envision that massive text data may itself disclose a large body of hidden structures and knowledge. Equipped with domain-independent and domain-dependent knowledge bases, we can harness the power of massive data to transform unstructured text into structured knowledge. In this talk, we introduce a set of methods recently developed in our group for this purpose, including joint spherical text embedding, discriminative topic mining, taxonomy construction, and taxonomy-guided knowledge mining. We show that a data-driven approach can be a promising way to transform massive text data into structured knowledge.
About the Speaker:
Jiawei Han is the Michael Aiken Chair Professor in the Department of Computer Science, University of Illinois at Urbana-Champaign. He received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and Japan's Funai Achievement Award (2018). He is a Fellow of ACM and IEEE. He served as Director of the Information Network Academic Research Center (INARC) (2009-2016), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab, and as co-Director of KnowEnG, a Center of Excellence in Big Data Computing (2014-2019) funded by the NIH Big Data to Knowledge (BD2K) Initiative.
About the Series:
The Penn State Center for Socially Responsible Artificial Intelligence Distinguished Lecture Series highlights world-renowned scholars who have made fundamental contributions to the advancement of socially responsible artificial intelligence. The series aims to spark thoughtful conversations among attendees and to facilitate discussion among the Center's students, faculty, and industry affiliates.