The world of data has become really BIG: starting with the desktop, through the mobile phone, and up to the emerging domain of voice AI devices, we are on the cusp of everyday appliances producing continuous streams of rich data. Such an abundance of data poses significant technical challenges and has driven massive innovation, along with a wide debate in society on the fairness of its use, privacy, and legal issues.
The BIG Track at the Web Conference 2019 will provide a forum to hear the latest innovations and state-of-the-art approaches from leaders in the big data space, from both academia and industry. The BIG (Big Data Innovators Gathering) series of events has been co-located with the Web Conference series (2014, 2015, 2016, 2017, 2018). The day will consist of keynotes, invited talks, and panels.
Vanja Josifovski • Airbnb
Liane Lewin-Eytan • Amazon
- 9:00 - 9:10
Reimagining Conversational Search with Voice-Enabled Devices
Eugene Agichtein • Professor of CS at Emory University and Amazon Scholar
Dr. Eugene Agichtein is a Winship Associate Professor of Computer Science at Emory University in Atlanta, USA, where he founded and leads the Intelligent Information Access Laboratory (IR Lab). Starting in January 2019, he is also a part-time "Amazon Scholar" (principal scientist) at Amazon. Eugene's research spans the areas of information retrieval, natural language processing, data mining, and human-computer interaction, most recently focusing on conversational search. Together with many colleagues, he has published over 100 papers, and his work has been recognized by multiple awards, including the A.P. Sloan Fellowship, the 2013 Karen Spärck Jones Award from the British Computer Society, and "test of time" awards from the SIGIR and WSDM conferences. Eugene was Program Co-Chair of the WSDM 2012 and WWW 2017 conferences. More information is available at http://www.cs.emory.edu/~eugene/.
- 9:10 - 10:00
Driving Network Growth at LinkedIn: Past, Present and Future
Hema Raghavan • Director of Engineering, LinkedIn
At LinkedIn, our mission is to use AI to connect every member of the global workforce to make them more productive and successful. The social network is the backbone for professionals to engage with each other at every stage of their career. LinkedIn’s “People You May Know” (PYMK) product has fueled viral growth, connecting the world’s professionals to each other to form an active community. PYMK has also driven innovation in technologies and platforms like Kafka and Voldemort/Venice. More recently, LinkedIn’s PYMK recommendations have moved to a near-real-time graph walk platform, GAIA. The talk will showcase how the trinity of Artificial Intelligence, Big Data Platforms, and Product Innovation is closely coupled.
Hema Raghavan is a Director of Engineering at LinkedIn, currently heading the team that builds AI solutions for fueling the professional social network’s growth. At LinkedIn, she has led the “People You May Know” team through a multi-year transformation to near-real-time computation. Her team also builds the decision-making engine for LinkedIn’s notifications. She has several years of experience in applied AI settings in industry and academia, solving hard problems and leading teams to significant impact. Prior to LinkedIn, she was a Research Staff Member at IBM T.J. Watson, building question answering systems for the DARPA GALE project. She started her career in industry at Yahoo Labs as a research scientist, where she transformed the retrieval engine for Search Advertising, leading to significant revenue lifts. She received her PhD in Information Retrieval and Machine Learning from the University of Massachusetts, Amherst. She has several research publications in top-tier conferences such as WWW and SIGIR, and she serves on the program committees of several conferences, including WWW and KDD.
- 10:00 - 10:30
- 10:30 - 11:00
Machine Learning-Powered Search Ranking of Airbnb Experiences
Mihajlo Grbovic • Senior Machine Learning Scientist, Airbnb
Airbnb Experiences are handcrafted activities designed and led by expert hosts that offer a unique taste of the local scene and culture. Since the launch of Airbnb Experiences in November 2016, we have brought Experiences to more than 1,000 destinations worldwide, including unique places like Easter Island, Tasmania, and Iceland. As the marketplace grew, Search & Personalization became very important factors for its continued rapid growth and success. In this talk, I will describe the steps we took to develop a Machine Learning-powered Search Ranking framework at different growth stages of the marketplace, from small to mid-size and large. Over the course of one and a half years, we ran more than 20 experiments iterating on the algorithm and were able to collectively improve bookings by more than 20%. In addition to driving overall bookings, we also focused on secondary objectives, such as high-quality experiences and new promising experiences. I will present details about the features we built out over time, such as user personalization, real-time model scoring, personalized navigation, ranking dashboards for explaining the predictions, and optimization of secondary objectives.
Mihajlo Grbovic is a Principal Machine Learning Scientist at Airbnb. He holds a Ph.D. in Machine Learning from Temple University in Philadelphia. He has more than 10 years of technical experience in applied Machine Learning, acting as a Science Lead in a portfolio of projects at Yahoo and now at Airbnb. During his time at Yahoo, he worked on integrating Machine Learning into various Yahoo products, such as Yahoo Mail, Search, Tumblr, and Ads. Some of his biggest accomplishments include building Machine Learning-powered Ad Targeting for Tumblr, being one of the key developers of Email Classification for Yahoo Mail, and introducing the next generation of query-ad matching algorithms to Yahoo Search Ads. Dr. Grbovic joined Airbnb in 2016. His work focuses mostly on Search & Recommendation problems for Airbnb Homes, Experiences, and Locations. Most recently, he worked on building out a Machine Learning-powered Search for Airbnb Experiences. Dr. Grbovic has published more than 50 peer-reviewed papers at top Machine Learning and Web Science conferences and has co-authored more than 10 patents. He received the Best Paper Award at KDD 2018. His work has been featured in The Wall Street Journal, Scientific American, MIT Technology Review, Popular Science, Harvard Business Review, and MarketWatch.
- 11:00 - 11:30
BIG Public Data for Social Good: Opportunities and Technical Challenges
Moderator: Alex Jaimes
Rich data is continuously generated today, in real time, by many sources and by different kinds of sensors. At scale, the wealth of public data generated by people and devices can be representative of events in the real world and can be used in a variety of positive ways in many industries, from urban planning and transportation to healthcare, emergency response, and many others. In this panel, we'll discuss the opportunities in leveraging such data, as well as the major technical challenges.
Alex Jaimes • SVP of AI & Data Science at Dataminr
Alex is SVP of AI & Data Science at Dataminr. Alex is a scientist, keynote speaker, and engineering executive with 15+ years of international experience in research (Columbia U., KAIST) and product impact at scale (Nauto, DigitalOcean, Yahoo, Telefónica, IDIAP-EPFL, Fuji Xerox, IBM, Siemens, and AT&T Bell Labs) in the USA, Japan, Chile, Switzerland, Spain, and South Korea. He has published 100+ technical papers in top-tier conferences and journals on diverse topics in AI and has been featured widely in the press (MIT Tech Review, CNBC, Vice, TechCrunch, Yahoo! Finance, etc.). He has given 50+ invited talks all over the world, including talks at several O’Reilly conferences (AI, Strata, Velocity), the Deep Learning Summit (Re-Work), Tech Open Air, and Stanford, Cornell, and Columbia Universities. Alex was an early voice in Human-Centered AI (Computing); his technical work focuses on machine learning, mixing qualitative and quantitative methods to gain insights on user behavior for product innovation. He has been a professor (KAIST) and an executive at Yahoo and at several startup companies. Alex holds a Ph.D. from Columbia University.
George Azzari • CTO, Atlas AI
Dave Thau • Data and Technology Global Lead Scientist, WWF
Elizabeth Goodman • Director of Design, 18F
Ed Chi • Principal Scientist, Google
- 11:30 - 12:20
- 12:30 - 14:00
Recommending and Searching: Research @ Spotify
Mounia Lalmas • Director of Research at Spotify and Head of Tech Research in Personalization
One of Spotify’s missions is “to match fans and artists in a personal and relevant way.” This talk will share some of the (research) work the Personalization mission is doing to achieve this, from using machine learning to metric validation, illustrated through examples within the context of home and search. It will draw on the so-called push and pull paradigms in information access and show how the two are related in some contexts.
Mounia Lalmas is a Director of Research at Spotify, and the Head of Tech Research in Personalization. Mounia also holds an honorary professorship at University College London. Before that, she was a Director of Research at Yahoo, where she led a team of researchers working on advertising quality for Gemini, Yahoo’s native advertising platform. She also worked with various teams at Yahoo on topics related to user engagement in the context of news, search, and user-generated content. Prior to this, she held a Microsoft Research/RAEng Research Chair at the School of Computing Science, University of Glasgow. Before that, she was Professor of Information Retrieval at the Department of Computer Science at Queen Mary, University of London. Her work focuses on studying user engagement in areas such as native advertising, digital media, social media, search, and now music. She has given numerous talks and tutorials on these and related topics. She is regularly a senior programme committee member at conferences such as WSDM, WWW, and SIGIR. She was co-programme chair for SIGIR 2015 and WWW 2018. She is also the co-author of a book written as the outcome of her WWW 2013 tutorial on "measuring user engagement". She was also co-chair of BIG 2016.
- 14:00 - 14:50
Break and transition
- 14:50 - 15:00
Graph Neural Networks for Reasoning over Multimodal Content
Jure Leskovec • Chief Scientist at Pinterest and Associate Professor at Stanford University
Online applications are full of diverse, heterogeneous, and multimodal content, and it remains a challenge to build retrieval and recommendation systems that fuse heterogeneous data and allow for efficient large-scale learning and deployment. In this talk, we describe a large-scale deep learning engine that we developed and deployed at Pinterest. We developed a data-efficient Graph Convolutional Network (GCN) algorithm, PinSage, which incorporates both graph structure and textual and visual node information. We deployed PinSage at Pinterest and trained it on 7.5 billion examples on a graph with 3 billion nodes, representing pins and boards, and 18 billion edges. According to offline metrics, user studies, and A/B tests, PinSage generates higher-quality recommendations than comparable computer vision and graph-based alternatives. To our knowledge, this is the largest application of deep graph embeddings to date, and it paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures.
Jure Leskovec is Associate Professor of Computer Science at Stanford University, Chief Scientist at Pinterest, and an investigator at the Chan Zuckerberg Biohub. His research focuses on machine learning and data mining of large social, information, and biological networks, their evolution, and the diffusion of information over them. Computation over massive data is at the heart of his research and has applications in computer science, social sciences, marketing, and biomedicine. This research has won several awards, including a Lagrange Prize, a Microsoft Research Faculty Fellowship, the Alfred P. Sloan Fellowship, and numerous best paper awards. Leskovec received his bachelor's degree in computer science from the University of Ljubljana, Slovenia, and his PhD in machine learning from Carnegie Mellon University, and he completed postdoctoral training at Cornell University.
- 15:00 - 15:30
- 15:30 - 16:00
Detecting Misconduct and Malfeasance within Financial Institutions
Panos Ipeirotis • Professor, New York University; Founder, Detectica
Misbehavior in the online world manifests itself in several forms and often depends on the domain at hand. In the financial domain, firms have the regulatory obligation to self-monitor the activities of their employees (e.g., emails, chats, phone calls) in order to detect any form of misconduct. Some forms of misconduct are illegal activities (e.g., insider trading, bribery), while others are various forms of policy violations (e.g., following improper security practices, or inappropriate language use). Traditionally, due to ease of understanding and implementation, firms deployed relatively archaic, rule-based systems for employee surveillance. Such rule-based systems generate a large number of false positive alerts and are hard to adapt to changing environments. More recent techniques aimed to solve the problem by simply moving from rule-based techniques to statistical machine learning approaches, treating misconduct detection as a single-document classification problem. We discuss why approaches that try to identify misconduct within single documents are destined to fail, and we present a set of approaches that focus on actors, connections among actors, and cases of misconduct. Furthermore, we highlight the importance of a "human-in-the-loop" approach, where humans are both guided by the system and guide it at the same time, in order to detect malfeasance faster and to adapt to changing environments; we also show how humans can play an important role in detecting shortcomings of existing machine-learning-based malfeasance-detection systems, and how humans can be incentivized to detect such shortcomings. Our multifaceted approach has been used in real environments within both large multinational and smaller financial institutions; we discuss the practical constraints and lessons learned from operating in such non-tech, highly regulated environments.
Panos Ipeirotis is a Professor and George A. Kellner Faculty Fellow at the Department of Information, Operations, and Management Sciences at Leonard N. Stern School of Business of New York University. He received his Ph.D. degree in Computer Science from Columbia University in 2004. He has received nine “Best Paper” awards and nominations, a CAREER award from the National Science Foundation, and is the recipient of the 2015 Lagrange Prize in Complex Systems, for his contributions in the field of social media, user-generated content, and crowdsourcing. For the last 3 years, he has been working on Detectica, which works with Wall Street institutions to deploy solutions that automate the process of detecting and investigating acts of malfeasance and misconduct.
- 16:00 - 16:30
Recommender Systems in a RTB Platform
Suju Rajan • SVP, Head of the Criteo AI Lab
In a Real-Time Bidding (RTB) ad platform, figuring out the right ad to show to a user needs to happen in under 100 ms. Specifically, we need to recommend a set of products from a combined catalog of ~13B products for more than 4 billion users. In this talk, I will introduce the recommender system in our RTB platform and the constraints under which it operates, and speak to some of the approaches we have experimented with. I will also present some of the challenges we have faced and highlight the research work on solving these problems.
Suju Rajan is the SVP, Head of the Criteo AI Lab. At Criteo, her team works on all aspects of performance-driven computational advertising, including real-time bidding, large-scale recommendation systems, auction theory, reinforcement learning, online experimentation, metrics, and scalable optimization methods. Prior to Criteo, she was the Director of Personalization Sciences at Yahoo Research, where her team worked on personalized recommendations for several Yahoo products.
- 16:30 - 17:00
Understanding News Using the Bloomberg Knowledge Graph
Edgar Meij • Team Lead in the Bloomberg Artificial Intelligence group
News and the global capital markets are inextricably linked. Recent developments in machine learning, knowledge graphs, and language technology have enabled increasingly intelligent ways to obtain a market advantage based on real-world events. This talk details how Bloomberg uses these technologies to quickly understand and respond to major world events in order to predict when or how breaking business news will move markets – and why.
Edgar Meij is a team lead in the Bloomberg Artificial Intelligence group, managing a broad range of projects that leverage knowledge graph technology to drive advanced financial insights. He holds a PhD in computer science from the University of Amsterdam and has an extensive track record in knowledge graphs, information retrieval, natural language processing, and machine learning. Before joining Bloomberg, Edgar worked at Yahoo Labs on all aspects related to entities in the context of web search.
- 17:00 - 17:30
- 17:30 - 17:40