
2016

Spring 2016

Jan. 22

No seminar

Abstract

TBA

Biography

TBA

Jan. 29

Monitoring Entire HPC Centers: the Sonar Project at LLNL, Todd Gamblin, Lawrence Livermore National Laboratory

Abstract

Increasingly, performance variability is an obstacle to understanding the throughput of large-scale supercomputers. Two runs of the same code, on the same system, may yield vastly different runtimes, depending on compiler flags, system noise, dynamic scheduling, and shared resources such as memory, filesystems, and networks. Understanding an application's performance characteristics requires an increasingly large number of trial runs and measurements. Analyzing performance measurements from such runs is a data-intensive task. To address these issues, Livermore Computing is deploying Sonar, a "big data" cluster that will store and analyze performance data from LLNL's entire HPC center. Sonar aggregates measurements from the network fabric, filesystem nodes, cluster nodes, and applications. It will serve as a central data warehouse for measurements collected by performance tools. We will give an overview of the Sonar cluster and the tools we have integrated with it. We will also discuss some early techniques for analyzing performance data gathered from this system.

Biography

Todd is a computer scientist in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory. His research focuses on scalable tools for measuring, analyzing, and visualizing performance data from massively parallel applications. Todd is also involved with many production projects at LLNL. He works with Livermore Computing’s Development Environment Group to build tools that allow users to deploy, run, debug, and optimize their software for machines with million-way concurrency. Todd received his Ph.D. in computer science from the University of North Carolina at Chapel Hill in 2009. His dissertation investigated parallel methods for compressing and sampling performance measurements from hundreds of thousands of concurrent processors. He received his B.A. in Computer Science and Japanese from Williams College in 2002. He has also worked as a software developer in Tokyo and held research internships at the University of Tokyo and IBM Research.

Feb. 5

MAGIC: bringing lawn irrigation into the IoT movement, Daniel Winkler, UC Merced

Abstract

Lawns make up the largest irrigated crop by surface area in North America and carry with them a demand for over 9 billion gallons of freshwater each day. Despite recent developments in irrigation control and sprinkler technology, state-of-the-art irrigation systems do nothing to compensate for areas of turf with heterogeneous water needs. In this work, we overcome the physical limitations of the traditional irrigation system with the development of a sprinkler node that can sense the local soil moisture, communicate wirelessly, and actuate its own sprinkler based on a centrally-computed schedule. A model is then developed to compute moisture movement from runoff, absorption, and diffusion. Integrated with an optimization framework, optimal valve scheduling can be found for each node in the space. In a turf area covering over 10,000 square feet, two separate deployments spanning a total of 7 weeks show that MAGIC can reduce water consumption by 23.4% over traditional campus scheduling, and by 12.3% over state-of-the-art evapotranspiration systems, while substantially improving conditions for plant health. In addition to environmental, social, and health benefits, MAGIC is shown to return its investment in 16-18 months based on water consumption alone.

Biography

Daniel Winkler received his BS in Computer Science Engineering with honors from UC Merced in 2013. An ACM member, he has since been pursuing his PhD under the advisement of Dr. Alberto Cerpa in UC Merced's ANDES Lab. Although his current research focuses on intelligent design and management of turf irrigation systems through the use of embedded devices, Daniel also has a growing interest in general resource management applications.

Feb. 10

Perceiving and Interacting with Images, Ming-Ming Cheng, Nankai University

Abstract

In this talk, I will introduce our latest research in image scene understanding and interactive technologies. Our first line of research aims at rapid image scene understanding based on visual attention mechanisms (IEEE TPAMI 2015, IEEE CVPR 2014 Oral). This is an area where people often have diverse feelings: some researchers believe that it is a principled research direction, while others might doubt its robustness. Instead of specific algorithm design, I would like to highlight how these algorithms can be robustly used in various applications, including image composition, photo montage, image retrieval, object detection, semantic segmentation, and even deep learning. Our second line of research aims at intelligent image manipulation mechanisms. We try to explore smart image manipulation techniques for easily obtaining annotated data during users' natural interaction with the real world (ACM TOG 2014, ACM TOG 2015), which is partially motivated by the growing need for high-quality labeled training data (expensive to collect) for scene understanding.

Biography

Ming-Ming Cheng is an associate professor in the Department of Computer Science, Nankai University, China. He received his PhD degree from Tsinghua University, China, in 2012 under the supervision of Prof. Shi-Min Hu, working closely with Prof. Niloy Mitra. He then spent two years as a research fellow in the UK, working with Prof. Philip Torr at Oxford. Dr. Cheng's research primarily focuses on algorithmic issues in image scene understanding, including image segmentation, salient object detection, image editing, objectness proposal, etc. He has published several highly cited papers in ACM TOG, IEEE TPAMI, etc. See also: http://mmcheng.net

Feb. 12

How to Get Your CVPR Paper Rejected?, Ming-Hsuan Yang, UC Merced

Abstract

In this talk, I will share my experience in how to publish papers in top conferences and journals. In particular, I will discuss the pitfalls and common mistakes of submitted papers from the perspectives of area chairs and associate editors.

Biography

Ming-Hsuan Yang is an associate professor in Electrical Engineering and Computer Science at the University of California, Merced. He received the PhD degree in Computer Science from the University of Illinois at Urbana-Champaign in 2000. He serves as an area chair for several conferences, including the IEEE Conference on Computer Vision and Pattern Recognition, IEEE International Conference on Computer Vision, European Conference on Computer Vision, Asian Conference on Computer Vision, IEEE International Conference on Pattern Recognition, and AAAI National Conference on Artificial Intelligence. He serves as a program co-chair for the IEEE International Conference on Computer Vision in 2019 and the Asian Conference on Computer Vision in 2014, and as general co-chair in 2016. He serves as an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (2007 to 2011), International Journal of Computer Vision, Computer Vision and Image Understanding, Image and Vision Computing, and Journal of Artificial Intelligence Research. Yang received the Google Faculty Award in 2009, and the Faculty Early Career Development (CAREER) Award from the National Science Foundation in 2012.

Feb. 19

Vertical Partitioning for Query Processing over Raw Data, Weijie Zhao, UC Merced

Abstract

Traditional databases are not equipped with adequate functionality to handle the volume and variety of "Big Data". Strict schema definition and data loading are prerequisites even for the most primitive query session. Raw data processing has been proposed as a schema-on-demand alternative that provides instant access to the data. When loading is an option, it is driven exclusively by the currently running query, resulting in sub-optimal performance across a query workload. In this talk, we investigate the problem of workload-driven raw data processing with partial loading. We model loading as fully-replicated binary vertical partitioning. We provide a linear mixed-integer programming optimization formulation that we prove to be NP-hard. We design a two-stage heuristic that comes within close range of the optimal solution in a fraction of the time. We extend the optimization formulation and the heuristic to pipelined raw data processing, a scenario in which data access and extraction are executed concurrently. We provide three case studies over real data formats that confirm the accuracy of the model when implemented in a state-of-the-art pipelined operator for raw data processing.

Biography

Weijie Zhao is a Ph.D. student in the EECS department of UC Merced, working with Prof. Florin Rusu. He received his B.S. from East China Normal University, China. His research interests include scientific data processing and database theory.

Feb. 26

Topological methods for motion planning and trajectory analysis, Florian Pokorny, UC Berkeley

Abstract

One key open problem in robotics is the question of how a robot can reason about the space of possible trajectories. In particular, how can a robot determine not just a single shortest path between two points, but develop an understanding of continuous deformation classes (homotopy classes) of trajectories in configuration space? Over the last 5 years, computational geometry techniques for computing topological information from data have advanced dramatically. We will discuss our recent work on using persistent homology to determine a collection of homotopy inequivalent trajectories in robot configuration spaces and will present recent work on topologically clustering large databases of trajectories into consistent clusters with applications to the learning of robot control policies and motion primitives.

Biography

Florian Pokorny received a BSc (Honours) in Mathematics from the University of Edinburgh, UK, in 2005. He then obtained a Master of Advanced Studies in Mathematics (Part III of the Mathematical Tripos) from the University of Cambridge, UK, before completing his PhD in pure mathematics under the supervision of Prof. Michael Singer at the University of Edinburgh in 2011, in the field of differential geometry. Following his PhD, he refocused his research on robotics and machine learning problems, particularly topological methods, robotic manipulation, and motion planning, and joined the Center for Autonomous Systems at KTH Royal Institute of Technology, Stockholm, Sweden, working with Prof. Danica Kragic. Since May 2015, he has been conducting postdoctoral research in the AMPLab & Automation Lab at UC Berkeley with Prof. Ken Goldberg and his group.

March 4

EECS/CITRIS: Robot Intelligence in a Cloud-Connected World, James Kuffner, Toyota Research Institute

Abstract

Robotics is currently undergoing a dramatic transformation. High-performance networking and cloud computing have radically transformed how individuals and businesses manage data, and are poised to disrupt the state of the art in the development of intelligent machines. This talk explores the long-term prospects for the future evolution of robot intelligence based on search, distributed computing, and big data. Ongoing research on autonomous cars and humanoid robots will be discussed in the context of how cloud connectivity will enable future robotic systems to be more capable and useful.

Biography

James Kuffner is a Roboticist at the Toyota Research Institute and an Adjunct Associate Professor at the Robotics Institute, Carnegie Mellon University. He received a Ph.D. from the Stanford University Dept. of Computer Science Robotics Laboratory in 1999, and was a Japan Society for the Promotion of Science (JSPS) Postdoctoral Research Fellow at the University of Tokyo working on software and planning algorithms for humanoid robots. He joined the faculty at Carnegie Mellon University's Robotics Institute in 2002. He has published over 125 technical papers, holds more than 40 patents, and received the Okawa Foundation Award for Young Researchers in 2007. In 2009, James joined Google as part of the initial engineering team building Google’s self-driving car. He is known for introducing the term "Cloud Robotics" in 2010 to describe how network-connected robots could take advantage of distributed computation and data stored in the cloud. In 2014, he was appointed head of Google’s Robotics division, which he co-founded along with Andy Rubin. In 2016, he joined the newly created Toyota Research Institute as the Area Lead for Cloud Computing.

March 11

EECS/CITRIS: We are all makers, Dale Dougherty, Maker Media

Abstract

Maker Media is a global platform for connecting Makers with each other, with products and services, and with our partners. Through media, events and ecommerce, Maker Media serves a growing community of Makers who bring a DIY mindset to technology. Whether as hobbyists or professionals, Makers are creative, resourceful and curious, developing projects that demonstrate how they can interact with the world around them. The launch of Make: magazine in 2005, followed by Maker Faire in 2006, jumpstarted a worldwide Maker Movement, which is transforming innovation, culture and education. Located in San Francisco, CA, Maker Media is the publisher of Make: magazine and the producer of Maker Faire. It also develops “getting started” kits and books that are sold in its Maker Shed store as well as in retail channels.

Biography

Dale Dougherty is the founder and Executive Chairman of Maker Media, Inc. In 2005, Maker Media launched Make: magazine, and Maker Faire held its first event in San Francisco in 2006. He has developed a maker ecosystem, serving the needs of makers as they seek out product support, startup advice, and funding avenues. His idea for Make: magazine grew out of his experience with the Hacks book series, through which he recognized that hackers were playing with hardware and, more broadly, looking at how to hack the world, not just computers.

March 18

Dot-Product Join: An Array-Relation Join Operator for Big Model Analytics, Chengjie Qin, UC Merced

Abstract

Big Data analytics has been approached exclusively from a data-parallel perspective, where data are partitioned across multiple workers – threads or separate servers – and model training is executed concurrently over different partitions, under various synchronization schemes that guarantee speedup and/or convergence. The dual problem – Big Model – which, surprisingly, has received no attention in database analytics, is how to manage models with millions if not billions of parameters that do not fit in memory. In this talk, I will introduce the first secondary-storage array-relation dot-product join operator between a set of sparse arrays and a dense relation. The dot-product join operator incurs minimal overhead when sufficient memory is available and gracefully degrades when memory resources become scarce. Overall, the dot-product join operator achieves an order of magnitude reduction in execution time for Big Model analytics over alternative in-database solutions.
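To make the core operation concrete, here is a toy sketch (invented for illustration; it is not the secondary-storage operator described in the talk) of a dot product between a set of sparse model vectors and a dense relation, computed in a single sequential pass over the relation:

```python
def dot_product_join(sparse_vectors, relation):
    """For every sparse vector, accumulate sum_i vec[i] * value_i while
    making one sequential scan of the dense relation (index order)."""
    results = [0.0] * len(sparse_vectors)
    for idx, value in relation:                 # one pass over the relation
        for k, vec in enumerate(sparse_vectors):
            if idx in vec:                      # touch only matching indices
                results[k] += vec[idx] * value
    return results

# Two sparse model vectors ({index: weight}) and a dense relation of
# (index, feature_value) pairs stored in index order.
vectors = [{0: 2.0, 3: 1.0}, {1: 4.0}]
relation = [(0, 0.5), (1, 1.5), (2, 9.0), (3, 2.0)]
dots = dot_product_join(vectors, relation)      # [3.0, 6.0]
```

The single-scan structure is what allows such an operator to stream a relation from secondary storage without requiring the model to fit in memory.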

Biography

Chengjie Qin is a PhD candidate in the EECS department at UC Merced, advised by Prof. Florin Rusu. His research focuses on supporting large-scale data analytics in databases. He received his Bachelor of Science degree in Computer Science in 2011 from Fuzhou University, China.

March 25

No seminar (Cesar Chavez Holiday)

Abstract

TBA

Biography

TBA

April 1

EECS/CITRIS: Platypus Cooperative Robotic Boats: Learning to Balance R&D and Productization, Paul Scerri, Platypus LLC

Abstract

After nearly 20 years in academia featuring several papers with "real-world" in the title, I recently left academia to found a company that commercializes one of our robots. The small, autonomous watercraft have the ability to dramatically change how water data is collected. In this talk, I will describe some of the challenges and issues I've encountered as we take cutting edge technology from this community and put it in the hands of end users. In the course of the effort, I've learned that research and commercial success are not always compatible, but that some of the same creative skills and ability to deal with failure are essential. Over time, we've found a balance between research and commercialization that helps both move forward in parallel, as well as finding different business models that work for technology on the edge of research.

Biography

Dr. Scerri is co-founder and President of Platypus, LLC and the Director of the Perceptronics Solutions Robotics Lab. Prior to this, he was an Associate Research Professor at the Carnegie Mellon University Robotics Institute. The focus of his research while at CMU was multi-agent and multi-robot systems. In his current roles in industry, the emphasis is on taking state-of-the-art research and applying it to real problems, with a specific focus on making the collection of important data about an environment less expensive, more reliable, and more accessible.

April 7

Making Information Retrieval Easier: Directing Exploratory Search over 50 Million Documents by Interactive Intent Modeling, Jaakko Peltonen, University of Tampere and Aalto University

Abstract

Researchers must navigate big data. Current scientific knowledge includes 50 million published articles. How can a system help a researcher find relevant documents in their field? We introduce SciNet, an interactive search system that anticipates the user's search intents by estimating them from the user's interaction with the interface. The estimated intents are visualized on an intent radar, a radial layout that organizes potential intents as directions in the information space. The system assists users to direct their search by allowing feedback to be targeted on keywords representing the potential intents. Users can provide feedback by moving the keywords on the intent radar. The system then learns and visualizes improved estimates and corresponding documents. The resulting user models are explicit open user models curated by the user during the interactive information seeking. SciNet has been shown to significantly improve users' task performance and the quality of retrieved information without compromising task execution time. We also show how user models learned in SciNet can be used to help cold-start recommendation in another system, the CoMeT talk management system, by cross-system user model transfer across the systems.

Biography

Jaakko Peltonen is an Associate Professor of statistics (data analysis) at the School of Information Sciences, University of Tampere, Finland, where he leads the Statistical Machine Learning and Exploratory Data Analysis group; he is also currently an Academy Research Fellow at Aalto University, Finland, where he is a PI of the Probabilistic Machine Learning research group. He is an associate editor of Neural Processing Letters and an editorial board member of Heliyon. He has served on the organizing committees of seven international conferences and one international summer school, has served on the program committees of 31 international conferences and workshops, and has performed referee duties for numerous international journals and conferences. He is an expert in statistical machine learning methods for exploratory data analysis, visualization of data, and learning from multiple sources.

April 8

Complex-valued Linear Layers for Deep Neural Network-based Acoustic Models for Speech Recognition, Zak Shafran, Google

Abstract

In recent years, deep neural networks have proven to be highly effective for acoustic modeling in speech recognition. However, the input to the acoustic model consists of hand-crafted features, namely, the logarithm of the energy of the Mel-weighted filter bank (log-mel). Mel-weighted filters were developed about four decades ago and were inspired by human perception. Apart from the possibility that they may not be optimal features for automatic speech recognition, the log-mel features strip information from the speech signal that may be useful, especially for jointly modeling de-reverberation and beam-forming within the neural networks. As an alternative to log-mel features, we investigate using the complex-valued frequency transform of the speech frames directly as input to the acoustic models. To utilize the complex-valued inputs, we employ complex-valued linear layers whose parameters are learned jointly with the rest of the acoustic model. In this talk, we will discuss the properties of these complex-valued layers and demonstrate their advantage on a large speech recognition task.
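As a rough illustration of the idea (a minimal sketch with invented layer sizes, not the speaker's implementation), a complex-valued linear layer applied to a frame's complex frequency transform might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: 257 FFT bins in, 128 units out.
n_in, n_out = 257, 128

# Complex-valued weight matrix; in the acoustic model these parameters
# would be learned jointly with the rest of the network.
W = (rng.standard_normal((n_out, n_in))
     + 1j * rng.standard_normal((n_out, n_in))) / np.sqrt(n_in)

def complex_linear(spectrum, W):
    """Complex-valued linear layer: y = W x on a complex input spectrum."""
    return W @ spectrum

# Stand-in for one speech frame: its complex frequency transform feeds the
# layer directly, instead of hand-crafted log-mel features.
frame = rng.standard_normal(512)
spectrum = np.fft.rfft(frame)            # 257 complex-valued bins
y = complex_linear(spectrum, W)
features = np.abs(y)                     # magnitudes for downstream real layers
```

Taking the magnitude before the real-valued layers is only one option; the talk's point is that the complex transform retains phase information that log-mel discards.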

Biography

Izhak Shafran is a speech and machine learning researcher who has been working on acoustic modeling for speech recognition. Before joining Google, he was an Associate Professor and a member of the Center for Spoken Language Processing at OHSU, where he also focused on medical applications of spoken language technology. He graduated from the University of Washington in Seattle in 2001 and subsequently worked at AT&T Research Labs in Florham Park with the speech algorithms group. In the summer of 2006, he was a visiting professor at the University of Paris-Sud, working at LIMSI. Subsequently, he was a research faculty member at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University. He received an NIH Career Development Award in 2010.

April 15

Online Aggregation On Raw Data, Yu Cheng, UC Merced

Abstract

Traditional in-situ data processing systems support immediate SQL querying over raw files, but their performance across a query workload is limited by the speed of full scans, tokenizing, and parsing of the entire file. Online aggregation (OLA) has been introduced as an efficient method for data exploration that identifies uninteresting patterns faster by continuously estimating the result of a computation during the actual processing; the computation can be stopped as soon as the estimate is accurate enough to be deemed uninteresting. However, building an efficient OLA system has a high upfront cost of randomly shuffling and loading the data. In this talk, I introduce OLA-RAW, a novel system for in-situ processing over raw files that integrates data loading and online aggregation seamlessly while preserving their advantages---generating accurate estimates as early as possible and having zero time-to-query. We design an accuracy-driven bi-level sampling process over raw files and define and analyze corresponding estimators. The samples are extracted and loaded adaptively in random order based on the current system resource utilization. We implement OLA-RAW starting from a state-of-the-art in-situ data processing system and evaluate its performance across a variety of datasets and file formats. Our results show that OLA-RAW maximizes resource utilization across a query workload and dynamically chooses the optimal sampling and loading plan that minimizes each query's execution time while guaranteeing the required accuracy. The end result is a focused data exploration process that avoids unnecessary work and discards uninteresting data.
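The general flavor of online aggregation (not OLA-RAW's actual bi-level estimator; the function and constants below are invented for illustration) can be sketched as a running SUM estimate with a normal-approximation stopping rule:

```python
import math
import random

def online_sum_estimate(population, tolerance=0.05, z=1.96, seed=1):
    """Online-aggregation sketch: scan tuples in random order, maintain a
    running estimate of SUM(population) plus a normal-approximation
    confidence bound, and stop once the relative half-width of the
    confidence interval drops below `tolerance`."""
    rng = random.Random(seed)
    order = list(population)
    rng.shuffle(order)                       # random access order
    n_total = len(order)
    count, total, total_sq = 0, 0.0, 0.0
    for x in order:
        count += 1
        total += x
        total_sq += x * x
        if count >= 30:                      # enough samples for the CLT
            mean = total / count
            var = max(total_sq / count - mean * mean, 0.0)
            estimate = n_total * mean        # scale sample mean up to SUM
            half_width = z * n_total * math.sqrt(var / count)
            if half_width <= tolerance * abs(estimate):
                return estimate, half_width, count
    return n_total * (total / count), 0.0, count

r = random.Random(7)
data = [r.gauss(10.0, 2.0) for _ in range(100_000)]
estimate, half_width, used = online_sum_estimate(data)
```

On data like the above, the loop typically stops after a tiny fraction of the tuples, which is the "zero time-to-query" appeal; OLA-RAW's contribution is making this work over raw files, where random access and extraction are themselves expensive.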

Biography

Yu Cheng is a computer science PhD candidate at UC Merced advised by Prof. Florin Rusu. His research focuses on in-situ data processing. He received his BS in Computer Science in 2005 from Wuhan University of Technology, Wuhan, China, and his MS in Computer Engineering in 2008 from Huazhong University of Science and Technology, Wuhan, China. Since 2011, he has been working toward his PhD degree in large-scale data processing. He has published several research papers in the area of database systems (in SIGMOD, TODS, SSDBM, etc.). He is a recipient of the 2016 Graduate Dean's Dissertation Fellowship and several fellowship awards during his Ph.D. study at UC Merced.

April 22

Exploring New Approaches for Mechanical Fruit Harvesting via Model-based Design, Stavros Vougioukas, UC Davis

Abstract

Mechanizing the hand harvesting of fresh-market crops constitutes one of the biggest challenges to the sustainability of the U.S. fruit and vegetable industry. Depending on the crop, labor contributes up to 60% of the variable production cost, and recent labor shortages have led to loss of production and reduction of planted acreage in several crops. Innovation is desperately needed in the design of mass (shake-and-catch) harvesters and selective fruit-picking robotic harvesters. This seminar will present the challenges related to mechanized harvesting and show how concepts and tools from model-based design and robotics can be used to provide solutions. Regarding robotic fruit harvesters, most developed prototypes utilize multiple-degree-of-freedom arms, often kinematically redundant. The hypothesis is that, since branches constrain fruit reachability, redundancy is necessary to navigate through branches and reach fruits inside the canopy. Modern commercial orchards increasingly adopt trees of SNAP architectures (Simple, Narrow, Accessible, and Productive). This seminar will present results from a recent simulation study on linear fruit reachability (LFR) in high-density, trellised pear trees, where only linear motion was used to reach the fruits. Results based on digitized geometric tree models and fruit locations showed that 91.1% of the fruits were reachable after three “harvesting passes” with proper approach angles. This implies that for trees of SNAP-type architectures, fruit reachability may not require complex and expensive arms with many degrees of freedom. For shake-and-catch harvesting, results from a physics-based simulation of falling fruits will be shown, which suggest that when fruit-intercepting rods are inserted optimally into the tree canopy during shaking, the percentage of fruits hitting branches can be lowered by more than 50%. Such designs could enable mass harvesting with low fruit damage and, hence, provide mechanized harvesting solutions for some crops.

Biography

Dr. Stavros Vougioukas is an Assistant Professor of Biological and Agricultural Engineering at the University of California, Davis. He joined the department in 2012, and his research group focuses on the development of robotic and automation systems for agricultural applications, with emphasis on mechanized harvesting of specialty crops. Dr. Vougioukas earned his Diploma in Electrical Engineering (1989) at Aristotle University, Greece. He undertook graduate studies in the US under a Fulbright Fellowship, completing his MS (1991) at SUNY Buffalo and his PhD (1995) at Rensselaer Polytechnic Institute, in Electrical, Computer, and Systems Engineering. His PhD thesis addressed force-guided assembly and robotic fine motion planning. He was a postdoctoral researcher for one year at the University of Parma, Italy. After his army service, he joined the faculty at Aristotle University, Greece, where he worked on agricultural automation for 10 years.

April 29

No seminar

Abstract

TBA

Biography

TBA

May 6
 

Fall 2016

Aug. 26

Combining Virtual Reality, Psychology, Theater and Learning Sciences for Training and Assessment, Arjun Nagendran, University of Central Florida & Mursion

Abstract

Technological advancements over the last decade have opened up a plethora of possibilities for "blue-skies" research. Today, we live in a world where multi-disciplinary teams collaborate to create novel platforms that enhance every aspect of our lives. We now have the foundations to inject our cross-disciplinary ideas across traditional fields of study. Identifying voids and applying our expertise across domains results in powerful products that can have a significant impact on society. This talk will be centered around an application that leverages the ever-diminishing boundaries between virtual reality, psychology, and the learning sciences. The concepts of "avatars" and "inhabiting" will be introduced, after which real-world applications of their use will be demonstrated. In particular, the talk will focus on how human-assisted virtual avatars can be used for training and assessment across several fields such as healthcare, counselling, hospitality, and education. The effectiveness of these systems in riding the upcoming wave of virtual reality devices, including the Oculus Rift, Gear VR, and the Microsoft HoloLens, will be discussed. The talk will conclude with potential futuristic applications of the concept of "inhabiting".

Biography

Arjun Nagendran is the Co-Founder and Chief Technology Officer at Mursion, Inc., a San Francisco-based startup specializing in the application of virtual reality technology for training and assessment. He completed his Ph.D. in Robotics at the University of Manchester, UK, specializing in landing mechanisms for unmanned air vehicles. Prior to Mursion, he worked for several years as an academic researcher, including leading the ground vehicle systems for Team Tumbleweed, one of the six finalists in the Ministry of Defence (UK) Grand Challenge. Arjun's research interests include coupling psychology and the learning sciences with technological advancements in remote operation, virtual reality, and control system theory to create high-impact applications. During his academic career, he has served as a committee member and reviewer for several conferences and journals, including the International Conference on Intelligent Robots and Systems (IROS) and the IEEE International Symposium on Mixed and Augmented Reality (ISMAR).

Sept. 2

A Parallel Sorting Algorithm for 130K CPU Cores, Bin Dong, Lawrence Berkeley National Lab

Abstract

Parallel sorting is a fundamental algorithm in computer science, and it has become even more important in the big data era. Utilizing supercomputers for sorting is attractive, since their large number of CPU cores has the potential to sort terabytes or even exabytes of data per minute. However, developing a parallel sorting algorithm that is efficient and scalable on a supercomputer is challenging because of the load imbalance caused by data skew and the complex communication patterns caused by multi-core architectures. In this talk, I present our experience developing and scaling a parallel sorting algorithm named SDS-Sort on a 2.57-petaflop supercomputer.
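SDS-Sort's specifics are not spelled out in the abstract, but the classic sample-sort pattern that large-scale distributed sorts commonly build on (splitters drawn from a sample, partition by splitter, sort partitions independently) can be sketched as follows; the load-imbalance problem the talk mentions shows up here as skewed bucket sizes when the splitters poorly represent the data:

```python
import bisect
import random

def sample_sort(data, n_workers=4, oversample=8):
    """Sample-sort sketch: derive splitters from a random sample, partition
    records into n_workers buckets, sort each bucket independently (each
    bucket would go to one worker on a real machine), and concatenate."""
    rng = random.Random(0)
    sample = sorted(rng.sample(data, min(len(data), n_workers * oversample)))
    # n_workers - 1 splitters taken evenly from the sorted sample
    splitters = [sample[i * len(sample) // n_workers]
                 for i in range(1, n_workers)]
    buckets = [[] for _ in range(n_workers)]
    for x in data:
        buckets[bisect.bisect_right(splitters, x)].append(x)
    result = []
    for bucket in buckets:      # sorted concurrently on a real machine
        result.extend(sorted(bucket))
    return result

shuffled = list(range(1000))
random.Random(42).shuffle(shuffled)
out = sample_sort(shuffled)
```

Oversampling (drawing more sample points than splitters) is the usual defense against skew: the larger the sample, the closer the splitters track the true data distribution.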

Biography

Bin Dong is currently a research scientist in the Scientific Data Management group at LBNL. His research interests are in scalable scientific data management, parallel storage systems, and parallel computing. More specifically, he is exploring new algorithms and data structures for storing, organizing, sorting, indexing, searching, and analyzing big scientific data (mostly multi-dimensional arrays) with supercomputers. Bin Dong earned his Ph.D. in Computer Science and Technology from Beihang University, China in 2013. Then, he joined the Scientific Data Management group at LBNL as a postdoc until 2016.

Sept. 9

Robot Motion Planning Considering Multiple Costs and Multiple Task Specifications, Shams Feyzabadi, UC Merced

Abstract

With the recent dramatic growth in robotic commercial applications in all fields, expectations of robotic systems have escalated as well. For example, robots are tasked with increasingly complex missions featuring multiple costs that should be accounted for. In addition, with robots operating for extended periods of time in unstructured environments, it is often convenient to task the robot with multiple objectives at once and let the system determine a control strategy that jointly considers all of them. In this talk, we propose a planner for sequential stochastic decision-making problems in which robots are subject to multiple cost functions and are tasked to complete more than one goal, specified using a subset of linear temporal logic operators. Each subgoal is associated with a desired satisfaction probability that will be met in expectation by the policy executed by the controller. The planner builds upon the theory of constrained Markov decision processes and on techniques from the realm of formal verification. Our method is validated both in simulation and in outdoor tasks in which the robot autonomously traveled more than 7.5 km.

Biography

Shams Feyzabadi is currently a PhD candidate at UC Merced working under the supervision of Prof. Carpin. His field of interest is mobile robotics; more specifically, he focuses on motion planning under multiple cost functions in non-deterministic environments. He received his M.Sc. from Jacobs University Bremen in Germany in 2010 and his B.Sc. from Iran University of Science and Technology in 2007.

Sept. 16

Building the Enterprise Fabric for Big Data with Vertica and Spark Integration, Jeff LeFevre, HPE Vertica

Abstract

Enterprise customers increasingly require greater flexibility in the way they access and process their Big Data. Their needs include both advanced analytics and access to diverse data sources. However, they also require robust, enterprise-class data management for their mission-critical data. This work describes our initial efforts toward a solution that satisfies the above requirements by integrating the HPE Vertica enterprise database with Apache Spark's open source computation engine. In this talk I will focus on our methods for fast and reliable bulk data transfers between Vertica and Spark with exactly-once semantics. I will first describe the architectures of both systems, the challenges of guaranteeing exactly-once semantics for data transfers, and the interesting tradeoffs among these challenges in our design. Specifically, our design enables parallel data transfer tasks that can tolerate task failures, restarts, and speculative execution; we show how this can be done without an external scheduler coordinating the reliable transfer between the two independent systems under these conditions. We believe this approach generalizes to the class of MapReduce systems. Lastly, I will present performance results across several system configurations and datasets. Our integration provides a fabric on which our customers can get the best of both worlds: robust enterprise-class data management and analytics provided by Vertica, and flexibility in accessing and processing Big Data with Vertica and Spark.
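One common way to tolerate retries and speculative execution without an external coordinator is a staging-plus-atomic-commit pattern, simulated below. This is a hedged sketch with illustrative names, not the actual Vertica connector design: each task attempt writes to its own staging area, and only the first attempt to commit a given task wins, so duplicate attempts are harmless.

```python
class ExactlyOnceSink:
    """Simulate a sink that tolerates task retries and speculative
    execution: many attempts may stage output, but exactly one
    attempt per task is ever committed."""

    def __init__(self):
        self.staged = {}      # (task_id, attempt_id) -> rows
        self.committed = {}   # task_id -> rows

    def stage(self, task_id, attempt_id, rows):
        # Staging is idempotent and isolated per attempt.
        self.staged[(task_id, attempt_id)] = list(rows)

    def commit(self, task_id, attempt_id):
        # First committer wins; retried or speculative duplicate
        # attempts become no-ops, giving exactly-once output.
        if task_id not in self.committed:
            self.committed[task_id] = self.staged[(task_id, attempt_id)]

    def result(self):
        return [r for t in sorted(self.committed) for r in self.committed[t]]
```

In a real connector the "commit" would be an atomic operation in the target system (e.g., a rename or a transactional insert), which is what makes the first-committer-wins rule safe under concurrency.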

Biography

Jeff LeFevre is a Software Engineer with HPE Vertica Big Data R&D in Sunnyvale, CA, where he focuses on the integration with Spark. He joined Vertica in 2014 after completing his PhD in the Database Group at UC Santa Cruz. His dissertation focused on physical design tuning for data management systems in the cloud. Prior to that he received an MS from the Systems Group at UC San Diego, and completed internships at Teradata, Google, and NEC Labs.

Sept. 23

Enabling Analytics at AWS, Mehul Shah, Amazon Web Services

Abstract

With the ubiquity of data sources and cheap storage, today's enterprises want to collect and store a wide variety of data, even before they know what to do with it. Examples include IoT streams, application monitoring logs, point-of-sale transactions, ad impressions, mobile events, and more. This data is typically a mix of structured and unstructured, streaming and static, with varying degrees of quality. Given this variety and the increasing need to be data-driven, customers want a choice of tools to leverage this data for business advantage. Toward this end, Amazon Web Services (AWS) offers a variety of fully managed data services that can be easily composed given its service-oriented architecture. In this talk, we provide an overview of the breadth of data services available on AWS: storage, OLTP, data warehouse, and streaming. We give examples of how customers leverage and compose these to handle their big data use cases, from traditional BI and analytics to real-time processing and prediction. Finally, we touch on some lessons from operating such services at scale.

Biography

Mehul is a software development manager in the Big Data division of AWS, contributing to the Redshift and Data Pipeline services. From 2011-2014, he was co-founder and CEO of Amiato, an ETL cloud service. Prior to that, he was a research scientist at HP Labs where his work spanned large-scale data management, distributed systems, and energy-efficient computing. He received his PhD in databases from UC Berkeley (2004), and MEng (1997) and BS in computer science and physics (1996) from MIT. He has received several awards including the NSDI 2016 Test of Time Award and SOSP 2007 best paper. In his spare time, he serves on the SortBenchmark committee.

Sept. 30

MacroBase: Analytic Monitoring for the Internet of Things, Peter Bailis, Stanford University

Abstract

An increasing proportion of data today is generated by automated processes, sensors, and devices—collectively, the Internet of Things (IoT). IoT applications’ rising data volume, demands for time-sensitive analysis, and heterogeneity exacerbate the challenge of identifying and highlighting important trends in IoT deployments. In response, we present MacroBase, a data analytics engine that performs statistically informed analytic monitoring of IoT data streams by identifying deviations within streams and generating potential explanations for each. MacroBase is the first analytics engine to combine streaming outlier detection and streaming explanation operators, allowing cross-layer optimizations that deliver order-of-magnitude speedups over existing, primarily non-streaming alternatives. As a result, MacroBase can deliver accurate results at speeds of up to 2M events per second per query on a single core. MacroBase has delivered meaningful analytic monitoring results in production, including at an IoT company monitoring hundreds of thousands of vehicles.
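A toy sketch of the outlier-detection-plus-explanation pattern the abstract describes (illustrative only; MacroBase's actual operators use more robust estimators and risk-ratio-based explanation, not this simple z-score scheme):

```python
from collections import Counter

def monitor(stream, decay=0.99, threshold=3.0):
    """Streaming analytic monitoring sketch: flag values far from an
    exponentially damped running mean/variance, then count which
    attribute tags co-occur with the flagged outliers."""
    it = iter(stream)
    value, _ = next(it)          # warm-start the moments from the first point
    mean, var = value, 1.0
    explanations = Counter()
    for value, tags in it:
        if abs(value - mean) > threshold * var ** 0.5:
            explanations.update(tags)        # candidate explanation
        # exponentially damped running moments (older data decays away)
        mean = decay * mean + (1 - decay) * value
        var = decay * var + (1 - decay) * (value - mean) ** 2
    return explanations
```

Feeding in a stream of (value, tags) pairs yields a count of which tags accompanied the anomalous readings; a real system would rank these by how much more often a tag appears among outliers than among inliers.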

Biography

Peter Bailis is an assistant professor of Computer Science at Stanford University. Peter's research in the Future Data Systems group (http://futuredata.stanford.edu/) focuses on the design and implementation of next-generation, post-database data-intensive systems. His work spans large-scale data management, distributed protocol design, and architectures for high-volume complex decision support. He is the recipient of an NSF Graduate Research Fellowship, a Berkeley Fellowship for Graduate Study, best-of-conference citations for research appearing in both SIGMOD and VLDB, and the CRA Outstanding Undergraduate Researcher Award. He received a Ph.D. from UC Berkeley in 2015 and an A.B. from Harvard College in 2011, both in Computer Science.

Oct. 7

Apache SystemML: Declarative Machine Learning at Scale, Niketan Pansare, IBM Almaden Research Center

Abstract

Scalable machine learning is ubiquitous in virtually every industry, from insurance and manufacturing to finance and the health sciences. Expressing and running machine learning algorithms at scale, across varying data characteristics, is challenging. In this talk, we will discuss our experience in building Apache SystemML, peek at challenging optimization and implementation strategies for exploiting data-parallel platforms such as MapReduce and Spark, and provide performance and scalability insights.

Biography

Niketan Pansare works at IBM Research Almaden on advanced information management systems, including analytics, distributed data processing platforms, and hardware acceleration, as well as their applications in mobile and cloud settings. At a high level, his research involves developing statistical models and building systems for analyzing large-scale, heterogeneous data. Prior to joining IBM, Niketan was a PhD student at Rice University, advised by Dr. Chris Jermaine. His PhD thesis is titled "Large-Scale Online Aggregation Via Distributed Systems."

Oct. 14

Working around the CAP Theorem, Vijayshankar Raman, IBM Almaden Research Center

Abstract

The CAP theorem is a painful reality that all distributed systems have to deal with: they must either assume a tightly coupled setting or accept inconsistent global state. But real-world applications never have tightly coupled components. Instead, they rely on elaborate compensation logic built into the application program, usually outside the boundary of a database transaction. We present a method to achieve serializable consistency in a loosely coupled setting by allowing such compensation logic to be part of the transaction itself, and serializing each transaction to a point after its commit.
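For context, the application-level compensation logic described above is often structured as a saga: each step carries an undo action, and a failure triggers the undos in reverse order. A minimal illustrative sketch follows (the talk's contribution is precisely to move such logic inside the transaction, which this sketch does not do):

```python
def run_saga(steps):
    """Run a list of (action, compensate) pairs. If any action fails,
    undo the already-completed steps in reverse order, restoring a
    consistent global state. Returns True on full success."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        return False
    return True
```

For example, a money transfer whose credit step fails against a remote service would have its debit step compensated, leaving balances untouched; the weakness of this pattern, and the motivation for the talk, is that the intermediate states are visible to other transactions.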

Biography

Vijayshankar Raman is a Research Staff Member in the database group at the IBM Almaden Research Center, working on hybrid transaction and analytic processing.

Oct. 21

Bridging the I/O Gap between Spark and Scientific Data Formats on Supercomputers, Jialin Liu, Lawrence Berkeley National Lab

Abstract

Spark has been tremendously powerful for performing big data analytics in distributed data centers. However, using the Spark framework on HPC systems to analyze large-scale scientific data poses several challenges. For instance, the parallel file systems are shared among all compute nodes, in contrast to shared-nothing architectures. Another challenge is accessing data stored in scientific data formats, such as HDF5 and NetCDF, that are not natively supported in Spark. Our study focuses on improving the I/O performance of Spark on HPC systems for reading large scientific data arrays, e.g., HDF5/NetCDF. We select several scientific use cases to drive the design of an efficient parallel I/O API for Spark on supercomputers, called H5Spark. We optimize I/O performance, taking into account Lustre file system striping. We evaluate the performance of H5Spark on Cori, a Cray XC40 system located at NERSC/LBNL, and compare its I/O performance with MPI and NASA's SciSpark. H5Spark has enabled the largest PCA run on a supercomputer to date and has been used by various national labs. It is now endorsed by the HDF Group for further development.
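As a rough illustration of the stripe-aware partitioning the abstract alludes to, the sketch below (with hypothetical names; H5Spark's real API and optimizations differ) splits a 1-D array into contiguous per-task ranges whose interior boundaries are rounded down to Lustre stripe-aligned offsets, so that each task's read covers whole stripes where possible:

```python
def partition_ranges(n_elems, elem_size, n_tasks, stripe_bytes=1 << 20):
    """Split a 1-D dataset of n_elems elements into n_tasks contiguous
    (start, count) ranges. Interior boundaries are rounded down to
    stripe-aligned element offsets to avoid tasks contending for the
    same Lustre stripe (and hence the same OST)."""
    stripe_elems = max(1, stripe_bytes // elem_size)
    ranges, start = [], 0
    for t in range(n_tasks):
        end = (t + 1) * n_elems // n_tasks
        if t < n_tasks - 1:
            end = max(start, (end // stripe_elems) * stripe_elems)
        ranges.append((start, end - start))
        start = end
    return ranges
```

Each (start, count) pair would then become one hyperslab read issued by one Spark task; the last range absorbs any unaligned remainder.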

Biography

Jialin Liu is a research engineer at Lawrence Berkeley National Lab. He joined LBNL shortly after receiving his Ph.D. in computer science from Texas Tech University in 2015; before that, he received his B.S. in computer science in 2011. His research interests are parallel I/O and scientific data management (typically millions of files and terabytes of data). Recently, he has been exploring object-based big-science data management and I/O format design for astronomy datasets.

Nov. 4

Nonconvex Optimization by Complexity Progression, Hossein Mobahi, Google Research

Abstract

A large body of machine learning problems require minimization of a nonconvex objective function. For some of these problems, local optimization techniques (such as gradient descent or Newton's method) may converge slowly or get stuck in suboptimal solutions. In this talk I describe an alternative approach to tackling nonconvex optimization. The idea is to start from a simpler optimization problem and solve it, then progressively transform that objective function into the actual one while tracking the path of the minimizer. While this general idea has been used for a long while, its construction has been quite heuristic. Specifically, there is no principled, theoretically justified answer to how the initial (simplified) problem should be chosen and how it should be transformed into the actual problem, yet the success of the technique depends critically on these choices. In this talk I argue that the Weierstrass transform (Gaussian convolution) is a sensible choice for creating the simpler problems, and I support this claim mathematically. I present applications of this method to problems in deep learning and image registration.
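The procedure described here, sometimes called graduated optimization or the continuation method, can be sketched on a toy objective whose Gaussian convolution has a closed form; everything below is illustrative and not the speaker's exact formulation. For f(x) = x^2 - x + A cos(omega x), convolving with a Gaussian of width sigma damps the oscillatory term by exp(-omega^2 sigma^2 / 2), so the smoothed problem is nearly convex for large sigma:

```python
import math

# Toy nonconvex objective: a tilted quadratic bowl plus fast ripples.
A, OMEGA = 5.0, 6.0

def f(x):
    return x * x - x + A * math.cos(OMEGA * x)

def grad_smoothed(x, sigma):
    """Gradient of f convolved with a Gaussian of width sigma.
    The convolution multiplies the oscillatory term by
    exp(-(omega * sigma)^2 / 2), leaving only the convex bowl
    when sigma is large."""
    damp = math.exp(-(OMEGA * sigma) ** 2 / 2.0)
    return 2.0 * x - 1.0 - A * OMEGA * math.sin(OMEGA * x) * damp

def graduated_minimize(x0, sigmas=(2.0, 1.0, 0.5, 0.25, 0.1, 0.0),
                       steps=400, lr=0.005):
    """Minimize the smoothed objective while shrinking sigma to zero,
    tracking the minimizer's path from the convexified problem down
    to the original nonconvex one."""
    x = x0
    for sigma in sigmas:
        for _ in range(steps):
            x -= lr * grad_smoothed(x, sigma)
    return x
```

Starting from x0 = 0, the tracked minimizer settles near x ≈ 0.523, the global minimum of this f; the schedule of sigmas is exactly the heuristic choice the talk argues can be made principled.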

Biography

Hossein Mobahi (http://people.csail.mit.edu/hmobahi) is a research scientist at Google, Mountain View. His research interests include machine learning, optimization, and computer vision, and especially the intersection of the three. Prior to Google, he was a postdoctoral researcher in the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT. He obtained his PhD from the University of Illinois at Urbana-Champaign (UIUC) in 2012.

Nov. 18

Bootstrap and Uncertainty Propagation: New Theory and Techniques in Approximate Query Processing, Kai Zeng, Microsoft Research

Abstract

Sampling is one of the most commonly used techniques in Approximate Query Processing (AQP), an area of research made all the more critical by the need for timely and cost-effective analytics over "Big Data". The sheer amount of data and the complexity of analytics pose new challenges to sampling-based AQP, calling for innovation in several technical aspects: How can we estimate the errors of general SQL queries, possibly containing ad-hoc user-defined functions, when they are computed on samples? How can we better present approximate query results to the user? How should database engines be built to better support approximate query processing? In this talk, I will present a series of works that answer these questions. We will see that: (1) bootstrap, an automated statistical technique, can be integrated with relational algebra theory and database systems to provide accuracy estimation for general OLAP queries; and (2) combining bootstrap error estimation with a novel uncertainty propagation theory lets OLAP query processing shift to an incremental execution engine that offers a smooth trade-off between query accuracy and latency, fulfilling a full spectrum of user requirements, from approximate but timely query execution to traditional, fully accurate query execution.
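As a concrete illustration of the bootstrap idea in AQP (a generic textbook sketch, not the talk's integrated relational formulation): rerun the scaled-sum estimator on many resamples of the sample, drawn with replacement, and read an error interval off the resulting empirical distribution.

```python
import random

def approx_sum(sample, scale):
    """Scale a uniform-sample sum up to an estimate of the full-table sum."""
    return scale * sum(sample)

def bootstrap_error(sample, scale, n_boot=500, seed=0):
    """Estimate the error of approx_sum by resampling the sample with
    replacement and recomputing the estimator on each resample.
    Returns an empirical (low, high) 95% interval."""
    rng = random.Random(seed)
    k = len(sample)
    estimates = sorted(
        approx_sum([rng.choice(sample) for _ in range(k)], scale)
        for _ in range(n_boot)
    )
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot)]
```

The appeal for AQP is that the same recipe applies to estimators far more complex than SUM, including ones wrapping user-defined functions, which is where closed-form error formulas break down.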

Biography

Kai Zeng is a senior scientist at the Cloud and Information Services Lab, Microsoft. His research interest lies in large-scale, data-intensive systems. He received his PhD in databases from UCLA in 2014 and was previously a postdoctoral researcher at the UC Berkeley AMPLab. He has won several awards, including the SIGMOD 2012 best paper award and the SIGMOD 2014 best demo award.