CS246: Mining Massive Data Sets, Winter 2018. Problem Set 2, due 11:59pm, February 8, 2018.

The book is based on the Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining).

CS246: Mining Massive Data Sets, Winter 2020

- All solutions are equivalent modulo the scale factor.
- An additional constraint forces uniqueness:
    - $r_y + r_a + r_m = 1$
    - Solution: $r_y = \frac{2}{5}$, $r_a = \frac{2}{5}$, $r_m = \frac{1}{5}$
- The Gaussian elimination method works for small examples, but we need a better method for large web-size graphs.
- We need a new formulation!

The book, like the course, is designed at the undergraduate computer science level with no formal prerequisites.

(Jure Leskovec, Stanford CS246: Mining Massive Datasets, 1/27/20)

$$r_{xi} = \frac{\sum_{j \in N(i;x)} s_{ij} \cdot r_{xj}}{\sum_{j \in N(i;x)} s_{ij}}$$

- $s_{ij}$: similarity of items $i$ and $j$
- $r_{xj}$: rating of user $x$ on item $j$
- $N(i;x)$: set of items which were rated by $x$ and are similar to $i$

To support deeper explorations, most of the chapters are supplemented with further reading references.

CS246 is the first part in a two-part sequence CS246–CS341. CS246 will discuss methods and algorithms for mining massive data sets, while CS341: Project in Mining Massive Data Sets will be a project-focused advanced class with unlimited access to a large MapReduce cluster.

The importance of data to business decisions, strategy and behavior has proven unparalleled in recent years. Companies place true value on individuals who understand and manipulate large data sets to provide informative outcomes.

The setting:

- A set of $k$ choices (arms).
- Each choice $a$ is associated with an unknown probability distribution $P_a$ supported in $[0,1]$.
- We play the game for $T$ rounds.
- In each round $t$:
    - (1) We pick some arm $a$.
    - (2) We obtain a random sample $X_t$ from $P_a$.
    - Note that the reward is independent of previous draws.
- Our goal is to maximize $\sum \ldots$
- Problem: we don't know $\mu_a$! But every time we …

I am still playing CodingBat in Java, so the major solution posted here will be coded in Apache Spark Python, Python MrJob Hadoop and Ruby-Spark.
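As a small illustration in plain Python, the item-item rating estimate given earlier (a similarity-weighted average of user x's ratings over the set N(i;x)) might be sketched as follows. This is only a sketch under stated assumptions: the function name `predict_rating`, the dictionary layout, the choice to build N(i;x) from the `k` most similar rated items, and the example numbers are all hypothetical and do not come from the course material.

```python
from typing import Dict, Optional


def predict_rating(
    user_ratings: Dict[str, float],
    similarities: Dict[str, float],
    k: int = 2,
) -> Optional[float]:
    """Estimate r_xi as a similarity-weighted average of user x's ratings.

    user_ratings: item j -> r_xj, the ratings user x has already given.
    similarities: item j -> s_ij, similarity of item j to the target item i.
    k: assumed neighbourhood size used to form N(i; x) (not from the source).
    """
    # Candidate neighbours: items rated by x that have a positive similarity
    # to i. Non-positive similarities are skipped so the denominator stays > 0.
    candidates = [
        (similarities[j], r_xj)
        for j, r_xj in user_ratings.items()
        if similarities.get(j, 0.0) > 0.0
    ]
    # N(i; x): the k candidates most similar to i (an assumed selection rule).
    neighbours = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    if not neighbours:
        return None  # no similar rated items, so the estimate is undefined

    numerator = sum(s_ij * r_xj for s_ij, r_xj in neighbours)
    denominator = sum(s_ij for s_ij, _ in neighbours)
    return numerator / denominator


# Made-up data: user x rated items a, b, c; similarities are to target item i.
ratings_of_x = {"a": 4.0, "b": 5.0, "c": 2.0}
similarity_to_i = {"a": 0.75, "b": 0.125, "c": 0.25}

# Neighbours are a and c: (0.75 * 4.0 + 0.25 * 2.0) / (0.75 + 0.25)
print(predict_rating(ratings_of_x, similarity_to_i, k=2))  # prints 3.5
```

With `k=2` the least similar item `b` is left out of the neighbourhood, so its high rating does not affect the estimate. Other ways of defining "similar to i", such as a similarity threshold, would fit the same formula.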
The book is organised so a student can learn the fundamental ideas of probability from the first three chapters without reliance on calculus. 