Session 1
Overview & Introduction

Course Objectives
Data, Information, Decisions
• Find, extract, organize and describe data
• Quantify possible relationships and uncertainty
• Develop spreadsheet models to analyze data and evaluate risk
• Optimize decisions and justify a course of action

Data Analysis Shows Up Everywhere
• Behavioral targeting and customer segmentation
• Donor development
• Constructing a portfolio of investments
• Demand forecasting
• Planning and allocating resources

A General Decision Making Framework
1. Define the business problem
2. Collect and organize the relevant data
3. Examine the relationship among different factors and the extent of uncertainty
4. Develop an evaluation model
5. Evaluate potential solutions
6. Recommend a course of action

Example 1: Political Advertising

Expenditures in 2008 Election

2012 Election
“The 2012 presidential election is shaping up to be a multibillion-dollar contest. President Obama’s re-election committee is expected to raise at least $1 billion, and Republicans have high hopes that their nominee will reach the 10-figure level as well… ‘He [President Obama] would be a Top 100 advertiser,’ says Brad Adgate, the senior vice president for research at Horizon Media, a New York ad agency. ‘You know, it’s what Home Depot spends, about a billion dollars a year.’”
NPR, 2/18/2011

2010 Electoral Votes
538 votes, 270 to win

Advertising Decisions

How Do We Win the Voter?

Example 2: Performing Arts Centers
• Decisions to be made:
– Performance schedules
– Pricing subscriptions and seating tiers
– Ticket bundling
– Fundraising campaigns
– Advertising mix

How Do We Build the Audience?

Example 3: Salesforce Management
• How many (and which) employees need to be working at a particular time?
• Which tasks should be completed by which employees?
• Compensation packages to attract and retain top talent?

Marketing to Consumers

Course Goals
• Conduct appropriate analysis of marketing data
• Increase familiarity with Excel
• Build useful tools to solve business problems

QUESTIONS?

Organizing Data
• Data are often organized into a data table
[Figure labels: rows are “Cases”, “Records”, or “Observations”; columns are Variables]

Linking Data with a Relational Database

Variable Types
Categorical
• No natural numerical meaning
• May appear in a data table as a number
• Arithmetic makes no sense
• Examples?
Quantitative
• Natural numerical meaning
• Already a number
• Some arithmetic makes sense
• Has an appropriate unit
• Examples?

TV Advertising
• Suppose the viewer shares for one hour of television are as follows:

               ABC   CBS   CW    FOX   NBC   Other TV
Overall         4%   13%   5%     8%    7%   63%

               ABC   CBS   CW    FOX   NBC   Other TV
Demo Group 1    7%    8%   8%     4%    8%   65%
Demo Group 2    1%   18%   2%    12%    6%   61%
Overall         4%   13%   5%     8%    7%   63%

• Is this sufficient?

The Motion Picture Industry
• 2.4MM workers
• Contributes $180B to the economy
• Community as a whole pays more than $15B in state and federal taxes

Virtual Stock Market for Movies

Examining Categorical Variables in the Motion Picture Industry
• What kind of movies do studios produce and which succeed?
• Movie data compiled for 2001-2005
– Title
– Adaptation of (graphic) novel
– Basis in other media (e.g., TV show, video game, spin-off from another movie)
– Tickets sold
– Gross revenue
– Production budget
– Marketing budget
– Release date
– Genre
– Studio
– MPAA ratings

Methods to Examine Categorical Data
• Frequency tables
• Pie charts
• Bar/column charts
• Contingency tables (cross-tabs)
• Side-by-side and segmented bar charts

Frequency and Relative Frequency Tables

Movie Studio             Movies Released in 2005   % of Releases
Universal                 18                         9.05%
20th Century Fox          17                         8.54%
Warner Bros.              17                         8.54%
Buena Vista               15                         7.54%
Sony Pictures             15                         7.54%
Lionsgate                 13                         6.53%
Paramount Pictures        12                         6.03%
Sony Pictures Classics    10                         5.03%
New Line                   9                         4.52%
Dreamworks SKG             7                         3.52%
Miramax                    7                         3.52%
Miramax/Dimension          5                         2.51%
Focus Features             5                         2.51%
Fox Searchlight            5                         2.51%
Other                     44                        22.11%
TOTAL                    199                       100%

How Many Movies Do Studios Release?
• From a frequency table, we can generate the cumulative distribution

Visualizing Data: Bar Charts
• It’s often easier to look at a bar chart than at a frequency table

Visualizing Data: Pie Charts
• Differences compared to bar chart and cumulative distribution? Which is more useful?

Caveats About Bar and Pie Charts
• These figures are only appropriate if each observation falls into exactly one of the categories
– Pie charts should add to 100%
– These visual representations focus on a single categorical variable; they can be generalized to analyze combinations of variables

Examining Relationships Among Categorical Variables
• Contingency tables (cross-tabs) let us examine patterns among multiple categorical variables
• Do studios release the same types of movies?
– Studio and genre
– Studio and rating

Studio and MPAA Ratings
• There are 351 movies released by these 4 studios in our data
• Of the 351, 23 were rated G (marginal distribution)
• In our data, 20 G-rated movies were released by Buena Vista (individual cell)

Studio and MPAA Ratings
• Contingency tables can be formatted to show what fraction each cell is of the total

Studio and MPAA Ratings
• We want to know the ratings of movies that different studios produce
• Can format the contingency table to display what proportion of each studio’s movies have different ratings

Studio and MPAA Ratings
• Which studios make the most movies of different MPAA ratings?
• The contingency table can show us what percentage of R-rated movies are made by each studio (conditional distribution)

Displaying Conditional Distributions
• Can also display as a segmented bar chart
• If the conditional distribution is the same across different categories, the two variables are said to be independent

Recap of Techniques
• Frequency tables, bar charts, pie charts
– Useful for looking at the distribution of a single categorical variable
• Contingency tables, side-by-side and segmented bar charts
– Useful for examining potential relationships among categorical variables
• Present your results in a way that is consistent with what you need to know

Analyzing Quantitative Variables
• Numerical methods
– Descriptive statistics
– Correlation
• Visual methods
– Histograms
– Box plots
– Time series plots
– Scatterplots

“Risk” and “Return”
• Investment performance
• Customer valuation
• Product demand
• Employee performance

Returns at Verizon
• The histograms show the frequency with which different returns occur

Steps in Constructing Histograms
• Decide the width of each “bin”
– This will depend on the range you observe in the data
• Determine how many observations fall into each bin
• Decide which bin observations on the border fall into
– Typically assigned to the higher bin

Common Descriptive Statistics for Quantitative Data
• Measures of central tendency
– Mean: the average value
– Median: the “middle” value
• Measures of dispersion
– Standard deviation
– Variance
– Range
– Inter-quartile Range (IQR)

Excel Formulas for Measuring Central Tendencies
• Mean: =average(data_range)
\[ \bar{y} = \frac{\sum_{i=1}^{N} y_i}{N} \]
• Median: =median(data_range)
– Unlike the mean, the median is not sensitive to extreme values

Examining a Customer Portfolio
• From an analysis of subscribers to a large US telecom provider
• Mean CLV ~ $1200; Median CLV ~ $800
– A distribution that tails off to the right is right skewed
– Who do you go after?

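A minimal Python sketch of two ideas from the slides above: sending border observations to the higher bin when counting histogram bins, and the gap between mean and median in a right-skewed distribution. The CLV values, the $1,000 bin width, and the helper name `bin_counts` are hypothetical, chosen only so the mean and median land at the $1200 and $800 quoted on the customer portfolio slide; they are not the telecom data.

```python
from statistics import mean, median

# Hypothetical customer lifetime values in dollars (illustrative only).
clv = [300, 450, 600, 700, 800, 800, 900, 1100, 1850, 4500]


def bin_counts(values, start, width, n_bins):
    """Count observations per bin of equal width.

    A value that sits exactly on a border is assigned to the higher bin,
    because floor division maps it to the bin that starts at that border.
    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


clv_mean = mean(clv)      # pulled upward by the single large value
clv_median = median(clv)  # middle of the sorted values, unaffected by it
counts = bin_counts(clv, start=0, width=1000, n_bins=5)

print("mean:", clv_mean)
print("median:", clv_median)
print("bin counts:", counts)
```

With these made-up numbers, one $4,500 customer is enough to pull the mean well above the median, which is the pattern a right-skewed distribution produces.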
Excel Formulas for Measuring Dispersion
• Variance: =var(data_range)
\[ s^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{y})^2}{N-1} \]
• Standard deviation: =stdev(data_range)
\[ s = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \bar{y})^2}{N-1}} \]
• Range = largest value – smallest value
=max(data_range)-min(data_range)
• IQR = range containing the middle 50% of the data
=percentile(data_range,.75)-percentile(data_range,.25)

Dealing with Outliers
• Outliers are observations that stand apart from the majority of observations
– Can heavily influence our analysis and conclusions
– Might be errors
– Should be noted in any conclusions drawn from the data

Temporal Data
• Time series plots can be used to see temporal patterns in the data

Stock Performance

Time Series and Forecasting
[Figure: sales plotted against time, with trend and cycle labeled]

Assessing Relationships between Quantitative Variables
• Scatterplots let us examine the association between two variables
– Consider the direction, form, and extent of dispersion
• Daily returns of Verizon vs. S&P 500 in 2010

Assessing Relationships between Quantitative Variables
• The correlation assesses the strength of the linear relationship between two quantitative variables
=correl(data_X,data_Y)
\[ r = \frac{\mathrm{Cov}(X, Y)}{\sigma(X) \times \sigma(Y)} = \frac{\frac{1}{n-1} \sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\frac{1}{n-1} \sum_i (X_i - \bar{X})^2} \times \sqrt{\frac{1}{n-1} \sum_i (Y_i - \bar{Y})^2}} \]

Correlation Analysis in Finance
• Correlation matrix of daily returns for Verizon, Comcast, and AT&T

Some Examples of Correlation
[Figure: three scatterplots of Y against X, with r = +0.4, r = +0.9, and r = -0.7]

Correlation can be misleading. Beware of...
[Figure: two scatterplots of Y against X, one with r = 0.8 (outliers!) and one with r = 0 (non-linearities!)]

Recap of Techniques
• Examining individual variables
– Descriptive statistics
– Histograms
– Time series plots
• Examining potential relationships among multiple variables
– Correlation (quantitative vs. quantitative data)
– Scatterplots (quantitative vs. quantitative data)
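
The dispersion and correlation formulas from this session can be sketched in Python as a cross-check on the Excel functions. The two return series below are hypothetical (not the Verizon or S&P 500 data), and the helper name `correlation` is made up for illustration. The sketch assumes that the `inclusive` method of `statistics.quantiles` interpolates the same way as Excel's `=percentile(...)`; treat that match as an assumption to verify on your own data.

```python
from math import sqrt
from statistics import mean, quantiles, stdev, variance

# Hypothetical daily returns in percent (illustrative only).
stock = [1.0, -2.0, 3.0, 0.0, -2.0]
market = [0.5, -1.0, 2.0, 0.5, -2.0]

# Sample variance and standard deviation, both with N - 1 in the denominator.
var_s = variance(stock)
sd_s = stdev(stock)

# Range: largest value minus smallest value.
value_range = max(stock) - min(stock)

# IQR: 75th percentile minus 25th percentile.
# Assumption: method="inclusive" mirrors Excel's percentile interpolation.
q1, _q2, q3 = quantiles(stock, n=4, method="inclusive")
iqr = q3 - q1


def correlation(x, y):
    """Pearson correlation; the 1/(n-1) factors cancel between top and bottom."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = correlation(stock, market)

print("variance:", var_s)
print("std dev:", round(sd_s, 4))
print("range:", value_range)
print("IQR:", iqr)
print("correlation:", round(r, 4))
```

Writing the correlation out by hand, rather than calling a library routine, keeps the code term-by-term comparable with the formula on the correlation slide.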