The Legislative Agenda in Argentina. A Latent Dirichlet Allocation Application.
Constanza F. Schibber
Washington University in St. Louis
This version: November 4, 2014

Classification of Bills. A legislator’s agenda is composed of the legislation she introduces. A number of studies rely on the bills introduced by legislators to measure their policy agenda. Taylor-Robinson and David (2002) and Crisp et al. (2004) classify legislation according to its geographic focus to show that legislators with personal vote-seeking incentives are more likely to introduce bills focusing on their electoral district than to engage in national policy. To do this, they rely on detailed coding rules and classify initiatives by looking at the titles of the bills. Gamm and Kousser (2010), focusing on the legislation introduced in U.S. state legislatures, also classify legislation according to whether it addresses broad or particularistic issues. They do this by treating bills tailored to the legislator’s district as particularistic and bills that are not tailored to the district’s concerns as broad. Other studies have focused only on particularistic legislation. For instance, Chasquetti and Micozzi (2014) and Micozzi (2013) classify bills sponsored by legislators in Uruguay and Argentina as targeting a district when the title mentions the name of the subnational unit the sponsor belongs to. They then measure how many targeted bills a legislator sponsors during her term in office, without considering the rest of the legislation she introduces.

The Data. I rely on the text of the legislation introduced by legislators, more specifically the text in which a sponsor explicitly states why she is presenting the bill. This text must be attached to each bill introduced in Congress, and it allows legislators to signal the topic of the proposal to their supporters.
As a result, by using this specific text we can tap into legislators’ motivations when drafting a proposal. I collected the text of all legislative initiatives introduced between 2000 and 2011 from the website of the lower chamber of the Argentine Congress, along with individual-level information about the primary sponsor: name, political party, electoral district, and mandate. The total number of bills is 12,073 and, on average, each legislator introduced 21 bills. All legislators who served during this period introduced at least one bill.

The Model. In Latent Dirichlet Allocation (LDA), each document is viewed as a mixture of several topics (Blei, Ng and Jordan, 2003). An advantage of LDA is therefore that the topics or categories present in the documents need not be known a priori.1 For instance, a public health bill might contain words drawn from a topic related to seasons, such as winter, and words drawn from a topic related to illnesses, such as flu. The core assumption is that the words w in the documents are generated by a mixture of latent topics z; thus, the probability of observing a word depends on the probability of the word given the topic and on the probability of the topic. Formally,

$$P(w) = \sum_{z} P(w \mid z)\, P(z)$$

Furthermore, LDA follows the bag-of-words model, so it makes no assumptions about the order of the words in the documents. An LDA model has three levels: corpus, document, and word. Following the notation of Blei, Ng and Jordan (2003), who first proposed the LDA model, assume K topics, where each topic $\beta_k$ is a distribution over the vocabulary drawn from a Dirichlet distribution. A draw from a K-dimensional Dirichlet returns a K-dimensional vector of multinomial probabilities, $\theta_d$ in this case, whose K values must sum to one. Given this, the generative process for each document d in a collection of documents is as follows:

1. Draw a topic distribution $\theta_d \sim \mathrm{Dir}(\alpha)$, where $\mathrm{Dir}(\alpha)$ is a symmetric Dirichlet distribution with scaling parameter $\alpha$.
2. For each word n in the document:
(a) Draw a specific topic $z_{d,n} \sim \mathrm{Multi}(\theta_d)$, where $\mathrm{Multi}(\cdot)$ is a multinomial distribution.
(b) Draw a word $w_{d,n} \sim \beta_{z_{d,n}}$.

The parameters $\alpha$ and $\beta$ are corpus-level parameters. $\theta_d$ is a document-level variable, sampled once per document. Last, $z_{d,n}$ and $w_{d,n}$ are word-level variables, sampled once for each word in each document. Because a topic is drawn anew for each word, a single document can be associated with multiple topics rather than just one. The posterior distribution of this model is intractable, so it cannot be computed directly; thus, I use Variational Bayes (VB) inference, which has been shown to be as reliable as the partially collapsed Gibbs sampler but faster (Blei, Ng and Jordan, 2003).

1 In political science, Quinn et al. (2010) and Grimmer (2010) have used variations of this model to study, for instance, congressional debates and legislators’ press releases.

50 Topics. Recall that each document is viewed as a mixture of several topics. In other words, given the 50 topics uncovered by the model, each document has a probability of containing each of these topics. The following tables present the top words by topic: Tables 1–3 present the top words translated into English, and Table 4 presents the original top words in Spanish. I labeled each topic based on its top words; I also drew random samples of the bills with the highest probability of including the topic and read their text. I classify 8 topics as referring to credit-claiming activities, because they correspond to particularistic bills. These are the topics in Table 1 (highlighted in blue in the tables): Public Works, Local Resources, Emergency Resources, Tourism, Provinces, Local-Historical Issues, Patagonia, and Customs & Tariffs.

Table 1: Top Words by Topic (in English) and Labels. Part 1.
Label: Top Words
Public Works: route, works, road, roads, Salta, transit, construction, roads, province, construction
Local Resources: executive, provinces, resources, domain, subsection, matter, constitutional, council
Emergency Resources: budget, emergency, finance, executive, resources, approval, maria, fund, exposed
Tourism: tourism, student, corridor, Bariloche, tourist, guides, activity, mountain, activities
Provincial Issues: provinces, province, region, state, resources, federal, provincial, San, Santa, population
Local Issues: city, province, San, Aires, history, culture, historical, Argentina, village
Patagonian Issues: province, south, islands, Antarctica, Fuego, Antarctic, Tierra, Atlantic, Ushuaia, Patagonia
Customs & Tariffs: insurance, customs, contract, insurance, insurer, customs, duties, insured, civil

Table 2: Top Words by Topic (in English) and Labels. Part 2.
Label: Top Words
Public Health: data, health, information, medical, organs, medicine, patient, donation, medical, doctors
Social Security: workers, regimen, contributions, benefits, pensions, allowances, benefits, age, pension, beneficiaries
Disabilities: weapons, disability, internet, fire, gun, access, disabled, states, approval, possession
Health Insurance: consumer, consumers, medicines, defense, consumption, users, health, products, user, information
Traffic and Driving: accidents, traffic, vehicles, vehicle, alcohol, life, transport, cases, drivers, identity
Fishing and Water Resources: politics, countries, world, life, water, fisheries, population, people, human, resources
Bureaucracy: federal, control, city, Aires, government, public, information, management, public, functions
Smoking: tobacco, smoking, cigarettes, smokers, mining, consumption, countries, advertising, tobacco use, states
Labor Legislation: agreement, international, parties, measures, organization, countries, employment, provisions, application, information
Crime: criminal, offenses, crime, violence, juvenile, code, children, victims, freedom, type
Judicial Procedures: justice, judicial, process, judge, case, judgment, principle, procedure, rule, action
Christian Values: women, woman, church, men, life, men, Christ, gender, equality
Civil Law: civil, code, time, responsibility, application, property, agreement, assets, cases, activities
Patriotism: flag, Aires, blue, colors, Belgrano, provinces, white, sky blue, white
Public Debt: debt, dollars, million, debts, government, foreign, pesos, financial, investment
Legislative Process: executive, congress, need, decrees, chamber, laws, legislation, sanction, subsection, urgency
Labor Conditions: workers, worker, labor, contract, employer, employment, hours, relation, occupation, conditions
Taxes: tax, funds, fiscal, financing, expenses, taxes, profits, income, resources, budget
Children & Family: human, convention, family, child, protection, discrimination, parents, woman, international
Illnesses: health, life, prevention, case, disease, population, public, risk, action, diseases
Education: education, society, political, social, programs, resources, young, life, needs, training
Arbitration: diversity, parties, biological, resources, arbitration, contracting, variety, housing, breeder, arbitration

Table 3: Programmatic Bills. Top Words by Topic (in English) and Labels. Part 3.
Label: Top Words
Aviation: transport, air, activity, aerial, aircraft, civil, aviation, passengers, flight, aeronautics
Marriage & Civil Union: women, marriage, civil, discrimination, equality, union, children, spouses, family, code
Sexual Harassment: sexual, energy, harassment, conduct, harassment, victim, code, occupation, cases
Sports: societies, society, activity, sport, soccer, entities, associations, broadcasting, purposes, TV
Elections: parties, politicians, political, electoral, political, democracy, representation, candidates, participation, council
Cooperatives: cooperatives, cooperative, music, theater, culture, property, business, identity, associated, owned
Industry and Business: sector, production, market, business, activity, gas, products, economy, industry, countries
Environment: waste, environmental, environment, treatment, disposal, hazardous, minimum, final, budgets, authority
Freedom of Expression: media, freedom, communication, information, press, expression, medium, response, diffusion, activity
Property: address, necessary, consortia, consortium, horizontal, property, evaluation, administrator
Judicial System 1: court, justice, supreme, court, laws, constitutional, human, judicial, federal, international
Agriculture & Livestock: food, products, production, producers, agriculture, food, quality, livestock, market, tons
Higher Education: university, education, universities, level, higher, studies, training, population, students, professional
Banking System: bank, credit, payment, housing, credit, institutions, debts, financial, finance, obligations
Energy: energy, environment, natural, production, environmental, environment, species, water, areas
Transparency: corruption, criminal, legal, society, justice, transparency, legal, institutions, acts, economic
Alcoholic Beverages: wood, steel, beverages, stainless, engine, hp, alcohol, drawers, alcoholic, table
Judicial System 2: federal, court, justice, courts, creation, customer, causes, jurisdiction, federal, defender
Public Utilities: users, companies, rates, public, concession, contracts, contract, works, prices, company
International Economics: tourism, countries, trade, activity, international, foreign, sector, millions, economy

Table 4: Top Words by Topic (in Spanish) and Labels
Label: Top Words
Public Health: datos, salud, informacion, medico, organos, medicina, paciente, donacion, medica, medicos
Social Security: trabajadores, regimen, aportes, beneficio, previsional, prestaciones, beneficios, edad, pensiones, beneficiarios
Disabilities: armas, discapacidad, internet, fuego, arma, acceso, discapacitados, estados, aprobacion, tenencia
Health Insurance: consumidor, consumidores, medicamentos, defensa, consumo, usuarios, salud, productos, usuario, informacion
Traffic and Driving: accidentes, transito, vehiculos, vehiculo, alcohol, vida, transportes, casos, conductores, identidad
Fishing and Water Resources: politica, paises, mundo, vida, agua, pesca, poblacion, pueblos, humano, recursos
Bureaucracy: federal, control, ciudad, aires, gobierno, publica, informacion, administracion, publico, funciones
Smoking: tabaco, fumar, cigarrillos, fumadores, minera, consumo, paises, publicidad, tabaquismo, estados
Labor Legislation: convenio, internacional, partes, medidas, organizacion, paises, empleo, disposiciones, aplicacion, informacion
Crime: penal, delitos, delito, violencia, menores, codigo, ninos, victimas, libertad, tipo
Judicial Procedures: justicia, judicial, proceso, juez, casos, juicio, principio, procedimiento, norma, accion
Christian Values: mujeres, mujer, iglesia, hombres, vida, varones, cristo, genero, igualdad
Civil Law: civil, codigo, tiempo, responsabilidad, aplicacion, propiedad, acuerdo, bienes, casos, actividad
Patriotism: bandera, aires, azul, colores, belgrano, provincias, blanco, celeste, blanca
Public Debt: deuda, dolares, millones, deudas, gobierno, externa, pesos, financiero, inversiones
Public Work: ruta, obras, vial, rutas, salta, transito, obra, caminos, provincia, construccion
Legislative Process: ejecutivo, congreso, necesidad, decretos, camara, leyes, legislativo, sancion, inciso, urgencia
Labor Conditions: trabajadores, trabajador, laboral, contrato, empleador, laborales, horas, relacion, empleo, condiciones
Taxes: impuesto, fondos, fiscal, financiamiento, gastos, impuestos, ganancias, ingresos, recursos, presupuesto
Family and Children: humanos, convencion, familia, nino, proteccion, discriminacion, padres, mujer, internacional, ninos
Illnesses: salud, vida, prevencion, casos, enfermedad, poblacion, publica, riesgo, accion, enfermedades
Education: educacion, sociedad, politicas, sociales, programas, recursos, jovenes, vida, necesidades, formacion
Housing: diversidad, partes, biologica, recursos, arbitral, contratante, variedad, vivienda, obtentor, arbitraje
Aeronautical Transportation: transporte, aereo, actividad, aerea, aeronaves, civil, aviacion, pasajeros, vuelo, aeronautica
Family: mujer, matrimonio, civil, discriminacion, igualdad, union, hijos, conyuges, familia, codigo
Local Resources: ejecutivo, provincias, recursos, dominio, inciso, materia, constitucional, consejo
Emergency Resources: presupuesto, emergencia, hacienda, ejecutivo, recursos, aprobacion, maria, fondo, expuesto, alfredo
Sexual Harassment: sexual, energia, acoso, conducta, hostigamiento, victima, codigo, empleo, casos
Sports: sociedades, sociedad, actividad, deporte, futbol, entidades, asociaciones, radiodifusion, fines, television
Elections: partidos, politicos, politica, electoral, politico, democracia, representacion, candidatos, participacion, consejo
Cooperatives: cooperativas, cooperativa, musica, teatro, cultura, propiedad, empresa, identidad, asociados, participada
Industry and Business: sector, produccion, mercado, empresas, actividad, gas, productos, economia, industria, paises
Tourism: turismo, estudiantil, corredor, bariloche, turistico, guias, actividad, montana, actividades
Environment: residuos, ambiental, ambiente, tratamiento, disposicion, peligrosos, minimos, final, presupuestos, autoridad
Provinces: provincias, provincia, region, provinciales, recursos, federal, provincial, san, santa, poblacion
Freedom of Expression: medios, libertad, comunicacion, informacion, prensa, expresion, medio, respuesta, difusion, actividad
Property: domicilio, necesario, consorcios, consorcio, horizontal, propiedad, evaluacion, administrador
Judicial System 1: corte, justicia, suprema, tribunal, leyes, constitucional, humanos, judicial, federal, internacional
Local-Historical Issues: ciudad, provincia, san, aires, historia, juan, cultura, historico, argentino, pueblo
Patagonia: provincia, sur, islas, antartida, fuego, antartico, tierra, atlantico, ushuaia, provincial
Agriculture & Livestock: alimentos, productos, produccion, productores, agricultura, alimentaria, calidad, ganaderia, mercado, toneladas
Higher Education: universidad, educacion, universidades, nivel, superior, estudios, formacion, poblacion, estudiantes, profesional
Banking System: banco, credito, pago, vivienda, creditos, entidades, deudas, financiero, finanzas, obligaciones
Customs and Tariffs: seguros, aduanero, contrato, seguro, asegurador, aduana, tasas, asegurado, civil
Energy: energia, ambiente, recursos, naturales, produccion, ambiental, medio, especies, agua, areas
Transparency: corrupcion, penal, juridica, sociedad, justicia, transparencia, juridicas, instituciones, actos, economico
Alcohol: madera, acero, bebidas, inoxidable, motor, hp, alcohol, cajones, alcoholicas, mesa
Judicial System 2: federal, juzgado, justicia, juzgados, creacion, cliente, causas, competencia, federales, defensor
Public Utilities: usuarios, empresas, tarifas, publicos, concesion, contratos, contrato, obras, precios, empresa

One Topic by Bill. To assign one topic to each bill, I identify, for each initiative, the topic with the highest estimated probability and assign that topic to the bill.2 Figure 1 shows the frequency distribution of topics. The most common topic is ‘Economic Activity’ (7.2% of the bills), which includes initiatives referring to the production of goods, companies, and industries. Other frequent topics are ‘Education’ (5.8%) and ‘Judicial Procedures’ (5.6%). The least frequent topics are ‘Sexual Harassment’ (0.17%) and ‘Transparency’ (0.23%).

[Figure 1: Frequency of Topics for 16,644 initiatives in the Lower Chamber of Argentina. Bar chart; x-axis: Topics (1–50); y-axis: Frequency.]

2 I also implemented a hierarchical clustering algorithm and a k-means algorithm, but both produced unsatisfactory results: comparing the topic assigned to each bill with the bill’s title revealed many inconsistencies.

References
Blei, David M., Andrew Y. Ng and Michael I. Jordan. 2003. “Latent Dirichlet allocation.” Journal of Machine Learning Research 3:993–1022.
Carey, J. M. 2007. “Competing principals, political institutions, and party unity in legislative voting.” American Journal of Political Science pp. 92–107.
Chasquetti, Daniel and Juan Pablo Micozzi. 2014. “The Subnational Connection in Unitary Regimes: Progressive Ambition and Legislative Behavior in Uruguay.” Legislative Studies Quarterly 39(1):87–112.
Crisp, Brian F., Maria C. Escobar-Lemmon, Bradford S. Jones, Mark P. Jones and Michelle M. Taylor-Robinson. 2004. “Vote-Seeking Incentives and Legislative Representation in Six Presidential Democracies.” The Journal of Politics 66(3):823–846.
Gamm, Gerald and Thad Kousser. 2010. “Broad bills or particularistic policy?
Historical patterns in American state legislatures.” American Political Science Review 104(1):151–170.
Grimmer, Justin. 2010. “A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases.” Political Analysis 18(1):1–35.
King, Gary, Michael Tomz and Jason Wittenberg. 2000. “Making the most of statistical analyses: Improving interpretation and presentation.” American Journal of Political Science pp. 347–361.
Levendusky, Matthew S. and Jeremy C. Pope. 2010. “Measuring Aggregate-Level Ideological Heterogeneity.” Legislative Studies Quarterly 35(2):259–282.
Micozzi, Juan Pablo. 2013. “Does Electoral Accountability Make a Difference? Direct Elections, Career Ambition, and Legislative Performance in the Argentine Senate.” The Journal of Politics 75(1):137–149.
Quinn, Kevin M., Burt L. Monroe, Michael Colaresi, Michael H. Crespin and Dragomir R. Radev. 2010. “How to analyze political attention with minimal assumptions and costs.” American Journal of Political Science 54(1):209–228.
Taylor-Robinson, M. M. and S. J. David. 2002. “Who Participates and Who Is Seen and Not Heard? Evidence From the Honduran Congress.” Journal of Legislative Studies 8(1):10–36.
Tsai, Tsunghan and Jeff Gill. 2013. “Interactions in Generalized Linear Models: Theoretical Issues and an Application to Personal Vote-Earning Attributes.” Social Science 2:91–113.
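The LDA generative process described in The Model, together with the one-topic-per-bill rule described in One Topic by Bill, can be illustrated with a small simulation. This is a hedged sketch in Python, not the estimation code behind the results reported here: it samples synthetic documents instead of fitting the model to bills (there is no variational Bayes step), and the number of topics, vocabulary size, document length, $\alpha$, and all function names (`sample_dirichlet`, `generate_corpus`, `word_distribution`, `assign_one_topic`) are hypothetical choices for illustration. `word_distribution` evaluates the mixture equation $P(w) = \sum_z P(w \mid z) P(z)$ for a single document, taking $P(z) = \theta_d$.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) of dimension k.

    Standard construction: normalize k independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(n_docs, doc_len, beta, alpha, rng):
    """Sample synthetic documents from the LDA generative process.

    beta: list of K topic-word distributions (each a list of V probabilities).
    Returns the per-document topic proportions theta_d and the word ids.
    """
    n_topics, vocab_size = len(beta), len(beta[0])
    thetas, docs = [], []
    for _ in range(n_docs):
        # Step 1: theta_d ~ Dir(alpha)
        theta = sample_dirichlet(alpha, n_topics, rng)
        words = []
        for _ in range(doc_len):
            # Step 2(a): z_{d,n} ~ Multi(theta_d)
            z = rng.choices(range(n_topics), weights=theta)[0]
            # Step 2(b): w_{d,n} ~ beta_{z_{d,n}}
            w = rng.choices(range(vocab_size), weights=beta[z])[0]
            words.append(w)
        thetas.append(theta)
        docs.append(words)
    return thetas, docs


def word_distribution(theta, beta):
    """P(w | d) = sum_z P(w | z) P(z | d): the topic mixture behind one document."""
    vocab_size = len(beta[0])
    return [sum(theta[k] * beta[k][v] for k in range(len(beta)))
            for v in range(vocab_size)]


def assign_one_topic(theta):
    """Index of the highest-probability topic (ties go to the lowest index)."""
    return max(range(len(theta)), key=lambda k: theta[k])


if __name__ == "__main__":
    rng = random.Random(42)
    K, V = 5, 20  # hypothetical number of topics and vocabulary size
    beta = [sample_dirichlet(0.5, V, rng) for _ in range(K)]  # hypothetical topics
    thetas, docs = generate_corpus(200, 30, beta, alpha=0.5, rng=rng)
    freq = Counter(assign_one_topic(t) for t in thetas)
    for topic in range(K):
        print(f"topic {topic}: {freq[topic]} documents")
```

Because the simulated $\theta_d$ are known, the highest-probability assignment here uses the true mixture weights; in the application, the same rule is applied to the $\theta_d$ estimated by the model, and the resulting counts per topic correspond to the frequencies plotted in Figure 1.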